
feat: add tenacity retry in opensearch #10917

Merged
edwinjosechittilappilly merged 5 commits into main from fix-open-search-retry on Dec 10, 2025

Conversation

@edwinjosechittilappilly (Collaborator) commented Dec 5, 2025

This pull request refactors the embedding generation logic in the _add_documents_to_vector_store method of opensearch_multimodal.py to improve reliability and efficiency, especially when handling rate limits and different embedding model providers. The main changes include switching to the tenacity library for robust, rate-limit-aware retries and optimizing concurrency based on the embedding model type.

Embedding and Retry Logic Improvements:

  • Replaced manual retry and threading logic with tenacity-based decorators, providing separate retry strategies for rate limit errors (longer backoff, more attempts) and other retryable errors (shorter backoff, fewer attempts).
  • Added explicit error handling and logging for failed embedding attempts, ensuring that failures are clearly reported after all retries.

Concurrency and Model-Specific Handling:

  • Implemented sequential embedding with inter-request delays for IBM/Watsonx models to avoid exceeding rate limits, and parallel embedding for other models using a thread pool.
  • Dynamically determined concurrency settings (max_workers) based on the number of text chunks and model type, improving performance while respecting provider constraints.

Summary by CodeRabbit

Release Notes

  • Bug Fixes
    • Embedding generation now features automatic rate-limit detection and recovery with intelligent exponential backoff to maintain reliability during high-demand periods
    • Implemented distinct error-handling strategies tailored for rate-limit scenarios versus other transient failures, each with optimized recovery and retry behavior
    • Added comprehensive monitoring and logging for embedding operations, providing visibility into failure recovery and automatic retry actions


coderabbitai bot (Contributor) commented Dec 5, 2025

Important

Review skipped

Auto incremental reviews are disabled on this repository.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Note

Other AI code review bot(s) detected

CodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review.

Walkthrough

Refactored embedding generation in OpenSearch multimodal component to replace concurrent ThreadPoolExecutor with a rate-limit-aware, multi-tier retry mechanism using tenacity. Introduces sequential processing for IBM/watsonx models with inter-request delays and parallel processing for others, with distinct retry policies for rate-limit versus generic errors. Existing ingestion, indexing, and mapping logic remains unchanged.

Changes

Cohort / File(s) Summary
Embedding retry mechanism
src/lfx/src/lfx/components/elastic/opensearch_multimodal.py
Replaced concurrent ThreadPoolExecutor embedding with tenacity-wrapped retry logic; added embed_chunk_with_retry function; implements rate-limit errors (5 attempts, exponential backoff to 30s) and generic retryable errors (3 attempts, shorter backoff); enforces sequential processing for IBM/watsonx models with inter-request delays; enhanced logging for retry events; preserves dimension calculation, vector field naming, mapping creation, and bulk ingestion flow.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

  • Verify tenacity retry configuration (exponential backoff curves, attempt thresholds) and correctness for both rate-limit and generic error paths
  • Confirm sequential vs. parallel execution logic for model-specific handling (IBM/watsonx)
  • Validate error handling coverage and logging adequacy for failure scenarios
  • Assess impact on embedding generation performance and latency from sequential processing

Possibly related PRs

Suggested labels

enhancement

Suggested reviewers

  • phact
  • lucaseduoli
  • erichare

Pre-merge checks and finishing touches

Important

Pre-merge checks failed

Please resolve all errors before merging. Addressing warnings is optional.

❌ Failed checks (1 warning, 3 inconclusive)
  • Test File Naming And Structure — ⚠️ Warning: the PR introduces retry logic, rate-limit handling, and concurrency patterns, but existing tests lack coverage for these critical new features. Resolution: add six comprehensive unit tests covering rate-limit retry logic, exponential backoff, sequential/parallel concurrency patterns, failure scenarios, and retry recovery.
  • Test Coverage For New Implementations — ❓ Inconclusive: cannot locate the PR changes in the current repository state despite a comprehensive search; access to the correct branch/commit containing the opensearch_multimodal.py changes and associated test files is required. Resolution: verify the correct repository branch/commit is checked out and that the files mentioned in the PR summary are accessible for test-coverage assessment.
  • Test Quality And Coverage — ❓ Inconclusive: repository files could not be located to assess test coverage for the tenacity retry mechanism changes in opensearch_multimodal.py. Resolution: provide access to the modified opensearch_multimodal.py file and corresponding test files to verify test coverage and quality.
  • Excessive Mock Usage Warning — ❓ Inconclusive: test files related to the opensearch_multimodal.py changes could not be located in the repository to assess mock usage patterns. Resolution: provide access to test files for the opensearch_multimodal.py component to evaluate whether mocks are used excessively or appropriately for testing embedding and retry logic.
✅ Passed checks (3 passed)
  • Description Check — ✅ Passed: check skipped; CodeRabbit's high-level summary is enabled.
  • Title Check — ✅ Passed: the title accurately reflects the main change, adding tenacity-based retry mechanisms to OpenSearch embedding generation.
  • Docstring Coverage — ✅ Passed: docstring coverage is 80.00%, which meets the required threshold of 80.00%.


@github-actions github-actions bot added enhancement New feature or request and removed enhancement New feature or request labels Dec 5, 2025
Copilot AI (Contributor) left a comment

Pull request overview

This PR refactors the embedding generation logic in the OpenSearch multimodal component to improve reliability and handle rate limits more effectively. The changes replace manual retry/threading logic with the tenacity library for robust retry behavior and implement model-specific concurrency strategies.

Key Changes:

  • Introduced tenacity-based retry decorators with separate strategies for rate limit errors (5 attempts, exponential backoff 2-30s) and other errors (3 attempts, exponential backoff 1-8s)
  • Implemented sequential embedding with 0.6s delays for IBM/Watsonx models and parallel embedding for other models
  • Enhanced error logging with detailed retry information and failure tracking



if is_ibm:
    # Sequential processing with inter-request delay for IBM models
    inter_request_delay = 0.6  # ~1.67 req/s, safely under 2 req/s limit
Copilot AI commented Dec 5, 2025

The comment mentions a '2 req/s limit' for IBM models, but this constraint is not documented elsewhere in the code or PR description. Consider adding a reference to the IBM/Watsonx API documentation or rate limit specification to help future maintainers understand the basis for this value.

Suggested change
-    inter_request_delay = 0.6  # ~1.67 req/s, safely under 2 req/s limit
+    inter_request_delay = 0.6  # ~1.67 req/s, safely under 2 req/s limit (see IBM/Watsonx rate limits: https://cloud.ibm.com/docs/watsonx?topic=watsonx-llm-api-reference#rate-limits)

Comment on lines +882 to +886
"""Check if exception is retryable but not a rate limit error."""
# Retry on most exceptions except for specific non-retryable ones
# Add other non-retryable exceptions here if needed
return not is_rate_limit_error(exception)

Copilot AI commented Dec 5, 2025

The function is_other_retryable_error returns True for all non-rate-limit exceptions, including those that should not be retried (e.g., authentication errors, validation errors, or permanent failures). This could lead to unnecessary retry attempts on non-recoverable errors. Consider explicitly checking for retryable error types or patterns, such as timeout errors or temporary service unavailability (5xx status codes), and returning False for known non-retryable errors.

Suggested change
-    """Check if exception is retryable but not a rate limit error."""
-    # Retry on most exceptions except for specific non-retryable ones
-    # Add other non-retryable exceptions here if needed
-    return not is_rate_limit_error(exception)
+    """Check if exception is retryable but not a rate limit error.
+
+    Retry only on transient errors (timeouts, connection errors, 5xx except 429).
+    Do not retry on authentication, validation, or other permanent errors (4xx except 429).
+    """
+    # Rate limit errors are handled separately
+    if is_rate_limit_error(exception):
+        return False
+    # Check for OpenSearch RequestError with 4xx status codes (except 429)
+    if isinstance(exception, RequestError):
+        status_code = getattr(exception, "status_code", None)
+        if status_code is not None:
+            # 400, 401, 403, 404, etc. are not retryable
+            if status_code in {400, 401, 403, 404, 422}:
+                return False
+            # 429 is handled above; 5xx are retryable
+            if 500 <= status_code < 600:
+                return True
+        # If status_code is not set, fall back to the message
+        error_str = str(exception).lower()
+        if any(code in error_str for code in ["400", "401", "403", "404", "422"]):
+            return False
+        if any(code in error_str for code in ["500", "502", "503", "504"]):
+            return True
+    # Check for common transient error types
+    if isinstance(exception, (TimeoutError, ConnectionError)):
+        return True
+    # Check for 5xx in the exception message
+    error_str = str(exception).lower()
+    if any(code in error_str for code in ["500", "502", "503", "504"]):
+        return True
+    # Authentication, permission, validation errors (not retryable)
+    if any(term in error_str for term in ["authentication", "unauthorized", "forbidden", "permission", "invalid", "validation"]):
+        return False
+    # Default: do not retry
+    return False


# For IBM models, use sequential processing with rate limiting
# For other models, use parallel processing
vectors: list[list[float]] = [None] * len(texts)
Copilot AI commented Dec 5, 2025

The vectors list is initialized with None values, but the code doesn't validate that all None values are replaced with actual embeddings before proceeding. If any chunk fails to embed (despite retries) and raises an exception that's caught elsewhere, the resulting list could contain None values. Consider adding validation after the embedding loop to ensure all elements are populated, or handle the case where vectors[idx] might remain None.

)

def is_rate_limit_error(exception: Exception) -> bool:
"""Check if exception is a rate limit error (429)."""
Copilot AI commented Dec 5, 2025

The string-based error detection is fragile and may miss rate limit errors that use different formatting or phrasing. Consider checking for specific exception types (e.g., HTTPError with status code 429) or exception attributes instead of relying solely on string matching. This would make the error detection more reliable and maintainable.

Suggested change
-    """Check if exception is a rate limit error (429)."""
+    """Check if exception is a rate limit error (HTTP 429)."""
+    # Check for OpenSearch RequestError with status_code 429
+    if isinstance(exception, RequestError):
+        # Some RequestError instances have a status_code attribute
+        status_code = getattr(exception, "status_code", None)
+        if status_code == 429:
+            return True
+    # Fallback: string matching for other cases
Comment on lines 866 to +874

        self.log(metadatas)

-        # Generate embeddings (threaded for concurrency) with retries
-        def embed_chunk(chunk_text: str) -> list[float]:
-            return selected_embedding.embed_documents([chunk_text])[0]
-
-        vectors: list[list[float]] | None = None
-        last_exception: Exception | None = None
-        delay = 1.0
-        attempts = 0
-        max_attempts = 3
-
-        while attempts < max_attempts:
-            attempts += 1
+        # Generate embeddings with rate-limit-aware retry logic using tenacity
+        from tenacity import (
+            retry,
+            retry_if_exception,
+            stop_after_attempt,
+            wait_exponential,
+        )
Copilot AI commented Dec 5, 2025

The tenacity import is placed within the method body rather than at the module level. This violates Python's PEP 8 style guide, which recommends placing imports at the top of the file. Move this import to the module-level imports section to improve code organization and reduce import overhead on repeated method calls.

coderabbitai bot (Contributor) left a comment

Actionable comments posted: 0

🧹 Nitpick comments (3)
src/lfx/src/lfx/components/elastic/opensearch_multimodal.py (3)

868-910: Tighten tenacity retry predicates and verify API assumptions

The retry predicates are very broad right now:

  • is_other_retryable_error returns True for any exception that isn’t detected as a rate‑limit, so you’ll retry on ValueError, TypeError, etc., which are usually permanent/logic errors rather than transient issues. That means 3 unnecessary attempts on many hard failures.
  • The predicates also implicitly include non-application exceptions if they bubble up (e.g., KeyboardInterrupt/SystemExit), which you typically don’t want to retry.

Consider constraining retries to known transient classes (network/HTTP/timeouts from the embedding provider) – for example, via retry_if_exception_type or a predicate that checks isinstance(e, (ConnectionError, TimeoutError, OpenAIError, ...)) and excludes obvious programming errors. You can still keep a separate predicate for 429s.

Also, before_sleep assumes retry_state.next_action.sleep and retry_state.outcome.exception() are always present/valid for your tenacity version. Please double‑check these attributes in the tenacity version used in this project and adjust or guard against None if needed.

Finally, you may want to import tenacity at module level rather than inside _add_documents_to_vector_store to avoid repeated local imports and to make dependency usage more discoverable.


911-927: Nested retry decorators work but are non‑obvious; consider clarifying or simplifying

Stacking two @retry decorators like:

@retry_on_rate_limit
@retry_on_other_errors
def _embed(...):
    ...

does achieve the intended behavior (rate‑limit errors handled by the outer policy, all other exceptions by the inner one), but it’s subtle and non‑obvious to future readers.

Two lightweight options:

  • Add a brief comment above _embed explaining how the two retry layers interact (outer handles 429/rate‑limit with 5 attempts and long backoff; inner handles non‑429 with 3 attempts and short backoff).
  • Or wrap this into a small helper (e.g., _embed_with_retry(text: str)) with a single try/except that routes exceptions through the appropriate tenacity Retrying instance, making the control flow clearer.

The current implementation is logically sound; this is mostly about maintainability and reducing cognitive load for the next person reading this.


936-960: Concurrency block is reasonable; add minor robustness for vectors/max_workers

The concurrency logic for IBM vs non‑IBM models looks good overall, but there are a couple of small robustness nits:

  • vectors: list[list[float]] = [None] * len(texts) conflicts with the type hint (it’s actually list[None | list[float]] until filled). If you run static typing, this will be flagged; you could initialize with a more accurate type (e.g., vectors: list[list[float] | None] = [None] * len(texts) and narrow later) or build vectors by appending in order.
  • max_workers = min(max(len(texts), 1), 8) works, but reads a bit oddly. Something like max_workers = max(1, min(len(texts), 8)) is equivalent and clearer about the invariant 1 <= max_workers <= 8.
  • Given you already short‑circuit on if not docs: return earlier, texts should never be empty here; adding a quick if not texts: return just before initializing vectors would make this block safer against any future refactor that might desync docs and texts.

These are minor readability/defensiveness tweaks; the core concurrency behavior looks fine.

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 7a053bd and 744011a.

📒 Files selected for processing (1)
  • src/lfx/src/lfx/components/elastic/opensearch_multimodal.py (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
src/lfx/src/lfx/components/elastic/opensearch_multimodal.py (3)
src/backend/tests/unit/components/embeddings/test_embeddings_with_models.py (2)
  • embed_documents (25-27)
  • embed_documents (240-241)
src/lfx/src/lfx/base/embeddings/embeddings_class.py (1)
  • embed_documents (36-45)
src/backend/tests/unit/components/vectorstores/test_opensearch_multimodal.py (1)
  • embed_documents (34-36)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
  • GitHub Check: CodeQL analysis (python)
  • GitHub Check: Agent
  • GitHub Check: Update Starter Projects
  • GitHub Check: Update Component Index

github-actions bot (Contributor) commented Dec 5, 2025

Frontend Unit Test Coverage Report

Coverage Summary

Coverage: 15% overall — Statements: 15.45% (4253/27512) · Branches: 8.63% (1815/21019) · Functions: 9.74% (591/6064)

Unit Test Results

Tests: 1671 · Skipped: 0 💤 · Failures: 0 ❌ · Errors: 0 🔥 · Time: 21.067s ⏱️

codecov bot commented Dec 5, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 32.55%. Comparing base (3681cb2) to head (c8db2de).
⚠️ Report is 4 commits behind head on main.

Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff             @@
##             main   #10917      +/-   ##
==========================================
- Coverage   32.56%   32.55%   -0.01%     
==========================================
  Files        1371     1371              
  Lines       63493    63542      +49     
  Branches     9383     9397      +14     
==========================================
+ Hits        20675    20686      +11     
- Misses      41778    41816      +38     
  Partials     1040     1040              
Flag coverage changes:
  • backend: 51.59% <ø> (+0.06%) ⬆️
  • frontend: 14.31% <ø> (ø)
  • lfx: 39.95% <ø> (-0.10%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.
see 19 files with indirect coverage changes


@github-actions github-actions bot added the lgtm This PR has been approved by a maintainer label Dec 6, 2025
@edwinjosechittilappilly edwinjosechittilappilly added this pull request to the merge queue Dec 10, 2025
@github-actions github-actions bot added enhancement New feature or request and removed enhancement New feature or request labels Dec 10, 2025
Merged via the queue into main with commit 840ec0f Dec 10, 2025
22 of 24 checks passed
@edwinjosechittilappilly edwinjosechittilappilly deleted the fix-open-search-retry branch December 10, 2025 17:03

Labels

enhancement New feature or request lgtm This PR has been approved by a maintainer

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants