chore: merge branch release-1.8.1 into main #12185
… (#11975) * feat: add runtime port validation for Kubernetes service discovery * test: add unit tests for runtime port validation in Settings * fix: improve runtime port validation to handle exceptions and edge cases Co-authored-by: Gabriel Luiz Freitas Almeida <gabriel@logspace.ai>
* feat: add documentation link to Guardrails component * [autofix.ci] apply automated fixes --------- Co-authored-by: autofix-ci[bot] <114827586+autofix-ci[bot]@users.noreply.github.com>
* feat: traces v0 v0 for traces includes: - filters: status, token usage range and datatime - accordian rows per trace Could add: - more filter options. Ecamples: session_id, trace_id and latency range * fix: token range * feat: create sidebar buttons for logs and trace add sidebar buttons for logs and trace remove lods canvas control * fix: fix duplicate trace ID insertion hopefully fix duplicate trace ID insertion on windows * fix: update tests and alembic tables for uts update tests and alembic tables for uts * chore: add session_id * chore: allo grouping by session_id and flow_id * chore: update race input output * chore: change run name to flow_name - flow_id was flow_name - trace_id now flow_name - flow_id * facelift * clean up and add testcases * clean up and add testcases * merge Alembic detected multiple heads * [autofix.ci] apply automated fixes * improve testcases * remodel files * chore: address gabriel simple changes address gabriel simple changes in traces.py and native.py * clean up and testcases * chore: address OTel and PG status comments #11689 (comment) #11689 (comment) * chore: OTel span naming convention model name is now set using name = f"{operation} {model_name}" if model_name else operation * add traces * feat: use uv sources for CPU-only PyTorch (#11884) * feat: use uv sources for CPU-only PyTorch Configure [tool.uv.sources] with pytorch-cpu index to avoid ~6GB CUDA dependencies in Docker images. This replaces hardcoded wheel URLs with a cleaner index-based approach. 
- Add pytorch-cpu index with explicit = true - Add torch/torchvision to [tool.uv.sources] - Add explicit torch/torchvision deps to trigger source override - Regenerate lockfile without nvidia/cuda/triton packages - Add required-environments for multi-platform support * fix: update regex to only replace name in [project] section The previous regex matched all lines starting with `name = "..."`, which incorrectly renamed the UV index `pytorch-cpu` to `langflow-nightly` during nightly builds. This caused `uv lock` to fail with: "Package torch references an undeclared index: pytorch-cpu" The new regex specifically targets the name field within the [project] section only, avoiding unintended replacements in other sections like [[tool.uv.index]]. * style: fix ruff quote style * fix: remove required-environments to fix Python 3.13 macOS x86_64 CI The required-environments setting was causing hard failures when packages like torch didn't have wheels for specific platform/Python combinations. Without this setting, uv resolves optimistically and handles missing wheels gracefully at runtime instead of failing during resolution. --------- * LE-270: Hydration and Console Log error (#11628) * LE-270: add fix hydration issues * LE-270: fix disable field on max token on language model --------- * test: add wait for selector in mcp server tests (#11883) * Add wait for selector in mcp server tests * [autofix.ci] apply automated fixes * Add more awit for selectors * [autofix.ci] apply automated fixes --------- * fix: reduce visual lag in frontend (#11686) * Reduce lag in frontend by batching react events and reducing minimval visual build time * Cleanup * [autofix.ci] apply automated fixes * add tests and improve code read * [autofix.ci] apply automated fixes * Remove debug log --------- * feat: lazy load imports for language model component (#11737) * Lazy load imports for language model component Ensures that only the necessary dependencies are required. 
For example, if OpenAI provider is used, it will now only import langchain_openai, rather than requiring langchain_anthropic, langchain_ibm, etc. * Add backwards-compat functions * [autofix.ci] apply automated fixes * [autofix.ci] apply automated fixes (attempt 2/3) * Add exception handling * [autofix.ci] apply automated fixes * [autofix.ci] apply automated fixes * [autofix.ci] apply automated fixes * [autofix.ci] apply automated fixes * [autofix.ci] apply automated fixes * [autofix.ci] apply automated fixes (attempt 2/3) * comp index * docs: azure default temperature (#11829) * change-azure-openai-default-temperature-to-1.0 * [autofix.ci] apply automated fixes * [autofix.ci] apply automated fixes (attempt 2/3) * [autofix.ci] apply automated fixes (attempt 3/3) * [autofix.ci] apply automated fixes --------- * [autofix.ci] apply automated fixes * [autofix.ci] apply automated fixes (attempt 2/3) * [autofix.ci] apply automated fixes * [autofix.ci] apply automated fixes * [autofix.ci] apply automated fixes * [autofix.ci] apply automated fixes (attempt 2/3) * fix unit test? * add no-group dev to docker builds * [autofix.ci] apply automated fixes --------- * feat: generate requirements.txt from dependencies (#11810) * Base script to generate requirements Dymanically picks dependency for LanguageM Comp. Requires separate change to remove eager loading. * Lazy load imports for language model component Ensures that only the necessary dependencies are required. For example, if OpenAI provider is used, it will now only import langchain_openai, rather than requiring langchain_anthropic, langchain_ibm, etc. 
* Add backwards-compat functions * [autofix.ci] apply automated fixes * [autofix.ci] apply automated fixes (attempt 2/3) * Add exception handling * Add CLI command to create reqs * correctly exclude langchain imports * Add versions to reqs * dynamically resolve provider imports for language model comp * Lazy load imports for reqs, some ruff fixes * Add dynamic resolves for embedding model comp * Add install hints * Add missing provider tests; add warnings in reqs script * Add a few warnings and fix install hint * update comments add logging * Package hints, warnings, comments, tests * [autofix.ci] apply automated fixes * [autofix.ci] apply automated fixes (attempt 2/3) * [autofix.ci] apply automated fixes (attempt 3/3) * Add alias for watsonx * Fix anthropic for basic prompt, azure mapping * [autofix.ci] apply automated fixes * [autofix.ci] apply automated fixes (attempt 2/3) * ruff * [autofix.ci] apply automated fixes * test formatting * ruff * [autofix.ci] apply automated fixes --------- * fix: add handle to file input to be able to receive text (#11825) * changed base file and file components to support muitiple files and files from messages * update component index * update input file component to clear value and show placeholder * updated starter projects * [autofix.ci] apply automated fixes * updated base file, file and video file to share robust file verification method * updated component index * updated templates * fix whitespaces * [autofix.ci] apply automated fixes * [autofix.ci] apply automated fixes (attempt 2/3) * add file upload test for files fed through the handle * [autofix.ci] apply automated fixes * added tests and fixed things pointed out by revies * update component index * fixed test * ruff fixes * Update component_index.json * [autofix.ci] apply automated fixes * [autofix.ci] apply automated fixes (attempt 2/3) * [autofix.ci] apply automated fixes (attempt 3/3) * updated component index * updated component index * removed handle from file 
input * Added functionality to use multiple files on the File Path, and to allow files on the langflow file system. * [autofix.ci] apply automated fixes * fixed lfx test * build component index --------- * docs: Add AGENTS.md development guide (#11922) * add AGENTS.md rule to project * change to agents-example * remove agents.md * add example description * chore: address cris I1 comment address cris I1 comment * chore: address cris I5 address cris I5 * chore: address cris I6 address cris I6 * chore: address cris R7 address cris R7 * fix testcase * chore: address cris R2 address cris R2 * restructure insight page into sidenav * added header and total run node * restructing branch * chore: address gab otel model changes address gab otel model changes will need no migration tables * chore: update alembic migration tables update alembic migration tables after model changes * add empty state for gropu sessions * remove invalid mock * test: update and add backend tests update and add backend tests * chore: address backend code rabbit comments address backend code rabbit comments * chore: address code rabbit frontend comments address code rabbit frontend comments * chore: test_native_tracer minor fix address c1 test_native_tracer minor fix address c1 * chore: address C2 + C3 address C2 + C3 * chore: address H1-H5 address H1-H5 * test: update test_native_tracer update test_native_tracer * fixes * chore: address M2 address m2 * chore: address M1 address M1 * dry changes, factorization * chore: fix 422 spam and clean comments fix 422 spam and clean comments * chore: address M12 address M12 * chore: address M3 address M3 * chore: address M4 address M4 * chore: address M5 address M5 * chore: clean up for M7, M9, M11 clean up for M7, M9, M11 * chore: address L2,L4,L5,L6 + any test address L2,L4,L5 and L6 + any test * chore: alembic + comment clean up alembic + comment clean up * chore: remove depricated test_traces file remove depricated test_traces file. 
test have all been moved to test_traces_api.py * fix datetime * chore: fix test_trace_api ge=0 is allowed now fix test_trace_api ge=0 is allowed now * chore: remove unused traces cost flow remove unused traces cost flow * fix traces test * fix traces test * fix traces test * fix traces test * fix traces test * chore: address gabriels otel coment address gabriels otel coment latest --------- Co-authored-by: Olayinka Adelakun <olayinkaadelakun@Olayinkas-MacBook-Pro.local> Co-authored-by: Olayinka Adelakun <olayinkaadelakun@mac.war.can.ibm.com> Co-authored-by: autofix-ci[bot] <114827586+autofix-ci[bot]@users.noreply.github.com> Co-authored-by: Ram Gopal Srikar Katakam <44802869+RamGopalSrikar@users.noreply.github.com> Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com> Co-authored-by: olayinkaadelakun <olayinka.adelakun@ibm.com> Co-authored-by: Jordan Frazier <122494242+jordanrfrazier@users.noreply.github.com> Co-authored-by: cristhianzl <cristhian.lousa@gmail.com> Co-authored-by: Hamza Rashid <74062092+HzaRashid@users.noreply.github.com> Co-authored-by: Mendon Kissling <59585235+mendonk@users.noreply.github.com> Co-authored-by: Lucas Oliveira <62335616+lucaseduoli@users.noreply.github.com> Co-authored-by: Edwin Jose <edwin.jose@datastax.com> Co-authored-by: Himavarsha <40851462+HimavarshaVS@users.noreply.github.com>
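The nightly-build fix described in the commit notes above (replace `name` only inside the `[project]` section so the `pytorch-cpu` index keeps its name) can be sketched as follows. This is a minimal illustration, not the project's actual script: the helper name `rename_project`, the sample TOML, and the exact pattern are assumptions.

```python
import re

# Assumed pattern: match the [project] header, then lazily consume text that
# stays inside that table (stop at any line starting with "["), then the
# first `name = "..."` line found there.
_PROJECT_NAME = re.compile(
    r'(^\[project\][ \t]*$(?:(?!^\[).)*?^name[ \t]*=[ \t]*")[^"\n]*(")',
    re.MULTILINE | re.DOTALL,
)


def rename_project(pyproject_text: str, new_name: str) -> str:
    """Replace only the `name` field of the [project] table (hypothetical helper)."""
    return _PROJECT_NAME.sub(
        lambda m: m.group(1) + new_name + m.group(2),
        pyproject_text,
        count=1,
    )


# Illustrative input: a [project] table followed by a uv index entry.
SAMPLE = """[project]
name = "langflow"
version = "1.8.1"

[[tool.uv.index]]
name = "pytorch-cpu"
explicit = true
"""

renamed = rename_project(SAMPLE, "langflow-nightly")
```

A pattern anchored only on lines starting with `name = "..."` would also rewrite the index entry, which is the failure the commit message reports.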
#11982) fix(test): Fix superuser timeout test errors by replacing heavy client fixture (#11972) * fix super user timeout test error * fix fixture db test * remove canary test * [autofix.ci] apply automated fixes * flaky test --------- Co-authored-by: Cristhian Zanforlin Lousa <cristhian.lousa@gmail.com> Co-authored-by: autofix-ci[bot] <114827586+autofix-ci[bot]@users.noreply.github.com>
…ics module (#11974) Co-authored-by: autofix-ci[bot] <114827586+autofix-ci[bot]@users.noreply.github.com>
…2002) * fix: add ondelete=CASCADE to TraceBase.flow_id to match migration The migration file creates the trace table's flow_id foreign key with ondelete="CASCADE", but the model was missing this parameter. This mismatch caused the migration validator to block startup. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com> * fix: add defensive migration to ensure trace.flow_id has CASCADE Adds a migration that ensures the trace.flow_id foreign key has ondelete=CASCADE. While the original migration already creates it with CASCADE, this provides a safety net for any databases that may have gotten into an inconsistent state. * fix: dynamically find FK constraint name in migration The original migration did not name the FK constraint, so it gets an auto-generated name that varies by database. This fix queries the database to find the actual constraint name before dropping it. --------- Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
…mprove button functionality (#12000) * fix: Update ButtonSendWrapper to handle building state and improve button functionality * fix(frontend): rename stop button title to avoid Playwright selector conflict The "Stop building" title caused getByRole('button', { name: 'Stop' }) to match two elements, breaking Playwright tests in shards 19, 20, 22, 25. Renamed to "Cancel" to avoid the collision with the no-input stop button.
Pydantic fails because the output is a list instead of a dict Co-authored-by: Olayinka Adelakun <olayinkaadelakun@Olayinkas-MacBook-Pro.local>
* Update guardrails.py Changing the heuristic threshold icons. The field was using the default icons. I added icons related to the security theme. * [autofix.ci] apply automated fixes --------- Co-authored-by: autofix-ci[bot] <114827586+autofix-ci[bot]@users.noreply.github.com> Co-authored-by: Viktor Avelino <64113566+viktoravelino@users.noreply.github.com>
…#12028) Co-authored-by: autofix-ci[bot] <114827586+autofix-ci[bot]@users.noreply.github.com>
fix reset button Co-authored-by: Olayinka Adelakun <olayinkaadelakun@Olayinkas-MacBook-Pro.local>
* fix: Handle message inputs when ingesting knowledge * [autofix.ci] apply automated fixes * [autofix.ci] apply automated fixes (attempt 2/3) * [autofix.ci] apply automated fixes (attempt 3/3) * Update test_ingestion.py * [autofix.ci] apply automated fixes --------- Co-authored-by: autofix-ci[bot] <114827586+autofix-ci[bot]@users.noreply.github.com>
…on (#11985) * fix(ui): add error handling for invalid JSON uploads via upload button * feat(frontend): added new test for file upload * feat(frontend): added new test for file upload
* fix: LM span is now properly parent of ChatOpenAI Before, the LM span and the ChatOpenAI span were both considered parents, so they were being counted twice in token counts and other summations. Now the LM span is properly the parent of the ChatOpenAI span, so they are not accidentally counted twice * chore: clean up comments clean up comments * chore: incase -> incase incase -> incase
* fix: LM span is now properly parent of ChatOpenAI Before, the LM span and the ChatOpenAI span were both considered parents, so they were being counted twice in token counts and other summations. Now the LM span is properly the parent of the ChatOpenAI span, so they are not accidentally counted twice * chore: clean up comments clean up comments * chore: incase -> incase incase -> incase * design fix * fix testcases * fix header * fix testcase --------- Co-authored-by: Adam Aghili <Adam.Aghili@ibm.com> Co-authored-by: Olayinka Adelakun <olayinkaadelakun@Olayinkas-MacBook-Pro.local> Co-authored-by: Olayinka Adelakun <olayinkaadelakun@mac.war.can.ibm.com>
* fix: update layout and variant for file previews in chat messages * fix: update background color to 'bg-muted' in chat header and input wrapper components * refactor(CanvasControls): remove unused inspection panel logic and clean up code * fix: remove 'bg-muted' class from chat header and add 'bg-primary-foreground' to chat sidebar * fix: add Escape key functionality to close sidebar
#12040) fix: playground does not scroll down to the latest user message upon sending (Regression) (#12006) * fixes scroll is on input message * feat: re-engage Safari sticky scroll mode when user sends message Add custom event 'langflow-scroll-to-bottom' to force SafariScrollFix back into sticky mode when user sends a new message. This ensures the chat scrolls to bottom even if user had scrolled up, fixing behavior where Safari's scroll fix would remain disengaged after manual scrolling. Co-authored-by: Deon Sanchez <69873175+deon-sanchez@users.noreply.github.com>
#12039) fix: knowledge Base Table — Row Icon Appears Clipped/Cut for Some Entries (#12009) * removed book and added file. makes more sense * feat: add accent-blue color to design system and update knowledge base file icon - Add accent-blue color variables to light and dark themes in CSS - Register accent-blue in Tailwind config with DEFAULT and foreground variants - Update knowledge base file icon fallback color from hardcoded text-blue-500 to text-accent-blue-foreground Co-authored-by: Deon Sanchez <69873175+deon-sanchez@users.noreply.github.com>
* fixes to the mcp modal for style * style: convert double quotes to single quotes in baseModal component * style: convert double quotes to single quotes in addMcpServerModal component Co-authored-by: Deon Sanchez <69873175+deon-sanchez@users.noreply.github.com>
* fix: change loop description (#12018) * docs: simplify Loop component description in starter project and component index * [autofix.ci] apply automated fixes * style: format Loop component description to comply with line length limits * fixed component index * [autofix.ci] apply automated fixes --------- Co-authored-by: autofix-ci[bot] <114827586+autofix-ci[bot]@users.noreply.github.com> * [autofix.ci] apply automated fixes --------- Co-authored-by: Deon Sanchez <69873175+deon-sanchez@users.noreply.github.com> Co-authored-by: autofix-ci[bot] <114827586+autofix-ci[bot]@users.noreply.github.com>
…#12036) * feat: add mutual exclusivity between ChatInput and Webhook components * [autofix.ci] apply automated fixes * refactor: address PR feedback - add comprehensive tests and constants * [autofix.ci] apply automated fixes * refactor: address PR feedback - add comprehensive tests and constants * [autofix.ci] apply automated fixes --------- Co-authored-by: Janardan S Kavia <janardanskavia@Janardans-MacBook-Pro.local> Co-authored-by: autofix-ci[bot] <114827586+autofix-ci[bot]@users.noreply.github.com>
* Only process dict template fields In json_schema_from_flow, guard access to template field properties by checking isinstance(field_data, dict) before calling .get(). This replaces the previous comparison to the string "Component" and prevents attribute errors when template entries are non-dict values, ensuring only dict-type fields with show=True and not advanced are included in the generated schema. * Check and handle MCP server URL changes When skipping creation of an existing MCP server for a user's starter projects, first compute the expected project URL and compare it to URLs found in the existing config args. If the URL matches, keep skipping and log that the server is correctly configured; if the URL differs (e.g., port changed on restart), log the difference and allow the flow to update the server configuration. Adds URL extraction and improved debug messages to support automatic updates when server endpoints change. --------- Co-authored-by: Ram Gopal Srikar Katakam <44802869+RamGopalSrikar@users.noreply.github.com>
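The `isinstance(field_data, dict)` guard described above can be illustrated with a small sketch. The function name `visible_fields`, the sample template, and the `False` defaults are assumptions for illustration; this is not the body of `json_schema_from_flow`.

```python
from typing import Any


def visible_fields(template: dict[str, Any]) -> dict[str, dict[str, Any]]:
    """Keep only dict-valued entries with show=True that are not advanced."""
    selected: dict[str, dict[str, Any]] = {}
    for name, field_data in template.items():
        # Non-dict entries (for example a plain string marker) have no .get(),
        # so they are skipped instead of raising AttributeError.
        if not isinstance(field_data, dict):
            continue
        if field_data.get("show", False) and not field_data.get("advanced", False):
            selected[name] = field_data
    return selected


# Illustrative template: one string entry and three dict entries.
SAMPLE_TEMPLATE = {
    "_type": "Component",
    "input_value": {"show": True, "advanced": False, "type": "str"},
    "temperature": {"show": True, "advanced": True, "type": "float"},
    "code": {"show": False, "advanced": False, "type": "code"},
}

fields = visible_fields(SAMPLE_TEMPLATE)
```

Checking the type rather than comparing against the string "Component" means any future non-dict entry is skipped as well, not just that one value.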
…2044) Langflow breaks when we click on the last level of the chain. Co-authored-by: Olayinka Adelakun <olayinkaadelakun@mac.war.can.ibm.com>
…8.0 (#12052) * fix: improve knowledge base UI consistency and pagination handling - Change quote style from double to single quotes throughout knowledge base components - Update "Hide Sources" button label to "Hide Configuration" for clarity - Restructure SourceChunksPage layout to use xl:container for consistent spacing - Add controlled page input state with validation on blur and Enter key - Synchronize page input field with pagination controls to prevent state drift - Reset page input to "1" when changing page * refactor: extract page input commit logic into reusable function Extract page input validation and commit logic from handlePageInputBlur and handlePageInputKeyDown into a shared commitPageInput function to eliminate code duplication.
…12043) * fix(ui): ensure session deletion properly clears backend and cache * fix: resolved PR comments and add new regression test * fix: resolved PR comments and add new regression test * [autofix.ci] apply automated fixes --------- Co-authored-by: autofix-ci[bot] <114827586+autofix-ci[bot]@users.noreply.github.com>
Important: Review skipped
Auto incremental reviews are disabled on this repository. Please check the settings in the CodeRabbit UI.
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Walkthrough
This PR bumps versions to 1.8.1, adds
Changes
Sequence Diagram
sequenceDiagram
participant Client as API Client
participant KBEndpoint as KB Endpoint
participant KBHelper as KBStorageHelper
participant FileSystem as File System
participant Chroma as Chroma Resources
participant SQLite as SQLite Locks
Client->>KBEndpoint: DELETE /knowledge_bases/{kb_name}
KBEndpoint->>KBHelper: delete_storage(kb_path, kb_name)
activate KBHelper
KBHelper->>Chroma: release_chroma_resources(kb_path)
activate Chroma
Chroma->>Chroma: Clear registry entries
Chroma->>Chroma: Trigger garbage collection
deactivate Chroma
KBHelper->>SQLite: _remove_sqlite_lock_files(kb_path)
activate SQLite
SQLite->>FileSystem: Remove .wal/.shm/.journal files
deactivate SQLite
loop Retry with exponential backoff (MAX_DELETE_RETRIES)
KBHelper->>FileSystem: Remove directory
alt Success
FileSystem-->>KBHelper: Directory removed
KBHelper->>KBHelper: Return True
else File-in-use error
FileSystem-->>KBHelper: Error (transient)
KBHelper->>KBHelper: Wait, retry
else Persistent failure
KBHelper->>FileSystem: Rename directory for deferred cleanup
FileSystem-->>KBHelper: Renamed
KBHelper->>KBHelper: Return True/False
end
end
deactivate KBHelper
KBHelper-->>KBEndpoint: Success/Failure status
KBEndpoint-->>Client: 200/500 Response
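The retry loop in the diagram above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the `KBStorageHelper` implementation: the function name, the value of `MAX_DELETE_RETRIES`, the base delay, the rename suffix, and the choice to return `True` after a successful rename are all hypothetical. The Chroma release and SQLite lock-file steps are omitted.

```python
import shutil
import time
from pathlib import Path

MAX_DELETE_RETRIES = 3    # assumed value; the diagram only names the constant
BASE_DELAY_SECONDS = 0.1  # assumed starting delay


def delete_directory_with_retry(
    path: Path,
    retries: int = MAX_DELETE_RETRIES,
    base_delay: float = BASE_DELAY_SECONDS,
    sleep=time.sleep,
) -> bool:
    """Remove a directory, retrying transient errors with exponential backoff."""
    for attempt in range(retries):
        try:
            shutil.rmtree(path)
            return True
        except FileNotFoundError:
            # Nothing left to delete, so treat it as success.
            return True
        except OSError:
            # Assumed transient (for example a file still in use): wait
            # base_delay * 2**attempt before the next attempt.
            if attempt < retries - 1:
                sleep(base_delay * (2 ** attempt))

    # Persistent failure: rename the directory so it can be cleaned up later.
    deferred = path.with_name(f"{path.name}.pending-delete-{int(time.time())}")
    try:
        path.rename(deferred)
    except OSError:
        return False
    return True
```

`FileNotFoundError` is caught before `OSError` because it is a subclass of it. Passing `sleep` as a parameter lets a test replace the real delay with a no-op.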
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~50 minutes
Important: Pre-merge checks failed
Please resolve all errors before merging. Addressing warnings is optional.
❌ Failed checks (1 error, 3 warnings)
✅ Passed checks (3 passed)
Codecov Report
❌ Patch coverage is
❌ Your project status has failed because the head coverage (44.41%) is below the target coverage (60.00%). You can increase the head coverage or adjust the target coverage.
Additional details and impacted files
@@ Coverage Diff @@
## main #12185 +/- ##
==========================================
+ Coverage 38.39% 38.50% +0.10%
==========================================
Files 1630 1630
Lines 80290 80458 +168
Branches 12120 12152 +32
==========================================
+ Hits 30830 30981 +151
- Misses 47724 47726 +2
- Partials 1736 1751 +15
erichare left a comment
LGTM! Did basic sanity checks, confirmed that backend functionality from 1.8.1 and from main exists
Actionable comments posted: 11
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (3)
src/backend/tests/unit/test_knowledge_bases_api.py (1)
274-295: ⚠️ Potential issue | 🟡 Minor
`mock_delete.called` is too weak for bulk-delete validation.
At Line 295, this can pass even if only one KB deletion happened. The test should verify both expected deletions were attempted.
Suggested fix
+from unittest.mock import AsyncMock, MagicMock, call, patch
...
-        assert mock_delete.called
+        assert mock_delete.call_count == 2
+        mock_delete.assert_has_calls(
+            [
+                call(kb_user_path / "KB1", "KB1"),
+                call(kb_user_path / "KB2", "KB2"),
+            ],
+            any_order=True,
+        )
As per coding guidelines: "Verify tests cover both positive and negative scenarios where appropriate."
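To see why `called` alone is too weak, the following self-contained sketch compares a complete bulk delete against one that stops after the first item. The `bulk_delete` function, its `limit` parameter, and the path are hypothetical stand-ins for the endpoint under test, not code from the repository.

```python
from pathlib import Path
from typing import Optional
from unittest.mock import MagicMock, call


def bulk_delete(delete_storage, kb_user_path: Path, names: list[str],
                limit: Optional[int] = None) -> None:
    """Hypothetical stand-in for the endpoint: delete each named KB.

    `limit` simulates a bug that stops after the first few deletions.
    """
    for name in names[:limit]:
        delete_storage(kb_user_path / name, name)


kb_user_path = Path("kb_root") / "user"  # illustrative path

# Correct behaviour: both knowledge bases are deleted.
mock_delete = MagicMock(return_value=True)
bulk_delete(mock_delete, kb_user_path, ["KB1", "KB2"])

# Buggy behaviour: only the first knowledge base is deleted.
buggy_delete = MagicMock(return_value=True)
bulk_delete(buggy_delete, kb_user_path, ["KB1", "KB2"], limit=1)

# The weak assertion cannot tell the two apart.
assert mock_delete.called and buggy_delete.called

# The stronger assertions pass only for the correct behaviour.
assert mock_delete.call_count == 2
mock_delete.assert_has_calls(
    [
        call(kb_user_path / "KB1", "KB1"),
        call(kb_user_path / "KB2", "KB2"),
    ],
    any_order=True,
)
```

Only the call count and argument checks would fail against the buggy version.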
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/backend/tests/unit/test_knowledge_bases_api.py` around lines 274 - 295, The test_bulk_delete_knowledge_bases currently only asserts mock_delete.called which is too weak; update the assertion to verify both expected deletions were attempted by checking mock_delete.call_count == 2 and that the calls include deletions for "KB1" and "KB2" (use mock_delete.assert_has_calls or inspect mock_delete.call_args_list) targeting KBStorageHelper.delete_storage with the appropriate arguments for those two KB names in the test_bulk_delete_knowledge_bases test.
src/backend/base/langflow/api/v1/knowledge_bases.py (2)
517-518: ⚠️ Potential issue | 🔴 Critical
This `finally` block can turn a 404 into a 500.
If `_resolve_kb_path()` raises on Line 518, `kb_path` is never bound and Line 596 then raises `UnboundLocalError` during cleanup.
Possible fix
 async def get_knowledge_base_chunks(
     kb_name: str,
     current_user: CurrentActiveUser,
     page: Annotated[int, Query(ge=1)] = 1,
     limit: Annotated[int, Query(ge=1, le=100)] = 50,
     search: Annotated[str, Query(description="Filter chunks whose text contains this substring")] = "",
 ) -> PaginatedChunkResponse:
     """Get chunks from a specific knowledge base with pagination."""
+    kb_path: Path | None = None
     try:
         kb_path = _resolve_kb_path(kb_name, current_user)
@@
     finally:
         client = None
         chroma = None
-        KBStorageHelper.release_chroma_resources(kb_path)
+        if kb_path is not None:
+            KBStorageHelper.release_chroma_resources(kb_path)
Also applies to: 593-596
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/backend/base/langflow/api/v1/knowledge_bases.py` around lines 517 - 518, The finally block can raise UnboundLocalError if _resolve_kb_path(kb_name, current_user) fails because kb_path is not bound; initialize kb_path = None before the try that calls _resolve_kb_path and update the finally/cleanup logic (in the same function) to guard any use of kb_path (e.g., deletion/cleanup steps) with an if kb_path is not None check so a raised 404/error from _resolve_kb_path does not become a 500 during cleanup.
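The failure mode in this finding can be reproduced outside the API with a small sketch. All names below (`NotFound`, `resolve`, `release_resources`, `read_chunks_unsafe`, `read_chunks_safe`) are hypothetical stand-ins, not the functions in `knowledge_bases.py`.

```python
from pathlib import Path
from typing import Optional


class NotFound(Exception):
    """Stand-in for the 404 raised when a knowledge base does not exist."""


released: list[Path] = []


def resolve(name: str) -> Path:
    """Stand-in for path resolution: only 'docs' exists."""
    if name != "docs":
        raise NotFound(name)
    return Path("kb") / name


def release_resources(path: Path) -> None:
    released.append(path)


def read_chunks_unsafe(name: str) -> str:
    try:
        kb_path = resolve(name)
        return f"chunks from {kb_path.name}"
    finally:
        # If resolve() raised, kb_path was never bound, so this line raises
        # UnboundLocalError and replaces the original NotFound.
        release_resources(kb_path)


def read_chunks_safe(name: str) -> str:
    kb_path: Optional[Path] = None
    try:
        kb_path = resolve(name)
        return f"chunks from {kb_path.name}"
    finally:
        # Cleanup runs only when there is something to release.
        if kb_path is not None:
            release_resources(kb_path)
```

With the guard in place the original `NotFound` propagates unchanged, so a handler that maps it to a 404 still sees it.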
57-61: ⚠️ Potential issue | 🟠 Major
Guard the rollback path before touching `kb_path`.
If setup fails before Line 61 finishes assigning `kb_path`, the handler crashes on Line 130 with `UnboundLocalError`. Line 131 also ignores a `False` return from `delete_storage()`, so failed creates can leave a half-created KB behind.
Possible fix
 async def create_knowledge_base(
     request: CreateKnowledgeBaseRequest,
     current_user: CurrentActiveUser,
 ) -> KnowledgeBaseInfo:
     """Create a new knowledge base with embedding configuration."""
+    kb_name = request.name.strip().replace(" ", "_")
+    kb_path: Path | None = None
     try:
         kb_root_path = KBStorageHelper.get_root_path()
         kb_user = current_user.username
-        kb_name = request.name.strip().replace(" ", "_")
         kb_path = kb_root_path / kb_user / kb_name
@@
     except Exception as e:
         # Clean up if something went wrong
-        if kb_path.exists():
-            KBStorageHelper.delete_storage(kb_path, kb_name)
+        cleanup_failed = False
+        if kb_path is not None and kb_path.exists():
+            cleanup_failed = not KBStorageHelper.delete_storage(kb_path, kb_name)
         await logger.aerror("Error creating knowledge base: %s", e)
-        raise HTTPException(status_code=500, detail="Internal error creating knowledge base") from e
+        detail = "Internal error creating knowledge base"
+        if cleanup_failed:
+            detail += f". Cleanup for '{kb_name}' also failed."
+        raise HTTPException(status_code=500, detail=detail) from e
Also applies to: 128-133
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/backend/base/langflow/api/v1/knowledge_bases.py` around lines 57 - 61, The handler constructs kb_path using KBStorageHelper.get_root_path(), current_user.username and request.name but if setup fails before kb_path is fully assigned the rollback code references an uninitialized kb_path and may crash; modify the create handler so kb_path is initialized to None before the try, only attempt rollback/delete_storage(kb_path) if kb_path is not None and exists, and handle delete_storage's boolean return (log or raise on False) instead of ignoring it; update references to kb_path, KBStorageHelper.get_root_path(), and delete_storage(...) accordingly to ensure safe cleanup when setup fails.
🧹 Nitpick comments (13)
src/frontend/src/CustomNodes/hooks/use-fetch-data-on-mount.ts (1)
58-58: Missing dependencies in useEffect may cause stale closure issues.
The dependency array is empty but the effect closes over `node`, `name`, `nodeId`, `setNodeClass`, `postTemplateValue`, and `setErrorData`. While this is intentional "on mount" behavior (as the hook name suggests), consider adding an eslint-disable comment to suppress the exhaustive-deps warning and document the intent.
📝 Suggested documentation of intent
     fetchData();
-  }, []);
+  // eslint-disable-next-line react-hooks/exhaustive-deps -- Intentionally run only on mount
+  }, []);
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/frontend/src/CustomNodes/hooks/use-fetch-data-on-mount.ts` at line 58, The useEffect in the useFetchDataOnMount hook intentionally runs only on mount but currently omits dependencies (node, name, nodeId, setNodeClass, postTemplateValue, setErrorData) and will trigger an exhaustive-deps lint warning; update the effect by adding a clear inline comment explaining that the effect must run only once on mount and then add an eslint-disable-next-line react-hooks/exhaustive-deps (or eslint-disable for that specific line) immediately above the useEffect to suppress the warning so the intentional mount-only behavior is documented and the linter is satisfied.
src/frontend/tests/core/regression/session-deletion-data-leakage.spec.ts (2)
77-80: Consider adding a test docstring per coding guidelines.
The test name is descriptive, but coding guidelines require documenting each test with a clear comment explaining its purpose, the scenario being tested, and expected outcomes.
📝 Suggested documentation
 test(
   "should prevent data leakage when default session is deleted and recreated",
   { tag: ["@release", "@regression"] },
   async ({ page }) => {
+    /**
+     * Purpose: Verify that deleting a session clears its messages and prevents data leakage.
+     * Scenario: Send a message in the default session, delete the session, then send a new message.
+     * Expected: The original message should not appear after deletion; only the new message should be visible.
+     */
     test.skip(
Based on learnings: "Document each frontend test with a clear docstring/comment explaining its purpose, the scenario being tested, and expected outcomes".
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/frontend/tests/core/regression/session-deletion-data-leakage.spec.ts` around lines 77 - 80, Add a docstring comment immediately above the test(...) declaration for "should prevent data leakage when default session is deleted and recreated" that briefly describes the purpose (verify no cross-session data leakage), the scenario steps (delete the default session, recreate it, perform actions that previously leaked data), and the expected outcomes (no residual data from prior session persists and assertions that confirm isolation); place this comment directly above the test(...) block so it documents the test function in src/frontend/tests/core/regression/session-deletion-data-leakage.spec.ts.
47-63: Consider replacingwaitForTimeoutwith condition-based waits.Playwright's
waitForTimeoutis discouraged because fixed delays make tests slower and flakier. The helper could use more robust waits:
- Line 49: Wait for the more button to be visible after hover
- Line 56: Already followed by a
waitFor, so this delay may be unnecessary- Line 63: Wait for the session element to be removed from DOM
♻️ Proposed refactor using condition-based waits
  // Hover to make the more button visible
  await selector.hover();
- await page.waitForTimeout(500); // Wait for hover effects
  // Click the more options button
  const moreButton = selector.locator('[aria-label="More options"]');
+ await moreButton.waitFor({ state: "visible", timeout: 5000 });
  await moreButton.click({ timeout: 5000 });
- // Wait for the menu to open
- await page.waitForTimeout(500);
  // Wait for delete option to be visible and click it
  await page
    .getByTestId("delete-session-option")
    .waitFor({ state: "visible", timeout: 5000 });
  await page.getByTestId("delete-session-option").click();
- await page.waitForTimeout(1000);
+ // Wait for the session to be removed
+ await selector.waitFor({ state: "detached", timeout: 5000 });
  break;
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/frontend/tests/core/regression/session-deletion-data-leakage.spec.ts` around lines 47 - 63, Replace fixed sleeps with condition-based Playwright waits: after selector.hover() replace page.waitForTimeout(500) with waiting for the more options button to be visible (e.g., await selector.locator('[aria-label="More options"]').waitFor({ state: "visible", timeout: 5000 }) or use expect(moreButton).toBeVisible()); remove the redundant page.waitForTimeout(500) before waiting for delete option since you already call getByTestId("delete-session-option").waitFor(...); and after clicking the delete control replace page.waitForTimeout(1000) with a wait for the session row to be removed (e.g., await selector.waitFor({ state: "detached", timeout: 5000 }) or expect(selector).not.toBeVisible()). Ensure you update the code paths around selector.hover(), moreButton (locator '[aria-label="More options"]'), and getByTestId("delete-session-option") to use these condition-based waits.
src/frontend/src/modals/exportModal/index.tsx (1)
61-71: Prefer `const` since `flowToExport` is never reassigned.
The variable `flowToExport` is declared with `let` but is never reassigned in this function. It's either passed directly to `downloadFlow` or passed through `removeApiKeys` (which returns a new object). Using `const` is preferred here to signal immutability.
♻️ Suggested fix
- let flowToExport: FlowType = {
+ const flowToExport: FlowType = {
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/frontend/src/modals/exportModal/index.tsx` around lines 61 - 71, Replace the mutable declaration of flowToExport with an immutable one: change the `let flowToExport` declaration in the export modal to `const` since `flowToExport` is never reassigned (it is only passed to `removeApiKeys` or `downloadFlow`); update the line where `flowToExport` is created in the function handling export (referencing `flowToExport`, `currentFlow`, `removeApiKeys`, and `downloadFlow`) to use `const` to signal immutability.
src/frontend/src/utils/__tests__/removeApiKeys.test.ts (1)
3-79: Good test coverage without mocking!
The tests effectively verify the core behavior of `removeApiKeys`:
- Preserving `api_key` fields with variable-like names
- Clearing `api_key` fields with raw secrets
- Preserving non-password fields
Consider adding edge case tests for completeness:
🧪 Suggested additional test cases
it("preserves api_key when load_from_db is true even if value is a raw secret", () => {
  // Edge case: load_from_db might be true but value could still be a raw secret
  // depending on how data was set. Current implementation preserves if load_from_db=true.
  const flow = makeFlow({
    api_key: {
      name: "api_key",
      value: "sk-secret-123",
      password: true,
      load_from_db: true,
    },
  });
  const result = removeApiKeys(flow);
  const template = result.data!.nodes[0].data.node!.template;
  // Document expected behavior: load_from_db=true preserves the value
  expect(template.api_key.value).toBe("sk-secret-123");
  expect(template.api_key.load_from_db).toBe(true);
});
it("preserves api_key with lowercase variable-like name", () => {
  const flow = makeFlow({
    api_key: {
      name: "api_key",
      value: "openai_api_key", // lowercase variable name
      password: true,
      load_from_db: false,
    },
  });
  const result = removeApiKeys(flow);
  const template = result.data!.nodes[0].data.node!.template;
  expect(template.api_key.value).toBe("openai_api_key");
});
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/frontend/src/utils/__tests__/removeApiKeys.test.ts` around lines 3 - 79, Add the two suggested edge-case tests to src/frontend/src/utils/__tests__/removeApiKeys.test.ts: create cases using makeFlow and removeApiKeys that (1) pass an api_key with password: true and load_from_db: true but value looks like a raw secret to assert current behavior (preserved value and load_from_db true) and (2) pass an api_key with a lowercase variable-like value (e.g., "openai_api_key") to assert it is preserved; place assertions against result.data.nodes[0].data.node.template to mirror existing tests.
src/frontend/src/utils/reactflowUtils.ts (1)
473-486: Note: Only fields named exactly `api_key` are preserved.
The special handling applies only when `key === "api_key"`. Other password fields with different names (e.g., `openai_api_key`, `anthropic_api_key`) will still have their values cleared even if they contain variable-like names.
If this is intentional (and the test confirms it is), this is fine. Otherwise, consider whether the check should apply to any password field whose value looks like a variable name:
♻️ Alternative: Apply to all password fields with variable-like values
  if (field.password) {
-   // Preserve env/global variable names for api_key so imported flows
-   // can still resolve credentials, but strip any raw secrets.
-   if (
-     key === "api_key" &&
-     ((typeof field.value === "string" &&
-       looksLikeVariableName(field.value)) ||
-       field.load_from_db === true)
-   ) {
+   // Preserve env/global variable names so imported flows
+   // can still resolve credentials, but strip any raw secrets.
+   if (
+     (typeof field.value === "string" &&
+       looksLikeVariableName(field.value)) ||
+     field.load_from_db === true
+   ) {
      continue;
    }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/frontend/src/utils/reactflowUtils.ts` around lines 473 - 486, The current branch inside the field.password handling only preserves values when key === "api_key"; update that condition so variable-like or DB-loaded credentials are preserved for other API key-style fields too (e.g., openai_api_key, anthropic_api_key): replace the strict equality check with a broader predicate (for example check that the key endsWith("_api_key") or simply drop the key check and allow any password field where looksLikeVariableName(field.value) or field.load_from_db === true to continue) while keeping the rest of the logic that sets field.value = "" and field.load_from_db = false for raw secrets; refer to the field.password branch and the helpers looksLikeVariableName and field.load_from_db to locate and modify the condition.
src/backend/tests/unit/test_knowledge_bases_api.py (1)
262-273: Assert delete arguments, not only invocation count.
At Line 272, `assert_called_once()` misses whether the correct `kb_path`/`kb_name` was passed to `delete_storage`.
Suggested tightening
- mock_delete.assert_called_once()
+ mock_delete.assert_called_once_with(
+     tmp_path / "activeuser" / "To_Delete",
+     "To_Delete",
+ )
As per coding guidelines: "Verify tests cover both positive and negative scenarios where appropriate."
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/backend/tests/unit/test_knowledge_bases_api.py` around lines 262 - 273, In test_delete_knowledge_base, replace the loose mock assertion with a precise argument check: instead of mock_delete.assert_called_once(), assert that KBStorageHelper.delete_storage was called with the expected kb path/name by using mock_delete.assert_called_once_with(...), e.g., pass the constructed path (tmp_path / "activeuser" / "To_Delete") or the expected string/Path that the implementation uses; update the expectation to match the actual signature used by KBStorageHelper.delete_storage in the code under test so the test verifies both invocation and correct arguments.
src/backend/tests/unit/api/v1/test_mcp_projects.py (1)
1115-1161: Consider extracting common monkeypatch setup to reduce duplication.
This test duplicates the URL-related monkeypatches from `_prepare_installed_check_env`. While the different path setup is intentional (non-existent directories), the URL-related patches could be extracted to a smaller helper or the existing helper could accept an optional parameter to skip directory creation.
♻️ Suggested refactor to reduce duplication
+def _patch_mcp_url_helpers(monkeypatch):
+    """Patch URL helper functions for MCP tests."""
+    monkeypatch.setattr("langflow.api.v1.mcp_projects.should_use_mcp_composer", lambda project: False)
+
+    async def fake_streamable(project_id):
+        return f"https://langflow.local/api/v1/mcp/project/{project_id}/streamable"
+
+    async def fake_sse(project_id):
+        return f"https://langflow.local/api/v1/mcp/project/{project_id}/sse"
+
+    monkeypatch.setattr("langflow.api.v1.mcp_projects.get_project_streamable_http_url", fake_streamable)
+    monkeypatch.setattr("langflow.api.v1.mcp_projects.get_project_sse_url", fake_sse)
Then use this helper in both `_prepare_installed_check_env` and the `test_should_report_available_false_when_app_directory_does_not_exist` test.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/backend/tests/unit/api/v1/test_mcp_projects.py` around lines 1115 - 1161, Extract the duplicated URL monkeypatch setup into a reusable helper (or extend the existing _prepare_installed_check_env to accept a flag to skip creating app directories) so tests can reuse the same stream/SSE monkeypatches; specifically move the fake_streamable and fake_sse functions and the monkeypatch.setattr calls for get_project_streamable_http_url and get_project_sse_url into that helper, and in test_should_report_available_false_when_app_directory_does_not_exist call the helper (or call the extended _prepare_installed_check_env with skip-directory-creation) while keeping the unique fake_get_config_path monkeypatch in the test.
src/backend/base/langflow/initial_setup/starter_projects/Youtube Analysis.json (1)
661-661: Remove the dead `memory_inputs` scaffolding from this snapshot.
`set_advanced_true()` and `memory_inputs` are still evaluated at class-definition time even though `*memory_inputs` is commented out. That adds unnecessary work on load and makes the exported starter project drift from the inputs it actually exposes.
Possible cleanup inside the embedded AgentComponent code
-def set_advanced_true(component_input):
-    component_input.advanced = True
-    return component_input
@@
-    memory_inputs = [set_advanced_true(component_input) for component_input in MemoryComponent().inputs]
@@
-        # removed memory inputs from agent component
-        # *memory_inputs,
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/backend/base/langflow/initial_setup/starter_projects/Youtube` Analysis.json at line 661, Remove the dead scaffolding by deleting the set_advanced_true function and the memory_inputs attribute from the AgentComponent class (they are evaluated at import time even though *memory_inputs is commented out); update any references within the snippet to stop using memory_inputs (e.g., the declaration memory_inputs = [...] and its helper function set_advanced_true) so class-definition work is not performed on load and the exported starter project matches the actual inputs.
src/backend/base/langflow/initial_setup/starter_projects/Search agent.json (1)
1100-1100: Drop the unused memory-input helpers from this embedded agent too.
Same issue here: `set_advanced_true()` and `memory_inputs` are still built even though the inputs are no longer exposed. Keeping them in the snapshot just adds dead setup work and extra drift across starter projects.
Possible cleanup inside the embedded AgentComponent code
-def set_advanced_true(component_input):
-    component_input.advanced = True
-    return component_input
@@
-    memory_inputs = [set_advanced_true(component_input) for component_input in MemoryComponent().inputs]
@@
-        # removed memory inputs from agent component
-        # *memory_inputs,
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/backend/base/langflow/initial_setup/starter_projects/Search` agent.json at line 1100, The embedded AgentComponent still defines the helper function set_advanced_true and builds memory_inputs = [set_advanced_true(...)] even though memory inputs are no longer used; remove the dead helper and attribute to avoid unnecessary setup. Delete the set_advanced_true function and the memory_inputs class attribute from the AgentComponent definition (and any references to it) and ensure AgentComponent.inputs no longer depends on memory_inputs; run tests/lint to catch any leftover references that need removal.
src/backend/base/langflow/initial_setup/starter_projects/Instagram Copywriter.json (1)
2786-2786: Normalize `max_tokens` in `LanguageModelComponent.build_model` before calling `get_llm`.
In this block, `build_model` still passes raw `max_tokens` (`getattr(self, "max_tokens", None)`). If the UI sends `0`/empty, this can propagate an invalid token limit. The Agent block already normalizes this pattern; applying the same logic here avoids provider-side errors.
♻️ Proposed patch (inside the embedded Python code)
 class LanguageModelComponent(LCModelComponent):
@@
+    def _get_max_tokens_value(self):
+        val = getattr(self, "max_tokens", None)
+        if val in {"", 0}:
+            return None
+        return val
+
     def build_model(self) -> LanguageModel:
         return get_llm(
             model=self.model,
             user_id=self.user_id,
             api_key=self.api_key,
             temperature=self.temperature,
             stream=self.stream,
-            max_tokens=getattr(self, "max_tokens", None),
+            max_tokens=self._get_max_tokens_value(),
             watsonx_url=getattr(self, "base_url_ibm_watsonx", None),
             watsonx_project_id=getattr(self, "project_id", None),
             ollama_base_url=getattr(self, "ollama_base_url", None),
         )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/backend/base/langflow/initial_setup/starter_projects/Instagram` Copywriter.json at line 2786, The build_model method in LanguageModelComponent currently forwards raw max_tokens (getattr(self, "max_tokens", None)) to get_llm which can propagate invalid values like 0; change LanguageModelComponent.build_model to normalize max_tokens first (e.g., read max_tokens = getattr(self, "max_tokens", None) and set max_tokens = None if max_tokens in (0, "", None) or not int-positive) and then pass that normalized max_tokens into get_llm so providers never receive an invalid token limit.
src/backend/tests/unit/test_kb_storage_deletion.py (2)
299-305: Assert that the happy-path handlers actually delete storage.
Both success cases only check the 200 response/`deleted_count`, so they would still pass if the route stopped calling `KBStorageHelper.delete_storage` and just reported success. Add a spy on the helper or assert the KB directories are gone after the request.
Also applies to: 334-350
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/backend/tests/unit/test_kb_storage_deletion.py` around lines 299 - 305, Update the test_should_delete_kb_successfully (and the similar test at 334-350) to verify the storage was actually removed rather than only checking the HTTP response: either attach a spy/mock to KBStorageHelper.delete_storage to assert it was called with the expected user/KB name, or perform filesystem assertions against tmp_path (e.g., assert that (tmp_path / "activeuser" / "My_KB") no longer exists after the DELETE request); ensure you reference the test function test_should_delete_kb_successfully and the KBStorageHelper.delete_storage helper so the test fails if deletion is not performed.
227-246: Avoid duplicating the retry schedule in the assertion.
This hard-codes `[1.0, 2.0, 4.0]`, so the test will start failing whenever the configured backoff changes even if the implementation is still correct. Build the expected sequence from the same retry/backoff configuration the helper uses.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/backend/tests/unit/test_kb_storage_deletion.py` around lines 227 - 246, The test currently hard-codes the sleep sequence [1.0, 2.0, 4.0]; instead compute the expected backoff sequence from KBStorageHelper's retry/backoff configuration used by delete_storage: read KBStorageHelper.RETRIES and KBStorageHelper.BACKOFF_SECONDS (or the actual constant names used in the class), then build expected = [initial_backoff * (2 ** i) for i in range(KBStorageHelper.RETRIES - 1)] (or the equivalent formula if the helper uses different names like INITIAL_BACKOFF/BACKOFF_MULTIPLIER), and assert sleep_values == expected so the test adapts to config changes.
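As a rough illustration of the suggestion above, the sketch below derives the expected sleep sequence from the same constants the retry loop reads. `RetryingDeleter` and its constants `MAX_RETRIES`, `INITIAL_BACKOFF`, and `BACKOFF_MULTIPLIER` are hypothetical stand-ins, not the actual names in `KBStorageHelper`; the real helper may use a different formula.

```python
import time
from unittest.mock import patch


class RetryingDeleter:
    """Hypothetical stand-in for the helper under test (names are assumptions)."""

    MAX_RETRIES = 4
    INITIAL_BACKOFF = 1.0
    BACKOFF_MULTIPLIER = 2.0

    @classmethod
    def delete(cls, action):
        delay = cls.INITIAL_BACKOFF
        for attempt in range(cls.MAX_RETRIES):
            try:
                action()
                return True
            except OSError:
                if attempt == cls.MAX_RETRIES - 1:
                    return False  # out of retries, no further sleep
                time.sleep(delay)
                delay *= cls.BACKOFF_MULTIPLIER
        return False


def expected_backoff(cls):
    # Built from the class constants, so the test follows config changes.
    return [
        cls.INITIAL_BACKOFF * cls.BACKOFF_MULTIPLIER**i
        for i in range(cls.MAX_RETRIES - 1)
    ]


def run_failing_delete():
    def always_fails():
        raise OSError("locked")

    # Patch time.sleep so the test records delays without waiting.
    with patch("time.sleep") as fake_sleep:
        result = RetryingDeleter.delete(always_fails)
    sleeps = [c.args[0] for c in fake_sleep.call_args_list]
    return result, sleeps
```

A test would then assert `sleeps == expected_backoff(RetryingDeleter)` rather than comparing against a literal list.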
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/backend/base/langflow/api/v1/knowledge_bases.py`:
- Around line 637-639: The bulk-delete currently treats a False return from
KBStorageHelper.delete_storage(kb_path, kb_name) as a silent success; update the
try block handling around the delete loop so that when delete_storage returns
False you record that KB as failed (e.g., add kb_name or a failure tuple to a
failed_kbs list or increment a failed_count) and do not increment deleted_count,
and include that failure in the final response construction (the same response
path that uses deleted_count/failed list later), ensuring any exception from
delete_storage is still caught by the existing except but non-exceptional False
returns are reported back to the caller.
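A minimal sketch of the reporting described above, assuming `delete_storage` takes `(kb_path, kb_name)` and returns a boolean. The function name `bulk_delete` and the response keys `deleted_count` and `failed` are illustrative, not the route's actual shape.

```python
from __future__ import annotations

from pathlib import Path
from typing import Callable


def bulk_delete(
    root: Path,
    kb_names: list[str],
    delete_storage: Callable[[Path, str], bool],
) -> dict:
    """Count only confirmed deletions and report the rest as failures."""
    deleted_count = 0
    failed: list[str] = []
    for kb_name in kb_names:
        kb_path = root / kb_name
        try:
            ok = delete_storage(kb_path, kb_name)
        except OSError:
            ok = False  # exceptions and False returns are reported alike
        if ok:
            deleted_count += 1
        else:
            failed.append(kb_name)
    return {"deleted_count": deleted_count, "failed": failed}
```

The point of the sketch is only that a `False` return lands in `failed` instead of being counted as a success.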
In `@src/backend/base/langflow/initial_setup/starter_projects/Basic`
Prompting.json:
- Line 1040: The starter metadata's last_tested_version is stale (still "1.8.0")
for this JSON that now includes the new LanguageModelComponent; locate the JSON
object's metadata key last_tested_version and update its string value to "1.8.1"
so the bundled starter reflects the correct release; ensure you only change the
metadata field and do not modify the LanguageModelComponent class or other
fields.
In `@src/backend/base/langflow/initial_setup/starter_projects/Market`
Research.json:
- Line 1352: The model prompt currently asks the LLM to "Output (only JSON
schema)" which makes it echo the schema instead of returning data; update the
schema_info text in json_response() (the variable schema_info) to instruct the
agent to return JSON data that conforms to the provided schema (e.g., "Return
only JSON data that conforms to the schema, do NOT return the schema itself").
Also broaden the JSON extraction in build_structured_output_base() by changing
json_pattern to match both objects and arrays (e.g., allow \[...\] as well as
{...}) and ensure the fallback parsing logic accepts and preserves top-level
arrays so prose-wrapped arrays are extracted and validated just like single
objects (see json_pattern and the JSON parsing/fallback branches in
build_structured_output_base()).
In `@src/backend/base/langflow/initial_setup/starter_projects/News`
Aggregator.json:
- Line 1176: The footer field "last_tested_version" in this JSON is stale (still
"1.6.0") after you refreshed the embedded AgentComponent (see "code_hash":
"40d1976f4718"); update the "last_tested_version" property to the actual
LangFlow version used when validating this starter snapshot so the metadata
matches the embedded code (replace the existing "1.6.0" value with the correct
version string).
In `@src/backend/base/langflow/initial_setup/starter_projects/Nvidia` Remix.json:
- Line 1893: The serialized template still exports the Embedding Model node as a
plain DropdownInput string which conflicts with EmbeddingModelComponent
expecting self.model to be a list; update the template so the node uses a
ModelInput (not DropdownInput) matching EmbeddingModelComponent.inputs
(model_type="embedding", input value as a list[dict] with keys "provider" and
"name" like the runtime ModelInput payload) so update_build_config() can resolve
provider and build_embeddings() will receive a non-empty list; locate the
EmbeddingModelComponent.inputs block and replace the stale
DropdownInput/Dropdown value for the model node with the proper ModelInput/list
payload in the serialized JSON template.
- Line 1893: The watsonx UI toggles truncate_input_tokens and input_text are
never added to the instantiated embedding kwargs, so update both _build_kwargs
and _build_kwargs_for_model to read self.truncate_input_tokens and
self.input_text (with hasattr checks) and, when those values are present, add
them to the returned kwargs using the param names defined in
metadata["param_mapping"] (i.e., if param_mapping maps "truncate_input_tokens"
or "input_text" to a target kwarg name, set
kwargs[param_mapping["truncate_input_tokens"]] = int(self.truncate_input_tokens)
and kwargs[param_mapping["input_text"]] = bool(self.input_text)); keep this
logic alongside the existing Watson-specific block so update_build_config,
_build_kwargs, and _build_kwargs_for_model remain consistent.
In `@src/backend/base/langflow/initial_setup/starter_projects/Pokédex` Agent.json:
- Line 1241: The JSON footer's last_tested_version value is stale after
refreshing the embedded AgentComponent snapshot (code_hash "40d1976f4718");
update the "last_tested_version" field in this starter's metadata to the correct
version string used when validating this snapshot (i.e., bump the footer value
to match the snapshot/validation version) so the starter metadata and embedded
code_hash stay consistent, and verify there are no other duplicate
last_tested_version entries remaining.
In `@src/backend/base/langflow/initial_setup/starter_projects/Price` Deal
Finder.json:
- Line 1768: Structured-response failures come from
build_structured_output_base() only searching for object braces "{...}" so valid
JSON arrays are missed and json_response() instructs the model to "Return it as
valid JSON" in a way that can force a schema-only reply; fix by updating
build_structured_output_base (json_pattern and parsing logic) to detect and
extract both JSON objects and arrays (e.g., try full-string json.loads first,
then regex for r"\{.*\}" OR r"\[.*\]" and attempt to json.loads the match), and
also relax the schema prompt built in json_response() (the schema_info variable)
to ask the model to return the JSON schema OR a JSON instance but not to return
only the schema text so downstream parsing/validation can accept arrays like
[{"name":...}, ...]; target functions/vars: build_structured_output_base,
json_pattern, json_response, schema_info, combined_instructions.
In `@src/backend/base/langflow/initial_setup/starter_projects/Simple` Agent.json:
- Line 1100: The json parsing in build_structured_output_base and the
json_response flow is too narrow (only matches object fragments) causing valid
JSON arrays or wrapped JSON to be rejected; update build_structured_output_base
(and any related regex/logic used by json_response) to accept both object and
array JSON fragments by changing the json_pattern to match either {…} or […],
try parsing the full content first, then search for either array or object
fragments using re.DOTALL (e.g., r"(\{.*\}|\[.*\])"), and when a fragment is
found json.loads() it and proceed; ensure the function still returns parsed
lists unchanged (and downstream json_response handles list results correctly)
and keep existing validation via _preprocess_schema/build_model_from_schema and
error fallbacks intact (targets: functions build_structured_output_base,
json_response, and the json_pattern variable).
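The parsing order described above (full content first, then an object-or-array fragment) could look roughly like the following. This is a sketch, not the embedded `build_structured_output_base` code; `extract_json` and `JSON_FRAGMENT` are hypothetical names, and the greedy pattern can still fail when the surrounding prose contains stray brackets.

```python
import json
import re

# Matches either an object fragment {...} or an array fragment [...].
JSON_FRAGMENT = re.compile(r"(\{.*\}|\[.*\])", re.DOTALL)


def extract_json(content: str):
    """Return parsed JSON from content, or None if nothing parses."""
    # 1. Try the whole string first.
    try:
        return json.loads(content)
    except json.JSONDecodeError:
        pass
    # 2. Fall back to the first object or array fragment in the text.
    match = JSON_FRAGMENT.search(content)
    if match is None:
        return None
    try:
        return json.loads(match.group(0))
    except json.JSONDecodeError:
        return None
```

Top-level arrays are returned as lists unchanged, so downstream validation has to accept both `dict` and `list` results.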
In `@src/frontend/tests/core/regression/session-deletion-data-leakage.spec.ts`:
- Around line 192-195: The current assertion using
expect(responseText?.toLowerCase()).not.toContain("victor") can false-fail when
the LLM mentions "Victor" while explicitly denying knowledge; update the
assertion to allow either a clear denial phrase or absence of the name: create a
boolean like indicatesNoKnowledge that checks
responseText?.toLowerCase().includes("don't know") || .includes("haven't told")
|| .includes("not sure"), then assert expect(indicatesNoKnowledge ||
!responseText?.toLowerCase().includes("victor")).toBe(true); reference the
existing responseText variable in the test to implement this replacement.
In `@src/lfx/pyproject.toml`:
- Line 15: The pyproject FastAPI dependency bump may break compatibility: update
pyproject.toml (the "fastapi>=0.135.0,<1.0.0" entry) and the project metadata to
require a Python version that FastAPI v0.135+ supports (remove/raise support for
3.8/3.9 via python_requires), verify and migrate all Pydantic usage to Pydantic
v2 APIs, replace deprecated ORJSONResponse/UJSONResponse usages (search for
ORJSONResponse and UJSONResponse classes) with supported response classes or
custom JSONResponse, and fix strict Content-Type handling by adding explicit
content-type checks or configuring request handling middleware/route decorators
to accept requests without Content-Type (search handlers/middleware that parse
JSON bodies); run the test suite and update dependency specs to reflect Pydantic
v2 and the new Python minimum.
---
Outside diff comments:
In `@src/backend/base/langflow/api/v1/knowledge_bases.py`:
- Around line 517-518: The finally block can raise UnboundLocalError if
_resolve_kb_path(kb_name, current_user) fails because kb_path is not bound;
initialize kb_path = None before the try that calls _resolve_kb_path and update
the finally/cleanup logic (in the same function) to guard any use of kb_path
(e.g., deletion/cleanup steps) with an if kb_path is not None check so a raised
404/error from _resolve_kb_path does not become a 500 during cleanup.
- Around line 57-61: The handler constructs kb_path using
KBStorageHelper.get_root_path(), current_user.username and request.name but if
setup fails before kb_path is fully assigned the rollback code references an
uninitialized kb_path and may crash; modify the create handler so kb_path is
initialized to None before the try, only attempt
rollback/delete_storage(kb_path) if kb_path is not None and exists, and handle
delete_storage's boolean return (log or raise on False) instead of ignoring it;
update references to kb_path, KBStorageHelper.get_root_path(), and
delete_storage(...) accordingly to ensure safe cleanup when setup fails.
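Both comments above come down to the same pattern: bind `kb_path` before the `try` so the cleanup code can always read it. A small self-contained sketch, where `NotFound`, `resolve_kb_path`, and `process_kb` are hypothetical stand-ins for the route code:

```python
class NotFound(Exception):
    """Stand-in for the 404 raised when the knowledge base is missing."""


def resolve_kb_path(kb_name, known):
    if kb_name not in known:
        raise NotFound(kb_name)
    return known[kb_name]


def process_kb(kb_name, known, cleanup_log):
    kb_path = None  # bound up front so the finally block never sees an unbound name
    try:
        kb_path = resolve_kb_path(kb_name, known)
        return f"processed {kb_path}"
    finally:
        # Only clean up when resolution actually produced a path.
        if kb_path is not None:
            cleanup_log.append(kb_path)
```

Without the initial `kb_path = None`, a failure inside `resolve_kb_path` would make the `finally` block raise `UnboundLocalError`, which masks the original `NotFound`.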
In `@src/backend/tests/unit/test_knowledge_bases_api.py`:
- Around line 274-295: The test_bulk_delete_knowledge_bases currently only
asserts mock_delete.called which is too weak; update the assertion to verify
both expected deletions were attempted by checking mock_delete.call_count == 2
and that the calls include deletions for "KB1" and "KB2" (use
mock_delete.assert_has_calls or inspect mock_delete.call_args_list) targeting
KBStorageHelper.delete_storage with the appropriate arguments for those two KB
names in the test_bulk_delete_knowledge_bases test.
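The stronger assertion suggested above can be sketched with `unittest.mock` as follows. `delete_many` is a hypothetical stand-in for the bulk-delete route, and the `(path, name)` argument order is an assumption about `delete_storage`.

```python
from pathlib import Path
from unittest.mock import MagicMock, call


def delete_many(root, names, delete_storage):
    """Stand-in for the route: one delete_storage call per KB name."""
    for name in names:
        delete_storage(root / name, name)


def check_bulk_delete_calls():
    mock_delete = MagicMock(return_value=True)
    root = Path("kb_root") / "activeuser"
    delete_many(root, ["KB1", "KB2"], mock_delete)

    # Stronger than `assert mock_delete.called`: checks count and arguments.
    assert mock_delete.call_count == 2
    mock_delete.assert_has_calls(
        [call(root / "KB1", "KB1"), call(root / "KB2", "KB2")],
        any_order=True,
    )
    return [c.args[1] for c in mock_delete.call_args_list]
```

`any_order=True` keeps the test independent of the order in which the route iterates over the requested names.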
---
Nitpick comments:
In `@src/backend/base/langflow/initial_setup/starter_projects/Instagram`
Copywriter.json:
- Line 2786: The build_model method in LanguageModelComponent currently forwards
raw max_tokens (getattr(self, "max_tokens", None)) to get_llm which can
propagate invalid values like 0; change LanguageModelComponent.build_model to
normalize max_tokens first (e.g., read max_tokens = getattr(self, "max_tokens",
None) and set max_tokens = None if max_tokens in (0, "", None) or not
int-positive) and then pass that normalized max_tokens into get_llm so providers
never receive an invalid token limit.
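One way to read the normalization described above, written as a standalone function. `normalize_max_tokens` is a hypothetical name, and treating every value that is not a positive integer as "no limit" is stricter than the `val in {"", 0}` check in the proposed patch.

```python
def normalize_max_tokens(value):
    """Return a positive int token limit, or None when no valid limit is set."""
    # bool is a subclass of int, so exclude it explicitly.
    if isinstance(value, bool):
        return None
    if isinstance(value, int) and value > 0:
        return value
    # Covers None, "", 0, negatives, and any non-integer input.
    return None
```

The result would then be passed as `max_tokens` to `get_llm` in place of the raw attribute.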
In `@src/backend/base/langflow/initial_setup/starter_projects/Search` agent.json:
- Line 1100: The embedded AgentComponent still defines the helper function
set_advanced_true and builds memory_inputs = [set_advanced_true(...)] even
though memory inputs are no longer used; remove the dead helper and attribute to
avoid unnecessary setup. Delete the set_advanced_true function and the
memory_inputs class attribute from the AgentComponent definition (and any
references to it) and ensure AgentComponent.inputs no longer depends on
memory_inputs; run tests/lint to catch any leftover references that need
removal.
In `@src/backend/base/langflow/initial_setup/starter_projects/Youtube`
Analysis.json:
- Line 661: Remove the dead scaffolding by deleting the set_advanced_true
function and the memory_inputs attribute from the AgentComponent class (they are
evaluated at import time even though *memory_inputs is commented out); update
any references within the snippet to stop using memory_inputs (e.g., the
declaration memory_inputs = [...] and its helper function set_advanced_true) so
class-definition work is not performed on load and the exported starter project
matches the actual inputs.
In `@src/backend/tests/unit/api/v1/test_mcp_projects.py`:
- Around line 1115-1161: Extract the duplicated URL monkeypatch setup into a
reusable helper (or extend the existing _prepare_installed_check_env to accept a
flag to skip creating app directories) so tests can reuse the same stream/SSE
monkeypatches; specifically move the fake_streamable and fake_sse functions and
the monkeypatch.setattr calls for get_project_streamable_http_url and
get_project_sse_url into that helper, and in
test_should_report_available_false_when_app_directory_does_not_exist call the
helper (or call the extended _prepare_installed_check_env with
skip-directory-creation) while keeping the unique fake_get_config_path
monkeypatch in the test.
In `@src/backend/tests/unit/test_kb_storage_deletion.py`:
- Around line 299-305: Update the test_should_delete_kb_successfully (and the
similar test at 334-350) to verify the storage was actually removed rather than
only checking the HTTP response: either attach a spy/mock to
KBStorageHelper.delete_storage to assert it was called with the expected user/KB
name, or perform filesystem assertions against tmp_path (e.g., assert that
(tmp_path / "activeuser" / "My_KB") no longer exists after the DELETE request);
ensure you reference the test function test_should_delete_kb_successfully and
the KBStorageHelper.delete_storage helper so the test fails if deletion is not
performed.
- Around line 227-246: The test currently hard-codes the sleep sequence [1.0,
2.0, 4.0]; instead compute the expected backoff sequence from KBStorageHelper's
retry/backoff configuration used by delete_storage: read KBStorageHelper.RETRIES
and KBStorageHelper.BACKOFF_SECONDS (or the actual constant names used in the
class), then build expected = [initial_backoff * (2 ** i) for i in
range(KBStorageHelper.RETRIES - 1)] (or the equivalent formula if the helper
uses different names like INITIAL_BACKOFF/BACKOFF_MULTIPLIER), and assert
sleep_values == expected so the test adapts to config changes.
In `@src/backend/tests/unit/test_knowledge_bases_api.py`:
- Around line 262-273: In test_delete_knowledge_base, replace the loose mock
assertion with a precise argument check: instead of
mock_delete.assert_called_once(), assert that KBStorageHelper.delete_storage was
called with the expected kb path/name by using
mock_delete.assert_called_once_with(...) — e.g., pass the constructed path
(tmp_path / "activeuser" / "To_Delete") or the expected string/Path that the
implementation uses; update the expectation to match the actual signature used
by KBStorageHelper.delete_storage in the code under test so the test verifies
both invocation and correct arguments.
In `@src/frontend/src/CustomNodes/hooks/use-fetch-data-on-mount.ts`:
- Line 58: The useEffect in the useFetchDataOnMount hook intentionally runs only
on mount but currently omits dependencies (node, name, nodeId, setNodeClass,
postTemplateValue, setErrorData) and will trigger an exhaustive-deps lint
warning; update the effect by adding a clear inline comment explaining that the
effect must run only once on mount and then add an eslint-disable-next-line
react-hooks/exhaustive-deps (or eslint-disable for that specific line)
immediately above the useEffect to suppress the warning so the intentional
mount-only behavior is documented and the linter is satisfied.
In `@src/frontend/src/modals/exportModal/index.tsx`:
- Around line 61-71: Replace the mutable declaration of flowToExport with an
immutable one: change the `let flowToExport` declaration in the export modal to
`const` since `flowToExport` is never reassigned (it is only passed to
`removeApiKeys` or `downloadFlow`); update the line where `flowToExport` is
created in the function handling export (referencing `flowToExport`,
`currentFlow`, `removeApiKeys`, and `downloadFlow`) to use `const` to signal
immutability.
In `@src/frontend/src/utils/__tests__/removeApiKeys.test.ts`:
- Around line 3-79: Add the two suggested edge-case tests to
src/frontend/src/utils/__tests__/removeApiKeys.test.ts: create cases using
makeFlow and removeApiKeys that (1) pass an api_key with password: true and
load_from_db: true but value looks like a raw secret to assert current behavior
(preserved value and load_from_db true) and (2) pass an api_key with a lowercase
variable-like value (e.g., "openai_api_key") to assert it is preserved; place
assertions against result.data.nodes[0].data.node.template to mirror existing
tests.
In `@src/frontend/src/utils/reactflowUtils.ts`:
- Around line 473-486: The current branch inside the field.password handling
only preserves values when key === "api_key"; update that condition so
variable-like or DB-loaded credentials are preserved for other API key-style
fields too (e.g., openai_api_key, anthropic_api_key) — replace the strict
equality check with a broader predicate (for example check that the key
endsWith("_api_key") or simply drop the key check and allow any password field
where looksLikeVariableName(field.value) or field.load_from_db === true to
continue) while keeping the rest of the logic that sets field.value = "" and
field.load_from_db = false for raw secrets; refer to the field.password branch
and the helpers looksLikeVariableName and field.load_from_db to locate and
modify the condition.
In `@src/frontend/tests/core/regression/session-deletion-data-leakage.spec.ts`:
- Around line 77-80: Add a docstring comment immediately above the test(...)
declaration for "should prevent data leakage when default session is deleted and
recreated" that briefly describes the purpose (verify no cross-session data
leakage), the scenario steps (delete the default session, recreate it, perform
actions that previously leaked data), and the expected outcomes (no residual
data from prior session persists and assertions that confirm isolation); place
this comment directly above the test(...) block so it documents the test
function in
src/frontend/tests/core/regression/session-deletion-data-leakage.spec.ts.
- Around line 47-63: Replace fixed sleeps with condition-based Playwright waits:
after selector.hover() replace page.waitForTimeout(500) with waiting for the
more options button to be visible (e.g., await
selector.locator('[aria-label="More options"]').waitFor({ state: "visible",
timeout: 5000 }) or use expect(moreButton).toBeVisible()); remove the redundant
page.waitForTimeout(500) before waiting for delete option since you already call
getByTestId("delete-session-option").waitFor(...); and after clicking the delete
control replace page.waitForTimeout(1000) with a wait for the session row to be
removed (e.g., await selector.waitFor({ state: "detached", timeout: 5000 }) or
expect(selector).not.toBeVisible()). Ensure you update the code paths around
selector.hover(), moreButton (locator '[aria-label="More options"]'), and
getByTestId("delete-session-option") to use these condition-based waits.
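The two Python test suggestions above (deriving the expected backoff sequence from configuration, and asserting the exact arguments passed to `delete_storage`) could be sketched roughly as follows. This is not the project's code: the `KBStorageHelper` below is a hypothetical stand-in, and the constant names `RETRIES` and `BACKOFF_SECONDS`, the `delete_storage(path)` signature, the `_remove` hook, and the `delete_knowledge_base` function are all assumptions to be replaced with whatever the real class and endpoint use.

```python
import time
from pathlib import Path
from unittest.mock import patch


class KBStorageHelper:
    """Hypothetical stand-in; the real helper's names and signature may differ."""

    RETRIES = 4            # assumed: total number of deletion attempts
    BACKOFF_SECONDS = 1.0  # assumed: initial delay, doubled after each failure

    @classmethod
    def _remove(cls, path: Path) -> None:
        """Placeholder for the actual filesystem removal (assumed hook)."""
        raise NotImplementedError

    @classmethod
    def delete_storage(cls, path: Path) -> bool:
        for attempt in range(cls.RETRIES):
            try:
                cls._remove(path)
            except OSError:
                # Sleep between attempts, but not after the final one.
                if attempt < cls.RETRIES - 1:
                    time.sleep(cls.BACKOFF_SECONDS * (2 ** attempt))
            else:
                return True
        return False


def expected_backoff() -> list:
    """Build the expected sleep sequence from the helper's own configuration."""
    return [
        KBStorageHelper.BACKOFF_SECONDS * (2 ** i)
        for i in range(KBStorageHelper.RETRIES - 1)
    ]


def run_backoff_check() -> list:
    """Force every attempt to fail and record the delays passed to time.sleep."""
    sleep_values = []
    with patch.object(KBStorageHelper, "_remove", side_effect=OSError("locked")):
        with patch("time.sleep", side_effect=sleep_values.append):
            result = KBStorageHelper.delete_storage(Path("kb"))
    assert result is False
    assert sleep_values == expected_backoff()
    return sleep_values


def delete_knowledge_base(root: Path, user: str, kb_name: str) -> bool:
    """Hypothetical endpoint body: resolve the KB path and delegate deletion."""
    return KBStorageHelper.delete_storage(root / user / kb_name)


def run_argument_check(root: Path) -> None:
    """Verify both that deletion was invoked and that it received the right path."""
    with patch.object(KBStorageHelper, "delete_storage", return_value=True) as mock_delete:
        delete_knowledge_base(root, "activeuser", "To_Delete")
    mock_delete.assert_called_once_with(root / "activeuser" / "To_Delete")
```

Because `expected_backoff()` reads the constants at call time, changing the retry count or initial delay in the helper does not require touching the test. In a real pytest test, `root` would typically be the `tmp_path` fixture.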
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 7f54c188-3c57-4ee6-a113-038a00aaebb8
⛔ Files ignored due to path filters (2)
- src/frontend/package-lock.json is excluded by !**/package-lock.json
- uv.lock is excluded by !**/*.lock
📒 Files selected for processing (70)
- .secrets.baseline
- docker/build_and_push.Dockerfile
- docker/build_and_push_backend.Dockerfile
- docker/build_and_push_base.Dockerfile
- docker/build_and_push_ep.Dockerfile
- docker/build_and_push_with_extras.Dockerfile
- pyproject.toml
- src/backend/base/langflow/api/utils/core.py
- src/backend/base/langflow/api/utils/kb_helpers.py
- src/backend/base/langflow/api/v1/knowledge_bases.py
- src/backend/base/langflow/api/v1/mcp_projects.py
- src/backend/base/langflow/initial_setup/starter_projects/Basic Prompt Chaining.json
- src/backend/base/langflow/initial_setup/starter_projects/Basic Prompting.json
- src/backend/base/langflow/initial_setup/starter_projects/Blog Writer.json
- src/backend/base/langflow/initial_setup/starter_projects/Custom Component Generator.json
- src/backend/base/langflow/initial_setup/starter_projects/Document Q&A.json
- src/backend/base/langflow/initial_setup/starter_projects/Hybrid Search RAG.json
- src/backend/base/langflow/initial_setup/starter_projects/Image Sentiment Analysis.json
- src/backend/base/langflow/initial_setup/starter_projects/Instagram Copywriter.json
- src/backend/base/langflow/initial_setup/starter_projects/Invoice Summarizer.json
- src/backend/base/langflow/initial_setup/starter_projects/Knowledge Retrieval.json
- src/backend/base/langflow/initial_setup/starter_projects/Market Research.json
- src/backend/base/langflow/initial_setup/starter_projects/Meeting Summary.json
- src/backend/base/langflow/initial_setup/starter_projects/Memory Chatbot.json
- src/backend/base/langflow/initial_setup/starter_projects/News Aggregator.json
- src/backend/base/langflow/initial_setup/starter_projects/Nvidia Remix.json
- src/backend/base/langflow/initial_setup/starter_projects/Pokédex Agent.json
- src/backend/base/langflow/initial_setup/starter_projects/Portfolio Website Code Generator.json
- src/backend/base/langflow/initial_setup/starter_projects/Price Deal Finder.json
- src/backend/base/langflow/initial_setup/starter_projects/Research Agent.json
- src/backend/base/langflow/initial_setup/starter_projects/Research Translation Loop.json
- src/backend/base/langflow/initial_setup/starter_projects/SEO Keyword Generator.json
- src/backend/base/langflow/initial_setup/starter_projects/SaaS Pricing.json
- src/backend/base/langflow/initial_setup/starter_projects/Search agent.json
- src/backend/base/langflow/initial_setup/starter_projects/Sequential Tasks Agents.json
- src/backend/base/langflow/initial_setup/starter_projects/Simple Agent.json
- src/backend/base/langflow/initial_setup/starter_projects/Social Media Agent.json
- src/backend/base/langflow/initial_setup/starter_projects/Text Sentiment Analysis.json
- src/backend/base/langflow/initial_setup/starter_projects/Travel Planning Agents.json
- src/backend/base/langflow/initial_setup/starter_projects/Twitter Thread Generator.json
- src/backend/base/langflow/initial_setup/starter_projects/Vector Store RAG.json
- src/backend/base/langflow/initial_setup/starter_projects/Youtube Analysis.json
- src/backend/base/langflow/processing/process.py
- src/backend/base/langflow/utils/kb_constants.py
- src/backend/base/pyproject.toml
- src/backend/tests/unit/api/v1/test_mcp_projects.py
- src/backend/tests/unit/test_kb_storage_deletion.py
- src/backend/tests/unit/test_knowledge_bases_api.py
- src/frontend/package.json
- src/frontend/src/CustomNodes/hooks/use-fetch-data-on-mount.ts
- src/frontend/src/components/core/parameterRenderComponent/components/inputGlobalComponent/index.tsx
- src/frontend/src/components/ui/__tests__/dialog.test.tsx
- src/frontend/src/constants/constants.ts
- src/frontend/src/icons/AstraDB/AstraDB.jsx
- src/frontend/src/modals/exportModal/index.tsx
- src/frontend/src/utils/__tests__/removeApiKeys.test.ts
- src/frontend/src/utils/reactflowUtils.ts
- src/frontend/tests/assets/outdated_flow.json
- src/frontend/tests/core/regression/session-deletion-data-leakage.spec.ts
- src/lfx/pyproject.toml
- src/lfx/src/lfx/_assets/component_index.json
- src/lfx/src/lfx/_assets/stable_hash_history.json
- src/lfx/src/lfx/base/models/unified_models.py
- src/lfx/src/lfx/components/llm_operations/lambda_filter.py
- src/lfx/src/lfx/components/llm_operations/llm_conditional_router.py
- src/lfx/src/lfx/components/models_and_agents/agent.py
- src/lfx/src/lfx/components/models_and_agents/embedding_model.py
- src/lfx/src/lfx/components/models_and_agents/language_model.py
- src/lfx/src/lfx/processing/process.py
- src/lfx/tests/unit/inputs/test_max_tokens_propagation.py
| "title_case": false, | ||
| "type": "code", | ||
| "value": "from lfx.base.models.model import LCModelComponent\nfrom lfx.base.models.unified_models import (\n apply_provider_variable_config_to_build_config,\n get_language_model_options,\n get_llm,\n update_model_options_in_build_config,\n)\nfrom lfx.base.models.watsonx_constants import IBM_WATSONX_URLS\nfrom lfx.field_typing import LanguageModel\nfrom lfx.field_typing.range_spec import RangeSpec\nfrom lfx.inputs.inputs import BoolInput, DropdownInput, StrInput\nfrom lfx.io import IntInput, MessageInput, ModelInput, MultilineInput, SecretStrInput, SliderInput\n\nDEFAULT_OLLAMA_URL = \"http://localhost:11434\"\n\n\nclass LanguageModelComponent(LCModelComponent):\n display_name = \"Language Model\"\n description = \"Runs a language model given a specified provider.\"\n documentation: str = \"https://docs.langflow.org/components-models\"\n icon = \"brain-circuit\"\n category = \"models\"\n\n inputs = [\n ModelInput(\n name=\"model\",\n display_name=\"Language Model\",\n info=\"Select your model provider\",\n real_time_refresh=True,\n required=True,\n ),\n SecretStrInput(\n name=\"api_key\",\n display_name=\"API Key\",\n info=\"Model Provider API key\",\n required=False,\n show=True,\n real_time_refresh=True,\n advanced=True,\n ),\n DropdownInput(\n name=\"base_url_ibm_watsonx\",\n display_name=\"watsonx API Endpoint\",\n info=\"The base URL of the API (IBM watsonx.ai only)\",\n options=IBM_WATSONX_URLS,\n value=IBM_WATSONX_URLS[0],\n show=False,\n real_time_refresh=True,\n ),\n StrInput(\n name=\"project_id\",\n display_name=\"watsonx Project ID\",\n info=\"The project ID associated with the foundation model (IBM watsonx.ai only)\",\n show=False,\n required=False,\n ),\n StrInput(\n name=\"ollama_base_url\",\n display_name=\"Ollama API URL\",\n info=f\"Endpoint of the Ollama API (Ollama only). 
Defaults to {DEFAULT_OLLAMA_URL}\",\n value=DEFAULT_OLLAMA_URL,\n show=False,\n real_time_refresh=True,\n ),\n MessageInput(\n name=\"input_value\",\n display_name=\"Input\",\n info=\"The input text to send to the model\",\n ),\n MultilineInput(\n name=\"system_message\",\n display_name=\"System Message\",\n info=\"A system message that helps set the behavior of the assistant\",\n advanced=False,\n ),\n BoolInput(\n name=\"stream\",\n display_name=\"Stream\",\n info=\"Whether to stream the response\",\n value=False,\n advanced=True,\n ),\n SliderInput(\n name=\"temperature\",\n display_name=\"Temperature\",\n value=0.1,\n info=\"Controls randomness in responses\",\n range_spec=RangeSpec(min=0, max=1, step=0.01),\n advanced=True,\n ),\n IntInput(\n name=\"max_tokens\",\n display_name=\"Max Tokens\",\n info=\"Maximum number of tokens to generate. Field name varies by provider.\",\n advanced=True,\n range_spec=RangeSpec(min=1, max=128000, step=1, step_type=\"int\"),\n ),\n ]\n\n def build_model(self) -> LanguageModel:\n return get_llm(\n model=self.model,\n user_id=self.user_id,\n api_key=self.api_key,\n temperature=self.temperature,\n stream=self.stream,\n max_tokens=getattr(self, \"max_tokens\", None),\n watsonx_url=getattr(self, \"base_url_ibm_watsonx\", None),\n watsonx_project_id=getattr(self, \"project_id\", None),\n ollama_base_url=getattr(self, \"ollama_base_url\", None),\n )\n\n def update_build_config(self, build_config: dict, field_value: str, field_name: str | None = None):\n \"\"\"Dynamically update build config with user-filtered model options.\"\"\"\n # Update model options\n build_config = update_model_options_in_build_config(\n component=self,\n build_config=build_config,\n cache_key_prefix=\"language_model_options\",\n get_options_func=get_language_model_options,\n field_name=field_name,\n field_value=field_value,\n )\n\n # Hide all provider-specific fields by default\n for field in [\"api_key\", \"base_url_ibm_watsonx\", \"project_id\", 
\"ollama_base_url\"]:\n if field in build_config:\n build_config[field][\"show\"] = False\n build_config[field][\"required\"] = False\n\n # Show/configure provider-specific fields based on selected model\n # Get current model value - from field_value if model is being changed, otherwise from build_config\n current_model_value = field_value if field_name == \"model\" else build_config.get(\"model\", {}).get(\"value\")\n if isinstance(current_model_value, list) and len(current_model_value) > 0:\n selected_model = current_model_value[0]\n provider = selected_model.get(\"provider\", \"\")\n\n if provider:\n # Apply provider variable configuration (required_for_component, advanced, env var fallback)\n build_config = apply_provider_variable_config_to_build_config(build_config, provider)\n\n return build_config\n" | ||
| "value": "from lfx.base.models.model import LCModelComponent\nfrom lfx.base.models.unified_models import (\n apply_provider_variable_config_to_build_config,\n get_language_model_options,\n get_llm,\n get_provider_for_model_name,\n update_model_options_in_build_config,\n)\nfrom lfx.base.models.watsonx_constants import IBM_WATSONX_URLS\nfrom lfx.field_typing import LanguageModel\nfrom lfx.field_typing.range_spec import RangeSpec\nfrom lfx.inputs.inputs import BoolInput, DropdownInput, StrInput\nfrom lfx.io import IntInput, MessageInput, ModelInput, MultilineInput, SecretStrInput, SliderInput\n\nDEFAULT_OLLAMA_URL = \"http://localhost:11434\"\n\n\nclass LanguageModelComponent(LCModelComponent):\n display_name = \"Language Model\"\n description = \"Runs a language model given a specified provider.\"\n documentation: str = \"https://docs.langflow.org/components-models\"\n icon = \"brain-circuit\"\n category = \"models\"\n\n inputs = [\n ModelInput(\n name=\"model\",\n display_name=\"Language Model\",\n info=\"Select your model provider\",\n real_time_refresh=True,\n required=True,\n ),\n SecretStrInput(\n name=\"api_key\",\n display_name=\"API Key\",\n info=\"Model Provider API key\",\n required=False,\n show=True,\n real_time_refresh=True,\n advanced=True,\n ),\n DropdownInput(\n name=\"base_url_ibm_watsonx\",\n display_name=\"watsonx API Endpoint\",\n info=\"The base URL of the API (IBM watsonx.ai only)\",\n options=IBM_WATSONX_URLS,\n value=IBM_WATSONX_URLS[0],\n show=False,\n real_time_refresh=True,\n ),\n StrInput(\n name=\"project_id\",\n display_name=\"watsonx Project ID\",\n info=\"The project ID associated with the foundation model (IBM watsonx.ai only)\",\n show=False,\n required=False,\n ),\n StrInput(\n name=\"ollama_base_url\",\n display_name=\"Ollama API URL\",\n info=f\"Endpoint of the Ollama API (Ollama only). 
Defaults to {DEFAULT_OLLAMA_URL}\",\n value=DEFAULT_OLLAMA_URL,\n show=False,\n real_time_refresh=True,\n ),\n MessageInput(\n name=\"input_value\",\n display_name=\"Input\",\n info=\"The input text to send to the model\",\n ),\n MultilineInput(\n name=\"system_message\",\n display_name=\"System Message\",\n info=\"A system message that helps set the behavior of the assistant\",\n advanced=False,\n ),\n BoolInput(\n name=\"stream\",\n display_name=\"Stream\",\n info=\"Whether to stream the response\",\n value=False,\n advanced=True,\n ),\n SliderInput(\n name=\"temperature\",\n display_name=\"Temperature\",\n value=0.1,\n info=\"Controls randomness in responses\",\n range_spec=RangeSpec(min=0, max=1, step=0.01),\n advanced=True,\n ),\n IntInput(\n name=\"max_tokens\",\n display_name=\"Max Tokens\",\n info=\"Maximum number of tokens to generate. Field name varies by provider.\",\n advanced=True,\n range_spec=RangeSpec(min=1, max=128000, step=1, step_type=\"int\"),\n ),\n ]\n\n def build_model(self) -> LanguageModel:\n return get_llm(\n model=self.model,\n user_id=self.user_id,\n api_key=self.api_key,\n temperature=self.temperature,\n stream=self.stream,\n max_tokens=getattr(self, \"max_tokens\", None),\n watsonx_url=getattr(self, \"base_url_ibm_watsonx\", None),\n watsonx_project_id=getattr(self, \"project_id\", None),\n ollama_base_url=getattr(self, \"ollama_base_url\", None),\n )\n\n def update_build_config(self, build_config: dict, field_value: str, field_name: str | None = None):\n \"\"\"Dynamically update build config with user-filtered model options.\"\"\"\n # Update model options\n build_config = update_model_options_in_build_config(\n component=self,\n build_config=build_config,\n cache_key_prefix=\"language_model_options\",\n get_options_func=get_language_model_options,\n field_name=field_name,\n field_value=field_value,\n )\n\n current_model_value = field_value if field_name == \"model\" else build_config.get(\"model\", {}).get(\"value\")\n provider = 
\"\"\n if isinstance(current_model_value, list) and current_model_value:\n selected_model = current_model_value[0]\n provider = (selected_model.get(\"provider\") or \"\").strip()\n if not provider and selected_model.get(\"name\"):\n provider = get_provider_for_model_name(str(selected_model[\"name\"]))\n\n if provider:\n build_config = apply_provider_variable_config_to_build_config(build_config, provider)\n\n return build_config\n" |
Update the starter’s compatibility marker.
This file now embeds new LanguageModelComponent logic, but last_tested_version at Line 1578 still says 1.8.0. That leaves the bundled starter metadata stale for the 1.8.1 release.
Suggested metadata update
- "last_tested_version": "1.8.0",
+ "last_tested_version": "1.8.1",
Also applies to: 1578-1578
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/backend/base/langflow/initial_setup/starter_projects/Basic Prompting.json`
at line 1040, the starter metadata's last_tested_version is stale
(still "1.8.0") for this JSON that now includes the new LanguageModelComponent;
locate the JSON object's metadata key last_tested_version and update its string
value to "1.8.1" so the bundled starter reflects the correct release; ensure you
only change the metadata field and do not modify the LanguageModelComponent
class or other fields.
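If the metadata bump is scripted rather than edited by hand, a minimal sketch might look like the following. It assumes `last_tested_version` is a top-level key of the starter JSON and that re-serializing with two-space indentation is acceptable. Neither assumption is confirmed by this review, and the function name `bump_last_tested_version` is hypothetical. The function only writes when the current value matches the expected old version, so other fields are left alone apart from any whitespace normalization caused by re-serialization.

```python
import json
from pathlib import Path


def bump_last_tested_version(path: Path, old: str, new: str) -> bool:
    """Set last_tested_version to `new` only if it currently equals `old`.

    Returns True when the file was updated, False when it was left untouched.
    Assumes the key lives at the top level of the JSON document.
    """
    data = json.loads(path.read_text(encoding="utf-8"))
    if data.get("last_tested_version") != old:
        return False
    data["last_tested_version"] = new
    # ensure_ascii=False keeps names such as "Pokédex" readable in the output.
    path.write_text(
        json.dumps(data, indent=2, ensure_ascii=False) + "\n",
        encoding="utf-8",
    )
    return True
```

Since re-serialization can change formatting, a diff review after running such a script is still advisable to confirm that only the metadata line changed in substance.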
| "title_case": false, | ||
| "type": "code", | ||
| "value": "from __future__ import annotations\n\nimport json\nimport re\nfrom typing import TYPE_CHECKING\n\nfrom pydantic import ValidationError\n\nfrom lfx.components.models_and_agents.memory import MemoryComponent\n\nif TYPE_CHECKING:\n from langchain_core.tools import Tool\n\nfrom lfx.base.agents.agent import LCToolsAgentComponent\nfrom lfx.base.agents.events import ExceptionWithMessageError\nfrom lfx.base.models.unified_models import (\n apply_provider_variable_config_to_build_config,\n get_language_model_options,\n get_llm,\n update_model_options_in_build_config,\n)\nfrom lfx.base.models.watsonx_constants import IBM_WATSONX_URLS\nfrom lfx.components.helpers import CurrentDateComponent\nfrom lfx.components.langchain_utilities.tool_calling import ToolCallingAgentComponent\nfrom lfx.custom.custom_component.component import get_component_toolkit\nfrom lfx.field_typing.range_spec import RangeSpec\nfrom lfx.helpers.base_model import build_model_from_schema\nfrom lfx.inputs.inputs import BoolInput, DropdownInput, ModelInput, StrInput\nfrom lfx.io import IntInput, MessageTextInput, MultilineInput, Output, SecretStrInput, TableInput\nfrom lfx.log.logger import logger\nfrom lfx.schema.data import Data\nfrom lfx.schema.dotdict import dotdict\nfrom lfx.schema.message import Message\nfrom lfx.schema.table import EditMode\n\n\ndef set_advanced_true(component_input):\n component_input.advanced = True\n return component_input\n\n\nclass AgentComponent(ToolCallingAgentComponent):\n display_name: str = \"Agent\"\n description: str = \"Define the agent's instructions, then enter a task to complete using tools.\"\n documentation: str = \"https://docs.langflow.org/agents\"\n icon = \"bot\"\n beta = False\n name = \"Agent\"\n\n memory_inputs = [set_advanced_true(component_input) for component_input in MemoryComponent().inputs]\n\n inputs = [\n ModelInput(\n name=\"model\",\n display_name=\"Language Model\",\n info=\"Select your model provider\",\n real_time_refresh=True,\n 
required=True,\n ),\n SecretStrInput(\n name=\"api_key\",\n display_name=\"API Key\",\n info=\"Model Provider API key\",\n real_time_refresh=True,\n advanced=True,\n ),\n DropdownInput(\n name=\"base_url_ibm_watsonx\",\n display_name=\"watsonx API Endpoint\",\n info=\"The base URL of the API (IBM watsonx.ai only)\",\n options=IBM_WATSONX_URLS,\n value=IBM_WATSONX_URLS[0],\n show=False,\n real_time_refresh=True,\n ),\n StrInput(\n name=\"project_id\",\n display_name=\"watsonx Project ID\",\n info=\"The project ID associated with the foundation model (IBM watsonx.ai only)\",\n show=False,\n required=False,\n ),\n MultilineInput(\n name=\"system_prompt\",\n display_name=\"Agent Instructions\",\n info=\"System Prompt: Initial instructions and context provided to guide the agent's behavior.\",\n value=\"You are a helpful assistant that can use tools to answer questions and perform tasks.\",\n advanced=False,\n ),\n MessageTextInput(\n name=\"context_id\",\n display_name=\"Context ID\",\n info=\"The context ID of the chat. Adds an extra layer to the local memory.\",\n value=\"\",\n advanced=True,\n ),\n IntInput(\n name=\"n_messages\",\n display_name=\"Number of Chat History Messages\",\n value=100,\n info=\"Number of chat history messages to retrieve.\",\n advanced=True,\n show=True,\n ),\n IntInput(\n name=\"max_tokens\",\n display_name=\"Max Tokens\",\n info=\"Maximum number of tokens to generate. Field name varies by provider.\",\n advanced=True,\n range_spec=RangeSpec(min=1, max=128000, step=1, step_type=\"int\"),\n ),\n MultilineInput(\n name=\"format_instructions\",\n display_name=\"Output Format Instructions\",\n info=\"Generic Template for structured output formatting. Valid only with Structured response.\",\n value=(\n \"You are an AI that extracts structured JSON objects from unstructured text. \"\n \"Use a predefined schema with expected types (str, int, float, bool, dict). 
\"\n \"Extract ALL relevant instances that match the schema - if multiple patterns exist, capture them all. \"\n \"Fill missing or ambiguous values with defaults: null for missing values. \"\n \"Remove exact duplicates but keep variations that have different field values. \"\n \"Always return valid JSON in the expected format, never throw errors. \"\n \"If multiple objects can be extracted, return them all in the structured format.\"\n ),\n advanced=True,\n ),\n TableInput(\n name=\"output_schema\",\n display_name=\"Output Schema\",\n info=(\n \"Schema Validation: Define the structure and data types for structured output. \"\n \"No validation if no output schema.\"\n ),\n advanced=True,\n required=False,\n value=[],\n table_schema=[\n {\n \"name\": \"name\",\n \"display_name\": \"Name\",\n \"type\": \"str\",\n \"description\": \"Specify the name of the output field.\",\n \"default\": \"field\",\n \"edit_mode\": EditMode.INLINE,\n },\n {\n \"name\": \"description\",\n \"display_name\": \"Description\",\n \"type\": \"str\",\n \"description\": \"Describe the purpose of the output field.\",\n \"default\": \"description of field\",\n \"edit_mode\": EditMode.POPOVER,\n },\n {\n \"name\": \"type\",\n \"display_name\": \"Type\",\n \"type\": \"str\",\n \"edit_mode\": EditMode.INLINE,\n \"description\": (\"Indicate the data type of the output field (e.g., str, int, float, bool, dict).\"),\n \"options\": [\"str\", \"int\", \"float\", \"bool\", \"dict\"],\n \"default\": \"str\",\n },\n {\n \"name\": \"multiple\",\n \"display_name\": \"As List\",\n \"type\": \"boolean\",\n \"description\": \"Set to True if this output field should be a list of the specified type.\",\n \"default\": \"False\",\n \"edit_mode\": EditMode.INLINE,\n },\n ],\n ),\n *LCToolsAgentComponent.get_base_inputs(),\n # removed memory inputs from agent component\n # *memory_inputs,\n BoolInput(\n name=\"add_current_date_tool\",\n display_name=\"Current Date\",\n advanced=True,\n info=\"If true, will add a tool 
to the agent that returns the current date.\",\n value=True,\n ),\n ]\n outputs = [\n Output(name=\"response\", display_name=\"Response\", method=\"message_response\"),\n ]\n\n async def get_agent_requirements(self):\n \"\"\"Get the agent requirements for the agent.\"\"\"\n from langchain_core.tools import StructuredTool\n\n max_tokens_val = getattr(self, \"max_tokens\", None)\n if max_tokens_val in {\"\", 0}:\n max_tokens_val = None\n llm_model = get_llm(\n model=self.model,\n user_id=self.user_id,\n api_key=self.api_key,\n max_tokens=max_tokens_val,\n watsonx_url=getattr(self, \"base_url_ibm_watsonx\", None),\n watsonx_project_id=getattr(self, \"project_id\", None),\n )\n if llm_model is None:\n msg = \"No language model selected. Please choose a model to proceed.\"\n raise ValueError(msg)\n\n # Get memory data\n self.chat_history = await self.get_memory_data()\n await logger.adebug(f\"Retrieved {len(self.chat_history)} chat history messages\")\n if isinstance(self.chat_history, Message):\n self.chat_history = [self.chat_history]\n\n # Add current date tool if enabled\n if self.add_current_date_tool:\n if not isinstance(self.tools, list): # type: ignore[has-type]\n self.tools = []\n current_date_tool = (await CurrentDateComponent(**self.get_base_args()).to_toolkit()).pop(0)\n\n if not isinstance(current_date_tool, StructuredTool):\n msg = \"CurrentDateComponent must be converted to a StructuredTool\"\n raise TypeError(msg)\n self.tools.append(current_date_tool)\n\n # Set shared callbacks for tracing the tools used by the agent\n self.set_tools_callbacks(self.tools, self._get_shared_callbacks())\n\n return llm_model, self.chat_history, self.tools\n\n async def message_response(self) -> Message:\n try:\n llm_model, self.chat_history, self.tools = await self.get_agent_requirements()\n # Set up and run agent\n self.set(\n llm=llm_model,\n tools=self.tools or [],\n chat_history=self.chat_history,\n input_value=self.input_value,\n system_prompt=self.system_prompt,\n 
)\n agent = self.create_agent_runnable()\n result = await self.run_agent(agent)\n\n # Store result for potential JSON output\n self._agent_result = result\n\n except (ValueError, TypeError, KeyError) as e:\n await logger.aerror(f\"{type(e).__name__}: {e!s}\")\n raise\n except ExceptionWithMessageError as e:\n await logger.aerror(f\"ExceptionWithMessageError occurred: {e}\")\n raise\n # Avoid catching blind Exception; let truly unexpected exceptions propagate\n except Exception as e:\n await logger.aerror(f\"Unexpected error: {e!s}\")\n raise\n else:\n return result\n\n def _preprocess_schema(self, schema):\n \"\"\"Preprocess schema to ensure correct data types for build_model_from_schema.\"\"\"\n processed_schema = []\n for field in schema:\n processed_field = {\n \"name\": str(field.get(\"name\", \"field\")),\n \"type\": str(field.get(\"type\", \"str\")),\n \"description\": str(field.get(\"description\", \"\")),\n \"multiple\": field.get(\"multiple\", False),\n }\n # Ensure multiple is handled correctly\n if isinstance(processed_field[\"multiple\"], str):\n processed_field[\"multiple\"] = processed_field[\"multiple\"].lower() in [\n \"true\",\n \"1\",\n \"t\",\n \"y\",\n \"yes\",\n ]\n processed_schema.append(processed_field)\n return processed_schema\n\n async def build_structured_output_base(self, content: str):\n \"\"\"Build structured output with optional BaseModel validation.\"\"\"\n json_pattern = r\"\\{.*\\}\"\n schema_error_msg = \"Try setting an output schema\"\n\n # Try to parse content as JSON first\n json_data = None\n try:\n json_data = json.loads(content)\n except json.JSONDecodeError:\n json_match = re.search(json_pattern, content, re.DOTALL)\n if json_match:\n try:\n json_data = json.loads(json_match.group())\n except json.JSONDecodeError:\n return {\"content\": content, \"error\": schema_error_msg}\n else:\n return {\"content\": content, \"error\": schema_error_msg}\n\n # If no output schema provided, return parsed JSON without validation\n if not 
hasattr(self, \"output_schema\") or not self.output_schema or len(self.output_schema) == 0:\n return json_data\n\n # Use BaseModel validation with schema\n try:\n processed_schema = self._preprocess_schema(self.output_schema)\n output_model = build_model_from_schema(processed_schema)\n\n # Validate against the schema\n if isinstance(json_data, list):\n # Multiple objects\n validated_objects = []\n for item in json_data:\n try:\n validated_obj = output_model.model_validate(item)\n validated_objects.append(validated_obj.model_dump())\n except ValidationError as e:\n await logger.aerror(f\"Validation error for item: {e}\")\n # Include invalid items with error info\n validated_objects.append({\"data\": item, \"validation_error\": str(e)})\n return validated_objects\n\n # Single object\n try:\n validated_obj = output_model.model_validate(json_data)\n return [validated_obj.model_dump()] # Return as list for consistency\n except ValidationError as e:\n await logger.aerror(f\"Validation error: {e}\")\n return [{\"data\": json_data, \"validation_error\": str(e)}]\n\n except (TypeError, ValueError) as e:\n await logger.aerror(f\"Error building structured output: {e}\")\n # Fallback to parsed JSON without validation\n return json_data\n\n async def json_response(self) -> Data:\n \"\"\"Convert agent response to structured JSON Data output with schema validation.\"\"\"\n # Always use structured chat agent for JSON response mode for better JSON formatting\n try:\n system_components = []\n\n # 1. Agent Instructions (system_prompt)\n agent_instructions = getattr(self, \"system_prompt\", \"\") or \"\"\n if agent_instructions:\n system_components.append(f\"{agent_instructions}\")\n\n # 2. Format Instructions\n format_instructions = getattr(self, \"format_instructions\", \"\") or \"\"\n if format_instructions:\n system_components.append(f\"Format instructions: {format_instructions}\")\n\n # 3. 
Schema Information from BaseModel\n if hasattr(self, \"output_schema\") and self.output_schema and len(self.output_schema) > 0:\n try:\n processed_schema = self._preprocess_schema(self.output_schema)\n output_model = build_model_from_schema(processed_schema)\n schema_dict = output_model.model_json_schema()\n schema_info = (\n \"You are given some text that may include format instructions, \"\n \"explanations, or other content alongside a JSON schema.\\n\\n\"\n \"Your task:\\n\"\n \"- Extract only the JSON schema.\\n\"\n \"- Return it as valid JSON.\\n\"\n \"- Do not include format instructions, explanations, or extra text.\\n\\n\"\n \"Input:\\n\"\n f\"{json.dumps(schema_dict, indent=2)}\\n\\n\"\n \"Output (only JSON schema):\"\n )\n system_components.append(schema_info)\n except (ValidationError, ValueError, TypeError, KeyError) as e:\n await logger.aerror(f\"Could not build schema for prompt: {e}\", exc_info=True)\n\n # Combine all components\n combined_instructions = \"\\n\\n\".join(system_components) if system_components else \"\"\n llm_model, self.chat_history, self.tools = await self.get_agent_requirements()\n self.set(\n llm=llm_model,\n tools=self.tools or [],\n chat_history=self.chat_history,\n input_value=self.input_value,\n system_prompt=combined_instructions,\n )\n\n # Create and run structured chat agent\n try:\n structured_agent = self.create_agent_runnable()\n except (NotImplementedError, ValueError, TypeError) as e:\n await logger.aerror(f\"Error with structured chat agent: {e}\")\n raise\n try:\n result = await self.run_agent(structured_agent)\n except (\n ExceptionWithMessageError,\n ValueError,\n TypeError,\n RuntimeError,\n ) as e:\n await logger.aerror(f\"Error with structured agent result: {e}\")\n raise\n # Extract content from structured agent result\n if hasattr(result, \"content\"):\n content = result.content\n elif hasattr(result, \"text\"):\n content = result.text\n else:\n content = str(result)\n\n except (\n ExceptionWithMessageError,\n 
ValueError,\n TypeError,\n NotImplementedError,\n AttributeError,\n ) as e:\n await logger.aerror(f\"Error with structured chat agent: {e}\")\n # Fallback to regular agent\n content_str = \"No content returned from agent\"\n return Data(data={\"content\": content_str, \"error\": str(e)})\n\n # Process with structured output validation\n try:\n structured_output = await self.build_structured_output_base(content)\n\n # Handle different output formats\n if isinstance(structured_output, list) and structured_output:\n if len(structured_output) == 1:\n return Data(data=structured_output[0])\n return Data(data={\"results\": structured_output})\n if isinstance(structured_output, dict):\n return Data(data=structured_output)\n return Data(data={\"content\": content})\n\n except (ValueError, TypeError) as e:\n await logger.aerror(f\"Error in structured output processing: {e}\")\n return Data(data={\"content\": content, \"error\": str(e)})\n\n async def get_memory_data(self):\n # TODO: This is a temporary fix to avoid message duplication. 
We should develop a function for this.\n messages = (\n await MemoryComponent(**self.get_base_args())\n .set(\n session_id=self.graph.session_id,\n context_id=self.context_id,\n order=\"Ascending\",\n n_messages=self.n_messages,\n )\n .retrieve_messages()\n )\n return [\n message for message in messages if getattr(message, \"id\", None) != getattr(self.input_value, \"id\", None)\n ]\n\n def update_input_types(self, build_config: dotdict) -> dotdict:\n \"\"\"Update input types for all fields in build_config.\"\"\"\n for key, value in build_config.items():\n if isinstance(value, dict):\n if value.get(\"input_types\") is None:\n build_config[key][\"input_types\"] = []\n elif hasattr(value, \"input_types\") and value.input_types is None:\n value.input_types = []\n return build_config\n\n async def update_build_config(\n self,\n build_config: dotdict,\n field_value: list[dict],\n field_name: str | None = None,\n ) -> dotdict:\n # Update model options with caching (for all field changes)\n # Agents require tool calling, so filter for only tool-calling capable models\n def get_tool_calling_model_options(user_id=None):\n return get_language_model_options(user_id=user_id, tool_calling=True)\n\n build_config = update_model_options_in_build_config(\n component=self,\n build_config=dict(build_config),\n cache_key_prefix=\"language_model_options_tool_calling\",\n get_options_func=get_tool_calling_model_options,\n field_name=field_name,\n field_value=field_value,\n )\n build_config = dotdict(build_config)\n\n # Iterate over all providers in the MODEL_PROVIDERS_DICT\n if field_name == \"model\":\n # Update input types for all fields\n build_config = self.update_input_types(build_config)\n\n # Show/hide provider-specific fields based on selected model\n # Get current model value - from field_value if model is being changed, otherwise from build_config\n current_model_value = field_value if field_name == \"model\" else build_config.get(\"model\", {}).get(\"value\")\n if 
isinstance(current_model_value, list) and len(current_model_value) > 0:\n selected_model = current_model_value[0]\n provider = selected_model.get(\"provider\", \"\")\n\n # Hide provider-specific fields by default before applying provider config\n for field in [\"base_url_ibm_watsonx\", \"project_id\"]:\n if field in build_config:\n build_config[field][\"show\"] = False\n build_config[field][\"required\"] = False\n\n # Apply provider variable configuration (advanced, required, info, env var fallback)\n if provider:\n build_config = apply_provider_variable_config_to_build_config(build_config, provider)\n\n # Validate required keys\n default_keys = [\n \"code\",\n \"_type\",\n \"model\",\n \"tools\",\n \"input_value\",\n \"add_current_date_tool\",\n \"system_prompt\",\n \"agent_description\",\n \"max_iterations\",\n \"handle_parsing_errors\",\n \"verbose\",\n ]\n missing_keys = [key for key in default_keys if key not in build_config]\n if missing_keys:\n msg = f\"Missing required keys in build_config: {missing_keys}\"\n raise ValueError(msg)\n return dotdict({k: v.to_dict() if hasattr(v, \"to_dict\") else v for k, v in build_config.items()})\n\n async def _get_tools(self) -> list[Tool]:\n component_toolkit = get_component_toolkit()\n tools_names = self._build_tools_names()\n agent_description = self.get_tool_description()\n # TODO: Agent Description Depreciated Feature to be removed\n description = f\"{agent_description}{tools_names}\"\n\n tools = component_toolkit(component=self).get_tools(\n tool_name=\"Call_Agent\",\n tool_description=description,\n # here we do not use the shared callbacks as we are exposing the agent as a tool\n callbacks=self.get_langchain_callbacks(),\n )\n if hasattr(self, \"tools_metadata\"):\n tools = component_toolkit(component=self, metadata=self.tools_metadata).update_tools_metadata(tools=tools)\n\n return tools\n" | ||
| "value": "from __future__ import annotations\n\nimport json\nimport re\nfrom typing import TYPE_CHECKING\n\nfrom pydantic import ValidationError\n\nfrom lfx.components.models_and_agents.memory import MemoryComponent\n\nif TYPE_CHECKING:\n from langchain_core.tools import Tool\n\nfrom lfx.base.agents.agent import LCToolsAgentComponent\nfrom lfx.base.agents.events import ExceptionWithMessageError\nfrom lfx.base.models.unified_models import (\n apply_provider_variable_config_to_build_config,\n get_language_model_options,\n get_llm,\n get_provider_for_model_name,\n update_model_options_in_build_config,\n)\nfrom lfx.base.models.watsonx_constants import IBM_WATSONX_URLS\nfrom lfx.components.helpers import CurrentDateComponent\nfrom lfx.components.langchain_utilities.tool_calling import ToolCallingAgentComponent\nfrom lfx.custom.custom_component.component import get_component_toolkit\nfrom lfx.field_typing.range_spec import RangeSpec\nfrom lfx.helpers.base_model import build_model_from_schema\nfrom lfx.inputs.inputs import BoolInput, DropdownInput, ModelInput, StrInput\nfrom lfx.io import IntInput, MessageTextInput, MultilineInput, Output, SecretStrInput, TableInput\nfrom lfx.log.logger import logger\nfrom lfx.schema.data import Data\nfrom lfx.schema.dotdict import dotdict\nfrom lfx.schema.message import Message\nfrom lfx.schema.table import EditMode\n\n\ndef set_advanced_true(component_input):\n component_input.advanced = True\n return component_input\n\n\nclass AgentComponent(ToolCallingAgentComponent):\n display_name: str = \"Agent\"\n description: str = \"Define the agent's instructions, then enter a task to complete using tools.\"\n documentation: str = \"https://docs.langflow.org/agents\"\n icon = \"bot\"\n beta = False\n name = \"Agent\"\n\n memory_inputs = [set_advanced_true(component_input) for component_input in MemoryComponent().inputs]\n\n inputs = [\n ModelInput(\n name=\"model\",\n display_name=\"Language Model\",\n info=\"Select your model provider\",\n 
real_time_refresh=True,\n required=True,\n ),\n SecretStrInput(\n name=\"api_key\",\n display_name=\"API Key\",\n info=\"Model Provider API key\",\n real_time_refresh=True,\n advanced=True,\n ),\n DropdownInput(\n name=\"base_url_ibm_watsonx\",\n display_name=\"watsonx API Endpoint\",\n info=\"The base URL of the API (IBM watsonx.ai only)\",\n options=IBM_WATSONX_URLS,\n value=IBM_WATSONX_URLS[0],\n show=False,\n real_time_refresh=True,\n ),\n StrInput(\n name=\"project_id\",\n display_name=\"watsonx Project ID\",\n info=\"The project ID associated with the foundation model (IBM watsonx.ai only)\",\n show=False,\n required=False,\n ),\n MultilineInput(\n name=\"system_prompt\",\n display_name=\"Agent Instructions\",\n info=\"System Prompt: Initial instructions and context provided to guide the agent's behavior.\",\n value=\"You are a helpful assistant that can use tools to answer questions and perform tasks.\",\n advanced=False,\n ),\n MessageTextInput(\n name=\"context_id\",\n display_name=\"Context ID\",\n info=\"The context ID of the chat. Adds an extra layer to the local memory.\",\n value=\"\",\n advanced=True,\n ),\n IntInput(\n name=\"n_messages\",\n display_name=\"Number of Chat History Messages\",\n value=100,\n info=\"Number of chat history messages to retrieve.\",\n advanced=True,\n show=True,\n ),\n IntInput(\n name=\"max_tokens\",\n display_name=\"Max Tokens\",\n info=\"Maximum number of tokens to generate. Field name varies by provider.\",\n advanced=True,\n range_spec=RangeSpec(min=1, max=128000, step=1, step_type=\"int\"),\n ),\n MultilineInput(\n name=\"format_instructions\",\n display_name=\"Output Format Instructions\",\n info=\"Generic Template for structured output formatting. Valid only with Structured response.\",\n value=(\n \"You are an AI that extracts structured JSON objects from unstructured text. \"\n \"Use a predefined schema with expected types (str, int, float, bool, dict). 
\"\n \"Extract ALL relevant instances that match the schema - if multiple patterns exist, capture them all. \"\n \"Fill missing or ambiguous values with defaults: null for missing values. \"\n \"Remove exact duplicates but keep variations that have different field values. \"\n \"Always return valid JSON in the expected format, never throw errors. \"\n \"If multiple objects can be extracted, return them all in the structured format.\"\n ),\n advanced=True,\n ),\n TableInput(\n name=\"output_schema\",\n display_name=\"Output Schema\",\n info=(\n \"Schema Validation: Define the structure and data types for structured output. \"\n \"No validation if no output schema.\"\n ),\n advanced=True,\n required=False,\n value=[],\n table_schema=[\n {\n \"name\": \"name\",\n \"display_name\": \"Name\",\n \"type\": \"str\",\n \"description\": \"Specify the name of the output field.\",\n \"default\": \"field\",\n \"edit_mode\": EditMode.INLINE,\n },\n {\n \"name\": \"description\",\n \"display_name\": \"Description\",\n \"type\": \"str\",\n \"description\": \"Describe the purpose of the output field.\",\n \"default\": \"description of field\",\n \"edit_mode\": EditMode.POPOVER,\n },\n {\n \"name\": \"type\",\n \"display_name\": \"Type\",\n \"type\": \"str\",\n \"edit_mode\": EditMode.INLINE,\n \"description\": (\"Indicate the data type of the output field (e.g., str, int, float, bool, dict).\"),\n \"options\": [\"str\", \"int\", \"float\", \"bool\", \"dict\"],\n \"default\": \"str\",\n },\n {\n \"name\": \"multiple\",\n \"display_name\": \"As List\",\n \"type\": \"boolean\",\n \"description\": \"Set to True if this output field should be a list of the specified type.\",\n \"default\": \"False\",\n \"edit_mode\": EditMode.INLINE,\n },\n ],\n ),\n *LCToolsAgentComponent.get_base_inputs(),\n # removed memory inputs from agent component\n # *memory_inputs,\n BoolInput(\n name=\"add_current_date_tool\",\n display_name=\"Current Date\",\n advanced=True,\n info=\"If true, will add a tool 
to the agent that returns the current date.\",\n value=True,\n ),\n ]\n outputs = [\n Output(name=\"response\", display_name=\"Response\", method=\"message_response\"),\n ]\n\n def _get_max_tokens_value(self):\n \"\"\"Return the user-supplied max_tokens or None when unset/zero.\"\"\"\n val = getattr(self, \"max_tokens\", None)\n if val in {\"\", 0}:\n return None\n return val\n\n def _get_llm(self):\n \"\"\"Override parent to include max_tokens from the Agent's input field.\"\"\"\n return get_llm(\n model=self.model,\n user_id=self.user_id,\n api_key=getattr(self, \"api_key\", None),\n max_tokens=self._get_max_tokens_value(),\n watsonx_url=getattr(self, \"base_url_ibm_watsonx\", None),\n watsonx_project_id=getattr(self, \"project_id\", None),\n )\n\n async def get_agent_requirements(self):\n \"\"\"Get the agent requirements for the agent.\"\"\"\n from langchain_core.tools import StructuredTool\n\n llm_model = self._get_llm()\n if llm_model is None:\n msg = \"No language model selected. Please choose a model to proceed.\"\n raise ValueError(msg)\n\n # Get memory data\n self.chat_history = await self.get_memory_data()\n await logger.adebug(f\"Retrieved {len(self.chat_history)} chat history messages\")\n if isinstance(self.chat_history, Message):\n self.chat_history = [self.chat_history]\n\n # Add current date tool if enabled\n if self.add_current_date_tool:\n if not isinstance(self.tools, list): # type: ignore[has-type]\n self.tools = []\n current_date_tool = (await CurrentDateComponent(**self.get_base_args()).to_toolkit()).pop(0)\n\n if not isinstance(current_date_tool, StructuredTool):\n msg = \"CurrentDateComponent must be converted to a StructuredTool\"\n raise TypeError(msg)\n self.tools.append(current_date_tool)\n\n # Set shared callbacks for tracing the tools used by the agent\n self.set_tools_callbacks(self.tools, self._get_shared_callbacks())\n\n return llm_model, self.chat_history, self.tools\n\n async def message_response(self) -> Message:\n try:\n 
llm_model, self.chat_history, self.tools = await self.get_agent_requirements()\n # Set up and run agent\n self.set(\n llm=llm_model,\n tools=self.tools or [],\n chat_history=self.chat_history,\n input_value=self.input_value,\n system_prompt=self.system_prompt,\n )\n agent = self.create_agent_runnable()\n result = await self.run_agent(agent)\n\n # Store result for potential JSON output\n self._agent_result = result\n\n except (ValueError, TypeError, KeyError) as e:\n await logger.aerror(f\"{type(e).__name__}: {e!s}\")\n raise\n except ExceptionWithMessageError as e:\n await logger.aerror(f\"ExceptionWithMessageError occurred: {e}\")\n raise\n # Avoid catching blind Exception; let truly unexpected exceptions propagate\n except Exception as e:\n await logger.aerror(f\"Unexpected error: {e!s}\")\n raise\n else:\n return result\n\n def _preprocess_schema(self, schema):\n \"\"\"Preprocess schema to ensure correct data types for build_model_from_schema.\"\"\"\n processed_schema = []\n for field in schema:\n processed_field = {\n \"name\": str(field.get(\"name\", \"field\")),\n \"type\": str(field.get(\"type\", \"str\")),\n \"description\": str(field.get(\"description\", \"\")),\n \"multiple\": field.get(\"multiple\", False),\n }\n # Ensure multiple is handled correctly\n if isinstance(processed_field[\"multiple\"], str):\n processed_field[\"multiple\"] = processed_field[\"multiple\"].lower() in [\n \"true\",\n \"1\",\n \"t\",\n \"y\",\n \"yes\",\n ]\n processed_schema.append(processed_field)\n return processed_schema\n\n async def build_structured_output_base(self, content: str):\n \"\"\"Build structured output with optional BaseModel validation.\"\"\"\n json_pattern = r\"\\{.*\\}\"\n schema_error_msg = \"Try setting an output schema\"\n\n # Try to parse content as JSON first\n json_data = None\n try:\n json_data = json.loads(content)\n except json.JSONDecodeError:\n json_match = re.search(json_pattern, content, re.DOTALL)\n if json_match:\n try:\n json_data = 
json.loads(json_match.group())\n except json.JSONDecodeError:\n return {\"content\": content, \"error\": schema_error_msg}\n else:\n return {\"content\": content, \"error\": schema_error_msg}\n\n # If no output schema provided, return parsed JSON without validation\n if not hasattr(self, \"output_schema\") or not self.output_schema or len(self.output_schema) == 0:\n return json_data\n\n # Use BaseModel validation with schema\n try:\n processed_schema = self._preprocess_schema(self.output_schema)\n output_model = build_model_from_schema(processed_schema)\n\n # Validate against the schema\n if isinstance(json_data, list):\n # Multiple objects\n validated_objects = []\n for item in json_data:\n try:\n validated_obj = output_model.model_validate(item)\n validated_objects.append(validated_obj.model_dump())\n except ValidationError as e:\n await logger.aerror(f\"Validation error for item: {e}\")\n # Include invalid items with error info\n validated_objects.append({\"data\": item, \"validation_error\": str(e)})\n return validated_objects\n\n # Single object\n try:\n validated_obj = output_model.model_validate(json_data)\n return [validated_obj.model_dump()] # Return as list for consistency\n except ValidationError as e:\n await logger.aerror(f\"Validation error: {e}\")\n return [{\"data\": json_data, \"validation_error\": str(e)}]\n\n except (TypeError, ValueError) as e:\n await logger.aerror(f\"Error building structured output: {e}\")\n # Fallback to parsed JSON without validation\n return json_data\n\n async def json_response(self) -> Data:\n \"\"\"Convert agent response to structured JSON Data output with schema validation.\"\"\"\n # Always use structured chat agent for JSON response mode for better JSON formatting\n try:\n system_components = []\n\n # 1. Agent Instructions (system_prompt)\n agent_instructions = getattr(self, \"system_prompt\", \"\") or \"\"\n if agent_instructions:\n system_components.append(f\"{agent_instructions}\")\n\n # 2. 
Format Instructions\n format_instructions = getattr(self, \"format_instructions\", \"\") or \"\"\n if format_instructions:\n system_components.append(f\"Format instructions: {format_instructions}\")\n\n # 3. Schema Information from BaseModel\n if hasattr(self, \"output_schema\") and self.output_schema and len(self.output_schema) > 0:\n try:\n processed_schema = self._preprocess_schema(self.output_schema)\n output_model = build_model_from_schema(processed_schema)\n schema_dict = output_model.model_json_schema()\n schema_info = (\n \"You are given some text that may include format instructions, \"\n \"explanations, or other content alongside a JSON schema.\\n\\n\"\n \"Your task:\\n\"\n \"- Extract only the JSON schema.\\n\"\n \"- Return it as valid JSON.\\n\"\n \"- Do not include format instructions, explanations, or extra text.\\n\\n\"\n \"Input:\\n\"\n f\"{json.dumps(schema_dict, indent=2)}\\n\\n\"\n \"Output (only JSON schema):\"\n )\n system_components.append(schema_info)\n except (ValidationError, ValueError, TypeError, KeyError) as e:\n await logger.aerror(f\"Could not build schema for prompt: {e}\", exc_info=True)\n\n # Combine all components\n combined_instructions = \"\\n\\n\".join(system_components) if system_components else \"\"\n llm_model, self.chat_history, self.tools = await self.get_agent_requirements()\n self.set(\n llm=llm_model,\n tools=self.tools or [],\n chat_history=self.chat_history,\n input_value=self.input_value,\n system_prompt=combined_instructions,\n )\n\n # Create and run structured chat agent\n try:\n structured_agent = self.create_agent_runnable()\n except (NotImplementedError, ValueError, TypeError) as e:\n await logger.aerror(f\"Error with structured chat agent: {e}\")\n raise\n try:\n result = await self.run_agent(structured_agent)\n except (\n ExceptionWithMessageError,\n ValueError,\n TypeError,\n RuntimeError,\n ) as e:\n await logger.aerror(f\"Error with structured agent result: {e}\")\n raise\n # Extract content from structured 
agent result\n if hasattr(result, \"content\"):\n content = result.content\n elif hasattr(result, \"text\"):\n content = result.text\n else:\n content = str(result)\n\n except (\n ExceptionWithMessageError,\n ValueError,\n TypeError,\n NotImplementedError,\n AttributeError,\n ) as e:\n await logger.aerror(f\"Error with structured chat agent: {e}\")\n # Fallback to regular agent\n content_str = \"No content returned from agent\"\n return Data(data={\"content\": content_str, \"error\": str(e)})\n\n # Process with structured output validation\n try:\n structured_output = await self.build_structured_output_base(content)\n\n # Handle different output formats\n if isinstance(structured_output, list) and structured_output:\n if len(structured_output) == 1:\n return Data(data=structured_output[0])\n return Data(data={\"results\": structured_output})\n if isinstance(structured_output, dict):\n return Data(data=structured_output)\n return Data(data={\"content\": content})\n\n except (ValueError, TypeError) as e:\n await logger.aerror(f\"Error in structured output processing: {e}\")\n return Data(data={\"content\": content, \"error\": str(e)})\n\n async def get_memory_data(self):\n # TODO: This is a temporary fix to avoid message duplication. 
We should develop a function for this.\n messages = (\n await MemoryComponent(**self.get_base_args())\n .set(\n session_id=self.graph.session_id,\n context_id=self.context_id,\n order=\"Ascending\",\n n_messages=self.n_messages,\n )\n .retrieve_messages()\n )\n return [\n message for message in messages if getattr(message, \"id\", None) != getattr(self.input_value, \"id\", None)\n ]\n\n def update_input_types(self, build_config: dotdict) -> dotdict:\n \"\"\"Update input types for all fields in build_config.\"\"\"\n for key, value in build_config.items():\n if isinstance(value, dict):\n if value.get(\"input_types\") is None:\n build_config[key][\"input_types\"] = []\n elif hasattr(value, \"input_types\") and value.input_types is None:\n value.input_types = []\n return build_config\n\n async def update_build_config(\n self,\n build_config: dotdict,\n field_value: list[dict],\n field_name: str | None = None,\n ) -> dotdict:\n # Update model options with caching (for all field changes)\n # Agents require tool calling, so filter for only tool-calling capable models\n def get_tool_calling_model_options(user_id=None):\n return get_language_model_options(user_id=user_id, tool_calling=True)\n\n build_config = update_model_options_in_build_config(\n component=self,\n build_config=dict(build_config),\n cache_key_prefix=\"language_model_options_tool_calling\",\n get_options_func=get_tool_calling_model_options,\n field_name=field_name,\n field_value=field_value,\n )\n build_config = dotdict(build_config)\n\n if field_name == \"model\":\n build_config = self.update_input_types(build_config)\n\n current_model_value = field_value if field_name == \"model\" else build_config.get(\"model\", {}).get(\"value\")\n provider = \"\"\n if isinstance(current_model_value, list) and current_model_value:\n selected_model = current_model_value[0]\n provider = (selected_model.get(\"provider\") or \"\").strip()\n if not provider and selected_model.get(\"name\"):\n provider = 
get_provider_for_model_name(str(selected_model[\"name\"]))\n\n if provider:\n build_config = apply_provider_variable_config_to_build_config(build_config, provider)\n\n if field_name == \"model\":\n default_keys = [\n \"code\",\n \"_type\",\n \"model\",\n \"tools\",\n \"input_value\",\n \"add_current_date_tool\",\n \"system_prompt\",\n \"agent_description\",\n \"max_iterations\",\n \"handle_parsing_errors\",\n \"verbose\",\n ]\n missing_keys = [key for key in default_keys if key not in build_config]\n if missing_keys:\n msg = f\"Missing required keys in build_config: {missing_keys}\"\n raise ValueError(msg)\n return dotdict({k: v.to_dict() if hasattr(v, \"to_dict\") else v for k, v in build_config.items()})\n\n async def _get_tools(self) -> list[Tool]:\n component_toolkit = get_component_toolkit()\n tools_names = self._build_tools_names()\n agent_description = self.get_tool_description()\n # TODO: Agent Description Depreciated Feature to be removed\n description = f\"{agent_description}{tools_names}\"\n\n tools = component_toolkit(component=self).get_tools(\n tool_name=\"Call_Agent\",\n tool_description=description,\n # here we do not use the shared callbacks as we are exposing the agent as a tool\n callbacks=self.get_langchain_callbacks(),\n )\n if hasattr(self, \"tools_metadata\"):\n tools = component_toolkit(component=self, metadata=self.tools_metadata).update_tools_metadata(tools=tools)\n\n return tools\n" |
The structured-output path is steering the agent toward the wrong payload.
In json_response(), schema_info asks for Output (only JSON schema), so the model is being told to echo the schema instead of returning extracted company data. Then build_structured_output_base() only tries to recover an object-shaped JSON block on fallback, which rejects prose-wrapped arrays when multiple objects are returned. The prompt needs to ask for JSON that conforms to the schema, and the fallback extraction needs to accept array payloads too.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/backend/base/langflow/initial_setup/starter_projects/Market Research.json` at line 1352, the model prompt currently asks the LLM to "Output (only JSON schema)", which makes it echo the schema instead of returning data. Update the schema_info text in json_response() (the variable schema_info) to instruct the agent to return JSON data that conforms to the provided schema (e.g., "Return only JSON data that conforms to the schema, do NOT return the schema itself"). Also broaden the JSON extraction in build_structured_output_base() by changing json_pattern to match both objects and arrays (e.g., allow \[...\] as well as {...}), and ensure the fallback parsing logic accepts and preserves top-level arrays so that prose-wrapped arrays are extracted and validated just like single objects (see json_pattern and the JSON parsing/fallback branches in build_structured_output_base()).
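The two changes this comment asks for can be sketched as standalone Python, outside the component. This is an illustration under stated assumptions, not the fix that was merged: `build_schema_instructions` and `extract_json_payload` are hypothetical helper names that do not exist in Langflow, and the prompt wording is only one possible phrasing. The extraction uses the standard library's `json.JSONDecoder.raw_decode` in place of the `json_pattern` regex, so a top-level array wrapped in prose is recovered the same way as a single object.

```python
import json
from typing import Any


def build_schema_instructions(schema_dict: dict) -> str:
    """Hypothetical replacement for schema_info: ask for data, not the schema."""
    return (
        "Return only JSON data that conforms to the JSON schema below.\n"
        "- Do not return the schema itself.\n"
        "- Do not include explanations or extra text.\n\n"
        "Schema:\n"
        f"{json.dumps(schema_dict, indent=2)}\n\n"
        "Output (only JSON data):"
    )


def extract_json_payload(content: str) -> Any:
    """Return the first JSON object or array found in content, or None.

    Hypothetical helper: tries the whole string first, then scans for the
    first '{' or '[' from which a complete JSON value can be decoded.
    """
    try:
        return json.loads(content)
    except json.JSONDecodeError:
        pass

    decoder = json.JSONDecoder()
    for start, char in enumerate(content):
        if char not in "{[":
            continue
        try:
            # raw_decode parses one JSON value starting at `start` and
            # ignores any trailing text after it.
            payload, _end = decoder.raw_decode(content, start)
        except json.JSONDecodeError:
            continue
        return payload
    return None
```

A caller would then branch on `isinstance(payload, list)` versus `dict`, as `build_structured_output_base()` already does for its validation step.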
| "legacy": false, | ||
| "metadata": { | ||
| "code_hash": "108da32d83f1", | ||
| "code_hash": "40d1976f4718", |
Bump last_tested_version with this snapshot refresh.
Line 1176 refreshes the embedded AgentComponent hash, but Line 2185 still says last_tested_version: 1.6.0. Please update that footer to the version this starter was actually validated against so the metadata matches the embedded code.
Also applies to: 2185-2185
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/backend/base/langflow/initial_setup/starter_projects/News Aggregator.json` at line 1176, the footer field "last_tested_version" is stale (still "1.6.0") after the embedded AgentComponent was refreshed (see "code_hash": "40d1976f4718"). Update the "last_tested_version" property to the Langflow version actually used when validating this starter snapshot, so the metadata matches the embedded code (replace the existing "1.6.0" value with the correct version string).
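A small check along these lines could flag starter files whose version footer was not bumped. It is a sketch only: the helper name `stale_starters` is hypothetical, and it assumes that `last_tested_version` is a top-level key in each starter project JSON file, which the comment above implies but does not confirm.

```python
import json
from pathlib import Path


def stale_starters(directory: str, expected_version: str) -> list[str]:
    """List JSON files in directory whose last_tested_version differs.

    Assumes last_tested_version is a top-level key (an assumption, see above).
    Files that lack the key are reported as stale too.
    """
    stale = []
    for path in sorted(Path(directory).glob("*.json")):
        data = json.loads(path.read_text(encoding="utf-8"))
        if data.get("last_tested_version") != expected_version:
            stale.append(path.name)
    return stale
```

Run against the starter_projects directory with the version the snapshots were validated on, it returns the names of the files that still need the bump.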
| "title_case": false, | ||
| "type": "code", | ||
| "value": "from typing import Any\n\nfrom lfx.base.embeddings.embeddings_class import EmbeddingsWithModels\nfrom lfx.base.embeddings.model import LCEmbeddingsModel\nfrom lfx.base.models.unified_models import (\n get_api_key_for_provider,\n get_embedding_class,\n get_embedding_model_options,\n get_unified_models_detailed,\n update_model_options_in_build_config,\n)\nfrom lfx.base.models.watsonx_constants import IBM_WATSONX_URLS\nfrom lfx.field_typing import Embeddings\nfrom lfx.io import (\n BoolInput,\n DictInput,\n DropdownInput,\n FloatInput,\n IntInput,\n MessageTextInput,\n ModelInput,\n SecretStrInput,\n)\nfrom lfx.log.logger import logger\n\n\nclass EmbeddingModelComponent(LCEmbeddingsModel):\n display_name = \"Embedding Model\"\n description = \"Generate embeddings using a specified provider.\"\n documentation: str = \"https://docs.langflow.org/components-embedding-models\"\n icon = \"binary\"\n name = \"EmbeddingModel\"\n category = \"models\"\n\n def update_build_config(self, build_config: dict, field_value: str, field_name: str | None = None):\n \"\"\"Dynamically update build config with user-filtered model options.\"\"\"\n # Update model options\n build_config = update_model_options_in_build_config(\n component=self,\n build_config=build_config,\n cache_key_prefix=\"embedding_model_options\",\n get_options_func=get_embedding_model_options,\n field_name=field_name,\n field_value=field_value,\n )\n\n # Show/hide provider-specific fields based on selected model\n if field_name == \"model\" and isinstance(field_value, list) and len(field_value) > 0:\n selected_model = field_value[0]\n provider = selected_model.get(\"provider\", \"\")\n\n # Show/hide watsonx fields\n is_watsonx = provider == \"IBM WatsonX\"\n build_config[\"base_url_ibm_watsonx\"][\"show\"] = is_watsonx\n build_config[\"project_id\"][\"show\"] = is_watsonx\n build_config[\"truncate_input_tokens\"][\"show\"] = is_watsonx\n build_config[\"input_text\"][\"show\"] = is_watsonx\n if is_watsonx:\n 
build_config[\"base_url_ibm_watsonx\"][\"required\"] = True\n build_config[\"project_id\"][\"required\"] = True\n\n return build_config\n\n inputs = [\n ModelInput(\n name=\"model\",\n display_name=\"Embedding Model\",\n info=\"Select your model provider\",\n real_time_refresh=True,\n required=True,\n model_type=\"embedding\",\n input_types=[\"Embeddings\"], # Override default to accept Embeddings instead of LanguageModel\n ),\n SecretStrInput(\n name=\"api_key\",\n display_name=\"API Key\",\n info=\"Model Provider API key\",\n real_time_refresh=True,\n advanced=True,\n ),\n MessageTextInput(\n name=\"api_base\",\n display_name=\"API Base URL\",\n info=\"Base URL for the API. Leave empty for default.\",\n advanced=True,\n ),\n # Watson-specific inputs\n DropdownInput(\n name=\"base_url_ibm_watsonx\",\n display_name=\"watsonx API Endpoint\",\n info=\"The base URL of the API (IBM watsonx.ai only)\",\n options=IBM_WATSONX_URLS,\n value=IBM_WATSONX_URLS[0],\n show=False,\n real_time_refresh=True,\n ),\n MessageTextInput(\n name=\"project_id\",\n display_name=\"Project ID\",\n info=\"IBM watsonx.ai Project ID (required for IBM watsonx.ai)\",\n show=False,\n ),\n IntInput(\n name=\"dimensions\",\n display_name=\"Dimensions\",\n info=\"The number of dimensions the resulting output embeddings should have. 
\"\n \"Only supported by certain models.\",\n advanced=True,\n ),\n IntInput(\n name=\"chunk_size\",\n display_name=\"Chunk Size\",\n advanced=True,\n value=1000,\n ),\n FloatInput(\n name=\"request_timeout\",\n display_name=\"Request Timeout\",\n advanced=True,\n ),\n IntInput(\n name=\"max_retries\",\n display_name=\"Max Retries\",\n advanced=True,\n value=3,\n ),\n BoolInput(\n name=\"show_progress_bar\",\n display_name=\"Show Progress Bar\",\n advanced=True,\n ),\n DictInput(\n name=\"model_kwargs\",\n display_name=\"Model Kwargs\",\n advanced=True,\n info=\"Additional keyword arguments to pass to the model.\",\n ),\n IntInput(\n name=\"truncate_input_tokens\",\n display_name=\"Truncate Input Tokens\",\n advanced=True,\n value=200,\n show=False,\n ),\n BoolInput(\n name=\"input_text\",\n display_name=\"Include the original text in the output\",\n value=True,\n advanced=True,\n show=False,\n ),\n ]\n\n def build_embeddings(self) -> Embeddings:\n \"\"\"Build and return an embeddings instance based on the selected model.\n\n Returns an EmbeddingsWithModels wrapper that contains:\n - The primary embedding instance (for the selected model)\n - available_models dict mapping all available model names to their instances\n \"\"\"\n # If an Embeddings object is directly connected, return it\n try:\n from langchain_core.embeddings import Embeddings as BaseEmbeddings\n\n if isinstance(self.model, BaseEmbeddings):\n return self.model\n except ImportError:\n pass\n\n # Safely extract model configuration\n if not self.model or not isinstance(self.model, list):\n msg = \"Model must be a non-empty list\"\n raise ValueError(msg)\n\n model = self.model[0]\n model_name = model.get(\"name\")\n provider = model.get(\"provider\")\n metadata = model.get(\"metadata\", {})\n\n # Get API key from user input or global variables\n api_key = get_api_key_for_provider(self.user_id, provider, self.api_key)\n\n # Validate required fields (Ollama doesn't require API key)\n if not api_key and 
provider != \"Ollama\":\n msg = (\n f\"{provider} API key is required. \"\n f\"Please provide it in the component or configure it globally as \"\n f\"{provider.upper().replace(' ', '_')}_API_KEY.\"\n )\n raise ValueError(msg)\n\n if not model_name:\n msg = \"Model name is required\"\n raise ValueError(msg)\n\n # Get embedding class\n embedding_class_name = metadata.get(\"embedding_class\")\n if not embedding_class_name:\n msg = f\"No embedding class defined in metadata for {model_name}\"\n raise ValueError(msg)\n\n embedding_class = get_embedding_class(embedding_class_name)\n\n # Build kwargs using parameter mapping for primary instance\n kwargs = self._build_kwargs(model, metadata)\n primary_instance = embedding_class(**kwargs)\n\n # Get all available embedding models for this provider\n available_models_dict = self._build_available_models(\n provider=provider,\n embedding_class=embedding_class,\n metadata=metadata,\n api_key=api_key,\n )\n\n # Wrap with EmbeddingsWithModels to provide available_models metadata\n return EmbeddingsWithModels(\n embeddings=primary_instance,\n available_models=available_models_dict,\n )\n\n def _build_available_models(\n self,\n provider: str,\n embedding_class: type,\n metadata: dict[str, Any],\n api_key: str | None,\n ) -> dict[str, Embeddings]:\n \"\"\"Build a dictionary of all available embedding model instances for the provider.\n\n Args:\n provider: The provider name (e.g., \"OpenAI\", \"Ollama\")\n embedding_class: The embedding class to instantiate\n metadata: Metadata containing param_mapping\n api_key: The API key for the provider\n\n Returns:\n Dict mapping model names to their embedding instances\n \"\"\"\n available_models_dict: dict[str, Embeddings] = {}\n\n # Get all embedding models for this provider from unified models\n all_embedding_models = get_unified_models_detailed(\n providers=[provider],\n model_type=\"embeddings\",\n include_deprecated=False,\n include_unsupported=False,\n )\n\n if not 
all_embedding_models:\n return available_models_dict\n\n # Extract models from the provider data\n for provider_data in all_embedding_models:\n if provider_data.get(\"provider\") != provider:\n continue\n\n for model_data in provider_data.get(\"models\", []):\n model_name = model_data.get(\"model_name\")\n if not model_name:\n continue\n\n # Create a model dict compatible with _build_kwargs\n model_dict = {\n \"name\": model_name,\n \"provider\": provider,\n \"metadata\": metadata, # Reuse the same metadata/param_mapping\n }\n\n try:\n # Build kwargs for this model\n model_kwargs = self._build_kwargs_for_model(model_dict, metadata, api_key)\n # Create the embedding instance\n available_models_dict[model_name] = embedding_class(**model_kwargs)\n except Exception: # noqa: BLE001\n # Skip models that fail to instantiate\n # This handles cases where specific models have incompatible parameters\n logger.debug(\"Failed to instantiate embedding model %s: skipping\", model_name, exc_info=True)\n continue\n\n return available_models_dict\n\n def _build_kwargs_for_model(\n self,\n model: dict[str, Any],\n metadata: dict[str, Any],\n api_key: str | None,\n ) -> dict[str, Any]:\n \"\"\"Build kwargs dictionary for a specific model using parameter mapping.\n\n This is similar to _build_kwargs but uses the provided api_key directly\n instead of looking it up again.\n\n Args:\n model: Model dict with name and provider\n metadata: Metadata containing param_mapping\n api_key: The API key to use\n\n Returns:\n kwargs dict for embedding class instantiation\n \"\"\"\n param_mapping = metadata.get(\"param_mapping\", {})\n if not param_mapping:\n msg = \"Parameter mapping not found in metadata\"\n raise ValueError(msg)\n\n kwargs = {}\n provider = model.get(\"provider\")\n\n # Required parameters - handle both \"model\" and \"model_id\" (for watsonx)\n if \"model\" in param_mapping:\n kwargs[param_mapping[\"model\"]] = model.get(\"name\")\n elif \"model_id\" in param_mapping:\n 
kwargs[param_mapping[\"model_id\"]] = model.get(\"name\")\n\n # Add API key if mapped\n if \"api_key\" in param_mapping and api_key:\n kwargs[param_mapping[\"api_key\"]] = api_key\n\n # Optional parameters with their values\n optional_params = {\n \"api_base\": self.api_base if self.api_base else None,\n \"dimensions\": int(self.dimensions) if self.dimensions else None,\n \"chunk_size\": int(self.chunk_size) if self.chunk_size else None,\n \"request_timeout\": float(self.request_timeout) if self.request_timeout else None,\n \"max_retries\": int(self.max_retries) if self.max_retries else None,\n \"show_progress_bar\": self.show_progress_bar if hasattr(self, \"show_progress_bar\") else None,\n \"model_kwargs\": self.model_kwargs if self.model_kwargs else None,\n }\n\n # Watson-specific parameters\n if provider in {\"IBM WatsonX\", \"IBM watsonx.ai\"}:\n # Map base_url_ibm_watsonx to \"url\" parameter for watsonx\n if \"url\" in param_mapping:\n url_value = (\n self.base_url_ibm_watsonx\n if hasattr(self, \"base_url_ibm_watsonx\") and self.base_url_ibm_watsonx\n else \"https://us-south.ml.cloud.ibm.com\"\n )\n kwargs[param_mapping[\"url\"]] = url_value\n # Map project_id for watsonx\n if hasattr(self, \"project_id\") and self.project_id and \"project_id\" in param_mapping:\n kwargs[param_mapping[\"project_id\"]] = self.project_id\n\n # Ollama-specific parameters\n if provider == \"Ollama\" and \"base_url\" in param_mapping:\n # Map api_base to \"base_url\" parameter for Ollama\n base_url_value = self.api_base if hasattr(self, \"api_base\") and self.api_base else \"http://localhost:11434\"\n kwargs[param_mapping[\"base_url\"]] = base_url_value\n\n # Add optional parameters if they have values and are mapped\n for param_name, param_value in optional_params.items():\n if param_value is not None and param_name in param_mapping:\n # Special handling for request_timeout with Google provider\n if param_name == \"request_timeout\":\n if provider == \"Google Generative AI\" 
and isinstance(param_value, (int, float)):\n kwargs[param_mapping[param_name]] = {\"timeout\": param_value}\n else:\n kwargs[param_mapping[param_name]] = param_value\n else:\n kwargs[param_mapping[param_name]] = param_value\n\n return kwargs\n\n def _build_kwargs(self, model: dict[str, Any], metadata: dict[str, Any]) -> dict[str, Any]:\n \"\"\"Build kwargs dictionary using parameter mapping.\"\"\"\n param_mapping = metadata.get(\"param_mapping\", {})\n if not param_mapping:\n msg = \"Parameter mapping not found in metadata\"\n raise ValueError(msg)\n\n kwargs = {}\n\n # Required parameters - handle both \"model\" and \"model_id\" (for watsonx)\n if \"model\" in param_mapping:\n kwargs[param_mapping[\"model\"]] = model.get(\"name\")\n elif \"model_id\" in param_mapping:\n kwargs[param_mapping[\"model_id\"]] = model.get(\"name\")\n if \"api_key\" in param_mapping:\n kwargs[param_mapping[\"api_key\"]] = get_api_key_for_provider(\n self.user_id,\n model.get(\"provider\"),\n self.api_key,\n )\n\n # Optional parameters with their values\n provider = model.get(\"provider\")\n optional_params = {\n \"api_base\": self.api_base if self.api_base else None,\n \"dimensions\": int(self.dimensions) if self.dimensions else None,\n \"chunk_size\": int(self.chunk_size) if self.chunk_size else None,\n \"request_timeout\": float(self.request_timeout) if self.request_timeout else None,\n \"max_retries\": int(self.max_retries) if self.max_retries else None,\n \"show_progress_bar\": self.show_progress_bar if hasattr(self, \"show_progress_bar\") else None,\n \"model_kwargs\": self.model_kwargs if self.model_kwargs else None,\n }\n\n # Watson-specific parameters\n if provider in {\"IBM WatsonX\", \"IBM watsonx.ai\"}:\n # Map base_url_ibm_watsonx to \"url\" parameter for watsonx\n if \"url\" in param_mapping:\n url_value = (\n self.base_url_ibm_watsonx\n if hasattr(self, \"base_url_ibm_watsonx\") and self.base_url_ibm_watsonx\n else \"https://us-south.ml.cloud.ibm.com\"\n )\n 
kwargs[param_mapping[\"url\"]] = url_value\n # Map project_id for watsonx\n if hasattr(self, \"project_id\") and self.project_id and \"project_id\" in param_mapping:\n kwargs[param_mapping[\"project_id\"]] = self.project_id\n\n # Ollama-specific parameters\n if provider == \"Ollama\" and \"base_url\" in param_mapping:\n # Map api_base to \"base_url\" parameter for Ollama\n base_url_value = self.api_base if hasattr(self, \"api_base\") and self.api_base else \"http://localhost:11434\"\n kwargs[param_mapping[\"base_url\"]] = base_url_value\n\n # Add optional parameters if they have values and are mapped\n for param_name, param_value in optional_params.items():\n if param_value is not None and param_name in param_mapping:\n # Special handling for request_timeout with Google provider\n if param_name == \"request_timeout\":\n if provider == \"Google Generative AI\" and isinstance(param_value, (int, float)):\n kwargs[param_mapping[param_name]] = {\"timeout\": param_value}\n else:\n kwargs[param_mapping[param_name]] = param_value\n else:\n kwargs[param_mapping[param_name]] = param_value\n\n return kwargs\n" | ||
| "value": "from typing import Any\n\nfrom lfx.base.embeddings.embeddings_class import EmbeddingsWithModels\nfrom lfx.base.embeddings.model import LCEmbeddingsModel\nfrom lfx.base.models.unified_models import (\n apply_provider_variable_config_to_build_config,\n get_api_key_for_provider,\n get_embedding_class,\n get_embedding_model_options,\n get_provider_for_model_name,\n get_unified_models_detailed,\n update_model_options_in_build_config,\n)\nfrom lfx.base.models.watsonx_constants import IBM_WATSONX_URLS\nfrom lfx.field_typing import Embeddings\nfrom lfx.io import (\n BoolInput,\n DictInput,\n DropdownInput,\n FloatInput,\n IntInput,\n MessageTextInput,\n ModelInput,\n SecretStrInput,\n)\nfrom lfx.log.logger import logger\n\n\nclass EmbeddingModelComponent(LCEmbeddingsModel):\n display_name = \"Embedding Model\"\n description = \"Generate embeddings using a specified provider.\"\n documentation: str = \"https://docs.langflow.org/components-embedding-models\"\n icon = \"binary\"\n name = \"EmbeddingModel\"\n category = \"models\"\n\n def update_build_config(self, build_config: dict, field_value: str, field_name: str | None = None):\n \"\"\"Dynamically update build config with user-filtered model options.\"\"\"\n build_config = update_model_options_in_build_config(\n component=self,\n build_config=build_config,\n cache_key_prefix=\"embedding_model_options\",\n get_options_func=get_embedding_model_options,\n field_name=field_name,\n field_value=field_value,\n )\n\n current_model_value = field_value if field_name == \"model\" else build_config.get(\"model\", {}).get(\"value\")\n provider = \"\"\n if isinstance(current_model_value, list) and current_model_value:\n selected_model = current_model_value[0]\n provider = (selected_model.get(\"provider\") or \"\").strip()\n if not provider and selected_model.get(\"name\"):\n provider = get_provider_for_model_name(str(selected_model[\"name\"]))\n\n if provider:\n build_config = 
apply_provider_variable_config_to_build_config(build_config, provider)\n\n # Embedding-specific WatsonX toggles not covered by provider metadata\n is_watsonx = provider == \"IBM WatsonX\"\n if \"truncate_input_tokens\" in build_config:\n build_config[\"truncate_input_tokens\"][\"show\"] = is_watsonx\n if \"input_text\" in build_config:\n build_config[\"input_text\"][\"show\"] = is_watsonx\n\n return build_config\n\n inputs = [\n ModelInput(\n name=\"model\",\n display_name=\"Embedding Model\",\n info=\"Select your model provider\",\n real_time_refresh=True,\n required=True,\n model_type=\"embedding\",\n input_types=[\"Embeddings\"], # Override default to accept Embeddings instead of LanguageModel\n ),\n SecretStrInput(\n name=\"api_key\",\n display_name=\"API Key\",\n info=\"Model Provider API key\",\n real_time_refresh=True,\n advanced=True,\n ),\n MessageTextInput(\n name=\"api_base\",\n display_name=\"API Base URL\",\n info=\"Base URL for the API. Leave empty for default.\",\n advanced=True,\n ),\n # Watson-specific inputs\n DropdownInput(\n name=\"base_url_ibm_watsonx\",\n display_name=\"watsonx API Endpoint\",\n info=\"The base URL of the API (IBM watsonx.ai only)\",\n options=IBM_WATSONX_URLS,\n value=IBM_WATSONX_URLS[0],\n show=False,\n real_time_refresh=True,\n ),\n MessageTextInput(\n name=\"project_id\",\n display_name=\"Project ID\",\n info=\"IBM watsonx.ai Project ID (required for IBM watsonx.ai)\",\n show=False,\n ),\n IntInput(\n name=\"dimensions\",\n display_name=\"Dimensions\",\n info=\"The number of dimensions the resulting output embeddings should have. 
\"\n \"Only supported by certain models.\",\n advanced=True,\n ),\n IntInput(\n name=\"chunk_size\",\n display_name=\"Chunk Size\",\n advanced=True,\n value=1000,\n ),\n FloatInput(\n name=\"request_timeout\",\n display_name=\"Request Timeout\",\n advanced=True,\n ),\n IntInput(\n name=\"max_retries\",\n display_name=\"Max Retries\",\n advanced=True,\n value=3,\n ),\n BoolInput(\n name=\"show_progress_bar\",\n display_name=\"Show Progress Bar\",\n advanced=True,\n ),\n DictInput(\n name=\"model_kwargs\",\n display_name=\"Model Kwargs\",\n advanced=True,\n info=\"Additional keyword arguments to pass to the model.\",\n ),\n IntInput(\n name=\"truncate_input_tokens\",\n display_name=\"Truncate Input Tokens\",\n advanced=True,\n value=200,\n show=False,\n ),\n BoolInput(\n name=\"input_text\",\n display_name=\"Include the original text in the output\",\n value=True,\n advanced=True,\n show=False,\n ),\n ]\n\n def build_embeddings(self) -> Embeddings:\n \"\"\"Build and return an embeddings instance based on the selected model.\n\n Returns an EmbeddingsWithModels wrapper that contains:\n - The primary embedding instance (for the selected model)\n - available_models dict mapping all available model names to their instances\n \"\"\"\n # If an Embeddings object is directly connected, return it\n try:\n from langchain_core.embeddings import Embeddings as BaseEmbeddings\n\n if isinstance(self.model, BaseEmbeddings):\n return self.model\n except ImportError:\n pass\n\n # Safely extract model configuration\n if not self.model or not isinstance(self.model, list):\n msg = \"Model must be a non-empty list\"\n raise ValueError(msg)\n\n model = self.model[0]\n model_name = model.get(\"name\")\n provider = model.get(\"provider\")\n metadata = model.get(\"metadata\", {})\n\n # Get API key from user input or global variables\n api_key = get_api_key_for_provider(self.user_id, provider, self.api_key)\n\n # Validate required fields (Ollama doesn't require API key)\n if not api_key and 
provider != \"Ollama\":\n msg = (\n f\"{provider} API key is required. \"\n f\"Please provide it in the component or configure it globally as \"\n f\"{provider.upper().replace(' ', '_')}_API_KEY.\"\n )\n raise ValueError(msg)\n\n if not model_name:\n msg = \"Model name is required\"\n raise ValueError(msg)\n\n # Get embedding class\n embedding_class_name = metadata.get(\"embedding_class\")\n if not embedding_class_name:\n msg = f\"No embedding class defined in metadata for {model_name}\"\n raise ValueError(msg)\n\n embedding_class = get_embedding_class(embedding_class_name)\n\n # Build kwargs using parameter mapping for primary instance\n kwargs = self._build_kwargs(model, metadata)\n primary_instance = embedding_class(**kwargs)\n\n # Get all available embedding models for this provider\n available_models_dict = self._build_available_models(\n provider=provider,\n embedding_class=embedding_class,\n metadata=metadata,\n api_key=api_key,\n )\n\n # Wrap with EmbeddingsWithModels to provide available_models metadata\n return EmbeddingsWithModels(\n embeddings=primary_instance,\n available_models=available_models_dict,\n )\n\n def _build_available_models(\n self,\n provider: str,\n embedding_class: type,\n metadata: dict[str, Any],\n api_key: str | None,\n ) -> dict[str, Embeddings]:\n \"\"\"Build a dictionary of all available embedding model instances for the provider.\n\n Args:\n provider: The provider name (e.g., \"OpenAI\", \"Ollama\")\n embedding_class: The embedding class to instantiate\n metadata: Metadata containing param_mapping\n api_key: The API key for the provider\n\n Returns:\n Dict mapping model names to their embedding instances\n \"\"\"\n available_models_dict: dict[str, Embeddings] = {}\n\n # Get all embedding models for this provider from unified models\n all_embedding_models = get_unified_models_detailed(\n providers=[provider],\n model_type=\"embeddings\",\n include_deprecated=False,\n include_unsupported=False,\n )\n\n if not 
all_embedding_models:\n return available_models_dict\n\n # Extract models from the provider data\n for provider_data in all_embedding_models:\n if provider_data.get(\"provider\") != provider:\n continue\n\n for model_data in provider_data.get(\"models\", []):\n model_name = model_data.get(\"model_name\")\n if not model_name:\n continue\n\n # Create a model dict compatible with _build_kwargs\n model_dict = {\n \"name\": model_name,\n \"provider\": provider,\n \"metadata\": metadata, # Reuse the same metadata/param_mapping\n }\n\n try:\n # Build kwargs for this model\n model_kwargs = self._build_kwargs_for_model(model_dict, metadata, api_key)\n # Create the embedding instance\n available_models_dict[model_name] = embedding_class(**model_kwargs)\n except Exception: # noqa: BLE001\n # Skip models that fail to instantiate\n # This handles cases where specific models have incompatible parameters\n logger.debug(\"Failed to instantiate embedding model %s: skipping\", model_name, exc_info=True)\n continue\n\n return available_models_dict\n\n def _build_kwargs_for_model(\n self,\n model: dict[str, Any],\n metadata: dict[str, Any],\n api_key: str | None,\n ) -> dict[str, Any]:\n \"\"\"Build kwargs dictionary for a specific model using parameter mapping.\n\n This is similar to _build_kwargs but uses the provided api_key directly\n instead of looking it up again.\n\n Args:\n model: Model dict with name and provider\n metadata: Metadata containing param_mapping\n api_key: The API key to use\n\n Returns:\n kwargs dict for embedding class instantiation\n \"\"\"\n param_mapping = metadata.get(\"param_mapping\", {})\n if not param_mapping:\n msg = \"Parameter mapping not found in metadata\"\n raise ValueError(msg)\n\n kwargs = {}\n provider = model.get(\"provider\")\n\n # Required parameters - handle both \"model\" and \"model_id\" (for watsonx)\n if \"model\" in param_mapping:\n kwargs[param_mapping[\"model\"]] = model.get(\"name\")\n elif \"model_id\" in param_mapping:\n 
kwargs[param_mapping[\"model_id\"]] = model.get(\"name\")\n\n # Add API key if mapped\n if \"api_key\" in param_mapping and api_key:\n kwargs[param_mapping[\"api_key\"]] = api_key\n\n # Optional parameters with their values\n optional_params = {\n \"api_base\": self.api_base if self.api_base else None,\n \"dimensions\": int(self.dimensions) if self.dimensions else None,\n \"chunk_size\": int(self.chunk_size) if self.chunk_size else None,\n \"request_timeout\": float(self.request_timeout) if self.request_timeout else None,\n \"max_retries\": int(self.max_retries) if self.max_retries else None,\n \"show_progress_bar\": self.show_progress_bar if hasattr(self, \"show_progress_bar\") else None,\n \"model_kwargs\": self.model_kwargs if self.model_kwargs else None,\n }\n\n # Watson-specific parameters\n if provider in {\"IBM WatsonX\", \"IBM watsonx.ai\"}:\n # Map base_url_ibm_watsonx to \"url\" parameter for watsonx\n if \"url\" in param_mapping:\n url_value = (\n self.base_url_ibm_watsonx\n if hasattr(self, \"base_url_ibm_watsonx\") and self.base_url_ibm_watsonx\n else \"https://us-south.ml.cloud.ibm.com\"\n )\n kwargs[param_mapping[\"url\"]] = url_value\n # Map project_id for watsonx\n if hasattr(self, \"project_id\") and self.project_id and \"project_id\" in param_mapping:\n kwargs[param_mapping[\"project_id\"]] = self.project_id\n\n # Ollama-specific parameters\n if provider == \"Ollama\" and \"base_url\" in param_mapping:\n # Map api_base to \"base_url\" parameter for Ollama\n base_url_value = self.api_base if hasattr(self, \"api_base\") and self.api_base else \"http://localhost:11434\"\n kwargs[param_mapping[\"base_url\"]] = base_url_value\n\n # Add optional parameters if they have values and are mapped\n for param_name, param_value in optional_params.items():\n if param_value is not None and param_name in param_mapping:\n # Special handling for request_timeout with Google provider\n if param_name == \"request_timeout\":\n if provider == \"Google Generative AI\" 
and isinstance(param_value, (int, float)):\n kwargs[param_mapping[param_name]] = {\"timeout\": param_value}\n else:\n kwargs[param_mapping[param_name]] = param_value\n else:\n kwargs[param_mapping[param_name]] = param_value\n\n return kwargs\n\n def _build_kwargs(self, model: dict[str, Any], metadata: dict[str, Any]) -> dict[str, Any]:\n \"\"\"Build kwargs dictionary using parameter mapping.\"\"\"\n param_mapping = metadata.get(\"param_mapping\", {})\n if not param_mapping:\n msg = \"Parameter mapping not found in metadata\"\n raise ValueError(msg)\n\n kwargs = {}\n\n # Required parameters - handle both \"model\" and \"model_id\" (for watsonx)\n if \"model\" in param_mapping:\n kwargs[param_mapping[\"model\"]] = model.get(\"name\")\n elif \"model_id\" in param_mapping:\n kwargs[param_mapping[\"model_id\"]] = model.get(\"name\")\n if \"api_key\" in param_mapping:\n kwargs[param_mapping[\"api_key\"]] = get_api_key_for_provider(\n self.user_id,\n model.get(\"provider\"),\n self.api_key,\n )\n\n # Optional parameters with their values\n provider = model.get(\"provider\")\n optional_params = {\n \"api_base\": self.api_base if self.api_base else None,\n \"dimensions\": int(self.dimensions) if self.dimensions else None,\n \"chunk_size\": int(self.chunk_size) if self.chunk_size else None,\n \"request_timeout\": float(self.request_timeout) if self.request_timeout else None,\n \"max_retries\": int(self.max_retries) if self.max_retries else None,\n \"show_progress_bar\": self.show_progress_bar if hasattr(self, \"show_progress_bar\") else None,\n \"model_kwargs\": self.model_kwargs if self.model_kwargs else None,\n }\n\n # Watson-specific parameters\n if provider in {\"IBM WatsonX\", \"IBM watsonx.ai\"}:\n # Map base_url_ibm_watsonx to \"url\" parameter for watsonx\n if \"url\" in param_mapping:\n url_value = (\n self.base_url_ibm_watsonx\n if hasattr(self, \"base_url_ibm_watsonx\") and self.base_url_ibm_watsonx\n else \"https://us-south.ml.cloud.ibm.com\"\n )\n 
kwargs[param_mapping[\"url\"]] = url_value\n # Map project_id for watsonx\n if hasattr(self, \"project_id\") and self.project_id and \"project_id\" in param_mapping:\n kwargs[param_mapping[\"project_id\"]] = self.project_id\n\n # Ollama-specific parameters\n if provider == \"Ollama\" and \"base_url\" in param_mapping:\n # Map api_base to \"base_url\" parameter for Ollama\n base_url_value = self.api_base if hasattr(self, \"api_base\") and self.api_base else \"http://localhost:11434\"\n kwargs[param_mapping[\"base_url\"]] = base_url_value\n\n # Add optional parameters if they have values and are mapped\n for param_name, param_value in optional_params.items():\n if param_value is not None and param_name in param_mapping:\n # Special handling for request_timeout with Google provider\n if param_name == \"request_timeout\":\n if provider == \"Google Generative AI\" and isinstance(param_value, (int, float)):\n kwargs[param_mapping[param_name]] = {\"timeout\": param_value}\n else:\n kwargs[param_mapping[param_name]] = param_value\n else:\n kwargs[param_mapping[param_name]] = param_value\n\n return kwargs\n" |
Regenerate the serialized Embedding Model template.
This code now treats self.model as a ModelInput payload (list[dict]) and hard-fails otherwise, but Lines 1951-1975 still export the node as a DropdownInput with a plain string value ("text-embedding-3-small"). In the JSON as committed, that stale template shape will bypass provider resolution in update_build_config() and then hit ValueError("Model must be a non-empty list") in build_embeddings() unless the node is re-saved/re-exported with the new template.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/backend/base/langflow/initial_setup/starter_projects/Nvidia Remix.json`
at line 1893, The serialized template still exports the Embedding Model node as
a plain DropdownInput string which conflicts with EmbeddingModelComponent
expecting self.model to be a list; update the template so the node uses a
ModelInput (not DropdownInput) matching EmbeddingModelComponent.inputs
(model_type="embedding", input value as a list[dict] with keys "provider" and
"name" like the runtime ModelInput payload) so update_build_config() can resolve
provider and build_embeddings() will receive a non-empty list; locate the
EmbeddingModelComponent.inputs block and replace the stale
DropdownInput/Dropdown value for the model node with the proper ModelInput/list
payload in the serialized JSON template.
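To make the shape mismatch concrete, here is a minimal sketch of the two value shapes and a coercion step between them. It is an illustration only, not the fix requested above (the starter JSON still needs to be regenerated). `normalize_model_value` and the `_PROVIDER_BY_MODEL` table are hypothetical names invented for this sketch; the real component resolves providers through `get_provider_for_model_name()`. The `name`, `provider`, and `metadata` keys are the ones `build_embeddings()` reads.

```python
from typing import Any

# Hypothetical lookup used only in this sketch; the real code calls
# get_provider_for_model_name() from lfx.base.models.unified_models.
_PROVIDER_BY_MODEL = {"text-embedding-3-small": "OpenAI"}


def normalize_model_value(value: Any) -> list[dict[str, Any]]:
    """Coerce a legacy plain-string value into the list[dict] payload shape."""
    # New-style ModelInput payload: already a non-empty list, pass it through.
    if isinstance(value, list) and value:
        return value
    # Stale DropdownInput-style value: a bare model name.
    if isinstance(value, str) and value:
        return [
            {
                "name": value,
                "provider": _PROVIDER_BY_MODEL.get(value, ""),
                # Empty metadata: a real payload also carries embedding_class
                # and param_mapping, which this sketch does not reconstruct.
                "metadata": {},
            }
        ]
    # Same error message build_embeddings() raises for an unusable value.
    msg = "Model must be a non-empty list"
    raise ValueError(msg)
```

Note that a payload coerced this way would still fail later in `build_embeddings()` because its `metadata` has no `embedding_class`, which is one more reason to re-export the node rather than patch the value at runtime.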
The new watsonx embedding controls are currently no-ops.
truncate_input_tokens and input_text are exposed in inputs and toggled in update_build_config(), but neither _build_kwargs() nor _build_kwargs_for_model() ever reads them. Right now those settings are silently ignored for every embedding instance this component builds.
Please mirror the same addition in both helper methods.
Possible fix
optional_params = {
"api_base": self.api_base if self.api_base else None,
"dimensions": int(self.dimensions) if self.dimensions else None,
"chunk_size": int(self.chunk_size) if self.chunk_size else None,
"request_timeout": float(self.request_timeout) if self.request_timeout else None,
"max_retries": int(self.max_retries) if self.max_retries else None,
"show_progress_bar": self.show_progress_bar if hasattr(self, "show_progress_bar") else None,
"model_kwargs": self.model_kwargs if self.model_kwargs else None,
+ "truncate_input_tokens": (
+ int(self.truncate_input_tokens) if getattr(self, "truncate_input_tokens", None) else None
+ ),
+ "input_text": getattr(self, "input_text", None),
}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/backend/base/langflow/initial_setup/starter_projects/Nvidia Remix.json`
at line 1893, The watsonx UI toggles truncate_input_tokens and input_text are
never added to the instantiated embedding kwargs, so update both _build_kwargs
and _build_kwargs_for_model to read self.truncate_input_tokens and
self.input_text (with hasattr checks) and, when those values are present, add
them to the returned kwargs using the param names defined in
metadata["param_mapping"] (i.e., if param_mapping maps "truncate_input_tokens"
or "input_text" to a target kwarg name, set
kwargs[param_mapping["truncate_input_tokens"]] = int(self.truncate_input_tokens)
and kwargs[param_mapping["input_text"]] = bool(self.input_text)); keep this
logic alongside the existing Watson-specific block so update_build_config,
_build_kwargs, and _build_kwargs_for_model remain consistent.
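The reason these inputs are dropped can be reproduced in isolation. The sketch below mirrors the filtering loop at the end of both helpers (`param_value is not None and param_name in param_mapping`) as a standalone function. `map_optional_params` is a hypothetical name, and the target kwarg names in `param_mapping` are assumptions for illustration, not the values the watsonx metadata actually defines.

```python
from typing import Any


def map_optional_params(
    optional_params: dict[str, Any],
    param_mapping: dict[str, str],
) -> dict[str, Any]:
    """Keep only params that have a value and a target name in param_mapping."""
    return {
        param_mapping[name]: value
        for name, value in optional_params.items()
        if value is not None and name in param_mapping
    }


# Hypothetical mapping; the right-hand names are assumed, not taken from lfx.
param_mapping = {
    "chunk_size": "chunk_size",
    "truncate_input_tokens": "truncate_input_tokens",
}

# Current behavior: the input never enters optional_params, so it never
# reaches the embedding class, whatever the user sets in the UI.
before = map_optional_params({"chunk_size": 1000}, param_mapping)

# With the suggested addition, the value flows through.
after = map_optional_params(
    {"chunk_size": 1000, "truncate_input_tokens": 200},
    param_mapping,
)
```

The same filter also means that adding the two keys to `optional_params` has no effect unless the provider's `param_mapping` contains them, so the metadata may need a matching entry.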
| "legacy": false, | ||
| "metadata": { | ||
| "code_hash": "108da32d83f1", | ||
| "code_hash": "40d1976f4718", |
Bump last_tested_version with this snapshot refresh.
Line 1241 updates the embedded AgentComponent hash, but Line 1796 still advertises last_tested_version: 1.4.3. Please update the footer to the version this starter was actually validated against so the starter metadata does not drift from the embedded code.
Also applies to: 1796-1796
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/backend/base/langflow/initial_setup/starter_projects/Pokédex Agent.json`
at line 1241, The JSON footer's last_tested_version value is stale after
refreshing the embedded AgentComponent snapshot (code_hash "40d1976f4718");
update the "last_tested_version" field in this starter's metadata to the correct
version string used when validating this snapshot (i.e., bump the footer value
to match the snapshot/validation version) so the starter metadata and embedded
code_hash stay consistent, and verify there are no other duplicate
last_tested_version entries remaining.
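A check like this could be scripted so the drift is caught at snapshot time. The sketch below assumes `last_tested_version` is a top-level key of the starter JSON (the comment above only calls it the "footer"), and `is_version_stale` is a hypothetical helper. The expected version string is a placeholder to be replaced with whatever version the starter was actually validated against.

```python
import json
from typing import Any


def is_version_stale(starter: dict[str, Any], expected: str) -> bool:
    """Return True when the starter's last_tested_version differs from expected."""
    # Assumption: the field sits at the top level of the starter project JSON.
    return starter.get("last_tested_version") != expected


# Inline stand-in for a starter project file; a real check would json.load()
# each file under the starter_projects directory instead.
starter = json.loads('{"name": "demo", "last_tested_version": "1.4.3"}')

EXPECTED_VERSION = "0.0.0"  # placeholder, not the real validation version
stale = is_version_stale(starter, EXPECTED_VERSION)
```

A missing field is reported as stale too, which seems the safer default for a consistency check.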
| "title_case": false, | ||
| "type": "code", | ||
| "value": "from __future__ import annotations\n\nimport json\nimport re\nfrom typing import TYPE_CHECKING\n\nfrom pydantic import ValidationError\n\nfrom lfx.components.models_and_agents.memory import MemoryComponent\n\nif TYPE_CHECKING:\n from langchain_core.tools import Tool\n\nfrom lfx.base.agents.agent import LCToolsAgentComponent\nfrom lfx.base.agents.events import ExceptionWithMessageError\nfrom lfx.base.models.unified_models import (\n apply_provider_variable_config_to_build_config,\n get_language_model_options,\n get_llm,\n update_model_options_in_build_config,\n)\nfrom lfx.base.models.watsonx_constants import IBM_WATSONX_URLS\nfrom lfx.components.helpers import CurrentDateComponent\nfrom lfx.components.langchain_utilities.tool_calling import ToolCallingAgentComponent\nfrom lfx.custom.custom_component.component import get_component_toolkit\nfrom lfx.field_typing.range_spec import RangeSpec\nfrom lfx.helpers.base_model import build_model_from_schema\nfrom lfx.inputs.inputs import BoolInput, DropdownInput, ModelInput, StrInput\nfrom lfx.io import IntInput, MessageTextInput, MultilineInput, Output, SecretStrInput, TableInput\nfrom lfx.log.logger import logger\nfrom lfx.schema.data import Data\nfrom lfx.schema.dotdict import dotdict\nfrom lfx.schema.message import Message\nfrom lfx.schema.table import EditMode\n\n\ndef set_advanced_true(component_input):\n component_input.advanced = True\n return component_input\n\n\nclass AgentComponent(ToolCallingAgentComponent):\n display_name: str = \"Agent\"\n description: str = \"Define the agent's instructions, then enter a task to complete using tools.\"\n documentation: str = \"https://docs.langflow.org/agents\"\n icon = \"bot\"\n beta = False\n name = \"Agent\"\n\n memory_inputs = [set_advanced_true(component_input) for component_input in MemoryComponent().inputs]\n\n inputs = [\n ModelInput(\n name=\"model\",\n display_name=\"Language Model\",\n info=\"Select your model provider\",\n real_time_refresh=True,\n 
required=True,\n ),\n SecretStrInput(\n name=\"api_key\",\n display_name=\"API Key\",\n info=\"Model Provider API key\",\n real_time_refresh=True,\n advanced=True,\n ),\n DropdownInput(\n name=\"base_url_ibm_watsonx\",\n display_name=\"watsonx API Endpoint\",\n info=\"The base URL of the API (IBM watsonx.ai only)\",\n options=IBM_WATSONX_URLS,\n value=IBM_WATSONX_URLS[0],\n show=False,\n real_time_refresh=True,\n ),\n StrInput(\n name=\"project_id\",\n display_name=\"watsonx Project ID\",\n info=\"The project ID associated with the foundation model (IBM watsonx.ai only)\",\n show=False,\n required=False,\n ),\n MultilineInput(\n name=\"system_prompt\",\n display_name=\"Agent Instructions\",\n info=\"System Prompt: Initial instructions and context provided to guide the agent's behavior.\",\n value=\"You are a helpful assistant that can use tools to answer questions and perform tasks.\",\n advanced=False,\n ),\n MessageTextInput(\n name=\"context_id\",\n display_name=\"Context ID\",\n info=\"The context ID of the chat. Adds an extra layer to the local memory.\",\n value=\"\",\n advanced=True,\n ),\n IntInput(\n name=\"n_messages\",\n display_name=\"Number of Chat History Messages\",\n value=100,\n info=\"Number of chat history messages to retrieve.\",\n advanced=True,\n show=True,\n ),\n IntInput(\n name=\"max_tokens\",\n display_name=\"Max Tokens\",\n info=\"Maximum number of tokens to generate. Field name varies by provider.\",\n advanced=True,\n range_spec=RangeSpec(min=1, max=128000, step=1, step_type=\"int\"),\n ),\n MultilineInput(\n name=\"format_instructions\",\n display_name=\"Output Format Instructions\",\n info=\"Generic Template for structured output formatting. Valid only with Structured response.\",\n value=(\n \"You are an AI that extracts structured JSON objects from unstructured text. \"\n \"Use a predefined schema with expected types (str, int, float, bool, dict). 
\"\n \"Extract ALL relevant instances that match the schema - if multiple patterns exist, capture them all. \"\n \"Fill missing or ambiguous values with defaults: null for missing values. \"\n \"Remove exact duplicates but keep variations that have different field values. \"\n \"Always return valid JSON in the expected format, never throw errors. \"\n \"If multiple objects can be extracted, return them all in the structured format.\"\n ),\n advanced=True,\n ),\n TableInput(\n name=\"output_schema\",\n display_name=\"Output Schema\",\n info=(\n \"Schema Validation: Define the structure and data types for structured output. \"\n \"No validation if no output schema.\"\n ),\n advanced=True,\n required=False,\n value=[],\n table_schema=[\n {\n \"name\": \"name\",\n \"display_name\": \"Name\",\n \"type\": \"str\",\n \"description\": \"Specify the name of the output field.\",\n \"default\": \"field\",\n \"edit_mode\": EditMode.INLINE,\n },\n {\n \"name\": \"description\",\n \"display_name\": \"Description\",\n \"type\": \"str\",\n \"description\": \"Describe the purpose of the output field.\",\n \"default\": \"description of field\",\n \"edit_mode\": EditMode.POPOVER,\n },\n {\n \"name\": \"type\",\n \"display_name\": \"Type\",\n \"type\": \"str\",\n \"edit_mode\": EditMode.INLINE,\n \"description\": (\"Indicate the data type of the output field (e.g., str, int, float, bool, dict).\"),\n \"options\": [\"str\", \"int\", \"float\", \"bool\", \"dict\"],\n \"default\": \"str\",\n },\n {\n \"name\": \"multiple\",\n \"display_name\": \"As List\",\n \"type\": \"boolean\",\n \"description\": \"Set to True if this output field should be a list of the specified type.\",\n \"default\": \"False\",\n \"edit_mode\": EditMode.INLINE,\n },\n ],\n ),\n *LCToolsAgentComponent.get_base_inputs(),\n # removed memory inputs from agent component\n # *memory_inputs,\n BoolInput(\n name=\"add_current_date_tool\",\n display_name=\"Current Date\",\n advanced=True,\n info=\"If true, will add a tool 
to the agent that returns the current date.\",\n value=True,\n ),\n ]\n outputs = [\n Output(name=\"response\", display_name=\"Response\", method=\"message_response\"),\n ]\n\n async def get_agent_requirements(self):\n \"\"\"Get the agent requirements for the agent.\"\"\"\n from langchain_core.tools import StructuredTool\n\n max_tokens_val = getattr(self, \"max_tokens\", None)\n if max_tokens_val in {\"\", 0}:\n max_tokens_val = None\n llm_model = get_llm(\n model=self.model,\n user_id=self.user_id,\n api_key=self.api_key,\n max_tokens=max_tokens_val,\n watsonx_url=getattr(self, \"base_url_ibm_watsonx\", None),\n watsonx_project_id=getattr(self, \"project_id\", None),\n )\n if llm_model is None:\n msg = \"No language model selected. Please choose a model to proceed.\"\n raise ValueError(msg)\n\n # Get memory data\n self.chat_history = await self.get_memory_data()\n await logger.adebug(f\"Retrieved {len(self.chat_history)} chat history messages\")\n if isinstance(self.chat_history, Message):\n self.chat_history = [self.chat_history]\n\n # Add current date tool if enabled\n if self.add_current_date_tool:\n if not isinstance(self.tools, list): # type: ignore[has-type]\n self.tools = []\n current_date_tool = (await CurrentDateComponent(**self.get_base_args()).to_toolkit()).pop(0)\n\n if not isinstance(current_date_tool, StructuredTool):\n msg = \"CurrentDateComponent must be converted to a StructuredTool\"\n raise TypeError(msg)\n self.tools.append(current_date_tool)\n\n # Set shared callbacks for tracing the tools used by the agent\n self.set_tools_callbacks(self.tools, self._get_shared_callbacks())\n\n return llm_model, self.chat_history, self.tools\n\n async def message_response(self) -> Message:\n try:\n llm_model, self.chat_history, self.tools = await self.get_agent_requirements()\n # Set up and run agent\n self.set(\n llm=llm_model,\n tools=self.tools or [],\n chat_history=self.chat_history,\n input_value=self.input_value,\n system_prompt=self.system_prompt,\n 
)\n agent = self.create_agent_runnable()\n result = await self.run_agent(agent)\n\n # Store result for potential JSON output\n self._agent_result = result\n\n except (ValueError, TypeError, KeyError) as e:\n await logger.aerror(f\"{type(e).__name__}: {e!s}\")\n raise\n except ExceptionWithMessageError as e:\n await logger.aerror(f\"ExceptionWithMessageError occurred: {e}\")\n raise\n # Avoid catching blind Exception; let truly unexpected exceptions propagate\n except Exception as e:\n await logger.aerror(f\"Unexpected error: {e!s}\")\n raise\n else:\n return result\n\n def _preprocess_schema(self, schema):\n \"\"\"Preprocess schema to ensure correct data types for build_model_from_schema.\"\"\"\n processed_schema = []\n for field in schema:\n processed_field = {\n \"name\": str(field.get(\"name\", \"field\")),\n \"type\": str(field.get(\"type\", \"str\")),\n \"description\": str(field.get(\"description\", \"\")),\n \"multiple\": field.get(\"multiple\", False),\n }\n # Ensure multiple is handled correctly\n if isinstance(processed_field[\"multiple\"], str):\n processed_field[\"multiple\"] = processed_field[\"multiple\"].lower() in [\n \"true\",\n \"1\",\n \"t\",\n \"y\",\n \"yes\",\n ]\n processed_schema.append(processed_field)\n return processed_schema\n\n async def build_structured_output_base(self, content: str):\n \"\"\"Build structured output with optional BaseModel validation.\"\"\"\n json_pattern = r\"\\{.*\\}\"\n schema_error_msg = \"Try setting an output schema\"\n\n # Try to parse content as JSON first\n json_data = None\n try:\n json_data = json.loads(content)\n except json.JSONDecodeError:\n json_match = re.search(json_pattern, content, re.DOTALL)\n if json_match:\n try:\n json_data = json.loads(json_match.group())\n except json.JSONDecodeError:\n return {\"content\": content, \"error\": schema_error_msg}\n else:\n return {\"content\": content, \"error\": schema_error_msg}\n\n # If no output schema provided, return parsed JSON without validation\n if not 
hasattr(self, \"output_schema\") or not self.output_schema or len(self.output_schema) == 0:\n return json_data\n\n # Use BaseModel validation with schema\n try:\n processed_schema = self._preprocess_schema(self.output_schema)\n output_model = build_model_from_schema(processed_schema)\n\n # Validate against the schema\n if isinstance(json_data, list):\n # Multiple objects\n validated_objects = []\n for item in json_data:\n try:\n validated_obj = output_model.model_validate(item)\n validated_objects.append(validated_obj.model_dump())\n except ValidationError as e:\n await logger.aerror(f\"Validation error for item: {e}\")\n # Include invalid items with error info\n validated_objects.append({\"data\": item, \"validation_error\": str(e)})\n return validated_objects\n\n # Single object\n try:\n validated_obj = output_model.model_validate(json_data)\n return [validated_obj.model_dump()] # Return as list for consistency\n except ValidationError as e:\n await logger.aerror(f\"Validation error: {e}\")\n return [{\"data\": json_data, \"validation_error\": str(e)}]\n\n except (TypeError, ValueError) as e:\n await logger.aerror(f\"Error building structured output: {e}\")\n # Fallback to parsed JSON without validation\n return json_data\n\n async def json_response(self) -> Data:\n \"\"\"Convert agent response to structured JSON Data output with schema validation.\"\"\"\n # Always use structured chat agent for JSON response mode for better JSON formatting\n try:\n system_components = []\n\n # 1. Agent Instructions (system_prompt)\n agent_instructions = getattr(self, \"system_prompt\", \"\") or \"\"\n if agent_instructions:\n system_components.append(f\"{agent_instructions}\")\n\n # 2. Format Instructions\n format_instructions = getattr(self, \"format_instructions\", \"\") or \"\"\n if format_instructions:\n system_components.append(f\"Format instructions: {format_instructions}\")\n\n # 3. 
Schema Information from BaseModel\n if hasattr(self, \"output_schema\") and self.output_schema and len(self.output_schema) > 0:\n try:\n processed_schema = self._preprocess_schema(self.output_schema)\n output_model = build_model_from_schema(processed_schema)\n schema_dict = output_model.model_json_schema()\n schema_info = (\n \"You are given some text that may include format instructions, \"\n \"explanations, or other content alongside a JSON schema.\\n\\n\"\n \"Your task:\\n\"\n \"- Extract only the JSON schema.\\n\"\n \"- Return it as valid JSON.\\n\"\n \"- Do not include format instructions, explanations, or extra text.\\n\\n\"\n \"Input:\\n\"\n f\"{json.dumps(schema_dict, indent=2)}\\n\\n\"\n \"Output (only JSON schema):\"\n )\n system_components.append(schema_info)\n except (ValidationError, ValueError, TypeError, KeyError) as e:\n await logger.aerror(f\"Could not build schema for prompt: {e}\", exc_info=True)\n\n # Combine all components\n combined_instructions = \"\\n\\n\".join(system_components) if system_components else \"\"\n llm_model, self.chat_history, self.tools = await self.get_agent_requirements()\n self.set(\n llm=llm_model,\n tools=self.tools or [],\n chat_history=self.chat_history,\n input_value=self.input_value,\n system_prompt=combined_instructions,\n )\n\n # Create and run structured chat agent\n try:\n structured_agent = self.create_agent_runnable()\n except (NotImplementedError, ValueError, TypeError) as e:\n await logger.aerror(f\"Error with structured chat agent: {e}\")\n raise\n try:\n result = await self.run_agent(structured_agent)\n except (\n ExceptionWithMessageError,\n ValueError,\n TypeError,\n RuntimeError,\n ) as e:\n await logger.aerror(f\"Error with structured agent result: {e}\")\n raise\n # Extract content from structured agent result\n if hasattr(result, \"content\"):\n content = result.content\n elif hasattr(result, \"text\"):\n content = result.text\n else:\n content = str(result)\n\n except (\n ExceptionWithMessageError,\n 
ValueError,\n TypeError,\n NotImplementedError,\n AttributeError,\n ) as e:\n await logger.aerror(f\"Error with structured chat agent: {e}\")\n # Fallback to regular agent\n content_str = \"No content returned from agent\"\n return Data(data={\"content\": content_str, \"error\": str(e)})\n\n # Process with structured output validation\n try:\n structured_output = await self.build_structured_output_base(content)\n\n # Handle different output formats\n if isinstance(structured_output, list) and structured_output:\n if len(structured_output) == 1:\n return Data(data=structured_output[0])\n return Data(data={\"results\": structured_output})\n if isinstance(structured_output, dict):\n return Data(data=structured_output)\n return Data(data={\"content\": content})\n\n except (ValueError, TypeError) as e:\n await logger.aerror(f\"Error in structured output processing: {e}\")\n return Data(data={\"content\": content, \"error\": str(e)})\n\n async def get_memory_data(self):\n # TODO: This is a temporary fix to avoid message duplication. 
We should develop a function for this.\n messages = (\n await MemoryComponent(**self.get_base_args())\n .set(\n session_id=self.graph.session_id,\n context_id=self.context_id,\n order=\"Ascending\",\n n_messages=self.n_messages,\n )\n .retrieve_messages()\n )\n return [\n message for message in messages if getattr(message, \"id\", None) != getattr(self.input_value, \"id\", None)\n ]\n\n def update_input_types(self, build_config: dotdict) -> dotdict:\n \"\"\"Update input types for all fields in build_config.\"\"\"\n for key, value in build_config.items():\n if isinstance(value, dict):\n if value.get(\"input_types\") is None:\n build_config[key][\"input_types\"] = []\n elif hasattr(value, \"input_types\") and value.input_types is None:\n value.input_types = []\n return build_config\n\n async def update_build_config(\n self,\n build_config: dotdict,\n field_value: list[dict],\n field_name: str | None = None,\n ) -> dotdict:\n # Update model options with caching (for all field changes)\n # Agents require tool calling, so filter for only tool-calling capable models\n def get_tool_calling_model_options(user_id=None):\n return get_language_model_options(user_id=user_id, tool_calling=True)\n\n build_config = update_model_options_in_build_config(\n component=self,\n build_config=dict(build_config),\n cache_key_prefix=\"language_model_options_tool_calling\",\n get_options_func=get_tool_calling_model_options,\n field_name=field_name,\n field_value=field_value,\n )\n build_config = dotdict(build_config)\n\n # Iterate over all providers in the MODEL_PROVIDERS_DICT\n if field_name == \"model\":\n # Update input types for all fields\n build_config = self.update_input_types(build_config)\n\n # Show/hide provider-specific fields based on selected model\n # Get current model value - from field_value if model is being changed, otherwise from build_config\n current_model_value = field_value if field_name == \"model\" else build_config.get(\"model\", {}).get(\"value\")\n if 
isinstance(current_model_value, list) and len(current_model_value) > 0:\n selected_model = current_model_value[0]\n provider = selected_model.get(\"provider\", \"\")\n\n # Hide provider-specific fields by default before applying provider config\n for field in [\"base_url_ibm_watsonx\", \"project_id\"]:\n if field in build_config:\n build_config[field][\"show\"] = False\n build_config[field][\"required\"] = False\n\n # Apply provider variable configuration (advanced, required, info, env var fallback)\n if provider:\n build_config = apply_provider_variable_config_to_build_config(build_config, provider)\n\n # Validate required keys\n default_keys = [\n \"code\",\n \"_type\",\n \"model\",\n \"tools\",\n \"input_value\",\n \"add_current_date_tool\",\n \"system_prompt\",\n \"agent_description\",\n \"max_iterations\",\n \"handle_parsing_errors\",\n \"verbose\",\n ]\n missing_keys = [key for key in default_keys if key not in build_config]\n if missing_keys:\n msg = f\"Missing required keys in build_config: {missing_keys}\"\n raise ValueError(msg)\n return dotdict({k: v.to_dict() if hasattr(v, \"to_dict\") else v for k, v in build_config.items()})\n\n async def _get_tools(self) -> list[Tool]:\n component_toolkit = get_component_toolkit()\n tools_names = self._build_tools_names()\n agent_description = self.get_tool_description()\n # TODO: Agent Description Depreciated Feature to be removed\n description = f\"{agent_description}{tools_names}\"\n\n tools = component_toolkit(component=self).get_tools(\n tool_name=\"Call_Agent\",\n tool_description=description,\n # here we do not use the shared callbacks as we are exposing the agent as a tool\n callbacks=self.get_langchain_callbacks(),\n )\n if hasattr(self, \"tools_metadata\"):\n tools = component_toolkit(component=self, metadata=self.tools_metadata).update_tools_metadata(tools=tools)\n\n return tools\n" | ||
| "value": "from __future__ import annotations\n\nimport json\nimport re\nfrom typing import TYPE_CHECKING\n\nfrom pydantic import ValidationError\n\nfrom lfx.components.models_and_agents.memory import MemoryComponent\n\nif TYPE_CHECKING:\n from langchain_core.tools import Tool\n\nfrom lfx.base.agents.agent import LCToolsAgentComponent\nfrom lfx.base.agents.events import ExceptionWithMessageError\nfrom lfx.base.models.unified_models import (\n apply_provider_variable_config_to_build_config,\n get_language_model_options,\n get_llm,\n get_provider_for_model_name,\n update_model_options_in_build_config,\n)\nfrom lfx.base.models.watsonx_constants import IBM_WATSONX_URLS\nfrom lfx.components.helpers import CurrentDateComponent\nfrom lfx.components.langchain_utilities.tool_calling import ToolCallingAgentComponent\nfrom lfx.custom.custom_component.component import get_component_toolkit\nfrom lfx.field_typing.range_spec import RangeSpec\nfrom lfx.helpers.base_model import build_model_from_schema\nfrom lfx.inputs.inputs import BoolInput, DropdownInput, ModelInput, StrInput\nfrom lfx.io import IntInput, MessageTextInput, MultilineInput, Output, SecretStrInput, TableInput\nfrom lfx.log.logger import logger\nfrom lfx.schema.data import Data\nfrom lfx.schema.dotdict import dotdict\nfrom lfx.schema.message import Message\nfrom lfx.schema.table import EditMode\n\n\ndef set_advanced_true(component_input):\n component_input.advanced = True\n return component_input\n\n\nclass AgentComponent(ToolCallingAgentComponent):\n display_name: str = \"Agent\"\n description: str = \"Define the agent's instructions, then enter a task to complete using tools.\"\n documentation: str = \"https://docs.langflow.org/agents\"\n icon = \"bot\"\n beta = False\n name = \"Agent\"\n\n memory_inputs = [set_advanced_true(component_input) for component_input in MemoryComponent().inputs]\n\n inputs = [\n ModelInput(\n name=\"model\",\n display_name=\"Language Model\",\n info=\"Select your model provider\",\n 
real_time_refresh=True,\n required=True,\n ),\n SecretStrInput(\n name=\"api_key\",\n display_name=\"API Key\",\n info=\"Model Provider API key\",\n real_time_refresh=True,\n advanced=True,\n ),\n DropdownInput(\n name=\"base_url_ibm_watsonx\",\n display_name=\"watsonx API Endpoint\",\n info=\"The base URL of the API (IBM watsonx.ai only)\",\n options=IBM_WATSONX_URLS,\n value=IBM_WATSONX_URLS[0],\n show=False,\n real_time_refresh=True,\n ),\n StrInput(\n name=\"project_id\",\n display_name=\"watsonx Project ID\",\n info=\"The project ID associated with the foundation model (IBM watsonx.ai only)\",\n show=False,\n required=False,\n ),\n MultilineInput(\n name=\"system_prompt\",\n display_name=\"Agent Instructions\",\n info=\"System Prompt: Initial instructions and context provided to guide the agent's behavior.\",\n value=\"You are a helpful assistant that can use tools to answer questions and perform tasks.\",\n advanced=False,\n ),\n MessageTextInput(\n name=\"context_id\",\n display_name=\"Context ID\",\n info=\"The context ID of the chat. Adds an extra layer to the local memory.\",\n value=\"\",\n advanced=True,\n ),\n IntInput(\n name=\"n_messages\",\n display_name=\"Number of Chat History Messages\",\n value=100,\n info=\"Number of chat history messages to retrieve.\",\n advanced=True,\n show=True,\n ),\n IntInput(\n name=\"max_tokens\",\n display_name=\"Max Tokens\",\n info=\"Maximum number of tokens to generate. Field name varies by provider.\",\n advanced=True,\n range_spec=RangeSpec(min=1, max=128000, step=1, step_type=\"int\"),\n ),\n MultilineInput(\n name=\"format_instructions\",\n display_name=\"Output Format Instructions\",\n info=\"Generic Template for structured output formatting. Valid only with Structured response.\",\n value=(\n \"You are an AI that extracts structured JSON objects from unstructured text. \"\n \"Use a predefined schema with expected types (str, int, float, bool, dict). 
\"\n \"Extract ALL relevant instances that match the schema - if multiple patterns exist, capture them all. \"\n \"Fill missing or ambiguous values with defaults: null for missing values. \"\n \"Remove exact duplicates but keep variations that have different field values. \"\n \"Always return valid JSON in the expected format, never throw errors. \"\n \"If multiple objects can be extracted, return them all in the structured format.\"\n ),\n advanced=True,\n ),\n TableInput(\n name=\"output_schema\",\n display_name=\"Output Schema\",\n info=(\n \"Schema Validation: Define the structure and data types for structured output. \"\n \"No validation if no output schema.\"\n ),\n advanced=True,\n required=False,\n value=[],\n table_schema=[\n {\n \"name\": \"name\",\n \"display_name\": \"Name\",\n \"type\": \"str\",\n \"description\": \"Specify the name of the output field.\",\n \"default\": \"field\",\n \"edit_mode\": EditMode.INLINE,\n },\n {\n \"name\": \"description\",\n \"display_name\": \"Description\",\n \"type\": \"str\",\n \"description\": \"Describe the purpose of the output field.\",\n \"default\": \"description of field\",\n \"edit_mode\": EditMode.POPOVER,\n },\n {\n \"name\": \"type\",\n \"display_name\": \"Type\",\n \"type\": \"str\",\n \"edit_mode\": EditMode.INLINE,\n \"description\": (\"Indicate the data type of the output field (e.g., str, int, float, bool, dict).\"),\n \"options\": [\"str\", \"int\", \"float\", \"bool\", \"dict\"],\n \"default\": \"str\",\n },\n {\n \"name\": \"multiple\",\n \"display_name\": \"As List\",\n \"type\": \"boolean\",\n \"description\": \"Set to True if this output field should be a list of the specified type.\",\n \"default\": \"False\",\n \"edit_mode\": EditMode.INLINE,\n },\n ],\n ),\n *LCToolsAgentComponent.get_base_inputs(),\n # removed memory inputs from agent component\n # *memory_inputs,\n BoolInput(\n name=\"add_current_date_tool\",\n display_name=\"Current Date\",\n advanced=True,\n info=\"If true, will add a tool 
to the agent that returns the current date.\",\n value=True,\n ),\n ]\n outputs = [\n Output(name=\"response\", display_name=\"Response\", method=\"message_response\"),\n ]\n\n def _get_max_tokens_value(self):\n \"\"\"Return the user-supplied max_tokens or None when unset/zero.\"\"\"\n val = getattr(self, \"max_tokens\", None)\n if val in {\"\", 0}:\n return None\n return val\n\n def _get_llm(self):\n \"\"\"Override parent to include max_tokens from the Agent's input field.\"\"\"\n return get_llm(\n model=self.model,\n user_id=self.user_id,\n api_key=getattr(self, \"api_key\", None),\n max_tokens=self._get_max_tokens_value(),\n watsonx_url=getattr(self, \"base_url_ibm_watsonx\", None),\n watsonx_project_id=getattr(self, \"project_id\", None),\n )\n\n async def get_agent_requirements(self):\n \"\"\"Get the agent requirements for the agent.\"\"\"\n from langchain_core.tools import StructuredTool\n\n llm_model = self._get_llm()\n if llm_model is None:\n msg = \"No language model selected. Please choose a model to proceed.\"\n raise ValueError(msg)\n\n # Get memory data\n self.chat_history = await self.get_memory_data()\n await logger.adebug(f\"Retrieved {len(self.chat_history)} chat history messages\")\n if isinstance(self.chat_history, Message):\n self.chat_history = [self.chat_history]\n\n # Add current date tool if enabled\n if self.add_current_date_tool:\n if not isinstance(self.tools, list): # type: ignore[has-type]\n self.tools = []\n current_date_tool = (await CurrentDateComponent(**self.get_base_args()).to_toolkit()).pop(0)\n\n if not isinstance(current_date_tool, StructuredTool):\n msg = \"CurrentDateComponent must be converted to a StructuredTool\"\n raise TypeError(msg)\n self.tools.append(current_date_tool)\n\n # Set shared callbacks for tracing the tools used by the agent\n self.set_tools_callbacks(self.tools, self._get_shared_callbacks())\n\n return llm_model, self.chat_history, self.tools\n\n async def message_response(self) -> Message:\n try:\n 
llm_model, self.chat_history, self.tools = await self.get_agent_requirements()\n # Set up and run agent\n self.set(\n llm=llm_model,\n tools=self.tools or [],\n chat_history=self.chat_history,\n input_value=self.input_value,\n system_prompt=self.system_prompt,\n )\n agent = self.create_agent_runnable()\n result = await self.run_agent(agent)\n\n # Store result for potential JSON output\n self._agent_result = result\n\n except (ValueError, TypeError, KeyError) as e:\n await logger.aerror(f\"{type(e).__name__}: {e!s}\")\n raise\n except ExceptionWithMessageError as e:\n await logger.aerror(f\"ExceptionWithMessageError occurred: {e}\")\n raise\n # Avoid catching blind Exception; let truly unexpected exceptions propagate\n except Exception as e:\n await logger.aerror(f\"Unexpected error: {e!s}\")\n raise\n else:\n return result\n\n def _preprocess_schema(self, schema):\n \"\"\"Preprocess schema to ensure correct data types for build_model_from_schema.\"\"\"\n processed_schema = []\n for field in schema:\n processed_field = {\n \"name\": str(field.get(\"name\", \"field\")),\n \"type\": str(field.get(\"type\", \"str\")),\n \"description\": str(field.get(\"description\", \"\")),\n \"multiple\": field.get(\"multiple\", False),\n }\n # Ensure multiple is handled correctly\n if isinstance(processed_field[\"multiple\"], str):\n processed_field[\"multiple\"] = processed_field[\"multiple\"].lower() in [\n \"true\",\n \"1\",\n \"t\",\n \"y\",\n \"yes\",\n ]\n processed_schema.append(processed_field)\n return processed_schema\n\n async def build_structured_output_base(self, content: str):\n \"\"\"Build structured output with optional BaseModel validation.\"\"\"\n json_pattern = r\"\\{.*\\}\"\n schema_error_msg = \"Try setting an output schema\"\n\n # Try to parse content as JSON first\n json_data = None\n try:\n json_data = json.loads(content)\n except json.JSONDecodeError:\n json_match = re.search(json_pattern, content, re.DOTALL)\n if json_match:\n try:\n json_data = 
json.loads(json_match.group())\n except json.JSONDecodeError:\n return {\"content\": content, \"error\": schema_error_msg}\n else:\n return {\"content\": content, \"error\": schema_error_msg}\n\n # If no output schema provided, return parsed JSON without validation\n if not hasattr(self, \"output_schema\") or not self.output_schema or len(self.output_schema) == 0:\n return json_data\n\n # Use BaseModel validation with schema\n try:\n processed_schema = self._preprocess_schema(self.output_schema)\n output_model = build_model_from_schema(processed_schema)\n\n # Validate against the schema\n if isinstance(json_data, list):\n # Multiple objects\n validated_objects = []\n for item in json_data:\n try:\n validated_obj = output_model.model_validate(item)\n validated_objects.append(validated_obj.model_dump())\n except ValidationError as e:\n await logger.aerror(f\"Validation error for item: {e}\")\n # Include invalid items with error info\n validated_objects.append({\"data\": item, \"validation_error\": str(e)})\n return validated_objects\n\n # Single object\n try:\n validated_obj = output_model.model_validate(json_data)\n return [validated_obj.model_dump()] # Return as list for consistency\n except ValidationError as e:\n await logger.aerror(f\"Validation error: {e}\")\n return [{\"data\": json_data, \"validation_error\": str(e)}]\n\n except (TypeError, ValueError) as e:\n await logger.aerror(f\"Error building structured output: {e}\")\n # Fallback to parsed JSON without validation\n return json_data\n\n async def json_response(self) -> Data:\n \"\"\"Convert agent response to structured JSON Data output with schema validation.\"\"\"\n # Always use structured chat agent for JSON response mode for better JSON formatting\n try:\n system_components = []\n\n # 1. Agent Instructions (system_prompt)\n agent_instructions = getattr(self, \"system_prompt\", \"\") or \"\"\n if agent_instructions:\n system_components.append(f\"{agent_instructions}\")\n\n # 2. 
Format Instructions\n format_instructions = getattr(self, \"format_instructions\", \"\") or \"\"\n if format_instructions:\n system_components.append(f\"Format instructions: {format_instructions}\")\n\n # 3. Schema Information from BaseModel\n if hasattr(self, \"output_schema\") and self.output_schema and len(self.output_schema) > 0:\n try:\n processed_schema = self._preprocess_schema(self.output_schema)\n output_model = build_model_from_schema(processed_schema)\n schema_dict = output_model.model_json_schema()\n schema_info = (\n \"You are given some text that may include format instructions, \"\n \"explanations, or other content alongside a JSON schema.\\n\\n\"\n \"Your task:\\n\"\n \"- Extract only the JSON schema.\\n\"\n \"- Return it as valid JSON.\\n\"\n \"- Do not include format instructions, explanations, or extra text.\\n\\n\"\n \"Input:\\n\"\n f\"{json.dumps(schema_dict, indent=2)}\\n\\n\"\n \"Output (only JSON schema):\"\n )\n system_components.append(schema_info)\n except (ValidationError, ValueError, TypeError, KeyError) as e:\n await logger.aerror(f\"Could not build schema for prompt: {e}\", exc_info=True)\n\n # Combine all components\n combined_instructions = \"\\n\\n\".join(system_components) if system_components else \"\"\n llm_model, self.chat_history, self.tools = await self.get_agent_requirements()\n self.set(\n llm=llm_model,\n tools=self.tools or [],\n chat_history=self.chat_history,\n input_value=self.input_value,\n system_prompt=combined_instructions,\n )\n\n # Create and run structured chat agent\n try:\n structured_agent = self.create_agent_runnable()\n except (NotImplementedError, ValueError, TypeError) as e:\n await logger.aerror(f\"Error with structured chat agent: {e}\")\n raise\n try:\n result = await self.run_agent(structured_agent)\n except (\n ExceptionWithMessageError,\n ValueError,\n TypeError,\n RuntimeError,\n ) as e:\n await logger.aerror(f\"Error with structured agent result: {e}\")\n raise\n # Extract content from structured 
agent result\n if hasattr(result, \"content\"):\n content = result.content\n elif hasattr(result, \"text\"):\n content = result.text\n else:\n content = str(result)\n\n except (\n ExceptionWithMessageError,\n ValueError,\n TypeError,\n NotImplementedError,\n AttributeError,\n ) as e:\n await logger.aerror(f\"Error with structured chat agent: {e}\")\n # Fallback to regular agent\n content_str = \"No content returned from agent\"\n return Data(data={\"content\": content_str, \"error\": str(e)})\n\n # Process with structured output validation\n try:\n structured_output = await self.build_structured_output_base(content)\n\n # Handle different output formats\n if isinstance(structured_output, list) and structured_output:\n if len(structured_output) == 1:\n return Data(data=structured_output[0])\n return Data(data={\"results\": structured_output})\n if isinstance(structured_output, dict):\n return Data(data=structured_output)\n return Data(data={\"content\": content})\n\n except (ValueError, TypeError) as e:\n await logger.aerror(f\"Error in structured output processing: {e}\")\n return Data(data={\"content\": content, \"error\": str(e)})\n\n async def get_memory_data(self):\n # TODO: This is a temporary fix to avoid message duplication. 
We should develop a function for this.\n messages = (\n await MemoryComponent(**self.get_base_args())\n .set(\n session_id=self.graph.session_id,\n context_id=self.context_id,\n order=\"Ascending\",\n n_messages=self.n_messages,\n )\n .retrieve_messages()\n )\n return [\n message for message in messages if getattr(message, \"id\", None) != getattr(self.input_value, \"id\", None)\n ]\n\n def update_input_types(self, build_config: dotdict) -> dotdict:\n \"\"\"Update input types for all fields in build_config.\"\"\"\n for key, value in build_config.items():\n if isinstance(value, dict):\n if value.get(\"input_types\") is None:\n build_config[key][\"input_types\"] = []\n elif hasattr(value, \"input_types\") and value.input_types is None:\n value.input_types = []\n return build_config\n\n async def update_build_config(\n self,\n build_config: dotdict,\n field_value: list[dict],\n field_name: str | None = None,\n ) -> dotdict:\n # Update model options with caching (for all field changes)\n # Agents require tool calling, so filter for only tool-calling capable models\n def get_tool_calling_model_options(user_id=None):\n return get_language_model_options(user_id=user_id, tool_calling=True)\n\n build_config = update_model_options_in_build_config(\n component=self,\n build_config=dict(build_config),\n cache_key_prefix=\"language_model_options_tool_calling\",\n get_options_func=get_tool_calling_model_options,\n field_name=field_name,\n field_value=field_value,\n )\n build_config = dotdict(build_config)\n\n if field_name == \"model\":\n build_config = self.update_input_types(build_config)\n\n current_model_value = field_value if field_name == \"model\" else build_config.get(\"model\", {}).get(\"value\")\n provider = \"\"\n if isinstance(current_model_value, list) and current_model_value:\n selected_model = current_model_value[0]\n provider = (selected_model.get(\"provider\") or \"\").strip()\n if not provider and selected_model.get(\"name\"):\n provider = 
get_provider_for_model_name(str(selected_model[\"name\"]))\n\n if provider:\n build_config = apply_provider_variable_config_to_build_config(build_config, provider)\n\n if field_name == \"model\":\n default_keys = [\n \"code\",\n \"_type\",\n \"model\",\n \"tools\",\n \"input_value\",\n \"add_current_date_tool\",\n \"system_prompt\",\n \"agent_description\",\n \"max_iterations\",\n \"handle_parsing_errors\",\n \"verbose\",\n ]\n missing_keys = [key for key in default_keys if key not in build_config]\n if missing_keys:\n msg = f\"Missing required keys in build_config: {missing_keys}\"\n raise ValueError(msg)\n return dotdict({k: v.to_dict() if hasattr(v, \"to_dict\") else v for k, v in build_config.items()})\n\n async def _get_tools(self) -> list[Tool]:\n component_toolkit = get_component_toolkit()\n tools_names = self._build_tools_names()\n agent_description = self.get_tool_description()\n # TODO: Agent Description Depreciated Feature to be removed\n description = f\"{agent_description}{tools_names}\"\n\n tools = component_toolkit(component=self).get_tools(\n tool_name=\"Call_Agent\",\n tool_description=description,\n # here we do not use the shared callbacks as we are exposing the agent as a tool\n callbacks=self.get_langchain_callbacks(),\n )\n if hasattr(self, \"tools_metadata\"):\n tools = component_toolkit(component=self, metadata=self.tools_metadata).update_tools_metadata(tools=tools)\n\n return tools\n" |
Structured-response mode will reject valid schema-based outputs.
Line 1768 embeds an AgentComponent whose json_response() prompt tells the model to return only the JSON schema, while build_structured_output_base() can only recover {...} fragments when the model wraps JSON in prose. With output_schema enabled, common valid responses, especially arrays like [{"name": "..."}, {"name": "..."}], can therefore fail even when the model returned usable JSON.
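The failure mode can be reproduced outside the component. The sketch below is a simplified stand-in, not the component code: `extract_with_object_regex` is a hypothetical helper that mirrors only the parse-then-regex fallback of build_structured_output_base() and returns None where the component would return its error dict. Under that assumption, a bare array still parses through the initial json.loads call, so the breakage shows up once the array is surrounded by prose.

```python
import json
import re

# Same pattern the component uses for its fallback search.
JSON_PATTERN = r"\{.*\}"


def extract_with_object_regex(content: str):
    """Hypothetical stand-in: try a full parse, then a greedy {...} search."""
    try:
        return json.loads(content)
    except json.JSONDecodeError:
        match = re.search(JSON_PATTERN, content, re.DOTALL)
        if match is None:
            return None
        try:
            # The greedy match spans the first "{" through the last "}".
            return json.loads(match.group())
        except json.JSONDecodeError:
            return None


bare_array = '[{"name": "a"}, {"name": "b"}]'
wrapped_array = 'Here are the results: [{"name": "a"}, {"name": "b"}]'
wrapped_object = 'Result: {"name": "a"} Hope this helps.'

print(extract_with_object_regex(bare_array))      # parsed by json.loads directly
print(extract_with_object_regex(wrapped_array))   # None: match is two objects, not one
print(extract_with_object_regex(wrapped_object))  # recovered by the regex
```

For the prose-wrapped array, the greedy match is `{"name": "a"}, {"name": "b"}`, which is not a single JSON value, so the second json.loads call raises and the content is reported as an error.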
💡 Suggested fix inside the embedded `AgentComponent`
- json_pattern = r"\{.*\}"
schema_error_msg = "Try setting an output schema"
@@
- schema_info = (
- "You are given some text that may include format instructions, "
- "explanations, or other content alongside a JSON schema.\n\n"
- "Your task:\n"
- "- Extract only the JSON schema.\n"
- "- Return it as valid JSON.\n"
- "- Do not include format instructions, explanations, or extra text.\n\n"
- "Input:\n"
- f"{json.dumps(schema_dict, indent=2)}\n\n"
- "Output (only JSON schema):"
- )
+ schema_info = (
+ "Return only valid JSON that matches this schema.\n"
+ "Do not include explanations, markdown fences, or extra text.\n\n"
+ f"Schema:\n{json.dumps(schema_dict, indent=2)}"
+ )
@@
try:
json_data = json.loads(content)
except json.JSONDecodeError:
- json_match = re.search(json_pattern, content, re.DOTALL)
- if json_match:
- try:
- json_data = json.loads(json_match.group())
- except json.JSONDecodeError:
- return {"content": content, "error": schema_error_msg}
- else:
+ decoder = json.JSONDecoder()
+ # Scan in document order so an array nested inside an object is not decoded first.
+ for start, char in enumerate(content):
+ if char not in "[{":
+ continue
+ try:
+ json_data, _ = decoder.raw_decode(content, start)
+ break
+ except json.JSONDecodeError:
+ continue
+ if json_data is None:
return {"content": content, "error": schema_error_msg}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/backend/base/langflow/initial_setup/starter_projects/Price Deal Finder.json`
at line 1768, structured-response failures have two causes:
build_structured_output_base() only searches for object braces "{...}", so valid
JSON arrays wrapped in prose are missed, and json_response() builds a schema_info
prompt ("Extract only the JSON schema") that can force a schema-only reply. Fix by
updating build_structured_output_base (json_pattern and parsing logic) to detect and
extract both JSON objects and arrays (e.g., try full-string json.loads first,
then regex for r"\{.*\}" OR r"\[.*\]" and attempt to json.loads the match), and
also reword the schema prompt built in json_response() (the schema_info variable)
to ask the model for a JSON instance that matches the schema rather than the
schema text itself, so downstream parsing/validation can accept arrays like
[{"name":...}, ...]; target functions/vars: build_structured_output_base,
json_pattern, json_response, schema_info, combined_instructions.
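The extraction step described in the prompt above could be sketched as a standalone helper. This is a minimal sketch rather than the component's actual code: the name `extract_first_json` is hypothetical, and it uses `json.JSONDecoder.raw_decode` instead of a pair of regexes so that trailing prose after the JSON value is ignored.

```python
import json


def extract_first_json(content: str):
    """Return the first JSON object or array found in content, or None.

    Hypothetical helper: tries a full-string parse first, then scans for
    the earliest '[' or '{' that starts a decodable JSON value.
    """
    try:
        return json.loads(content)
    except json.JSONDecodeError:
        pass

    decoder = json.JSONDecoder()
    for start, char in enumerate(content):
        if char not in "[{":
            continue
        try:
            # raw_decode parses one value starting at `start` and ignores
            # whatever text follows it.
            value, _end = decoder.raw_decode(content, start)
        except json.JSONDecodeError:
            continue
        return value
    return None


# Hypothetical model replies.
print(extract_first_json('Deals: [{"name": "a"}, {"name": "b"}] Done.'))
print(extract_first_json('Result: {"tags": ["x", "y"]}'))
print(extract_first_json("no json here"))
```

Scanning candidate start positions in document order is a deliberate choice: checking for `[` before `{` across the whole string would decode the inner `["x", "y"]` in the second example instead of the enclosing object.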
| "title_case": false, | ||
| "type": "code", | ||
| "value": "from __future__ import annotations\n\nimport json\nimport re\nfrom typing import TYPE_CHECKING\n\nfrom pydantic import ValidationError\n\nfrom lfx.components.models_and_agents.memory import MemoryComponent\n\nif TYPE_CHECKING:\n from langchain_core.tools import Tool\n\nfrom lfx.base.agents.agent import LCToolsAgentComponent\nfrom lfx.base.agents.events import ExceptionWithMessageError\nfrom lfx.base.models.unified_models import (\n apply_provider_variable_config_to_build_config,\n get_language_model_options,\n get_llm,\n update_model_options_in_build_config,\n)\nfrom lfx.base.models.watsonx_constants import IBM_WATSONX_URLS\nfrom lfx.components.helpers import CurrentDateComponent\nfrom lfx.components.langchain_utilities.tool_calling import ToolCallingAgentComponent\nfrom lfx.custom.custom_component.component import get_component_toolkit\nfrom lfx.field_typing.range_spec import RangeSpec\nfrom lfx.helpers.base_model import build_model_from_schema\nfrom lfx.inputs.inputs import BoolInput, DropdownInput, ModelInput, StrInput\nfrom lfx.io import IntInput, MessageTextInput, MultilineInput, Output, SecretStrInput, TableInput\nfrom lfx.log.logger import logger\nfrom lfx.schema.data import Data\nfrom lfx.schema.dotdict import dotdict\nfrom lfx.schema.message import Message\nfrom lfx.schema.table import EditMode\n\n\ndef set_advanced_true(component_input):\n component_input.advanced = True\n return component_input\n\n\nclass AgentComponent(ToolCallingAgentComponent):\n display_name: str = \"Agent\"\n description: str = \"Define the agent's instructions, then enter a task to complete using tools.\"\n documentation: str = \"https://docs.langflow.org/agents\"\n icon = \"bot\"\n beta = False\n name = \"Agent\"\n\n memory_inputs = [set_advanced_true(component_input) for component_input in MemoryComponent().inputs]\n\n inputs = [\n ModelInput(\n name=\"model\",\n display_name=\"Language Model\",\n info=\"Select your model provider\",\n real_time_refresh=True,\n 
required=True,\n ),\n SecretStrInput(\n name=\"api_key\",\n display_name=\"API Key\",\n info=\"Model Provider API key\",\n real_time_refresh=True,\n advanced=True,\n ),\n DropdownInput(\n name=\"base_url_ibm_watsonx\",\n display_name=\"watsonx API Endpoint\",\n info=\"The base URL of the API (IBM watsonx.ai only)\",\n options=IBM_WATSONX_URLS,\n value=IBM_WATSONX_URLS[0],\n show=False,\n real_time_refresh=True,\n ),\n StrInput(\n name=\"project_id\",\n display_name=\"watsonx Project ID\",\n info=\"The project ID associated with the foundation model (IBM watsonx.ai only)\",\n show=False,\n required=False,\n ),\n MultilineInput(\n name=\"system_prompt\",\n display_name=\"Agent Instructions\",\n info=\"System Prompt: Initial instructions and context provided to guide the agent's behavior.\",\n value=\"You are a helpful assistant that can use tools to answer questions and perform tasks.\",\n advanced=False,\n ),\n MessageTextInput(\n name=\"context_id\",\n display_name=\"Context ID\",\n info=\"The context ID of the chat. Adds an extra layer to the local memory.\",\n value=\"\",\n advanced=True,\n ),\n IntInput(\n name=\"n_messages\",\n display_name=\"Number of Chat History Messages\",\n value=100,\n info=\"Number of chat history messages to retrieve.\",\n advanced=True,\n show=True,\n ),\n IntInput(\n name=\"max_tokens\",\n display_name=\"Max Tokens\",\n info=\"Maximum number of tokens to generate. Field name varies by provider.\",\n advanced=True,\n range_spec=RangeSpec(min=1, max=128000, step=1, step_type=\"int\"),\n ),\n MultilineInput(\n name=\"format_instructions\",\n display_name=\"Output Format Instructions\",\n info=\"Generic Template for structured output formatting. Valid only with Structured response.\",\n value=(\n \"You are an AI that extracts structured JSON objects from unstructured text. \"\n \"Use a predefined schema with expected types (str, int, float, bool, dict). 
\"\n \"Extract ALL relevant instances that match the schema - if multiple patterns exist, capture them all. \"\n \"Fill missing or ambiguous values with defaults: null for missing values. \"\n \"Remove exact duplicates but keep variations that have different field values. \"\n \"Always return valid JSON in the expected format, never throw errors. \"\n \"If multiple objects can be extracted, return them all in the structured format.\"\n ),\n advanced=True,\n ),\n TableInput(\n name=\"output_schema\",\n display_name=\"Output Schema\",\n info=(\n \"Schema Validation: Define the structure and data types for structured output. \"\n \"No validation if no output schema.\"\n ),\n advanced=True,\n required=False,\n value=[],\n table_schema=[\n {\n \"name\": \"name\",\n \"display_name\": \"Name\",\n \"type\": \"str\",\n \"description\": \"Specify the name of the output field.\",\n \"default\": \"field\",\n \"edit_mode\": EditMode.INLINE,\n },\n {\n \"name\": \"description\",\n \"display_name\": \"Description\",\n \"type\": \"str\",\n \"description\": \"Describe the purpose of the output field.\",\n \"default\": \"description of field\",\n \"edit_mode\": EditMode.POPOVER,\n },\n {\n \"name\": \"type\",\n \"display_name\": \"Type\",\n \"type\": \"str\",\n \"edit_mode\": EditMode.INLINE,\n \"description\": (\"Indicate the data type of the output field (e.g., str, int, float, bool, dict).\"),\n \"options\": [\"str\", \"int\", \"float\", \"bool\", \"dict\"],\n \"default\": \"str\",\n },\n {\n \"name\": \"multiple\",\n \"display_name\": \"As List\",\n \"type\": \"boolean\",\n \"description\": \"Set to True if this output field should be a list of the specified type.\",\n \"default\": \"False\",\n \"edit_mode\": EditMode.INLINE,\n },\n ],\n ),\n *LCToolsAgentComponent.get_base_inputs(),\n # removed memory inputs from agent component\n # *memory_inputs,\n BoolInput(\n name=\"add_current_date_tool\",\n display_name=\"Current Date\",\n advanced=True,\n info=\"If true, will add a tool 
to the agent that returns the current date.\",\n value=True,\n ),\n ]\n outputs = [\n Output(name=\"response\", display_name=\"Response\", method=\"message_response\"),\n ]\n\n async def get_agent_requirements(self):\n \"\"\"Get the agent requirements for the agent.\"\"\"\n from langchain_core.tools import StructuredTool\n\n max_tokens_val = getattr(self, \"max_tokens\", None)\n if max_tokens_val in {\"\", 0}:\n max_tokens_val = None\n llm_model = get_llm(\n model=self.model,\n user_id=self.user_id,\n api_key=self.api_key,\n max_tokens=max_tokens_val,\n watsonx_url=getattr(self, \"base_url_ibm_watsonx\", None),\n watsonx_project_id=getattr(self, \"project_id\", None),\n )\n if llm_model is None:\n msg = \"No language model selected. Please choose a model to proceed.\"\n raise ValueError(msg)\n\n # Get memory data\n self.chat_history = await self.get_memory_data()\n await logger.adebug(f\"Retrieved {len(self.chat_history)} chat history messages\")\n if isinstance(self.chat_history, Message):\n self.chat_history = [self.chat_history]\n\n # Add current date tool if enabled\n if self.add_current_date_tool:\n if not isinstance(self.tools, list): # type: ignore[has-type]\n self.tools = []\n current_date_tool = (await CurrentDateComponent(**self.get_base_args()).to_toolkit()).pop(0)\n\n if not isinstance(current_date_tool, StructuredTool):\n msg = \"CurrentDateComponent must be converted to a StructuredTool\"\n raise TypeError(msg)\n self.tools.append(current_date_tool)\n\n # Set shared callbacks for tracing the tools used by the agent\n self.set_tools_callbacks(self.tools, self._get_shared_callbacks())\n\n return llm_model, self.chat_history, self.tools\n\n async def message_response(self) -> Message:\n try:\n llm_model, self.chat_history, self.tools = await self.get_agent_requirements()\n # Set up and run agent\n self.set(\n llm=llm_model,\n tools=self.tools or [],\n chat_history=self.chat_history,\n input_value=self.input_value,\n system_prompt=self.system_prompt,\n 
)\n agent = self.create_agent_runnable()\n result = await self.run_agent(agent)\n\n # Store result for potential JSON output\n self._agent_result = result\n\n except (ValueError, TypeError, KeyError) as e:\n await logger.aerror(f\"{type(e).__name__}: {e!s}\")\n raise\n except ExceptionWithMessageError as e:\n await logger.aerror(f\"ExceptionWithMessageError occurred: {e}\")\n raise\n # Avoid catching blind Exception; let truly unexpected exceptions propagate\n except Exception as e:\n await logger.aerror(f\"Unexpected error: {e!s}\")\n raise\n else:\n return result\n\n def _preprocess_schema(self, schema):\n \"\"\"Preprocess schema to ensure correct data types for build_model_from_schema.\"\"\"\n processed_schema = []\n for field in schema:\n processed_field = {\n \"name\": str(field.get(\"name\", \"field\")),\n \"type\": str(field.get(\"type\", \"str\")),\n \"description\": str(field.get(\"description\", \"\")),\n \"multiple\": field.get(\"multiple\", False),\n }\n # Ensure multiple is handled correctly\n if isinstance(processed_field[\"multiple\"], str):\n processed_field[\"multiple\"] = processed_field[\"multiple\"].lower() in [\n \"true\",\n \"1\",\n \"t\",\n \"y\",\n \"yes\",\n ]\n processed_schema.append(processed_field)\n return processed_schema\n\n async def build_structured_output_base(self, content: str):\n \"\"\"Build structured output with optional BaseModel validation.\"\"\"\n json_pattern = r\"\\{.*\\}\"\n schema_error_msg = \"Try setting an output schema\"\n\n # Try to parse content as JSON first\n json_data = None\n try:\n json_data = json.loads(content)\n except json.JSONDecodeError:\n json_match = re.search(json_pattern, content, re.DOTALL)\n if json_match:\n try:\n json_data = json.loads(json_match.group())\n except json.JSONDecodeError:\n return {\"content\": content, \"error\": schema_error_msg}\n else:\n return {\"content\": content, \"error\": schema_error_msg}\n\n # If no output schema provided, return parsed JSON without validation\n if not 
hasattr(self, \"output_schema\") or not self.output_schema or len(self.output_schema) == 0:\n return json_data\n\n # Use BaseModel validation with schema\n try:\n processed_schema = self._preprocess_schema(self.output_schema)\n output_model = build_model_from_schema(processed_schema)\n\n # Validate against the schema\n if isinstance(json_data, list):\n # Multiple objects\n validated_objects = []\n for item in json_data:\n try:\n validated_obj = output_model.model_validate(item)\n validated_objects.append(validated_obj.model_dump())\n except ValidationError as e:\n await logger.aerror(f\"Validation error for item: {e}\")\n # Include invalid items with error info\n validated_objects.append({\"data\": item, \"validation_error\": str(e)})\n return validated_objects\n\n # Single object\n try:\n validated_obj = output_model.model_validate(json_data)\n return [validated_obj.model_dump()] # Return as list for consistency\n except ValidationError as e:\n await logger.aerror(f\"Validation error: {e}\")\n return [{\"data\": json_data, \"validation_error\": str(e)}]\n\n except (TypeError, ValueError) as e:\n await logger.aerror(f\"Error building structured output: {e}\")\n # Fallback to parsed JSON without validation\n return json_data\n\n async def json_response(self) -> Data:\n \"\"\"Convert agent response to structured JSON Data output with schema validation.\"\"\"\n # Always use structured chat agent for JSON response mode for better JSON formatting\n try:\n system_components = []\n\n # 1. Agent Instructions (system_prompt)\n agent_instructions = getattr(self, \"system_prompt\", \"\") or \"\"\n if agent_instructions:\n system_components.append(f\"{agent_instructions}\")\n\n # 2. Format Instructions\n format_instructions = getattr(self, \"format_instructions\", \"\") or \"\"\n if format_instructions:\n system_components.append(f\"Format instructions: {format_instructions}\")\n\n # 3. 
Schema Information from BaseModel\n if hasattr(self, \"output_schema\") and self.output_schema and len(self.output_schema) > 0:\n try:\n processed_schema = self._preprocess_schema(self.output_schema)\n output_model = build_model_from_schema(processed_schema)\n schema_dict = output_model.model_json_schema()\n schema_info = (\n \"You are given some text that may include format instructions, \"\n \"explanations, or other content alongside a JSON schema.\\n\\n\"\n \"Your task:\\n\"\n \"- Extract only the JSON schema.\\n\"\n \"- Return it as valid JSON.\\n\"\n \"- Do not include format instructions, explanations, or extra text.\\n\\n\"\n \"Input:\\n\"\n f\"{json.dumps(schema_dict, indent=2)}\\n\\n\"\n \"Output (only JSON schema):\"\n )\n system_components.append(schema_info)\n except (ValidationError, ValueError, TypeError, KeyError) as e:\n await logger.aerror(f\"Could not build schema for prompt: {e}\", exc_info=True)\n\n # Combine all components\n combined_instructions = \"\\n\\n\".join(system_components) if system_components else \"\"\n llm_model, self.chat_history, self.tools = await self.get_agent_requirements()\n self.set(\n llm=llm_model,\n tools=self.tools or [],\n chat_history=self.chat_history,\n input_value=self.input_value,\n system_prompt=combined_instructions,\n )\n\n # Create and run structured chat agent\n try:\n structured_agent = self.create_agent_runnable()\n except (NotImplementedError, ValueError, TypeError) as e:\n await logger.aerror(f\"Error with structured chat agent: {e}\")\n raise\n try:\n result = await self.run_agent(structured_agent)\n except (\n ExceptionWithMessageError,\n ValueError,\n TypeError,\n RuntimeError,\n ) as e:\n await logger.aerror(f\"Error with structured agent result: {e}\")\n raise\n # Extract content from structured agent result\n if hasattr(result, \"content\"):\n content = result.content\n elif hasattr(result, \"text\"):\n content = result.text\n else:\n content = str(result)\n\n except (\n ExceptionWithMessageError,\n 
ValueError,\n TypeError,\n NotImplementedError,\n AttributeError,\n ) as e:\n await logger.aerror(f\"Error with structured chat agent: {e}\")\n # Fallback to regular agent\n content_str = \"No content returned from agent\"\n return Data(data={\"content\": content_str, \"error\": str(e)})\n\n # Process with structured output validation\n try:\n structured_output = await self.build_structured_output_base(content)\n\n # Handle different output formats\n if isinstance(structured_output, list) and structured_output:\n if len(structured_output) == 1:\n return Data(data=structured_output[0])\n return Data(data={\"results\": structured_output})\n if isinstance(structured_output, dict):\n return Data(data=structured_output)\n return Data(data={\"content\": content})\n\n except (ValueError, TypeError) as e:\n await logger.aerror(f\"Error in structured output processing: {e}\")\n return Data(data={\"content\": content, \"error\": str(e)})\n\n async def get_memory_data(self):\n # TODO: This is a temporary fix to avoid message duplication. 
We should develop a function for this.\n messages = (\n await MemoryComponent(**self.get_base_args())\n .set(\n session_id=self.graph.session_id,\n context_id=self.context_id,\n order=\"Ascending\",\n n_messages=self.n_messages,\n )\n .retrieve_messages()\n )\n return [\n message for message in messages if getattr(message, \"id\", None) != getattr(self.input_value, \"id\", None)\n ]\n\n def update_input_types(self, build_config: dotdict) -> dotdict:\n \"\"\"Update input types for all fields in build_config.\"\"\"\n for key, value in build_config.items():\n if isinstance(value, dict):\n if value.get(\"input_types\") is None:\n build_config[key][\"input_types\"] = []\n elif hasattr(value, \"input_types\") and value.input_types is None:\n value.input_types = []\n return build_config\n\n async def update_build_config(\n self,\n build_config: dotdict,\n field_value: list[dict],\n field_name: str | None = None,\n ) -> dotdict:\n # Update model options with caching (for all field changes)\n # Agents require tool calling, so filter for only tool-calling capable models\n def get_tool_calling_model_options(user_id=None):\n return get_language_model_options(user_id=user_id, tool_calling=True)\n\n build_config = update_model_options_in_build_config(\n component=self,\n build_config=dict(build_config),\n cache_key_prefix=\"language_model_options_tool_calling\",\n get_options_func=get_tool_calling_model_options,\n field_name=field_name,\n field_value=field_value,\n )\n build_config = dotdict(build_config)\n\n # Iterate over all providers in the MODEL_PROVIDERS_DICT\n if field_name == \"model\":\n # Update input types for all fields\n build_config = self.update_input_types(build_config)\n\n # Show/hide provider-specific fields based on selected model\n # Get current model value - from field_value if model is being changed, otherwise from build_config\n current_model_value = field_value if field_name == \"model\" else build_config.get(\"model\", {}).get(\"value\")\n if 
isinstance(current_model_value, list) and len(current_model_value) > 0:\n selected_model = current_model_value[0]\n provider = selected_model.get(\"provider\", \"\")\n\n # Hide provider-specific fields by default before applying provider config\n for field in [\"base_url_ibm_watsonx\", \"project_id\"]:\n if field in build_config:\n build_config[field][\"show\"] = False\n build_config[field][\"required\"] = False\n\n # Apply provider variable configuration (advanced, required, info, env var fallback)\n if provider:\n build_config = apply_provider_variable_config_to_build_config(build_config, provider)\n\n # Validate required keys\n default_keys = [\n \"code\",\n \"_type\",\n \"model\",\n \"tools\",\n \"input_value\",\n \"add_current_date_tool\",\n \"system_prompt\",\n \"agent_description\",\n \"max_iterations\",\n \"handle_parsing_errors\",\n \"verbose\",\n ]\n missing_keys = [key for key in default_keys if key not in build_config]\n if missing_keys:\n msg = f\"Missing required keys in build_config: {missing_keys}\"\n raise ValueError(msg)\n return dotdict({k: v.to_dict() if hasattr(v, \"to_dict\") else v for k, v in build_config.items()})\n\n async def _get_tools(self) -> list[Tool]:\n component_toolkit = get_component_toolkit()\n tools_names = self._build_tools_names()\n agent_description = self.get_tool_description()\n # TODO: Agent Description Depreciated Feature to be removed\n description = f\"{agent_description}{tools_names}\"\n\n tools = component_toolkit(component=self).get_tools(\n tool_name=\"Call_Agent\",\n tool_description=description,\n # here we do not use the shared callbacks as we are exposing the agent as a tool\n callbacks=self.get_langchain_callbacks(),\n )\n if hasattr(self, \"tools_metadata\"):\n tools = component_toolkit(component=self, metadata=self.tools_metadata).update_tools_metadata(tools=tools)\n\n return tools\n" | ||
| "value": "from __future__ import annotations\n\nimport json\nimport re\nfrom typing import TYPE_CHECKING\n\nfrom pydantic import ValidationError\n\nfrom lfx.components.models_and_agents.memory import MemoryComponent\n\nif TYPE_CHECKING:\n from langchain_core.tools import Tool\n\nfrom lfx.base.agents.agent import LCToolsAgentComponent\nfrom lfx.base.agents.events import ExceptionWithMessageError\nfrom lfx.base.models.unified_models import (\n apply_provider_variable_config_to_build_config,\n get_language_model_options,\n get_llm,\n get_provider_for_model_name,\n update_model_options_in_build_config,\n)\nfrom lfx.base.models.watsonx_constants import IBM_WATSONX_URLS\nfrom lfx.components.helpers import CurrentDateComponent\nfrom lfx.components.langchain_utilities.tool_calling import ToolCallingAgentComponent\nfrom lfx.custom.custom_component.component import get_component_toolkit\nfrom lfx.field_typing.range_spec import RangeSpec\nfrom lfx.helpers.base_model import build_model_from_schema\nfrom lfx.inputs.inputs import BoolInput, DropdownInput, ModelInput, StrInput\nfrom lfx.io import IntInput, MessageTextInput, MultilineInput, Output, SecretStrInput, TableInput\nfrom lfx.log.logger import logger\nfrom lfx.schema.data import Data\nfrom lfx.schema.dotdict import dotdict\nfrom lfx.schema.message import Message\nfrom lfx.schema.table import EditMode\n\n\ndef set_advanced_true(component_input):\n component_input.advanced = True\n return component_input\n\n\nclass AgentComponent(ToolCallingAgentComponent):\n display_name: str = \"Agent\"\n description: str = \"Define the agent's instructions, then enter a task to complete using tools.\"\n documentation: str = \"https://docs.langflow.org/agents\"\n icon = \"bot\"\n beta = False\n name = \"Agent\"\n\n memory_inputs = [set_advanced_true(component_input) for component_input in MemoryComponent().inputs]\n\n inputs = [\n ModelInput(\n name=\"model\",\n display_name=\"Language Model\",\n info=\"Select your model provider\",\n 
real_time_refresh=True,\n required=True,\n ),\n SecretStrInput(\n name=\"api_key\",\n display_name=\"API Key\",\n info=\"Model Provider API key\",\n real_time_refresh=True,\n advanced=True,\n ),\n DropdownInput(\n name=\"base_url_ibm_watsonx\",\n display_name=\"watsonx API Endpoint\",\n info=\"The base URL of the API (IBM watsonx.ai only)\",\n options=IBM_WATSONX_URLS,\n value=IBM_WATSONX_URLS[0],\n show=False,\n real_time_refresh=True,\n ),\n StrInput(\n name=\"project_id\",\n display_name=\"watsonx Project ID\",\n info=\"The project ID associated with the foundation model (IBM watsonx.ai only)\",\n show=False,\n required=False,\n ),\n MultilineInput(\n name=\"system_prompt\",\n display_name=\"Agent Instructions\",\n info=\"System Prompt: Initial instructions and context provided to guide the agent's behavior.\",\n value=\"You are a helpful assistant that can use tools to answer questions and perform tasks.\",\n advanced=False,\n ),\n MessageTextInput(\n name=\"context_id\",\n display_name=\"Context ID\",\n info=\"The context ID of the chat. Adds an extra layer to the local memory.\",\n value=\"\",\n advanced=True,\n ),\n IntInput(\n name=\"n_messages\",\n display_name=\"Number of Chat History Messages\",\n value=100,\n info=\"Number of chat history messages to retrieve.\",\n advanced=True,\n show=True,\n ),\n IntInput(\n name=\"max_tokens\",\n display_name=\"Max Tokens\",\n info=\"Maximum number of tokens to generate. Field name varies by provider.\",\n advanced=True,\n range_spec=RangeSpec(min=1, max=128000, step=1, step_type=\"int\"),\n ),\n MultilineInput(\n name=\"format_instructions\",\n display_name=\"Output Format Instructions\",\n info=\"Generic Template for structured output formatting. Valid only with Structured response.\",\n value=(\n \"You are an AI that extracts structured JSON objects from unstructured text. \"\n \"Use a predefined schema with expected types (str, int, float, bool, dict). 
\"\n \"Extract ALL relevant instances that match the schema - if multiple patterns exist, capture them all. \"\n \"Fill missing or ambiguous values with defaults: null for missing values. \"\n \"Remove exact duplicates but keep variations that have different field values. \"\n \"Always return valid JSON in the expected format, never throw errors. \"\n \"If multiple objects can be extracted, return them all in the structured format.\"\n ),\n advanced=True,\n ),\n TableInput(\n name=\"output_schema\",\n display_name=\"Output Schema\",\n info=(\n \"Schema Validation: Define the structure and data types for structured output. \"\n \"No validation if no output schema.\"\n ),\n advanced=True,\n required=False,\n value=[],\n table_schema=[\n {\n \"name\": \"name\",\n \"display_name\": \"Name\",\n \"type\": \"str\",\n \"description\": \"Specify the name of the output field.\",\n \"default\": \"field\",\n \"edit_mode\": EditMode.INLINE,\n },\n {\n \"name\": \"description\",\n \"display_name\": \"Description\",\n \"type\": \"str\",\n \"description\": \"Describe the purpose of the output field.\",\n \"default\": \"description of field\",\n \"edit_mode\": EditMode.POPOVER,\n },\n {\n \"name\": \"type\",\n \"display_name\": \"Type\",\n \"type\": \"str\",\n \"edit_mode\": EditMode.INLINE,\n \"description\": (\"Indicate the data type of the output field (e.g., str, int, float, bool, dict).\"),\n \"options\": [\"str\", \"int\", \"float\", \"bool\", \"dict\"],\n \"default\": \"str\",\n },\n {\n \"name\": \"multiple\",\n \"display_name\": \"As List\",\n \"type\": \"boolean\",\n \"description\": \"Set to True if this output field should be a list of the specified type.\",\n \"default\": \"False\",\n \"edit_mode\": EditMode.INLINE,\n },\n ],\n ),\n *LCToolsAgentComponent.get_base_inputs(),\n # removed memory inputs from agent component\n # *memory_inputs,\n BoolInput(\n name=\"add_current_date_tool\",\n display_name=\"Current Date\",\n advanced=True,\n info=\"If true, will add a tool 
to the agent that returns the current date.\",\n value=True,\n ),\n ]\n outputs = [\n Output(name=\"response\", display_name=\"Response\", method=\"message_response\"),\n ]\n\n def _get_max_tokens_value(self):\n \"\"\"Return the user-supplied max_tokens or None when unset/zero.\"\"\"\n val = getattr(self, \"max_tokens\", None)\n if val in {\"\", 0}:\n return None\n return val\n\n def _get_llm(self):\n \"\"\"Override parent to include max_tokens from the Agent's input field.\"\"\"\n return get_llm(\n model=self.model,\n user_id=self.user_id,\n api_key=getattr(self, \"api_key\", None),\n max_tokens=self._get_max_tokens_value(),\n watsonx_url=getattr(self, \"base_url_ibm_watsonx\", None),\n watsonx_project_id=getattr(self, \"project_id\", None),\n )\n\n async def get_agent_requirements(self):\n \"\"\"Get the agent requirements for the agent.\"\"\"\n from langchain_core.tools import StructuredTool\n\n llm_model = self._get_llm()\n if llm_model is None:\n msg = \"No language model selected. Please choose a model to proceed.\"\n raise ValueError(msg)\n\n # Get memory data\n self.chat_history = await self.get_memory_data()\n await logger.adebug(f\"Retrieved {len(self.chat_history)} chat history messages\")\n if isinstance(self.chat_history, Message):\n self.chat_history = [self.chat_history]\n\n # Add current date tool if enabled\n if self.add_current_date_tool:\n if not isinstance(self.tools, list): # type: ignore[has-type]\n self.tools = []\n current_date_tool = (await CurrentDateComponent(**self.get_base_args()).to_toolkit()).pop(0)\n\n if not isinstance(current_date_tool, StructuredTool):\n msg = \"CurrentDateComponent must be converted to a StructuredTool\"\n raise TypeError(msg)\n self.tools.append(current_date_tool)\n\n # Set shared callbacks for tracing the tools used by the agent\n self.set_tools_callbacks(self.tools, self._get_shared_callbacks())\n\n return llm_model, self.chat_history, self.tools\n\n async def message_response(self) -> Message:\n try:\n 
llm_model, self.chat_history, self.tools = await self.get_agent_requirements()\n # Set up and run agent\n self.set(\n llm=llm_model,\n tools=self.tools or [],\n chat_history=self.chat_history,\n input_value=self.input_value,\n system_prompt=self.system_prompt,\n )\n agent = self.create_agent_runnable()\n result = await self.run_agent(agent)\n\n # Store result for potential JSON output\n self._agent_result = result\n\n except (ValueError, TypeError, KeyError) as e:\n await logger.aerror(f\"{type(e).__name__}: {e!s}\")\n raise\n except ExceptionWithMessageError as e:\n await logger.aerror(f\"ExceptionWithMessageError occurred: {e}\")\n raise\n # Avoid catching blind Exception; let truly unexpected exceptions propagate\n except Exception as e:\n await logger.aerror(f\"Unexpected error: {e!s}\")\n raise\n else:\n return result\n\n def _preprocess_schema(self, schema):\n \"\"\"Preprocess schema to ensure correct data types for build_model_from_schema.\"\"\"\n processed_schema = []\n for field in schema:\n processed_field = {\n \"name\": str(field.get(\"name\", \"field\")),\n \"type\": str(field.get(\"type\", \"str\")),\n \"description\": str(field.get(\"description\", \"\")),\n \"multiple\": field.get(\"multiple\", False),\n }\n # Ensure multiple is handled correctly\n if isinstance(processed_field[\"multiple\"], str):\n processed_field[\"multiple\"] = processed_field[\"multiple\"].lower() in [\n \"true\",\n \"1\",\n \"t\",\n \"y\",\n \"yes\",\n ]\n processed_schema.append(processed_field)\n return processed_schema\n\n async def build_structured_output_base(self, content: str):\n \"\"\"Build structured output with optional BaseModel validation.\"\"\"\n json_pattern = r\"\\{.*\\}\"\n schema_error_msg = \"Try setting an output schema\"\n\n # Try to parse content as JSON first\n json_data = None\n try:\n json_data = json.loads(content)\n except json.JSONDecodeError:\n json_match = re.search(json_pattern, content, re.DOTALL)\n if json_match:\n try:\n json_data = 
json.loads(json_match.group())\n except json.JSONDecodeError:\n return {\"content\": content, \"error\": schema_error_msg}\n else:\n return {\"content\": content, \"error\": schema_error_msg}\n\n # If no output schema provided, return parsed JSON without validation\n if not hasattr(self, \"output_schema\") or not self.output_schema or len(self.output_schema) == 0:\n return json_data\n\n # Use BaseModel validation with schema\n try:\n processed_schema = self._preprocess_schema(self.output_schema)\n output_model = build_model_from_schema(processed_schema)\n\n # Validate against the schema\n if isinstance(json_data, list):\n # Multiple objects\n validated_objects = []\n for item in json_data:\n try:\n validated_obj = output_model.model_validate(item)\n validated_objects.append(validated_obj.model_dump())\n except ValidationError as e:\n await logger.aerror(f\"Validation error for item: {e}\")\n # Include invalid items with error info\n validated_objects.append({\"data\": item, \"validation_error\": str(e)})\n return validated_objects\n\n # Single object\n try:\n validated_obj = output_model.model_validate(json_data)\n return [validated_obj.model_dump()] # Return as list for consistency\n except ValidationError as e:\n await logger.aerror(f\"Validation error: {e}\")\n return [{\"data\": json_data, \"validation_error\": str(e)}]\n\n except (TypeError, ValueError) as e:\n await logger.aerror(f\"Error building structured output: {e}\")\n # Fallback to parsed JSON without validation\n return json_data\n\n async def json_response(self) -> Data:\n \"\"\"Convert agent response to structured JSON Data output with schema validation.\"\"\"\n # Always use structured chat agent for JSON response mode for better JSON formatting\n try:\n system_components = []\n\n # 1. Agent Instructions (system_prompt)\n agent_instructions = getattr(self, \"system_prompt\", \"\") or \"\"\n if agent_instructions:\n system_components.append(f\"{agent_instructions}\")\n\n # 2. 
Format Instructions\n format_instructions = getattr(self, \"format_instructions\", \"\") or \"\"\n if format_instructions:\n system_components.append(f\"Format instructions: {format_instructions}\")\n\n # 3. Schema Information from BaseModel\n if hasattr(self, \"output_schema\") and self.output_schema and len(self.output_schema) > 0:\n try:\n processed_schema = self._preprocess_schema(self.output_schema)\n output_model = build_model_from_schema(processed_schema)\n schema_dict = output_model.model_json_schema()\n schema_info = (\n \"You are given some text that may include format instructions, \"\n \"explanations, or other content alongside a JSON schema.\\n\\n\"\n \"Your task:\\n\"\n \"- Extract only the JSON schema.\\n\"\n \"- Return it as valid JSON.\\n\"\n \"- Do not include format instructions, explanations, or extra text.\\n\\n\"\n \"Input:\\n\"\n f\"{json.dumps(schema_dict, indent=2)}\\n\\n\"\n \"Output (only JSON schema):\"\n )\n system_components.append(schema_info)\n except (ValidationError, ValueError, TypeError, KeyError) as e:\n await logger.aerror(f\"Could not build schema for prompt: {e}\", exc_info=True)\n\n # Combine all components\n combined_instructions = \"\\n\\n\".join(system_components) if system_components else \"\"\n llm_model, self.chat_history, self.tools = await self.get_agent_requirements()\n self.set(\n llm=llm_model,\n tools=self.tools or [],\n chat_history=self.chat_history,\n input_value=self.input_value,\n system_prompt=combined_instructions,\n )\n\n # Create and run structured chat agent\n try:\n structured_agent = self.create_agent_runnable()\n except (NotImplementedError, ValueError, TypeError) as e:\n await logger.aerror(f\"Error with structured chat agent: {e}\")\n raise\n try:\n result = await self.run_agent(structured_agent)\n except (\n ExceptionWithMessageError,\n ValueError,\n TypeError,\n RuntimeError,\n ) as e:\n await logger.aerror(f\"Error with structured agent result: {e}\")\n raise\n # Extract content from structured 
agent result\n if hasattr(result, \"content\"):\n content = result.content\n elif hasattr(result, \"text\"):\n content = result.text\n else:\n content = str(result)\n\n except (\n ExceptionWithMessageError,\n ValueError,\n TypeError,\n NotImplementedError,\n AttributeError,\n ) as e:\n await logger.aerror(f\"Error with structured chat agent: {e}\")\n # Fallback to regular agent\n content_str = \"No content returned from agent\"\n return Data(data={\"content\": content_str, \"error\": str(e)})\n\n # Process with structured output validation\n try:\n structured_output = await self.build_structured_output_base(content)\n\n # Handle different output formats\n if isinstance(structured_output, list) and structured_output:\n if len(structured_output) == 1:\n return Data(data=structured_output[0])\n return Data(data={\"results\": structured_output})\n if isinstance(structured_output, dict):\n return Data(data=structured_output)\n return Data(data={\"content\": content})\n\n except (ValueError, TypeError) as e:\n await logger.aerror(f\"Error in structured output processing: {e}\")\n return Data(data={\"content\": content, \"error\": str(e)})\n\n async def get_memory_data(self):\n # TODO: This is a temporary fix to avoid message duplication. 
We should develop a function for this.\n messages = (\n await MemoryComponent(**self.get_base_args())\n .set(\n session_id=self.graph.session_id,\n context_id=self.context_id,\n order=\"Ascending\",\n n_messages=self.n_messages,\n )\n .retrieve_messages()\n )\n return [\n message for message in messages if getattr(message, \"id\", None) != getattr(self.input_value, \"id\", None)\n ]\n\n def update_input_types(self, build_config: dotdict) -> dotdict:\n \"\"\"Update input types for all fields in build_config.\"\"\"\n for key, value in build_config.items():\n if isinstance(value, dict):\n if value.get(\"input_types\") is None:\n build_config[key][\"input_types\"] = []\n elif hasattr(value, \"input_types\") and value.input_types is None:\n value.input_types = []\n return build_config\n\n async def update_build_config(\n self,\n build_config: dotdict,\n field_value: list[dict],\n field_name: str | None = None,\n ) -> dotdict:\n # Update model options with caching (for all field changes)\n # Agents require tool calling, so filter for only tool-calling capable models\n def get_tool_calling_model_options(user_id=None):\n return get_language_model_options(user_id=user_id, tool_calling=True)\n\n build_config = update_model_options_in_build_config(\n component=self,\n build_config=dict(build_config),\n cache_key_prefix=\"language_model_options_tool_calling\",\n get_options_func=get_tool_calling_model_options,\n field_name=field_name,\n field_value=field_value,\n )\n build_config = dotdict(build_config)\n\n if field_name == \"model\":\n build_config = self.update_input_types(build_config)\n\n current_model_value = field_value if field_name == \"model\" else build_config.get(\"model\", {}).get(\"value\")\n provider = \"\"\n if isinstance(current_model_value, list) and current_model_value:\n selected_model = current_model_value[0]\n provider = (selected_model.get(\"provider\") or \"\").strip()\n if not provider and selected_model.get(\"name\"):\n provider = 
get_provider_for_model_name(str(selected_model[\"name\"]))\n\n if provider:\n build_config = apply_provider_variable_config_to_build_config(build_config, provider)\n\n if field_name == \"model\":\n default_keys = [\n \"code\",\n \"_type\",\n \"model\",\n \"tools\",\n \"input_value\",\n \"add_current_date_tool\",\n \"system_prompt\",\n \"agent_description\",\n \"max_iterations\",\n \"handle_parsing_errors\",\n \"verbose\",\n ]\n missing_keys = [key for key in default_keys if key not in build_config]\n if missing_keys:\n msg = f\"Missing required keys in build_config: {missing_keys}\"\n raise ValueError(msg)\n return dotdict({k: v.to_dict() if hasattr(v, \"to_dict\") else v for k, v in build_config.items()})\n\n async def _get_tools(self) -> list[Tool]:\n component_toolkit = get_component_toolkit()\n tools_names = self._build_tools_names()\n agent_description = self.get_tool_description()\n # TODO: Agent Description Depreciated Feature to be removed\n description = f\"{agent_description}{tools_names}\"\n\n tools = component_toolkit(component=self).get_tools(\n tool_name=\"Call_Agent\",\n tool_description=description,\n # here we do not use the shared callbacks as we are exposing the agent as a tool\n callbacks=self.get_langchain_callbacks(),\n )\n if hasattr(self, \"tools_metadata\"):\n tools = component_toolkit(component=self, metadata=self.tools_metadata).update_tools_metadata(tools=tools)\n\n return tools\n" |
Mirror the structured-output fix in this Agent snapshot too.
Line 1100 embeds the same json_response()/build_structured_output_base() pair as the other starter: the prompt steers the model toward echoing the schema, and the fallback parser only accepts object fragments. output_schema mode will break here for the same valid array and wrapped-JSON responses unless the fix is applied in this copy as well.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/backend/base/langflow/initial_setup/starter_projects/Simple Agent.json`
at line 1100, The json parsing in build_structured_output_base and the
json_response flow is too narrow (only matches object fragments) causing valid
JSON arrays or wrapped JSON to be rejected; update build_structured_output_base
(and any related regex/logic used by json_response) to accept both object and
array JSON fragments by changing the json_pattern to match either {…} or […],
try parsing the full content first, then search for either array or object
fragments using re.DOTALL (e.g., r"(\{.*\}|\[.*\])"), and when a fragment is
found json.loads() it and proceed; ensure the function still returns parsed
lists unchanged (and downstream json_response handles list results correctly)
and keep existing validation via _preprocess_schema/build_model_from_schema and
error fallbacks intact (targets: functions build_structured_output_base,
json_response, and the json_pattern variable).
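The parsing order described in the prompt above (full content first, then an object-or-array fragment) can be sketched as follows. This is a minimal illustration, not the code in the starter project: `extract_json_fragment` is a hypothetical helper name, and the pattern is the one suggested in the prompt.

```python
import json
import re

# Pattern suggested in the review prompt: match either {...} or [...].
JSON_FRAGMENT_PATTERN = r"(\{.*\}|\[.*\])"


def extract_json_fragment(content: str):
    """Hypothetical helper: return parsed JSON from content, or None.

    Tries the whole string first, then falls back to the first
    object or array fragment found by the regex.
    """
    try:
        return json.loads(content)
    except json.JSONDecodeError:
        pass

    # re.DOTALL lets the fragment span multiple lines.
    match = re.search(JSON_FRAGMENT_PATTERN, content, re.DOTALL)
    if match is None:
        return None
    try:
        return json.loads(match.group())
    except json.JSONDecodeError:
        return None


# An array wrapped in prose is recovered as a list, not rejected.
print(extract_json_fragment('Here: [{"a": 1}, {"a": 2}] done'))
# An object wrapped in prose still works as before.
print(extract_json_fragment('Result: {"items": [1, 2]} end'))
```

Because the search is leftmost and greedy, a stray bracket earlier in the text (for example `[note]` before the real payload) would still defeat this sketch, so a production fix may need a more careful scan.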
| // Verify the AI does NOT remember "Victor" from the deleted session | ||
| // The response should indicate it doesn't know the name | ||
| expect(responseText?.toLowerCase()).not.toContain("victor"); | ||
| }, |
Assertion may have edge-case false positives.
The current assertion checks that the response doesn't contain "victor". However, the LLM might respond with phrases like "I don't know anyone named Victor" or "You haven't told me your name is Victor", which would cause a false test failure.
Consider a more robust assertion approach, though this may be acceptable given the low likelihood and the test's intent.
💡 Alternative assertion approach
One option is to verify the response contains phrases indicating lack of knowledge:
// Verify the AI indicates it doesn't know the name
const indicatesNoKnowledge =
responseText?.toLowerCase().includes("don't know") ||
responseText?.toLowerCase().includes("haven't told") ||
responseText?.toLowerCase().includes("not sure");
expect(indicatesNoKnowledge || !responseText?.toLowerCase().includes("victor")).toBe(true);
However, this adds complexity. The current approach may be sufficient if the LLM reliably doesn't echo back the name when it has no context.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/frontend/tests/core/regression/session-deletion-data-leakage.spec.ts`
around lines 192 - 195, The current assertion using
expect(responseText?.toLowerCase()).not.toContain("victor") can false-fail when
the LLM mentions "Victor" while explicitly denying knowledge; update the
assertion to allow either a clear denial phrase or absence of the name: create a
boolean like indicatesNoKnowledge that checks
responseText?.toLowerCase().includes("don't know") || .includes("haven't told")
|| .includes("not sure"), then assert expect(indicatesNoKnowledge ||
!responseText?.toLowerCase().includes("victor")).toBe(true); reference the
existing responseText variable in the test to implement this replacement.
| "pydantic>=2.0.0,<3.0.0", | ||
| "pillow>=10.0.0,<13.0.0", | ||
| "fastapi>=0.115.13,<1.0.0", | ||
| "fastapi>=0.135.0,<1.0.0", |
🧩 Analysis chain
🌐 Web query:
What is the latest version of FastAPI and what breaking changes were introduced after 0.115?
💡 Result:
Latest FastAPI version
FastAPI 0.135.1 (released March 1, 2026) is the latest version on PyPI. (pypi.org)
Breaking changes introduced after FastAPI 0.115.x
From FastAPI’s official release notes, these versions introduced “Breaking Changes” sections after 0.115:
- 0.125.0 (2025-12-17): Dropped support for Python 3.8. (fastapi.tiangolo.com)
- 0.127.0 (2025-12-21): Added deprecation warnings when using pydantic.v1. (fastapi.tiangolo.com)
- 0.128.0 (2025-12-27): Dropped support for pydantic.v1. (fastapi.tiangolo.com)
- 0.129.0 (2026-02-12): Dropped support for Python 3.9. (fastapi.tiangolo.com)
- 0.131.0 (2026-02-22): Deprecated ORJSONResponse and UJSONResponse. (fastapi.tiangolo.com)
- 0.132.0 (2026-02-23): Enabled default strict Content-Type checking for JSON requests (strict_content_type=True by default), rejecting JSON requests without a valid JSON Content-Type unless disabled. (fastapi.tiangolo.com)
Citations:
- 1: https://pypi.org/project/fastapi/
- 2: https://fastapi.tiangolo.com/release-notes/
- 3: https://fastapi.tiangolo.com/release-notes/
- 4: https://fastapi.tiangolo.com/release-notes/
- 5: https://fastapi.tiangolo.com/release-notes/
- 6: https://fastapi.tiangolo.com/release-notes/
- 7: https://fastapi.tiangolo.com/release-notes/
Verify FastAPI version compatibility and breaking changes.
The FastAPI minimum version has been bumped significantly, from 0.115.13 to 0.135.0, which pulls in several breaking changes that require verification:
- Python 3.8 and 3.9 support dropped (in v0.125.0 and v0.129.0 respectively)
- Pydantic v1 support removed (in v0.128.0); the codebase must use Pydantic v2
- Strict Content-Type checking enabled by default (in v0.132.0); JSON requests without valid Content-Type headers will be rejected unless explicitly disabled
- ORJSONResponse and UJSONResponse deprecated (in v0.131.0)
Ensure the codebase and its Python version requirement meet these constraints and handle strict Content-Type validation appropriately.
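As a quick sanity check on the list above, the following sketch computes which of the cited breaking releases fall between the old and new minimums. The version numbers and descriptions are copied from the web-query result in this comment; `parse_version` and `newly_mandatory` are hypothetical helper names, and the plain tuple comparison assumes simple `X.Y.Z` version strings.

```python
def parse_version(text: str) -> tuple[int, ...]:
    """Turn 'X.Y.Z' into a tuple of ints so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


OLD_MINIMUM = parse_version("0.115.13")  # previous lower bound in pyproject.toml
NEW_MINIMUM = parse_version("0.135.0")   # lower bound after this PR

# Breaking releases as listed in the review comment (not independently verified).
BREAKING_RELEASES = {
    "0.125.0": "Dropped support for Python 3.8",
    "0.127.0": "Deprecation warnings when using pydantic.v1",
    "0.128.0": "Dropped support for pydantic.v1",
    "0.129.0": "Dropped support for Python 3.9",
    "0.131.0": "Deprecated ORJSONResponse and UJSONResponse",
    "0.132.0": "Strict Content-Type checking for JSON requests by default",
}


def newly_mandatory(old: tuple[int, ...], new: tuple[int, ...]) -> list[str]:
    """Return releases that installs could previously avoid but no longer can."""
    return [v for v in BREAKING_RELEASES if old < parse_version(v) <= new]


for version in newly_mandatory(OLD_MINIMUM, NEW_MINIMUM):
    print(version, "-", BREAKING_RELEASES[version])
```

Every listed release lands inside the range, which is why each item in the comment needs to be checked against the codebase.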
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/lfx/pyproject.toml` at line 15, The pyproject FastAPI dependency bump may
break compatibility: update pyproject.toml (the "fastapi>=0.135.0,<1.0.0" entry)
and the project metadata to require a Python version that FastAPI v0.135+
supports (remove/raise support for 3.8/3.9 via python_requires), verify and
migrate all Pydantic usage to Pydantic v2 APIs, replace deprecated
ORJSONResponse/UJSONResponse usages (search for ORJSONResponse and UJSONResponse
classes) with supported response classes or custom JSONResponse, and fix strict
Content-Type handling by adding explicit content-type checks or configuring
request handling middleware/route decorators to accept requests without
Content-Type (search handlers/middleware that parse JSON bodies); run the test
suite and update dependency specs to reflect Pydantic v2 and the new Python
minimum.
merge branch release-1.8.1 into main
Summary by CodeRabbit
Release Notes v1.8.1
Bug Fixes
Chores