fix: Knowledge base component refactor #9543
Walkthrough

Removes the KBIngestion/KBRetrieval re-exports from components.data and introduces a knowledge_bases package with lazy-loaded exports. Renames and relocates the ingestion/retrieval components; updates their inputs, methods, and UI metadata; and adjusts embedding/metadata retrieval behavior. Also updates the starter project JSONs, tests, and a frontend sidebar category.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    autonumber
    actor Dev as Importer
    participant KBPkg as components.knowledge_bases.__init__
    participant Mod as ingestion/retrieval submodules
    Dev->>KBPkg: from ...knowledge_bases import KnowledgeIngestionComponent
    alt attribute not yet loaded
        KBPkg->>KBPkg: __getattr__("KnowledgeIngestionComponent")
        KBPkg->>Mod: import_mod("KnowledgeIngestionComponent", "ingestion", parent)
        Mod-->>KBPkg: class KnowledgeIngestionComponent
        KBPkg->>KBPkg: cache in globals
    end
    KBPkg-->>Dev: KnowledgeIngestionComponent
```
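The lazy-attribute step in this diagram follows the PEP 562 module-level `__getattr__` pattern. A self-contained sketch of the idea — `make_lazy_module` and its arguments are illustrative stand-ins, not Langflow's actual `import_mod` API:

```python
import importlib
import sys
import types


def make_lazy_module(name: str, exports: dict[str, str]) -> types.ModuleType:
    """Create a module whose exports are imported on first attribute access."""
    mod = types.ModuleType(name)
    mod.__all__ = list(exports)

    def lazy_getattr(attr: str):
        if attr not in exports:
            raise AttributeError(f"module {name!r} has no attribute {attr!r}")
        value = getattr(importlib.import_module(exports[attr]), attr)
        setattr(mod, attr, value)  # cache: later lookups bypass __getattr__
        return value

    mod.__getattr__ = lazy_getattr  # module-level __getattr__, per PEP 562
    sys.modules[name] = mod
    return mod


# Demo with a stdlib module standing in for the component submodules:
lazy = make_lazy_module("demo_lazy", {"sha256": "hashlib"})
```

The payoff is that importing the package does not import heavy submodules until a component class is actually requested, after which the cached attribute makes subsequent lookups free.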
```mermaid
sequenceDiagram
    autonumber
    participant UI as Flow/UI
    participant Ingest as KnowledgeIngestionComponent
    participant Utils as knowledge_base_utils
    participant Store as Vector Store
    UI->>Ingest: run(input_df=Data|DataFrame|List[Data], kb, config)
    Ingest->>Ingest: normalize input_df to DataFrame
    Ingest->>Utils: get_knowledge_bases(), settings, user
    Ingest->>Store: ingest DataFrame into KB collection
    Note over Ingest: On error: raise RuntimeError
    Store-->>Ingest: ack
    Ingest-->>UI: status/result DataFrame
```
```mermaid
sequenceDiagram
    autonumber
    participant UI as Flow/UI
    participant Retr as KnowledgeRetrievalComponent
    participant Utils as knowledge_base_utils
    participant VS as Vector Store
    UI->>Retr: retrieve_data(kb, search_query, include_metadata, include_embeddings)
    Retr->>Utils: load KB config, embeddings provider
    Retr->>VS: query (similarity_search[_with_score])
    alt include_metadata or include_embeddings
        Retr->>VS: collection.get(ids, include metadatas[, embeddings])
        VS-->>Retr: metadatas[, embeddings]
    end
    Retr->>Retr: build rows: content + optional metadata + optional _embeddings
    Retr-->>UI: DataFrame "Results"
```
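The row-building step in the last diagram can be sketched roughly as follows. This is a hedged illustration only: `Document` here is a minimal stand-in dataclass (the real component uses LangChain's `Document`), and `build_rows` plus its field names are assumptions drawn from the review comments below:

```python
from dataclasses import dataclass, field


@dataclass
class Document:  # stand-in for langchain's Document
    page_content: str
    metadata: dict = field(default_factory=dict)


def build_rows(results, *, include_metadata=False, id_to_embedding=None):
    """Turn (Document, distance) pairs into flat row dicts for a DataFrame."""
    rows = []
    for doc, distance in results:
        row = {"content": doc.page_content, "_score": -1 * distance}
        if include_metadata:
            row.update(doc.metadata)  # flatten stored metadata into the row
        if id_to_embedding is not None:
            row["_embeddings"] = id_to_embedding.get(doc.metadata.get("_id"))
        rows.append(row)
    return rows
```

Keeping this step pure (pairs in, dicts out) makes the optional-metadata and optional-embeddings branches easy to unit test in isolation.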
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (6)
src/backend/base/langflow/components/knowledge_bases/ingestion.py (4)
526-553: Potential UnboundLocalError: embedding_model/api_key used before assignment.

If embedding_metadata.json does not exist (or lacks keys), embedding_model and/or api_key are undefined when passed to _create_vector_store. This violates the PR objective of "raise on failures" and will surface as an unhelpful UnboundLocalError.
Apply this diff to initialize, validate, and fail fast with a clear error:
```diff
@@ async def build_kb_info(self) -> Data:
         # Read the embedding info from the knowledge base folder
         kb_path = await self._kb_path()
@@
         metadata_path = kb_path / "embedding_metadata.json"
+        embedding_model: str | None = None
+        api_key: str | None = None
@@
         if metadata_path.exists():
             settings_service = get_settings_service()
             metadata = json.loads(metadata_path.read_text())
             embedding_model = metadata.get("embedding_model")
             try:
                 api_key = decrypt_api_key(metadata["api_key"], settings_service)
             except (InvalidToken, TypeError, ValueError) as e:
                 logger.error(f"Could not decrypt API key. Please provide it manually. Error: {e}")
@@
         if self.api_key:
             api_key = self.api_key

         self._save_embedding_metadata(
             kb_path=kb_path,
             embedding_model=embedding_model,
             api_key=api_key,
         )
+
+        # Validate presence of embedding model (and api_key if required)
+        if not embedding_model:
+            raise ValueError(
+                "Embedding metadata not found for knowledge base. "
+                "Create the KB via the 'Create new knowledge' dialog "
+                "or ensure embedding_metadata.json exists."
+            )
```
342-381: Swallowing exceptions in _create_vector_store hides ingestion failures.

The PR states ingestion should raise on failures. _create_vector_store logs errors but does not propagate them, so build_kb_info may report success with zero documents.
Propagate exceptions after logging:
```diff
@@ async def _create_vector_store(self, df_source: pd.DataFrame, config_list: list[dict[str, Any]], embedding_model: str, api_key: str) -> None:
         try:
@@
         except (OSError, ValueError, RuntimeError) as e:
             self.log(f"Error creating vector store: {e}")
+            raise
```
416-454: Confusing/incorrect use of content vs identifier columns when building text and IDs.

The comment says "Build content text from identifier columns," but the code uses content_cols. Then page_content is reassigned from identifier_cols and reused for hashing. This conflates concerns and is error-prone. Suggested restructure:
- Build text_content from content_cols for embeddings.
- Build id_source from identifier_cols if present; else fall back to text_content.
- Hash id_source for _id.
```diff
@@ async def _convert_df_to_data_objects(self, df_source: pd.DataFrame, config_list: list[dict[str, Any]]) -> list[Data]:
-        for _, row in df_source.iterrows():
-            # Build content text from identifier columns using list comprehension
-            identifier_parts = [str(row[col]) for col in content_cols if col in row and pd.notna(row[col])]
-
-            # Join all parts into a single string
-            page_content = " ".join(identifier_parts)
-
-            # Build metadata from NON-vectorized columns only (simple key-value pairs)
-            data_dict = {
-                "text": page_content,  # Main content for vectorization
-            }
-
-            # Add identifier columns if they exist
-            if identifier_cols:
-                identifier_parts = [str(row[col]) for col in identifier_cols if col in row and pd.notna(row[col])]
-                page_content = " ".join(identifier_parts)
+        for _, row in df_source.iterrows():
+            # Build the text content from content (vectorized) columns
+            content_parts = [str(row[col]) for col in content_cols if col in row and pd.notna(row[col])]
+            text_content = " ".join(content_parts)
+
+            # Determine identifier source (prefer identifier cols; else fall back to text_content)
+            if identifier_cols:
+                id_parts = [str(row[col]) for col in identifier_cols if col in row and pd.notna(row[col])]
+                id_source = " ".join(id_parts)
+            else:
+                id_source = text_content
+
+            # Build metadata from NON-vectorized columns only (simple key-value pairs)
+            data_dict = {
+                "text": text_content,  # Main content to embed
+            }
@@
-            # Hash the page_content for unique ID
-            page_content_hash = hashlib.sha256(page_content.encode()).hexdigest()
+            # Hash the identifier source for unique ID
+            page_content_hash = hashlib.sha256(id_source.encode()).hexdigest()
             data_dict["_id"] = page_content_hash
```
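The dedup-key idea above, in isolation: hash a stable identifier source when one exists, otherwise fall back to the content itself. `row_id` is a hypothetical helper name used only to illustrate the suggestion:

```python
import hashlib
from typing import Optional


def row_id(text_content: str, identifier: Optional[str] = None) -> str:
    """Stable dedup key: hash the identifier source when present, else the content."""
    source = identifier if identifier else text_content
    return hashlib.sha256(source.encode()).hexdigest()
```

With this split, two rows that share an identifier (say, the same URL) collapse to one ID even when their content text differs, which is the behavior the duplicate-prevention test below relies on.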
635-642: Catching the wrong timeout exception during embedding validation.

On Python < 3.11, asyncio.wait_for raises asyncio.TimeoutError, which is distinct from the built-in TimeoutError, so the current except block won't trigger there. (Since Python 3.11 the two names are aliases.) Catching asyncio.TimeoutError is correct on every supported version.
Fix the except clause:
```diff
-        except TimeoutError as e:
+        except asyncio.TimeoutError as e:
             msg = "Embedding validation timed out. Please verify network connectivity and key."
             raise ValueError(msg) from e
```

src/backend/base/langflow/components/knowledge_bases/retrieval.py (1)
239-257: Bug: Constructing Data with arbitrary kwargs will likely fail validation.

Data expects its payload under the data field (see DataFrame.to_data_list usage across the codebase). Instantiating as Data(**kwargs) risks Pydantic validation errors and silent schema drift. Wrap the row dict in Data(data=...).
Apply this diff:
```diff
-            data_list.append(Data(**kwargs))
+            data_list.append(Data(data=kwargs))
```

src/backend/base/langflow/initial_setup/starter_projects/Knowledge Ingestion.json (1)
853-869: Potential UnboundLocalError and weak defaults in build_kb_info.

Inside the embedded component code (template.code.value), build_kb_info uses embedding_model and api_key variables that may be unset if metadata_path doesn't exist or decrypt fails. That can throw NameError/UnboundLocalError and makes error handling brittle.
Patch the method to initialize variables and fail loudly when metadata is missing; also normalize SecretStr to str:
```diff
@@
     async def build_kb_info(self) -> Data:
@@
         # Read the embedding info from the knowledge base folder
         kb_path = await self._kb_path()
         if not kb_path:
             msg = "Knowledge base path is not set. Please create a new knowledge base first."
             raise ValueError(msg)

         metadata_path = kb_path / "embedding_metadata.json"

-        # If the API key is not provided, try to read it from the metadata file
-        if metadata_path.exists():
-            settings_service = get_settings_service()
-            metadata = json.loads(metadata_path.read_text())
-            embedding_model = metadata.get("embedding_model")
-            try:
-                api_key = decrypt_api_key(metadata["api_key"], settings_service)
-            except (InvalidToken, TypeError, ValueError) as e:
-                logger.error(f"Could not decrypt API key. Please provide it manually. Error: {e}")
+        # Initialize defaults
+        embedding_model: str | None = None
+        api_key: str | None = None
+
+        # If the API key is not provided, try to read it from the metadata file
+        if metadata_path.exists():
+            settings_service = get_settings_service()
+            metadata = json.loads(metadata_path.read_text())
+            embedding_model = metadata.get("embedding_model")
+            try:
+                enc = metadata.get("api_key")
+                api_key = decrypt_api_key(enc, settings_service) if enc else None
+            except (InvalidToken, TypeError, ValueError) as e:
+                logger.error(f"Could not decrypt API key. Please provide it manually. Error: {e}")
+                api_key = None
+        else:
+            msg = "Embedding metadata not found. Create the knowledge base first from the dropdown dialog."
+            raise ValueError(msg)

         # Check if a custom API key was provided, update metadata if so
-        if self.api_key:
-            api_key = self.api_key
+        if self.api_key:
+            api_key = self.api_key.get_secret_value() if hasattr(self.api_key, "get_secret_value") else str(self.api_key)

         self._save_embedding_metadata(
             kb_path=kb_path,
             embedding_model=embedding_model,
             api_key=api_key,
         )

+        if not embedding_model:
+            raise ValueError("Embedding model unavailable. Metadata is missing or invalid.")
+
         # Create vector store following Local DB component pattern
         await self._create_vector_store(df_source, config_list, embedding_model=embedding_model, api_key=api_key)
```
🧹 Nitpick comments (19)
src/backend/tests/unit/base/data/test_kb_utils.py (4)
5-5: Rename the test class to reflect the new module name.

Keeps naming consistent with the new package.
```diff
-class TestKBUtils:
+class TestKnowledgeBaseUtils:
```
45-47: Strengthen the assertion to match the comment (tokenization doesn't stem "cats" → "cat").

As written, `assert scores[2] >= 0.0` always passes and doesn't validate the stated behavior.

```diff
-        # Third document contains both "cats" and "dogs", but case-insensitive matching should work
-        # Note: "cats" != "cat" exactly, so this tests the term matching behavior
-        assert scores[2] >= 0.0
+        # Third document contains "cats" and "dogs" (plural), which should NOT match "cat"/"dog"
+        # given simple whitespace tokenization without stemming.
+        assert scores[2] == 0.0
```
178-181: Make "different scores" assertions robust to edge cases and float quirks.

Direct list inequality can be brittle if parameters incidentally yield identical vectors; check for any meaningful element-wise difference instead.

```diff
-    # Scores should be different with different parameters
-    assert scores_default != scores_k1
-    assert scores_default != scores_b
+    # Scores should be different with different parameters (tolerate float quirks)
+    assert any(abs(a - b) > 1e-12 for a, b in zip(scores_default, scores_k1))
+    assert any(abs(a - b) > 1e-12 for a, b in zip(scores_default, scores_b))
```
1-459: (Optional) Consider relocating/renaming this test to mirror the new package path.

For long-term maintainability, consider moving the file to:
- src/backend/tests/unit/base/knowledge_bases/test_knowledge_base_utils.py
This keeps tests aligned with the refactor’s package structure.
If you’d like, I can draft the minimal PR-wide file move plan (including updating any import paths) to keep everything consistent.
src/frontend/src/utils/styleUtils.ts (1)
214-218: Add icon/color mappings for the new 'knowledge_bases' category to keep the UI consistent.

You added the category to SIDEBAR_CATEGORIES, but there are no corresponding entries in categoryIcons, nodeIconToDisplayIconMap, nodeColors, or nodeColorsName. Some UI surfaces use those maps; missing keys can cause fallback colors or blank icons.
Apply this diff to add consistent mappings:
```diff
--- a/src/frontend/src/utils/styleUtils.ts
+++ b/src/frontend/src/utils/styleUtils.ts
@@ export const nodeColors: { [char: string]: string } = {
@@
   Tool: "#00fbfc",
+  knowledge_bases: "#7c3aed", // violet-ish, distinct from vectorstores/retrievers
 };
@@ export const nodeColorsName: { [char: string]: string } = {
@@
   DataFrame: "pink",
+  knowledge_bases: "violet",
 };
@@ export const categoryIcons: Record<string, string> = {
@@
   toolkits: "Package2",
   tools: "Hammer",
+  knowledge_bases: "Package2",
   custom: "Edit",
   custom_components: "GradientInfinity",
 };
@@ export const nodeIconToDisplayIconMap: Record<string, string> = {
@@
   textsplitters: "Scissors",
   toolkits: "Package2",
   tools: "Hammer",
+  knowledge_bases: "Package2",
   custom_components: "GradientInfinity",
```

src/backend/base/langflow/components/knowledge_bases/ingestion.py (1)
285-290: Prefer an atomic write for embedding_metadata.json to avoid partial files.

A crash between write_text and flush could leave truncated JSON. Use a temp file + os.replace.
I can provide a small helper to write atomically using NamedTemporaryFile if you want it integrated here.
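For reference, one possible shape for such a helper — `write_json_atomic` is a hypothetical name, and fsync-before-replace is the conservative choice; the review only commits to offering something like this:

```python
import json
import os
import tempfile
from pathlib import Path


def write_json_atomic(path: Path, payload: dict) -> None:
    """Write JSON to a temp file in the target directory, then atomically replace.

    A crash mid-write leaves the previous file intact instead of truncated JSON.
    """
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            json.dump(payload, f)
            f.flush()
            os.fsync(f.fileno())  # make sure bytes hit disk before the rename
        os.replace(tmp_name, path)  # atomic on both POSIX and Windows
    except BaseException:
        os.unlink(tmp_name)
        raise
```

Creating the temp file in the same directory as the target matters: os.replace is only atomic within a single filesystem.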
src/backend/tests/unit/components/knowledge_bases/test_ingestion.py (3)
246-295: Good duplicate-prevention test; consider adding a variant with allow_duplicates=True.

Verifies skip on hash collision. Add a complementary test asserting both rows are returned when allow_duplicates=True.
If helpful, I can draft the test body.
103-108: Backward-compatibility mapping is missing.

Guidelines ask for file_names_mapping to keep version compatibility. Since this is a rename/move from KBIngestionComponent, provide a mapping from the old component file to the new one.
If you share the exact legacy path/class (e.g., langflow/components/data/kb_ingestion.py::KBIngestionComponent), I can propose the precise mapping snippet.
312-333: Successful-path test is solid; add failure-path coverage for missing metadata.

Consider a test where embedding_metadata.json is absent: build_kb_info should now raise ValueError (after the suggested fix), and the test should assert the error message.
Happy to provide the assertion once the error text is finalized.
src/backend/base/langflow/components/knowledge_bases/__init__.py (1)
19-31: Lazy importer is clean; consider minor introspection polish.

Return a sorted list from __dir__ for deterministic introspection order. No functional change.
```diff
-def __dir__() -> list[str]:
-    return list(__all__)
+def __dir__() -> list[str]:
+    return sorted(__all__)
```
90-95: Backward-compatibility mapping likely needed here, too.

As with the ingestion tests, add file_names_mapping to cover the rename from KBRetrievalComponent.
Share the old path/class and I’ll draft the mapping block.
369-385: Optional: Verify include_embeddings behavior end-to-end.

You can patch the underlying Chroma client to return embeddings and assert that results include an _embeddings key when include_embeddings=True and omit it when False.
I can draft a focused unit test that stubs Chroma._client.get_collection(...).get(...) to return deterministic vectors.
src/backend/base/langflow/components/knowledge_bases/retrieval.py (4)
71-77: Mismatch between "Include Embeddings" help text and behavior.

The input's info states embeddings are "Only applicable if 'Include Metadata' is enabled," but the code adds _embeddings regardless of include_metadata. Either enforce the dependency or clarify the description.
Two options:
- Update help text (simple, no behavior change):
```diff
-            info="Whether to include embeddings in the output. Only applicable if 'Include Metadata' is enabled.",
+            info="Whether to include embeddings in the output (adds an _embeddings field when available).",
```
- Or gate by include_metadata:
```diff
-        if self.include_embeddings:
+        if self.include_embeddings and self.include_metadata:
             kwargs["_embeddings"] = id_to_embedding.get(doc[0].metadata.get("_id"))
```

I recommend the first option: it keeps current flexibility while avoiding confusion.
Also applies to: 245-251
231-238: Avoid reliance on private chroma._client.

Accessing chroma._client is private API and brittle across langchain_chroma/chromadb versions. Prefer a public accessor if available, or guard with feature detection and clear error messages.
Example defensive pattern:
```python
# Try safer access first
collection_get = getattr(getattr(chroma, "_collection", None), "get", None)
if callable(collection_get):
    embeddings_result = chroma._collection.get(
        where={"_id": {"$in": doc_ids}}, include=["metadatas", "embeddings"]
    )
else:
    # Fallback to the client, with an explicit error when that is unavailable too
    if not hasattr(chroma, "_client"):
        raise RuntimeError("Underlying Chroma client not accessible to fetch embeddings safely.")
    collection = chroma._client.get_collection(name=self.knowledge_base)
    embeddings_result = collection.get(
        where={"_id": {"$in": doc_ids}}, include=["metadatas", "embeddings"]
    )
```

If the wrapper exposes a public get/collection accessor in your pinned version, prefer that instead.
206-222: Empty-query path uses similarity_search("").

When no search_query is provided, similarity_search("") may yield implementation-defined results. If the intended behavior is "top-k arbitrary/most recent rows," consider fetching via collection.get(limit=k) or a deterministic fallback, rather than embedding an empty string.
Sketch:
```python
if self.search_query:
    ...
else:
    # Deterministic fallback without an embeddings call
    collection = chroma._client.get_collection(name=self.knowledge_base)
    fetched = collection.get(limit=self.top_k, include=["documents", "metadatas"])
    results = [
        (Document(page_content=doc, metadata=meta), 0.0)
        for doc, meta in zip(fetched["documents"], fetched["metadatas"])
    ]
```

This also avoids an unnecessary embedding of "".
245-246: Score semantics: expose distance directly or a normalized similarity.

You invert the distance via -1 * doc[1], producing negative scores. Consider exposing both distance (lower is better) and a normalized similarity in [0, 1], or just the raw distance, to avoid confusion.
Example:
```diff
-            kwargs["_score"] = -1 * doc[1]
+            kwargs["distance"] = float(doc[1])
+            # Optional: bounded similarity
+            kwargs["similarity"] = 1.0 / (1.0 + float(doc[1]))
```
656-673: Clarify the "Include Embeddings" description in the template.

The description repeats the "Only applicable if 'Include Metadata' is enabled" constraint, which the backend does not enforce. To avoid UX confusion, update the info text here too.
Apply this diff:
```diff
-    "info": "Whether to include embeddings in the output. Only applicable if 'Include Metadata' is enabled.",
+    "info": "Whether to include embeddings in the output (adds an _embeddings field when available).",
```
853-869: Catching the wrong timeout exception in update_build_config.

On Python < 3.11, asyncio.wait_for raises asyncio.TimeoutError, which is not the built-in TimeoutError, so the current except can miss the timeout path there. (On 3.11+ the names are aliases, so it happens to work.) Catching asyncio.TimeoutError is correct on all versions.
Inside update_build_config, adjust:
```diff
-        except TimeoutError as e:
+        except asyncio.TimeoutError as e:
             msg = "Embedding validation timed out. Please verify network connectivity and key."
             raise ValueError(msg) from e
```
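A minimal probe illustrating the version nuance behind both timeout comments: asyncio.wait_for's timeout is caught by asyncio.TimeoutError on all versions, and on Python 3.11+ that name is the built-in TimeoutError itself:

```python
import asyncio
import sys


async def probe() -> str:
    try:
        await asyncio.wait_for(asyncio.sleep(10), timeout=0.01)
    except asyncio.TimeoutError:  # also matches built-in TimeoutError on 3.11+
        return "caught"
    return "no timeout"


print(asyncio.run(probe()))  # prints "caught"
if sys.version_info >= (3, 11):
    # Since 3.11 the two exception names are the same object.
    assert asyncio.TimeoutError is TimeoutError
```

So catching asyncio.TimeoutError is the safe spelling if the codebase still supports Python 3.10.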
933-939: Default column_config sets both "identifier" and "vectorize" to true.

This is valid (the ID is used for the dedup hash, the text for vectors); just calling it out. If users expect a distinct identifier (e.g., a URL), consider documenting an example row with a non-vectorized identifier column, but no change is required.
📜 Review details

Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📒 Files selected for processing (10)
- src/backend/base/langflow/components/data/__init__.py (0 hunks)
- src/backend/base/langflow/components/knowledge_bases/__init__.py (1 hunks)
- src/backend/base/langflow/components/knowledge_bases/ingestion.py (5 hunks)
- src/backend/base/langflow/components/knowledge_bases/retrieval.py (7 hunks)
- src/backend/base/langflow/initial_setup/starter_projects/Knowledge Ingestion.json (15 hunks)
- src/backend/base/langflow/initial_setup/starter_projects/Knowledge Retrieval.json (13 hunks)
- src/backend/tests/unit/base/data/test_kb_utils.py (1 hunks)
- src/backend/tests/unit/components/knowledge_bases/test_ingestion.py (6 hunks)
- src/backend/tests/unit/components/knowledge_bases/test_retrieval.py (8 hunks)
- src/frontend/src/utils/styleUtils.ts (1 hunks)
💤 Files with no reviewable changes (1)
- src/backend/base/langflow/components/data/__init__.py
🧬 Code graph analysis (6)
src/backend/tests/unit/base/data/test_kb_utils.py (1)
src/backend/base/langflow/base/knowledge_bases/knowledge_base_utils.py (2)
compute_bm25(53-109)compute_tfidf(10-50)
src/backend/base/langflow/components/knowledge_bases/ingestion.py (4)
  src/backend/base/langflow/base/knowledge_bases/knowledge_base_utils.py (1)
    get_knowledge_bases (112-137)
  src/backend/base/langflow/inputs/inputs.py (2)
    HandleInput (76-87)
    TableInput (38-73)
  src/backend/base/langflow/schema/data.py (1)
    Data (23-285)
  src/backend/base/langflow/schema/dataframe.py (1)
    DataFrame (11-206)
src/backend/tests/unit/components/knowledge_bases/test_retrieval.py (4)
  src/backend/base/langflow/base/knowledge_bases/knowledge_base_utils.py (1)
    get_knowledge_bases (112-137)
  src/backend/base/langflow/components/knowledge_bases/retrieval.py (2)
    KnowledgeRetrievalComponent (27-256)
    retrieve_data (173-256)
  src/backend/tests/base.py (1)
    ComponentTestBaseWithoutClient (163-164)
  src/backend/tests/unit/components/knowledge_bases/test_ingestion.py (5)
    component_class (16-18)
    mock_knowledge_base_path (21-24)
    mock_user_data (32-36)
    default_kwargs (62-101)
    mock_user_id (57-59)
src/backend/base/langflow/components/knowledge_bases/retrieval.py (2)
  src/backend/base/langflow/base/knowledge_bases/knowledge_base_utils.py (1)
    get_knowledge_bases (112-137)
  src/backend/base/langflow/schema/dataframe.py (1)
    DataFrame (11-206)
src/backend/tests/unit/components/knowledge_bases/test_ingestion.py (2)
  src/backend/base/langflow/base/knowledge_bases/knowledge_base_utils.py (1)
    get_knowledge_bases (112-137)
  src/backend/base/langflow/components/knowledge_bases/ingestion.py (1)
    KnowledgeIngestionComponent (41-668)
src/backend/base/langflow/components/knowledge_bases/__init__.py (3)
  src/backend/base/langflow/components/_importing.py (1)
    import_mod (8-37)
  src/backend/base/langflow/components/knowledge_bases/ingestion.py (1)
    KnowledgeIngestionComponent (41-668)
  src/backend/base/langflow/components/knowledge_bases/retrieval.py (1)
    KnowledgeRetrievalComponent (27-256)
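The lazy-export pattern listed above (import_mod driven by module-level __getattr__ and __dir__, per PEP 562) can be sketched generically; the names and target modules below are illustrative stand-ins, not the actual langflow mapping:

```python
import importlib
import sys
import types

# Illustrative name -> owning-module mapping; the real package maps component
# classes to its "ingestion" / "retrieval" submodules via import_mod.
_dynamic_imports = {"sqrt": "math", "dumps": "json"}

pkg = types.ModuleType("lazy_demo")


def _lazy_getattr(name: str):
    if name not in _dynamic_imports:
        raise AttributeError(f"module {pkg.__name__!r} has no attribute {name!r}")
    # Import the owning module only on first access, then cache the attribute
    # in the package namespace so later lookups bypass __getattr__ entirely.
    module = importlib.import_module(_dynamic_imports[name])
    value = getattr(module, name)
    setattr(pkg, name, value)
    return value


pkg.__getattr__ = _lazy_getattr                 # PEP 562: called for missing attributes
pkg.__dir__ = lambda: sorted(_dynamic_imports)  # keeps dir() / tab-completion honest
sys.modules[pkg.__name__] = pkg
```

First access of `pkg.sqrt` triggers the import; subsequent accesses hit the cached attribute, which is what keeps package import time low.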
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: test-starter-projects
🔇 Additional comments (15)
src/backend/tests/unit/base/data/test_kb_utils.py (1)
2-2: Import path refactor verified – ready to merge

All checks passed:
- No references to the old langflow.base.data.kb_utils path were found.
- The new module knowledge_base_utils.py exists under src/backend/base/langflow/base/knowledge_bases/.
- Both compute_tfidf and compute_bm25 are correctly defined in that module.

Great work moving these utilities into the new knowledge_bases package.

src/backend/tests/unit/components/knowledge_bases/test_ingestion.py (2)

212-223: Patch targets for settings/key encryption look correct. LGTM.

Mocks are scoped to the module under test and won’t leak. Good practice.
109-120: Validate behavior for list inputs (Data and DataFrame).

Given the component now accepts HandleInput with is_list=True for Data/DataFrame, add tests for:
- input_df: list[Data]
- input_df: list[pd.DataFrame]
- mixed list [Data, DataFrame] (if you adopt the defensive handling)
I can generate these unit tests with fixtures producing the data and asserting the concatenated row count.
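A minimal sketch of what those tests could assert, using a hypothetical FakeData stand-in and a normalize helper that mirrors the proposed concatenation behavior (real tests would instead use the component fixtures from ComponentTestBaseWithoutClient):

```python
import pandas as pd


class FakeData:
    """Minimal stand-in for langflow's Data object (illustrative only)."""

    def __init__(self, **fields):
        self.fields = fields

    def to_dataframe(self) -> pd.DataFrame:
        return pd.DataFrame([self.fields])


def normalize_input_df(value) -> pd.DataFrame:
    """Mirror of the proposed normalization: Data | DataFrame | list of either -> DataFrame."""
    if isinstance(value, FakeData):
        return value.to_dataframe()
    if isinstance(value, pd.DataFrame):
        return value
    if isinstance(value, list):
        frames = [normalize_input_df(item) for item in value]
        return pd.concat(frames, ignore_index=True)
    raise TypeError(f"Unsupported input type: {type(value).__name__}")


def test_list_of_data():
    df = normalize_input_df([FakeData(text="a"), FakeData(text="b")])
    assert len(df) == 2


def test_list_of_dataframes():
    df = normalize_input_df([pd.DataFrame({"text": ["a"]}), pd.DataFrame({"text": ["b", "c"]})])
    assert len(df) == 3


def test_mixed_list():
    df = normalize_input_df([FakeData(text="a"), pd.DataFrame({"text": ["b"]})])
    assert list(df["text"]) == ["a", "b"]
```

Each test asserts the concatenated row count, which is the failure mode the review flags for list[DataFrame] inputs.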
src/backend/tests/unit/components/knowledge_bases/test_retrieval.py (3)
96-109: KB listing tests are accurate and mirror ingestion-side expectations. LGTM.

Covers hidden dirs and multiple KBs correctly.

302-329: Great override test for user-supplied API key.

Using a real SecretStr ensures parity with runtime behavior.

330-342: Test for missing metadata is valuable; mirror the ingestion-side failure.

Once ingestion raises on missing metadata, both components will behave consistently. No action here; just noting the alignment.

src/backend/base/langflow/components/knowledge_bases/retrieval.py (2)

27-33: Renaming and output alignment look good

Component metadata (display_name/name/icon), output name/method (retrieve_data), and dynamic KB options via update_build_config align with the PR goals and frontend mapping.
Also applies to: 80-87, 89-101
1-25: Formatting/linting verification required

The automated check could not run in this environment because the make command was not found. Please verify locally that this module passes the repository’s formatting and linting standards by running:

make format_backend
make lint

src/backend/base/langflow/initial_setup/starter_projects/Knowledge Retrieval.json (4)
26-31: Edge wiring: KnowledgeRetrieval → ChatOutput is consistent

The source handle exposes DataFrame; the target ChatOutput accepts Data/DataFrame/Message. Connection metadata and ids look consistent.
603-617: Output port/method rename correctly reflected

Output port name retrieve_data and method retrieve_data match the backend component. Good alignment for lazy exports and UI.
715-737: Search query tool_mode enabled — good call

Marking search_query as tool_mode enables smoother interop with tool-calling models. No action needed.
535-561: Component identity updates are consistent

Module path, icon (“download”), display/title, and code_hash updates reflect the rename/move. No issues spotted.
src/backend/base/langflow/initial_setup/starter_projects/Knowledge Ingestion.json (3)
713-777: Rename and module relocation look correct

Node type/name/icon updated to Knowledge Ingestion with module langflow.components.knowledge_bases.ingestion.KnowledgeIngestionComponent. Starter graph metadata consistent.
940-960: Input flexibility: HandleInput for input_df (Data | DataFrame, list) is aligned with PR goals

This improves usability by accepting Data directly and lists. Matches the described auto-conversion behavior.
1-1122: No outdated references detected; please run formatting and lint manually

- A ripgrep search for KBIngestionComponent, KBRetrievalComponent, get_chroma_kb_data, chroma_kb_data, components.data.kb_retrieval, components.data.kb_ingestion, or kb_utils returned no matches.
- The make format_backend and make lint commands weren’t available in this environment; please run your project’s formatting and lint steps (e.g. via your Makefile, npm scripts, or CI commands) to ensure code style consistency.
```diff
 HandleInput(
     name="input_df",
-    display_name="Data",
-    info="Table with all original columns (already chunked / processed).",
+    display_name="Input",
+    info=(
+        "Table with all original columns (already chunked / processed). "
+        "Accepts Data or DataFrame. If Data is provided, it is converted to a DataFrame automatically."
+    ),
+    input_types=["Data", "DataFrame"],
+    is_list=True,
     required=True,
```
Input handling misses list[DataFrame] support; can crash downstream.
The HandleInput advertises input_types ["Data", "DataFrame"] with is_list=True, but build_kb_info only handles Data, list[Data], or single DataFrame. Passing list[DataFrame] leaves df_source as a Python list, breaking validation and ingestion.
Apply this diff to support list[DataFrame] (and mixed Data/DataFrame lists defensively):
```diff
@@ async def build_kb_info(self) -> Data:
-    if isinstance(self.input_df, Data):
-        df_source: pd.DataFrame = self.input_df.to_dataframe()
-    elif isinstance(self.input_df, list) and all(isinstance(item, Data) for item in self.input_df):
-        # If input_df is a list of Data objects, concatenate them into a single DataFrame
-        df_source: pd.DataFrame = pd.concat([item.to_dataframe() for item in self.input_df], ignore_index=True)
-    else:
-        df_source: pd.DataFrame = self.input_df
+    if isinstance(self.input_df, Data):
+        df_source: pd.DataFrame = self.input_df.to_dataframe()
+    elif isinstance(self.input_df, list):
+        # Accept lists of Data or DataFrame (or a mix); normalize to DataFrame
+        frames: list[pd.DataFrame] = []
+        for item in self.input_df:
+            if isinstance(item, Data):
+                frames.append(item.to_dataframe())
+            elif isinstance(item, pd.DataFrame):
+                frames.append(item)
+            else:
+                msg = f"Unsupported input type in list: {type(item).__name__}. Expected Data or DataFrame."
+                raise ValueError(msg)
+        if not frames:
+            raise ValueError("Empty input list provided for 'input_df'.")
+        df_source = pd.concat(frames, ignore_index=True)
+    elif isinstance(self.input_df, pd.DataFrame):
+        df_source = self.input_df
+    else:
+        msg = f"Unsupported input type for 'input_df': {type(self.input_df).__name__}. Expected Data or DataFrame."
+        raise ValueError(msg)
```
+ raise ValueError(msg)📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```python
HandleInput(
    name="input_df",
    display_name="Input",
    info=(
        "Table with all original columns (already chunked / processed). "
        "Accepts Data or DataFrame. If Data is provided, it is converted to a DataFrame automatically."
    ),
    input_types=["Data", "DataFrame"],
    is_list=True,
    required=True,
)

async def build_kb_info(self) -> Data:
    # … previous logic …
    if isinstance(self.input_df, Data):
        df_source: pd.DataFrame = self.input_df.to_dataframe()
    elif isinstance(self.input_df, list):
        # Accept lists of Data or DataFrame (or a mix); normalize to a single DataFrame
        frames: list[pd.DataFrame] = []
        for item in self.input_df:
            if isinstance(item, Data):
                frames.append(item.to_dataframe())
            elif isinstance(item, pd.DataFrame):
                frames.append(item)
            else:
                msg = (
                    f"Unsupported input type in list: {type(item).__name__}. "
                    "Expected Data or DataFrame."
                )
                raise ValueError(msg)
        if not frames:
            raise ValueError("Empty input list provided for 'input_df'.")
        df_source = pd.concat(frames, ignore_index=True)
    elif isinstance(self.input_df, pd.DataFrame):
        df_source = self.input_df
    else:
        msg = (
            f"Unsupported input type for 'input_df': {type(self.input_df).__name__}. "
            "Expected Data or DataFrame."
        )
        raise ValueError(msg)
    # … subsequent logic …
```
🤖 Prompt for AI Agents
In src/backend/base/langflow/components/knowledge_bases/ingestion.py around
lines 107-116, build_kb_info currently accepts ["Data","DataFrame"] with
is_list=True but only handles list[Data] or single DataFrame; this misses
list[DataFrame] and mixed lists causing df_source to remain a raw Python list.
Fix by normalizing incoming values: if input is a list, iterate elements and
classify each element as Data (leave as-is) or DataFrame (convert to Data by
records dicts or collect DataFrames); if all elements are DataFrames, concat
them into a single DataFrame and proceed down the single-DataFrame branch;
otherwise convert DataFrame elements to Data and treat the whole list as
list[Data] so existing list-Data handling works; add defensive type checks and
clear error messages for unsupported element types.
edwinjosechittilappilly left a comment
Code LGTM
please check the Icons in template vs sidebar
Codecov Report

❌ Patch coverage is

Additional details and impacted files

```
@@            Coverage Diff             @@
##           release-1.6.0    #9543   +/-   ##
================================================
  Coverage               ?   34.68%
================================================
  Files                  ?     1209
  Lines                  ?    57112
  Branches               ?     5419
================================================
  Hits                   ?    19812
  Misses                 ?    37156
  Partials               ?      144
```

Flags with carried forward coverage won't be shown.
(screenshots omitted)
This pull request refactors the knowledge base ingestion and retrieval components in Langflow, moving them out of the data module into a dedicated knowledge_bases module, and introduces dynamic/lazy importing for these components. It also updates the starter project configuration and improves input/output handling and error management for knowledge base operations.

Component Refactoring and Module Organization:
KBIngestionComponent and KBRetrievalComponent have been renamed to KnowledgeIngestionComponent and KnowledgeRetrievalComponent, respectively, and moved from langflow.components.data to a new langflow.components.knowledge_bases module for better code organization. [1] [2]

Dynamic Import and API Improvements:
The new knowledge_bases/__init__.py uses Python’s __getattr__ and __dir__ to lazily import components only when accessed, improving startup time and modularity.

Input/Output Handling Enhancements:
The ingestion component now uses HandleInput for the input dataframe, supports both Data and DataFrame types (including lists), and automatically converts Data objects to dataframes, making it more flexible for various input sources. [1] [2]

Error Handling Improvements:
Starter Project Updates:
Summary by CodeRabbit