feat: introduce component hash history #11311
Walkthrough
This pull request refactors component indexing and hash history handling. The index builder is simplified to create deterministic indexes from scratch without preserving history, while a new separate script manages version-based hash histories. Component metadata now includes a unique
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~40 minutes
🚥 Pre-merge checks | ✅ 3 | ❌ 4
❌ Failed checks (1 error, 3 warnings)
✅ Passed checks (3 passed)
Actionable comments posted: 6
🤖 Fix all issues with AI agents
In `@scripts/build_hash_history.py`:
- Around line 62-66: The new-component branch that creates
history[component_name] currently only sets "versions"; update the block inside
the if component_name not in history check to also set a "name" key with the
human-readable component_name so new entries match existing
stable_hash_history.json shape (i.e., ensure history[component_name]["name"] =
component_name alongside history[component_name]["versions"] = {version_key:
code_hash}); locate and modify the creation logic referencing history,
component_name, version_key, and code_hash.
- Around line 108-120: The loop is using comp_name as the key when calling
update_history, but the history uses component UUIDs; change the call in the
loop to pass comp_details["metadata"]["component_id"] (instead of comp_name)
along with code_hash and current_version to update_history, and update the
update_history function signature and its docstring to indicate it expects a
component UUID (component_id) as the unique identifier rather than a component
name so existing lookups and writes use the UUID keys.
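Taken together, the two comments above describe history entries keyed by component UUID, each carrying a `name` and a `versions` map. A minimal sketch of that shape follows. The five-argument signature, the parameter order, and the placeholder UUID are assumptions for illustration, and any version validation the real script performs is not modeled.

```python
def update_history(history, component_id, component_name, code_hash, version_key):
    """Record code_hash for version_key under the component's UUID.

    Hypothetical sketch: the real script may validate versions or
    raise on conflicts; none of that is modeled here.
    """
    if component_id not in history:
        # New entries get the same shape as existing ones: name + versions.
        history[component_id] = {"name": component_name, "versions": {}}
    history[component_id]["versions"][version_key] = code_hash
    return history


# Placeholder UUID and hashes, for illustration only.
history = {}
update_history(history, "example-uuid-1", "MyComponent", "hash_v1", "0.3.0")
update_history(history, "example-uuid-1", "MyComponent", "hash_v2", "0.4.0")
```

Keying by UUID means a renamed component keeps its version history, while the stored `name` keeps the JSON file readable.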
In `@src/backend/tests/unit/test_build_hash_history.py`:
- Around line 111-121: The test test_all_real_component_ids_are_unique is still
asserting uniqueness of metadata["component_id"] while the code now uses
comp_name (the dict key) as the identifier; either remove this test if
component_id is no longer used, or update it to collect comp_name keys from the
modules_dict returned by _import_components() (iterate modules_dict.items() and
collect the per-module dict keys) and assert the total list length equals the
set length to ensure comp_name uniqueness across categories; reference the
functions _import_components and main() and the variable comp_name when making
the change.
- Around line 79-109: The test_main_function assertions expect saved_history to
be keyed by component_id with a "name" field, but main() actually keys history
by component name and stores versions under that key; update the assertions in
test_main_function to check saved_history uses component names as keys (e.g.,
"MyComponent", "AnotherComponent", "ThirdComponent") and verify
saved_history["MyComponent"]["versions"]["0.1.0"] == "hash_v1" (and similarly
for the others) instead of checking for component_id keys and a separate "name"
field; also remove or reduce unnecessary patches (for example don't patch Path
if you can use tmp_path directly) so the test exercises more real behavior and
only mock true external dependencies like _import_components, load_hash_history,
save_hash_history, and get_lfx_version.
- Around line 43-76: The test must be updated to match the actual update_history
signature and storage key: call update_history with 4 args (history,
component_name, code_hash, current_version) instead of 5, replace all uses of
component_id as the lookup key with component_name (e.g. assert
history[component_name]["versions"]["0.3.0"] == code_hash_v1), remove assertions
expecting a history[...]["name"] field (implementation does not store it), and
ensure the ValueError check still uses the same version semantics but calls
update_history(history, component_name, code_hash_v1, "0.4.0") so checks target
the component_name-based history structure.
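If the history ends up keyed by component name, the uniqueness check described above can be sketched as follows. This assumes `modules_dict` maps each category to a dict keyed by `comp_name`; the helper name `duplicate_component_names` and the sample data are hypothetical.

```python
from collections import Counter


def duplicate_component_names(modules_dict):
    """Return component names that appear in more than one category."""
    names = [
        comp_name
        for components in modules_dict.values()
        for comp_name in components
    ]
    return sorted(name for name, count in Counter(names).items() if count > 1)


# Illustrative data only; the real dict would come from _import_components().
modules_dict = {
    "category_a": {"MyComponent": {}, "AnotherComponent": {}},
    "category_b": {"ThirdComponent": {}, "MyComponent": {}},
}
duplicates = duplicate_component_names(modules_dict)
```

Returning the offending names, rather than only comparing list and set lengths, gives a more useful failure message when the assertion `duplicates == []` fails.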
In `@src/lfx/src/lfx/custom/utils.py`:
- Around line 525-528: The hasattr(custom_component, "component_id") check is
ineffective because component_id is a class attribute and will always be
present; change the condition to verify the value is non-empty (e.g., if
custom_component.component_id is not None and custom_component.component_id !=
"") before assigning frontend_node.metadata["component_id"], and replace
logger.error with logger.warning to reflect that a missing component_id may be
expected for user-defined components; keep the same context variables
(custom_component, component_id, frontend_node.metadata, ctype_name, logger)
when implementing this change.
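The point about `hasattr` can be seen in a small standalone sketch. The classes and the `attach_component_id` helper below are simplified stand-ins, not the actual lfx code, and the log message wording is an assumption.

```python
import logging
from typing import Optional

logger = logging.getLogger(__name__)


class CustomComponent:
    # Class-level default, so hasattr() is True for every instance.
    component_id: Optional[str] = None


class CoreComponent(CustomComponent):
    component_id = "example-uuid"  # placeholder value


def attach_component_id(custom_component, metadata, ctype_name):
    """Copy a non-empty component_id into metadata, otherwise warn."""
    component_id = custom_component.component_id
    if component_id:  # rejects both None and ""
        metadata["component_id"] = component_id
    else:
        # A missing ID may be expected for user-defined components.
        logger.warning("Component %s has no component_id", ctype_name)
    return metadata


user_meta = attach_component_id(CustomComponent(), {}, "UserDefined")
core_meta = attach_component_id(CoreComponent(), {}, "Core")
```

Here `hasattr(CustomComponent(), "component_id")` is true even though the value is `None`, which is why the check needs to look at the value itself.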
🧹 Nitpick comments (3)
scripts/build_hash_history.py (2)
12-16: Variable shadowing: `version` shadows the imported function.
The variable `version` on line 16 shadows the `version` function imported on line 14, which could cause confusion.
Proposed fix

     def get_lfx_version():
         """Get the installed lfx version."""
         from importlib.metadata import version

    -    return version("lfx")
    +    lfx_version = version("lfx")
    +    return lfx_version

8-9: Consider using `Path` for consistency with `build_component_index.py`.
The history file paths are defined as strings, while `build_component_index.py` uses `Path` objects for `COMPONENT_INDEX_PATH`. Using `Path` consistently would improve maintainability.
Proposed fix

    -STABLE_HISTORY_FILE = "src/lfx/src/lfx/_assets/stable_hash_history.json"
    -NIGHTLY_HISTORY_FILE = "src/lfx/src/lfx/_assets/nightly_hash_history.json"
    +STABLE_HISTORY_FILE = Path(__file__).parent.parent / "src" / "lfx" / "src" / "lfx" / "_assets" / "stable_hash_history.json"
    +NIGHTLY_HISTORY_FILE = Path(__file__).parent.parent / "src" / "lfx" / "src" / "lfx" / "_assets" / "nightly_hash_history.json"

This would also eliminate the need for `Path()` wrappers on lines 106 and 122.
src/backend/tests/unit/test_build_hash_history.py (1)
6-10: Consider avoiding `sys.path` manipulation for imports.
Modifying `sys.path` at runtime is fragile and can break if the directory structure changes. Consider either:
- Making `scripts/` a proper installable package with a `pyproject.toml` or `setup.py`
- Using a `conftest.py` fixture to handle the path setup
- Adding the scripts directory to `PYTHONPATH` in the test configuration
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
- `uv.lock` is excluded by `!**/*.lock`
📒 Files selected for processing (9)
- scripts/build_component_index.py
- scripts/build_hash_history.py
- src/backend/base/langflow/services/telemetry/schema.py
- src/backend/tests/unit/test_build_hash_history.py
- src/backend/tests/unit/test_component_index_hash_history.py
- src/lfx/src/lfx/_assets/component_index.json
- src/lfx/src/lfx/_assets/stable_hash_history.json
- src/lfx/src/lfx/custom/custom_component/custom_component.py
- src/lfx/src/lfx/custom/utils.py
💤 Files with no reviewable changes (2)
- src/backend/tests/unit/test_component_index_hash_history.py
- src/backend/base/langflow/services/telemetry/schema.py
🧰 Additional context used
📓 Path-based instructions (3)
src/backend/**/*.py
📄 CodeRabbit inference engine (.cursor/rules/backend_development.mdc)
src/backend/**/*.py: Use FastAPI async patterns with `await` for async operations in component execution methods
Use `asyncio.create_task()` for background tasks and implement proper cleanup with try/except for `asyncio.CancelledError`
Use `queue.put_nowait()` for non-blocking queue operations and `asyncio.wait_for()` with timeouts for controlled get operations
Files:
src/backend/tests/unit/test_build_hash_history.py
src/backend/tests/**/*.py
📄 CodeRabbit inference engine (.cursor/rules/testing.mdc)
src/backend/tests/**/*.py: Place backend unit tests in `src/backend/tests/` directory, component tests in `src/backend/tests/unit/components/` organized by component subdirectory, and integration tests accessible via `make integration_tests`
Use same filename as component with appropriate test prefix/suffix (e.g., `my_component.py` → `test_my_component.py`)
Use the `client` fixture (FastAPI Test Client) defined in `src/backend/tests/conftest.py` for API tests; it provides an async `httpx.AsyncClient` with automatic in-memory SQLite database and mocked environment variables. Skip client creation by marking test with `@pytest.mark.noclient`
Inherit from the correct `ComponentTestBase` family class located in `src/backend/tests/base.py` based on API access needs: `ComponentTestBase` (no API), `ComponentTestBaseWithClient` (needs API), or `ComponentTestBaseWithoutClient` (pure logic). Provide three required fixtures: `component_class`, `default_kwargs`, and `file_names_mapping`
Create comprehensive unit tests for all new backend components. If unit tests are incomplete, create a corresponding Markdown file documenting manual testing steps and expected outcomes
Test both sync and async code paths, mock external dependencies appropriately, test error handling and edge cases, validate input/output behavior, and test component initialization and configuration
Use `@pytest.mark.asyncio` decorator for async component tests and ensure async methods are properly awaited
Test background tasks using `asyncio.create_task()` and verify completion with `asyncio.wait_for()` with appropriate timeout constraints
Test queue operations using non-blocking `queue.put_nowait()` and `asyncio.wait_for(queue.get(), timeout=...)` to verify queue processing without blocking
Use `@pytest.mark.no_blockbuster` marker to skip the blockbuster plugin in specific tests
For database tests that may fail in batch runs, run them sequentially using `uv run pytest src/backend/tests/unit/test_database.py` r...
Files:
src/backend/tests/unit/test_build_hash_history.py
**/test_*.py
📄 CodeRabbit inference engine (Custom checks)
**/test_*.py: Review test files for excessive use of mocks that may indicate poor test design - check if tests have too many mock objects that obscure what's actually being tested
Warn when mocks are used instead of testing real behavior and interactions, and suggest using real objects or test doubles when mocks become excessive
Ensure mocks are used appropriately for external dependencies only, not for core logic
Backend test files should follow the naming convention test_*.py with proper pytest structure
Test files should have descriptive test function names that explain what is being tested
Tests should be organized logically with proper setup and teardown
Consider including edge cases and error conditions for comprehensive test coverage
Verify tests cover both positive and negative scenarios where appropriate
For async functions in backend tests, ensure proper async testing patterns are used with pytest
For API endpoints, verify both success and error response testing
Files:
src/backend/tests/unit/test_build_hash_history.py
🧠 Learnings (10)
📚 Learning: 2025-06-26T19:43:18.260Z
Learnt from: ogabrielluiz
Repo: langflow-ai/langflow PR: 0
File: :0-0
Timestamp: 2025-06-26T19:43:18.260Z
Learning: In langflow custom components, the `module_name` parameter is now propagated through template building functions to add module metadata and code hashes to frontend nodes for better component tracking and debugging.
Applied to files:
src/lfx/src/lfx/custom/utils.py
📚 Learning: 2025-11-24T19:47:28.997Z
Learnt from: CR
Repo: langflow-ai/langflow PR: 0
File: .cursor/rules/testing.mdc:0-0
Timestamp: 2025-11-24T19:47:28.997Z
Learning: Applies to src/backend/tests/**/*.py : Test component build config updates by calling `to_frontend_node()` to get the node template, then calling `update_build_config()` to apply configuration changes
Applied to files:
src/lfx/src/lfx/custom/utils.py
src/backend/tests/unit/test_build_hash_history.py
📚 Learning: 2025-11-24T19:47:28.997Z
Learnt from: CR
Repo: langflow-ai/langflow PR: 0
File: .cursor/rules/testing.mdc:0-0
Timestamp: 2025-11-24T19:47:28.997Z
Learning: Applies to src/backend/tests/**/*.py : Create comprehensive unit tests for all new backend components. If unit tests are incomplete, create a corresponding Markdown file documenting manual testing steps and expected outcomes
Applied to files:
src/backend/tests/unit/test_build_hash_history.py
📚 Learning: 2025-11-24T19:47:28.997Z
Learnt from: CR
Repo: langflow-ai/langflow PR: 0
File: .cursor/rules/testing.mdc:0-0
Timestamp: 2025-11-24T19:47:28.997Z
Learning: Applies to src/backend/tests/**/*.py : Test component versioning and backward compatibility using `file_names_mapping` fixture with `VersionComponentMapping` objects mapping component files across Langflow versions
Applied to files:
src/backend/tests/unit/test_build_hash_history.py
📚 Learning: 2025-11-24T19:47:28.997Z
Learnt from: CR
Repo: langflow-ai/langflow PR: 0
File: .cursor/rules/testing.mdc:0-0
Timestamp: 2025-11-24T19:47:28.997Z
Learning: Applies to src/backend/tests/**/*.py : Test both sync and async code paths, mock external dependencies appropriately, test error handling and edge cases, validate input/output behavior, and test component initialization and configuration
Applied to files:
src/backend/tests/unit/test_build_hash_history.py
📚 Learning: 2025-11-24T19:47:28.997Z
Learnt from: CR
Repo: langflow-ai/langflow PR: 0
File: .cursor/rules/testing.mdc:0-0
Timestamp: 2025-11-24T19:47:28.997Z
Learning: Applies to src/backend/tests/**/*.py : Use `monkeypatch` fixture to mock internal functions for testing error handling scenarios; validate error status codes and error message content in responses
Applied to files:
src/backend/tests/unit/test_build_hash_history.py
📚 Learning: 2025-11-24T19:47:28.997Z
Learnt from: CR
Repo: langflow-ai/langflow PR: 0
File: .cursor/rules/testing.mdc:0-0
Timestamp: 2025-11-24T19:47:28.997Z
Learning: Applies to src/backend/tests/**/*.py : Use predefined JSON flows and utility functions from `tests.unit.build_utils` (create_flow, build_flow, get_build_events, consume_and_assert_stream) for flow execution testing
Applied to files:
src/backend/tests/unit/test_build_hash_history.py
📚 Learning: 2025-11-24T19:47:28.997Z
Learnt from: CR
Repo: langflow-ai/langflow PR: 0
File: .cursor/rules/testing.mdc:0-0
Timestamp: 2025-11-24T19:47:28.997Z
Learning: Applies to src/backend/tests/**/*.py : Place backend unit tests in `src/backend/tests/` directory, component tests in `src/backend/tests/unit/components/` organized by component subdirectory, and integration tests accessible via `make integration_tests`
Applied to files:
src/backend/tests/unit/test_build_hash_history.py
📚 Learning: 2025-08-05T22:51:27.961Z
Learnt from: edwinjosechittilappilly
Repo: langflow-ai/langflow PR: 0
File: :0-0
Timestamp: 2025-08-05T22:51:27.961Z
Learning: The TestComposioComponentAuth test in src/backend/tests/unit/components/bundles/composio/test_base_composio.py demonstrates proper integration testing patterns for external API components, including real API calls with mocking for OAuth completion, comprehensive resource cleanup, and proper environment variable handling with pytest.skip() fallbacks.
Applied to files:
src/backend/tests/unit/test_build_hash_history.py
📚 Learning: 2025-12-19T18:04:08.938Z
Learnt from: Jkavia
Repo: langflow-ai/langflow PR: 11111
File: src/backend/tests/unit/api/v2/test_workflow.py:10-11
Timestamp: 2025-12-19T18:04:08.938Z
Learning: In the langflow-ai/langflow repository, pytest-asyncio is configured with asyncio_mode = 'auto' in pyproject.toml. This means you do not need to decorate test functions or classes with pytest.mark.asyncio; async tests are auto-detected and run by pytest-asyncio. When reviewing tests, ensure they rely on this configuration (i.e., avoid unnecessary pytest.mark.asyncio decorators) and that tests living under any tests/ path (e.g., src/.../tests/**/*.py) follow this convention. If a test explicitly requires a different asyncio policy, document it and adjust the config accordingly.
Applied to files:
src/backend/tests/unit/test_build_hash_history.py
🧬 Code graph analysis (2)
scripts/build_hash_history.py (1)
src/lfx/src/lfx/interface/components.py (1)
import_langflow_components(465-499)
src/backend/tests/unit/test_build_hash_history.py (1)
scripts/build_hash_history.py (3)
main(80-123)
update_history(54-77)
_import_components(31-51)
🔇 Additional comments (5)
src/lfx/src/lfx/custom/custom_component/custom_component.py (1)
75-76: LGTM!
The new `component_id` attribute is cleanly added following the existing pattern for class attributes. The type annotation (`str | None = None`) is appropriate for an optional identifier.
src/lfx/src/lfx/_assets/stable_hash_history.json (1)
1-2132: LGTM!
The stable hash history JSON structure is well-formed with consistent schema:
- UUID keys provide stable, unique identifiers
- Each entry contains a human-readable `name` and `versions` map
- Initial seeding with version `0.3.0` establishes the baseline for tracking component changes across releases
scripts/build_hash_history.py (1)
89-98: LGTM on version/mode validation.
The logic correctly enforces that:
- Nightly updates require a dev version
- Stable updates require a non-dev version
This prevents accidental cross-contamination of history files.
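The rule described here can be sketched as a small guard. How the script actually detects a dev version is not shown in this review, so the `".dev" in version` test, the function name `check_version_matches_mode`, and the error messages are all assumptions.

```python
def check_version_matches_mode(version, nightly):
    """Raise ValueError if the version does not fit the requested history file.

    Assumption: a dev build is recognised by ".dev" in the version string.
    """
    is_dev = ".dev" in version
    if nightly and not is_dev:
        raise ValueError(f"Nightly history requires a dev version, got {version}")
    if not nightly and is_dev:
        raise ValueError(f"Stable history requires a non-dev version, got {version}")


check_version_matches_mode("0.3.0", nightly=False)       # accepted
check_version_matches_mode("0.3.0.dev12", nightly=True)  # accepted
```

Failing before any file is written is what keeps a dev hash out of the stable history and a release hash out of the nightly one.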
scripts/build_component_index.py (1)
157-158: LGTM!
The `COMPONENT_INDEX_PATH` constant provides a well-defined, centralized location for the component index output. The path construction correctly navigates from the scripts directory (`Path(__file__).parent.parent`) to the repository root, then to `src/lfx/src/lfx/_assets/component_index.json`, which exists in the codebase. The usage pattern is clean, with the path defined at module level and safely used in `main()` where directory creation is handled with `parents=True` and `exist_ok=True`.
src/backend/tests/unit/test_build_hash_history.py (1)
12-41: LGTM on fixture structure.
The fixture correctly mirrors the nested structure of `modules_dict` with categories containing components with metadata.
Adds component hash history files for stable and nightly versions. This will allow us to track Core Components across versions of Langflow, allowing users to disable Custom Component execution.
Uses a simple `version -> hash` mapping. I decided against dealing with the complexity of allowed ranges for now: the growth of these files (even for the nightly) will not be significant in the next year (~10 MB).
Note this also removes the previous work done to add the hash history to the existing component index; it makes more sense to keep them separated.
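As a rough illustration of how a `version -> hash` history could be used to tell core components from modified ones, here is a minimal sketch. The entry shape (UUID key, `name`, `versions`) follows the review's description of `stable_hash_history.json`; the helper `known_hashes`, the placeholder key and hashes, and the membership check are assumptions, not how Langflow consumes the file.

```python
import json

# Illustrative content only; the key, name, and hash are placeholders.
history_json = """
{
  "example-uuid-1": {
    "name": "MyComponent",
    "versions": {"0.3.0": "hash_v1"}
  }
}
"""


def known_hashes(history, version):
    """Collect the code hashes recorded for one version."""
    return {
        entry["versions"][version]
        for entry in history.values()
        if version in entry["versions"]
    }


history = json.loads(history_json)
allowed = known_hashes(history, "0.3.0")
core_ok = "hash_v1" in allowed        # hash recorded for 0.3.0
edited_ok = "edited_hash" in allowed  # unknown hash, not a recorded core component
```

A flat per-version lookup like this needs no range logic, which matches the decision to skip allowed ranges for now.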