feat: introduce component hash history #11311

Merged
jordanrfrazier merged 27 commits into main from
manage-hash-history-comp-index
Jan 20, 2026

Conversation

@jordanrfrazier
Collaborator

@jordanrfrazier jordanrfrazier commented Jan 15, 2026

Adds component hash history files for stable and nightly versions. These let us track Core Components across versions of Langflow, which in turn allows users to disable Custom Component execution.

Uses a simple version -> hash mapping. I decided against dealing with the complexity of allowed ranges for now -- the growth of these files (even for the nightly) will not be significant in the next year (~10 MB).

Note this also removes the previous work done to add the hash history to the existing component index -- it makes more sense to keep them separated.
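
For readers who want a concrete picture of the version -> hash mapping, here is a minimal sketch. It borrows the four-argument shape `update_history(history, component_name, code_hash, version)` quoted in the review below, but the body is an illustration, not the code merged in this PR. In particular, the conflict rule (a version that is already recorded may not change its hash) is an assumption.

```python
def update_history(history: dict, component_name: str, code_hash: str, version: str) -> dict:
    """Record code_hash for component_name at version (illustrative sketch only)."""
    # Create the entry on first sight of the component.
    entry = history.setdefault(component_name, {"versions": {}})
    existing = entry["versions"].get(version)
    if existing is not None and existing != code_hash:
        # Assumed conflict rule: an already recorded version keeps its hash.
        msg = f"{component_name}: version {version} is already recorded with a different hash"
        raise ValueError(msg)
    entry["versions"][version] = code_hash
    return history


history = {}
update_history(history, "MyComponent", "hash_v1", "0.3.0")
update_history(history, "MyComponent", "hash_v2", "0.4.0")
print(history)
```

Each release adds one key per component, which is why the files grow linearly with the number of versions rather than needing range logic.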

@coderabbitai
Contributor

coderabbitai bot commented Jan 15, 2026

Important

Review skipped

Auto incremental reviews are disabled on this repository.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Walkthrough

This pull request refactors component indexing and hash history handling. The index builder is simplified to create deterministic indexes from scratch without preserving history, while a new separate script manages version-based hash histories. Component metadata now includes a unique component_id attribute that propagates through the system.
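
As a rough illustration of what building a deterministic index hash from scratch can look like, the sketch below serializes a dictionary to canonical JSON and hashes it with SHA-256. The function name `hash_index` and the sample index layout are hypothetical. The actual normalization in scripts/build_component_index.py is not shown in this conversation and may differ.

```python
import hashlib
import json


def hash_index(index: dict) -> str:
    """Return a SHA-256 hex digest of a canonical JSON encoding of index."""
    # sort_keys and fixed separators make the output independent of dict ordering.
    canonical = json.dumps(index, sort_keys=True, separators=(",", ":"), ensure_ascii=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Two dicts with the same content but different key order hash identically.
a = {"entries": [["category", {"B": 1, "A": 2}]], "version": "0.3.0"}
b = {"version": "0.3.0", "entries": [["category", {"A": 2, "B": 1}]]}
assert hash_index(a) == hash_index(b)
```

Because nothing from a previous run feeds into the digest, rebuilding the same inputs yields the same hash.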

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Index building refactoring: scripts/build_component_index.py | Removed hash history loading, merging, and prior index loading machinery; packaging dependency guard removed. Build now deterministically normalizes and hashes index fresh each run, without preserving prior state. New public constant COMPONENT_INDEX_PATH designates output location. |
| New hash history builder: scripts/build_hash_history.py | New script providing version-aware hash history management with functions for version retrieval, JSON loading/saving, component import, and history updates. Supports both stable and nightly release tracks with version conflict detection. |
| Test updates: src/backend/tests/unit/test_build_hash_history.py | New test suite covering update_history versioning scenarios and main function behavior with mocked helpers. |
| Test cleanup: src/backend/tests/unit/test_component_index_hash_history.py | Removed 300+ lines of tests covering old hash history merging and index loading machinery. |
| Telemetry schema: src/backend/base/langflow/services/telemetry/schema.py | Removed component_id field from ComponentInputsPayload example. |
| Hash history asset: src/lfx/src/lfx/_assets/stable_hash_history.json | New static JSON registry mapping 2000+ components to version and hash metadata for stable releases. |
| Component ID attribute: src/lfx/src/lfx/custom/custom_component/custom_component.py | Added component_id: str \| None = None attribute documenting unique static identifier. |
| Metadata propagation: src/lfx/src/lfx/custom/utils.py | Extended build_component_metadata to propagate component_id into frontend node metadata when available. |

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~40 minutes

🚥 Pre-merge checks | ✅ 3 | ❌ 4
❌ Failed checks (1 error, 3 warnings)
| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Test Coverage For New Implementations | ❌ Error | Tests call update_history with 5 arguments but function signature takes 4; tests expect UUID-keyed history with 'name' field, but implementation uses component names without creating 'name' field for new entries; test validates component_id uniqueness contradicting PR summary stating component_id was removed. | Update test to match implementation: correct function calls to 4 arguments, use component names as keys, remove 'name' field assertions, and validate component name uniqueness instead of component_id. Fix implementation to extract component_id from metadata and include 'name' field in new entries. |
| Test Quality And Coverage | ⚠️ Warning | Tests call update_history() with 5 arguments but implementation signature accepts only 4; data structure mismatch on keying and 'name' field. | Fix test calls to match actual signature: update_history(history, component_name, code_hash, version); update assertions to check history[component_name]['versions'] structure. |
| Test File Naming And Structure | ⚠️ Warning | Test file has critical structural mismatches with actual implementation regarding function parameters and data structure keys. | Remove component_id parameter from update_history() calls; update assertions to use component_name as keys instead of component_id; remove test_all_real_component_ids_are_unique(). |
| Excessive Mock Usage Warning | ⚠️ Warning | Test file exhibits excessive mocking of internal logic functions (_import_components, load_hash_history, save_hash_history) in integration tests, obscuring actual behavior verification and causing test assertions to mismatch implementation details. | Remove mocks of internal logic functions from integration tests; keep only external dependency mocks; split into unit tests (pure logic without mocks) and integration tests (real internal calls); add tests verifying complete real flow. |
✅ Passed checks (3 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title accurately summarizes the main objective of the PR: introducing a component hash history tracking system across Langflow versions, which is the primary focus across multiple files and the stated PR objective. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%. |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

Comment @coderabbitai help to get the list of available commands and usage tips.

@github-actions github-actions bot added enhancement New feature or request and removed enhancement New feature or request labels Jan 15, 2026
Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 6

🤖 Fix all issues with AI agents
In `@scripts/build_hash_history.py`:
- Around line 62-66: The new-component branch that creates
history[component_name] currently only sets "versions"; update the block inside
the if component_name not in history check to also set a "name" key with the
human-readable component_name so new entries match existing
stable_hash_history.json shape (i.e., ensure history[component_name]["name"] =
component_name alongside history[component_name]["versions"] = {version_key:
code_hash}); locate and modify the creation logic referencing history,
component_name, version_key, and code_hash.
- Around line 108-120: The loop is using comp_name as the key when calling
update_history, but the history uses component UUIDs; change the call in the
loop to pass comp_details["metadata"]["component_id"] (instead of comp_name)
along with code_hash and current_version to update_history, and update the
update_history(function) signature and its docstring to indicate it expects a
component UUID (component_id) as the unique identifier rather than a component
name so existing lookups and writes use the UUID keys.

In `@src/backend/tests/unit/test_build_hash_history.py`:
- Around line 111-121: The test test_all_real_component_ids_are_unique is still
asserting uniqueness of metadata["component_id"] while the code now uses
comp_name (the dict key) as the identifier; either remove this test if
component_id is no longer used, or update it to collect comp_name keys from the
modules_dict returned by _import_components() (iterate modules_dict.items() and
collect the per-module dict keys) and assert the total list length equals the
set length to ensure comp_name uniqueness across categories; reference the
functions _import_components and main() and the variable comp_name when making
the change.
- Around line 79-109: The test_main_function assertions expect saved_history to
be keyed by component_id with a "name" field, but main() actually keys history
by component name and stores versions under that key; update the assertions in
test_main_function to check saved_history uses component names as keys (e.g.,
"MyComponent", "AnotherComponent", "ThirdComponent") and verify
saved_history["MyComponent"]["versions"]["0.1.0"] == "hash_v1" (and similarly
for the others) instead of checking for component_id keys and a separate "name"
field; also remove or reduce unnecessary patches (for example don't patch Path
if you can use tmp_path directly) so the test exercises more real behavior and
only mock true external dependencies like _import_components, load_hash_history,
save_hash_history, and get_lfx_version.
- Around line 43-76: The test must be updated to match the actual update_history
signature and storage key: call update_history with 4 args (history,
component_name, code_hash, current_version) instead of 5, replace all uses of
component_id as the lookup key with component_name (e.g. assert
history[component_name]["versions"]["0.3.0"] == code_hash_v1), remove assertions
expecting a history[...]["name"] field (implementation does not store it), and
ensure the ValueError check still uses the same version semantics but calls
update_history(history, component_name, code_hash_v1, "0.4.0") so checks target
the component_name-based history structure.

In `@src/lfx/src/lfx/custom/utils.py`:
- Around line 525-528: The hasattr(custom_component, "component_id") check is
ineffective because component_id is a class attribute and will always be
present; change the condition to verify the value is non-empty (e.g., if
custom_component.component_id is not None and custom_component.component_id !=
"") before assigning frontend_node.metadata["component_id"], and replace
logger.error with logger.warning to reflect that a missing component_id may be
expected for user-defined components; keep the same context variables
(custom_component, component_id, frontend_node.metadata, ctype_name, logger)
when implementing this change.
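
The last item above (the `hasattr` check in src/lfx/src/lfx/custom/utils.py) is easy to reproduce in isolation. The sketch below uses hypothetical stand-in classes and a hypothetical helper `metadata_for`, not the real `CustomComponent` or `build_component_metadata`. It shows why `hasattr` is always true for an attribute that has a class-level default, and how a value check behaves instead.

```python
from typing import Optional


class BaseComponent:
    # Class-level default, mirroring the attribute shape added in this PR.
    component_id: Optional[str] = None


class CoreComponent(BaseComponent):
    # Made-up identifier, for illustration only.
    component_id = "hypothetical-uuid-1234"


def metadata_for(component: BaseComponent) -> dict:
    """Build metadata, adding component_id only when it carries a value."""
    metadata = {}
    # hasattr(component, "component_id") is True for every instance,
    # so test the value rather than the attribute's existence.
    if component.component_id:
        metadata["component_id"] = component.component_id
    return metadata


print(hasattr(BaseComponent(), "component_id"))
print(metadata_for(BaseComponent()))
print(metadata_for(CoreComponent()))
```

With the value check, a user-defined component that never sets `component_id` simply produces metadata without the key.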
🧹 Nitpick comments (3)
scripts/build_hash_history.py (2)

12-16: Variable shadowing: version shadows the imported function.

The variable version on line 16 shadows the version function imported on line 14, which could cause confusion.

Proposed fix
 def get_lfx_version():
     """Get the installed lfx version."""
     from importlib.metadata import version
 
-    return version("lfx")
+    lfx_version = version("lfx")
+    return lfx_version

8-9: Consider using Path for consistency with build_component_index.py.

The history file paths are defined as strings, while build_component_index.py uses Path objects for COMPONENT_INDEX_PATH. Using Path consistently would improve maintainability.

Proposed fix
-STABLE_HISTORY_FILE = "src/lfx/src/lfx/_assets/stable_hash_history.json"
-NIGHTLY_HISTORY_FILE = "src/lfx/src/lfx/_assets/nightly_hash_history.json"
+STABLE_HISTORY_FILE = Path(__file__).parent.parent / "src" / "lfx" / "src" / "lfx" / "_assets" / "stable_hash_history.json"
+NIGHTLY_HISTORY_FILE = Path(__file__).parent.parent / "src" / "lfx" / "src" / "lfx" / "_assets" / "nightly_hash_history.json"

This would also eliminate the need for Path() wrappers on lines 106 and 122.
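
One practical benefit of the `Path(__file__)` form is that the result no longer depends on the current working directory. The sketch below factors the same construction into a helper so it can be exercised with an arbitrary script location. The helper name `asset_path` and the `/repo` prefix are hypothetical.

```python
from pathlib import Path


def asset_path(script_file: str, filename: str) -> Path:
    """Resolve an asset under src/lfx/src/lfx/_assets relative to a script in scripts/."""
    # scripts/<script>.py -> parent is scripts/, parent.parent is the repository root.
    repo_root = Path(script_file).resolve().parent.parent
    return repo_root / "src" / "lfx" / "src" / "lfx" / "_assets" / filename


stable = asset_path("/repo/scripts/build_hash_history.py", "stable_hash_history.json")
print(stable.name)
```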

src/backend/tests/unit/test_build_hash_history.py (1)

6-10: Consider avoiding sys.path manipulation for imports.

Modifying sys.path at runtime is fragile and can break if the directory structure changes. Consider either:

  1. Making scripts/ a proper installable package with a pyproject.toml or setup.py
  2. Using a conftest.py fixture to handle the path setup
  3. Adding the scripts directory to PYTHONPATH in the test configuration
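
As a sketch of the second option, a conftest.py can centralize the path setup so individual test modules stay free of `sys.path` edits. The helper name `add_scripts_dir` is hypothetical. How many directory levels separate a given conftest.py from the repository root depends on where the file lives, so the caller supplies the root.

```python
import sys
from pathlib import Path


def add_scripts_dir(repo_root: Path) -> Path:
    """Put <repo_root>/scripts at the front of sys.path once and return it."""
    scripts_dir = (repo_root / "scripts").resolve()
    # Guard against duplicates when several conftest files call this helper.
    if str(scripts_dir) not in sys.path:
        sys.path.insert(0, str(scripts_dir))
    return scripts_dir
```

A conftest.py would call it once at import time with its own computed repository root, for example from `Path(__file__).resolve().parents[n]` with `n` chosen for that file's depth.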
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 4a673cf and 9cec63c.

⛔ Files ignored due to path filters (1)
  • uv.lock is excluded by !**/*.lock
📒 Files selected for processing (9)
  • scripts/build_component_index.py
  • scripts/build_hash_history.py
  • src/backend/base/langflow/services/telemetry/schema.py
  • src/backend/tests/unit/test_build_hash_history.py
  • src/backend/tests/unit/test_component_index_hash_history.py
  • src/lfx/src/lfx/_assets/component_index.json
  • src/lfx/src/lfx/_assets/stable_hash_history.json
  • src/lfx/src/lfx/custom/custom_component/custom_component.py
  • src/lfx/src/lfx/custom/utils.py
💤 Files with no reviewable changes (2)
  • src/backend/tests/unit/test_component_index_hash_history.py
  • src/backend/base/langflow/services/telemetry/schema.py
🧰 Additional context used
📓 Path-based instructions (3)
src/backend/**/*.py

📄 CodeRabbit inference engine (.cursor/rules/backend_development.mdc)

src/backend/**/*.py: Use FastAPI async patterns with await for async operations in component execution methods
Use asyncio.create_task() for background tasks and implement proper cleanup with try/except for asyncio.CancelledError
Use queue.put_nowait() for non-blocking queue operations and asyncio.wait_for() with timeouts for controlled get operations

Files:

  • src/backend/tests/unit/test_build_hash_history.py
src/backend/tests/**/*.py

📄 CodeRabbit inference engine (.cursor/rules/testing.mdc)

src/backend/tests/**/*.py: Place backend unit tests in src/backend/tests/ directory, component tests in src/backend/tests/unit/components/ organized by component subdirectory, and integration tests accessible via make integration_tests
Use the same filename as the component with the appropriate test prefix/suffix (e.g., my_component.py → test_my_component.py)
Use the client fixture (FastAPI Test Client) defined in src/backend/tests/conftest.py for API tests; it provides an async httpx.AsyncClient with automatic in-memory SQLite database and mocked environment variables. Skip client creation by marking test with @pytest.mark.noclient
Inherit from the correct ComponentTestBase family class located in src/backend/tests/base.py based on API access needs: ComponentTestBase (no API), ComponentTestBaseWithClient (needs API), or ComponentTestBaseWithoutClient (pure logic). Provide three required fixtures: component_class, default_kwargs, and file_names_mapping
Create comprehensive unit tests for all new backend components. If unit tests are incomplete, create a corresponding Markdown file documenting manual testing steps and expected outcomes
Test both sync and async code paths, mock external dependencies appropriately, test error handling and edge cases, validate input/output behavior, and test component initialization and configuration
Use @pytest.mark.asyncio decorator for async component tests and ensure async methods are properly awaited
Test background tasks using asyncio.create_task() and verify completion with asyncio.wait_for() with appropriate timeout constraints
Test queue operations using non-blocking queue.put_nowait() and asyncio.wait_for(queue.get(), timeout=...) to verify queue processing without blocking
Use @pytest.mark.no_blockbuster marker to skip the blockbuster plugin in specific tests
For database tests that may fail in batch runs, run them sequentially using uv run pytest src/backend/tests/unit/test_database.py r...

Files:

  • src/backend/tests/unit/test_build_hash_history.py
**/test_*.py

📄 CodeRabbit inference engine (Custom checks)

**/test_*.py: Review test files for excessive use of mocks that may indicate poor test design - check if tests have too many mock objects that obscure what's actually being tested
Warn when mocks are used instead of testing real behavior and interactions, and suggest using real objects or test doubles when mocks become excessive
Ensure mocks are used appropriately for external dependencies only, not for core logic
Backend test files should follow the naming convention test_*.py with proper pytest structure
Test files should have descriptive test function names that explain what is being tested
Tests should be organized logically with proper setup and teardown
Consider including edge cases and error conditions for comprehensive test coverage
Verify tests cover both positive and negative scenarios where appropriate
For async functions in backend tests, ensure proper async testing patterns are used with pytest
For API endpoints, verify both success and error response testing

Files:

  • src/backend/tests/unit/test_build_hash_history.py
🧠 Learnings (10)
📚 Learning: 2025-06-26T19:43:18.260Z
Learnt from: ogabrielluiz
Repo: langflow-ai/langflow PR: 0
File: :0-0
Timestamp: 2025-06-26T19:43:18.260Z
Learning: In langflow custom components, the `module_name` parameter is now propagated through template building functions to add module metadata and code hashes to frontend nodes for better component tracking and debugging.

Applied to files:

  • src/lfx/src/lfx/custom/utils.py
📚 Learning: 2025-11-24T19:47:28.997Z
Learnt from: CR
Repo: langflow-ai/langflow PR: 0
File: .cursor/rules/testing.mdc:0-0
Timestamp: 2025-11-24T19:47:28.997Z
Learning: Applies to src/backend/tests/**/*.py : Test component build config updates by calling `to_frontend_node()` to get the node template, then calling `update_build_config()` to apply configuration changes

Applied to files:

  • src/lfx/src/lfx/custom/utils.py
  • src/backend/tests/unit/test_build_hash_history.py
📚 Learning: 2025-11-24T19:47:28.997Z
Learnt from: CR
Repo: langflow-ai/langflow PR: 0
File: .cursor/rules/testing.mdc:0-0
Timestamp: 2025-11-24T19:47:28.997Z
Learning: Applies to src/backend/tests/**/*.py : Create comprehensive unit tests for all new backend components. If unit tests are incomplete, create a corresponding Markdown file documenting manual testing steps and expected outcomes

Applied to files:

  • src/backend/tests/unit/test_build_hash_history.py
📚 Learning: 2025-11-24T19:47:28.997Z
Learnt from: CR
Repo: langflow-ai/langflow PR: 0
File: .cursor/rules/testing.mdc:0-0
Timestamp: 2025-11-24T19:47:28.997Z
Learning: Applies to src/backend/tests/**/*.py : Test component versioning and backward compatibility using `file_names_mapping` fixture with `VersionComponentMapping` objects mapping component files across Langflow versions

Applied to files:

  • src/backend/tests/unit/test_build_hash_history.py
📚 Learning: 2025-11-24T19:47:28.997Z
Learnt from: CR
Repo: langflow-ai/langflow PR: 0
File: .cursor/rules/testing.mdc:0-0
Timestamp: 2025-11-24T19:47:28.997Z
Learning: Applies to src/backend/tests/**/*.py : Test both sync and async code paths, mock external dependencies appropriately, test error handling and edge cases, validate input/output behavior, and test component initialization and configuration

Applied to files:

  • src/backend/tests/unit/test_build_hash_history.py
📚 Learning: 2025-11-24T19:47:28.997Z
Learnt from: CR
Repo: langflow-ai/langflow PR: 0
File: .cursor/rules/testing.mdc:0-0
Timestamp: 2025-11-24T19:47:28.997Z
Learning: Applies to src/backend/tests/**/*.py : Use `monkeypatch` fixture to mock internal functions for testing error handling scenarios; validate error status codes and error message content in responses

Applied to files:

  • src/backend/tests/unit/test_build_hash_history.py
📚 Learning: 2025-11-24T19:47:28.997Z
Learnt from: CR
Repo: langflow-ai/langflow PR: 0
File: .cursor/rules/testing.mdc:0-0
Timestamp: 2025-11-24T19:47:28.997Z
Learning: Applies to src/backend/tests/**/*.py : Use predefined JSON flows and utility functions from `tests.unit.build_utils` (create_flow, build_flow, get_build_events, consume_and_assert_stream) for flow execution testing

Applied to files:

  • src/backend/tests/unit/test_build_hash_history.py
📚 Learning: 2025-11-24T19:47:28.997Z
Learnt from: CR
Repo: langflow-ai/langflow PR: 0
File: .cursor/rules/testing.mdc:0-0
Timestamp: 2025-11-24T19:47:28.997Z
Learning: Applies to src/backend/tests/**/*.py : Place backend unit tests in `src/backend/tests/` directory, component tests in `src/backend/tests/unit/components/` organized by component subdirectory, and integration tests accessible via `make integration_tests`

Applied to files:

  • src/backend/tests/unit/test_build_hash_history.py
📚 Learning: 2025-08-05T22:51:27.961Z
Learnt from: edwinjosechittilappilly
Repo: langflow-ai/langflow PR: 0
File: :0-0
Timestamp: 2025-08-05T22:51:27.961Z
Learning: The TestComposioComponentAuth test in src/backend/tests/unit/components/bundles/composio/test_base_composio.py demonstrates proper integration testing patterns for external API components, including real API calls with mocking for OAuth completion, comprehensive resource cleanup, and proper environment variable handling with pytest.skip() fallbacks.

Applied to files:

  • src/backend/tests/unit/test_build_hash_history.py
📚 Learning: 2025-12-19T18:04:08.938Z
Learnt from: Jkavia
Repo: langflow-ai/langflow PR: 11111
File: src/backend/tests/unit/api/v2/test_workflow.py:10-11
Timestamp: 2025-12-19T18:04:08.938Z
Learning: In the langflow-ai/langflow repository, pytest-asyncio is configured with asyncio_mode = 'auto' in pyproject.toml. This means you do not need to decorate test functions or classes with pytest.mark.asyncio; async tests are auto-detected and run by pytest-asyncio. When reviewing tests, ensure they rely on this configuration (i.e., avoid unnecessary pytest.mark.asyncio decorators) and that tests living under any tests/ path (e.g., src/.../tests/**/*.py) follow this convention. If a test explicitly requires a different asyncio policy, document it and adjust the config accordingly.

Applied to files:

  • src/backend/tests/unit/test_build_hash_history.py
🧬 Code graph analysis (2)
scripts/build_hash_history.py (1)
src/lfx/src/lfx/interface/components.py (1)
  • import_langflow_components (465-499)
src/backend/tests/unit/test_build_hash_history.py (1)
scripts/build_hash_history.py (3)
  • main (80-123)
  • update_history (54-77)
  • _import_components (31-51)
🔇 Additional comments (5)
src/lfx/src/lfx/custom/custom_component/custom_component.py (1)

75-76: LGTM!

The new component_id attribute is cleanly added following the existing pattern for class attributes. The type annotation (str | None = None) is appropriate for an optional identifier.

src/lfx/src/lfx/_assets/stable_hash_history.json (1)

1-2132: LGTM!

The stable hash history JSON structure is well-formed with consistent schema:

  • UUID keys provide stable, unique identifiers
  • Each entry contains a human-readable name and versions map
  • Initial seeding with version 0.3.0 establishes the baseline for tracking component changes across releases
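
To make the schema described above concrete, here is a made-up entry in that shape together with one way such a file could be consulted. The UUID, name, and hash value are invented for illustration. `is_known_hash` is a hypothetical helper, not something this PR is known to contain.

```python
# Hypothetical excerpt: every value below is made up for illustration.
history = {
    "00000000-0000-0000-0000-000000000001": {
        "name": "MyComponent",
        "versions": {"0.3.0": "hash_v1"},
    },
}


def is_known_hash(history: dict, component_id: str, code_hash: str) -> bool:
    """Return True if code_hash was recorded for component_id in any version."""
    entry = history.get(component_id)
    return entry is not None and code_hash in entry["versions"].values()


print(is_known_hash(history, "00000000-0000-0000-0000-000000000001", "hash_v1"))
print(is_known_hash(history, "00000000-0000-0000-0000-000000000001", "edited"))
```

A lookup along these lines is one plausible basis for telling core components apart from modified or custom code.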
scripts/build_hash_history.py (1)

89-98: LGTM on version/mode validation.

The logic correctly enforces that:

  • Nightly updates require a dev version
  • Stable updates require a non-dev version

This prevents accidental cross-contamination of history files.
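
The rule can be sketched as a small guard. This is an illustration under assumptions: the function name `check_track` is hypothetical, and the dev-version test is a simplified regular expression for a trailing PEP 440 `.devN` segment rather than a full version parser.

```python
import re

# Simplified check for a trailing PEP 440 dev segment such as "0.3.0.dev5".
_DEV_SUFFIX = re.compile(r"\.dev\d+$")


def check_track(version: str, *, nightly: bool) -> None:
    """Raise ValueError when the version does not belong to the selected track."""
    is_dev = bool(_DEV_SUFFIX.search(version))
    if nightly and not is_dev:
        raise ValueError(f"nightly history requires a dev version, got {version}")
    if not nightly and is_dev:
        raise ValueError(f"stable history requires a non-dev version, got {version}")


check_track("0.3.0", nightly=False)
check_track("0.3.0.dev5", nightly=True)
```

Failing fast here keeps a dev build from ever writing into the stable history file, and the reverse.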

scripts/build_component_index.py (1)

157-158: LGTM!

The COMPONENT_INDEX_PATH constant provides a well-defined, centralized location for the component index output. The path construction correctly navigates from the scripts directory (Path(__file__).parent.parent) to the repository root, then to src/lfx/src/lfx/_assets/component_index.json, which exists in the codebase. The usage pattern is clean, with the path defined at module level and safely used in main() where directory creation is handled with parents=True and exist_ok=True.

src/backend/tests/unit/test_build_hash_history.py (1)

12-41: LGTM on fixture structure.

The fixture correctly mirrors the nested structure of modules_dict with categories containing components with metadata.

✏️ Tip: You can disable this entire section by setting review_details to false in your review settings.

@github-actions github-actions bot added enhancement New feature or request and removed enhancement New feature or request labels Jan 16, 2026
@jordanrfrazier jordanrfrazier added this pull request to the merge queue Jan 16, 2026
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Jan 16, 2026
@jordanrfrazier jordanrfrazier added this pull request to the merge queue Jan 16, 2026
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Jan 16, 2026
@github-actions github-actions bot added enhancement New feature or request and removed enhancement New feature or request labels Jan 20, 2026
@jordanrfrazier jordanrfrazier added this pull request to the merge queue Jan 20, 2026
Merged via the queue into main with commit 50bab0a Jan 20, 2026
90 of 91 checks passed
@jordanrfrazier jordanrfrazier deleted the manage-hash-history-comp-index branch January 20, 2026 14:16

Labels

enhancement New feature or request

2 participants