Fix: Langfuse "parent run not found" error with agents using tools#10268
Fix: Langfuse "parent run not found" error with agents using tools#10268HzaRashid merged 6 commits intolangflow-ai:mainfrom
Conversation
|
Important Review skippedAuto incremental reviews are disabled on this repository. Please check the settings in the CodeRabbit UI or the You can disable this status message by setting the WalkthroughUpdates 12 starter project JSON files by changing the Agent component code_hash from Changes
Sequence DiagramsequenceDiagram
participant Agent as Agent Component
participant Executor as AgentExecutor
participant Tool as Tool Instance
rect rgb(200, 220, 240)
Note over Agent,Tool: New Shared Callbacks Flow
Agent->>Agent: _get_shared_callbacks()<br/>(initialize once)
Agent->>Executor: run_agent(...,<br/>callbacks=shared_callbacks)
Agent->>Tool: _get_tools() returns tools<br/>with shared_callbacks applied
Executor->>Tool: execute with shared_callbacks
Tool->>Tool: callbacks attribute updated<br/>with shared callbacks
end
rect rgb(240, 220, 200)
Note over Agent,Tool: Previous Flow (comparison)
Agent->>Executor: run_agent(...,<br/>callbacks=get_langchain_callbacks())
Agent->>Tool: _get_tools() creates tools
Note over Tool: Fresh callbacks retrieved<br/>at each call site
end
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes
Areas requiring extra attention:
Possibly related PRs
Suggested labels
Suggested reviewers
Pre-merge checks and finishing touches❌ Failed checks (1 warning)
✅ Passed checks (8 passed)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
e268e7e to
7706f7a
Compare
There was a problem hiding this comment.
Actionable comments posted: 0
🧹 Nitpick comments (1)
src/backend/tests/unit/services/tracing/test_langfuse_concurrency.py (1)
10-37: Consider addressing the Ruff ARG002 warnings.The
serializedandkwargsparameters inon_tool_start(line 27) are flagged as unused. While these are intentional for signature matching with the real callback, you can silence the warnings using_prefix or explicit# noqa: ARG002comments to keep the CI clean.Apply this diff to silence the warnings:
- def on_tool_start(self, serialized, input_str, *, run_id, parent_run_id=None, **kwargs): + def on_tool_start(self, _serialized, input_str, *, run_id, parent_run_id=None, **_kwargs): """Original implementation that would fail."""Or add
# noqa: ARG002to the line if you prefer to keep the original names.
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
src/backend/base/langflow/services/tracing/langfuse.py(3 hunks)src/backend/tests/unit/services/tracing/test_langfuse_concurrency.py(1 hunks)
🧰 Additional context used
📓 Path-based instructions (5)
{src/backend/**/*.py,tests/**/*.py,Makefile}
📄 CodeRabbit inference engine (.cursor/rules/backend_development.mdc)
{src/backend/**/*.py,tests/**/*.py,Makefile}: Run make format_backend to format Python code before linting or committing changes
Run make lint to perform linting checks on backend Python code
Files:
src/backend/tests/unit/services/tracing/test_langfuse_concurrency.pysrc/backend/base/langflow/services/tracing/langfuse.py
src/backend/tests/unit/**/*.py
📄 CodeRabbit inference engine (.cursor/rules/backend_development.mdc)
Test component integration within flows using create_flow, build_flow, and get_build_events utilities
Files:
src/backend/tests/unit/services/tracing/test_langfuse_concurrency.py
src/backend/tests/**/*.py
📄 CodeRabbit inference engine (.cursor/rules/testing.mdc)
src/backend/tests/**/*.py: Unit tests for backend code must be located in the 'src/backend/tests/' directory, with component tests organized by component subdirectory under 'src/backend/tests/unit/components/'.
Test files should use the same filename as the component under test, with an appropriate test prefix or suffix (e.g., 'my_component.py' → 'test_my_component.py').
Use the 'client' fixture (an async httpx.AsyncClient) for API tests in backend Python tests, as defined in 'src/backend/tests/conftest.py'.
When writing component tests, inherit from the appropriate base class in 'src/backend/tests/base.py' (ComponentTestBase, ComponentTestBaseWithClient, or ComponentTestBaseWithoutClient) and provide the required fixtures: 'component_class', 'default_kwargs', and 'file_names_mapping'.
Each test in backend Python test files should have a clear docstring explaining its purpose, and complex setups or mocks should be well-commented.
Test both sync and async code paths in backend Python tests, using '@pytest.mark.asyncio' for async tests.
Mock external dependencies appropriately in backend Python tests to isolate unit tests from external services.
Test error handling and edge cases in backend Python tests, including using 'pytest.raises' and asserting error messages.
Validate input/output behavior and test component initialization and configuration in backend Python tests.
Use the 'no_blockbuster' pytest marker to skip the blockbuster plugin in tests when necessary.
Be aware of ContextVar propagation in async tests; test both direct event loop execution and 'asyncio.to_thread' scenarios to ensure proper context isolation.
Test error handling by mocking internal functions using monkeypatch in backend Python tests.
Test resource cleanup in backend Python tests by using fixtures that ensure proper initialization and cleanup of resources.
Test timeout and performance constraints in backend Python tests using 'asyncio.wait_for' and timing assertions.
Test Langflow's Messag...
Files:
src/backend/tests/unit/services/tracing/test_langfuse_concurrency.py
**/@(test_*.py|*.test.@(ts|tsx))
📄 CodeRabbit inference engine (coderabbit-custom-pre-merge-checks-unique-id-file-non-traceable-F7F2B60C-1728-4C9A-8889-4F2235E186CA.txt)
**/@(test_*.py|*.test.@(ts|tsx)): Check if tests have too many mock objects that obscure what's actually being tested
Warn when mocks are used instead of testing real behavior and interactions
Suggest using real objects or test doubles when mocks become excessive
Ensure mocks are used appropriately for external dependencies, not core logic
Recommend integration tests when unit tests become overly mocked
Test files should have descriptive test function names explaining what is tested
Tests should be organized logically with proper setup and teardown
Include edge cases and error conditions for comprehensive coverage
Verify tests cover both positive and negative scenarios where appropriate
Tests should cover the main functionality being implemented
Ensure tests are not just smoke tests but actually validate behavior
For API endpoints, verify both success and error response testing
Files:
src/backend/tests/unit/services/tracing/test_langfuse_concurrency.py
**/test_*.py
📄 CodeRabbit inference engine (coderabbit-custom-pre-merge-checks-unique-id-file-non-traceable-F7F2B60C-1728-4C9A-8889-4F2235E186CA.txt)
**/test_*.py: Check that backend test files follow naming convention: test_.py
Backend tests should be named test_.py and follow proper pytest structure
For async Python code, ensure proper async testing patterns (pytest) are used
Backend tests should follow pytest conventions; frontend tests should use Playwright
Files:
src/backend/tests/unit/services/tracing/test_langfuse_concurrency.py
🧬 Code graph analysis (2)
src/backend/tests/unit/services/tracing/test_langfuse_concurrency.py (1)
src/backend/base/langflow/services/tracing/langfuse.py (5)
LangFuseTracer(26-184)_create_dummy_parent(187-204)on_tool_start(192-200)ready(51-52)get_langchain_callback(157-175)
src/backend/base/langflow/services/tracing/langfuse.py (1)
src/backend/tests/unit/services/tracing/test_langfuse_concurrency.py (1)
on_tool_start(27-36)
🪛 GitHub Actions: Ruff Style Check
src/backend/tests/unit/services/tracing/test_langfuse_concurrency.py
[error] 27-27: ARG002 Unused method argument: serialized. (ruff) Command: uv run --only-dev ruff check --output-format=github .
🪛 GitHub Check: Ruff Style Check (3.13)
src/backend/tests/unit/services/tracing/test_langfuse_concurrency.py
[failure] 306-306: Ruff (ARG002)
src/backend/tests/unit/services/tracing/test_langfuse_concurrency.py:306:68: ARG002 Unused method argument: mock_langfuse_client
[failure] 64-64: Ruff (ARG001)
src/backend/tests/unit/services/tracing/test_langfuse_concurrency.py:64:21: ARG001 Unused function argument: mock_langfuse_client
[failure] 32-32: Ruff (EM101)
src/backend/tests/unit/services/tracing/test_langfuse_concurrency.py:32:29: EM101 Exception must not use a string literal, assign to variable first
[failure] 32-32: Ruff (TRY003)
src/backend/tests/unit/services/tracing/test_langfuse_concurrency.py:32:19: TRY003 Avoid specifying long messages outside the exception class
[failure] 32-32: Ruff (TRY002)
src/backend/tests/unit/services/tracing/test_langfuse_concurrency.py:32:19: TRY002 Create your own exception
[failure] 27-27: Ruff (ARG002)
src/backend/tests/unit/services/tracing/test_langfuse_concurrency.py:27:85: ARG002 Unused method argument: kwargs
[failure] 27-27: Ruff (ARG002)
src/backend/tests/unit/services/tracing/test_langfuse_concurrency.py:27:29: ARG002 Unused method argument: serialized
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (10)
- GitHub Check: Lint Backend / Run Mypy (3.10)
- GitHub Check: Lint Backend / Run Mypy (3.12)
- GitHub Check: Run Backend Tests / Unit Tests - Python 3.10 - Group 2
- GitHub Check: Run Frontend Tests / Determine Test Suites and Shard Distribution
- GitHub Check: Run Backend Tests / Unit Tests - Python 3.10 - Group 4
- GitHub Check: Run Backend Tests / Unit Tests - Python 3.10 - Group 3
- GitHub Check: Run Backend Tests / Unit Tests - Python 3.10 - Group 5
- GitHub Check: Run Backend Tests / Unit Tests - Python 3.10 - Group 1
- GitHub Check: Run Backend Tests / Integration Tests - Python 3.10
- GitHub Check: Test Starter Templates
🔇 Additional comments (15)
src/backend/base/langflow/services/tracing/langfuse.py (4)
23-23: LGTM!The
Lockimport is correctly used in the_create_dummy_parentfunction to implement thread-safe parent span creation.
158-164: Clear documentation of the architectural workaround.The docstring effectively explains why the wrapper is necessary—Langfuse callbacks have an architectural issue where the agent's run_id is cleaned up before tools execute, causing "parent run not found" errors.
168-175: Verify that per-call lock creation aligns with your concurrency model.The current implementation creates a new wrapper with a fresh lock each time
get_langchain_callback()is called. This means:
- Each callback instance has its own lock
- Multiple concurrent callbacks won't synchronize with each other
- Only concurrent operations within the same callback instance are protected
This is likely correct if each flow/trace gets its own callback instance, but if multiple flows could share callbacks and need coordinated parent creation, you would need a shared lock at the class or module level.
Please confirm: is a new callback instance created per flow/trace, or can callbacks be shared across flows? If they're per-flow, the current approach is correct. If shared, you may need a different locking strategy.
187-204: Double-checked locking pattern is correctly implemented.The implementation properly handles concurrent parent creation:
- First check outside the lock (line 193) to avoid lock contention in the common case
- Lock acquisition (line 194) to serialize competing threads
- Second check inside the lock (line 195) to prevent duplicate creation
- Parent span creation with clear metadata (lines 196-199)
The shallow copy of
base_callback.__dict__(line 203) intentionally shares mutable state between the wrapper and original callback, preserving the callback's internal tracking.One minor note: The
instance_lockis created in the closure and will be unique per_create_dummy_parent()call, which is correct for per-instance synchronization.src/backend/tests/unit/services/tracing/test_langfuse_concurrency.py (11)
39-61: LGTM!The
mock_langfuse_clientfixture provides comprehensive mocking of the Langfuse client, trace, and span objects, including the language model callback retrieval. The health check mock ensures the tracer initialization succeeds.
64-78: Themock_langfuse_clientfixture dependency is correctly used.Ruff flags this as an unused argument (ARG001), but this is a pytest fixture dependency pattern where the fixture is automatically applied by pytest's dependency injection system. The presence of the parameter ensures the mock is set up before the tracer is created.
84-111: Excellent negative test for race conditions.This test effectively demonstrates that without the locking wrapper, concurrent tool starts fail with "parent run not found" errors when the parent span is missing.
113-141: Strong validation of the auto-creation mechanism.The test confirms that:
- Both concurrent tools succeed after wrapping
- The parent span is created exactly once (line 139)
- The auto-created parent is properly marked with metadata (line 140)
142-178: Thorough thread-based concurrency test.Using actual threads (not asyncio) to test the locking mechanism is excellent—it validates that the double-checked locking pattern works correctly with OS-level thread scheduling. The verification that only one parent is created despite 10 concurrent attempts (line 176) confirms the lock is effective.
180-218: Realistic simulation of the agent+tools issue.This test closely mirrors the real-world scenario described in issue #10242:
- Agent starts and is assigned a run_id
- Agent finishes and its run_id is removed (line 190)
- Tools continue executing with the now-missing parent_run_id (line 202)
- Wrapper auto-creates the missing parent, allowing tools to complete
220-231: LGTM!The test verifies the integration point—that
get_langchain_callback()returns a properly wrapped handler with the expected attributes.
232-261: Good sequential validation of the double-check pattern.This test verifies that after the first tool creates the parent, the second tool finds it without creating a duplicate. Combined with the concurrent tests (lines 142-178), this provides comprehensive coverage of the locking logic.
263-287: Valuable stress test for state integrity.Testing concurrent token streaming ensures that the wrapper doesn't introduce race conditions in other callback operations that access
wrapped_callback.runs.
292-304: LGTM!The integration test confirms that the tracer returns a
DummyParentwrapper (line 303), validating the end-to-end behavior.
306-322: Themock_langfuse_clientfixture dependency is correctly used.Similar to line 64, Ruff flags this as unused (ARG002), but it's a pytest fixture dependency. The fixture ensures the mock Langfuse client is active during the test.
480994b to
3a76ebc
Compare
|
ogabrielluiz
left a comment
There was a problem hiding this comment.
Hey @HzaRashid
Have you tried updating the Langfuse package?
will take a look! |
|
Hi @ogabrielluiz! Upgrading to v2.6.10 (from v2.53.9) does not fix the issue. Upgrading to a 3.x version would bring some breaking changes which I am not sure how to adapt our code for, perhaps due to my lack of familiarity with Langfuse. Maybe we can discuss on Slack? If upgrading to v3.x does not work, I can open an issue and/or pr on langfuse-python, and we can use the current wrapper solution as a temporary patch until its fixed by the langfuse team? |
I think this was an implementation by made by us. Could you elaborate what the breaking changes? It seems the code in in their langchain example(https://langfuse.com/integrations/frameworks/langchain) didn't change. I'm just worried about increasing complexity since this wasn't a problem until a few months ago. |
3a76ebc to
1889595
Compare
|
@coderabbitai redo the summary/walkthrough as it is outdated |
|
@HzaRashid, I'll regenerate the summary and walkthrough for this PR to reflect the latest changes. ✅ Actions performedSummary regeneration triggered. |
| for tool in tools or []: | ||
| if hasattr(tool, "callbacks"): | ||
| tool.callbacks = callbacks |
There was a problem hiding this comment.
nit: This could be a function.
045d03e to
a22998c
Compare
328f0af to
f2b3a77
Compare
…that the parent run is persisted during tool calls. chore: build component index refactor tool callback setting loop into a function chore: update component index recover block in base/agent.py add comments and use fresh callbacks when exposing agent as a tool improve comment in agent.py
5715f7a to
d3ab5d6
Compare
ogabrielluiz
left a comment
There was a problem hiding this comment.
Looks great @HzaRashid !
…angflow-ai#10268) * fix: Use the same langchain callbacks for the agent and its tools so that the parent run is persisted during tool calls. chore: build component index refactor tool callback setting loop into a function chore: update component index recover block in base/agent.py add comments and use fresh callbacks when exposing agent as a tool improve comment in agent.py * chore: update component index * chore: update component index * [autofix.ci] apply automated fixes * [autofix.ci] apply automated fixes (attempt 2/3) * [autofix.ci] apply automated fixes (attempt 3/3) --------- Co-authored-by: Hamza Rashid <hzaras@IBM-JZ7K264.localdomain> Co-authored-by: autofix-ci[bot] <114827586+autofix-ci[bot]@users.noreply.github.com>



Closes #10242, error logs found in #10058.
When using the agent component with Langfuse tracing, its
run_idis not persisted during tool calls, causing a"parent run not found"error in Langfuse'sLangchainCallbackHandler'son_tool_startmethod. This happens because when tool components get exposed as tools in theirget_toolsmethod,TracingService.get_langchain_callbacks()is run, and the correspondingget_langchain_callback()handler for Langfuse instantiates a newLangchainCallbackHandlerinstance, and inside of its_init_method it setsself.runs = {}, resulting in a missingparent_run_id(AgentExecutor). This is fixed by using the same Langchain callback for the agent and its tools instead of instantiating a new callback every time a tool component is exposed.Summary by CodeRabbit
Release Notes
New Features
Improvements