feat: updates to Docling Remote and Chunker components by ricofurtado · Pull Request #11684 · langflow-ai/langflow

ricofurtado · 2026-02-09T20:46:44Z

This pull request adds comprehensive unit tests for the ChunkDoclingDocumentComponent to ensure correct handling of HybridChunker parameters and updates the component configuration in component_index.json to support new flags and improve usability. The main focus is on supporting and testing the new merge_peers and always_emit_headings options for chunking documents.

Component configuration enhancements:

Added merge_peers and always_emit_headings as configurable attributes for the ChunkDoclingDocumentComponent, including their default values and UI metadata. (src/lfx/src/lfx/_assets/component_index.json) [1] [2]
Updated the input template for chunker to include new input types and options, improving flexibility for document chunking. (src/lfx/src/lfx/_assets/component_index.json)
Set default value fields for several integer attributes to ensure proper initialization in the UI and backend. (src/lfx/src/lfx/_assets/component_index.json) [1] [2] [3] [4] [5]

Testing improvements:

Added unit tests to verify that the ChunkDoclingDocumentComponent correctly updates its build configuration based on chunker and provider selections, and that HybridChunker receives the appropriate flags for merge_peers and always_emit_headings. (src/backend/tests/unit/components/docling/test_chunk_docling_document_component.py)
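
A minimal sketch of the kind of test described above, using `unittest.mock` with a stand-in for the chunker class. The `build_chunker` helper is hypothetical and exists only for illustration; it is not taken from the component or from the test file in this PR.

```python
from unittest.mock import MagicMock


def build_chunker(chunker_cls, tokenizer, merge_peers, always_emit_headings):
    """Hypothetical helper: forward both flags to the chunker constructor as booleans."""
    return chunker_cls(
        tokenizer=tokenizer,
        merge_peers=bool(merge_peers),
        always_emit_headings=bool(always_emit_headings),
    )


def test_flags_are_forwarded():
    # The mock stands in for HybridChunker so no docling install is needed.
    fake_chunker_cls = MagicMock(name="HybridChunker")
    build_chunker(fake_chunker_cls, tokenizer="tok", merge_peers=1, always_emit_headings=0)
    # Truthy/falsy inputs should arrive as real booleans.
    fake_chunker_cls.assert_called_once_with(
        tokenizer="tok", merge_peers=True, always_emit_headings=False
    )
```

Mocking the constructor keeps the test focused on argument forwarding rather than on chunking behavior.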

…gDocumentComponent `pragma: allowlist secret`

coderabbitai · 2026-02-09T20:47:14Z

Important

Review skipped

Auto incremental reviews are disabled on this repository.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Walkthrough

Two new boolean input parameters (merge_peers and always_emit_headings) are added to the ChunkDoclingDocumentComponent with visibility logic that displays them only when HybridChunker is the active chunker. These parameters are passed to HybridChunker initialization during document processing.
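
The visibility logic in the walkthrough can be sketched roughly as follows. This is a standalone illustration on plain dictionaries, not the component's code; the `"show"` key and the function signature are assumptions about how the build config is shaped.

```python
# Fields that should only be visible while HybridChunker is selected.
HYBRID_ONLY_FIELDS = ("merge_peers", "always_emit_headings")


def update_build_config(build_config, field_value, field_name=None):
    """Toggle visibility of the HybridChunker-only inputs when the chunker changes."""
    if field_name == "chunker":
        show = field_value == "HybridChunker"
        for name in HYBRID_ONLY_FIELDS:
            build_config[name]["show"] = show
    return build_config


# Usage: start hidden, then select HybridChunker.
config = {name: {"show": False} for name in HYBRID_ONLY_FIELDS}
update_build_config(config, "HybridChunker", "chunker")
```

Any other chunker name hides both fields again, which also covers unknown values.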

Changes

Cohort / File(s)	Summary
New ChunkDoclingDocumentComponent inputs `src/lfx/src/lfx/_assets/component_index.json`, `src/lfx/src/lfx/components/docling/chunk_docling_document.py`	Added two new boolean inputs (`merge_peers`, `always_emit_headings`) with display names, descriptions, and default values. Extended build-config logic to show/hide these inputs when HybridChunker is active. Updated HybridChunker instantiation to receive these parameters from component state.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

🚥 Pre-merge checks | ✅ 1 | ❌ 6

❌ Failed checks (1 error, 4 warnings, 1 inconclusive)

Check name	Status	Explanation	Resolution
Test Coverage For New Implementations	❌ Error	PR introduces new functionality (merge_peers and always_emit_headings parameters) without any corresponding test coverage.	Add unit tests for parameter validation and integration tests verifying HybridChunker receives correct parameters. Address API incompatibility of always_emit_headings parameter.
Docstring Coverage	⚠️ Warning	Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.
Test Quality And Coverage	⚠️ Warning	PR adds two new boolean parameters to ChunkDoclingDocumentComponent but includes zero test files or test modifications.	Add unit tests validating parameter storage, visibility toggle logic, and HybridChunker instantiation; add integration tests for full chunk_documents() workflow.
Test File Naming And Structure	⚠️ Warning	The pull request adds two new component options (merge_peers and always_emit_headings) but includes no test files following standard naming conventions (test_*.py or *.test.ts).	Add test_chunk_docling_document.py with tests validating the new options are properly exposed, passed to HybridChunker, and handle edge cases including unsupported parameters.
Title check	⚠️ Warning	The title mentions generic updates to multiple components but the actual changes focus specifically on adding two new options (merge_peers and always_emit_headings) to ChunkDoclingDocumentComponent only.	Update the title to be more specific, such as: 'feat: add merge_peers and always_emit_headings options to ChunkDoclingDocumentComponent' to accurately reflect the primary changes.
Excessive Mock Usage Warning	❓ Inconclusive	No test files for ChunkDoclingDocumentComponent were found in the pull request to assess mock usage patterns.	Provide test file paths for ChunkDoclingDocumentComponent or clarify if this PR includes test coverage.

✅ Passed checks (1 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.

github-actions · 2026-02-09T20:48:47Z

Frontend Unit Test Coverage Report

Coverage Summary

Lines	Statements	Branches	Functions
	18.8% (6095/32404)	12.25% (3097/25275)	12.64% (879/6952)

Unit Test Results

Tests	Skipped	Failures	Errors	Time
2310	0 💤	0 ❌	0 🔥	32.267s ⏱️

codecov · 2026-02-09T20:49:49Z

Codecov Report

❌ Patch coverage is 5.47945% with 69 lines in your changes missing coverage. Please review.
✅ Project coverage is 35.30%. Comparing base (cb22542) to head (c315ced).
⚠️ Report is 4 commits behind head on main.

Files with missing lines	Patch %	Lines
src/lfx/src/lfx/inputs/inputs.py	5.47%	68 Missing and 1 partial ⚠️

❌ Your patch status has failed because the patch coverage (5.47%) is below the target coverage (40.00%). You can increase the patch coverage or adjust the target coverage.
❌ Your project status has failed because the head coverage (41.93%) is below the target coverage (60.00%). You can increase the head coverage or adjust the target coverage.
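
The patch figure can be reproduced from the numbers in the report, assuming patch coverage is computed as covered lines divided by all tracked changed lines (covered, missing, and partial). The covered count is not stated in the report; the sketch below infers it.

```python
missing, partial = 68, 1        # from the "Files with missing lines" row
reported_pct = 5.47945          # patch coverage quoted by Codecov
uncovered = missing + partial   # 69 lines, matching the headline message

# Find the smallest covered-line count that reproduces the reported percentage.
covered = next(
    c for c in range(1000)
    if abs(100 * c / (c + uncovered) - reported_pct) < 1e-5
)
total = covered + uncovered
```

Under that assumption, 4 of 73 changed lines are covered, which is far below the 40.00% patch target.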

Additional details and impacted files

@@            Coverage Diff             @@
##             main   #11684      +/-   ##
==========================================
- Coverage   35.33%   35.30%   -0.04%     
==========================================
  Files        1525     1525              
  Lines       73302    73365      +63     
  Branches    11025    11041      +16     
==========================================
  Hits        25898    25898              
- Misses      45991    46055      +64     
+ Partials     1413     1412       -1
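
As a quick check of the diff above, the project percentages follow from the hit, miss, and partial counts, assuming coverage is hits divided by total tracked lines:

```python
def coverage(hits, misses, partials):
    """Return (total lines, coverage percent rounded to two decimals)."""
    lines = hits + misses + partials
    return lines, round(100 * hits / lines, 2)


base = coverage(25898, 45991, 1413)   # main
head = coverage(25898, 46055, 1412)   # this PR
```

Hits are unchanged while the line count grows, so the added lines are almost entirely uncovered and the percentage drops.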

Flag	Coverage Δ
backend	`55.83% <ø> (-0.02%)`	⬇️
frontend	`16.98% <ø> (ø)`
lfx	`41.93% <5.47%> (-0.11%)`	⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Files with missing lines	Coverage Δ
src/lfx/src/lfx/inputs/inputs.py	`57.95% <5.47%> (-11.60%)`	⬇️

... and 11 files with indirect coverage changes

coderabbitai

Actionable comments posted: 2

🤖 Fix all issues with AI agents

In `@src/lfx/src/lfx/_assets/component_index.json`:
- Line 64090: Summary: remove the unsupported always_emit_headings parameter and
input. Fix: in ChunkDoclingDocumentComponent remove the Message/Bool Input
definition for "always_emit_headings" from the inputs list and remove any
build_config toggles referencing "always_emit_headings" in update_build_config;
also remove the argument always_emit_headings=bool(self.always_emit_headings)
passed into the HybridChunker() instantiation inside chunk_documents (and any
uses of self.always_emit_headings). References to change: the inputs list entry
named "always_emit_headings", the update_build_config branch that sets
build_config["always_emit_headings"][...] and the HybridChunker(...) call in
chunk_documents.

In `@src/lfx/src/lfx/components/docling/chunk_docling_document.py`:
- Around line 183-187: The instantiation of HybridChunker is passing an
unsupported parameter always_emit_headings which will raise a TypeError; remove
the always_emit_headings argument from the HybridChunker(...) call (leave
tokenizer=tokenizer and merge_peers=bool(self.merge_peers)), or if you intend to
control heading inclusion, replace it with the supported parameter
include_heading_hierarchy and pass the appropriate boolean (e.g.,
include_heading_hierarchy=bool(self.include_heading_hierarchy)) so the
HybridChunker call uses only valid kwargs.
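
One defensive pattern for the failure mode described above is to pass only the keyword arguments a constructor declares. The sketch below uses a stand-in class rather than the real HybridChunker, whose accepted parameters depend on the installed library version, and it does not handle constructors that accept `**kwargs`.

```python
import inspect


class StandInChunker:
    """Stand-in for a chunker whose constructor lacks always_emit_headings."""

    def __init__(self, tokenizer, merge_peers=True):
        self.tokenizer = tokenizer
        self.merge_peers = merge_peers


def supported_kwargs(cls, **candidates):
    """Keep only the candidates that appear by name in the constructor signature."""
    params = inspect.signature(cls).parameters
    return {key: value for key, value in candidates.items() if key in params}


kwargs = supported_kwargs(
    StandInChunker, tokenizer="tok", merge_peers=False, always_emit_headings=True
)
chunker = StandInChunker(**kwargs)  # no TypeError: the unknown flag was dropped
```

Silently dropping an option can hide a misconfiguration, so logging the discarded names may be preferable in practice.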

🧹 Nitpick comments (1)

src/lfx/src/lfx/_assets/component_index.json (1)

72454-72454: Unrelated dependency version bumps included in this PR.

Hunks 5–14 update google to 2.5.0 and vlmrun to 0.5.4 across multiple components. These changes are unrelated to the stated PR objective (adding merge_peers and always_emit_headings). Consider whether these should be in a separate PR for cleaner change tracking, or confirm they were intentionally bundled (e.g., via an index regeneration script).

src/lfx/src/lfx/_assets/component_index.json

src/lfx/src/lfx/components/docling/chunk_docling_document.py

…tComponent

…parameters

mpawlow

@ricofurtado

Code Review 1

See PR comments: (a), (b), (c)
Note: I did not perform a functional review

src/lfx/src/lfx/components/docling/chunk_docling_document.py

src/backend/tests/unit/components/docling/test_chunk_docling_document_component.py

…cases

Copilot

Pull request overview

This PR aims to add two new options to the ChunkDoclingDocumentComponent: merge_peers (to merge undersized chunks with shared metadata) and always_emit_headings (to emit headings for empty sections). However, the implementation is incomplete.

Changes:

Added merge_peers BoolInput parameter (fully implemented and working)
Added always_emit_headings BoolInput parameter (declared but not implemented)
Updated update_build_config to show/hide both parameters based on chunker selection
Added else clause for unknown chunker types (defensive programming improvement)
Updated component hash and metadata files
Added unit tests for build config behavior

Reviewed changes

Copilot reviewed 4 out of 5 changed files in this pull request and generated 5 comments.

File	Description
src/lfx/src/lfx/components/docling/chunk_docling_document.py	Added two BoolInput parameters, updated build config logic, passed merge_peers to HybridChunker, added else clause for unknown chunkers
src/lfx/src/lfx/_assets/stable_hash_history.json	Updated component hash from d84ce7ffc6cb to dfde83c23a83
src/lfx/src/lfx/_assets/component_index.json	Added merge_peers to field_order and field definitions, updated code hash and embedded code value
src/backend/tests/unit/components/docling/test_chunk_docling_document_component.py	Added tests for build config behavior with new parameters and merge_peers functionality

src/lfx/src/lfx/components/docling/chunk_docling_document.py

src/lfx/src/lfx/_assets/component_index.json

src/backend/tests/unit/components/docling/test_chunk_docling_document_component.py

src/lfx/src/lfx/components/docling/chunk_docling_document.py

mpawlow

@ricofurtado @edwinjosechittilappilly

Code Review 2

Approved / LGTM
See PR comment (2a) for a Minor concern

mpawlow · 2026-02-23T20:36:31Z

src/lfx/src/lfx/components/docling/chunk_docling_document.py

            info=("Which chunker to use."),
            value="HybridChunker",
            real_time_refresh=True,
+            input_types=["Message"],


(2a) [Minor] Verify that a piped-in Message value will not cause errors

Concern: The chunker dropdown value drives update_build_config. If a Message is connected to the input instead of a value being selected from the dropdown, the real_time_refresh mechanism and the build_config["chunker"]["value"] check in update_build_config may not work as expected.

This is a Minor severity comment. Please feel free to address or ignore it.
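
A minimal sketch of one way to guard against this concern: normalize the incoming value before comparing it to the known chunker names. `StandInMessage` and `resolve_choice` are hypothetical, and the assumption that a connected Message exposes its content as `.text` is not confirmed by this PR.

```python
from dataclasses import dataclass


@dataclass
class StandInMessage:
    """Stand-in for a message object that carries its content in .text."""
    text: str


def resolve_choice(value, allowed, default):
    """Accept a plain string or a message-like object; fall back to the default."""
    text = getattr(value, "text", value)
    if isinstance(text, str) and text.strip() in allowed:
        return text.strip()
    return default


ALLOWED = ("HybridChunker", "HierarchicalChunker")
choice = resolve_choice(StandInMessage("HierarchicalChunker"), ALLOWED, "HybridChunker")
```

Falling back to the default keeps the component usable when an unexpected value arrives at runtime.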

Cool!
Thanks Mike!

@HimavarshaVS let's send this to QA, and we can update accordingly?

@mpawlow True. The idea is that we would use a Message to connect it to a global variable, in case we want to switch using an API call at runtime.

This has been approved by QA. Once CI passes, we should be good to merge.