
feat: Docling components #8394

Merged
ogabrielluiz merged 27 commits into langflow-ai:main from dolfim-ibm:add-docling-component on Jun 24, 2025

Conversation

@dolfim-ibm (Contributor) commented Jun 6, 2025

  • Convert with Docling locally
  • Export DoclingDocument to Markdown, HTML, Text, DocTags
  • Chunk DoclingDocument
  • Docling Serve component


Summary by CodeRabbit

  • New Features

    • Added comprehensive Docling document processing components for loading, chunking, local and remote processing, and exporting with customizable options.
    • Integrated new Docling icon with eager and lazy loading support.
    • Introduced a new Docling sidebar entry in the user interface.
  • Chores

    • Updated dependencies to include Docling and enforced minimum version requirements for compatibility.

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
coderabbitai bot (Contributor) commented Jun 6, 2025

Important

Review skipped

Auto incremental reviews are disabled on this repository.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Walkthrough

The changes introduce a new "Docling" integration across backend and frontend. Backend additions include new components for chunking, inlining, exporting, loading, and remote processing of Docling documents, along with utility functions and dependency updates. Frontend changes add a Docling icon, update icon mappings, and extend sidebar bundles to include Docling.

Changes

Files/groups and change summary:

  • pyproject.toml: Added "docling>=2.36.1" dependency; introduced a [tool.uv] section to override the "python-pptx" version.
  • src/backend/base/langflow/components/docling/__init__.py: New module importing and exporting five Docling-related components.
  • src/backend/base/langflow/components/docling/chunk_docling_document.py: Added ChunkDoclingDocumentComponent for splitting documents into chunks using Docling chunkers.
  • src/backend/base/langflow/components/docling/docling_inline.py: Added DoclingInlineComponent for local document processing with configurable pipeline and OCR options.
  • src/backend/base/langflow/components/docling/export_docling_document.py: Added ExportDoclingDocumentComponent for exporting Docling documents in various formats.
  • src/backend/base/langflow/components/docling/load_docling_document.py: Added LoadDoclingDocumentComponent for loading DoclingDocument objects from JSON files.
  • src/backend/base/langflow/components/docling/docling_remote.py: Added DoclingRemoteComponent for remote processing of documents via the Docling Serve API with concurrency and retry logic.
  • src/backend/base/langflow/components/docling/_utils.py: Added utility function extract_docling_documents for extracting DoclingDocument objects from various input types.
  • src/frontend/src/icons/Docling/Docling.jsx: Added new React SVG component SvgDocling rendering the Docling icon.
  • src/frontend/src/icons/Docling/index.tsx: Added DoclingIcon component using forwardRef to wrap SvgDocling.
  • src/frontend/src/icons/eagerIconImports.ts: Imported DoclingIcon and added it to eagerIconsMapping.
  • src/frontend/src/icons/lazyIconImports.ts: Added lazy-loaded "Docling" entry to lazyIconsMapping.
  • src/frontend/src/utils/styleUtils.ts: Added Docling entry to the SIDEBAR_BUNDLES array for sidebar integration.

Sequence Diagram(s)

sequenceDiagram
    participant User
    participant Frontend
    participant Backend
    participant DoclingLib
    participant DoclingServeAPI

    User->>Frontend: Selects Docling feature (chunk, inline, export, load, remote)
    Frontend->>Backend: Sends request with files/data and Docling parameters
    alt Local processing
        Backend->>DoclingLib: Processes documents (chunking, inlining, exporting, loading)
        DoclingLib-->>Backend: Returns processed DoclingDocument(s) or export result
    else Remote processing
        Backend->>DoclingServeAPI: Sends base64 encoded files for async conversion
        DoclingServeAPI-->>Backend: Returns task IDs
        Backend->>DoclingServeAPI: Polls task status with retry logic
        DoclingServeAPI-->>Backend: Returns conversion results
        Backend->>DoclingLib: Validates and parses DoclingDocument JSON
    end
    Backend-->>Frontend: Returns processed data/results
    Frontend-->>User: Displays Docling results with Docling icon in sidebar
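The remote branch of the diagram (submit files, receive task IDs, poll with retry) can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the DoclingRemoteComponent code: `get_status` is a hypothetical callable standing in for the HTTP status request, the "success"/"failure" terminal states are assumed, and the timeout and backoff values are illustrative.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

# Assumed terminal task states; check the Docling Serve API for the real values.
TERMINAL_STATES = ("success", "failure")


class PollTimeoutError(TimeoutError):
    """Raised when a task does not reach a terminal state in time."""


def poll_task(
    task_id: str,
    get_status: Callable[[str], str],  # hypothetical: wraps the HTTP status call
    *,
    max_wait: float = 60.0,
    initial_delay: float = 0.5,
    max_delay: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> str:
    """Poll until a terminal state, backing off exponentially, for at most max_wait seconds."""
    deadline = clock() + max_wait
    delay = initial_delay
    while True:
        status = get_status(task_id)
        if status in TERMINAL_STATES:
            return status
        # Give up rather than sleeping past the overall deadline.
        if clock() + delay > deadline:
            msg = f"Task {task_id} did not finish within {max_wait}s"
            raise PollTimeoutError(msg)
        sleep(delay)
        delay = min(delay * 2, max_delay)


def poll_all(task_ids: list[str], get_status: Callable[[str], str],
             max_workers: int = 4, **kwargs) -> dict[str, str]:
    """Poll several tasks concurrently; a timeout in any task propagates."""
    ids = list(task_ids)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        statuses = pool.map(lambda tid: poll_task(tid, get_status, **kwargs), ids)
        return dict(zip(ids, statuses))


if __name__ == "__main__":
    calls: dict[str, int] = {"a": 0, "b": 0}

    def fake_status(task_id: str) -> str:
        # Each fake task reports "pending" twice, then "success".
        calls[task_id] += 1
        return "success" if calls[task_id] >= 3 else "pending"

    print(poll_all(["a", "b"], fake_status, initial_delay=0.01))
    # prints {'a': 'success', 'b': 'success'}
```

Injecting `sleep` and `clock` keeps the retry logic testable without real waiting, and capping the total wait mirrors the "maximum poll timeout" idea from the commit history.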

Suggested labels

size:XXL, lgtm


@github-actions github-actions bot added the enhancement New feature or request label Jun 6, 2025
Signed-off-by: DKL <dkl@zurich.ibm.com>
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
@dolfim-ibm dolfim-ibm marked this pull request as ready for review June 8, 2025 20:11
@dosubot dosubot bot added the size:XL This PR changes 500-999 lines, ignoring generated files. label Jun 8, 2025
coderabbitai bot (Contributor) left a comment

Actionable comments posted: 4

🧹 Nitpick comments (8)
pyproject.toml (1)

222-227: Note temporary override for python-pptx.

This override in [tool.uv] forces python-pptx>=1.0.2 to address compatibility with document processing. Consider adding a TODO to remove this once upstream fixes are released.

src/backend/base/langflow/components/docling/export_docling_document.py (4)

48-48: Fix typo in info text.

There's a typo in the info text: "betweek" should be "between".

-            info="Add this placeholder betweek pages in the markdown output.",
+            info="Add this placeholder between pages in the markdown output.",

66-127: Consider refactoring to reduce complexity.

The static analysis correctly identifies that this method has too many branches (16/12). The input validation logic could be extracted into a separate method to improve readability and maintainability.

Consider extracting the document extraction logic into a helper method:

+    def _extract_documents(self) -> list[DoclingDocument]:
+        from docling_core.types.doc import DoclingDocument
+        
+        if isinstance(self.data_inputs, DataFrame):
+            if not len(self.data_inputs):
+                msg = "DataFrame is empty"
+                raise TypeError(msg)
+            try:
+                return self.data_inputs[self.doc_key].to_list()
+            except Exception as e:
+                msg = f"Error extracting DoclingDocument from DataFrame: {e}"
+                raise TypeError(msg) from e
+        # ... rest of extraction logic
+        
     def export_document(self) -> list[Data]:
-        from docling_core.types.doc import DoclingDocument, ImageRefMode
-
-        documents: list[DoclingDocument] = []
-        # ... complex validation logic
+        from docling_core.types.doc import ImageRefMode
+        
+        documents = self._extract_documents()
         # ... export logic
🧰 Tools
🪛 Pylint (3.3.7)

[refactor] 66-66: Too many branches (16/12)

(R0912)
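The extraction refactor suggested above can be sketched as a small standalone helper. This is an illustration under assumptions, not the merged code (the PR later added extract_docling_documents in _utils.py): `Doc` and `Data` are hypothetical stand-ins for DoclingDocument and Langflow's Data, and a plain {column: values} dict stands in for a DataFrame.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class Doc:
    """Hypothetical stand-in for docling_core's DoclingDocument."""
    name: str


@dataclass
class Data:
    """Hypothetical stand-in for Langflow's Data (a wrapper around a dict)."""
    data: dict[str, Any] = field(default_factory=dict)


def _to_doc(item: Any, doc_key: str) -> Doc:
    """Convert one input item into a document, or raise TypeError."""
    if isinstance(item, Doc):
        return item
    if isinstance(item, Data):
        doc = item.data.get(doc_key)
        if isinstance(doc, Doc):
            return doc
        msg = f"Data item has no document under key {doc_key!r}"
        raise TypeError(msg)
    msg = f"Unsupported input type: {type(item).__name__}"
    raise TypeError(msg)


def extract_documents(inputs: Any, doc_key: str = "doc") -> list[Doc]:
    """Normalize a single item, a list, or a column mapping into a flat list."""
    if isinstance(inputs, dict):
        # A {column: values} mapping stands in for a DataFrame here.
        items = list(inputs.get(doc_key) or [])
        if not items:
            msg = f"No documents found in column {doc_key!r}"
            raise ValueError(msg)
    elif isinstance(inputs, list):
        items = inputs
    else:
        items = [inputs]
    return [_to_doc(item, doc_key) for item in items]
```

Pushing per-item conversion into `_to_doc` keeps each function's branch count small, which is what the R0912 warning measures; `export_document` would then only call the helper and run the export loop.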


122-122: Address the TODO comment.

The TODO indicates missing metadata functionality. This could enhance the exported data's usefulness.

Would you like me to help implement the metadata addition functionality or open a new issue to track this enhancement?


125-125: Improve error message accuracy.

The error message mentions "splitting text" but this method is exporting documents, not splitting them.

-            msg = f"Error splitting text: {e}"
+            msg = f"Error exporting document: {e}"
src/backend/base/langflow/components/docling/chunk_docling_document.py (2)

11-11: Fix typo in description.

There's a typo in the description: "DocumentDocument" should be "DoclingDocument".

-    description: str = "Use the DocumentDocument chunkers to split the document into chunks."
+    description: str = "Use the DoclingDocument chunkers to split the document into chunks."

45-46: Remove unused helper method.

The _docs_to_data method is defined but never used in this component. Consider removing it to reduce code clutter.

-    def _docs_to_data(self, docs) -> list[Data]:
-        return [Data(text=doc.page_content, data=doc.metadata) for doc in docs]
-
src/backend/base/langflow/components/docling/load_docling_document.py (1)

27-52: Consider the commented text export line.

The implementation is correct with proper error handling and local imports. However, there's a commented line for text export that might indicate incomplete functionality.

# "text": doc.export_to_markdown(),

Consider either removing this comment if the functionality isn't needed, or implementing it if it provides value to downstream components.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between e23e543 and 2c86c05.

⛔ Files ignored due to path filters (3)
  • src/frontend/package-lock.json is excluded by !**/package-lock.json
  • src/frontend/src/icons/Docling/Docling.svg is excluded by !**/*.svg
  • uv.lock is excluded by !**/*.lock
📒 Files selected for processing (11)
  • pyproject.toml (2 hunks)
  • src/backend/base/langflow/components/docling/__init__.py (1 hunks)
  • src/backend/base/langflow/components/docling/chunk_docling_document.py (1 hunks)
  • src/backend/base/langflow/components/docling/docling_inline.py (1 hunks)
  • src/backend/base/langflow/components/docling/export_docling_document.py (1 hunks)
  • src/backend/base/langflow/components/docling/load_docling_document.py (1 hunks)
  • src/frontend/src/icons/Docling/Docling.jsx (1 hunks)
  • src/frontend/src/icons/Docling/index.tsx (1 hunks)
  • src/frontend/src/icons/eagerIconImports.ts (2 hunks)
  • src/frontend/src/icons/lazyIconImports.ts (1 hunks)
  • src/frontend/src/utils/styleUtils.ts (1 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (4)
src/frontend/src/icons/Docling/index.tsx (1)
src/frontend/src/icons/Docling/Docling.jsx (1)
  • SvgDocling (1-336)
src/frontend/src/icons/eagerIconImports.ts (1)
src/frontend/src/icons/Docling/index.tsx (1)
  • DoclingIcon (4-9)
src/backend/base/langflow/components/docling/__init__.py (4)
src/backend/base/langflow/components/docling/chunk_docling_document.py (1)
  • ChunkDoclingDocumentComponent (9-119)
src/backend/base/langflow/components/docling/docling_inline.py (1)
  • DoclingInlineComponent (6-130)
src/backend/base/langflow/components/docling/export_docling_document.py (1)
  • ExportDoclingDocumentComponent (6-131)
src/backend/base/langflow/components/docling/load_docling_document.py (1)
  • LoadDoclingDocumentComponent (7-52)
src/backend/base/langflow/components/docling/chunk_docling_document.py (4)
src/backend/base/langflow/inputs/inputs.py (3)
  • DropdownInput (467-491)
  • HandleInput (76-87)
  • MessageTextInput (205-256)
src/backend/base/langflow/schema/data.py (1)
  • Data (23-275)
src/backend/base/langflow/schema/dataframe.py (1)
  • DataFrame (11-206)
src/backend/base/langflow/components/docling/export_docling_document.py (1)
  • as_dataframe (130-131)
🪛 Biome (1.9.4)
src/frontend/src/icons/Docling/index.tsx

[error] 6-6: Don't use '{}' as a type.

Prefer explicitly define the object shape. '{}' means "any non-nullable value".

(lint/complexity/noBannedTypes)

🪛 Pylint (3.3.7)
src/backend/base/langflow/components/docling/chunk_docling_document.py

[refactor] 48-48: Too many branches (16/12)

(R0912)

src/backend/base/langflow/components/docling/export_docling_document.py

[refactor] 66-66: Too many branches (16/12)

(R0912)

src/backend/base/langflow/components/docling/docling_inline.py

[refactor] 73-73: Too many local variables (17/15)

(R0914)

⏰ Context from checks skipped due to timeout of 90000ms (2)
  • GitHub Check: Optimize new Python code in this PR
  • GitHub Check: Update Starter Projects
🔇 Additional comments (11)
pyproject.toml (1)

130-130: Ensure Docling dependency version compatibility.

Verify that docling>=2.36.1 is compatible with the new components and does not introduce conflicts with existing dependencies.

src/frontend/src/utils/styleUtils.ts (1)

259-259: Register Docling in sidebar bundles.

The new { display_name: "Docling", name: "docling", icon: "Docling" } entry correctly integrates the Docling feature set into the sidebar.

src/frontend/src/icons/eagerIconImports.ts (2)

27-27: Import DoclingIcon for eager loading.

The new statement import { DoclingIcon } from "@/icons/Docling"; correctly adds Docling to the eager icon registry.


145-145: Map Docling to DoclingIcon.

Adding "Docling": DoclingIcon to eagerIconsMapping ensures the Docling icon is available for immediate rendering.

src/frontend/src/icons/lazyIconImports.ts (1)

73-74: Add lazy-loaded Docling icon entry.

The new entry "Docling": () => import("@/icons/Docling").then((mod) => ({ default: mod.DoclingIcon })) enables the Docling icon to be fetched on demand.

src/backend/base/langflow/components/docling/__init__.py (1)

1-11: Well-structured package initialization.

The package initialization follows Python best practices with clear imports and a properly defined __all__ list that matches the imported components. This provides a clean public API for the Docling components module.

src/frontend/src/icons/Docling/Docling.jsx (1)

1-338: Well-implemented SVG icon component.

The React component follows best practices with proper props spreading and scalable dimensions. The complex SVG graphics are well-structured with appropriate use of gradients, transformations, and embedded imagery for the Docling brand representation.

src/backend/base/langflow/components/docling/chunk_docling_document.py (1)

96-114: Excellent chunking implementation with rich metadata.

The chunking logic is well-implemented, using the contextualize method to enrich chunks and properly extracting metadata including document ID and item references. The error handling appropriately catches and re-raises exceptions with descriptive messages.
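The chunk-to-record mapping described above can be illustrated in plain Python. `Chunk`, `contextualize`, and `chunks_to_records` are hypothetical stand-ins: in Docling, `contextualize` is a chunker method that, as we understand it, enriches the chunk text with heading context; here it only joins headings and text, and the metadata keys are illustrative rather than the component's actual output schema.

```python
from dataclasses import dataclass, field


@dataclass
class Chunk:
    """Hypothetical stand-in for a Docling chunk."""
    text: str
    headings: list[str] = field(default_factory=list)
    item_refs: list[str] = field(default_factory=list)  # e.g. "#/texts/3"


def contextualize(chunk: Chunk) -> str:
    """Prepend the heading path so each chunk carries its section context."""
    return "\n".join([*chunk.headings, chunk.text])


def chunks_to_records(doc_id: str, chunks: list[Chunk]) -> list[dict]:
    """Turn chunks into flat records with the document ID and item references."""
    return [
        {
            "text": contextualize(chunk),
            "document_id": doc_id,
            "chunk_index": index,
            "doc_items": list(chunk.item_refs),
        }
        for index, chunk in enumerate(chunks)
    ]
```

Keeping the item references alongside the enriched text lets a downstream retriever point back to the exact document elements a chunk came from.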

src/backend/base/langflow/components/docling/load_docling_document.py (1)

1-26: LGTM! Well-structured component definition.

The component metadata, inheritance, and input/output definitions are properly implemented. The restriction to JSON files aligns with the component's purpose of loading DoclingDocument objects.

src/backend/base/langflow/components/docling/docling_inline.py (2)

48-71: LGTM! Well-configured input options.

The input definitions provide good configurability for different Docling pipelines and OCR engines while maintaining sensible defaults.


113-130: LGTM! Solid file processing and result handling.

The conversion logic correctly handles file filtering, processes documents through the converter, and properly maps results to Data objects with appropriate error handling.

Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
coderabbitai bot (Contributor) left a comment

Actionable comments posted: 0

♻️ Duplicate comments (1)
src/backend/base/langflow/components/docling/docling_inline.py (1)

72-129: 🛠️ Refactor suggestion

Address the complexity issue by refactoring converter setup.

The method has too many local variables (17/15 limit) as flagged by static analysis. This is the same issue identified in previous reviews.

Extract the pipeline configuration logic into separate class methods to reduce complexity:

+    def _create_standard_pipeline_options(self) -> "PdfPipelineOptions":
+        from docling.datamodel.pipeline_options import OcrOptions, PdfPipelineOptions
+        from docling.models.factories import get_ocr_factory
+        
+        pipeline_options = PdfPipelineOptions()
+        pipeline_options.do_ocr = self.ocr_engine != ""
+        
+        if pipeline_options.do_ocr:
+            ocr_factory = get_ocr_factory(allow_external_plugins=False)
+            ocr_options: OcrOptions = ocr_factory.create_options(kind=self.ocr_engine)
+            pipeline_options.ocr_options = ocr_options
+            
+        return pipeline_options
+
+    def _create_vlm_pipeline_options(self) -> "VlmPipelineOptions":
+        from docling.datamodel.pipeline_options import VlmPipelineOptions
+        return VlmPipelineOptions()
+
+    def _get_converter(self) -> "DocumentConverter":
+        from docling.datamodel.base_models import InputFormat
+        from docling.document_converter import DocumentConverter, FormatOption, PdfFormatOption
+        from docling.pipeline.vlm_pipeline import VlmPipeline
+        
+        if self.pipeline == "standard":
+            pipeline_options = self._create_standard_pipeline_options()
+            pdf_format_option = PdfFormatOption(pipeline_options=pipeline_options)
+        elif self.pipeline == "vlm":
+            pipeline_options = self._create_vlm_pipeline_options()
+            pdf_format_option = PdfFormatOption(pipeline_cls=VlmPipeline, pipeline_options=pipeline_options)
+        else:
+            msg = f"Unknown pipeline: {self.pipeline!r}"
+            raise ValueError(msg)
+        
+        format_options: dict[InputFormat, FormatOption] = {
+            InputFormat.PDF: pdf_format_option,
+            InputFormat.IMAGE: pdf_format_option,
+        }
+        
+        return DocumentConverter(format_options=format_options)

     def process_files(self, file_list: list[BaseFileComponent.BaseFile]) -> list[BaseFileComponent.BaseFile]:
         from docling.datamodel.base_models import ConversionStatus
-        
-        def _get_converter() -> DocumentConverter:
-            # Remove the nested function and complex logic
         
         file_paths = [file.path for file in file_list if file.path]
         
         if not file_paths:
             self.log("No files to process.")
             return file_list
         
-        converter = _get_converter()
+        converter = self._get_converter()
🧰 Tools
🪛 Pylint (3.3.7)

[refactor] 72-72: Too many local variables (17/15)

(R0914)
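The converter refactor above branches on the pipeline name. A table-driven variant, sketched here with plain dicts instead of Docling's PdfPipelineOptions and VlmPipelineOptions, makes an unknown pipeline name raise an explicit ValueError and keeps each builder small. `PIPELINE_BUILDERS`, the builder functions, and the option keys are hypothetical.

```python
from typing import Callable


def _standard_options(ocr_engine: str) -> dict:
    # OCR is enabled only when an engine name is given,
    # mirroring `do_ocr = self.ocr_engine != ""` in the suggestion.
    options: dict = {"pipeline": "standard", "do_ocr": ocr_engine != ""}
    if options["do_ocr"]:
        options["ocr_engine"] = ocr_engine
    return options


def _vlm_options(_ocr_engine: str) -> dict:
    # In this sketch the VLM pipeline ignores OCR settings.
    return {"pipeline": "vlm"}


PIPELINE_BUILDERS: dict[str, Callable[[str], dict]] = {
    "standard": _standard_options,
    "vlm": _vlm_options,
}


def build_pipeline_options(pipeline: str, ocr_engine: str = "") -> dict:
    """Look up the builder for a pipeline name and run it."""
    try:
        builder = PIPELINE_BUILDERS[pipeline]
    except KeyError:
        msg = f"Unknown pipeline {pipeline!r}; expected one of {sorted(PIPELINE_BUILDERS)}"
        raise ValueError(msg) from None
    return builder(ocr_engine)
```

Adding a pipeline then means registering one more builder rather than growing an if/elif chain, which also keeps the local-variable count of the calling method down.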

🧹 Nitpick comments (1)
src/backend/base/langflow/components/docling/docling_inline.py (1)

47-66: LGTM! Well-structured input configuration.

The inputs provide appropriate configuration options for Docling pipelines and OCR engines. The TODO comment indicates good planning for future extensibility.

Would you like me to help implement additional Docling options mentioned in the TODO comment?

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 2c86c05 and f847efd.

📒 Files selected for processing (1)
  • src/backend/base/langflow/components/docling/docling_inline.py (1 hunks)
🧰 Additional context used
🪛 Pylint (3.3.7)
src/backend/base/langflow/components/docling/docling_inline.py

[refactor] 72-72: Too many local variables (17/15)

(R0914)

🔇 Additional comments (3)
src/backend/base/langflow/components/docling/docling_inline.py (3)

1-13: LGTM! Clean component definition with proper metadata.

The imports are appropriate and the class definition follows good practices with comprehensive metadata including documentation URL and proper inheritance.


15-45: LGTM! Comprehensive file format support.

The VALID_EXTENSIONS list correctly covers a wide range of document formats supported by Docling, and the duplicate "png" issue from previous reviews has been resolved.
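Storing supported extensions in a frozenset makes an accidental duplicate such as a repeated "png" harmless and gives constant-time membership checks. The extension subset and `filter_supported` below are hypothetical illustrations, not the component's actual list or filtering code.

```python
from pathlib import Path

# Illustrative subset only; the duplicate "png" collapses into one entry.
VALID_EXTENSIONS = frozenset({"pdf", "docx", "pptx", "html", "md", "png", "jpg", "png"})


def filter_supported(paths: list[str], valid: frozenset[str] = VALID_EXTENSIONS) -> list[str]:
    """Keep paths whose lower-cased suffix (without the dot) is supported."""
    return [p for p in paths if Path(p).suffix.lower().lstrip(".") in valid]
```

Lower-casing the suffix means "report.PDF" and "report.pdf" are treated alike, and files without an extension are simply dropped.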


68-70: LGTM! Appropriate output configuration.

Simple and clean output configuration that properly extends the base component outputs.

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
@github-actions github-actions bot removed the enhancement New feature or request label Jun 9, 2025
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
@dosubot dosubot bot added size:XXL This PR changes 1000+ lines, ignoring generated files. and removed size:XL This PR changes 500-999 lines, ignoring generated files. labels Jun 23, 2025
@dolfim-ibm (Contributor, Author) commented:

@ogabrielluiz @rodrigosnader the PR is updated and ready for review from our side.

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
@dosubot dosubot bot added the lgtm This PR has been approved by a maintainer label Jun 24, 2025
@ogabrielluiz ogabrielluiz enabled auto-merge June 24, 2025 17:47
@ogabrielluiz ogabrielluiz added this pull request to the merge queue Jun 24, 2025
Merged via the queue into langflow-ai:main with commit 6631de2 Jun 24, 2025
75 of 76 checks passed
Khurdhula-Harshavardhan pushed a commit to JigsawStack/langflow that referenced this pull request Jul 1, 2025
* initial DoclingComponent

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* Correct Docling icon style properties.

Signed-off-by: DKL <dkl@zurich.ibm.com>

* add file_path

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* add load from json and export to various formats

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* add chunking component

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* Update src/backend/base/langflow/components/docling/docling_inline.py

Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>

* add Docling Serve component

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* apply some suggestions

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* Update src/backend/base/langflow/components/docling/_utils.py

Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>

* Update src/backend/base/langflow/components/docling/docling_remote.py

Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>

* add check for DoclingDocument in list

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* fix import

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* add maximum poll timeout and better checks for the retry logic

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* add updated starter_projects

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* refactor _get_converter

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* return only DataFrame

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* remove LoadDoclingDocument

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* more options in the chunk component

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* move docling imports

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* [autofix.ci] apply automated fixes

* move utils to langflow.base

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

---------

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
Signed-off-by: DKL <dkl@zurich.ibm.com>
Co-authored-by: DKL <dkl@zurich.ibm.com>
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
Co-authored-by: autofix-ci[bot] <114827586+autofix-ci[bot]@users.noreply.github.com>

Labels

  • enhancement: New feature or request
  • lgtm: This PR has been approved by a maintainer
  • size:XXL: This PR changes 1000+ lines, ignoring generated files.

5 participants