
feat: Add specialized handler for tool calling improvements #10857

Merged
Cristhianzl merged 21 commits into main from cz/improve-ibm-models
Jan 8, 2026

Conversation

@Cristhianzl
Member

@Cristhianzl Cristhianzl commented Dec 3, 2025

IBM WatsonX Tool Calling Improvements

Why This Was Needed

OpenAI models (GPT-4, etc.) were trained specifically for tool/function calling with a structured API. They understand when to call tools, how to format arguments as JSON, and when to stop and provide a final answer.

IBM WatsonX models (Granite, Llama, Mistral running on WatsonX) have different training and API behavior:

  • They weren't originally designed for native tool calling
  • The tool_choice parameter behaves differently than OpenAI's implementation
  • They tend to output tool calls as text (<tool_call>...</tool_call>) instead of using the structured API
  • They struggle with multi-step reasoning that requires chaining tool results

Problem

IBM WatsonX models had significant issues with tool calling in the Agent component:

  1. Models couldn't chain multiple tools - When asked questions requiring 2+ tools, the model would only call one or output tool calls as text
  2. Infinite loops with tool_choice='required' - Models couldn't provide final answers
  3. Text-based tool calls - Models output <tool_call> tags as text instead of using the native API
  4. Placeholder syntax - Models used <result-from-previous> instead of actual values
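To make the third failure mode concrete: a "text-based tool call" is just the call serialized into the model's reply instead of the structured API. A fallback parser along these lines (a hypothetical sketch for illustration, not code from this PR) could recover such calls:

```python
import json
import re

# Hypothetical sketch: WatsonX models sometimes emit tool calls as plain text,
# e.g. '<tool_call>{"name": ..., "arguments": ...}</tool_call>', instead of
# using the structured tool-calling API. This fallback recovers them.
TOOL_CALL_TAG = re.compile(r"<tool_call>\s*(\{.*?\})\s*</tool_call>", re.DOTALL)

def extract_text_tool_calls(content: str) -> list[dict]:
    """Recover tool calls emitted as <tool_call>...</tool_call> text."""
    calls = []
    for match in TOOL_CALL_TAG.finditer(content):
        try:
            calls.append(json.loads(match.group(1)))
        except json.JSONDecodeError:
            continue  # malformed JSON inside the tag; skip it
    return calls

raw = 'Sure. <tool_call>{"name": "search", "arguments": {"query": "weather"}}</tool_call>'
```

Feeding `raw` through `extract_text_tool_calls` yields the structured call the agent framework expected in the first place.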

Solution

Created a specialized handler (ibm_granite_handler.py) with:

  • Dynamic tool_choice switching: Forces tool_choice='required' for first N iterations to ensure tools are called, then switches to 'auto' to allow final answers
  • Single tool call enforcement: WatsonX models only support one tool at a time, so we keep only the first call
  • Placeholder detection: Detects and handles when models use placeholder syntax in arguments
  • Enhanced system prompts: Adds explicit instructions for proper tool usage
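The first bullet, dynamic tool_choice switching, can be sketched minimally as follows, assuming a LangChain-style `llm.bind_tools(...)` interface; the constant name and threshold below are illustrative, not the PR's actual values:

```python
# Illustrative sketch of dynamic tool_choice switching (assumed names; the
# real logic lives in create_granite_agent in ibm_granite_handler.py).
FORCED_TOOL_ITERATIONS = 2  # force tool use for the first N agent iterations

def select_tool_choice(iteration: int) -> str:
    """'required' early on, so tools are actually called; 'auto' afterwards,
    so the model is allowed to stop and produce a final answer."""
    return "required" if iteration < FORCED_TOOL_ITERATIONS else "auto"

def bind_for_iteration(llm, tools, iteration: int):
    # Re-bind the tools on every iteration with the tool_choice for that depth.
    return llm.bind_tools(tools, tool_choice=select_tool_choice(iteration))
```

The key property is that the binding is chosen per iteration rather than once at agent construction, which is what breaks the infinite loop caused by a fixed tool_choice='required'.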

Files Changed

  • src/lfx/src/lfx/components/langchain_utilities/ibm_granite_handler.py - New handler
  • src/lfx/src/lfx/components/langchain_utilities/tool_calling.py - Integration with Agent
  • src/backend/tests/unit/components/models_and_agents/test_ibm_granite_handler.py - Tests

Usage

The handler is automatically applied when using Granite models with the Tool Calling Agent component. No configuration needed.

@coderabbitai
Contributor

coderabbitai bot commented Dec 3, 2025

Important

Review skipped

Auto incremental reviews are disabled on this repository.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Walkthrough

This change introduces IBM WatsonX/Granite-specific tool calling support with model detection, enhanced system prompts for multi-tool scenarios, placeholder detection in tool arguments, and a dynamic agent creator that alternates tool-choice binding based on iteration depth. The feature integrates into the existing tool-calling pipeline and is covered by comprehensive unit tests.

Changes

Cohort / File(s) — Summary

  • IBM Granite Handler – New Module (src/lfx/src/lfx/components/langchain_utilities/ibm_granite_handler.py): New module providing WatsonX/Granite model detection (is_watsonx_model, is_granite_model), enhanced system prompt construction (get_enhanced_system_prompt), placeholder detection in tool arguments (detect_placeholder_in_args), and dynamic agent creation with tool-choice switching (create_granite_agent / create_watsonx_agent alias).
  • Tool Calling Integration (src/lfx/src/lfx/components/langchain_utilities/tool_calling.py): Integrates Granite handler imports and conditional logic: detects Granite models and enhances system prompts before building messages; routes to the Granite-specific agent creator when applicable, otherwise falls back to the default tool-calling agent.
  • Agent Base Refactoring (src/lfx/src/lfx/base/agents/agent.py): Removes the explicit empty input_dict initialization and replaces it with a comment; the existing population logic is unchanged.
  • Granite Handler Unit Tests (src/backend/tests/unit/components/models_and_agents/test_ibm_granite_handler.py): Comprehensive test module validating model detection strategies, prompt augmentation logic, placeholder pattern matching, Granite agent creation, dynamic iteration behavior, tool-choice binding, and edge cases across multiple test classes and parameterized scenarios.
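The detection strategy summarized above (class-name and module-name matching) can be sketched roughly as follows; `looks_like_watsonx` is an illustrative stand-in, not the module's actual `is_watsonx_model` implementation:

```python
# Rough sketch of the detection idea: match "watsonx" against the LLM's class
# name or module path. Names here are illustrative stand-ins, not the PR's code.
def looks_like_watsonx(llm) -> bool:
    cls = type(llm)
    haystacks = (cls.__name__, getattr(cls, "__module__", "") or "")
    return any("watsonx" in h.lower() for h in haystacks)

class ChatWatsonx:  # stand-in for the real langchain_ibm.ChatWatsonx class
    pass

class ChatOpenAI:  # stand-in for a non-WatsonX provider
    pass
```

Matching on the class rather than a model-name string keeps the check cheap and avoids needing the model instance to expose a model_id attribute.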

Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant ToolCalling
    participant GraniteHandler
    participant LLM
    participant Agent
    participant Tools

    Client->>ToolCalling: create_agent_runnable(llm, tools, prompt)
    
    rect rgb(200, 220, 240)
    Note over ToolCalling,GraniteHandler: Granite Model Detection & Prompt Enhancement
    ToolCalling->>GraniteHandler: is_granite_model(llm)
    GraniteHandler-->>ToolCalling: bool result
    
    alt Granite Model Detected
        ToolCalling->>GraniteHandler: get_enhanced_system_prompt(base_prompt, tools)
        GraniteHandler-->>ToolCalling: enhanced_prompt with tool instructions
    end
    end
    
    rect rgb(240, 220, 200)
    Note over ToolCalling,Agent: Agent Creation
    alt Granite Model with Tools
        ToolCalling->>GraniteHandler: create_granite_agent(llm, tools, prompt)
        
        loop For Each Iteration
            rect rgb(220, 240, 220)
            Note over GraniteHandler,LLM: Dynamic Tool Choice Binding
            GraniteHandler->>LLM: Select tool_choice (required/auto based on iteration)
            LLM->>LLM: Bind tools with selected tool_choice
            end
            
            GraniteHandler->>LLM: Invoke with input & agent_scratchpad
            LLM-->>GraniteHandler: tool_call or response
            
            GraniteHandler->>GraniteHandler: detect_placeholder_in_args(tool_calls)
            
            alt Placeholder Detected
                GraniteHandler->>GraniteHandler: Inject corrective SystemMessage
            end
            
            GraniteHandler->>Tools: Execute tool_calls
            Tools-->>GraniteHandler: tool results
        end
        
        GraniteHandler-->>ToolCalling: RunnableLambda agent
    else Default Model
        ToolCalling->>Agent: create_tool_calling_agent(llm, tools, prompt)
        Agent-->>ToolCalling: default agent
    end
    end
    
    ToolCalling-->>Client: agent_runnable

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~30–45 minutes

  • ibm_granite_handler.py: New module with multiple interconnected functions; the create_granite_agent function contains complex logic around dynamic LLM binding, iteration tracking, and placeholder detection that warrants careful review.
  • Placeholder detection patterns: The PLACEHOLDER_PATTERN regex and detect_placeholder_in_args logic should be validated against the test cases to ensure correctness.
  • Dynamic tool-choice switching: The iteration-depth-based toggling between tool_choice="required" and tool_choice="auto" in dynamic_invoke needs verification of state management and conditional routing.
  • Integration points in tool_calling.py: While straightforward, confirm that Granite-specific branching does not inadvertently affect non-Granite model paths.
  • Test file scope: The extensive test module (800+ lines) is thorough but should be spot-checked for coverage of edge cases, especially around nested/mock structures and mixed-case matching.
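To make the placeholder-detection review point concrete, here is a rough sketch of what such a check looks like. The regex below is an assumption modeled on the examples in this PR (e.g. "<result-from-previous>"), not the actual PLACEHOLDER_PATTERN from ibm_granite_handler.py, and it mirrors the documented limitation that nested structures are not inspected:

```python
import re

# Assumed pattern, modeled on placeholder examples from this PR; the real
# PLACEHOLDER_PATTERN in ibm_granite_handler.py may differ in detail.
PLACEHOLDER_PATTERN = re.compile(
    r"<[^<>]*(result|previous|output|data)[^<>]*>", re.IGNORECASE
)

def detect_placeholder_in_args(args) -> bool:
    """Top-level check only: dict string values and plain strings.
    Nested dicts/lists are deliberately not traversed (documented limitation)."""
    if isinstance(args, dict):
        return any(
            isinstance(v, str) and PLACEHOLDER_PATTERN.search(v)
            for v in args.values()
        )
    if isinstance(args, str):
        return bool(PLACEHOLDER_PATTERN.search(args))
    return False
```

Note that a keyword-based pattern like this would also match an HTML <data> tag, which is the false-positive risk flagged later in the review.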

Pre-merge checks and finishing touches

❌ Failed checks (1 inconclusive)
  • Excessive Mock Usage Warning — ❓ Inconclusive: the test file cannot be accessed to assess mock usage patterns and determine whether mocking is excessive or appropriately targeted at external dependencies versus core business logic. Resolution: examine the actual test file to quantify mocks, verify core business logic uses real objects, and ensure mocks target external dependencies rather than internal control flow.
✅ Passed checks (6 passed)
  • Description Check — ✅ Passed: check skipped; CodeRabbit's high-level summary is enabled.
  • Title Check — ✅ Passed: the title 'feat: Add specialized handler for tool calling improvements' accurately reflects the main change: introduction of a new IBM Granite handler module with specialized tool-calling logic.
  • Docstring Coverage — ✅ Passed: docstring coverage is 96.55%, which is sufficient; the required threshold is 80.00%.
  • Test Coverage For New Implementations — ✅ Passed: comprehensive test coverage verified with the new test file (test_ibm_granite_handler.py) containing 9 test classes covering all 6 new functions, including unit tests, integration tests, parameterized tests, and edge-case coverage.
  • Test Quality And Coverage — ✅ Passed: the test suite demonstrates comprehensive coverage, testing all 5 public functions across 9 well-structured test classes, featuring parametrized testing (16+ scenarios), explicit error handling validation, extensive edge-case coverage, 80+ assertions, and integration tests with ToolCallingAgentComponent.
  • Test File Naming And Structure — ✅ Passed: the test file test_ibm_granite_handler.py fully adheres to all specified naming, structure, and content requirements, with 50+ descriptive test methods covering positive/negative scenarios and edge cases.

@github-actions github-actions bot added enhancement New feature or request and removed enhancement New feature or request labels Dec 3, 2025
@codecov

codecov bot commented Dec 3, 2025

Codecov Report

❌ Patch coverage is 0% with 3 lines in your changes missing coverage. Please review.
✅ Project coverage is 33.77%. Comparing base (ed624d7) to head (1a946ba).
⚠️ Report is 2 commits behind head on main.

Files with missing lines:
  • src/lfx/src/lfx/base/agents/agent.py — patch 0.00%, 3 lines missing ⚠️

❌ Your patch status has failed because the patch coverage (0.00%) is below the target coverage (40.00%). You can increase the patch coverage or adjust the target coverage.
❌ Your project status has failed because the head coverage (40.20%) is below the target coverage (60.00%). You can increase the head coverage or adjust the target coverage.

Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff             @@
##             main   #10857      +/-   ##
==========================================
+ Coverage   33.76%   33.77%   +0.01%     
==========================================
  Files        1400     1400              
  Lines       66341    66342       +1     
  Branches     9791     9791              
==========================================
+ Hits        22401    22409       +8     
+ Misses      42799    42792       -7     
  Partials     1141     1141              
Flag coverage:
  • backend: 53.16% <ø> (+0.04%) ⬆️
  • frontend: 15.57% <ø> (ø)
  • lfx: 40.20% <0.00%> (-0.01%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Files with missing lines:
  • src/lfx/src/lfx/base/agents/agent.py: 27.68% <0.00%> (-0.16%) ⬇️

... and 6 files with indirect coverage changes


Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

🧹 Nitpick comments (4)
src/lfx/src/lfx/components/langchain_utilities/tool_calling.py (1)

56-62: Mutating self.system_prompt in-place may cause issues if called multiple times.

If create_agent_runnable() is invoked more than once on the same component instance, the system prompt will accumulate enhancements. Consider using a local variable instead:

-        if is_granite_model(self.llm) and self.tools:
-            self.system_prompt = get_enhanced_system_prompt(
-                self.system_prompt or "",
-                self.tools
-            )
+        effective_system_prompt = self.system_prompt or ""
+        if is_granite_model(self.llm) and self.tools:
+            effective_system_prompt = get_enhanced_system_prompt(
+                effective_system_prompt,
+                self.tools
+            )

Then use effective_system_prompt when building the prompt template. If this is a non-issue in practice, feel free to disregard.

src/backend/tests/unit/components/models_and_agents/test_ibm_granite_handler.py (1)

733-739: Consider more explicit exception handling in tests.

Using contextlib.suppress(Exception) hides all exceptions, which may mask unexpected failures. Consider catching specific expected exceptions or asserting on the behavior before the exception:

# Alternative: verify the call was made regardless of downstream parsing
with patch("lfx.components.langchain_utilities.ibm_granite_handler.format_to_tool_messages", return_value=[]):
    try:
        agent.invoke(inputs)
    except Exception:  # ToolsAgentOutputParser may fail
        pass

self.mock_llm_required.invoke.assert_called()

This makes the test intention clearer while still handling expected exceptions.

src/lfx/src/lfx/components/langchain_utilities/ibm_granite_handler.py (2)

55-72: The try/except/else control flow is correct but could be clearer.

The function works correctly, but the else: return "" branch handles both "no args_schema" and "args_schema exists but no model_fields" cases. Consider consolidating:

 def _get_tool_schema_description(tool) -> str:
     """Extract a brief description of the tool's expected parameters."""
+    if not hasattr(tool, "args_schema") or not tool.args_schema:
+        return ""
     try:
-        if hasattr(tool, "args_schema") and tool.args_schema:
-            schema = tool.args_schema
-            if hasattr(schema, "model_fields"):
-                fields = schema.model_fields
-                params = []
-                for name, field in fields.items():
-                    required = field.is_required() if hasattr(field, "is_required") else True
-                    req_str = "(required)" if required else "(optional)"
-                    params.append(f"{name} {req_str}")
-                return f"Parameters: {', '.join(params)}" if params else ""
-    except Exception:  # noqa: BLE001
+        schema = tool.args_schema
+        if not hasattr(schema, "model_fields"):
+            return ""
+        fields = schema.model_fields
+        params = []
+        for name, field in fields.items():
+            required = field.is_required() if hasattr(field, "is_required") else True
+            req_str = "(required)" if required else "(optional)"
+            params.append(f"{name} {req_str}")
+        return f"Parameters: {', '.join(params)}" if params else ""
+    except Exception:
         return ""
-    else:
-        return ""

This is optional; the current implementation works.


188-196: Inline import could be moved to module level.

The SystemMessage import inside the function is executed on every placeholder detection. Moving it to the top of the file would be slightly more efficient:

 from langchain_core.prompts import ChatPromptTemplate
 from langchain_core.runnables import RunnableLambda
+from langchain_core.messages import SystemMessage
             if has_placeholder:
                 logger.warning("[IBM WatsonX] Placeholder detected, forcing final answer")
-                from langchain_core.messages import SystemMessage
                 corrective_msg = SystemMessage(content=(

This is a minor optimization.

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b05d0eb and bf0b83a.

📒 Files selected for processing (4)
  • src/backend/tests/unit/components/models_and_agents/test_ibm_granite_handler.py (1 hunks)
  • src/lfx/src/lfx/base/agents/agent.py (1 hunks)
  • src/lfx/src/lfx/components/langchain_utilities/ibm_granite_handler.py (1 hunks)
  • src/lfx/src/lfx/components/langchain_utilities/tool_calling.py (3 hunks)
🧰 Additional context used
📓 Path-based instructions (3)
src/backend/**/*.py

📄 CodeRabbit inference engine (.cursor/rules/backend_development.mdc)

src/backend/**/*.py: Use FastAPI async patterns with await for async operations in component execution methods
Use asyncio.create_task() for background tasks and implement proper cleanup with try/except for asyncio.CancelledError
Use queue.put_nowait() for non-blocking queue operations and asyncio.wait_for() with timeouts for controlled get operations

Files:

  • src/backend/tests/unit/components/models_and_agents/test_ibm_granite_handler.py
src/backend/tests/**/*.py

📄 CodeRabbit inference engine (.cursor/rules/testing.mdc)

src/backend/tests/**/*.py: Place backend unit tests in src/backend/tests/ directory, component tests in src/backend/tests/unit/components/ organized by component subdirectory, and integration tests accessible via make integration_tests
Use same filename as component with appropriate test prefix/suffix (e.g., my_component.pytest_my_component.py)
Use the client fixture (FastAPI Test Client) defined in src/backend/tests/conftest.py for API tests; it provides an async httpx.AsyncClient with automatic in-memory SQLite database and mocked environment variables. Skip client creation by marking test with @pytest.mark.noclient
Inherit from the correct ComponentTestBase family class located in src/backend/tests/base.py based on API access needs: ComponentTestBase (no API), ComponentTestBaseWithClient (needs API), or ComponentTestBaseWithoutClient (pure logic). Provide three required fixtures: component_class, default_kwargs, and file_names_mapping
Create comprehensive unit tests for all new backend components. If unit tests are incomplete, create a corresponding Markdown file documenting manual testing steps and expected outcomes
Test both sync and async code paths, mock external dependencies appropriately, test error handling and edge cases, validate input/output behavior, and test component initialization and configuration
Use @pytest.mark.asyncio decorator for async component tests and ensure async methods are properly awaited
Test background tasks using asyncio.create_task() and verify completion with asyncio.wait_for() with appropriate timeout constraints
Test queue operations using non-blocking queue.put_nowait() and asyncio.wait_for(queue.get(), timeout=...) to verify queue processing without blocking
Use @pytest.mark.no_blockbuster marker to skip the blockbuster plugin in specific tests
For database tests that may fail in batch runs, run them sequentially using uv run pytest src/backend/tests/unit/test_database.py r...

Files:

  • src/backend/tests/unit/components/models_and_agents/test_ibm_granite_handler.py
**/test_*.py

📄 CodeRabbit inference engine (Custom checks)

**/test_*.py: Review test files for excessive use of mocks that may indicate poor test design - check if tests have too many mock objects that obscure what's actually being tested
Warn when mocks are used instead of testing real behavior and interactions, and suggest using real objects or test doubles when mocks become excessive
Ensure mocks are used appropriately for external dependencies only, not for core logic
Backend test files should follow the naming convention test_*.py with proper pytest structure
Test files should have descriptive test function names that explain what is being tested
Tests should be organized logically with proper setup and teardown
Consider including edge cases and error conditions for comprehensive test coverage
Verify tests cover both positive and negative scenarios where appropriate
For async functions in backend tests, ensure proper async testing patterns are used with pytest
For API endpoints, verify both success and error response testing

Files:

  • src/backend/tests/unit/components/models_and_agents/test_ibm_granite_handler.py
🧠 Learnings (3)
📚 Learning: 2025-11-24T19:46:09.104Z
Learnt from: CR
Repo: langflow-ai/langflow PR: 0
File: .cursor/rules/backend_development.mdc:0-0
Timestamp: 2025-11-24T19:46:09.104Z
Learning: Backend components should be structured with clear separation of concerns: agents, data processing, embeddings, input/output, models, text processing, prompts, tools, and vector stores

Applied to files:

  • src/lfx/src/lfx/components/langchain_utilities/tool_calling.py
📚 Learning: 2025-08-11T16:52:26.755Z
Learnt from: edwinjosechittilappilly
Repo: langflow-ai/langflow PR: 9336
File: src/backend/base/langflow/base/models/openai_constants.py:29-33
Timestamp: 2025-08-11T16:52:26.755Z
Learning: The "gpt-5-chat-latest" model in the OpenAI models configuration does not support tool calling, so tool_calling should be set to False for this model in src/backend/base/langflow/base/models/openai_constants.py.

Applied to files:

  • src/lfx/src/lfx/components/langchain_utilities/tool_calling.py
📚 Learning: 2025-11-24T19:47:28.997Z
Learnt from: CR
Repo: langflow-ai/langflow PR: 0
File: .cursor/rules/testing.mdc:0-0
Timestamp: 2025-11-24T19:47:28.997Z
Learning: Applies to src/backend/tests/**/*.py : Create comprehensive unit tests for all new backend components. If unit tests are incomplete, create a corresponding Markdown file documenting manual testing steps and expected outcomes

Applied to files:

  • src/backend/tests/unit/components/models_and_agents/test_ibm_granite_handler.py
🧬 Code graph analysis (3)
src/lfx/src/lfx/components/langchain_utilities/tool_calling.py (1)
src/lfx/src/lfx/components/langchain_utilities/ibm_granite_handler.py (3)
  • create_granite_agent (129-201)
  • get_enhanced_system_prompt (75-107)
  • is_granite_model (45-52)
src/backend/tests/unit/components/models_and_agents/test_ibm_granite_handler.py (1)
src/lfx/src/lfx/components/langchain_utilities/ibm_granite_handler.py (5)
  • create_granite_agent (129-201)
  • detect_placeholder_in_args (110-126)
  • get_enhanced_system_prompt (75-107)
  • is_granite_model (45-52)
  • is_watsonx_model (29-42)
src/lfx/src/lfx/components/langchain_utilities/ibm_granite_handler.py (3)
src/lfx/src/lfx/field_typing/constants.py (1)
  • ChatPromptTemplate (73-74)
src/lfx/src/lfx/base/tools/flow_tool.py (1)
  • args (32-34)
src/backend/tests/unit/mock_language_model.py (2)
  • bind_tools (69-72)
  • invoke (42-43)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (15)
  • GitHub Check: Lint Backend / Run Mypy (3.10)
  • GitHub Check: Lint Backend / Run Mypy (3.11)
  • GitHub Check: Lint Backend / Run Mypy (3.12)
  • GitHub Check: Run Backend Tests / Unit Tests - Python 3.10 - Group 5
  • GitHub Check: Run Backend Tests / Unit Tests - Python 3.10 - Group 4
  • GitHub Check: Run Backend Tests / Unit Tests - Python 3.10 - Group 2
  • GitHub Check: Run Backend Tests / Unit Tests - Python 3.10 - Group 1
  • GitHub Check: Run Backend Tests / Unit Tests - Python 3.10 - Group 3
  • GitHub Check: Run Backend Tests / LFX Tests - Python 3.10
  • GitHub Check: Run Backend Tests / Integration Tests - Python 3.10
  • GitHub Check: Test Docker Images / Test docker images
  • GitHub Check: Test Starter Templates
  • GitHub Check: Update Starter Projects
  • GitHub Check: Run Ruff Check and Format
  • GitHub Check: Update Component Index
🔇 Additional comments (19)
src/lfx/src/lfx/base/agents/agent.py (1)

184-184: LGTM!

The comment accurately describes the purpose of the subsequent code block that adds system_prompt and chat_history to the existing input_dict.

src/lfx/src/lfx/components/langchain_utilities/tool_calling.py (2)

5-11: LGTM!

Clean separation of IBM-specific logic into a dedicated module with appropriate imports.


78-86: Potential inconsistency: Handler claims all WatsonX models are affected, but only Granite models get special handling.

The ibm_granite_handler.py docstring states tool calling issues affect ALL WatsonX models (Llama, Mistral, etc.), yet here only Granite models (detected by is_granite_model) receive the specialized handler. Non-Granite WatsonX models will fall through to the default create_tool_calling_agent.

If the intention is to handle all WatsonX models, consider using is_watsonx_model() instead:

-            if is_granite_model(self.llm) and self.tools:
+            if is_watsonx_model(self.llm) and self.tools:
                 return create_granite_agent(self.llm, self.tools, prompt)

If the current behavior is intentional (only Granite for now), update the handler's docstring to clarify scope.

src/backend/tests/unit/components/models_and_agents/test_ibm_granite_handler.py (9)

26-30: LGTM!

The create_mock_tool helper is a clean and reusable utility for creating mock tools with proper name attributes throughout the tests.


33-143: LGTM!

Comprehensive test coverage for is_watsonx_model() including class name detection, module detection, case insensitivity, and negative cases for non-WatsonX providers.


151-254: LGTM!

Thorough test coverage for the deprecated is_granite_model() function, including model_id/model_name detection, fallback behavior, case insensitivity, and edge cases.


261-347: LGTM!

Good test coverage for get_enhanced_system_prompt() including edge cases (empty, None, single tool), boundary condition (exactly 2 tools), and content verification.


497-507: Good documentation of current limitation.

The test clearly documents that the current implementation doesn't detect placeholders in nested structures. If this is intentional, consider adding a comment in the production code as well.


565-619: LGTM!

Well-structured parameterized tests for the regex pattern with comprehensive positive and negative cases.


626-698: LGTM!

Good test coverage for create_granite_agent() including error handling, valid inputs, tool_choice bindings, and custom parameters.


818-841: Test confirms Llama on WatsonX uses default agent.

This test validates that non-Granite WatsonX models (like Llama) use the default create_tool_calling_agent. This is consistent with the current is_granite_model check but contradicts the handler's docstring claim that all WatsonX models are affected. Ensure this is the intended behavior.


921-1002: LGTM!

Comprehensive edge case testing including type coercion, regex safety, preservation guarantees, and documented implementation limitations.

src/lfx/src/lfx/components/langchain_utilities/ibm_granite_handler.py (7)

1-20: LGTM!

Well-documented module purpose with clear explanation of WatsonX-specific tool calling issues affecting multiple model types.


22-26: Potential false positive for HTML <data> tag.

The regex will match <data> (HTML5 element) as it contains the keyword "data". This is unlikely in practice for tool arguments but worth noting. The current implementation is reasonable for the expected use case.


29-42: LGTM!

Robust WatsonX detection with multiple fallback strategies (class name, module name).


45-52: LGTM!

Clean implementation with proper deprecation notice and graceful handling of missing/non-string attributes.


75-107: LGTM!

Well-structured prompt enhancement with appropriate early return for edge cases and comprehensive instructions addressing WatsonX tool-calling quirks.


110-126: LGTM!

Clean implementation with proper null checks and support for both dict and string arguments. The documented limitations (no nested dict/list checking) are reasonable for the use case.


200-205: LGTM!

Good logging of agent creation and useful alias for backward/forward compatibility.

IMPORTANT INSTRUCTIONS FOR TOOL USAGE:

1. ALWAYS call tools when you need information - never say "I cannot" or "I don't have access".
2. Call ONE tool at a time, wait for the result, then decide your next action.
Collaborator


would this limit the parallel tool calling?

Member Author


Yes, but this is a WatsonX platform limitation, not a design choice. WatsonX models don't reliably support parallel tool calls - when they attempt multiple calls, they often fail or produce incorrect results. The code already handles this by keeping only the first tool call when multiple are returned (see _limit_to_single_tool_call()).
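The single-call enforcement described here reduces to truncation. A minimal sketch (the real _limit_to_single_tool_call() operates on LangChain AIMessage objects; plain dicts are used here for illustration):

```python
# Minimal sketch of single-tool-call enforcement for WatsonX models, which
# handle only one tool call at a time. The real helper works on AIMessage
# tool_calls; plain dicts stand in here.
def limit_to_single_tool_call(tool_calls: list[dict]) -> list[dict]:
    """Keep only the first tool call; drop any parallel calls the model emitted."""
    return tool_calls[:1]
```

Any parallel calls beyond the first are silently discarded, trading parallelism for reliability on this platform.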

@jordanrfrazier
Collaborator

Is this intentional behavior for Granite models? The question would be whether we should be adding workarounds to support models that aren't intended to work with multi-tool agentic functions.

I'm not very familiar with Granite's capabilities, but the solution looks as if it is handling many "quirks" in order to get it to work - and often making some very specific behavioral decisions.


# Create LLM variants with different tool_choice settings
# Note: WatsonX doesn't support parallel_tool_calls parameter, we handle multiple
# tool calls manually by keeping only the first one
Collaborator


I don't fully understand the solution here, but it feels like it should be solved on the WatsonX/Granite side rather than adding workarounds on our end.

My main concerns would be:
a) complexity of the workarounds
b) making specific behavioral decisions like the tool_choice, the dynamic prompts, the set number of iterations, the stop message.
c) inconsistencies in behaviors in our system using these models vs. outside our system

Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You raise valid concerns. I've simplified the code significantly:

  1. Extracted helper functions (_limit_to_single_tool_call, _handle_placeholder_in_response) for better readability
  2. Improved docstrings to explain why this exists - these are documented WatsonX platform behaviors, not bugs we're working around
  3. Reduced complexity - the main invoke() function is now much cleaner

Regarding the strategic question: these workarounds address known WatsonX API behaviors (tool_choice handling, single tool call support). While ideally IBM would fix these upstream, this handler ensures Langflow users get a working experience today.

@jordanrfrazier
Collaborator

What if we limited the usage of watsonx/granite models in agentic workflows with a clear list of what's supported or not?

@Adam-Aghili
Collaborator

@jordanrfrazier I know Granite 4 should support multi-tool agents. I don't get why we need the extra steps, though.

https://www.ibm.com/new/announcements/ibm-granite-4-0-hyper-efficient-high-performance-hybrid-models

@Cristhianzl Cristhianzl force-pushed the cz/improve-ibm-models branch from 0ba146e to 967b9ae Compare January 7, 2026 20:25
@Cristhianzl Cristhianzl added this pull request to the merge queue Jan 8, 2026
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to a conflict with the base branch Jan 8, 2026
@Cristhianzl Cristhianzl enabled auto-merge January 8, 2026 14:04
@Cristhianzl Cristhianzl added this pull request to the merge queue Jan 8, 2026
Merged via the queue into main with commit 1b113a4 Jan 8, 2026
55 of 58 checks passed
@Cristhianzl Cristhianzl deleted the cz/improve-ibm-models branch January 8, 2026 14:35
def is_granite_model(llm) -> bool:
"""Check if the LLM is an IBM Granite model.

DEPRECATED: Use is_watsonx_model() instead.
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

classic ai bot thinking that every time it changes something it needs to make the code it wrote 30 seconds ago deprecated
