feat: Add not contains filter operator in DataFrame Operations Component #9415

Merged
Cristhianzl merged 13 commits into main from cz/not-contains-filter-df
Aug 29, 2025

Conversation

@Cristhianzl
Member

@Cristhianzl Cristhianzl commented Aug 16, 2025

This pull request adds support for a new "not contains" filter operator in the DataFrameOperationsComponent, allowing users to filter rows where a column does not contain a specified value. The change updates both the UI options and the filtering logic.

Enhancements to DataFrame filtering:

  • Added "not contains" to the filter_operator dropdown options in the DataFrameOperationsComponent, expanding the available filter types for users.
  • Implemented the "not contains" operator in the filter_rows_by_value method, enabling filtering of rows where the column value does not contain the specified filter value.
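
As a rough illustration of the behavior described above, the following is a minimal sketch of a negated substring filter in pandas. The helper name `filter_not_contains` and the sample data are hypothetical and are not taken from the component; only the mask expression mirrors the line quoted later in the review.

```python
import pandas as pd


def filter_not_contains(df: pd.DataFrame, column_name: str, filter_value) -> pd.DataFrame:
    """Keep rows whose column value does NOT contain filter_value (hypothetical helper)."""
    column = df[column_name]
    # Cast to str so non-text values can be searched; na=False treats missing
    # values as "no match" before the mask is negated with ~.
    mask = ~column.astype(str).str.contains(str(filter_value), na=False)
    return df[mask]


# Hypothetical sample data with one missing value.
df = pd.DataFrame({"name": ["alpha", "beta", None, "gamma"]})
result = filter_not_contains(df, "name", "et")
print(result)
```

Only "beta" contains "et", so the other three rows (including the one with the missing value) are kept. Note that this sketch uses the pandas default of treating the value as a regular expression, as the original change does.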

Summary by CodeRabbit

  • New Features
    • Added a “Not contains” operator to DataFrame filters, letting you exclude rows where a selected column includes a given value. Works with text-like data and handles missing values gracefully. Integrates with existing operators (equals, not equals, contains, starts with, ends with, greater than, less than). Other DataFrame operations (e.g., replace values, drop duplicates) are unchanged.

…where "not contains" filter option was missing, causing incorrect filtering behavior.
@coderabbitai
Contributor

coderabbitai bot commented Aug 16, 2025

Walkthrough

Added a new "not contains" filter operator to DataFrameOperationsComponent and implemented its logic in filter_rows_by_value by negating the string contains check. No other operators or public signatures were changed.

Changes

Cohort / File(s): DataFrame filtering operators (src/backend/base/langflow/components/processing/dataframe_operations.py)
Summary of Changes: Introduced "not contains" option in operator dropdown; implemented negated contains mask in filter_rows_by_value. Existing operators and other operations unchanged.

Sequence Diagram(s)

sequenceDiagram
  participant User
  participant Component as DataFrameOperationsComponent
  participant Pandas as pandas.DataFrame

  User->>Component: Select operator ("not contains") and value
  Component->>Pandas: filter_rows_by_value(col.astype(str).str.contains(value, na=False))
  Note right of Component: Negate mask for "not contains"
  Component->>Pandas: Apply ~mask to DataFrame
  Pandas-->>Component: Filtered DataFrame
  Component-->>User: Return result

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Possibly related PRs

Suggested labels

enhancement, size:M, lgtm

Suggested reviewers

  • rodrigosnader
  • edwinjosechittilappilly
  • erichare

@github-actions github-actions bot added enhancement New feature or request and removed enhancement New feature or request labels Aug 16, 2025
Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (2)
src/backend/base/langflow/components/processing/dataframe_operations.py (2)

257-258: Use regex=False to avoid unintended regex behavior and improve robustness

The implementation is correct and consistent with "contains". However, pandas treats the pattern as a regex by default. This can:

  • Surprise users when filter_value has special regex chars (e.g., . * + ?)
  • Raise errors for invalid patterns (e.g., unmatched brackets)
  • Open the door to performance pitfalls with pathological patterns

Recommend using a literal substring match with regex=False. Apply to "not contains" (below) and, for consistency, also to the existing "contains" case above.

Diff for this block:

-        elif operator == "not contains":
-            mask = ~column.astype(str).str.contains(str(filter_value), na=False)
+        elif operator == "not contains":
+            mask = ~column.astype(str).str.contains(str(filter_value), na=False, regex=False)

Also update the "contains" branch for consistency (outside the selected lines):

elif operator == "contains":
    mask = column.astype(str).str.contains(str(filter_value), na=False, regex=False)

If existing users rely on regex semantics, consider an advanced toggle (e.g., filter_regex: bool) instead of changing defaults.

I can add a small test matrix covering contains/not contains with literals, regex metacharacters, and NaNs. Want me to draft it?
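
To make the difference concrete, here is a small sketch comparing the default regex matching with a literal match. The helper `contains_mask` and the sample strings are hypothetical; the `regex` and `na` parameters are standard arguments of pandas `Series.str.contains`.

```python
import pandas as pd

# Hypothetical sample data chosen to include regex metacharacters.
values = pd.Series(["a.b", "axb", "price[1]"])


def contains_mask(column: pd.Series, filter_value, *, regex: bool) -> pd.Series:
    """Build a boolean mask the same way the reviewed lines do (hypothetical helper)."""
    return column.astype(str).str.contains(str(filter_value), na=False, regex=regex)


# As a regex, "." matches any character, so every non-empty string matches.
dot_as_regex = contains_mask(values, ".", regex=True)

# As a literal, "." only matches strings that contain a dot.
dot_as_literal = contains_mask(values, ".", regex=False)

# "[" is not a valid regex on its own, but it is a valid literal substring.
bracket_literal = contains_mask(values, "[", regex=False)

# "not contains" is the negation of the literal mask.
not_contains_dot = ~dot_as_literal
```

With regex=True the dot filter matches all three strings, while the literal version matches only "a.b", which is probably what a user typing "." into the filter expects.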


312-312: Gracefully handle empty column_name in Drop Duplicates

If column_name is missing/empty, this will raise a KeyError. Defaulting subset=None uses all columns, which is a safer fallback.

-        return DataFrame(df.drop_duplicates(subset=self.column_name))
+        subset = self.column_name if getattr(self, "column_name", None) else None
+        return DataFrame(df.drop_duplicates(subset=subset))
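
The fallback suggested above can be sketched outside the component as follows. The function `drop_duplicates_safe` and the sample frame are hypothetical; the sketch uses a plain pandas DataFrame rather than the component's DataFrame wrapper.

```python
import pandas as pd


def drop_duplicates_safe(df: pd.DataFrame, column_name=None) -> pd.DataFrame:
    """Drop duplicates on one column, or on all columns when no name is given."""
    # An empty string or None falls back to subset=None, which compares all columns.
    subset = column_name if column_name else None
    return df.drop_duplicates(subset=subset)


# Hypothetical sample data.
df = pd.DataFrame({"a": [1, 1, 2], "b": ["x", "y", "y"]})

by_a = drop_duplicates_safe(df, "a")     # rows 0 and 2 remain
all_cols = drop_duplicates_safe(df, "")  # no full-row duplicates, so all 3 rows remain
```

Passing the empty string straight through as subset would instead make pandas look for a column named "" and raise a KeyError, which is the failure the review comment describes.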
📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 226c71b and f3d0016.

📒 Files selected for processing (1)
  • src/backend/base/langflow/components/processing/dataframe_operations.py (3 hunks)
🧰 Additional context used
📓 Path-based instructions (3)
src/backend/base/langflow/components/**/*.py

📄 CodeRabbit Inference Engine (.cursor/rules/backend_development.mdc)

src/backend/base/langflow/components/**/*.py: Add new backend components to the appropriate subdirectory under src/backend/base/langflow/components/
Implement async component methods using async def and await for asynchronous operations
Use asyncio.create_task for background work in async components and ensure proper cleanup on cancellation
Use asyncio.Queue for non-blocking queue operations in async components and handle timeouts appropriately

Files:

  • src/backend/base/langflow/components/processing/dataframe_operations.py
{src/backend/**/*.py,tests/**/*.py,Makefile}

📄 CodeRabbit Inference Engine (.cursor/rules/backend_development.mdc)

{src/backend/**/*.py,tests/**/*.py,Makefile}: Run make format_backend to format Python code before linting or committing changes
Run make lint to perform linting checks on backend Python code

Files:

  • src/backend/base/langflow/components/processing/dataframe_operations.py
src/backend/**/components/**/*.py

📄 CodeRabbit Inference Engine (.cursor/rules/icons.mdc)

In your Python component class, set the icon attribute to a string matching the frontend icon mapping exactly (case-sensitive).

Files:

  • src/backend/base/langflow/components/processing/dataframe_operations.py
🔇 Additional comments (1)
src/backend/base/langflow/components/processing/dataframe_operations.py (1)

82-82: LGTM: "not contains" option added to the dropdown matches backend operator string

The UI option string exactly matches the backend check ("not contains"), so the wiring is correct.

jordanrfrazier and others added 3 commits August 26, 2025 23:08
* fix: Avoid namespace collision for Astra

* [autofix.ci] apply automated fixes

* Update Vector Store RAG.json

* [autofix.ci] apply automated fixes

---------

Co-authored-by: autofix-ci[bot] <114827586+autofix-ci[bot]@users.noreply.github.com>
@jordanrfrazier jordanrfrazier changed the base branch from main to release-1.6.0 August 27, 2025 14:46
@github-actions github-actions bot added enhancement New feature or request and removed enhancement New feature or request labels Aug 27, 2025
erichare and others added 3 commits August 27, 2025 15:25
* fix: Knowledge base component refactor

* [autofix.ci] apply automated fixes

* [autofix.ci] apply automated fixes (attempt 2/3)

* Update styleUtils.ts

* Update ingestion.py

* [autofix.ci] apply automated fixes

* Fix ingestion of df

* [autofix.ci] apply automated fixes

* Update Knowledge Ingestion.json

* Fix one failing test

* [autofix.ci] apply automated fixes

* [autofix.ci] apply automated fixes

* Revert composio versions for CI

* Revert "Revert composio versions for CI"

This reverts commit 9bcb694.

---------

Co-authored-by: autofix-ci[bot] <114827586+autofix-ci[bot]@users.noreply.github.com>
Co-authored-by: Edwin Jose <edwin.jose@datastax.com>
Co-authored-by: Carlos Coelho <80289056+carlosrcoelho@users.noreply.github.com>
fix .env load on windows script

Co-authored-by: Ítalo Johnny <italojohnnydosanjos@gmail.com>
Collaborator

@edwinjosechittilappilly edwinjosechittilappilly left a comment

LGTM
@jordanrfrazier we might need to add tests for this PR later.
Can we add them in a follow-up PR after the release?

@github-actions github-actions bot added the lgtm This PR has been approved by a maintainer label Aug 27, 2025
@jordanrfrazier jordanrfrazier changed the base branch from release-1.6.0 to main August 27, 2025 19:11
@github-actions github-actions bot removed the enhancement New feature or request label Aug 27, 2025
@github-actions github-actions bot added the enhancement New feature or request label Aug 27, 2025
@Cristhianzl Cristhianzl added this pull request to the merge queue Aug 27, 2025
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Aug 27, 2025
@Cristhianzl Cristhianzl added this pull request to the merge queue Aug 28, 2025
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Aug 28, 2025
@Cristhianzl Cristhianzl enabled auto-merge August 29, 2025 16:45
@github-actions github-actions bot added enhancement New feature or request and removed enhancement New feature or request labels Aug 29, 2025
@github-actions
Contributor

github-actions bot commented Aug 29, 2025

Frontend Unit Test Coverage Report

Coverage Summary

Lines: Coverage: 6%
Statements: 6.47% (1680/25935)
Branches: 3.51% (690/19625)
Functions: 3.47% (194/5584)

Unit Test Results

Tests: 682
Skipped: 0 💤
Failures: 0 ❌
Errors: 0 🔥
Time: 11.841s ⏱️

@codecov

codecov bot commented Aug 29, 2025

Codecov Report

❌ Patch coverage is 62.96296% with 10 lines in your changes missing coverage. Please review.
✅ Project coverage is 34.59%. Comparing base (2313435) to head (f6a0f2e).
⚠️ Report is 7 commits behind head on main.

Files with missing lines | Patch % | Lines
...e/langflow/components/knowledge_bases/retrieval.py | 41.66% | 7 Missing ⚠️
...e/langflow/components/knowledge_bases/ingestion.py | 83.33% | 2 Missing ⚠️
...flow/components/processing/dataframe_operations.py | 50.00% | 1 Missing ⚠️

❌ Your project status has failed because the head coverage (5.81%) is below the target coverage (10.00%). You can increase the head coverage or adjust the target coverage.

Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff             @@
##             main    #9415      +/-   ##
==========================================
- Coverage   34.69%   34.59%   -0.11%     
==========================================
  Files        1209     1209              
  Lines       57115    57115              
  Branches     5419     5419              
==========================================
- Hits        19818    19757      -61     
- Misses      37153    37214      +61     
  Partials      144      144              
Flag | Coverage Δ
backend | 56.02% <62.96%> (-0.19%) ⬇️
frontend | 5.81% <ø> (ø)

Flags with carried forward coverage won't be shown. Click here to find out more.

Files with missing lines | Coverage Δ
...gflow/base/knowledge_bases/knowledge_base_utils.py | 90.76% <ø> (ø)
...c/backend/base/langflow/components/agents/agent.py | 59.24% <100.00%> (ø)
src/frontend/src/utils/styleUtils.ts | 49.09% <ø> (ø)
...flow/components/processing/dataframe_operations.py | 71.42% <50.00%> (ø)
...e/langflow/components/knowledge_bases/ingestion.py | 75.95% <83.33%> (ø)
...e/langflow/components/knowledge_bases/retrieval.py | 73.98% <41.66%> (ø)

... and 3 files with indirect coverage changes


…rate limiting requests to avoid false test failures
@Cristhianzl Cristhianzl added this pull request to the merge queue Aug 29, 2025
Merged via the queue into main with commit ab017ba Aug 29, 2025
75 of 76 checks passed
@Cristhianzl Cristhianzl deleted the cz/not-contains-filter-df branch August 29, 2025 17:25

Labels

enhancement New feature or request lgtm This PR has been approved by a maintainer


4 participants