UDF support: CREATE FUNCTION DDL with pipeline SQL integration #198

Draft
ryannedolan wants to merge 11 commits into main from udfs

Conversation


@ryannedolan ryannedolan commented Mar 19, 2026

Summary

  • Adds CREATE FUNCTION DDL support with end-to-end data plane integration
  • Demo Java UDFs (Greet, StringLength) and Python UDF (reverse_string) baked into the Flink runner image
  • Routes pipeline output through SqlJob CRD instead of FlinkSessionJob directly, enabling dynamic UDF file delivery via the files field
  • Revives the hoptimator-flink-adapter module (SqlJob → FlinkSessionJob reconciler) and wires it into the operator build
  • FlinkRunner parses --file: directives to write UDF files to disk before executing SQL
  • Dockerfile updated with Python/PyFlink for Python UDF support

Testing Done

  • Unit tests for Java UDFs (GreetTest, StringLengthTest)
  • Unit tests for FlinkRunner file directive parsing (FlinkRunnerTest)
  • Integration test k8s-ddl-udf-demo.id verifies pipeline generation with real UDF class names
  • All existing integration tests updated for SqlJob output format
  • Full build passes (checkstyle, spotbugs, all tests)
  • Verified end-to-end on a deployed cluster: CREATE FUNCTION greet AS 'com.linkedin.hoptimator.flink.runner.functions.Greet' followed by a materialized view using greet()

🤖 Generated with Claude Code

ryannedolan and others added 11 commits March 19, 2026 16:24

Add support for user-defined functions (UDFs) that can be registered via
CREATE FUNCTION and referenced in SQL queries. Registered functions are
included in pipeline SQL so Flink can execute them at runtime.

DDL syntax:
  CREATE FUNCTION name [RETURNS type] AS 'class' [LANGUAGE lang] [WITH (...)]
  DROP FUNCTION name

Phase 1 - JDBC driver + pipeline SQL:
- UserFunction API model (Deployable)
- OpaqueFunction: permissive ScalarFunction for Calcite validation with
  configurable return type (RETURNS clause) and ANY-typed parameters
- Session-scoped function registry on HoptimatorConnection
- CREATE/DROP FUNCTION handling in HoptimatorDdlExecutor
- FunctionImplementor in ScriptImplementor generates CREATE FUNCTION DDL
- PipelineRel.Implementor tracks functions and emits DDL before connectors
- Parser extended with RETURNS and LANGUAGE clauses
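The Phase 1 pieces above might be sketched roughly as follows. This is a hypothetical plain-Java illustration of a session-scoped registry that emits CREATE FUNCTION DDL; the class name, method names, and exact DDL emission are assumptions for illustration, not the actual Hoptimator API.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of a session-scoped UDF registry.
public class FunctionRegistry {
  // Insertion order preserved so DDL is emitted deterministically.
  private final Map<String, String> functions = new LinkedHashMap<>();

  // Calcite normalizes identifiers to uppercase, so the registry does too.
  public void register(String name, String className) {
    functions.put(name.toUpperCase(), className);
  }

  public void drop(String name) {
    functions.remove(name.toUpperCase());
  }

  // Emit CREATE FUNCTION DDL for every registered function,
  // ahead of the connector DDL in the pipeline SQL.
  public String ddl() {
    StringBuilder sb = new StringBuilder();
    functions.forEach((name, cls) ->
        sb.append("CREATE FUNCTION ").append(name)
          .append(" AS '").append(cls).append("';\n"));
    return sb.toString();
  }
}
```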

Phase 2 - Python code delivery:
- Job API gains files field for inline code (e.g., Python UDF sources)
- SqlJob CRD spec gains files field
- FlinkStreamingSqlJob and reconciler pass files through
- K8sJobDeployer exports files to template environment

Tests:
- ScriptImplementorTest: FunctionImplementor DDL generation
- Quidem unit test (create-function-ddl.id): DDL parsing, type validation
- Quidem integration test (k8s-ddl-udf.id): pipeline SQL with !specify

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…n functions

Calcite normalizes identifiers to uppercase, and all session-registered
functions are emitted in pipeline SQL (not just the one used in the query).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Add Java and Python UDF implementations baked into the Flink runner
image so CREATE FUNCTION DDL resolves real functions at runtime:

- Greet: scalar VARCHAR UDF (Java)
- StringLength: scalar INTEGER UDF (Java)
- reverse_string: scalar VARCHAR UDF (Python/PyFlink)

Update Dockerfile to install Python/PyFlink and copy Python UDFs.
Configure Flink session cluster with Python executable paths.
Add k8s-ddl-udf-demo.id integration test using real UDF class names.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
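A plain-Java sketch of what the two demo Java UDFs might compute. The real classes extend Flink's ScalarFunction and live under com.linkedin.hoptimator.flink.runner.functions; the exact greeting text and null handling here are assumptions for illustration.

```java
// Plain-Java sketch of the demo UDFs (the real ones extend
// org.apache.flink.table.functions.ScalarFunction).
public class DemoUdfs {
  // Greet: scalar VARCHAR UDF. Greeting text is an assumption.
  public static String greet(String name) {
    return "Hello, " + name + "!";
  }

  // StringLength: scalar INTEGER UDF. Null handling is an assumption.
  public static int stringLength(String s) {
    return s == null ? 0 : s.length();
  }
}
```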

Change flink-template.yaml to generate SqlJob instead of FlinkSessionJob
directly, so that UDF files (Python code) are bundled into the CRD and
can be dynamically delivered to the data plane.

- flink-template.yaml now generates SqlJob with sql + files fields
- FlinkStreamingSqlJob.yaml.template changed to FlinkSessionJob (session mode)
- FlinkStreamingSqlJob encodes files as --file: directives in sql args
- FlinkRunner parses --file: directives, writes to /opt/python-udfs/
- FlinkControllerProvider registers FlinkSessionJob API
- All integration test expected output updated from FlinkSessionJob to SqlJob

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
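The --file: parsing step could look roughly like this sketch, which splits SQL args into file directives and plain SQL statements. The name=contents encoding and the class name are assumptions; the actual FlinkRunner directive format may differ (and a later commit replaces this mechanism entirely).

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: a directive is assumed to look like
// "--file:<name>=<contents>"; everything else is a SQL statement.
public class FileDirectives {
  public final Map<String, String> files = new LinkedHashMap<>();
  public final List<String> sql = new ArrayList<>();

  public static FileDirectives parse(String[] args) {
    FileDirectives result = new FileDirectives();
    for (String arg : args) {
      if (arg.startsWith("--file:")) {
        String body = arg.substring("--file:".length());
        int eq = body.indexOf('=');
        // Collect file contents to write to the UDF directory before
        // executing any SQL.
        result.files.put(body.substring(0, eq), body.substring(eq + 1));
      } else {
        result.sql.add(arg);
      }
    }
    return result;
  }
}
```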

Resolve conflict in venice-ddl-insert-partial.id: take main's updated
SQL with multiple key fields (;-delimiter fix from #199) in SqlJob format.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

The flink-adapter module was orphaned from the build (not in
settings.gradle). Revive it so the SqlJob -> FlinkSessionJob
reconciler is compiled, packaged, and deployed with the operator.

- Add hoptimator-flink-adapter to settings.gradle
- Add as runtimeOnly dependency in hoptimator-operator-integration
  (discovered via SPI ControllerProvider)
- Rewrite FlinkControllerProvider and FlinkStreamingSqlJobReconciler
  to use current K8sContext/K8sApi pattern (was using old Operator API)
- Fix build.gradle dependency aliases (libs.kubernetes.client)
- Add hoptimator-util dependency for Api interface

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Replace hardcoded absolute path with System.getProperty fallback
to satisfy SpotBugs DMI_HARDCODED_ABSOLUTE_FILENAME check.
Configurable via -Dhoptimator.udf.dir, defaults to /opt/python-udfs.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
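A minimal sketch of the property-with-fallback lookup described above; the class and method names are illustrative, but the property name and default come from the commit message.

```java
// Look up the UDF directory from a system property with a constant
// fallback, rather than a hardcoded absolute filename at the use site
// (which trips SpotBugs' DMI_HARDCODED_ABSOLUTE_FILENAME check).
public class UdfDir {
  static final String DEFAULT_UDF_DIR = "/opt/python-udfs";

  public static String udfDir() {
    return System.getProperty("hoptimator.udf.dir", DEFAULT_UDF_DIR);
  }
}
```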

The template engine renders an empty map as blank string, not {}.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

SnakeYAML dumps an empty map as "{}\n", which the template engine
renders as an indented {} on a separate line after "files:".

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

SnakeYAML's dump() appends a trailing newline to its output (e.g.,
"{}\n" for an empty map). The template engine's multiline expansion
converts this into a spurious whitespace-only line. Trimming the
output fixes the rendering of {{files}} and other map variables.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
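A minimal sketch of the fix, assuming a simple string-substitution template engine; the substitute method is illustrative, not the actual engine API. The point is that the dumped YAML value is trimmed before substitution.

```java
// Substitute a template variable with a SnakeYAML-dumped value.
// dump() output ends with "\n" (an empty map dumps as "{}\n"), so the
// value is trimmed first to avoid a spurious whitespace-only line in
// the rendered YAML.
public class TemplateVar {
  public static String substitute(String template, String variable, String dumpedYaml) {
    return template.replace("{{" + variable + "}}", dumpedYaml.trim());
  }
}
```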

Replace the --file: encoding mechanism with the production pattern:
FlinkRunner receives --sqljob=namespace/name and fetches the SqlJob
CR directly from the K8s API to get SQL statements and UDF files.

- FlinkRunner uses DynamicKubernetesApi to fetch SqlJob CR
- Extracts spec.sql (statements) and spec.files (UDF code)
- Writes files to UDF directory, then executes SQL
- Falls back to SQL-from-args for backward compatibility
- Reconciler simplified: just passes SqlJob reference to template
- FlinkStreamingSqlJob reduced to namespace+name export
- RBAC added for Flink SA to read SqlJob CRs

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
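Only the argument parsing is sketched here; per the commit message, the real FlinkRunner then uses DynamicKubernetesApi to fetch the SqlJob CR and read spec.sql and spec.files. The class name SqlJobRef is illustrative.

```java
// Hypothetical sketch of resolving the --sqljob=namespace/name argument
// into a CR reference that can be fetched from the K8s API.
public class SqlJobRef {
  public final String namespace;
  public final String name;

  SqlJobRef(String namespace, String name) {
    this.namespace = namespace;
    this.name = name;
  }

  public static SqlJobRef parse(String arg) {
    String ref = arg.substring("--sqljob=".length());
    int slash = ref.indexOf('/');
    return new SqlJobRef(ref.substring(0, slash), ref.substring(slash + 1));
  }
}
```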