UDF support: CREATE FUNCTION DDL with pipeline SQL integration #198
Draft
ryannedolan wants to merge 11 commits into main from
Conversation
Add support for user-defined functions (UDFs) that can be registered via CREATE FUNCTION and referenced in SQL queries. Registered functions are included in pipeline SQL so Flink can execute them at runtime.

DDL syntax:

  CREATE FUNCTION name [RETURNS type] AS 'class' [LANGUAGE lang] [WITH (...)]
  DROP FUNCTION name

Phase 1 - JDBC driver + pipeline SQL:
- UserFunction API model (Deployable)
- OpaqueFunction: permissive ScalarFunction for Calcite validation with configurable return type (RETURNS clause) and ANY-typed parameters
- Session-scoped function registry on HoptimatorConnection
- CREATE/DROP FUNCTION handling in HoptimatorDdlExecutor
- FunctionImplementor in ScriptImplementor generates CREATE FUNCTION DDL
- PipelineRel.Implementor tracks functions and emits DDL before connectors
- Parser extended with RETURNS and LANGUAGE clauses

Phase 2 - Python code delivery:
- Job API gains a files field for inline code (e.g., Python UDF sources)
- SqlJob CRD spec gains a files field
- FlinkStreamingSqlJob and reconciler pass files through
- K8sJobDeployer exports files to the template environment

Tests:
- ScriptImplementorTest: FunctionImplementor DDL generation
- Quidem unit test (create-function-ddl.id): DDL parsing, type validation
- Quidem integration test (k8s-ddl-udf.id): pipeline SQL with !specify

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
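The session-scoped function registry with uppercase identifier normalization could be sketched as plain Java. All class and field names here are illustrative stand-ins, not the actual Hoptimator API:

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch of a session-scoped UDF registry, assuming a minimal
// UserFunction model. Not the real Hoptimator classes.
public class FunctionRegistrySketch {

  /** Minimal stand-in for the UserFunction API model described above. */
  public record UserFunction(String name, String className, String language, String returnType) {}

  private final Map<String, UserFunction> functions = new HashMap<>();

  /** CREATE FUNCTION: Calcite normalizes unquoted identifiers to uppercase. */
  public void create(UserFunction f) {
    functions.put(f.name().toUpperCase(Locale.ROOT), f);
  }

  /** DROP FUNCTION. */
  public void drop(String name) {
    functions.remove(name.toUpperCase(Locale.ROOT));
  }

  /** All registered functions get emitted in pipeline SQL, not just the one referenced. */
  public Map<String, UserFunction> all() {
    return Map.copyOf(functions);
  }
}
```

The registry being session-scoped (held on the connection) means functions registered in one JDBC session do not leak into others.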
…n functions

Calcite normalizes identifiers to uppercase, and all session-registered functions are emitted in pipeline SQL (not just the one used in the query).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add Java and Python UDF implementations baked into the Flink runner image so CREATE FUNCTION DDL resolves real functions at runtime:
- Greet: scalar VARCHAR UDF (Java)
- StringLength: scalar INTEGER UDF (Java)
- reverse_string: scalar VARCHAR UDF (Python/PyFlink)

Update Dockerfile to install Python/PyFlink and copy Python UDFs. Configure the Flink session cluster with Python executable paths. Add a k8s-ddl-udf-demo.id integration test using real UDF class names.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
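The Java UDFs named above might look like the following. In the actual runner image they would extend Flink's org.apache.flink.table.functions.ScalarFunction and be invoked through a public eval(...) method; the logic is shown here as plain static methods so the sketch stays self-contained, and the exact greeting text is an assumption:

```java
// Sketch of the scalar UDF logic described in the commit. Plain static
// methods stand in for Flink ScalarFunction eval(...) implementations.
public class UdfSketches {

  /** Greet: scalar VARCHAR UDF (greeting format is an illustrative guess). */
  public static String greet(String name) {
    return "Hello, " + name + "!";
  }

  /** StringLength: scalar INTEGER UDF. */
  public static int stringLength(String s) {
    return s == null ? 0 : s.length();
  }
}
```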
Change flink-template.yaml to generate SqlJob instead of FlinkSessionJob directly, so that UDF files (Python code) are bundled into the CRD and can be dynamically delivered to the data plane.
- flink-template.yaml now generates SqlJob with sql + files fields
- FlinkStreamingSqlJob.yaml.template changed to FlinkSessionJob (session mode)
- FlinkStreamingSqlJob encodes files as --file: directives in sql args
- FlinkRunner parses --file: directives, writes to /opt/python-udfs/
- FlinkControllerProvider registers FlinkSessionJob API
- All integration test expected output updated from FlinkSessionJob to SqlJob

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
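Parsing the --file: directives out of the SQL args could be sketched as below. The PR text does not show the actual encoding, so the "--file:<name>=<contents>" form assumed here is an illustrative guess:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hedged sketch of extracting "--file:" directives from SQL args, assuming
// an illustrative "--file:<name>=<contents>" encoding (not confirmed by the PR).
public class FileDirectiveSketch {

  public static Map<String, String> parse(String[] args) {
    Map<String, String> files = new LinkedHashMap<>();
    for (String arg : args) {
      if (arg.startsWith("--file:")) {
        String rest = arg.substring("--file:".length());
        int eq = rest.indexOf('=');
        if (eq > 0) {
          // FlinkRunner would write each entry under the UDF directory.
          files.put(rest.substring(0, eq), rest.substring(eq + 1));
        }
      }
    }
    return files;
  }
}
```

This mechanism is replaced by SqlJob CR fetching in a later commit, which avoids stuffing file contents into process arguments.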
Resolve conflict in venice-ddl-insert-partial.id: take main's updated SQL with multiple key fields (;-delimiter fix from #199) in SqlJob format.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The flink-adapter module was orphaned from the build (not in settings.gradle). Revive it so the SqlJob -> FlinkSessionJob reconciler is compiled, packaged, and deployed with the operator.
- Add hoptimator-flink-adapter to settings.gradle
- Add as a runtimeOnly dependency in hoptimator-operator-integration (discovered via SPI ControllerProvider)
- Rewrite FlinkControllerProvider and FlinkStreamingSqlJobReconciler to use the current K8sContext/K8sApi pattern (was using the old Operator API)
- Fix build.gradle dependency aliases (libs.kubernetes.client)
- Add hoptimator-util dependency for the Api interface

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replace the hardcoded absolute path with a System.getProperty fallback to satisfy SpotBugs' DMI_HARDCODED_ABSOLUTE_FILENAME check. Configurable via -Dhoptimator.udf.dir; defaults to /opt/python-udfs.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
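The system-property fallback described here is a one-liner; the class name below is a hypothetical stand-in for wherever the real lookup lives:

```java
// Sketch of the configurable UDF directory: a system property with a fixed
// default, which sidesteps SpotBugs' DMI_HARDCODED_ABSOLUTE_FILENAME warning
// on a bare absolute-path literal.
public class UdfDirSketch {
  public static String udfDir() {
    return System.getProperty("hoptimator.udf.dir", "/opt/python-udfs");
  }
}
```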
The template engine renders an empty map as blank string, not {}.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
SnakeYAML dumps an empty map as "{}\n", which the template engine
renders as an indented {} on a separate line after "files:".
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
SnakeYAML's dump() appends a trailing newline to its output (e.g.,
"{}\n" for an empty map). The template engine's multiline expansion
converts this into a spurious whitespace-only line. Trimming the
output fixes the rendering of {{files}} and other map variables.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
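The fix described in this commit can be sketched as below. The dumped value is a hardcoded stand-in for SnakeYAML's dump() output, and the {{var}} substitution is an illustrative simplification of the template engine:

```java
// Sketch of trimming SnakeYAML dump() output before template substitution.
// dump() ends its output with "\n" (e.g. "{}\n" for an empty map); without
// trimming, multiline expansion would emit a spurious whitespace-only line.
public class YamlTrimSketch {
  public static String render(String template, String var, String dumped) {
    // Trim the trailing newline so the substituted value stays on one line.
    return template.replace("{{" + var + "}}", dumped.trim());
  }
}
```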
Replace the --file: encoding mechanism with the production pattern: FlinkRunner receives --sqljob=namespace/name and fetches the SqlJob CR directly from the K8s API to get SQL statements and UDF files.
- FlinkRunner uses DynamicKubernetesApi to fetch the SqlJob CR
- Extracts spec.sql (statements) and spec.files (UDF code)
- Writes files to the UDF directory, then executes SQL
- Falls back to SQL-from-args for backward compatibility
- Reconciler simplified: just passes the SqlJob reference to the template
- FlinkStreamingSqlJob reduced to namespace+name export
- RBAC added for the Flink SA to read SqlJob CRs

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
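Parsing the --sqljob=namespace/name reference might look like this; the real FlinkRunner would then fetch the SqlJob CR via DynamicKubernetesApi, which is omitted here to keep the sketch self-contained. The Ref record is a hypothetical helper:

```java
// Hedged sketch of parsing the "--sqljob=namespace/name" argument described
// above. CR fetching (DynamicKubernetesApi) is out of scope for this sketch.
public class SqlJobRefSketch {

  /** Hypothetical holder for a SqlJob CR reference. */
  public record Ref(String namespace, String name) {}

  public static Ref parse(String arg) {
    if (!arg.startsWith("--sqljob=")) {
      throw new IllegalArgumentException("expected --sqljob=namespace/name: " + arg);
    }
    String[] parts = arg.substring("--sqljob=".length()).split("/", 2);
    if (parts.length != 2 || parts[0].isEmpty() || parts[1].isEmpty()) {
      throw new IllegalArgumentException("expected namespace/name: " + arg);
    }
    return new Ref(parts[0], parts[1]);
  }
}
```

Fetching the CR at startup, rather than encoding file contents into arguments, keeps the delivery path consistent with how the reconciler already tracks SqlJob state.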