Deterministic diff chunking, file risk scoring, and eval coverage #94

Merged
derekmisler merged 1 commit into docker:main from derekmisler:deterministic-diff-chunking-file-risk-scoring-an
Mar 17, 2026

Conversation

@derekmisler (Contributor) commented Mar 13, 2026

Summary

  • Split diffs deterministically at file boundaries (~1000 lines/chunk) instead of relying on LLM judgment
  • Score files by risk (security paths, change size, hunk count, error patterns) so the drafter prioritizes high-risk files
  • Update root agent to use pre-split chunks from /tmp/drafter_chunk_*.diff
  • Add file-based-diff and large-diff-chunking eval cases
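
The first bullet can be illustrated with a minimal sketch. It is written in Python for readability and is not the PR's actual script; the function names and the exact packing rule (start a new chunk when adding the next file would pass the target) are assumptions. It shows why splitting at file boundaries is deterministic: the same diff always yields the same chunks, and no file is ever cut in half.

```python
from typing import List

TARGET_LINES = 1000  # approximate chunk size named in the PR summary


def split_by_file(diff_text: str) -> List[str]:
    """Split a unified diff into one string per file, cutting at 'diff --git' headers."""
    files: List[List[str]] = []
    for line in diff_text.splitlines(keepends=True):
        # Start a new file at each header (or at the very first line).
        if line.startswith("diff --git ") or not files:
            files.append([])
        files[-1].append(line)
    return ["".join(f) for f in files]


def chunk_files(file_diffs: List[str], target: int = TARGET_LINES) -> List[str]:
    """Pack whole-file diffs into chunks of roughly `target` lines (assumed rule)."""
    chunks: List[str] = []
    current: List[str] = []
    current_lines = 0
    for fd in file_diffs:
        n = fd.count("\n")
        # Close the current chunk if this file would push it past the target.
        if current and current_lines + n > target:
            chunks.append("".join(current))
            current, current_lines = [], 0
        current.append(fd)
        current_lines += n
    if current:
        chunks.append("".join(current))
    return chunks
```

A single file larger than the target still lands in one chunk of its own, which is why the target is approximate. In the workflow described above, each chunk would presumably then be written to a path matching /tmp/drafter_chunk_*.diff.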

Inspired by Sentry's production-data-driven code review patterns.

Closes: https://github.com/docker/gordon/issues/211

@derekmisler force-pushed the deterministic-diff-chunking-file-risk-scoring-an branch 2 times, most recently from 04b3e84 to 183bc90 on March 13, 2026 16:20
@derekmisler self-assigned this on Mar 13, 2026
@derekmisler marked this pull request as ready for review on March 13, 2026 16:27
@derekmisler requested a review from a team as a code owner on March 13, 2026 16:27

@docker-agent (bot) left a comment

Review Summary

Assessment: 🟡 NEEDS ATTENTION

Summary

This PR introduces deterministic diff chunking and file risk scoring to improve code review efficiency. The implementation is generally sound, but there are two areas that need attention:

  1. Medium severity: The file risk scoring loop lacks error handling, which could result in incomplete risk scores if shell commands fail silently
  2. Low severity: JSON construction in the chunk manifest could fail with unusual file names containing special characters
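
The second finding can be reproduced with a small sketch. It is Python rather than the shell the review refers to, and the field names `file` and `lines` are hypothetical. Building JSON by string interpolation breaks as soon as a file name contains a double quote, while a JSON encoder escapes it.

```python
import json


def manifest_entry_naive(path: str, lines: int) -> str:
    """Build an entry by string interpolation; breaks on quotes or backslashes."""
    return '{"file": "%s", "lines": %d}' % (path, lines)


def manifest_entry_safe(path: str, lines: int) -> str:
    """Let the JSON encoder escape the file name."""
    return json.dumps({"file": path, "lines": lines})


def is_valid_json(text: str) -> bool:
    try:
        json.loads(text)
        return True
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        return False


tricky = 'docs/weird "name".txt'
naive_ok = is_valid_json(manifest_entry_naive(tricky, 3))  # False
safe_ok = is_valid_json(manifest_entry_safe(tricky, 3))    # True
```

In a shell script, the comparable fix is usually to pass file names to jq as variables (for example with `--arg`) instead of splicing them into the filter text.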

The chunking logic and eval test cases appear well-designed. The issues identified are edge cases that should be addressed to improve robustness.

Findings

  • 1 MEDIUM severity issue (error handling)
  • 1 LOW severity issue (shell quoting)

This review focuses only on code added in this PR. Pre-existing code is not evaluated.

@derekmisler force-pushed the deterministic-diff-chunking-file-risk-scoring-an branch 2 times, most recently from 0544ef5 to bf6a2b4 on March 13, 2026 17:41

@derekmisler (Contributor, Author) commented

/review

@docker-agent (bot) left a comment

Review Summary

Assessment: 🟡 NEEDS ATTENTION

Summary

The PR introduces deterministic diff chunking and file risk scoring, which are valuable improvements. However, there is one notable issue in the shell scripting logic that should be addressed:

Medium Severity Issues:

  • Missing error handling for jq operations in the manifest building logic could lead to corrupted chunk manifests if jq fails

Details

The chunking script builds a JSON manifest using multiple jq invocations, but does not check for failures. While unlikely in practice, if any jq command fails (e.g., due to unexpected input), the manifest could become corrupted, potentially causing downstream review steps to skip files or fail silently.

Recommendation

Add error checking after jq operations (either set -e at the script level or explicit $? checks) to ensure manifest integrity. The risk scoring section already demonstrates this pattern (line 292), so applying the same approach to the chunking logic would be consistent.
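
As a rough analogue of this recommendation, the sketch below shows the fail-fast idea in Python; the script under review is shell, where `set -e` or explicit `$?` checks play this role. The helper name `run_json_step` is hypothetical, and the Python interpreter stands in for a real jq call.

```python
import json
import subprocess
import sys
from typing import List


def run_json_step(cmd: List[str]) -> object:
    """Run a command that should print JSON; raise instead of continuing on failure."""
    # check=True raises CalledProcessError on a non-zero exit status.
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    # json.loads raises ValueError on empty or malformed output.
    return json.loads(result.stdout)


# Usage with a stand-in command that prints a small JSON document.
manifest = run_json_step([sys.executable, "-c", "print('{\"chunks\": 2}')"])
```

Because both failure modes raise, a half-written manifest never reaches the next step.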

@derekmisler force-pushed the deterministic-diff-chunking-file-risk-scoring-an branch from bf6a2b4 to 504ffeb on March 13, 2026 17:47

@derekmisler (Contributor, Author) commented

/review

@docker-agent (bot) left a comment

Review Summary

Assessment: 🟡 NEEDS ATTENTION

This PR adds deterministic diff chunking and file risk scoring to improve the review workflow. The implementation is mostly solid, but there is one confirmed medium-severity issue that should be addressed.

Summary

The PR introduces two new workflow steps:

  1. Diff chunking: Splits large diffs at file boundaries targeting ~1000 lines per chunk
  2. File risk scoring: Assigns risk scores based on security-sensitive paths, change size, and complexity
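
A minimal sketch of the second step follows, in Python. The signals (security-sensitive paths, change size, hunk count, error patterns) come from the PR description, but the path hints, regular expression, and weights below are hypothetical; the PR page does not list its actual values.

```python
import re
from typing import Dict, List

# Hypothetical hints and weights, chosen only for illustration.
SECURITY_HINTS = ("auth", "secret", "crypto", "token")
ERROR_PATTERN = re.compile(r"\b(panic|except|catch|err)\b", re.IGNORECASE)


def score_file(path: str, file_diff: str) -> int:
    """Score one file's diff; a higher score means review it first."""
    lines = file_diff.splitlines()
    # Added/removed lines, excluding the '+++' / '---' file headers.
    changed = [l for l in lines
               if l.startswith(("+", "-")) and not l.startswith(("+++", "---"))]
    hunks = sum(1 for l in lines if l.startswith("@@"))

    score = 0
    if any(hint in path.lower() for hint in SECURITY_HINTS):
        score += 5                       # security-sensitive path
    score += min(len(changed) // 50, 3)  # change size, capped
    score += min(hunks, 3)               # hunk count, capped
    if any(ERROR_PATTERN.search(l) for l in changed):
        score += 2                       # touches error handling
    return score


def rank_files(diffs: Dict[str, str]) -> List[str]:
    """Order paths by descending score, breaking ties by path for determinism."""
    return sorted(diffs, key=lambda p: (-score_file(p, diffs[p]), p))
```

Sorting ties by path keeps the ordering stable between runs, in line with the deterministic goal of the chunking step.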

The chunking logic has a design flaw where it only creates new chunks when both the line count exceeds 1000 AND the directory changes. This means a single directory with many files could result in unbounded chunk sizes, defeating the purpose of chunking.
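
The effect of that condition can be shown with a small simulation. It is a Python sketch with hypothetical names, and the exact comparison used in the PR's script is not visible here, so the "would exceed the limit" test is an assumption. With the directory requirement, ten 400-line files in one directory collapse into a single 4000-line chunk; without it, chunks stay bounded.

```python
import posixpath
from typing import List, Tuple


def chunk_sizes(files: List[Tuple[str, int]], limit: int,
                require_dir_change: bool) -> List[int]:
    """Return the line count of each chunk for (path, line_count) inputs."""
    sizes: List[int] = []
    current = 0
    prev_dir = None
    for path, n in files:
        directory = posixpath.dirname(path)
        over = current > 0 and current + n > limit
        dir_changed = prev_dir is not None and directory != prev_dir
        # Flawed variant: split only if over the limit AND the directory changed.
        split = (over and dir_changed) if require_dir_change else over
        if split:
            sizes.append(current)
            current = 0
        current += n
        prev_dir = directory
    if current:
        sizes.append(current)
    return sizes


same_dir = [("pkg/f%d.go" % i, 400) for i in range(10)]
flawed = chunk_sizes(same_dir, 1000, require_dir_change=True)   # [4000]
fixed = chunk_sizes(same_dir, 1000, require_dir_change=False)   # [800, 800, 800, 800, 800]
```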

Findings

1 MEDIUM severity issue found:

@derekmisler force-pushed the deterministic-diff-chunking-file-risk-scoring-an branch from 504ffeb to 93c12fb on March 13, 2026 17:55

@derekmisler (Contributor, Author) commented

/review

@docker-agent (bot) left a comment

Review Summary

Assessment: 🟡 NEEDS ATTENTION

Summary

This PR introduces deterministic diff chunking and file risk scoring, which are valuable improvements to the review workflow. However, there are two medium-severity issues in the chunking implementation that should be addressed:

  1. Stale chunk files: Previous runs may leave orphaned chunk files in /tmp/ that could be processed incorrectly
  2. Manifest key type inconsistency: The chunk manifest uses string keys instead of numeric keys, which may cause issues if the orchestrator expects numeric indexing
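
Both points can be sketched briefly in Python (the helper name is hypothetical, and the glob pattern follows the /tmp/drafter_chunk_*.diff naming from the PR description). The first part removes leftover chunk files before new ones are written. The second shows that JSON object keys are always strings, so a consumer that needs numeric indexing is better served by an array.

```python
import glob
import json
import os


def clear_stale_chunks(directory: str, pattern: str = "drafter_chunk_*.diff") -> int:
    """Delete chunk files left over from a previous run; return how many were removed."""
    removed = 0
    for path in glob.glob(os.path.join(directory, pattern)):
        os.remove(path)
        removed += 1
    return removed


# Integer keys do not survive a JSON round trip: they come back as strings.
by_key = json.loads(json.dumps({0: "a.diff", 1: "b.diff"}))

# A JSON array keeps positional, integer indexing.
by_index = json.loads(json.dumps(["a.diff", "b.diff"]))
```

Calling `clear_stale_chunks("/tmp")` at the start of the chunking step would ensure that only files from the current run match the pattern.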

Both issues are in the new chunking logic and should be straightforward to fix.

Findings

  • 2 MEDIUM severity issues requiring attention
  • 0 HIGH severity issues

Signed-off-by: Derek Misler <derek.misler@docker.com>
@derekmisler force-pushed the deterministic-diff-chunking-file-risk-scoring-an branch from 0aa5830 to f4fd4dc on March 16, 2026 14:26
@derekmisler enabled auto-merge (squash) on March 16, 2026 14:26
@derekmisler merged commit 39cce26 into docker:main on Mar 17, 2026
8 checks passed
2 participants