Deterministic diff chunking, file risk scoring, and eval coverage by derekmisler · Pull Request #94 · docker/cagent-action

derekmisler · 2026-03-13T15:22:44Z

Summary

Split diffs deterministically at file boundaries (~1000 lines/chunk) instead of relying on LLM judgment
Score files by risk (security paths, change size, hunk count, error patterns) so the drafter prioritizes high-risk files
Update root agent to use pre-split chunks from /tmp/drafter_chunk_*.diff
Add file-based-diff and large-diff-chunking eval cases

Inspired by Sentry's production-data-driven code review patterns.

Closes: https://github.com/docker/gordon/issues/211

docker-agent

Review Summary

Assessment: 🟡 NEEDS ATTENTION

Summary

This PR introduces deterministic diff chunking and file risk scoring to improve code review efficiency. The implementation is generally sound, but there are two areas that need attention:

Medium severity: The file risk scoring loop lacks error handling, which could result in incomplete risk scores if shell commands fail silently
Low severity: JSON construction in the chunk manifest could fail with unusual file names containing special characters

The chunking logic and eval test cases appear well-designed. The issues identified are edge cases that should be addressed to improve robustness.

Findings

1 MEDIUM severity issue (error handling)
1 LOW severity issue (shell quoting)

This review focuses only on code added in this PR. Pre-existing code is not evaluated.

review-pr/action.yml

derekmisler · 2026-03-13T17:42:11Z

/review

docker-agent

Review Summary

Assessment: 🟡 NEEDS ATTENTION

Summary

The PR introduces deterministic diff chunking and file risk scoring, which are valuable improvements. However, there is one notable issue in the shell scripting logic that should be addressed:

Medium Severity Issues:

Missing error handling for jq operations in the manifest building logic could lead to corrupted chunk manifests if jq fails

Details

The chunking script builds a JSON manifest using multiple jq invocations, but does not check for failures. While unlikely in practice, if any jq command fails (e.g., due to unexpected input), the manifest could become corrupted, potentially causing downstream review steps to skip files or fail silently.

Recommendation

Add error checking after jq operations (either set -e at the script level or explicit $? checks) to ensure manifest integrity. The risk scoring section already demonstrates this pattern (line 292), so applying the same approach to the chunking logic would be consistent.

review-pr/action.yml

derekmisler · 2026-03-13T17:48:00Z

/review

docker-agent

Review Summary

Assessment: 🟡 NEEDS ATTENTION

This PR adds deterministic diff chunking and file risk scoring to improve the review workflow. The implementation is mostly solid, but there is one confirmed medium-severity issue that should be addressed.

Summary

The PR introduces two new workflow steps:

Diff chunking: Splits large diffs at file boundaries targeting ~1000 lines per chunk
File risk scoring: Assigns risk scores based on security-sensitive paths, change size, and complexity

The chunking logic has a design flaw where it only creates new chunks when both the line count exceeds 1000 AND the directory changes. This means a single directory with many files could result in unbounded chunk sizes, defeating the purpose of chunking.

Findings

1 MEDIUM severity issue found:

review-pr/action.yml

derekmisler · 2026-03-13T17:56:32Z

/review

docker-agent

Review Summary

Assessment: 🟡 NEEDS ATTENTION

Summary

This PR introduces deterministic diff chunking and file risk scoring, which are valuable improvements to the review workflow. However, there are two medium-severity issues in the chunking implementation that should be addressed:

Stale chunk files: Previous runs may leave orphaned chunk files in /tmp/ that could be processed incorrectly
Manifest key type inconsistency: The chunk manifest uses string keys instead of numeric keys, which may cause issues if the orchestrator expects numeric indexing

Both issues are in the new chunking logic and should be straightforward to fix.

Findings

2 MEDIUM severity issues requiring attention
0 HIGH severity issues

review-pr/action.yml

Signed-off-by: Derek Misler <derek.misler@docker.com>

derekmisler force-pushed the deterministic-diff-chunking-file-risk-scoring-an branch 2 times, most recently from 04b3e84 to 183bc90 Compare March 13, 2026 16:20

derekmisler self-assigned this Mar 13, 2026

derekmisler marked this pull request as ready for review March 13, 2026 16:27

derekmisler requested a review from a team as a code owner March 13, 2026 16:27

docker-agent bot reviewed Mar 13, 2026

View reviewed changes

review-pr/action.yml Outdated Show resolved Hide resolved

review-pr/action.yml Show resolved Hide resolved

derekmisler force-pushed the deterministic-diff-chunking-file-risk-scoring-an branch 2 times, most recently from 0544ef5 to bf6a2b4 Compare March 13, 2026 17:41

docker-agent bot reviewed Mar 13, 2026

View reviewed changes

review-pr/action.yml Show resolved Hide resolved

derekmisler force-pushed the deterministic-diff-chunking-file-risk-scoring-an branch from bf6a2b4 to 504ffeb Compare March 13, 2026 17:47

docker-agent bot reviewed Mar 13, 2026

View reviewed changes

review-pr/action.yml Outdated Show resolved Hide resolved

derekmisler force-pushed the deterministic-diff-chunking-file-risk-scoring-an branch from 504ffeb to 93c12fb Compare March 13, 2026 17:55

docker-agent bot reviewed Mar 13, 2026

View reviewed changes

review-pr/action.yml Show resolved Hide resolved

review-pr/action.yml Show resolved Hide resolved

Deterministic diff chunking, file risk scoring, and eval coverage

f4fd4dc

Signed-off-by: Derek Misler <derek.misler@docker.com>

derekmisler force-pushed the deterministic-diff-chunking-file-risk-scoring-an branch from 0aa5830 to f4fd4dc Compare March 16, 2026 14:26

derekmisler enabled auto-merge (squash) March 16, 2026 14:26

krissetto approved these changes Mar 17, 2026

View reviewed changes

derekmisler merged commit 39cce26 into docker:main Mar 17, 2026
8 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Deterministic diff chunking, file risk scoring, and eval coverage#94

Deterministic diff chunking, file risk scoring, and eval coverage#94
derekmisler merged 1 commit intodocker:mainfrom
derekmisler:deterministic-diff-chunking-file-risk-scoring-an

derekmisler commented Mar 13, 2026 •

edited

Loading

Uh oh!

docker-agent bot left a comment

Uh oh!

Uh oh!

Uh oh!

derekmisler commented Mar 13, 2026

Uh oh!

docker-agent bot left a comment

Uh oh!

Uh oh!

derekmisler commented Mar 13, 2026

Uh oh!

docker-agent bot left a comment

Uh oh!

Uh oh!

derekmisler commented Mar 13, 2026

Uh oh!

docker-agent bot left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

derekmisler commented Mar 13, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Uh oh!

docker-agent bot left a comment

Choose a reason for hiding this comment

Review Summary

Assessment: 🟡 NEEDS ATTENTION

Summary

Findings

Uh oh!

Uh oh!

Uh oh!

derekmisler commented Mar 13, 2026

Uh oh!

docker-agent bot left a comment

Choose a reason for hiding this comment

Review Summary

Assessment: 🟡 NEEDS ATTENTION

Summary

Details

Recommendation

Uh oh!

Uh oh!

derekmisler commented Mar 13, 2026

Uh oh!

docker-agent bot left a comment

Choose a reason for hiding this comment

Review Summary

Assessment: 🟡 NEEDS ATTENTION

Summary

Findings

Uh oh!

Uh oh!

derekmisler commented Mar 13, 2026

Uh oh!

docker-agent bot left a comment

Choose a reason for hiding this comment

Review Summary

Assessment: 🟡 NEEDS ATTENTION

Summary

Findings

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

derekmisler commented Mar 13, 2026 •

edited

Loading