Deterministic diff chunking, file risk scoring, and eval coverage#94
Conversation
04b3e84 to
183bc90
Compare
There was a problem hiding this comment.
Review Summary
Assessment: 🟡 NEEDS ATTENTION
Summary
This PR introduces deterministic diff chunking and file risk scoring to improve code review efficiency. The implementation is generally sound, but there are two areas that need attention:
- Medium severity: The file risk scoring loop lacks error handling, which could result in incomplete risk scores if shell commands fail silently
- Low severity: JSON construction in the chunk manifest could fail with unusual file names containing special characters
The chunking logic and eval test cases appear well-designed. The issues identified are edge cases that should be addressed to improve robustness.
Findings
- 1 MEDIUM severity issue (error handling)
- 1 LOW severity issue (shell quoting)
This review focuses only on code added in this PR. Pre-existing code is not evaluated.
0544ef5 to
bf6a2b4
Compare
|
/review |
There was a problem hiding this comment.
Review Summary
Assessment: 🟡 NEEDS ATTENTION
Summary
The PR introduces deterministic diff chunking and file risk scoring, which are valuable improvements. However, there is one notable issue in the shell scripting logic that should be addressed:
Medium Severity Issues:
- Missing error handling for jq operations in the manifest building logic could lead to corrupted chunk manifests if jq fails
Details
The chunking script builds a JSON manifest using multiple jq invocations, but does not check for failures. While unlikely in practice, if any jq command fails (e.g., due to unexpected input), the manifest could become corrupted, potentially causing downstream review steps to skip files or fail silently.
Recommendation
Add error checking after jq operations (either set -e at the script level or explicit $? checks) to ensure manifest integrity. The risk scoring section already demonstrates this pattern (line 292), so applying the same approach to the chunking logic would be consistent.
bf6a2b4 to
504ffeb
Compare
|
/review |
There was a problem hiding this comment.
Review Summary
Assessment: 🟡 NEEDS ATTENTION
This PR adds deterministic diff chunking and file risk scoring to improve the review workflow. The implementation is mostly solid, but there is one confirmed medium-severity issue that should be addressed.
Summary
The PR introduces two new workflow steps:
- Diff chunking: Splits large diffs at file boundaries targeting ~1000 lines per chunk
- File risk scoring: Assigns risk scores based on security-sensitive paths, change size, and complexity
The chunking logic has a design flaw where it only creates new chunks when both the line count exceeds 1000 AND the directory changes. This means a single directory with many files could result in unbounded chunk sizes, defeating the purpose of chunking.
Findings
1 MEDIUM severity issue found:
504ffeb to
93c12fb
Compare
|
/review |
There was a problem hiding this comment.
Review Summary
Assessment: 🟡 NEEDS ATTENTION
Summary
This PR introduces deterministic diff chunking and file risk scoring, which are valuable improvements to the review workflow. However, there are two medium-severity issues in the chunking implementation that should be addressed:
- Stale chunk files: Previous runs may leave orphaned chunk files in
/tmp/that could be processed incorrectly - Manifest key type inconsistency: The chunk manifest uses string keys instead of numeric keys, which may cause issues if the orchestrator expects numeric indexing
Both issues are in the new chunking logic and should be straightforward to fix.
Findings
- 2 MEDIUM severity issues requiring attention
- 0 HIGH severity issues
Signed-off-by: Derek Misler <derek.misler@docker.com>
0aa5830 to
f4fd4dc
Compare
Summary
/tmp/drafter_chunk_*.diffInspired by Sentry's production-data-driven code review patterns.
Closes: https://github.com/docker/gordon/issues/211