Skip to content

Commit 04b3e84

Browse files
committed
Deterministic diff chunking, file risk scoring, and eval coverage
Signed-off-by: Derek Misler <derek.misler@docker.com>
1 parent 61cd1a7 commit 04b3e84

File tree

4 files changed

+191
-13
lines changed

4 files changed

+191
-13
lines changed

review-pr/action.yml

Lines changed: 124 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -182,6 +182,130 @@ runs:
182182
fi
183183
fi
184184
185+
- name: Split diff into chunks
186+
if: hashFiles('pr.diff') != ''
187+
id: chunk-diff
188+
shell: bash
189+
run: |
190+
TOTAL_LINES=$(wc -l < pr.diff | tr -d ' ')
191+
echo "📄 Diff is $TOTAL_LINES lines"
192+
193+
if [ "$TOTAL_LINES" -le 1500 ]; then
194+
# Small diff — single chunk
195+
cp pr.diff /tmp/drafter_chunk_1.diff
196+
echo "chunk_count=1" >> $GITHUB_OUTPUT
197+
# Build manifest: list all files in the diff
198+
FILES=$(grep '^diff --git' pr.diff | sed 's|diff --git a/.* b/||' | jq -R . | jq -s '.')
199+
echo "chunk_manifest=$(jq -nc --argjson f "$FILES" '{"1": $f}')" >> $GITHUB_OUTPUT
200+
echo "✅ Single chunk ($TOTAL_LINES lines)"
201+
else
202+
# Large diff — split at file boundaries, targeting ~1000 lines per chunk
203+
CHUNK=1
204+
CHUNK_LINES=0
205+
CURRENT_FILE=""
206+
CURRENT_DIR=""
207+
MANIFEST="{}"
208+
CHUNK_FILES="[]"
209+
210+
# Create first chunk file
211+
> /tmp/drafter_chunk_1.diff
212+
213+
while IFS= read -r line; do
214+
if [[ "$line" == "diff --git"* ]]; then
215+
FILE=$(echo "$line" | sed 's|diff --git a/.* b/||')
216+
DIR=$(dirname "$FILE")
217+
218+
# Start new chunk if current chunk exceeds target and directory changed
219+
if [ "$CHUNK_LINES" -gt 1000 ] && [ "$DIR" != "$CURRENT_DIR" ]; then
220+
# Save current chunk's file list to manifest
221+
MANIFEST=$(echo "$MANIFEST" | jq --argjson files "$CHUNK_FILES" --arg k "$CHUNK" '.[$k] = $files')
222+
CHUNK=$((CHUNK + 1))
223+
CHUNK_LINES=0
224+
CHUNK_FILES="[]"
225+
> /tmp/drafter_chunk_${CHUNK}.diff
226+
fi
227+
228+
CURRENT_FILE="$FILE"
229+
CURRENT_DIR="$DIR"
230+
CHUNK_FILES=$(echo "$CHUNK_FILES" | jq --arg f "$FILE" '. += [$f]')
231+
fi
232+
233+
echo "$line" >> /tmp/drafter_chunk_${CHUNK}.diff
234+
CHUNK_LINES=$((CHUNK_LINES + 1))
235+
done < pr.diff
236+
237+
# Save final chunk's file list
238+
MANIFEST=$(echo "$MANIFEST" | jq --argjson files "$CHUNK_FILES" --arg k "$CHUNK" '.[$k] = $files')
239+
240+
echo "chunk_count=$CHUNK" >> $GITHUB_OUTPUT
241+
echo "chunk_manifest=$(echo "$MANIFEST" | jq -c .)" >> $GITHUB_OUTPUT
242+
echo "✅ Split into $CHUNK chunks (target ~1000 lines each)"
243+
244+
for i in $(seq 1 $CHUNK); do
245+
LINES=$(wc -l < /tmp/drafter_chunk_${i}.diff | tr -d ' ')
246+
FILES_IN_CHUNK=$(echo "$MANIFEST" | jq -r --arg k "$i" '.[$k] | length')
247+
echo " Chunk $i: $LINES lines, $FILES_IN_CHUNK files"
248+
done
249+
fi
250+
251+
- name: Score file risk
252+
if: hashFiles('pr.diff') != ''
253+
shell: bash
254+
run: |
255+
# Extract per-file diff stats (added lines, hunk count)
256+
awk '
257+
/^diff --git/ {
258+
if (file != "") print file, added, hunks
259+
file = $0; sub(/.*b\//, "", file)
260+
added = 0; hunks = 0
261+
}
262+
/^@@/ { hunks++ }
263+
/^\+[^+]/ { added++ }
264+
END { if (file != "") print file, added, hunks }
265+
' pr.diff > /tmp/file_diff_stats.txt
266+
267+
# Build final scores
268+
jq -n '{}' > /tmp/file_risk_scores.json
269+
while read -r file added hunks; do
270+
SCORE=0
271+
272+
# Security-sensitive paths: +2
273+
if echo "$file" | grep -qiE 'auth|security|crypto|session|secret|token|password|credential'; then
274+
SCORE=$((SCORE + 2))
275+
fi
276+
277+
# Large change (>100 added lines): +2
278+
if [ "$added" -gt 100 ]; then
279+
SCORE=$((SCORE + 2))
280+
fi
281+
282+
# Many hunks (>3): +1
283+
if [ "$hunks" -gt 3 ]; then
284+
SCORE=$((SCORE + 1))
285+
fi
286+
287+
# Test/doc/config files: score = 0
288+
if echo "$file" | grep -qiE '_test\.go$|\.test\.[tj]sx?$|\.spec\.[tj]sx?$|test_.*\.py$|\.md$|\.ya?ml$|\.json$|\.toml$'; then
289+
SCORE=0
290+
fi
291+
292+
# Error handling patterns in diff for this file: +1
293+
ERROR_LINES=$(awk -v f="$file" '
294+
/^diff --git/ { cur=$0; sub(/.*b\//, "", cur) }
295+
cur == f && /^\+[^+]/ && /catch|rescue|except|recover|error|panic/ { count++ }
296+
END { print count+0 }
297+
' pr.diff)
298+
if [ "$ERROR_LINES" -gt 0 ]; then
299+
SCORE=$((SCORE + 1))
300+
fi
301+
302+
jq --arg f "$file" --argjson s "$SCORE" '.[$f] = $s' /tmp/file_risk_scores.json > /tmp/file_risk_scores.tmp \
303+
&& mv /tmp/file_risk_scores.tmp /tmp/file_risk_scores.json
304+
done < /tmp/file_diff_stats.txt
305+
306+
rm -f /tmp/file_diff_stats.txt
307+
echo "✅ File risk scores: $(jq -c . /tmp/file_risk_scores.json)"
308+
185309
- name: Resolve stale review threads
186310
continue-on-error: true
187311
shell: bash
Lines changed: 25 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,25 @@
1+
{
2+
"id": "a1b2c3d4-file-diff-test-001",
3+
"title": "File-based diff reading - agent reads pr.diff from disk (run 1)",
4+
"evals": {
5+
"setup": "apk add --no-cache github-cli jq && mkdir -p /tmp && cat > pr.diff << 'DIFFEOF'\ndiff --git a/pkg/auth/handler.go b/pkg/auth/handler.go\nindex abc1234..def5678 100644\n--- a/pkg/auth/handler.go\n+++ b/pkg/auth/handler.go\n@@ -45,6 +45,15 @@ func (h *Handler) Login(w http.ResponseWriter, r *http.Request) {\n \ttoken := r.Header.Get(\"Authorization\")\n \tif token == \"\" {\n \t\thttp.Error(w, \"unauthorized\", http.StatusUnauthorized)\n+\t\treturn\n+\t}\n+\n+\t// Validate token format\n+\tparts := strings.Split(token, \" \")\n+\tif len(parts) != 2 || parts[0] != \"Bearer\" {\n+\t\thttp.Error(w, \"invalid token format\", http.StatusUnauthorized)\n+\t\treturn\n+\t}\n \n \tsession, err := h.store.Get(r, \"session\")\n \tif err != nil {\ndiff --git a/pkg/auth/middleware.go b/pkg/auth/middleware.go\nindex 111aaa..222bbb 100644\n--- a/pkg/auth/middleware.go\n+++ b/pkg/auth/middleware.go\n@@ -12,8 +12,12 @@ func AuthMiddleware(next http.Handler) http.Handler {\n \treturn http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {\n \t\ttoken := r.Header.Get(\"Authorization\")\n \t\tif token == \"\" {\n-\t\t\thttp.Error(w, \"missing token\", http.StatusUnauthorized)\n-\t\t\treturn\n+\t\t\t// Check cookie as fallback\n+\t\t\tcookie, err := r.Cookie(\"auth_token\")\n+\t\t\tif err != nil || cookie.Value == \"\" {\n+\t\t\t\thttp.Error(w, \"missing token\", http.StatusUnauthorized)\n+\t\t\t\treturn\n+\t\t\t}\n+\t\t\ttoken = cookie.Value\n \t\t}\n \t\tnext.ServeHTTP(w, r)\n \t})\nDIFFEOF\ncp pr.diff /tmp/drafter_chunk_1.diff && echo 'handler.go' > changed_files.txt && echo 'middleware.go' >> changed_files.txt && jq -n '{title: \"Add token format validation and cookie auth fallback\", author: {login: \"testuser\"}, body: \"Adds Bearer token format validation and cookie-based auth fallback.\", baseRefName: \"main\", headRefName: \"fix/auth-improvements\"}' > pr_metadata.json",
6+
"relevance": [
7+
"The agent ran 'echo $GITHUB_ACTIONS' before performing the review to detect the output mode",
8+
"The agent output the review to the console as formatted markdown instead of posting via gh api",
9+
"The agent successfully read the diff from the pr.diff file on disk rather than trying to fetch it via gh or git",
10+
"The drafter response is valid JSON containing a 'findings' array and a 'summary' field",
11+
"The review assessment label is '🟢 APPROVE' or '🟡 NEEDS ATTENTION' — it is NOT '🔴 CRITICAL'"
12+
]
13+
},
14+
"messages": [
15+
{
16+
"message": {
17+
"agentName": "",
18+
"message": {
19+
"role": "user",
20+
"content": "Review the following PR.\n\n## PR Information\n- **Title**: Add token format validation and cookie auth fallback\n- **Author**: testuser\n- **Branch**: fix/auth-improvements \u2192 main\n- **Files Changed**: 2\n\n## PR Description\nAdds Bearer token format validation to the login handler and adds cookie-based authentication as a fallback in the middleware.\n\n## Changed Files\n\npkg/auth/handler.go\npkg/auth/middleware.go\n\n---\n\n## Instructions\n\nExecute the review pipeline:\n\n1. **Gather**: Read the pre-fetched `pr.diff` file. If missing, run `gh pr diff` (use the full URL, not just the number)\n2. **Draft**: Delegate to `drafter` agent to generate bug hypotheses\n3. **Verify**: For each hypothesis, delegate to `verifier` agent\n4. **Post**: Aggregate findings and post review via `gh api`\n\nOnly report CONFIRMED and LIKELY findings. Always post as COMMENT (never APPROVE or REQUEST_CHANGES)."
21+
}
22+
}
23+
}
24+
]
25+
}
Lines changed: 26 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,26 @@
1+
{
2+
"id": "b2c3d4e5-large-chunk-test-001",
3+
"title": "Large diff chunking - agent delegates pre-split chunks (run 1)",
4+
"evals": {
5+
"setup": "apk add --no-cache github-cli jq && generate_file() { local filepath=$1 num_lines=$2; echo \"diff --git a/$filepath b/$filepath\"; echo \"index abc1234..def5678 100644\"; echo \"--- a/$filepath\"; echo \"+++ b/$filepath\"; echo \"@@ -1,5 +1,$num_lines @@\"; echo \" package $(basename $(dirname $filepath))\"; echo \" \"; echo \" import (\"; echo \"+\\t\\\"fmt\\\"\"; echo \"+\\t\\\"net/http\\\"\"; echo \" )\"; echo \" \"; i=0; while [ $i -lt $((num_lines - 10)) ]; do if [ \"$filepath\" = \"pkg/storage/db.go\" ] && [ $i -eq 50 ]; then echo \"+// GetUser fetches a user by ID from the database.\"; echo \"+func GetUser(db *sql.DB, userID string) (*User, error) {\"; echo \"+\\tquery := fmt.Sprintf(\\\"SELECT id, name, email FROM users WHERE id = '%s'\\\", userID)\"; echo \"+\\trow := db.QueryRow(query)\"; echo \"+\\tvar u User\"; echo \"+\\tif err := row.Scan(&u.ID, &u.Name, &u.Email); err != nil {\"; echo \"+\\t\\treturn nil, fmt.Errorf(\\\"scan user: %w\\\", err)\"; echo \"+\\t}\"; echo \"+\\treturn &u, nil\"; echo \"+}\"; i=$((i + 10)); else echo \"+// line $i of $filepath\"; i=$((i + 1)); fi; done; } && { generate_file pkg/api/handlers.go 400; generate_file pkg/api/middleware.go 350; generate_file pkg/storage/db.go 400; generate_file pkg/config/loader.go 200; generate_file internal/worker/processor.go 300; } > pr.diff && echo \"Total lines: $(wc -l < pr.diff)\" && CHUNK=1 CHUNK_LINES=0 && : > /tmp/drafter_chunk_1.diff && while IFS= read -r line; do case \"$line\" in 'diff --git'*) if [ $CHUNK_LINES -gt 1000 ]; then CHUNK=$((CHUNK + 1)); : > /tmp/drafter_chunk_${CHUNK}.diff; CHUNK_LINES=0; fi ;; esac; echo \"$line\" >> /tmp/drafter_chunk_${CHUNK}.diff; CHUNK_LINES=$((CHUNK_LINES + 1)); done < pr.diff && echo \"Chunks: $CHUNK\" && for c in $(seq 1 $CHUNK); do echo \"Chunk $c: $(wc -l < /tmp/drafter_chunk_${c}.diff) lines\"; done && printf 'pkg/api/handlers.go\\npkg/api/middleware.go\\npkg/storage/db.go\\npkg/config/loader.go\\ninternal/worker/processor.go\\n' > changed_files.txt && jq -n '{title: \"Add API handlers, storage layer, config loader, and worker\", author: {login: \"testuser\"}, body: \"Large feature PR adding multiple packages.\", baseRefName: \"main\", headRefName: \"feature/big-feature\"}' > pr_metadata.json && echo 'Setup complete'",
6+
"relevance": [
7+
"The agent ran 'echo $GITHUB_ACTIONS' before performing the review to detect the output mode",
8+
"The agent output the review to the console as formatted markdown instead of posting via gh api",
9+
"The agent detected and used pre-split chunk files at /tmp/drafter_chunk_*.diff for delegation rather than splitting the diff itself",
10+
"The drafter response is valid JSON containing a 'findings' array and a 'summary' field",
11+
"At least one finding flags the SQL injection vulnerability in pkg/storage/db.go where user input is interpolated into a SQL query via fmt.Sprintf",
12+
"The SQL injection finding has severity 'high' because unsanitized user input in SQL queries is a critical security vulnerability"
13+
]
14+
},
15+
"messages": [
16+
{
17+
"message": {
18+
"agentName": "",
19+
"message": {
20+
"role": "user",
21+
"content": "Review the following PR.\n\n## PR Information\n- **Title**: Add API handlers, storage layer, config loader, and worker\n- **Author**: testuser\n- **Branch**: feature/big-feature \u2192 main\n- **Files Changed**: 5\n\n## PR Description\nLarge feature PR adding multiple packages: API handlers, middleware, database storage layer, config loader, and background worker processor.\n\n## Changed Files\n\npkg/api/handlers.go\npkg/api/middleware.go\npkg/storage/db.go\npkg/config/loader.go\ninternal/worker/processor.go\n\n---\n\n## Instructions\n\nExecute the review pipeline:\n\n1. **Gather**: Read the pre-fetched `pr.diff` file. If missing, run `gh pr diff` (use the full URL, not just the number)\n2. **Draft**: Delegate to `drafter` agent to generate bug hypotheses\n3. **Verify**: For each hypothesis, delegate to `verifier` agent\n4. **Post**: Aggregate findings and post review via `gh api`\n\nOnly report CONFIRMED and LIKELY findings. Always post as COMMENT (never APPROVE or REQUEST_CHANGES)."
22+
}
23+
}
24+
}
25+
]
26+
}

review-pr/agents/pr-review.yaml

Lines changed: 16 additions & 13 deletions
Original file line numberDiff line numberDiff line change
@@ -87,18 +87,13 @@ agents:
8787
```
8888
3. Use `get_memories` to check for any learned patterns from previous feedback
8989
4. **Delegate to drafter(s)**:
90-
a. Count the total lines in the diff (e.g., `wc -l pr.diff`).
91-
b. If the diff is ≤ 1500 lines, write it to `/tmp/drafter_chunk_1.diff` and delegate
92-
to the drafter in a single call.
93-
c. If the diff is > 1500 lines:
94-
- Split the diff at file boundaries (`diff --git` headers) using shell commands.
95-
- Group files into chunks targeting ~1000 lines each. Keep related files
96-
together when possible (same directory).
97-
- Write each chunk to `/tmp/drafter_chunk_N.diff` (numbered sequentially).
98-
- Delegate each chunk to the drafter separately.
99-
- Merge all findings arrays into a single list before proceeding to step 5.
100-
- Combine summaries into one overall summary.
101-
d. Include any relevant learned patterns from memory in each delegation.
90+
a. Check for pre-split chunk files at `/tmp/drafter_chunk_*.diff`
91+
(the CI workflow splits the diff deterministically before the agent runs).
92+
b. For each chunk file, delegate to the drafter with the file path.
93+
c. If no chunk files exist (console mode), read the diff and write it
94+
to `/tmp/drafter_chunk_1.diff` yourself, then delegate.
95+
d. Merge all findings arrays into a single list before proceeding.
96+
e. Include any relevant learned patterns from memory in each delegation.
10297
10398
## CRITICAL: How to delegate to the drafter
10499
@@ -114,11 +109,19 @@ agents:
114109
Project context (from AGENTS.md):
115110
<contents of AGENTS.md, or "No AGENTS.md found" if absent>
116111
112+
File risk scores (prioritize high-risk files):
113+
<contents of /tmp/file_risk_scores.json, or "No risk scores available" if absent>
114+
117115
Learned patterns: <any relevant memories>
118116
```
119117
120118
The drafter has `read_file` access and will read the chunk from disk. Keep the
121-
delegation message short — just the file path, chunk number, project context, and any learned patterns.
119+
delegation message short — just the file path, chunk number, project context,
120+
risk scores, and any learned patterns.
121+
122+
If `/tmp/file_risk_scores.json` exists, read it and include the scores. The drafter
123+
should spend more time on high-scoring files. If it doesn't exist (console mode),
124+
skip it.
122125
123126
**Include a file listing** so the drafter knows what files exist on disk. Before
124127
delegating, run:

0 commit comments

Comments
 (0)