Skip to content

Commit 93c12fb

Browse files
committed
Deterministic diff chunking, file risk scoring, and eval coverage
Signed-off-by: Derek Misler <derek.misler@docker.com>
1 parent b120fe2 commit 93c12fb

File tree

4 files changed

+197
-13
lines changed

4 files changed

+197
-13
lines changed

review-pr/action.yml

Lines changed: 131 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -182,6 +182,137 @@ runs:
182182
fi
183183
fi
184184
185+
- name: Split diff into chunks
186+
if: hashFiles('pr.diff') != ''
187+
id: chunk-diff
188+
shell: bash
189+
run: |
190+
set -euo pipefail
191+
TOTAL_LINES=$(wc -l < pr.diff | tr -d ' ')
192+
echo "📄 Diff is $TOTAL_LINES lines"
193+
194+
if [ "$TOTAL_LINES" -le 1500 ]; then
195+
# Small diff — single chunk
196+
cp pr.diff /tmp/drafter_chunk_1.diff
197+
echo "chunk_count=1" >> $GITHUB_OUTPUT
198+
# Build manifest: list all files in the diff
199+
FILES=$(grep '^diff --git' pr.diff | sed 's|diff --git a/.* b/||' | jq -R . | jq -s '.')
200+
echo "chunk_manifest=$(jq -nc --argjson f "$FILES" '{"1": $f}')" >> $GITHUB_OUTPUT
201+
echo "✅ Single chunk ($TOTAL_LINES lines)"
202+
else
203+
# Large diff — split at file boundaries, targeting ~1000 lines per chunk
204+
CHUNK=1
205+
CHUNK_LINES=0
206+
CURRENT_FILE=""
207+
CURRENT_DIR=""
208+
MANIFEST="{}"
209+
CHUNK_FILES="[]"
210+
211+
# Create first chunk file
212+
> /tmp/drafter_chunk_1.diff
213+
214+
while IFS= read -r line; do
215+
if [[ "$line" == "diff --git"* ]]; then
216+
FILE=$(echo "$line" | sed 's|diff --git a/.* b/||')
217+
DIR=$(dirname "$FILE")
218+
219+
# Start new chunk if:
220+
# - over soft limit (1000 lines) and directory changed, OR
221+
# - over hard limit (2000 lines) regardless of directory
222+
if ([ "$CHUNK_LINES" -gt 1000 ] && [ "$DIR" != "$CURRENT_DIR" ]) || [ "$CHUNK_LINES" -gt 2000 ]; then
223+
# Save current chunk's file list to manifest
224+
MANIFEST=$(echo "$MANIFEST" | jq --argjson files "$CHUNK_FILES" --arg k "$CHUNK" '.[$k] = $files')
225+
CHUNK=$((CHUNK + 1))
226+
CHUNK_LINES=0
227+
CHUNK_FILES="[]"
228+
> /tmp/drafter_chunk_${CHUNK}.diff
229+
fi
230+
231+
CURRENT_FILE="$FILE"
232+
CURRENT_DIR="$DIR"
233+
CHUNK_FILES=$(echo "$CHUNK_FILES" | jq --arg f "$FILE" '. += [$f]')
234+
fi
235+
236+
echo "$line" >> /tmp/drafter_chunk_${CHUNK}.diff
237+
CHUNK_LINES=$((CHUNK_LINES + 1))
238+
done < pr.diff
239+
240+
# Save final chunk's file list
241+
MANIFEST=$(echo "$MANIFEST" | jq --argjson files "$CHUNK_FILES" --arg k "$CHUNK" '.[$k] = $files')
242+
243+
echo "chunk_count=$CHUNK" >> $GITHUB_OUTPUT
244+
echo "chunk_manifest=$(echo "$MANIFEST" | jq -c .)" >> $GITHUB_OUTPUT
245+
echo "✅ Split into $CHUNK chunks (target ~1000 lines each)"
246+
247+
for i in $(seq 1 $CHUNK); do
248+
LINES=$(wc -l < /tmp/drafter_chunk_${i}.diff | tr -d ' ')
249+
FILES_IN_CHUNK=$(echo "$MANIFEST" | jq -r --arg k "$i" '.[$k] | length')
250+
echo " Chunk $i: $LINES lines, $FILES_IN_CHUNK files"
251+
done
252+
fi
253+
254+
- name: Score file risk
255+
if: hashFiles('pr.diff') != ''
256+
shell: bash
257+
run: |
258+
# Extract per-file diff stats (added lines, hunk count)
259+
awk '
260+
/^diff --git/ {
261+
if (file != "") print file, added, hunks
262+
file = $0; sub(/.*b\//, "", file)
263+
added = 0; hunks = 0
264+
}
265+
/^@@/ { hunks++ }
266+
/^\+[^+]/ { added++ }
267+
END { if (file != "") print file, added, hunks }
268+
' pr.diff > /tmp/file_diff_stats.txt
269+
270+
# Build final scores
271+
jq -n '{}' > /tmp/file_risk_scores.json
272+
while read -r file added hunks; do
273+
SCORE=0
274+
275+
# Security-sensitive paths: +2
276+
if echo "$file" | grep -qiE 'auth|security|crypto|session|secret|token|password|credential'; then
277+
SCORE=$((SCORE + 2))
278+
fi
279+
280+
# Large change (>100 added lines): +2
281+
if [ "$added" -gt 100 ]; then
282+
SCORE=$((SCORE + 2))
283+
fi
284+
285+
# Many hunks (>3): +1
286+
if [ "$hunks" -gt 3 ]; then
287+
SCORE=$((SCORE + 1))
288+
fi
289+
290+
# Test/doc/config files: score = 0
291+
if echo "$file" | grep -qiE '_test\.go$|\.test\.[tj]sx?$|\.spec\.[tj]sx?$|test_.*\.py$|\.md$|\.ya?ml$|\.json$|\.toml$'; then
292+
SCORE=0
293+
fi
294+
295+
# Error handling patterns in diff for this file: +1
296+
ERROR_LINES=$(awk -v f="$file" '
297+
/^diff --git/ { cur=$0; sub(/.*b\//, "", cur) }
298+
cur == f && /^\+[^+]/ && /catch|rescue|except|recover|error|panic/ { count++ }
299+
END { print count+0 }
300+
' pr.diff)
301+
if [ "$ERROR_LINES" -gt 0 ]; then
302+
SCORE=$((SCORE + 1))
303+
fi
304+
305+
if jq --arg f "$file" --argjson s "$SCORE" '.[$f] = $s' /tmp/file_risk_scores.json > /tmp/file_risk_scores.tmp; then
306+
mv /tmp/file_risk_scores.tmp /tmp/file_risk_scores.json
307+
else
308+
echo "⚠️ Failed to score $file, skipping"
309+
rm -f /tmp/file_risk_scores.tmp
310+
fi
311+
done < /tmp/file_diff_stats.txt
312+
313+
rm -f /tmp/file_diff_stats.txt
314+
echo "✅ File risk scores: $(jq -c . /tmp/file_risk_scores.json)"
315+
185316
- name: Generate file history
186317
if: hashFiles('changed_files.txt') != ''
187318
shell: bash
Lines changed: 25 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,25 @@
1+
{
2+
"id": "a1b2c3d4-file-diff-test-001",
3+
"title": "File-based diff reading - agent reads pr.diff from disk (run 1)",
4+
"evals": {
5+
"setup": "apk add --no-cache github-cli jq && mkdir -p /tmp && cat > pr.diff << 'DIFFEOF'\ndiff --git a/pkg/auth/handler.go b/pkg/auth/handler.go\nindex abc1234..def5678 100644\n--- a/pkg/auth/handler.go\n+++ b/pkg/auth/handler.go\n@@ -45,6 +45,15 @@ func (h *Handler) Login(w http.ResponseWriter, r *http.Request) {\n \ttoken := r.Header.Get(\"Authorization\")\n \tif token == \"\" {\n \t\thttp.Error(w, \"unauthorized\", http.StatusUnauthorized)\n+\t\treturn\n+\t}\n+\n+\t// Validate token format\n+\tparts := strings.Split(token, \" \")\n+\tif len(parts) != 2 || parts[0] != \"Bearer\" {\n+\t\thttp.Error(w, \"invalid token format\", http.StatusUnauthorized)\n+\t\treturn\n+\t}\n \n \tsession, err := h.store.Get(r, \"session\")\n \tif err != nil {\ndiff --git a/pkg/auth/middleware.go b/pkg/auth/middleware.go\nindex 111aaa..222bbb 100644\n--- a/pkg/auth/middleware.go\n+++ b/pkg/auth/middleware.go\n@@ -12,8 +12,12 @@ func AuthMiddleware(next http.Handler) http.Handler {\n \treturn http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {\n \t\ttoken := r.Header.Get(\"Authorization\")\n \t\tif token == \"\" {\n-\t\t\thttp.Error(w, \"missing token\", http.StatusUnauthorized)\n-\t\t\treturn\n+\t\t\t// Check cookie as fallback\n+\t\t\tcookie, err := r.Cookie(\"auth_token\")\n+\t\t\tif err != nil || cookie.Value == \"\" {\n+\t\t\t\thttp.Error(w, \"missing token\", http.StatusUnauthorized)\n+\t\t\t\treturn\n+\t\t\t}\n+\t\t\ttoken = cookie.Value\n \t\t}\n \t\tnext.ServeHTTP(w, r)\n \t})\nDIFFEOF\ncp pr.diff /tmp/drafter_chunk_1.diff && echo 'handler.go' > changed_files.txt && echo 'middleware.go' >> changed_files.txt && jq -n '{title: \"Add token format validation and cookie auth fallback\", author: {login: \"testuser\"}, body: \"Adds Bearer token format validation and cookie-based auth fallback.\", baseRefName: \"main\", headRefName: \"fix/auth-improvements\"}' > pr_metadata.json",
6+
"relevance": [
7+
"The agent ran 'echo $GITHUB_ACTIONS' before performing the review to detect the output mode",
8+
"The agent output the review to the console as formatted markdown instead of posting via gh api",
9+
"The agent successfully read the diff from the pr.diff file on disk rather than trying to fetch it via gh or git",
10+
"The drafter response is valid JSON containing a 'findings' array and a 'summary' field",
11+
"The review includes an assessment label (one of '🟢 APPROVE', '🟡 NEEDS ATTENTION', or '🔴 CRITICAL')"
12+
]
13+
},
14+
"messages": [
15+
{
16+
"message": {
17+
"agentName": "",
18+
"message": {
19+
"role": "user",
20+
"content": "Review the following PR.\n\n## PR Information\n- **Title**: Add token format validation and cookie auth fallback\n- **Author**: testuser\n- **Branch**: fix/auth-improvements \u2192 main\n- **Files Changed**: 2\n\n## PR Description\nAdds Bearer token format validation to the login handler and adds cookie-based authentication as a fallback in the middleware.\n\n## Changed Files\n\npkg/auth/handler.go\npkg/auth/middleware.go\n\n---\n\n## Instructions\n\nExecute the review pipeline:\n\n1. **Gather**: Read the pre-fetched `pr.diff` file. If missing, run `gh pr diff` (use the full URL, not just the number)\n2. **Draft**: Delegate to `drafter` agent to generate bug hypotheses\n3. **Verify**: For each hypothesis, delegate to `verifier` agent\n4. **Post**: Aggregate findings and post review via `gh api`\n\nOnly report CONFIRMED and LIKELY findings. Always post as COMMENT (never APPROVE or REQUEST_CHANGES)."
21+
}
22+
}
23+
}
24+
]
25+
}
Lines changed: 26 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,26 @@
1+
{
2+
"id": "b2c3d4e5-large-chunk-test-001",
3+
"title": "Large diff chunking - agent delegates pre-split chunks (run 1)",
4+
"evals": {
5+
"setup": "apk add --no-cache github-cli jq && generate_file() { local filepath=$1 num_lines=$2; echo \"diff --git a/$filepath b/$filepath\"; echo \"index abc1234..def5678 100644\"; echo \"--- a/$filepath\"; echo \"+++ b/$filepath\"; echo \"@@ -1,5 +1,$num_lines @@\"; echo \" package $(basename $(dirname $filepath))\"; echo \" \"; echo \" import (\"; echo \"+\\t\\\"fmt\\\"\"; echo \"+\\t\\\"net/http\\\"\"; echo \" )\"; echo \" \"; i=0; while [ $i -lt $((num_lines - 10)) ]; do if [ \"$filepath\" = \"pkg/storage/db.go\" ] && [ $i -eq 50 ]; then echo \"+// GetUser fetches a user by ID from the database.\"; echo \"+func GetUser(db *sql.DB, userID string) (*User, error) {\"; echo \"+\\tquery := fmt.Sprintf(\\\"SELECT id, name, email FROM users WHERE id = '%s'\\\", userID)\"; echo \"+\\trow := db.QueryRow(query)\"; echo \"+\\tvar u User\"; echo \"+\\tif err := row.Scan(&u.ID, &u.Name, &u.Email); err != nil {\"; echo \"+\\t\\treturn nil, fmt.Errorf(\\\"scan user: %w\\\", err)\"; echo \"+\\t}\"; echo \"+\\treturn &u, nil\"; echo \"+}\"; i=$((i + 10)); else echo \"+// line $i of $filepath\"; i=$((i + 1)); fi; done; } && { generate_file pkg/api/handlers.go 400; generate_file pkg/api/middleware.go 350; generate_file pkg/storage/db.go 400; generate_file pkg/config/loader.go 200; generate_file internal/worker/processor.go 300; } > pr.diff && echo \"Total lines: $(wc -l < pr.diff)\" && CHUNK=1 CHUNK_LINES=0 && : > /tmp/drafter_chunk_1.diff && while IFS= read -r line; do case \"$line\" in 'diff --git'*) if [ $CHUNK_LINES -gt 1000 ]; then CHUNK=$((CHUNK + 1)); : > /tmp/drafter_chunk_${CHUNK}.diff; CHUNK_LINES=0; fi ;; esac; echo \"$line\" >> /tmp/drafter_chunk_${CHUNK}.diff; CHUNK_LINES=$((CHUNK_LINES + 1)); done < pr.diff && echo \"Chunks: $CHUNK\" && for c in $(seq 1 $CHUNK); do echo \"Chunk $c: $(wc -l < /tmp/drafter_chunk_${c}.diff) lines\"; done && printf 'pkg/api/handlers.go\\npkg/api/middleware.go\\npkg/storage/db.go\\npkg/config/loader.go\\ninternal/worker/processor.go\\n' > changed_files.txt && jq -n '{title: \"Add API handlers, storage layer, config loader, and worker\", author: {login: \"testuser\"}, body: \"Large feature PR adding multiple packages.\", baseRefName: \"main\", headRefName: \"feature/big-feature\"}' > pr_metadata.json && echo 'Setup complete'",
6+
"relevance": [
7+
"The agent ran 'echo $GITHUB_ACTIONS' before performing the review to detect the output mode",
8+
"The agent output the review to the console as formatted markdown instead of posting via gh api",
9+
"The agent detected and used pre-split chunk files at /tmp/drafter_chunk_*.diff for delegation rather than splitting the diff itself",
10+
"The drafter response is valid JSON containing a 'findings' array and a 'summary' field",
11+
"At least one finding flags the SQL injection vulnerability in pkg/storage/db.go where user input is interpolated into a SQL query via fmt.Sprintf",
12+
"The SQL injection finding has severity 'high' because unsanitized user input in SQL queries is a critical security vulnerability"
13+
]
14+
},
15+
"messages": [
16+
{
17+
"message": {
18+
"agentName": "",
19+
"message": {
20+
"role": "user",
21+
"content": "Review the following PR.\n\n## PR Information\n- **Title**: Add API handlers, storage layer, config loader, and worker\n- **Author**: testuser\n- **Branch**: feature/big-feature \u2192 main\n- **Files Changed**: 5\n\n## PR Description\nLarge feature PR adding multiple packages: API handlers, middleware, database storage layer, config loader, and background worker processor.\n\n## Changed Files\n\npkg/api/handlers.go\npkg/api/middleware.go\npkg/storage/db.go\npkg/config/loader.go\ninternal/worker/processor.go\n\n---\n\n## Instructions\n\nExecute the review pipeline:\n\n1. **Gather**: Read the pre-fetched `pr.diff` file. If missing, run `gh pr diff` (use the full URL, not just the number)\n2. **Draft**: Delegate to `drafter` agent to generate bug hypotheses\n3. **Verify**: For each hypothesis, delegate to `verifier` agent\n4. **Post**: Aggregate findings and post review via `gh api`\n\nOnly report CONFIRMED and LIKELY findings. Always post as COMMENT (never APPROVE or REQUEST_CHANGES)."
22+
}
23+
}
24+
}
25+
]
26+
}

review-pr/agents/pr-review.yaml

Lines changed: 15 additions & 13 deletions
Original file line numberDiff line numberDiff line change
@@ -87,18 +87,13 @@ agents:
8787
```
8888
3. Use `get_memories` to check for any learned patterns from previous feedback
8989
4. **Delegate to drafter(s)**:
90-
a. Count the total lines in the diff (e.g., `wc -l pr.diff`).
91-
b. If the diff is ≤ 1500 lines, write it to `/tmp/drafter_chunk_1.diff` and delegate
92-
to the drafter in a single call.
93-
c. If the diff is > 1500 lines:
94-
- Split the diff at file boundaries (`diff --git` headers) using shell commands.
95-
- Group files into chunks targeting ~1000 lines each. Keep related files
96-
together when possible (same directory).
97-
- Write each chunk to `/tmp/drafter_chunk_N.diff` (numbered sequentially).
98-
- Delegate each chunk to the drafter separately.
99-
- Merge all findings arrays into a single list before proceeding to step 5.
100-
- Combine summaries into one overall summary.
101-
d. Include any relevant learned patterns from memory in each delegation.
90+
a. Check for pre-split chunk files at `/tmp/drafter_chunk_*.diff`
91+
(the CI workflow splits the diff deterministically before the agent runs).
92+
b. For each chunk file, delegate to the drafter with the file path.
93+
c. If no chunk files exist (console mode), read the diff and write it
94+
to `/tmp/drafter_chunk_1.diff` yourself, then delegate.
95+
d. Merge all findings arrays into a single list before proceeding.
96+
e. Include any relevant learned patterns from memory in each delegation.
10297
10398
## CRITICAL: How to delegate to the drafter
10499
@@ -114,6 +109,9 @@ agents:
114109
Project context (from AGENTS.md):
115110
<contents of AGENTS.md, or "No AGENTS.md found" if absent>
116111
112+
File risk scores (prioritize high-risk files):
113+
<contents of /tmp/file_risk_scores.json, or "No risk scores available" if absent>
114+
117115
File history (recent commits):
118116
<contents of /tmp/file_history.txt, or "No file history available" if absent>
119117
@@ -122,7 +120,11 @@ agents:
122120
123121
The drafter has `read_file` access and will read the chunk from disk. Keep the
124122
delegation message short — just the file path, chunk number, project context,
125-
file history, and any learned patterns.
123+
risk scores, file history, and any learned patterns.
124+
125+
If `/tmp/file_risk_scores.json` exists, read it and include the scores. The drafter
126+
should spend more time on high-scoring files. If it doesn't exist (console mode),
127+
skip it.
126128
127129
If `/tmp/file_history.txt` exists, read it and include relevant entries for the
128130
files in the current chunk. This helps the drafter understand what kinds of bugs

0 commit comments

Comments
 (0)