Target Workflow: Smoke Claude
Source report: #5555
Workflow file: .github/workflows/smoke-claude.md
Estimated cost per run: ~53.5 AIC (instrumented avg; USD/token data unavailable — AIC only)
Total AIC today: 749.0 units across 16 runs (14 instrumented)
Actions minutes today: 85 min total (~5.3/run avg)
GitHub API calls today: 72 total (avg 4.5/run)
LLM turns: 1–2 (estimated); max-turns: 8 still configured ⚠️
Previous issue: #5477 (closed 2026-06-25 — all 3 recommendations remain unimplemented)
Current Configuration
| Setting | Value |
|---|---|
| Model | claude-haiku-4-5 |
| Tools loaded | 1 (bash only) |
| Tools actually used | 1 (bash) |
| GitHub tools | false (disabled ✅) |
| Pre-agent steps | Yes, 5 steps ✅ |
| Prompt body size | ~800 chars (concise ✅) |
| max-turns | 8 ⚠️ carry-over from #5477 |
| Pre-fetch step scope | All triggers (schedule + PR) ⚠️ |
| Prompt HTML comment | 8-line developer note in agent context ⚠️ |
The workflow is fundamentally well-structured (minimal tools, heavy pre-computation). Three carry-over items from #5477 remain unimplemented. Additionally, one new optimization opportunity was identified from today's bimodal AIC analysis.
Recommendations
1. Reduce max-turns: 8 → max-turns: 2 (carry-over from #5477)
Estimated savings: ~15–25% AIC reduction on multi-turn runs; caps the bimodal high-cost group
The smoke test workflow requires at most 2 agent turns:
- Turn 1, bash: cat /tmp/gh-aw/agent/final-result.json
- Turn 2, safeoutputs: add_comment / noop
The current ceiling of 8 allows 6 unnecessary turns. Today's bimodal AIC data shows 4 runs at ~37 AIC and 10 runs at ~61 AIC — a 65% premium on the high-cost group. While the root cause is not fully isolated, max-turns: 8 means that if the model retries or re-reads the JSON, AIC compounds with no upside. Reducing to 2 eliminates the retry tail entirely.
Implementation — in .github/workflows/smoke-claude.md:
-max-turns: 8
+max-turns: 2
Then recompile:
gh aw compile .github/workflows/smoke-claude.md
npx tsx scripts/ci/postprocess-smoke-workflows.ts
2. Make Pre-fetch GitHub API data step conditional on pull_request trigger (new)
Estimated savings: ~1 GitHub API call per schedule run; eliminates dead computation on 2 runs/day
The "Pre-fetch GitHub API data" step runs gh pr list on all triggers, including the twice-daily schedule. For schedule runs, the final prompt instructs the agent to call noop (no PR context), so recent-prs.json is fetched, fed through the "Compute final smoke result" step, and never used by the agent.
Today, the 2 schedule runs averaged 10 GitHub API calls each versus 3.7 for PR runs, a ~2.7× overhead that persists from the previous report. The ~6-call gap has other contributors (checkout, artifact steps in the verify_token_usage job), but the unconditional pre-fetch step is one avoidable call.
Implementation — in .github/workflows/smoke-claude.md, add a conditional to the pre-fetch step:
 - name: Pre-fetch GitHub API data
+  if: github.event_name == 'pull_request'
   run: |
     gh pr list --repo $EXPR_GITHUB_REPOSITORY --limit 2 --state merged --json number,title,mergedAt \
       > /tmp/gh-aw/agent/recent-prs.json
     echo "GitHub API pre-check: $(wc -c < /tmp/gh-aw/agent/recent-prs.json) bytes"
Also update the "Compute final smoke result" step to handle the missing file on schedule runs:
- API_COUNT=$(jq 'length' /tmp/gh-aw/agent/recent-prs.json)
+ API_COUNT=$([ -f /tmp/gh-aw/agent/recent-prs.json ] && jq 'length' /tmp/gh-aw/agent/recent-prs.json || echo 0)
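The same guard can be spelled out as a small function, which makes the fallback paths easier to read and test. This is a minimal sketch, not the workflow's actual code: `count_merged_prs` is a hypothetical helper name, and treating an empty or unparseable file as 0 is an assumption about the desired behavior.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the number of entries in a JSON array file,
# or 0 when the file is missing, empty, or cannot be parsed by jq.
count_merged_prs() {
  local file="$1" count
  if [ -s "$file" ] && count=$(jq 'length' "$file" 2>/dev/null); then
    echo "$count"
  else
    echo 0
  fi
}

# On schedule runs the pre-fetch step is skipped, so the file is absent
# and API_COUNT falls back to 0.
API_COUNT=$(count_merged_prs /tmp/gh-aw/agent/recent-prs.json)
echo "API_COUNT=$API_COUNT"
```

Unlike the `&& ... ||` one-liner, the `if` form keeps the "file present" and "jq succeeded" conditions visibly separate, at the cost of a few more lines in the step.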
3. Trim the 8-line load-bearing HTML comment from the prompt body (carry-over from #5477)
Estimated savings: ~125 tokens/run (~1–2 AIC/run; ~16–32 AIC/day at current volume)
The prompt body contains a developer note explaining why ${{ github.run_id }} is present:
<!--
The `${{ github.run_id }}` reference below is intentional and load-bearing.
gh-aw only emits the prompt "Interpolate variables and render templates" step
(which resolves `{{#runtime-import}}` directives) when the prompt body contains
a GitHub Actions expression. Without it, this workflow's self-import is left
literal, the agent receives no task, and it calls `noop` — failing the
pull_request `add_comment` post-check. Run: ${{ github.run_id }}
-->
This is developer documentation, not agent-facing instructions. The agent receives this verbatim on every run. Replace the full block with the minimum expression needed to trigger template rendering:
-<!--
- The `${{ github.run_id }}` reference below is intentional and load-bearing.
- gh-aw only emits the prompt "Interpolate variables and render templates" step
- (which resolves `{{#runtime-import}}` directives) when the prompt body contains
- a GitHub Actions expression. Without it, this workflow's self-import is left
- literal, the agent receives no task, and it calls `noop` — failing the
- pull_request `add_comment` post-check. Run: ${{ github.run_id }}
--->
+<!-- run: ${{ github.run_id }} -->
This preserves the load-bearing ${{ github.run_id }} expression while eliminating ~500 characters of developer documentation from the agent's input context on every run.
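Since the one-line comment no longer explains why it exists, a cheap check can protect it from being deleted by accident. The sketch below is an assumption, not part of gh-aw: `has_actions_expression` is a hypothetical helper, and searching for the literal `${{` is a simplification of whatever detection gh-aw actually performs.

```shell
#!/usr/bin/env bash
# Hypothetical guard: succeed only if the file contains a GitHub Actions
# expression opener. -F matches the string literally, -q suppresses output.
has_actions_expression() {
  grep -qF -- '${{' "$1"
}

# Demonstrate on a temporary file holding the trimmed comment.
prompt=$(mktemp)
printf '%s\n' '<!-- run: ${{ github.run_id }} -->' > "$prompt"

if has_actions_expression "$prompt"; then
  echo "ok: expression present"
else
  echo "error: no expression; the runtime-import may not be rendered" >&2
fi
rm -f "$prompt"
```

In practice the function would be pointed at .github/workflows/smoke-claude.md rather than a temporary file.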
4. Investigate bimodal AIC distribution (37 vs 61) (new — follow-up from #5555)
Potential savings: ~24 AIC/run (~40%) on the high-cost group if root-caused; that group is 10 of the 14 instrumented runs
Today's 14 instrumented runs split cleanly into two groups:
| Group | AIC range | Runs | Branch pattern |
|---|---|---|---|
| Low | 36.5–37.4 | 4 | extract-sliding-window, optimize-duplicate-code-detector, refactor-split-agent-volumes-mounts-test, update-runner-doctor-a12 |
| High | 56.7–61.8 | 10 | refactor-split-writeconfigs, fix-duplicate-oauth-header, duplicate-port-validation, fix-firewall-logs-eaccess, schedule runs |
Judging by branch names alone, the low group appears to consist of refactors to smaller, isolated files, while the high group includes more complex cross-module changes, network fixes, and the scheduled runs. The AIC gap (~24 units, ~65% overhead) is consistent with the agent using more turns or receiving more context on "heavy" PRs, though neither explanation has been confirmed.
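The gap and premium figures follow directly from the two group centers. The awk sketch below reproduces them from the rounded centers (37 and 61 AIC) and the group sizes (4 and 10); `aic_gap_summary` is a hypothetical helper name. Because it works from rounded centers, the weighted average lands near the reported 53.5 rather than exactly on it.

```shell
#!/usr/bin/env bash
# Hypothetical helper: derive gap, premium, and weighted average
# from the rounded group centers quoted in this report.
aic_gap_summary() {
  awk 'BEGIN {
    low_n = 4;   low_aic = 37    # low-cost group
    high_n = 10; high_aic = 61   # high-cost group
    gap = high_aic - low_aic
    printf "gap: %d AIC/run\n", gap
    printf "premium over low group: %.0f%%\n", 100 * gap / low_aic
    printf "saving relative to high group: %.0f%%\n", 100 * gap / high_aic
    printf "weighted average: %.1f AIC/run\n", (low_n * low_aic + high_n * high_aic) / (low_n + high_n)
  }'
}

aic_gap_summary
# gap: 24 AIC/run
# premium over low group: 65%
# saving relative to high group: 39%
# weighted average: 54.1 AIC/run
```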
Diagnostic approach:
# Compare agent artifacts from a low-AIC vs high-AIC run
gh run download 28180388529 --name agent --dir /tmp/low-aic # 37.2 AIC
gh run download 28179615485 --name agent --dir /tmp/high-aic # 61.4 AIC
wc -c /tmp/low-aic/final-result.json /tmp/high-aic/final-result.json
# Check for turn count difference
cat /tmp/low-aic/usage.jsonl 2>/dev/null || echo "no usage data"
cat /tmp/high-aic/usage.jsonl 2>/dev/null || echo "no usage data"
If max-turns is reduced to 2 (Recommendation 1) and the bimodal pattern persists, then the prompt itself may be loading different content per PR — investigate whether the smoke-context.txt or final-result.json content varies by PR complexity.
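If usage.jsonl turns out to hold one JSON record per model request, comparing line counts gives a rough turn count for each run. That file layout is an assumption; the report does not describe it. `count_records` is a hypothetical helper, and the paths match the download directories used in the diagnostic commands above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the number of lines in a file, or 0 if it
# does not exist. Assumes one usage record per line (unverified).
count_records() {
  local file="$1"
  if [ -f "$file" ]; then
    awk 'END { print NR }' "$file"
  else
    echo 0
  fi
}

for run in low-aic high-aic; do
  echo "$run: $(count_records "/tmp/$run/usage.jsonl") usage records"
done
```

A higher count for the high-AIC run would point toward extra turns; equal counts would point toward larger per-turn context instead.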
Cache Analysis (Anthropic-Specific)
⚠️ Per-turn token data (cache_read, cache_write, input_tokens, output_tokens) was not available; only AIC units are present. Instrumentation coverage from the --enable-api-proxy sidecar rose from 50% yesterday to 87.5% today (14/16 runs).
What we know from AIC patterns:
- AIC is proportional to inference cost; today's 53.5 avg AIC/run is down slightly from 54.9 yesterday (−2.5%)
- The bimodal distribution (37 vs 61 AIC) may reflect cache misses on cold-start runs vs warm-cache runs — Anthropic's automatic cache TTL is ~5 min
- Schedule runs (2× daily) always start cold; their 10 API calls suggest more steps executed, but AIC (56.7 avg) is not dramatically higher than PR runs (53.4 avg)
When per-turn data becomes available (requires --enable-api-proxy consistently), check:
- Cache write vs read split per turn (Anthropic charges 12.5× more for writes than reads at Sonnet pricing)
- Whether Turn 1 cache writes are reused by Turn 2 in 2-turn runs
claude-haiku-4-5 cache pricing: write $1.25/M tokens, read $0.10/M tokens (vs Sonnet $3.75/$0.30)
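The write-to-read multiple and the cost of a given cached prefix can be worked out from per-million-token rates. The sketch below uses the Sonnet figures quoted above ($3.75 write, $0.30 read); the 10,000-token prefix is a made-up example size, and both function names are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for cache pricing arithmetic.
# price_ratio WRITE READ        -> how many times more a write costs than a read
# cache_cost_usd TOKENS RATE    -> USD cost of TOKENS at RATE dollars per 1M tokens
price_ratio() {
  awk -v w="$1" -v r="$2" 'BEGIN { printf "%.1f\n", w / r }'
}
cache_cost_usd() {
  awk -v t="$1" -v p="$2" 'BEGIN { printf "%.4f\n", t * p / 1000000 }'
}

price_ratio 3.75 0.30        # 12.5
cache_cost_usd 10000 3.75    # 0.0375 (writing a 10,000-token prefix)
cache_cost_usd 10000 0.30    # 0.0030 (reading it back on a cache hit)
```

Once per-turn cache_write and cache_read counts are available, the same functions can be fed the real token counts and the rates for the model in use.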
Expected Impact
| Metric | Current | Projected | Change |
|---|---|---|---|
| AIC/run (avg) | ~53.5 | ~40–45 | −15 to −25% |
| AIC/run (high group) | ~61 | ~40–45 | ~−30% |
| max-turns ceiling | 8 | 2 | −75% |
| Prompt size | ~800 chars | ~300 chars | ~−60% |
| Schedule API calls/run | ~10 | ~9 | −10% |
| Instrumentation coverage | 87.5% | 87.5% | (no change from this PR) |
Implementation Checklist
- [ ] Change max-turns: 8 → max-turns: 2 in smoke-claude.md
- [ ] Add if: github.event_name == 'pull_request' to "Pre-fetch GitHub API data" step
- [ ] Guard API_COUNT computation against missing recent-prs.json on schedule runs
- [ ] Replace the 8-line HTML comment with <!-- run: ${{ github.run_id }} -->
- [ ] Recompile: gh aw compile .github/workflows/smoke-claude.md
- [ ] Post-process: npx tsx scripts/ci/postprocess-smoke-workflows.ts
Generated by Daily Claude Token Optimization Advisor · 59 AIC · ⊞ 6.6K · ◷