⚡ Claude Token Optimization 2026-06-26: Smoke Claude #5557

Description

@github-actions

Target Workflow: Smoke Claude

Source report: #5555
Workflow file: .github/workflows/smoke-claude.md
Estimated cost per run: ~53.5 AIC (instrumented avg; USD/token data unavailable — AIC only)
Total AIC today: 749.0 units across 16 runs (14 instrumented)
Actions minutes today: 85 min total (~5.3/run avg)
GitHub API calls today: 72 total (avg 4.5/run)
LLM turns: 1–2 (estimated); max-turns: 8 still configured ⚠️
Previous issue: #5477 (closed 2026-06-25 — all 3 recommendations remain unimplemented)


Current Configuration

| Setting | Value |
| --- | --- |
| Model | claude-haiku-4-5 |
| Tools loaded | 1 (bash only) |
| Tools actually used | 1 (bash) |
| GitHub tools | false (disabled ✅) |
| Pre-agent steps | Yes, 5 steps ✅ |
| Prompt body size | ~800 chars (concise ✅) |
| max-turns | 8 ⚠️ carry-over from #5477 |
| Pre-fetch step scope | All triggers (schedule + PR) ⚠️ |
| Prompt HTML comment | 8-line developer note in agent context ⚠️ |

The workflow is fundamentally well-structured (minimal tools, heavy pre-computation). Three carry-over items from #5477 remain unimplemented. Additionally, one new optimization opportunity was identified from today's bimodal AIC analysis.


Recommendations

1. Reduce max-turns: 8 → max-turns: 2 (carry-over from #5477)

Estimated savings: ~15–25% AIC reduction on multi-turn runs; caps the bimodal high-cost group

The smoke test workflow requires at most 2 agent turns:

  1. bash: cat /tmp/gh-aw/agent/final-result.json
  2. safeoutputs: add_comment / noop

The current ceiling of 8 allows 6 unnecessary turns. Today's bimodal AIC data shows 4 runs at ~37 AIC and 10 runs at ~61 AIC — a 65% premium on the high-cost group. While the root cause is not fully isolated, max-turns: 8 means that if the model retries or re-reads the JSON, AIC compounds with no upside. Reducing to 2 eliminates the retry tail entirely.
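The 65% premium follows directly from the two group values. A minimal sketch of that arithmetic, using the rounded figures quoted above; the function names `premium_pct` and `weighted_mean` are illustrative and not part of the workflow:

```shell
#!/usr/bin/env bash
# premium_pct LOW HIGH: percentage by which HIGH exceeds LOW, rounded to a whole percent.
premium_pct() {
  awk -v low="$1" -v high="$2" 'BEGIN { printf "%.0f\n", (high - low) / low * 100 }'
}

# weighted_mean COUNT1 VALUE1 COUNT2 VALUE2: run-weighted average of two groups.
weighted_mean() {
  awk -v n1="$1" -v v1="$2" -v n2="$3" -v v2="$4" \
    'BEGIN { printf "%.1f\n", (n1 * v1 + n2 * v2) / (n1 + n2) }'
}

premium_pct 37 61          # prints 65
weighted_mean 4 37 10 61   # prints 54.1
```

The blended 54.1 is close to the reported 53.5 average; the difference comes from using the rounded group values rather than per-run data.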

Implementation — in .github/workflows/smoke-claude.md:

-max-turns: 8
+max-turns: 2

Then recompile:

gh aw compile .github/workflows/smoke-claude.md
npx tsx scripts/ci/postprocess-smoke-workflows.ts

2. Make Pre-fetch GitHub API data step conditional on pull_request trigger (new)

Estimated savings: ~1 GitHub API call per schedule run; eliminates dead computation on 2 runs/day

The "Pre-fetch GitHub API data" step runs gh pr list on all triggers, including the twice-daily schedule. For schedule runs, the final prompt instructs the agent to call noop (no PR context), so recent-prs.json is fetched, fed through the "Compute final smoke result" step, and never used by the agent.

Today, the 2 schedule runs averaged 10 GitHub API calls each versus 3.7 for PR runs, a ~2.7× overhead that persists from the previous report. The gap of roughly 6 calls per run has other contributors (checkout, artifact steps in the verify_token_usage job), but the unconditional pre-fetch step accounts for one avoidable call.

Implementation — in .github/workflows/smoke-claude.md, add a conditional to the pre-fetch step:

   - name: Pre-fetch GitHub API data
+    if: github.event_name == 'pull_request'
     run: |
       gh pr list --repo $EXPR_GITHUB_REPOSITORY --limit 2 --state merged --json number,title,mergedAt \
         > /tmp/gh-aw/agent/recent-prs.json
       echo "GitHub API pre-check: $(wc -c < /tmp/gh-aw/agent/recent-prs.json) bytes"

Also update the "Compute final smoke result" step to handle the missing file on schedule runs:

-      API_COUNT=$(jq 'length' /tmp/gh-aw/agent/recent-prs.json)
+      API_COUNT=$([ -f /tmp/gh-aw/agent/recent-prs.json ] && jq 'length' /tmp/gh-aw/agent/recent-prs.json || echo 0)
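The one-line guard above works, but the `A && B || C` form hides which branch produced the result. A slightly more explicit sketch of the same idea, assuming `jq` is on the runner as the original step implies; the helper name `count_recent_prs` is hypothetical:

```shell
#!/usr/bin/env bash
# count_recent_prs FILE: print the number of entries in a JSON array file,
# or 0 when the file is absent, empty, or cannot be parsed by jq.
count_recent_prs() {
  local file="$1" count
  if [ -s "$file" ] && count=$(jq 'length' "$file" 2>/dev/null); then
    echo "$count"
  else
    echo 0
  fi
}

API_COUNT=$(count_recent_prs /tmp/gh-aw/agent/recent-prs.json)
echo "API_COUNT=$API_COUNT"
```

On schedule runs, where the pre-fetch step is skipped, the file does not exist and `API_COUNT` falls back to 0 without invoking `jq`.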

3. Trim the 8-line HTML developer comment in the prompt body, keeping the load-bearing expression (carry-over from #5477)

Estimated savings: ~125 tokens/run (~1–2 AIC/run; ~16–32 AIC/day at current volume)

The prompt body contains a developer note explaining why ${{ github.run_id }} is present:

<!--
  The `${{ github.run_id }}` reference below is intentional and load-bearing.
  gh-aw only emits the prompt "Interpolate variables and render templates" step
  (which resolves `{{#runtime-import}}` directives) when the prompt body contains
  a GitHub Actions expression. Without it, this workflow's self-import is left
  literal, the agent receives no task, and it calls `noop` — failing the
  pull_request `add_comment` post-check. Run: ${{ github.run_id }}
-->

This is developer documentation, not agent-facing instructions. The agent receives this verbatim on every run. Replace the full block with the minimum expression needed to trigger template rendering:

-<!--
-  The `${{ github.run_id }}` reference below is intentional and load-bearing.
-  gh-aw only emits the prompt "Interpolate variables and render templates" step
-  (which resolves `{{#runtime-import}}` directives) when the prompt body contains
-  a GitHub Actions expression. Without it, this workflow's self-import is left
-  literal, the agent receives no task, and it calls `noop` — failing the
-  pull_request `add_comment` post-check. Run: ${{ github.run_id }}
--->
+<!-- run: ${{ github.run_id }} -->

This preserves the load-bearing ${{ github.run_id }} expression while eliminating ~500 characters of developer documentation from the agent's input context on every run.
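The ~125 tokens/run estimate can be reproduced from the character count. A rough sketch, assuming about 4 characters per token; that ratio is a common rule of thumb rather than a tokenizer measurement, and `estimate_tokens` is an illustrative name:

```shell
#!/usr/bin/env bash
# estimate_tokens CHARS [CHARS_PER_TOKEN]: rough token estimate from a character count.
# The default of 4 characters per token is an assumption, not a measured value.
estimate_tokens() {
  awk -v chars="$1" -v cpt="${2:-4}" 'BEGIN { printf "%.0f\n", chars / cpt }'
}

estimate_tokens 500   # prints 125, the approximate size of the developer comment
```

Passing a different second argument shows how sensitive the estimate is to the assumed ratio.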


4. Investigate bimodal AIC distribution (37 vs 61) (new — follow-up from #5555)

Potential savings: ~24 AIC/run (about 40% per run) on the high-cost group if root-caused; that group is 10 of today's 14 instrumented runs

Today's 14 instrumented runs split cleanly into two groups:

| Group | AIC range | Runs | Branch pattern |
| --- | --- | --- | --- |
| Low | 36.5–37.4 | 4 | extract-sliding-window, optimize-duplicate-code-detector, refactor-split-agent-volumes-mounts-test, update-runner-doctor-a12 |
| High | 56.7–61.8 | 10 | refactor-split-writeconfigs, fix-duplicate-oauth-header, duplicate-port-validation, fix-firewall-logs-eaccess, schedule runs |

The low group consists of refactors to smaller, isolated files. The high group includes more complex cross-module changes, network fixes, and the scheduled runs. The AIC gap (~24 units, 65% overhead) is consistent with the agent using more turns or receiving more context on "heavy" PRs.

Diagnostic approach:

# Compare agent artifacts from a low-AIC vs high-AIC run
gh run download 28180388529 --name agent --dir /tmp/low-aic    # 37.2 AIC
gh run download 28179615485 --name agent --dir /tmp/high-aic   # 61.4 AIC
wc -c /tmp/low-aic/final-result.json /tmp/high-aic/final-result.json
# Check for turn count difference
cat /tmp/low-aic/usage.jsonl 2>/dev/null || echo "no usage data"
cat /tmp/high-aic/usage.jsonl 2>/dev/null || echo "no usage data"

If max-turns is reduced to 2 (Recommendation 1) and the bimodal pattern persists, then the prompt itself may be loading different content per PR — investigate whether the smoke-context.txt or final-result.json content varies by PR complexity.
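To make the low-versus-high comparison repeatable, the two checks above can be wrapped in a small helper. This is a sketch under one assumption: that `usage.jsonl` holds one JSON record per line. Whether each record corresponds to exactly one LLM turn has not been confirmed, and `summarize_run` is a hypothetical name.

```shell
#!/usr/bin/env bash
# summarize_run DIR: print the byte size of final-result.json and the number of
# lines in usage.jsonl for one downloaded artifact directory.
# Missing files are reported as 0 rather than treated as errors.
summarize_run() {
  local dir="$1" bytes=0 records=0
  if [ -f "$dir/final-result.json" ]; then
    bytes=$(wc -c < "$dir/final-result.json")
  fi
  if [ -f "$dir/usage.jsonl" ]; then
    records=$(grep -c '' "$dir/usage.jsonl" || true)
  fi
  echo "$dir bytes=$((bytes + 0)) records=$((records + 0))"
}

for dir in /tmp/low-aic /tmp/high-aic; do
  summarize_run "$dir"
done
```

A larger `records` count on the high-AIC run would point toward extra turns; a larger `bytes` value would point toward heavier input context.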


Cache Analysis (Anthropic-Specific)

⚠️ Per-turn token data (cache_read, cache_write, input_tokens, output_tokens) was not available; only AIC units are present. Coverage of the --enable-api-proxy sidecar instrumentation rose from 50% yesterday to 87.5% today (14/16 runs).

What we know from AIC patterns:

  • AIC is proportional to inference cost; today's 53.5 avg AIC/run is down slightly from 54.9 yesterday (−2.5%)
  • The bimodal distribution (37 vs 61 AIC) may reflect cache misses on cold-start runs vs warm-cache runs — Anthropic's automatic cache TTL is ~5 min
  • Schedule runs (2× daily) always start cold; their 10 API calls suggest more steps executed, but AIC (56.7 avg) is not dramatically higher than PR runs (53.4 avg)

When per-turn data becomes available (requires --enable-api-proxy consistently), check:

  • Cache write vs read split per turn (Anthropic charges 12.5× more for writes than reads at Sonnet pricing)
  • Whether Turn 1 cache writes are reused by Turn 2 in 2-turn runs
  • claude-haiku-4-5 cache pricing at list rates: 5-minute cache write $1.25/M tokens, read $0.10/M tokens (vs Sonnet $3.75/$0.30). The $1.00/$0.08 rates apply to Claude 3.5 Haiku, not to this model.
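Once per-turn data exists, the write/read split can be turned into a cost figure with simple arithmetic. A minimal sketch with prices passed as parameters (USD per million tokens); the token counts in the usage lines are hypothetical placeholders, not measurements from this workflow:

```shell
#!/usr/bin/env bash
# cache_cost_usd WRITE_TOKENS READ_TOKENS WRITE_PRICE READ_PRICE
# Prices are USD per million tokens; prints the combined cache cost in USD.
cache_cost_usd() {
  awk -v w="$1" -v r="$2" -v wp="$3" -v rp="$4" \
    'BEGIN { printf "%.6f\n", (w * wp + r * rp) / 1000000 }'
}

# write_read_ratio WRITE_PRICE READ_PRICE: how many times more a written token costs.
write_read_ratio() {
  awk -v wp="$1" -v rp="$2" 'BEGIN { printf "%.1f\n", wp / rp }'
}

write_read_ratio 3.75 0.30           # prints 12.5 for the Sonnet prices quoted above
cache_cost_usd 2000 2000 3.75 0.30   # hypothetical: 2,000 tokens written on turn 1, read on turn 2
```

If Turn 1 writes are not reused by Turn 2, the read term drops out and the run pays the write price with no offsetting benefit, which is the case the checklist item on cache write/read ratio is meant to catch.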

Expected Impact

| Metric | Current | Projected | Change |
| --- | --- | --- | --- |
| AIC/run (avg) | ~53.5 | ~40–45 | −15 to −25% |
| AIC/run (high group) | ~61 | ~40–45 | ~−30% |
| max-turns ceiling | 8 | 2 | −75% |
| Prompt size | ~800 chars | ~300 chars | ~−60% |
| Schedule API calls/run | ~10 | ~9 | −10% |
| Instrumentation coverage | 87.5% | 87.5% | no change from this PR |

Implementation Checklist

  • max-turns: 8 → max-turns: 2 in smoke-claude.md
  • Add if: github.event_name == 'pull_request' to "Pre-fetch GitHub API data" step
  • Guard API_COUNT computation against missing recent-prs.json on schedule runs
  • Replace 8-line HTML comment with single-line <!-- run: ${{ github.run_id }} -->
  • Recompile: gh aw compile .github/workflows/smoke-claude.md
  • Post-process: npx tsx scripts/ci/postprocess-smoke-workflows.ts
  • Open PR and verify CI passes
  • Compare AIC on next run vs 53.5 baseline; target ≤45 AIC/run
  • Once per-turn token data available, validate cache write/read ratio

Generated by Daily Claude Token Optimization Advisor · 59 AIC · ⊞ 6.6K
