⚡ Claude Token Optimization 2026-06-26 - Smoke Claude #5557

## Target Workflow: `Smoke Claude`

**Source report:** #5555
**Workflow file:** `.github/workflows/smoke-claude.md`
**Estimated cost per run:** ~53.5 AIC (instrumented avg; USD/token data unavailable — AIC only)
**Total AIC today:** 749.0 units across 16 runs (14 instrumented)
**Actions minutes today:** 85 min total (~5.3/run avg)
**GitHub API calls today:** 72 total (avg 4.5/run)
**LLM turns:** 1–2 (estimated); `max-turns: 8` still configured ⚠️
**Previous issue:** #5477 (closed 2026-06-25 — all 3 recommendations remain unimplemented)

---

## Current Configuration

| Setting | Value |
|---------|-------|
| Model | `claude-haiku-4-5` |
| Tools loaded | 1 — `bash` only |
| Tools actually used | 1 — `bash` |
| GitHub tools | `false` (disabled ✅) |
| Pre-agent steps | Yes — 5 steps ✅ |
| Prompt body size | ~800 chars (concise ✅) |
| `max-turns` | **8** ⚠️ carry-over from #5477 |
| Pre-fetch step scope | All triggers (schedule + PR) ⚠️ |
| Prompt HTML comment | 8-line developer note in agent context ⚠️ |

The workflow is fundamentally well-structured (minimal tools, heavy pre-computation). Three carry-over items from #5477 remain unimplemented. Additionally, one new optimization opportunity was identified from today's bimodal AIC analysis.

---

## Recommendations

### 1. Reduce `max-turns: 8` → `max-turns: 2` *(carry-over from #5477)*

**Estimated savings:** ~15–25% AIC reduction on multi-turn runs; caps the bimodal high-cost group

The smoke test workflow requires at most 2 agent turns:
1. `bash`: `cat /tmp/gh-aw/agent/final-result.json`
2. `safeoutputs`: `add_comment` / `noop`

The current ceiling of 8 leaves 6 turns of headroom that the workflow never needs. Today's AIC data is bimodal: 4 runs at ~37 AIC and 10 runs at ~61 AIC, a **65% premium** for the high-cost group. The root cause is not yet isolated, but with `max-turns: 8` any retry or re-read of the JSON compounds AIC with no upside. Capping at 2 removes that retry tail.
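The premium and the blended average can be re-derived from the two group sizes and their approximate AIC values. A minimal sketch, assuming the rounded group means of 37 and 61 AIC; the `aic_premium` and `aic_blended` helper names are illustrative and not part of the workflow:

```bash
#!/usr/bin/env bash
# Illustrative helpers (hypothetical names); awk handles the floating-point math.

# Percentage premium of the high-cost group over the low-cost group.
aic_premium() {
  awk -v low="$1" -v high="$2" 'BEGIN { printf "%.0f\n", (high / low - 1) * 100 }'
}

# Run-weighted average AIC across both groups.
# Args: low_run_count low_aic high_run_count high_aic
aic_blended() {
  awk -v n_low="$1" -v low="$2" -v n_high="$3" -v high="$4" \
    'BEGIN { printf "%.1f\n", (n_low * low + n_high * high) / (n_low + n_high) }'
}

aic_premium 37 61        # premium of the high group, in percent
aic_blended 4 37 10 61   # blended average across the 14 instrumented runs
```

With these rounded inputs the blended figure lands close to the reported 53.5 AIC average; the small difference comes from rounding the group means.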

**Implementation** — in `.github/workflows/smoke-claude.md`:
```diff
-max-turns: 8
+max-turns: 2
```

Then recompile:
```bash
gh aw compile .github/workflows/smoke-claude.md
npx tsx scripts/ci/postprocess-smoke-workflows.ts
```
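Before recompiling, it can help to confirm the edit landed in the source file. A minimal sketch, assuming the setting appears on its own line as `max-turns: N` at any indentation; the `get_max_turns` helper is hypothetical:

```bash
# Print the first max-turns value found in a workflow file (empty output if none).
get_max_turns() {
  sed -n -E 's/^[[:space:]]*max-turns:[[:space:]]*([0-9]+).*$/\1/p' "$1" | head -n 1
}

file=".github/workflows/smoke-claude.md"
if [ -f "$file" ]; then
  turns="$(get_max_turns "$file")"
  if [ "$turns" = "2" ]; then
    echo "max-turns is 2"
  else
    echo "max-turns is '${turns:-unset}', expected 2"
  fi
fi
```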

---

### 2. Make `Pre-fetch GitHub API data` step conditional on `pull_request` trigger *(new)*

**Estimated savings:** ~1 GitHub API call per schedule run; eliminates dead computation on 2 runs/day

The "Pre-fetch GitHub API data" step runs `gh pr list` on **all** triggers, including the twice-daily schedule. For schedule runs, the final prompt instructs the agent to call `noop` (no PR context), so `recent-prs.json` is fetched, fed through the "Compute final smoke result" step, and never used by the agent.

Today: 2 schedule runs averaged **10 GitHub API calls each** vs 3.7 for PR runs, a ~2.7× overhead that persists from the previous report. The gap has other contributors (checkout, artifact steps in the `verify_token_usage` job), but the unconditional pre-fetch step is one avoidable call.

**Implementation** — in `.github/workflows/smoke-claude.md`, add a conditional to the pre-fetch step:
```diff
   - name: Pre-fetch GitHub API data
+    if: github.event_name == 'pull_request'
     run: |
       gh pr list --repo $EXPR_GITHUB_REPOSITORY --limit 2 --state merged --json number,title,mergedAt \
         > /tmp/gh-aw/agent/recent-prs.json
       echo "GitHub API pre-check: $(wc -c < /tmp/gh-aw/agent/recent-prs.json) bytes"
```

Also update the "Compute final smoke result" step to handle the missing file on schedule runs:
```diff
-      API_COUNT=$(jq 'length' /tmp/gh-aw/agent/recent-prs.json)
+      API_COUNT=$([ -f /tmp/gh-aw/agent/recent-prs.json ] && jq 'length' /tmp/gh-aw/agent/recent-prs.json || echo 0)
```
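Both branches of the guard can be exercised locally before relying on it in CI. A minimal sketch, assuming `jq` is installed for the file-present case and using a hypothetical `api_count` wrapper; it is written with an explicit `if` so that a `jq` failure on a malformed file surfaces instead of falling through to `echo 0`:

```bash
# Hypothetical wrapper: entry count of a JSON array file, or 0 if the file is absent.
api_count() {
  if [ -f "$1" ]; then
    jq 'length' "$1"
  else
    echo 0
  fi
}

tmp="$(mktemp -d)"
api_count "$tmp/recent-prs.json"                    # file absent (schedule run)
if command -v jq >/dev/null 2>&1; then
  printf '[{"number":1},{"number":2}]\n' > "$tmp/recent-prs.json"
  api_count "$tmp/recent-prs.json"                  # file present (PR run)
fi
rm -rf "$tmp"
```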

---

### 3. Trim the 8-line load-bearing HTML comment from the prompt body *(carry-over from #5477)*

**Estimated savings:** ~125 tokens/run (~1–2 AIC/run; ~16–32 AIC/day at current volume)

The prompt body contains a developer note explaining why `${{ github.run_id }}` is present:

```html

```

This is developer documentation, not agent-facing instructions. The agent receives this verbatim on every run. Replace the full block with the minimum expression needed to trigger template rendering:

```diff
-
+
```

This preserves the load-bearing `${{ github.run_id }}` expression while eliminating ~500 characters of developer documentation from the agent's input context on every run.

---

### 4. Investigate bimodal AIC distribution (37 vs 61) *(new — follow-up from #5555)*

**Potential savings:** ~24 AIC/run on the high-cost group if root-caused (~40% of that group's per-run AIC; the group covers 10 of 14 instrumented runs)

Today's 14 instrumented runs split cleanly into two groups:

| Group | AIC range | Runs | Branch pattern |
|-------|-----------|------|----------------|
| Low | 36.5–37.4 | 4 | `extract-sliding-window`, `optimize-duplicate-code-detector`, `refactor-split-agent-volumes-mounts-test`, `update-runner-doctor-a12` |
| High | 56.7–61.8 | 10 | `refactor-split-writeconfigs`, `fix-duplicate-oauth-header`, `duplicate-port-validation`, `fix-firewall-logs-eaccess`, schedule runs |

The low group consists of refactors to smaller, isolated files. The high group includes more complex cross-module changes, network fixes, and the scheduled runs. The AIC gap (~24 units, 65% overhead) is consistent with the agent using more turns or receiving more context on "heavy" PRs.

**Diagnostic approach:**
```bash
# Compare agent artifacts from a low-AIC vs high-AIC run
gh run download 28180388529 --name agent --dir /tmp/low-aic    # 37.2 AIC
gh run download 28179615485 --name agent --dir /tmp/high-aic   # 61.4 AIC
wc -c /tmp/low-aic/final-result.json /tmp/high-aic/final-result.json
# Check for turn count difference
cat /tmp/low-aic/usage.jsonl 2>/dev/null || echo "no usage data"
cat /tmp/high-aic/usage.jsonl 2>/dev/null || echo "no usage data"
```

If `max-turns` is reduced to 2 (Recommendation 1) and the bimodal pattern persists, then the prompt itself may be loading different content per PR — investigate whether the `smoke-context.txt` or `final-result.json` content varies by PR complexity.
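Once per-run AIC values are exported, the split can be re-checked mechanically after each change. A minimal sketch that buckets values around a cut-off; the 47 AIC threshold (roughly midway between the two groups) and the `classify_aic` name are assumptions, and the sample inputs are only the group bounds from the table above:

```bash
# Read one AIC value per line on stdin; print how many fall below / at-or-above the cut-off.
classify_aic() {
  awk -v cut="${1:-47}" '
    NF == 0 { next }                 # skip blank lines
    $1 + 0 < cut { low++; next }     # low-cost bucket
    { high++ }                       # high-cost bucket
    END { printf "low=%d high=%d\n", low, high }
  '
}

printf '%s\n' 36.5 37.4 56.7 61.8 | classify_aic 47
```

If the high bucket empties out after Recommendation 1 lands, the extra turns were the likely cause; if it persists, the per-PR context size is the next thing to compare.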

---

## Cache Analysis (Anthropic-Specific)

> ⚠️ Per-turn token data (`cache_read`, `cache_write`, `input_tokens`, `output_tokens`) was **not** available; only AIC units are present. `--enable-api-proxy` sidecar instrumentation coverage improved from 50% yesterday to **87.5%** (14/16 runs) today.

**What we know from AIC patterns:**
- AIC is proportional to inference cost; today's 53.5 avg AIC/run is down slightly from 54.9 yesterday (−2.5%)
- The bimodal distribution (37 vs 61 AIC) may reflect cache misses on cold-start runs vs warm-cache runs; Anthropic's default prompt-cache TTL is 5 minutes
- Schedule runs (2× daily) always start cold; their 10 API calls suggest more steps executed, but AIC (56.7 avg) is not dramatically higher than PR runs (53.4 avg)

**When per-turn data becomes available** (requires `--enable-api-proxy` consistently), check:
- Cache write vs read split per turn (Anthropic charges 12.5× more for writes than reads at Sonnet pricing)
- Whether Turn 1 cache writes are reused by Turn 2 in 2-turn runs
- `claude-haiku-4-5` cache pricing: write $1.25/M tokens (5-minute TTL), read $0.10/M tokens (vs Sonnet $3.75/$0.30)
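
When token counts do arrive, the write/read split translates to cost with simple arithmetic. A minimal sketch using the Sonnet cache prices quoted above ($3.75 write and $0.30 read per million tokens); the `cache_cost` helper and the token counts in the usage line are hypothetical, not measured values:

```bash
# Cost in USD for a given number of cache-write and cache-read tokens.
# Args: write_tokens read_tokens write_price_per_M read_price_per_M
cache_cost() {
  awk -v w="$1" -v r="$2" -v wp="$3" -v rp="$4" \
    'BEGIN { printf "%.4f\n", (w * wp + r * rp) / 1000000 }'
}

# Ratio of write price to read price at the quoted Sonnet rates.
awk 'BEGIN { printf "%.1f\n", 3.75 / 0.30 }'

# Hypothetical 2-turn run: 2,000 tokens written on turn 1, then read back on turn 2.
cache_cost 2000 2000 3.75 0.30
```

Because writes cost far more than reads, a run that writes the cache but never reads it back pays the premium with no benefit, which is why the turn 1 to turn 2 reuse check matters.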

---

## Expected Impact

| Metric | Current | Projected | Change |
|--------|---------|-----------|--------|
| AIC/run (avg) | ~53.5 | ~40–45 | −15 to −25% |
| AIC/run (high group) | ~61 | ~40–45 | ~−30% |
| max-turns ceiling | 8 | 2 | −75% |
| Prompt size | ~800 chars | ~300 chars | −62% |
| Schedule API calls/run | ~10 | ~9 | −10% |
| Instrumentation coverage | 87.5% | 87.5% | (no change from this PR) |

---

## Implementation Checklist

- [ ] `max-turns: 8` → `max-turns: 2` in `smoke-claude.md`
- [ ] Add `if: github.event_name == 'pull_request'` to "Pre-fetch GitHub API data" step
- [ ] Guard `API_COUNT` computation against missing `recent-prs.json` on schedule runs
- [ ] Replace 8-line HTML comment with a single-line comment that keeps `${{ github.run_id }}`
- [ ] Recompile: `gh aw compile .github/workflows/smoke-claude.md`
- [ ] Post-process: `npx tsx scripts/ci/postprocess-smoke-workflows.ts`
- [ ] Open PR and verify CI passes
- [ ] Compare AIC on next run vs 53.5 baseline; target ≤45 AIC/run
- [ ] Once per-turn token data available, validate cache write/read ratio




> Generated by [Daily Claude Token Optimization Advisor](https://github.com/github/gh-aw-firewall/actions/runs/28229499622) · 59 AIC · ⊞ 6.6K · [◷](https://github.com/search?q=repo%3Agithub%2Fgh-aw-firewall+is%3Aissue+%22gh-aw-workflow-call-id%3A+github%2Fgh-aw-firewall%2Fclaude-token-optimizer%22&type=issues)
