On ARC (Actions Runner Controller) runners with a Docker-in-Docker (DinD) sidecar — i.e. a split runner/daemon filesystem — AWF chroot mode cannot currently run an agent end-to-end. The community thread github/gh-aw#34896 has tracked this layer-by-layer across many gh-aw releases. As of gh-aw v0.81.3 (firewall v0.27.10, mcpg v0.3.30) two of three recent blockers are fixed:
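✅ MCP gateway: a gateway.domain of awmg-mcpg is now accepted by mcpg v0.3.30.
✅ binariesSourcePath read-only collision fixed in #5482 ("fix(chroot): mount binaries overlay at /host/tmp/awf-runner-bin to avoid read-only /host/usr collision on ARC/DinD"). The runner-binaries overlay now mounts at /host/tmp/awf-runner-bin (AWF v0.27.10).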
❌ Remaining blocker: after chrooting into /host, the daemon's base userland is absent:
```
[entrypoint][WARN] one-shot-token.so failed to load on host dynamic linker (host libc incompatibility, e.g. musl/Alpine)
chroot: failed to run command '/bin/sh': No such file or directory
[entrypoint][ERROR] capsh not found on host system
```
This issue is a deep-dive on the root cause of the remaining blocker, the security constraints that must shape the fix, the latent gaps queued behind it, and a comprehensive test plan that simulates ARC split-fs conditions (notably an empty mounted /host) so this class of failure is caught in CI instead of being discovered one layer at a time on real runners.
Background: how the chroot /host is assembled
buildSystemMounts() (src/services/agent-volumes/system-mounts.ts:13-37) emits fixed read-only bind mounts for the chroot base system:
On a normal runner the source paths (/usr, /bin, …) are the runner's own glibc userland and everything works. On a split-fs ARC/DinD runner, gh-aw emits --docker-host-path-prefix /tmp/gh-aw, and translateBindMountHostPath() (src/services/host-path-prefix.ts) rewrites every source to the daemon-visible staging root:
(Kernel VFS /dev, /sys, /proc and /dev/null are correctly excluded from prefixing — host-path-prefix.ts:41-48.)
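The rewrite can be pictured with a small TypeScript sketch. It is not the code in src/services/host-path-prefix.ts: translateBindSpec() and the "src:dst[:mode]" string format are assumptions for illustration, and leaving subpaths of the kernel VFS roots unprefixed is a simplification of this sketch.

```typescript
import * as path from "node:path";

// Kernel VFS sources that stay unprefixed (host-path-prefix.ts:41-48 lists /dev, /sys, /proc, /dev/null).
// This sketch also leaves their subpaths alone, which is an assumption.
const KERNEL_VFS = ["/dev", "/sys", "/proc"];

function isKernelVfs(source: string): boolean {
  return KERNEL_VFS.some((root) => source === root || source.startsWith(root + "/"));
}

// Hypothetical helper: rewrite the source of a "src:dst[:mode]" bind spec under the
// daemon-visible staging prefix, e.g. /usr -> /tmp/gh-aw/usr.
function translateBindSpec(spec: string, prefix: string | undefined): string {
  if (!prefix) return spec; // single-filesystem runner: sources are used as-is
  const [source, ...rest] = spec.split(":");
  if (!path.posix.isAbsolute(source) || isKernelVfs(source)) return spec;
  return [path.posix.join(prefix, source), ...rest].join(":");
}

// On split-fs, every base-system source now points at a staged directory.
for (const spec of ["/usr:/host/usr:ro", "/bin:/host/bin:ro", "/proc:/host/proc:ro"]) {
  console.log(translateBindSpec(spec, "/tmp/gh-aw"));
}
```

With the prefix set, /usr and /bin resolve to /tmp/gh-aw/usr and /tmp/gh-aw/bin, which are empty unless something stages a userland into them, while /proc passes through unchanged.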
The defect: nothing ever populates /tmp/gh-aw/{usr,bin,lib,…} with a base userland. The mounts point at empty staged directories, so inside the chroot /host/bin/sh and /host/usr/sbin/capsh do not exist. The entrypoint's chroot preflight (containers/agent/entrypoint.sh:681-704) then fails exactly as reported. The "musl/Alpine" wording in the warning is misleading: the reporter's daemon is Debian/glibc with both /bin/sh and capsh present. The chroot simply enters an empty /host, and the generic warning blames musl because no dynamic loader is found at all.
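One way to make the preflight diagnostic accurate is to classify the chroot root before blaming libc. The sketch below is hypothetical (classifyHostRoot(), the state names, and the loader list are not AWF code); it separates an empty staged /host from one that is populated but lacks a shell, capsh, or a glibc loader.

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

type HostRootState = "ok" | "empty" | "missing-shell-or-capsh" | "no-glibc-loader";

// Assumed glibc loader locations (x86_64 and aarch64); a real list would be per-architecture.
const GLIBC_LOADERS = [
  "lib64/ld-linux-x86-64.so.2",
  "lib/x86_64-linux-gnu/ld-linux-x86-64.so.2",
  "lib/ld-linux-aarch64.so.1",
];

function nonEmptyDir(p: string): boolean {
  try {
    return fs.readdirSync(p).length > 0;
  } catch {
    return false; // missing, or not a directory
  }
}

// Note: existsSync follows absolute symlinks against the real root, not `root`;
// a production check would resolve links relative to the chroot root.
function has(root: string, rel: string): boolean {
  return fs.existsSync(path.join(root, rel));
}

// Hypothetical classifier run before `chroot /host`.
function classifyHostRoot(root: string): HostRootState {
  if (!["usr", "bin", "lib"].some((d) => nonEmptyDir(path.join(root, d)))) return "empty";
  if (!has(root, "bin/sh") || !has(root, "usr/sbin/capsh")) return "missing-shell-or-capsh";
  if (!GLIBC_LOADERS.some((l) => has(root, l))) return "no-glibc-loader";
  return "ok";
}

// Distinct messages instead of one generic "host libc incompatibility" warning.
const messages: Record<HostRootState, string> = {
  empty: "/host is empty: base userland was never staged",
  "missing-shell-or-capsh": "/host is populated but lacks /bin/sh or capsh",
  "no-glibc-loader": "/host has no glibc loader (possible musl/Alpine or foreign rootfs)",
  ok: "/host looks usable",
};
```

Only the "no-glibc-loader" case would justify the musl/Alpine hint; the reported failure would land in "empty".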
Why no existing AWF primitive fixes this
The thread (and our own docs/arc-dind.md) points at dind.preStageDirs as the staging step. It does not populate the system tree. DEFAULT_PRE_STAGE_DIRS (src/dind-bootstrap.ts:11-19) only creates empty work directories:
.cache .config .local .local/state home mcp-logs sandbox
stageEngineBinary() stages a single binary. runDindBootstrap() (src/dind-bootstrap.ts:103-127) returns early unless config.dind.preStageDirs/stageEngineBinary is set — and gh-aw does not emit those, so the resolved config shows enableDind=false even when dockerHostPathPrefix is set.
Conclusion: there is no capability today that stages a base userland into the chroot. Any fix that just "emits dind.preStageDirs" will produce empty system dirs and still fail. This is a missing capability, not a config-emission oversight.
Security considerations (these must shape the fix)
The remaining blocker has two superficially attractive fixes that are security-regressive and should be rejected:
Bind the daemon's real /bin, /usr, /lib into /host. This sources the chroot base userland — including the binaries that run before capability drop — from the runner/daemon filesystem, which on ARC is attacker-influenceable (a malicious or compromised DinD image, or anything that can write the shared /tmp/gh-aw emptyDir, controls the code AWF executes as root pre-capsh). This is the exact trust boundary AWF deliberately moved away from in the iptables → network-isolation work: egress/identity enforcement must not depend on untrusted runner-side state. Trusting the daemon rootfs for the chroot base reintroduces that dependency at an even more sensitive point (pre-privilege-drop code execution).
Copy the daemon's userland into the staging root at runtime. Same problem — provenance is the daemon image, not a verified AWF artifact.
Security-preserving direction: source the chroot base userland from AWF's own signed agent image (ghcr.io/github/gh-aw-firewall/agent), which already ships a glibc base + bash + libcap2-bin (capsh) + the loader needed by one-shot-token.so. Two viable mechanisms, both keeping provenance inside AWF's trust boundary:
(A) Self-bind from the agent container. In entrypoint.sh, before chroot, detect an empty/foreign /host and overlay the agent image's own /bin, /usr/sbin/capsh, /lib, loader, and a minimal busybox/coreutils set into /host (e.g. via a writable overlay assembled in /host/tmp and PATH/loader redirection). No daemon trust; the binaries come from the image AWF was built and signed as.
(B) Stage from the signed image via a helper container. Extend dind-bootstrap.ts with a real stageBaseSystem() that runs DEFAULT_STAGING_IMAGE (already ghcr.io/github/gh-aw-firewall/agent:latest, src/dind-bootstrap.ts:8) to copy a curated base userland into the daemon-visible staging root before compose start. Provenance is the AWF image, but it crosses the daemon filesystem — so it must be paired with integrity checks (see below).
Whichever mechanism is chosen, the following invariants must hold and be tested:
The base userland executed before capsh privilege-drop must originate from the AWF-signed image, never from runner/daemon-writable paths.
The staged tree must not be writable by the agent (post-drop UID) at exec time.
Credential-isolation guarantees (procfs hidepid=2, /dev/null credential overlays, /etc/shadow exclusion) must remain intact when /host is synthesized.
If integrity cannot be assured (e.g. an unverifiable shared staging path), AWF should fail closed with a clear diagnostic rather than silently chrooting into an attacker-influenceable /host.
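For mechanism B in particular, these invariants could be enforced by a verification pass just before chroot. This sketch assumes a manifest of relative paths and SHA-256 digests generated from the AWF-signed image at build time; neither the manifest nor verifyStagedBase() exists today, and group-write and parent-directory writability checks are deliberately left out.

```typescript
import { createHash } from "node:crypto";
import * as fs from "node:fs";
import * as path from "node:path";

type Manifest = Record<string, string>; // relative path -> hex SHA-256 (assumed build artifact)

function tryLstat(p: string): fs.Stats | undefined {
  try {
    return fs.lstatSync(p); // lstat: a symlink is reported as a symlink, not followed
  } catch {
    return undefined;
  }
}

function sha256File(p: string): string {
  return createHash("sha256").update(fs.readFileSync(p)).digest("hex");
}

// Hypothetical: returns violations; an empty list means the tree may be entered.
function verifyStagedBase(root: string, manifest: Manifest, agentUid: number): string[] {
  const problems: string[] = [];
  for (const [rel, expected] of Object.entries(manifest)) {
    const p = path.join(root, rel);
    const st = tryLstat(p);
    if (!st) {
      problems.push(`${rel}: missing`);
      continue;
    }
    if (!st.isFile()) {
      problems.push(`${rel}: not a regular file`);
      continue;
    }
    if (sha256File(p) !== expected) problems.push(`${rel}: digest mismatch`);
    // Writable by the post-drop agent UID: owner-write when it owns the file, or world-write.
    const ownerWritable = st.uid === agentUid && (st.mode & 0o200) !== 0;
    const worldWritable = (st.mode & 0o002) !== 0;
    if (ownerWritable || worldWritable) problems.push(`${rel}: writable by agent uid ${agentUid}`);
  }
  return problems;
}

// Fail closed: refuse to chroot rather than enter an unverifiable /host.
function assertStagedBaseOrFailClosed(root: string, manifest: Manifest, agentUid: number): void {
  const problems = verifyStagedBase(root, manifest, agentUid);
  if (problems.length > 0) {
    throw new Error(`refusing to chroot into ${root}: ${problems.join("; ")}`);
  }
}
```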
Latent gaps queued behind the current blocker
The thread's recent progression table tracks only three layers (gateway → container start → chroot exec). Once /bin/sh + capsh are present, the originally-enumerated gaps will resurface in order. They should be designed for now, not rediscovered serially:
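Engine identity vars through capsh: engine.env HOME/USER/LOGNAME were historically clobbered to the pre-drop values; verify that chroot.identity (now emitted by gh-aw) actually wins after the user switch.
Engine binary visibility: ensure the copilot/engine binary lands in the #5482 overlay (/host/tmp/awf-runner-bin) and is on PATH inside the chroot for both the agent job and the safe-outputs.threat-detection job.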
/etc/passwd, /etc/group, /etc/hosts synthesis — AWF should synthesize minimal identity + host.docker.internal entries for the UID it switches to, without requiring workflow-level sandbox.agent.mounts.
Threat-detection silent no-op (security regression) — the auto-generated detection job runs without the agent job's pre-steps and, on chroot setup failure (spawn ENOENT), is marked successful because GH_AW_DETECTION_CONTINUE_ON_ERROR !== 'false'. A correctly-configured workflow then believes outputs were screened when the detector no-op'd. AWF/gh-aw must distinguish "engine never started" (fail loud) from "model produced unparseable output" (continue-on-error). This is the highest-severity latent gap.
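The distinction the detection job needs can be expressed as a small classifier. The DetectionRun shape, classifyDetection(), and the assumption that the detector emits JSON are all illustrative; only the GH_AW_DETECTION_CONTINUE_ON_ERROR !== 'false' condition comes from this issue.

```typescript
// Illustrative shapes; gh-aw's real detection step is structured differently.
type DetectionRun =
  | { kind: "spawn-error"; code: string } // engine process never started, e.g. "ENOENT"
  | { kind: "exited"; exitCode: number; stdout: string };

interface Verdict {
  ok: boolean;
  reason: string;
}

// Mirrors the quoted condition: anything other than the literal "false" enables it.
function continueOnErrorFromEnv(env: Record<string, string | undefined>): boolean {
  return env.GH_AW_DETECTION_CONTINUE_ON_ERROR !== "false";
}

function classifyDetection(run: DetectionRun, continueOnError: boolean): Verdict {
  if (run.kind === "spawn-error") {
    // Nothing was screened, so continue-on-error must never turn this into a pass.
    return { ok: false, reason: `engine never started (${run.code})` };
  }
  if (run.exitCode !== 0) {
    // Treating a non-zero exit as a hard failure is an assumption of this sketch.
    return { ok: false, reason: `engine exited with code ${run.exitCode}` };
  }
  try {
    JSON.parse(run.stdout); // assumes the detector emits a JSON result
    return { ok: true, reason: "detection result parsed" };
  } catch {
    // The model ran but its output is unusable: the only case continue-on-error should soften.
    return { ok: continueOnError, reason: "unparseable model output" };
  }
}
```

Under this split, the spawn ENOENT case fails even when continue-on-error is on, which is exactly the case that currently passes silently.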
The meta-gap: no CI reproduces split-fs DinD
Every fix so far has advanced exactly one layer, and then a new layer breaks weeks later on real runners, because no automated test reproduces an empty or foreign /host. The existing chroot integration tests (tests/integration/chroot-*.test.ts) and smoke-chroot all run on a normal runner where /host is the runner's own populated glibc tree, so they never exercise the split-fs staging path. The reporter independently noted "why CI likely doesn't catch it": the chroot patch is gated on a tcp://localhost DOCKER_HOST, which is absent on GitHub-hosted runners. Closing this meta-gap is arguably more valuable than any single layer fix.
Related gap: pre-agent toolchain installs don't reach the chroot on ARC split-fs
A second architectural gap, distinct from the empty-/host base-userland problem above and worth solving in the same effort. The base-userland fix gets the chroot a working /bin/sh + capsh; it does not get a build-test workflow's compilers and SDKs into the chroot.
The mental model that breaks on ARC
For a build-test-style workflow the assumption is: pre-agent steps install packages/toolchains on the host, then the agent sees them via chroot /host. This holds on a normal runner (one filesystem) but breaks on ARC/DinD (two filesystems), because the installs land on the wrong one.
Normal runner — one filesystem:
Pre-agent steps (apt-get install, setup-go/setup-node, npm i -g, tool caches) run in the runner shell, writing the runner's /usr, /opt/hostedtoolcache, $HOME, …
AWF bind-mounts that same FS read-only: /usr:/host/usr:ro, /opt:/host/opt:ro, etc. (src/services/agent-volumes/system-mounts.ts:13-24).
chroot /host = the runner's world → the agent sees everything pre-agent steps installed. ✅
ARC/DinD — runner FS ≠ daemon FS:
Pre-agent workflow steps run in the runner container, on the runner's filesystem — that's where apt/setup-*/tool caches land, same as a normal runner.
AWF's agent container is launched by the daemon (compose over DOCKER_HOST=tcp://…). Its bind-mount sources (/usr, /bin, /opt, …) are resolved by the daemon, against the daemon's filesystem — or, with --docker-host-path-prefix /tmp/gh-aw, against the shared /tmp/gh-aw staging dir.
So chroot /host is assembled from the daemon's world, not the runner's. The toolchains the pre-agent steps installed on the runner are invisible to the agent.
This is upstream Gap 4 (runner-installed copilot not visible in chroot) generalized to every package and toolchain a build-test workflow installs — and it compounds the empty-/host problem: on split-fs the daemon's /tmp/gh-aw/{usr,bin,lib} isn't even populated, so /host is empty rather than "the daemon's toolchain."
What actually crosses the split into the chroot on ARC
Only things on a path both containers can see, or baked/staged into the daemon side:
The workspace and /tmp — ${workspaceDir}:/host…:rw and /tmp:/host/tmp:rw (system-mounts.ts:23-24). In ARC these are typically the shared gh-aw-tmp emptyDir, so writes there are visible.
The runner tool cache, only if explicitly wired: container.runnerToolCachePath (src/awf-config-schema.json:607, src/runner-tool-cache.ts) mounts /opt/hostedtoolcache RO into the chroot. This knob exists specifically because the tool cache doesn't otherwise cross the split, but it only helps if that cache lives on a volume the daemon can also see.
The runner-binaries overlay (binariesSourcePath) at /host/tmp/awf-runner-bin, a narrow path for staging a couple of CLIs into the daemon side.
Whatever is baked into the DinD image — which is why the upstream reporter had to build a custom Ubuntu DinD with Node/capsh pre-installed.
Implications for build-test on ARC
A workflow that installs toolchains in pre-agent steps won't expose them to the agent on ARC unless one of:
Install into a shared volume the daemon also mounts (workspace, /tmp/gh-aw, or a shared tool-cache) instead of the runner's /usr or /opt.
Bake toolchains into the DinD daemon image.
Stage them daemon-side via a helper container (the manual bootstrap pattern from the upstream thread).
Move the installs inside the agent/chroot (post-firewall, network permitting via the egress allowlist) rather than pre-agent.
Security note (same trust boundary as above)
Staging runner-side toolchains into the daemon-visible path is acceptable because the provenance is the workflow's own pre-agent steps mounted RO — but it must not become a vector for the agent (post privilege-drop UID) to write paths that earlier/other privileged steps then execute. Anything staged for the chroot must be RO at agent exec time, and this must not weaken the /host integrity / fail-closed posture proposed for the base userland.
Suggested scope
Generalize the runnerToolCachePath + awf-runner-bin overlay into a first-class "stage runner toolchains into a daemon-visible chroot path" capability, with build-test as the motivating workflow.
Proposed implementation plan
1. Add a stageBaseSystem() capability sourced from the AWF-signed image
Implement base-userland staging from DEFAULT_STAGING_IMAGE (mechanism A self-bind preferred; B as fallback) in src/dind-bootstrap.ts and/or containers/agent/entrypoint.sh.
Curate the minimal set: dynamic loader + libc/libcap/libutil, /bin/sh (+ bash), capsh, and the coreutils the entrypoint uses (mkdir, chmod, cat, head, tee, cp, tar).
Wire detection: when dockerHostPathPrefix is set (or an empty /host is detected at entrypoint), run staging automatically. Today enableDind=false even with the prefix set — close that half-configured state.
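Closing the half-configured state amounts to deriving the staging decision from the prefix (or an empty /host) rather than from explicit dind fields. A hypothetical planStaging() over an assumed ResolvedConfig shape (the field names follow those quoted in this issue; the type of stageEngineBinary is not known here, so it is left as unknown):

```typescript
// Assumed subset of the resolved AWF config.
interface ResolvedConfig {
  dockerHostPathPrefix?: string;
  dind?: { preStageDirs?: string[]; stageEngineBinary?: unknown };
}

interface StagingPlan {
  enableDind: boolean;
  stageBaseSystem: boolean; // run the proposed stageBaseSystem() before compose start / chroot
  reason: string;
}

// Hypothetical: derive staging from split-fs signals instead of requiring explicit dind fields.
function planStaging(config: ResolvedConfig, hostRootLooksEmpty: boolean): StagingPlan {
  const explicitDind =
    (config.dind?.preStageDirs?.length ?? 0) > 0 || config.dind?.stageEngineBinary != null;
  if (config.dockerHostPathPrefix) {
    return { enableDind: true, stageBaseSystem: true, reason: "dockerHostPathPrefix is set" };
  }
  if (hostRootLooksEmpty) {
    return { enableDind: true, stageBaseSystem: true, reason: "entrypoint detected an empty /host" };
  }
  return {
    enableDind: explicitDind,
    stageBaseSystem: false,
    reason: explicitDind ? "explicit dind config" : "single-filesystem runner",
  };
}
```

With this shape, a prefix alone can no longer resolve to enableDind=false.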
2. Preserve security invariants
Base userland provenance = AWF image only; never daemon/runner-writable paths for pre-drop execution.
Fail-closed diagnostic when /host is empty/foreign and a verified base cannot be staged.
Re-assert procfs hidepid=2, /dev/null credential overlays, and /etc/shadow exclusion under the synthesized /host.
3. Close the queued layers (design now)
Verify chroot.identity HOME/USER/LOGNAME survive capsh.
Ensure the engine binary overlay (/host/tmp/awf-runner-bin) is on PATH for agent and detection jobs.
Synthesize /etc/passwd, /etc/group, and /etc/hosts in the chroot.
Make threat-detection fail loud on engine-spawn failure (distinguish from parse failure).
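A minimal synthesis of the identity files could look like the following. ChrootIdentity, synthesizeIdentityFiles(), and the host-gateway IP parameter are assumptions; where AWF would source these values (for example chroot.identity) is not specified here. No etc/shadow is produced, which keeps the credential-isolation invariant.

```typescript
// Hypothetical inputs for the post-drop identity.
interface ChrootIdentity {
  user: string;
  uid: number;
  gid: number;
  home: string;
  shell: string;
}

// Returns file contents keyed by path relative to the synthesized /host.
function synthesizeIdentityFiles(id: ChrootIdentity, hostGatewayIp: string): Record<string, string> {
  if (!/^[a-z_][a-z0-9_-]*$/.test(id.user)) throw new Error(`invalid user name: ${id.user}`);
  if (id.uid === 0 || id.gid === 0) throw new Error("post-drop identity must not be root");
  for (const field of [id.home, id.shell]) {
    if (/[:\n]/.test(field)) throw new Error(`field would corrupt passwd format: ${field}`);
  }
  const lines = (xs: string[]): string => xs.join("\n") + "\n";
  return {
    // passwd(5): name:password:UID:GID:GECOS:home:shell ("x" = no password stored here)
    "etc/passwd": lines([
      "root:x:0:0:root:/root:/bin/sh",
      `${id.user}:x:${id.uid}:${id.gid}:${id.user}:${id.home}:${id.shell}`,
    ]),
    // group(5): name:password:GID:members
    "etc/group": lines(["root:x:0:", `${id.user}:x:${id.gid}:`]),
    "etc/hosts": lines([
      "127.0.0.1 localhost",
      "::1 localhost ip6-localhost ip6-loopback",
      `${hostGatewayIp} host.docker.internal`,
    ]),
    // Deliberately no etc/shadow.
  };
}
```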
4. Comprehensive ARC-simulation test suite (the core deliverable)
Add tests that reproduce split-fs DinD without needing a real ARC cluster:
Empty /host integration test: start the agent with the system mounts pointed at a freshly created empty staging dir (simulating /tmp/gh-aw/{usr,bin,lib} that was never populated). Assert: (a) without the fix, the chroot preflight fails with the documented /bin/sh and capsh errors; (b) with stageBaseSystem(), the agent runs a trivial command to completion inside the chroot.
Foreign/musl /host test — point the base mounts at an Alpine/musl rootfs (or a deliberately-incompatible loader) and assert AWF either stages its own glibc base and succeeds, or fails closed with the actionable diagnostic — never silently proceeds.
Split-fs path-prefix test — exercise translateBindMountHostPath() with --docker-host-path-prefix and assert the staged tree is what the chroot actually enters (no empty-dir passthrough), with kernel VFS still excluded.
Provenance/integrity test — assert the staged base userland originates from the AWF image and that agent-UID is not able to write the staged tree before exec.
Identity-vars probe test — a probe binary prints id/$HOME/$USER/$LOGNAME from inside the chroot; assert chroot.identity values win post-capsh.
Engine-binary visibility test — in simulated chroot mode, the installed copilot is discoverable on PATH from inside /host.
Threat-detection ARC test — with safe-outputs.threat-detection enabled in the simulated split-fs environment: a successful run produces a parseable result; a deliberately unstaged engine causes the detection job to fail, not silently pass.
CI wiring — add a smoke-chroot-style job (or extend the existing one) that runs the empty-/host and foreign-/host scenarios on every PR, so this layer is permanently guarded.
Toolchain-visibility test — install a toolchain in a simulated pre-agent step on a "runner" path distinct from the daemon-visible staging root; assert it is not visible in the chroot by default, and is visible once staged via the capability in step 5.
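The toolchain-visibility test can run without a cluster by modeling the two filesystems as temp directories. In this hypothetical harness, a chroot path is visible only if a bind mount maps it onto an existing file under the daemon-visible root; the Bind shape and the helper names are illustrative, and stageFromRunner() is a stand-in for the proposed capability, not an existing API.

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

// A bind mount as the daemon sees it: `source` is resolved on the daemon-visible side.
interface Bind {
  source: string;
  target: string; // path as seen inside the chroot
}

// Map a chroot path to a daemon-side file via the longest matching bind target.
function resolveInChroot(chrootPath: string, binds: Bind[], daemonRoot: string): string | undefined {
  const match = binds
    .filter((b) => chrootPath === b.target || chrootPath.startsWith(b.target + "/"))
    .sort((a, b) => b.target.length - a.target.length)[0];
  if (!match) return undefined;
  return path.join(daemonRoot, match.source, chrootPath.slice(match.target.length));
}

function visibleInChroot(chrootPath: string, binds: Bind[], daemonRoot: string): boolean {
  const resolved = resolveInChroot(chrootPath, binds, daemonRoot);
  return resolved !== undefined && fs.existsSync(resolved);
}

// Copy a runner-installed file to the daemon-visible root and clear its write bits
// so it is read-only at agent exec time.
function stageFromRunner(runnerRoot: string, daemonRoot: string, rel: string): void {
  const dst = path.join(daemonRoot, rel);
  fs.mkdirSync(path.dirname(dst), { recursive: true });
  fs.copyFileSync(path.join(runnerRoot, rel), dst);
  fs.chmodSync(dst, 0o555);
}
```

A test would install a fake toolchain under the "runner" directory, assert it is not visible through an /opt bind resolved against the "daemon" directory, then stage it and assert it becomes visible.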
5. Stage runner toolchains into a daemon-visible chroot path (build-test on ARC)
Generalize runnerToolCachePath + the /host/tmp/awf-runner-bin overlay into a first-class capability that stages workflow-installed toolchains (compilers, SDKs, tool caches) into a daemon-visible path the chroot mounts RO.
Keep provenance = the workflow's own pre-agent steps; staged tree must be RO at agent exec time.
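A planner for step 5 might refuse any toolchain that is not already on a volume both containers see, and emit read-only mounts only. The sharedRoots list, the /host/opt/awf-toolchains target, and planToolchainMounts() are assumptions, as is the premise that a shared volume (such as the gh-aw-tmp emptyDir) has the same path on the runner and the daemon.

```typescript
import * as path from "node:path";

interface Toolchain {
  name: string;
  runnerPath: string; // where the pre-agent step installed it
}

// Hypothetical: returns "src:dst:ro" bind specs, or throws for anything off the shared volumes.
function planToolchainMounts(toolchains: Toolchain[], sharedRoots: string[]): string[] {
  return toolchains.map((t) => {
    if (!/^[A-Za-z0-9._-]+$/.test(t.name) || t.name === "." || t.name === "..") {
      throw new Error(`invalid toolchain name: ${t.name}`);
    }
    const src = path.posix.normalize(t.runnerPath); // collapses ".." before the prefix check
    const shared = sharedRoots.some((r) => {
      const root = r.replace(/\/+$/, "");
      return src === root || src.startsWith(root + "/");
    });
    if (!shared) {
      throw new Error(`${t.name}: ${src} is not on a daemon-visible volume and would be invisible in the chroot`);
    }
    // Always ":ro" so the post-drop agent UID cannot modify anything it later executes.
    return `${src}:/host/opt/awf-toolchains/${t.name}:ro`;
  });
}
```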
6. Docs
Update docs/arc-dind.md / docs/chroot-mode.md to describe base-userland staging, correct the dind.preStageDirs expectation (it does not stage the system tree), and document the security model (image-sourced base, fail-closed behavior).
Acceptance criteria
On a simulated split-fs runner with an empty mounted /host, the AWF agent chroots and runs a command to completion using an AWF-image-sourced base userland — with no dependency on the daemon's rootfs for pre-capsh execution.
A foreign/musl or unverifiable /host causes a fail-closed error, never a silent chroot into untrusted state.
The safe-outputs.threat-detection job runs end-to-end in the simulated environment and fails loudly if the engine cannot be spawned.
The empty-/host and foreign-/host scenarios run in CI on every PR.
A toolchain installed in a simulated pre-agent step is invisible in the chroot by default and visible once staged via the daemon-visible staging capability (regression-guarded in CI).
References
binariesSourcePath overlay relocated to /host/tmp/awf-runner-bin (AWF v0.27.10), #5482.
src/services/agent-volumes/system-mounts.ts:13-37, src/services/host-path-prefix.ts:23-48, src/dind-bootstrap.ts:8-127, containers/agent/entrypoint.sh:681-704.
docs/arc-dind.md, docs/chroot-mode.md.