Skip to the content.

Auditing a project: an agent’s runbook

You arrived here from an evidence bundle. The bundle gave you provenance — the source running on a remote CVM matches a specific GitHub commit. Your job now is to evaluate whether that source actually does what the project claims, against an adversary that can control inputs and operate the host.

This page is structured to be runnable by an agent. Each section is a concrete step. The intent is that a competent coding agent with shell access can complete a useful first-pass audit by following it.

What you are and aren’t auditing

A project on a tee-daemon instance has two layers of code:

  1. The trusted base. The dstack-webhost daemon, the shared language runtime (e.g. Deno) that loads the project, and the dstack platform underneath. This is the substrate — assume it does what its repo and platform docs claim, or audit it separately. It’s the same base for every project. You don’t redo this audit per project.
  2. The project under review. The handler the project author wrote, plus whatever environment knobs they declared in their manifest. This is what you audit.

The point of the split: the project author can write small, custom code without forcing you to re-audit web servers, container runtimes, or the TEE platform. The base does the heavy lifting; the project is small enough to read.

For the substrate’s contract, see this repo: github.com/amiller/dstack-webhost. In particular proxy/runtimes.py defines the deno router that loads project handlers, and proxy/templates/ has the env vars handlers receive. These are the inputs your project’s handler can rely on; everything outside them is project-specific code you must read.

Setting up a local replica

The same daemon image that runs the production CVM runs locally. A local replica gives you what production deliberately doesn’t: filesystem access to the runtime container, control over time sources and storage, and live logs.

# Clone the substrate
git clone https://github.com/amiller/dstack-webhost
cd dstack-webhost

# Start the daemon. This pulls the deno runtime image on first run.
docker compose up -d

# Wait until the daemon answers
until curl -sf http://localhost:8080/_api/projects -H "Authorization: Bearer $TOKEN" >/dev/null; do sleep 1; done

# Deploy the project under review from its public source.
# Use the same commit_sha and ref recorded in the evidence bundle.
curl -X POST http://localhost:8080/_api/projects \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "<project-name>",
    "source": "<source-url-from-bundle>",
    "ref": "<commit-sha-or-branch-from-bundle>",
    "mode": "dev"
  }'

The project is now reachable at http://localhost:8080/<project-name>/. You don’t promote it locally — dev mode is fine for audit work, and avoids burning a TEE quote.

The container running the project is tee-runtime-deno-dev. You can docker exec into it, tail its logs, and modify files in its volume between requests. None of this is possible in production; the local replica gives it to you for free.

What the trusted base provides

What the project handler can rely on, by contract:

These are the building blocks. Anything beyond them is project code you must read.

Reading the source

Open the source URL pinned in the bundle. Don’t read top-to-bottom; read with a question.

The shape of the question, for any project:

  1. What is the trust claim? Find it in the README or top-of-file comments. If it’s not stated, that’s the first audit finding.
  2. What invariants would have to hold for that claim to be true? Express them as testable statements (“the key is not returned before releaseTime”, “the receipt’s prompt field equals the prompt actually sent to the model”).
  3. What inputs and state can affect those invariants? Trace from the handler signature outward: request body, env vars, dataDir, network calls, time sources.
  4. For each invariant, can the inputs or state break it? Read the code paths that touch each one.

Probing the local replica

Reading is necessary but not sufficient. Prepare adversarial inputs and run them against the local replica. Examples of probes that generalize:

Each probe maps to an invariant. Document which invariant you tested, what you tried, and whether the project upheld it.

Project-specific runbook: timelock

timelock’s trust claim: a sealed message’s key is not returned until a release time the user specified.

The invariants you’d test:

A first-pass agent could implement each probe as a shell script under audits/timelock/. The output of each probe (request, response, latency, side-effects) is itself audit evidence.

What an audit report should contain

The deliverable is a structured artifact the original requester (and their counterparties) can inspect.

Sign it (PGP, sigstore, whatever convention the requester uses) and post it where the requester can find it. The bundle plus a signed audit report is the actual artifact a counterparty acts on.

How this fits the larger flow

A project author writes a small handler. They deploy it on a CVM and promote it. They publish the evidence bundle URL. A relying party who needs the output to mean something forwards the URL — to themselves if they can read code, or to an auditor (human or agent) if they can’t. The auditor follows this runbook, produces a report, and the relying party reads the report instead of the source.

The substrate’s job is to make the auditor’s job small and repeatable. The project’s job is to be small enough that “read it and probe it” is genuinely possible. This page is the contract between those two halves.