Sandboxed Agent Tasks

Spawn isolated Firecracker VMs, each running a Pi coding agent with authenticated CLI access to the platform. The agent can do anything the caller's API key permits, and every task is scoped, metered, and killed automatically when its timeout expires.

When to Use

Multi-step work: instead of chaining API calls, spawn a task with a natural-language instruction and let the sub-agent figure it out.
Isolated compute: code execution, data processing, file generation, or anything else that shouldn't run in your serverless function.
Delegation to a specialist: the caller agent knows what needs to happen but not how; the sub-agent has the CLI tools to execute it.
Background work: fire-and-forget, with a webhook notification on completion.

API

Create and run a task

# Via CLI (once the tasks command exists)
seed tasks run --instruction "Check our API key inventory and revoke any expired ones" --timeout 60000

# Via REST
curl -X POST https://your-app.com/api/v1/tasks/run \
  -H "Authorization: Bearer sk_live_..." \
  -H "Content-Type: application/json" \
  -d '{
    "instruction": "Check our API key inventory and revoke any expired ones",
    "scopes": ["keys:read", "keys:write"],
    "timeout": 60000
  }'

# Via MCP (Claude Desktop, Cursor, etc.)
# The runTask tool is automatically available
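
The REST call above can also be made from code. The following is a minimal Python sketch, not an official client: the function name `build_run_task_request` is hypothetical, and `https://your-app.com` is the same placeholder host used in the curl example. It only builds the request, so it can be inspected without touching the network.

```python
import json
import urllib.request

BASE_URL = "https://your-app.com"  # placeholder host from the curl example


def build_run_task_request(api_key, instruction, scopes=None, timeout_ms=60000):
    """Build (but do not send) a POST /api/v1/tasks/run request.

    Hypothetical helper for illustration; field names mirror the curl body.
    """
    payload = {"instruction": instruction, "timeout": timeout_ms}
    if scopes is not None:
        # Leaving "scopes" out entirely means the task inherits the caller's scopes.
        payload["scopes"] = list(scopes)
    return urllib.request.Request(
        url=f"{BASE_URL}/api/v1/tasks/run",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_run_task_request(
    api_key="sk_live_...",
    instruction="Check our API key inventory and revoke any expired ones",
    scopes=["keys:read", "keys:write"],
)
# To actually send it: urllib.request.urlopen(request), then json.load() the body.
```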

Check task status

curl https://your-app.com/api/v1/tasks/{taskId} \
  -H "Authorization: Bearer sk_live_..."

Stop a running task

curl -X POST https://your-app.com/api/v1/tasks/{taskId}/stop \
  -H "Authorization: Bearer sk_live_..."
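
If you are not using webhooks, the status endpoint can be polled until the task finishes. The sketch below is one possible approach under stated assumptions: only the "completed" status appears in this document, so "failed", "stopped", and "running" are assumed names, and `wait_for_task` and its parameters are hypothetical. The fetcher is passed in so the loop can be exercised without a network call.

```python
import time

# Only "completed" is documented; the other status names are assumptions.
TERMINAL_STATUSES = {"completed", "failed", "stopped"}


def wait_for_task(fetch_task, task_id, poll_interval_s=2.0, max_wait_s=600.0,
                  sleep=time.sleep):
    """Poll until the task reaches a terminal status or max_wait_s has elapsed.

    fetch_task(task_id) should perform GET /api/v1/tasks/{taskId} and return
    the "task" object from the response as a dict.
    """
    waited = 0.0
    while True:
        task = fetch_task(task_id)
        if task["status"] in TERMINAL_STATUSES:
            return task
        if waited >= max_wait_s:
            raise TimeoutError(f"task {task_id} still {task['status']!r} after {waited:.0f}s")
        sleep(poll_interval_s)
        waited += poll_interval_s


# Stand-in fetcher so the sketch runs offline; a real one would call the API.
_fake_responses = iter([
    {"status": "running"},
    {"status": "running"},
    {"status": "completed", "exitCode": 0},
])
final = wait_for_task(lambda _task_id: next(_fake_responses), "56bcba9b-...",
                      sleep=lambda _seconds: None)
```

The default `max_wait_s` of 600 seconds matches the maximum task timeout, so a client should not need to wait longer than that.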

Response Shape

{
  "task": {
    "id": "56bcba9b-...",
    "status": "completed",
    "stdout": "Found 9 keys. 5 already revoked. Revoked 2 expired keys: task:aa3517b8 (expired 10:09), task:28f48ac5 (expired 10:08). 2 active keys remain.",
    "stderr": null,
    "exitCode": 0,
    "inputTokens": 847,
    "outputTokens": 156,
    "createdAt": "2026-03-25T10:15:00.000Z",
    "completedAt": "2026-03-25T10:15:12.000Z"
  }
}
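
As an illustration of consuming this shape, the following Python sketch derives a few summary values from a finished task. The helper name `summarize_task` and its definition of success (status "completed" with exit code 0) are assumptions, and it assumes `completedAt` is populated, which may not hold for tasks that are still running.

```python
import json
from datetime import datetime


def _parse_timestamp(value):
    # Replace the trailing "Z" so older Python versions can parse the offset.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def summarize_task(response_body):
    """Summarize a finished task from the JSON response text."""
    task = json.loads(response_body)["task"]
    duration = _parse_timestamp(task["completedAt"]) - _parse_timestamp(task["createdAt"])
    return {
        # Assumed success criterion: completed status and a zero exit code.
        "succeeded": task["status"] == "completed" and task["exitCode"] == 0,
        "duration_seconds": duration.total_seconds(),
        "total_tokens": task["inputTokens"] + task["outputTokens"],
    }


sample = json.dumps({
    "task": {
        "id": "56bcba9b-...",
        "status": "completed",
        "stdout": "Found 9 keys. ...",
        "stderr": None,
        "exitCode": 0,
        "inputTokens": 847,
        "outputTokens": 156,
        "createdAt": "2026-03-25T10:15:00.000Z",
        "completedAt": "2026-03-25T10:15:12.000Z",
    }
})
summary = summarize_task(sample)
# summary["duration_seconds"] is 12.0 and summary["total_tokens"] is 1003
```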

Scoping Tasks

The scopes parameter controls what the sub-agent can do. It's intersected with the caller's scopes — you can only grant permissions you have.

// Agent can only read observability data — can't create keys, can't write anything
{ "scopes": ["obs:read"] }

// Agent can manage keys and view observability
{ "scopes": ["keys:read", "keys:write", "obs:read"] }

// Omit scopes to inherit all of the caller's permissions
{}
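
The intersection rule can be pictured as plain set logic. This is a sketch of the behavior described above, not the platform's actual implementation; the function name `effective_scopes` is hypothetical, and whether the API silently drops or rejects scopes the caller lacks is not stated here.

```python
def effective_scopes(caller_scopes, requested_scopes=None):
    """Scopes the sub-agent ends up with, per the intersection rule."""
    caller = set(caller_scopes)
    if requested_scopes is None:
        # "scopes" omitted: inherit everything the caller has.
        return caller
    # Requested scopes the caller does not hold are never granted.
    return caller & set(requested_scopes)


caller = ["keys:read", "obs:read"]
granted = effective_scopes(caller, ["keys:read", "keys:write"])
inherited = effective_scopes(caller)
# granted is {"keys:read"}; inherited is {"keys:read", "obs:read"}
```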

Industry Examples

SaaS / Developer Tools

Instruction: "Audit our API key usage. List all keys, check which ones haven't been used in 30 days, and generate a cleanup recommendation."

Scopes: ["keys:read", "obs:read"]

What happens: The agent runs seed keys list, cross-references with observability data, and returns a summary of stale keys with recommendations.

CRM (Customer Relationship Management)

Instruction: "Look up the contact record for acme-corp, check their subscription status, and draft a renewal email if their plan expires within 14 days."

Scopes: ["contacts:read", "subscriptions:read", "emails:write"]

What happens: The agent uses domain-specific CLI commands (seed contacts get acme-corp, seed subscriptions status acme-corp), evaluates the expiry, and calls seed emails draft if needed.

E-Commerce / Inventory

Instruction: "Check inventory levels for all products in the 'electronics' category. Flag anything below reorder threshold and create purchase orders for the top 5 most critical items."

Scopes: ["inventory:read", "purchase-orders:write"]

What happens: The agent queries inventory via CLI, applies business logic, and creates purchase orders — all within the scope boundary. It can't touch pricing, users, or billing.

DevOps / Infrastructure

Instruction: "Check the health of all our deployment environments. If any are failing health checks, gather the last 50 log lines and create an incident report."

Scopes: ["deployments:read", "logs:read", "incidents:write"]

What happens: The agent iterates through environments, checks health endpoints, pulls logs for unhealthy ones, and creates structured incident reports.

Finance / Compliance

Instruction: "Generate a monthly usage report for all API consumers. Calculate per-consumer costs based on their token usage and output a CSV summary."

Scopes: ["obs:read", "keys:read"]

What happens: The agent pulls observability data, groups by API key, calculates costs using the configured rate, and outputs a formatted report.

Content / Marketing

Instruction: "Review our webhook event history from the last 7 days. Identify the most active event types and generate a weekly platform activity summary for the team."

Scopes: ["webhooks:read", "obs:read"]

What happens: The agent queries webhook events and observability data, analyzes patterns, and produces a human-readable summary.

Healthcare / Data Processing

Instruction: "Process the batch of patient intake forms uploaded today. Validate required fields, flag incomplete records, and generate a compliance summary."

Scopes: ["records:read", "records:write", "compliance:write"]

What happens: The agent processes records via CLI, validates against rules, and generates the compliance report — all in an isolated VM with no network access to external systems.

Implementation Notes

For forkers: adding domain-specific tasks

Define your CLI commands — seed contacts list, seed inventory check, etc.
Define your scopes — contacts:read, inventory:write, etc.
That's it. The task system discovers CLI commands automatically. The agent uses whatever tools are available.

You don't need to write task-specific code. The agent runtime (Pi) figures out which CLI commands to call based on the natural language instruction. Your job is to make the CLI commands comprehensive and the scope boundaries correct.

Snapshot management

The snapshot (SANDBOX_SNAPSHOT_ID) contains Pi and its dependencies. It does NOT contain the CLI — the CLI is downloaded fresh on every task run so it's always the latest version.

To recreate the snapshot (e.g., after upgrading Pi or changing the default model):

npx tsx scripts/setup-sandbox-snapshot.ts
# Then update SANDBOX_SNAPSHOT_ID in Vercel env vars

Cost model

Sandbox compute: ~$0.01-0.03 per 5-min task (CPU + memory)
LLM tokens: $0.00 with OpenRouter free models
CLI install: ~2-3 seconds per task (may be cached in a future snapshot version)
Pro plan $20/mo credit covers roughly 650 five-minute tasks at the upper compute estimate (~$0.03 each), or about 2,000 quick tasks at ~$0.01 each
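
The coverage figures follow from dividing the credit by the per-task compute cost. A small sketch of that arithmetic, assuming a flat per-task cost and free LLM tokens; the helper name is hypothetical and amounts are kept in whole cents to avoid floating-point rounding.

```python
def tasks_covered(credit_cents, cost_per_task_cents):
    """Whole number of tasks a credit covers at a flat per-task compute cost."""
    return credit_cents // cost_per_task_cents


PRO_CREDIT_CENTS = 2000  # $20/mo credit

at_upper_estimate = tasks_covered(PRO_CREDIT_CENTS, 3)  # $0.03 per task
at_lower_estimate = tasks_covered(PRO_CREDIT_CENTS, 1)  # $0.01 per task
# at_upper_estimate is 666, at_lower_estimate is 2000
```

The ~650 figure is in line with the $0.03 upper estimate, and ~2000 corresponds to roughly $0.01 per task.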

Webhook events

Subscribe to task lifecycle events in the /webhooks dashboard:

task.completed — task finished successfully, includes exit code and token counts
task.failed — task errored, includes error message
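
A receiver for these events might look like the sketch below. The payload envelope is not documented here, so everything about its shape is an assumption: a "type" field holding the event name, a "task" object resembling the REST response, and an "error" string on failures. Check a real delivery before relying on any of these field names.

```python
def describe_task_event(event):
    """Return a one-line summary for a task lifecycle event, or None.

    ASSUMED payload shape: {"type": ..., "task": {...}, "error": ...}.
    """
    event_type = event.get("type")
    task = event.get("task", {})
    if event_type == "task.completed":
        tokens = task.get("inputTokens", 0) + task.get("outputTokens", 0)
        return (f"task {task.get('id')} completed with exit code "
                f"{task.get('exitCode')} using {tokens} tokens")
    if event_type == "task.failed":
        return f"task {task.get('id')} failed: {event.get('error', 'unknown error')}"
    return None  # not a task lifecycle event this handler knows about


completed_line = describe_task_event({
    "type": "task.completed",
    "task": {"id": "56bcba9b-...", "exitCode": 0, "inputTokens": 847, "outputTokens": 156},
})
failed_line = describe_task_event({
    "type": "task.failed",
    "task": {"id": "56bcba9b-..."},
    "error": "timeout exceeded",
})
```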

Timeout guidelines

Task type	Recommended timeout
Simple query / lookup	30,000 ms (30s)
Multi-step workflow	60,000 ms (1 min)
Data processing / generation	180,000 ms (3 min)
Complex multi-tool agent work	300,000 ms (5 min)
Maximum allowed	600,000 ms (10 min)
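
One way to apply these guidelines in client code is a lookup with a cap at the maximum. In this sketch the task-type keys and the function name are hypothetical labels for the table rows, and clamping on the client side is a design choice: this document does not say whether the API rejects or truncates a timeout above the limit.

```python
# Recommended timeouts from the table above, in milliseconds.
RECOMMENDED_TIMEOUT_MS = {
    "simple_query": 30_000,
    "multi_step": 60_000,
    "data_processing": 180_000,
    "complex_agent": 300_000,
}
MAX_TIMEOUT_MS = 600_000


def pick_timeout_ms(task_type, override_ms=None):
    """Use the override if given, else the recommendation, capped at the maximum."""
    base = override_ms if override_ms is not None else RECOMMENDED_TIMEOUT_MS[task_type]
    return min(base, MAX_TIMEOUT_MS)


default_timeout = pick_timeout_ms("multi_step")
capped_timeout = pick_timeout_ms("simple_query", override_ms=900_000)
# default_timeout is 60000; capped_timeout is 600000
```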