Sandboxed Agent Tasks
Spawn isolated Firecracker VMs with a Pi coding agent that has authenticated CLI access to the platform. The agent can do anything the caller's API key permits — scoped, metered, auto-killed.
When to Use
- An agent needs to do multi-step work — instead of chaining API calls, spawn a task with a natural language instruction and let the sub-agent figure it out.
- You need isolated compute — code execution, data processing, file generation — anything that shouldn't run in your serverless function.
- Delegating to a specialist — the caller agent knows what needs to happen but not how. The sub-agent has the CLI tools to execute.
- Background work — fire-and-forget via webhook notification on completion.
API
Create and run a task
# Via CLI (once the tasks command exists)
seed tasks run --instruction "Check our API key inventory and revoke any expired ones" --timeout 60000
# Via REST
curl -X POST https://your-app.com/api/v1/tasks/run \
  -H "Authorization: Bearer sk_live_..." \
  -H "Content-Type: application/json" \
  -d '{
    "instruction": "Check our API key inventory and revoke any expired ones",
    "scopes": ["keys:read", "keys:write"],
    "timeout": 60000
  }'
# Via MCP (Claude Desktop, Cursor, etc.)
# The runTask tool is automatically available
Check task status
curl https://your-app.com/api/v1/tasks/{taskId} \
  -H "Authorization: Bearer sk_live_..."
Stop a running task
curl -X POST https://your-app.com/api/v1/tasks/{taskId}/stop \
  -H "Authorization: Bearer sk_live_..."
Response Shape
{
  "task": {
    "id": "56bcba9b-...",
    "status": "completed",
    "stdout": "Found 9 keys. 5 already revoked. Revoked 2 expired keys: task:aa3517b8 (expired 10:09), task:28f48ac5 (expired 10:08). 2 active keys remain.",
    "stderr": null,
    "exitCode": 0,
    "inputTokens": 847,
    "outputTokens": 156,
    "createdAt": "2026-03-25T10:15:00.000Z",
    "completedAt": "2026-03-25T10:15:12.000Z"
  }
}
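The endpoints and response shape above can be combined into a small client. The following is a minimal TypeScript sketch, not the platform's SDK: the endpoint paths, request body, and task fields come from the examples above, while `runTaskAndWait`, `FetchLike`, the `"running"` status value, and the assumption that the create call returns the same `{ task }` envelope are all hypothetical.

```typescript
// Subset of the task object from the response example above.
// Status values other than "completed" are assumptions.
interface Task {
  id: string;
  status: string;
  stdout: string | null;
  stderr: string | null;
  exitCode: number | null;
}

// Minimal fetch-like signature so the sketch can run against a stub.
type FetchLike = (
  url: string,
  init?: { method?: string; headers?: Record<string, string>; body?: string },
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

async function runTaskAndWait(
  fetchFn: FetchLike,
  baseUrl: string,
  apiKey: string,
  body: { instruction: string; scopes?: string[]; timeout?: number },
  pollMs = 1000,
  maxPolls = 600,
): Promise<Task> {
  const headers = {
    Authorization: `Bearer ${apiKey}`,
    "Content-Type": "application/json",
  };

  // POST /api/v1/tasks/run with the instruction, scopes, and timeout.
  const created = await fetchFn(`${baseUrl}/api/v1/tasks/run`, {
    method: "POST",
    headers,
    body: JSON.stringify(body),
  });
  if (!created.ok) throw new Error(`create failed: HTTP ${created.status}`);

  // Assumption: the create call returns the same { task } envelope.
  let { task } = (await created.json()) as { task: Task };

  // Assumption: "running" is the only non-terminal status.
  for (let i = 0; task.status === "running" && i < maxPolls; i++) {
    await new Promise((resolve) => setTimeout(resolve, pollMs));
    const res = await fetchFn(`${baseUrl}/api/v1/tasks/${task.id}`, { headers });
    if (!res.ok) throw new Error(`status check failed: HTTP ${res.status}`);
    ({ task } = (await res.json()) as { task: Task });
  }
  return task;
}
```

In a runtime with a global `fetch`, passing it as `fetchFn` should be enough. For fire-and-forget work, the webhook route described under "Webhook events" avoids polling altogether.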
Scoping Tasks
The scopes parameter controls what the sub-agent can do. It's intersected with the caller's scopes — you can only grant permissions you have.
// Agent can only read observability data — can't create keys, can't write anything
{ "scopes": ["obs:read"] }
// Agent can manage keys and view observability
{ "scopes": ["keys:read", "keys:write", "obs:read"] }
// Omit scopes to inherit all of the caller's permissions
{}
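The intersection rule can be illustrated with a short TypeScript sketch. `effectiveScopes` is a hypothetical helper, not the platform's implementation, and it assumes scopes are compared as exact strings. The text does not say whether the server silently drops scopes the caller lacks or rejects the request, nor how an empty array is treated; this sketch drops them and treats `[]` as "no scopes".

```typescript
// Effective scopes for the sub-agent: requested scopes intersected with the caller's.
// Omitting `requested` inherits everything the caller has.
function effectiveScopes(callerScopes: string[], requested?: string[]): string[] {
  if (requested === undefined) return [...callerScopes];
  const allowed = new Set(callerScopes);
  return requested.filter((scope) => allowed.has(scope));
}

const callerScopes = ["keys:read", "obs:read"];

// "keys:write" is not held by the caller, so it cannot be granted.
console.log(effectiveScopes(callerScopes, ["keys:read", "keys:write"])); // [ 'keys:read' ]

// No scopes requested: inherit all of the caller's permissions.
console.log(effectiveScopes(callerScopes)); // [ 'keys:read', 'obs:read' ]
```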
Industry Examples
SaaS / Developer Tools
Instruction: "Audit our API key usage. List all keys, check which ones haven't been used in 30 days, and generate a cleanup recommendation."
Scopes: ["keys:read", "obs:read"]
What happens: The agent runs seed keys list, cross-references with observability data, and returns a summary of stale keys with recommendations.
CRM (Customer Relationship Management)
Instruction: "Look up the contact record for acme-corp, check their subscription status, and draft a renewal email if their plan expires within 14 days."
Scopes: ["contacts:read", "subscriptions:read", "emails:write"]
What happens: The agent uses domain-specific CLI commands (seed contacts get acme-corp, seed subscriptions status acme-corp), evaluates the expiry, and calls seed emails draft if needed.
E-Commerce / Inventory
Instruction: "Check inventory levels for all products in the 'electronics' category. Flag anything below reorder threshold and create purchase orders for the top 5 most critical items."
Scopes: ["inventory:read", "purchase-orders:write"]
What happens: The agent queries inventory via CLI, applies business logic, and creates purchase orders — all within the scope boundary. It can't touch pricing, users, or billing.
DevOps / Infrastructure
Instruction: "Check the health of all our deployment environments. If any are failing health checks, gather the last 50 log lines and create an incident report."
Scopes: ["deployments:read", "logs:read", "incidents:write"]
What happens: The agent iterates through environments, checks health endpoints, pulls logs for unhealthy ones, and creates structured incident reports.
Finance / Compliance
Instruction: "Generate a monthly usage report for all API consumers. Calculate per-consumer costs based on their token usage and output a CSV summary."
Scopes: ["obs:read", "keys:read"]
What happens: The agent pulls observability data, groups by API key, calculates costs using the configured rate, and outputs a formatted report.
Content / Marketing
Instruction: "Review our webhook event history from the last 7 days. Identify the most active event types and generate a weekly platform activity summary for the team."
Scopes: ["webhooks:read", "obs:read"]
What happens: The agent queries webhook events and observability data, analyzes patterns, and produces a human-readable summary.
Healthcare / Data Processing
Instruction: "Process the batch of patient intake forms uploaded today. Validate required fields, flag incomplete records, and generate a compliance summary."
Scopes: ["records:read", "records:write", "compliance:write"]
What happens: The agent processes records via CLI, validates against rules, and generates the compliance report — all in an isolated VM with no network access to external systems.
Implementation Notes
For forkers: adding domain-specific tasks
- Define your CLI commands: seed contacts list, seed inventory check, etc.
- Define your scopes: contacts:read, inventory:write, etc.
- That's it. The task system discovers CLI commands automatically. The agent uses whatever tools are available.
You don't need to write task-specific code. The agent runtime (Pi) figures out which CLI commands to call based on the natural language instruction. Your job is to make the CLI commands comprehensive and the scope boundaries correct.
Snapshot management
The snapshot (SANDBOX_SNAPSHOT_ID) contains Pi and its dependencies. It does NOT contain the CLI — the CLI is downloaded fresh on every task run so it's always the latest version.
To recreate the snapshot (e.g., after upgrading Pi or changing the default model):
npx tsx scripts/setup-sandbox-snapshot.ts
# Then update SANDBOX_SNAPSHOT_ID in Vercel env vars
Cost model
- Sandbox compute: ~$0.01-0.03 per 5-min task (CPU + memory)
- LLM tokens: $0.00 with OpenRouter free models
- CLI install: ~2-3 seconds per task (to be cached in future snapshot versions)
- Pro plan $20/mo credit covers ~650 five-minute tasks or ~2000 quick tasks
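The task counts above look consistent with dividing the monthly credit by a per-task compute cost. The sketch below reproduces that arithmetic under two assumptions not stated in the text: five-minute tasks are priced at the upper bound of ~$0.03, and "quick" tasks at ~$0.01. LLM tokens are taken as $0.00, as listed. `tasksPerCredit` is a hypothetical helper.

```typescript
// Integer cents avoid floating-point rounding in the division.
function tasksPerCredit(creditCents: number, costPerTaskCents: number): number {
  if (costPerTaskCents <= 0) throw new RangeError("costPerTaskCents must be positive");
  return Math.floor(creditCents / costPerTaskCents);
}

const proCreditCents = 2000; // $20/mo Pro plan credit

console.log(tasksPerCredit(proCreditCents, 3)); // 666, in line with "~650 five-minute tasks"
console.log(tasksPerCredit(proCreditCents, 1)); // 2000, in line with "~2000 quick tasks"
```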
Webhook events
Subscribe to task lifecycle events in the /webhooks dashboard:
- task.completed: task finished successfully, includes exit code and token counts
- task.failed: task errored, includes error message
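A receiver for these two events might branch on the event name. The sketch below is illustrative only: the event names and the kinds of data they carry (exit code, token counts, error message) come from the text above, but the payload field names (`type`, `taskId`, `exitCode`, `inputTokens`, `outputTokens`, `error`) are assumptions, as is the `describeTaskEvent` helper. Check the actual payload in the /webhooks dashboard before relying on it.

```typescript
// Hypothetical payload shapes; only the event names are taken from the docs.
type TaskEvent =
  | { type: "task.completed"; taskId: string; exitCode: number; inputTokens: number; outputTokens: number }
  | { type: "task.failed"; taskId: string; error: string };

// Turn a task lifecycle event into a one-line summary, e.g. for logging or chat alerts.
function describeTaskEvent(event: TaskEvent): string {
  switch (event.type) {
    case "task.completed": {
      const tokens = event.inputTokens + event.outputTokens;
      return `task ${event.taskId} completed (exit ${event.exitCode}, ${tokens} tokens)`;
    }
    case "task.failed":
      return `task ${event.taskId} failed: ${event.error}`;
  }
}

console.log(
  describeTaskEvent({ type: "task.failed", taskId: "56bcba9b", error: "timeout exceeded" }),
); // task 56bcba9b failed: timeout exceeded
```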
Timeout guidelines
| Task type | Recommended timeout |
|---|---|
| Simple query / lookup | 30,000 ms (30s) |
| Multi-step workflow | 60,000 ms (1 min) |
| Data processing / generation | 180,000 ms (3 min) |
| Complex multi-tool agent work | 300,000 ms (5 min) |
| Maximum allowed | 600,000 ms (10 min) |
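The table can be encoded as a small lookup so callers pick a timeout by task type. This is a sketch under assumptions: the keys (`lookup`, `workflow`, `processing`, `complex`) are hypothetical labels for the table rows, and clamping on the client is a convenience choice. The text does not say whether the API rejects or truncates values above the maximum.

```typescript
const MAX_TIMEOUT_MS = 600_000; // maximum allowed, per the table above

// Recommended timeouts from the table; the key names are hypothetical.
const RECOMMENDED_TIMEOUT_MS = {
  lookup: 30_000, // simple query / lookup
  workflow: 60_000, // multi-step workflow
  processing: 180_000, // data processing / generation
  complex: 300_000, // complex multi-tool agent work
} as const;

type TaskKind = keyof typeof RECOMMENDED_TIMEOUT_MS;

// Use the recommended value unless overridden, and never exceed the maximum.
function timeoutFor(kind: TaskKind, overrideMs?: number): number {
  const requested = overrideMs ?? RECOMMENDED_TIMEOUT_MS[kind];
  return Math.min(Math.max(requested, 1), MAX_TIMEOUT_MS);
}

console.log(timeoutFor("workflow")); // 60000
console.log(timeoutFor("complex", 900_000)); // 600000
```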