Deploying in Production
Production Centaur is a Kubernetes deployment with durable API state in Postgres, sandbox pods for agent execution, and iron-proxy for credential injection. The goal is a small working deployment with a clear operator before you add more tools, workflows, harnesses, or overlays.
Production shape
The API saves threads, runs, and events in Postgres. The Kubernetes backend creates sandbox pods for agent work. iron-proxy handles outbound requests that need credentials.
Each pod receives the prompt files, environment, proxy CA, proxy settings, and command it needs for one assigned thread. It should not receive raw model keys or third-party API keys.
1. Choose the operating boundary
Before installing, decide:
| Question | Why it matters |
|---|---|
| Who is the operator? | Someone must own secrets, upgrades, incidents, and access reviews. |
| What Slack workspace and channels matter? | Defines the first user and permission boundary. |
| What repos should agents work on? | Determines GitHub token scope and repo cache needs. |
| What tools or data sources matter first? | Keeps setup focused on one useful loop. |
| What is sensitive? | Determines private channels, tool scopes, and review requirements. |
Good first deployments have one narrow engineering, research, support, security, data, or operations workflow where agents can call real tools.
2. Create the infra secret
The Helm chart reads infrastructure values from an existing Kubernetes Secret.
By default that Secret is named centaur-infra-env:

    secretManager:
      existingSecretName: centaur-infra-env
      envPrefix: ""

For local development, running just bootstrap-secrets creates this Secret from your
shell environment. In production, create it through your normal secret delivery
path before installing the chart.
Minimum keys:
| Secret | Required for | Notes |
|---|---|---|
| DATABASE_URL | API | Postgres connection string. |
| IRON_MANAGEMENT_API_KEY | iron-proxy management API | Generate with openssl rand -hex 32. |
| SANDBOX_SIGNING_KEY | Sandbox API tokens | Generate with openssl rand -hex 32; keeps sandbox tokens valid across API restarts. |
| SLACK_BOT_TOKEN | Slackbot | Bot User OAuth Token from the Slack app. |
| SLACK_SIGNING_SECRET | Slackbot/API | Used to verify Slack webhook signatures. |
| SLACKBOT_API_KEY | Slackbot to API | Static service token; API bootstraps it into Postgres on startup with agent scope. |
| OP_CONNECT_TOKEN | iron-proxy 1Password Connect source (preferred) | Needed when ironProxy.secretSource is onepassword-connect. |
| OP_SERVICE_ACCOUNT_TOKEN | iron-proxy 1Password service-account source | Needed when ironProxy.secretSource is onepassword. |
| OP_VAULT | iron-proxy 1Password source | Vault name or id used for op:// references (either mode). |
SLACKBOT_API_KEY is not created with the admin API during initial boot, because
the API process requires it before it can start. Generate a high-entropy value,
store it in the infra Secret, and reuse the same value in Slackbot.
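Before handing values to your secret delivery path, it can help to confirm that nothing from the table is missing. The sketch below is one way to do that, assuming the values sit in a KEY=VALUE env file; the function name missing_infra_keys and the file format are illustrative assumptions, not part of Centaur.

```shell
# Sketch: print every required infra key that is absent or empty in an env file.
# The key lists mirror the "Minimum keys" table; the function name and the
# KEY=VALUE file format are assumptions made for this example.
missing_infra_keys() {
  env_file=$1
  secret_source=${2:-onepassword-connect}

  required="DATABASE_URL IRON_MANAGEMENT_API_KEY SANDBOX_SIGNING_KEY"
  required="$required SLACK_BOT_TOKEN SLACK_SIGNING_SECRET SLACKBOT_API_KEY"

  # The 1Password keys depend on the chosen ironProxy.secretSource.
  case "$secret_source" in
    onepassword-connect) required="$required OP_CONNECT_TOKEN OP_VAULT" ;;
    onepassword)         required="$required OP_SERVICE_ACCOUNT_TOKEN OP_VAULT" ;;
  esac

  for key in $required; do
    # A key counts as present only when it has a non-empty value.
    grep -Eq "^${key}=.+" "$env_file" || printf '%s\n' "$key"
  done
}

# Usage (infra.env is a hypothetical file name):
#   missing_infra_keys infra.env onepassword
# An empty result means every key for that secret source has a value.
```

If plain kubectl is your delivery path, kubectl create secret generic with --from-env-file can turn the same file into the Secret; most production setups will use an external secret manager instead.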
3. Configure harness credentials
Store one secret per enabled harness credential:
| Harness | API value | Slack selector | Credential to store | Upstream |
|---|---|---|---|---|
| Codex default | codex | none or --codex | OPENAI_API_KEY | api.openai.com |
| Amp | amp | --amp | AMP_API_KEY | ampcode.com |
| Claude Code | claude-code | --claude | ANTHROPIC_API_KEY | api.anthropic.com |
| pi-mono | pi-mono | --pi | ANTHROPIC_API_KEY | api.anthropic.com |
In normal sandbox mode, containers receive placeholder values such as
OPENAI_API_KEY=OPENAI_API_KEY. iron-proxy swaps the
placeholder for the real key on outbound requests, only on the hosts and
headers the secret is bound to.
When ironProxy.secretSource is onepassword, iron-proxy resolves these values
from op://$OP_VAULT/<SECRET_NAME>/credential. For example, store the default
Codex credential in a 1Password item named OPENAI_API_KEY.
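The naming rules above can be sketched as three small shell functions: one maps a harness API value to its credential name, one prints the placeholder a sandbox sees, and one prints the op:// reference used when ironProxy.secretSource is onepassword. The function names are hypothetical helpers for illustration; only the credential names and the reference format come from this page.

```shell
# Map a harness API value to the credential name listed in the harness table.
harness_credential() {
  case "$1" in
    codex)               echo OPENAI_API_KEY ;;
    amp)                 echo AMP_API_KEY ;;
    claude-code|pi-mono) echo ANTHROPIC_API_KEY ;;
    *) echo "unknown harness: $1" >&2; return 1 ;;
  esac
}

# Placeholder the sandbox container receives: the variable set to its own name.
placeholder_env() {
  name=$(harness_credential "$1") || return 1
  printf '%s=%s\n' "$name" "$name"
}

# 1Password reference for that credential (requires OP_VAULT to be set).
op_reference() {
  name=$(harness_credential "$1") || return 1
  printf 'op://%s/%s/credential\n' "${OP_VAULT:?OP_VAULT must be set}" "$name"
}

# Usage:
#   placeholder_env codex
#   OP_VAULT=my-vault op_reference claude-code
```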
Whatever source you pick, the vault is shared across the whole deployment, so any thread can use any configured credential. Per-user and per-channel scoping is on the roadmap; until then, scope tool and harness access accordingly. See Security for the full threat model.
4. Configure Slack
Create the Slackbot app at api.slack.com/apps.
Use the app page to install the bot, copy the Bot User OAuth Token for
SLACK_BOT_TOKEN, and copy the Signing Secret for SLACK_SIGNING_SECRET.
- Add the bot scopes required by the Slackbot features you enable.
- Install the app to the workspace.
- Store the Bot User OAuth Token as SLACK_BOT_TOKEN.
- Store the app Signing Secret as SLACK_SIGNING_SECRET.
- Enable Event Subscriptions.
- Set the Request URL to https://<your-host>/api/webhooks/slack.
- Subscribe to app_mention and to the message events you want Centaur to see: message.channels, message.groups, and message.im.
The Slackbot currently normalizes Slack app_mention and message events.
Do not rely on assistant-specific Slack event types unless the Slackbot code has
explicit support for them.
Do not put Centaur API-key auth in front of /api/webhooks/slack; the Slackbot
validates Slack's signature and then calls the Centaur API separately.
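When webhook deliveries are rejected, it can be useful to reproduce the signature check by hand. The sketch below follows Slack's documented v0 signing scheme (an HMAC-SHA256 over v0:timestamp:body, compared against the X-Slack-Signature header); it is not taken from the Slackbot source, and the function names are hypothetical.

```shell
# Compute the value Slack sends in X-Slack-Signature.
# Arguments: signing secret, X-Slack-Request-Timestamp value, raw request body.
slack_signature() {
  digest=$(printf 'v0:%s:%s' "$2" "$3" |
    openssl dgst -sha256 -hmac "$1" | awk '{print $NF}')
  printf 'v0=%s\n' "$digest"
}

# Succeed only when the header matches and the timestamp is within five minutes.
# Note: a plain string comparison is not constant-time; this is for debugging,
# not a replacement for the check a real server should perform.
slack_request_is_valid() {
  secret=$1; timestamp=$2; body=$3; header=$4
  now=$(date +%s)
  age=$((now - timestamp))
  [ "$age" -ge -300 ] && [ "$age" -le 300 ] || return 1
  [ "$(slack_signature "$secret" "$timestamp" "$body")" = "$header" ]
}

# Usage, with values copied from a captured request:
#   slack_request_is_valid "$SLACK_SIGNING_SECRET" "$TIMESTAMP" "$RAW_BODY" "$SIGNATURE_HEADER"
```

The body must be the raw bytes Slack sent; re-serialized JSON will not produce the same digest.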
The Slackbot accepts Slack events at /api/webhooks/slack. It also registers
compatibility paths for /api/slack/events, /api/slack/actions,
/api/slack/options, and /api/slack/commands.
5. Deploy with Helm
The chart lives at contrib/chart. Select service images, iron-proxy secret
source, sandbox image, and optional runtime class in your values file:

    secretManager:
      existingSecretName: centaur-infra-env
      envPrefix: ""
    api:
      executionWorkerEnabled: true
      warmPoolEnabled: true
    ironProxy:
      secretSource: onepassword-connect
      secretTtl: 10m
    onepasswordConnect:
      connect:
        create: true
        credentialsName: centaur-onepassword-connect-credentials
        credentialsKey: 1password-credentials.json
    sandbox:
      image:
        repository: centaur-agent
        tag: latest
        pullPolicy: IfNotPresent
      runtimeClassName: gvisor

The Kubernetes sandbox backend is the active runtime backend; there is no chart
switch named api.sandboxBackend.
Install or upgrade:

    helm lint contrib/chart
    helm upgrade --install centaur contrib/chart \
      --namespace centaur-system \
      --create-namespace \
      -f values.production.yaml

6. Verify the deployment
Check health from inside the API deployment first. Localhost is accepted for operator-only routes, so this avoids needing an external admin key for the first smoke check:

    kubectl exec -n centaur-system deploy/centaur-centaur-api -- \
      curl -fsS http://localhost:8000/health
    kubectl exec -n centaur-system deploy/centaur-centaur-api -- \
      curl -fsS http://localhost:8000/health/ready | jq
    kubectl exec -n centaur-system deploy/centaur-centaur-api -- \
      curl -fsS http://localhost:8000/health/tools | jq

If you need to call operator routes from outside the cluster, create an admin API key from inside the API deployment and save the returned plaintext key:

    kubectl exec -n centaur-system deploy/centaur-centaur-api -- \
      curl -fsS -X POST http://localhost:8000/admin/api-keys \
      -H "Content-Type: application/json" \
      -d '{"name":"operator","scopes":["admin"],"created_by":"ops"}' | jq

External operator calls then use:

    curl -s "$CENTAUR_API_URL/health/tools" \
      -H "X-Api-Key: $ADMIN_KEY" | jq

Run one agent turn from inside the API deployment:

    THREAD_KEY=production-smoke-codex
    # Spawn the agent for this thread and capture assignment_generation from the response.
    SPAWN=$(kubectl exec -n centaur-system deploy/centaur-centaur-api -- curl -s -X POST http://localhost:8000/agent/spawn \
      -H "Content-Type: application/json" \
      -d "{\"thread_key\":\"${THREAD_KEY}\"}")
    ASSIGNMENT_GENERATION=$(printf '%s' "$SPAWN" | jq -r '.assignment_generation')
    # Post one user message to the thread.
    kubectl exec -n centaur-system deploy/centaur-centaur-api -- curl -s -X POST http://localhost:8000/agent/message \
      -H "Content-Type: application/json" \
      -d "{\"thread_key\":\"${THREAD_KEY}\",\"assignment_generation\":${ASSIGNMENT_GENERATION},\"role\":\"user\",\"parts\":[{\"type\":\"text\",\"text\":\"Reply with exactly PONG.\"}]}"
    # Start the execution and capture its execution_id.
    EXECUTE=$(kubectl exec -n centaur-system deploy/centaur-centaur-api -- curl -s -X POST http://localhost:8000/agent/execute \
      -H "Content-Type: application/json" \
      -d "{\"thread_key\":\"${THREAD_KEY}\",\"assignment_generation\":${ASSIGNMENT_GENERATION},\"delivery\":{\"platform\":\"dev\"}}")
    EXECUTION_ID=$(printf '%s' "$EXECUTE" | jq -r '.execution_id')
    # Fetch the execution record.
    kubectl exec -n centaur-system deploy/centaur-centaur-api -- curl -s \
      "http://localhost:8000/agent/executions/${EXECUTION_ID}" | jq

Then run the same prompt through Slack:

    reply with exactly PONG

Slack messages without a harness flag use Codex. Use --amp, --claude,
--codex, or --pi only when you want to select a specific harness.
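The selector rule can be sketched as a small function that maps a Slack message to a harness API value, using the flags and API values from the harness table. How the Slackbot actually parses flags is not described here, so the whitespace-delimited matching and the name select_harness are assumptions for illustration only.

```shell
# Sketch: choose the harness API value for a Slack message.
# Assumes flags appear as standalone, space-separated words.
select_harness() {
  case " $1 " in
    *" --amp "*)    echo amp ;;
    *" --claude "*) echo claude-code ;;
    *" --pi "*)     echo pi-mono ;;
    *)              echo codex ;;   # no flag, or an explicit --codex
  esac
}

# Usage:
#   select_harness "reply with exactly PONG --claude"
```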
Inspect sandbox pods with the labels Centaur actually sets:

    kubectl get pods -n centaur-system -l centaur.ai/managed=true

If a run fails because the sandbox pod exits or is deleted, inspect the durable execution before retrying:

    kubectl exec -n centaur-system deploy/centaur-centaur-api -- curl -s \
      "http://localhost:8000/agent/executions/${EXECUTION_ID}" | jq
    kubectl logs -n centaur-system deploy/centaur-centaur-api --tail=200
    kubectl get pods -n centaur-system -l centaur.ai/managed=true

Centaur preserves the execution row and event trail; retry by starting a new turn after you understand whether the failure was credentials, image pull, network policy, harness startup, or the upstream model/tool call.
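The checks in this section repeat the same kubectl exec prefix. A small wrapper can keep an operator runbook shorter; the sketch below assumes the namespace and deployment name used on this page, and both the function name centaur_api and the CENTAUR_NAMESPACE override are hypothetical.

```shell
# Hypothetical helper: run curl inside the API deployment against localhost.
# Arguments: HTTP method, request path, optional JSON body.
centaur_api() {
  method=$1
  path=$2
  body=${3:-}
  # Build the curl argument list in the positional parameters.
  set -- -sS -X "$method" "http://localhost:8000${path}"
  if [ -n "$body" ]; then
    set -- "$@" -H "Content-Type: application/json" -d "$body"
  fi
  kubectl exec -n "${CENTAUR_NAMESPACE:-centaur-system}" \
    deploy/centaur-centaur-api -- curl "$@"
}

# Usage (requires cluster access):
#   centaur_api GET /health/ready | jq
#   centaur_api GET "/agent/executions/${EXECUTION_ID}" | jq
#   centaur_api POST /agent/spawn '{"thread_key":"production-smoke-codex"}'
```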
7. Keep the operating loop small
Before expanding the deployment, record:
- The operator.
- Where secrets live.
- How to restart the stack.
- The first working Slack channel.
- The enabled harnesses.
- The first useful tool or workflow.
- How to inspect logs and failed runs.
The operator's job is to leave behind a repeatable operating loop, not a one-time demo.