Troubleshooting

Common failure modes across isolated environments, overlays, snapshots, diagnostics, and governed operations.

Use this page when an environment fails to provision, route, restore data, or pass governance checks.

First Triage Commands

microstax env get <env-id>
microstax env status <env-id>
microstax env logs <env-id>
microstax env traces <env-id>
microstax env diagnose <env-id>
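
To capture all five outputs in one pass, for example to attach to an escalation, a small wrapper along these lines may help. It is only a sketch: the function name `msx_triage` and the output directory layout are hypothetical, and it assumes each command prints plain text.

```shell
# msx_triage ENV_ID [OUT_DIR]
# Runs each triage command listed above and saves its output to a file.
# The function name and file layout are illustrative, not part of the CLI.
msx_triage() {
  env_id="$1"
  out_dir="${2:-./triage-$env_id}"
  mkdir -p "$out_dir" || return 1
  for sub in get status logs traces diagnose; do
    # Keep going when one command fails so the remaining output is still collected.
    if microstax env "$sub" "$env_id" >"$out_dir/$sub.txt" 2>&1; then
      echo "ok      $sub"
    else
      echo "failed  $sub (see $out_dir/$sub.txt)"
    fi
  done
}
```

Calling `msx_triage <env-id>` writes one file per command under `./triage-<env-id>/`.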

If you need raw cluster detail:

kubectl get pods -n <namespace>
kubectl describe pod <pod-name> -n <namespace>
kubectl get ingress,svc,networkpolicy -n <namespace>

Provisioning Problems

Environment stuck in `provisioning`

Likely causes:

image pull failure
invalid probe configuration
missing secret or volume
snapshot restore job still running or failing

Check:

pod events for ErrImagePull or CrashLoopBackOff
init or restore jobs
Blueprint paths and resource settings
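
The first two checks can be gathered with standard kubectl queries. The sketch below assumes you already know the environment's namespace; the function name `provisioning_checks` is hypothetical.

```shell
# provisioning_checks NAMESPACE
# Prints warning events, pod status, and jobs for one namespace.
provisioning_checks() {
  ns="$1"
  echo "== Warning events, oldest first"
  kubectl get events -n "$ns" --field-selector type=Warning --sort-by=.lastTimestamp
  echo "== Pod status (look for ErrImagePull or CrashLoopBackOff in STATUS)"
  kubectl get pods -n "$ns"
  echo "== Jobs (init or restore)"
  kubectl get jobs -n "$ns"
}
```

Image pull failures and restart back-offs normally surface as Warning events, so filtering on the event type keeps the output short.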

Environment enters `error`

Run:

microstax env diagnose <env-id>
microstax env logs <env-id>

Common causes:

bad image tag
invalid resource spec
missing dependency service
malformed snapshot or mock configuration

Routing And Overlay Problems

Overlay traffic is not hitting the overlay

Check:

the overlay was created with routing.mode: overlay
the request includes the routing header, usually x-msx-env, with the correct value
overlayId matches the intended routing target
propagateHeader is enabled when downstream services must remain in overlay context
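
One way to test the header check is to send a single tagged request and inspect the response. This sketch assumes the routing header is named x-msx-env and that its value is the overlay ID; both the URL you pass and the function name `overlay_request` are placeholders.

```shell
# overlay_request URL OVERLAY_ID
# Sends one request tagged for the overlay and prints the response
# status line and headers; the response body is discarded.
# Assumption: the header is x-msx-env and its value is the overlay ID.
overlay_request() {
  url="$1"
  overlay_id="$2"
  curl -sS -o /dev/null -D - -H "x-msx-env: $overlay_id" "$url"
}
```

Running the same request without the `-H` option and comparing the two responses can show whether the header changes where the request lands.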

Overlay provisions too much or too little

Possible causes:

baseline mismatch
service names differ between baseline and overlay
provider mappings or inheritance assumptions are wrong

What to verify:

service names are stable across Blueprints
baselineId points to the intended parent environment
only changed services are present in the overlay Blueprint when using sparse workflows

Baseline promotion is blocked

This usually means a governance policy rejected the promotion. Check:

microstax governance logs
microstax org compliance

Snapshot And Seed Problems

Snapshot restore fails

Check:

engine matches the actual datastore
sourceSecretRef points to a real secret and key
snapshot storage settings are correct
sanitization rules reference valid fields
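
The secret check can be done directly against the cluster. The following is a minimal sketch that assumes the secret lives in the environment's namespace; the function name `secret_has_key` is hypothetical.

```shell
# secret_has_key NAMESPACE SECRET_NAME KEY
# Succeeds when the secret exists and holds a non-empty value under KEY.
secret_has_key() {
  ns="$1"
  name="$2"
  key="$3"
  # Dots inside a key name must be escaped in a jsonpath expression.
  escaped=$(printf '%s' "$key" | sed 's/\./\\./g')
  value=$(kubectl get secret "$name" -n "$ns" -o "jsonpath={.data.$escaped}") || return 1
  [ -n "$value" ]
}
```

For example, `secret_has_key <namespace> <secret-name> <key> && echo present` prints `present` only when both the secret and the key exist.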

Seed packages do not run as expected

Check:

the seed package exists in the registry
target service naming matches the environment
the environment is healthy before seed execution starts

If both a snapshot and seeds are configured, the snapshot restore runs first and the seeds are applied additively afterward.

Mocking And Shadow Problems

Mock does not replace or mirror traffic correctly

Verify:

mock.enabled: true
the behavior mode is valid
deployment.mode is one of replace, sidecar, or mirror
the referenced OpenAPI or proto file exists

Behavioral diffs are empty or confusing

Check:

traffic is actually reaching the mirrored path
mirror percentage is non-zero
the compared baseline and shadow environments are the intended pair

API And UI Access Problems

CLI cannot reach the API

Check:

the --api URL or the MICROSTAX_API environment variable
API health endpoint
firewall, DNS, or local port assumptions

curl http://localhost:3001/health
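
If the API may still be starting, polling the health endpoint a few times can separate a slow start from a real outage. This sketch assumes MICROSTAX_API, when set, holds the API base URL without a trailing slash; the function name `wait_for_api` is hypothetical.

```shell
# wait_for_api [BASE_URL] [ATTEMPTS]
# Polls BASE_URL/health until it answers without an HTTP error
# or the attempts run out.
# Assumption: MICROSTAX_API, when set, is the API base URL.
wait_for_api() {
  base_url="${1:-${MICROSTAX_API:-http://localhost:3001}}"
  attempts="${2:-5}"
  i=1
  while [ "$i" -le "$attempts" ]; do
    # -f makes curl exit non-zero on HTTP 4xx/5xx responses.
    if curl -fsS --max-time 3 -o /dev/null "$base_url/health"; then
      echo "API healthy at $base_url"
      return 0
    fi
    echo "attempt $i/$attempts failed" >&2
    i=$((i + 1))
    # Pause between attempts, but not after the last one.
    [ "$i" -le "$attempts" ] && sleep 2
  done
  return 1
}
```

A non-zero exit after all attempts points toward the firewall, DNS, or port checks listed above.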

Dashboard shows missing logs or stale status

Check:

API health
WebSocket or polling path availability
environment namespace still exists

When To Escalate

Escalate to platform operators when:

the cluster is healthy but multiple environments fail the same way
governance blocks an action you believe should be allowed
snapshot storage or sanitization policies are failing centrally
routing problems affect multiple overlays or baselines