Troubleshooting
Common failure modes across isolated environments, overlays, snapshots, diagnostics, and governed operations.
Use this page when an environment fails to provision, route, restore data, or pass governance checks.
First Triage Commands
microstax env get <env-id>
microstax env status <env-id>
microstax env logs <env-id>
microstax env traces <env-id>
microstax env diagnose <env-id>
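When handing a failure to someone else, it can help to capture all five outputs in one place. The sketch below uses only the subcommands listed above; the function name msx_triage and the output directory layout are hypothetical and not part of the CLI.

```shell
# Hypothetical helper: run each triage subcommand listed above and save its
# output to one file per subcommand. Only the `microstax env <sub> <env-id>`
# forms shown on this page are used; the function name and file layout are
# illustrative.
msx_triage() {
  env_id="$1"
  out_dir="${2:-./triage-$env_id}"
  mkdir -p "$out_dir" || return 1
  for sub in get status logs traces diagnose; do
    # Keep going if one subcommand fails, so the remaining output is still captured.
    microstax env "$sub" "$env_id" >"$out_dir/$sub.txt" 2>&1 ||
      echo "microstax env $sub exited non-zero" >>"$out_dir/$sub.txt"
  done
  echo "$out_dir"
}
```

Run it as msx_triage <env-id>; it prints the directory that holds the captured output.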
If you need raw cluster detail:
kubectl get pods -n <namespace>
kubectl describe pod <pod-name> -n <namespace>
kubectl get ingress,svc,networkpolicy -n <namespace>
Provisioning Problems
Environment stuck in provisioning
Likely causes:
- image pull failure
- invalid probe configuration
- missing secret or volume
- snapshot restore job still running or failing
Check:
- pod events for ErrImagePull or CrashLoopBackOff
- init or restore jobs
- Blueprint paths and resource settings
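The pod check above can be narrowed to the failing pods only. A minimal sketch, assuming the default kubectl get pods column layout (NAME READY STATUS RESTARTS AGE); the function name failing_pods is hypothetical, and ImagePullBackOff is included because it commonly follows ErrImagePull.

```shell
# Hypothetical helper: list pods whose STATUS column shows an image pull or
# crash loop problem. Assumes the default `kubectl get pods` column layout.
failing_pods() {
  ns="$1"
  kubectl get pods -n "$ns" --no-headers |
    awk '$3 ~ /ErrImagePull|ImagePullBackOff|CrashLoopBackOff/ { print $1, $3 }'
}
```

Each output line is a pod name and its status; pass the name to kubectl describe pod to read the events. If the restore runs as a Kubernetes Job, kubectl get jobs -n <namespace> shows whether it has completed.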
Environment enters error
Run:
microstax env diagnose <env-id>
microstax env logs <env-id>
Common causes:
- bad image tag
- invalid resource spec
- missing dependency service
- malformed snapshot or mock configuration
Routing And Overlay Problems
Overlay traffic is not hitting the overlay
Check:
- the overlay was created with routing.mode: overlay
- the request includes the correct header value, usually x-msx-env
- overlayId matches the intended routing target
- propagateHeader is enabled when downstream services must remain in overlay context
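To confirm the header is actually being sent, a request can be issued by hand. This is a sketch under two assumptions: the header name is x-msx-env, and its value is the overlay ID. The function name overlay_request is hypothetical.

```shell
# Hypothetical helper: send a request with the overlay routing header and
# print the HTTP status code. Assumptions: the header name is x-msx-env (the
# usual name per this page) and its value is the overlay ID.
overlay_request() {
  overlay_id="$1"
  url="$2"
  curl -sS -o /dev/null -w '%{http_code}\n' -H "x-msx-env: $overlay_id" "$url"
}
```

A status code alone does not prove the overlay served the request. Compare against the same request without the header, or check the overlay's logs with microstax env logs <env-id>.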
Overlay provisions too much or too little
Possible causes:
- baseline mismatch
- service names differ between baseline and overlay
- provider mappings or inheritance assumptions are wrong
What to verify:
- service names are stable across Blueprints
- baselineId points to the intended parent environment
- only changed services are present in the overlay Blueprint when using sparse workflows
Baseline promotion is blocked
This usually means governance policy rejected the action. Check:
microstax governance logs
microstax org compliance
Snapshot And Seed Problems
Snapshot restore fails
Check:
- engine matches the actual datastore
- sourceSecretRef points to a real secret and key
- snapshot storage settings are correct
- sanitization rules reference valid fields
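The secret check can be done directly against the cluster. A minimal sketch, assuming the referenced secret is an ordinary Kubernetes Secret in the environment's namespace; the function name secret_has_key is hypothetical.

```shell
# Hypothetical helper: succeed only if the secret exists in the namespace and
# contains a non-empty value under the given key.
secret_has_key() {
  ns="$1"; name="$2"; key="$3"
  # Dots in a key must be escaped inside a kubectl JSONPath expression.
  escaped=$(printf '%s' "$key" | sed 's/\./\\./g')
  value=$(kubectl get secret "$name" -n "$ns" -o "jsonpath={.data.$escaped}") || return 1
  [ -n "$value" ]
}
```

For example, secret_has_key <namespace> <secret-name> <key> && echo ok. A non-zero exit means either the secret is missing or the key is absent or empty.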
Seed packages do not run as expected
Check:
- the seed package exists in the registry
- target service naming matches the environment
- the environment is already healthy before seed execution
If both a snapshot and seeds are configured, expect the snapshot restore to run first, followed by additive seeding.
Mocking And Shadow Problems
Mock does not replace or mirror traffic correctly
Verify:
- mock.enabled: true
- the behavior mode is valid
- deployment.mode is one of replace, sidecar, or mirror
- the referenced OpenAPI or proto file exists
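A pre-flight check for the deployment mode can catch typos before provisioning. The sketch below accepts only the three values named above; the function name is hypothetical, and exact, case-sensitive matching is an assumption about how the value is validated.

```shell
# Hypothetical pre-flight check: accept only the deployment.mode values named
# on this page. Case-sensitive matching is an assumption.
valid_mock_deployment_mode() {
  case "$1" in
    replace|sidecar|mirror) return 0 ;;
    *) echo "invalid deployment.mode: '$1' (expected replace, sidecar, or mirror)" >&2
       return 1 ;;
  esac
}
```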
Behavioral diffs are empty or confusing
Check:
- traffic is actually reaching the mirrored path
- mirror percentage is non-zero
- the compared baseline and shadow environments are the intended pair
API And UI Access Problems
CLI cannot reach the API
Check:
- --api URL or MICROSTAX_API
- API health endpoint
- firewall, DNS, or local port assumptions
curl http://localhost:3001/health
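The same probe can follow whatever MICROSTAX_API is set to. A minimal sketch: the /health path and the localhost:3001 address come from the example above, while using that address as the fallback when MICROSTAX_API is unset is an assumption, as are the function names.

```shell
# Hypothetical helpers: resolve the API base URL from MICROSTAX_API, falling
# back to the local address used in the curl example (an assumption), then
# probe /health.
msx_api_base() {
  base="${MICROSTAX_API:-http://localhost:3001}"
  # Strip one trailing slash so the joined URL has no double slash.
  printf '%s\n' "${base%/}"
}

msx_api_health() {
  # -f makes curl exit non-zero on HTTP errors; --max-time bounds a hung connection.
  curl -fsS --max-time 5 "$(msx_api_base)/health"
}
```

If msx_api_health fails while the plain localhost probe succeeds, the CLI is likely pointed at the wrong URL.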
Dashboard shows missing logs or stale status
Check:
- API health
- WebSocket or polling path availability
- environment namespace still exists
When To Escalate
Escalate to platform operators when:
- the cluster is healthy but multiple environments fail the same way
- governance blocks an action you believe should be allowed
- snapshot storage or sanitization policies are failing centrally
- routing problems affect multiple overlays or baselines