Production checklist

Liveness: GET /health returns 200 when the server is listening. It does not verify that any model is loaded, so do not use it as a readiness signal.

Readiness: GET /v1/models includes a per-model status field (loaded, unloaded, reloading). Probe for the specific model id the caller needs, not just process liveness.
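A readiness probe along these lines might look like the sketch below. It assumes the response is a JSON object that lists models under a "data" key, each entry carrying "id" and "status" fields; that shape is an assumption, so check it against the HTTP API reference. The function names and the base URL in the usage note are hypothetical.

```python
import json
import urllib.request


def is_model_ready(payload: dict, model_id: str) -> bool:
    """Return True only if model_id is listed with status "loaded"."""
    # Assumption: models are listed under "data", each with "id" and "status".
    for model in payload.get("data", []):
        if model.get("id") == model_id:
            return model.get("status") == "loaded"
    # A model that is not listed at all is treated as not ready.
    return False


def probe(base_url: str, model_id: str, timeout: float = 2.0) -> bool:
    """Fetch /v1/models and check readiness; any error counts as not ready."""
    try:
        with urllib.request.urlopen(f"{base_url}/v1/models", timeout=timeout) as resp:
            if resp.status != 200:
                return False
            return is_model_ready(json.load(resp), model_id)
    except (OSError, ValueError, AttributeError):
        # Covers connection failures, HTTP errors, timeouts, and malformed bodies.
        return False
```

For example, probe("http://localhost:8080", "my-model") would return False while the model reports unloaded or reloading, which keeps traffic away until the load completes.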

Metrics: scrape GET /metrics, which serves Prometheus text format with per-request counters, latency histograms, in-flight request gauges, and request-body histograms. See the observability documentation and the HTTP API reference for details.
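For an ad hoc check without a Prometheus server, the text format can be read directly. The sketch below sums every sample of one metric across its label sets. The metric names in the usage note are hypothetical, since the real names are listed in the observability documentation, and the parser handles only well-formed lines.

```python
def sum_metric(text: str, name: str) -> float:
    """Sum all samples of one metric in a Prometheus text-format payload."""
    total = 0.0
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and # HELP / # TYPE comment lines.
        if not line or line.startswith("#"):
            continue
        if "{" in line:
            # Labelled sample: name{labels} value [timestamp]
            metric = line[: line.index("{")]
            rest = line[line.rindex("}") + 1 :]
        else:
            # Unlabelled sample: name value [timestamp]
            metric, _, rest = line.partition(" ")
        if metric == name:
            # The first field after the name and labels is the sample value.
            total += float(rest.split()[0])
    return total
```

Given a scraped body, sum_metric(body, "inflight_requests") would total a gauge of that hypothetical name across all of its label sets. For anything beyond a quick check, a real Prometheus scraper or client library is the safer choice.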