Persist agent sessions

Agentic requests on the HTTP server are stateful. State is keyed by session id with LRU eviction at 128 entries and a 30-minute idle TTL.

Two cases:

  • Explicit session_id on the request: the server looks it up. If the session exists, it continues; if not, a new one is created.
  • No session_id: the server creates a new id and returns it in the response.

Example request with an explicit session_id:
curl http://localhost:1234/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "default",
    "messages": [{"role": "user", "content": "What is 2 to the 10th?"}],
    "session_id": "user-42-chat-abc"
  }'

The response body includes a top-level session_id field.
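A client that wants to continue a conversation can read session_id from the first response and send it back on later requests. The following is a minimal sketch using only Python's standard library. The helper names build_chat_request, extract_session_id, and chat are illustrative and not part of any SDK; the URL and model name are copied from the curl example above.

```python
import json
import urllib.request

BASE_URL = "http://localhost:1234"  # same host and port as the curl example


def build_chat_request(content, session_id=None, base_url=BASE_URL):
    """Build a POST to /v1/chat/completions, optionally pinned to a session."""
    payload = {
        "model": "default",
        "messages": [{"role": "user", "content": content}],
    }
    if session_id is not None:
        # Leaving the field out lets the server assign a new id.
        payload["session_id"] = session_id
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def extract_session_id(response_body):
    """Return the top-level session_id field of a JSON response body."""
    return json.loads(response_body)["session_id"]


def chat(content, session_id=None):
    """Send one turn; return (parsed response, session_id to reuse)."""
    with urllib.request.urlopen(build_chat_request(content, session_id)) as resp:
        body = resp.read()
    return json.loads(body), extract_session_id(body)
```

With a server running, `_, sid = chat("What is 2 to the 10th?")` followed by `chat("And to the 20th?", session_id=sid)` would keep both turns in the same session.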

To export a session, send a GET request with its id:
curl http://localhost:1234/v1/sessions/user-42-chat-abc

Returns 404 if the session does not exist.

To import a session, send a PUT request with the saved state:
curl -X PUT http://localhost:1234/v1/sessions/user-42-chat-abc \
  -H "Content-Type: application/json" \
  -d @saved-session.json

The body is a SerializedSession produced by a previous GET. The import replaces any existing session with the same id.

To delete a session, send a DELETE request:
curl -X DELETE http://localhost:1234/v1/sessions/user-42-chat-abc

Always returns 200 regardless of session existence.
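Taken together, the GET, PUT, and DELETE endpoints are enough to move a session from one server to another. The sketch below shows one possible way to do that with Python's standard library. The function names and the opener parameter (a stand-in for urllib.request.urlopen that makes the code testable offline) are assumptions for illustration; only the URL shape and the status codes come from the text above.

```python
import urllib.error
import urllib.parse
import urllib.request


def session_url(base_url, session_id):
    # Percent-encode the id so unusual characters stay inside one path segment.
    return f"{base_url}/v1/sessions/{urllib.parse.quote(session_id, safe='')}"


def export_session(base_url, session_id, opener=urllib.request.urlopen):
    """GET the serialized session; return None if the server answers 404."""
    req = urllib.request.Request(session_url(base_url, session_id))
    try:
        with opener(req) as resp:
            return resp.read()
    except urllib.error.HTTPError as err:
        if err.code == 404:
            return None
        raise


def import_session(base_url, session_id, body, opener=urllib.request.urlopen):
    """PUT a previously exported body, replacing any session with this id."""
    req = urllib.request.Request(
        session_url(base_url, session_id),
        data=body,
        headers={"Content-Type": "application/json"},
        method="PUT",
    )
    with opener(req) as resp:
        resp.read()


def delete_session(base_url, session_id, opener=urllib.request.urlopen):
    """DELETE the session; the server answers 200 whether or not it existed."""
    req = urllib.request.Request(session_url(base_url, session_id), method="DELETE")
    with opener(req) as resp:
        resp.read()


def move_session(src, dst, session_id, opener=urllib.request.urlopen):
    """Copy a session from src to dst, then remove it from src."""
    body = export_session(src, session_id, opener)
    if body is None:
        return False  # nothing to move
    import_session(dst, session_id, body, opener)
    delete_session(src, session_id, opener)
    return True
```

The source session is deleted only after the import has returned without raising, so a failed PUT leaves the original in place.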

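Sessions are lost in three ways:

  • Idle expiry: a session is dropped after 30 minutes of inactivity.
  • Capacity: the server holds at most 128 sessions; beyond that, the least recently used session is evicted.
  • Server restart: all sessions are lost.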
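To make the interaction between the idle TTL and the LRU cap concrete, here is a small in-memory model of a store with those limits. It is a sketch of the described behavior, not the server's actual implementation: the SessionStore class, the uuid-based id format, the shape of the stored state, and the choice to create a missing session under the requested id are all assumptions.

```python
import time
import uuid
from collections import OrderedDict

MAX_SESSIONS = 128          # capacity stated in the text
IDLE_TTL_SECONDS = 30 * 60  # 30-minute idle TTL stated in the text


class SessionStore:
    def __init__(self, capacity=MAX_SESSIONS, ttl=IDLE_TTL_SECONDS,
                 clock=time.monotonic):
        self._capacity = capacity
        self._ttl = ttl
        self._clock = clock
        # id -> (last_used, state), ordered from least to most recently used
        self._entries = OrderedDict()

    def _purge_expired(self):
        now = self._clock()
        # The order tracks last use, so expired entries sit at the front.
        while self._entries:
            sid, (last_used, _) = next(iter(self._entries.items()))
            if now - last_used < self._ttl:
                break
            del self._entries[sid]

    def get_or_create(self, session_id=None):
        """Return (id, state), continuing an existing session when possible."""
        self._purge_expired()
        if session_id is None:
            session_id = uuid.uuid4().hex  # assumed id format
        entry = self._entries.pop(session_id, None)
        state = entry[1] if entry else {"messages": []}
        # Make room for a new session by evicting the least recently used.
        while len(self._entries) >= self._capacity:
            self._entries.popitem(last=False)
        # Re-inserting at the end marks this session as most recently used.
        self._entries[session_id] = (self._clock(), state)
        return session_id, state

    def __contains__(self, session_id):
        self._purge_expired()
        return session_id in self._entries
```

Because every access moves the session to the end of the OrderedDict, a single ordering serves both rules: the front entry is the LRU eviction candidate and also the first to reach the idle TTL.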

Sessions with code execution hold a Python subprocess. The subprocess is not part of the exportable state. After import on another server, the new server starts a fresh subprocess on the next code execution call.

Both SDKs expose the same session operations as the HTTP endpoints: export_session, import_session, delete_session, list_session_ids.

  • Rust: the operations are available on Model. The request-level session_id is set via RequestBuilder::with_session_id.
  • Python: the operations are available on Runner. The request-level session_id is set via the session_id keyword on ChatCompletionRequest.