The chat completions endpoint is the primary way to run inference on Darkbloom. It is fully compatible with the OpenAI Chat Completions API, so any OpenAI SDK works by setting the base URL to https://api.darkbloom.dev/v1. The coordinator routes your request to a hardware-attested Apple Silicon provider, encrypts the request body with the provider's X25519 key before forwarding it, and returns the response through the standard OpenAI response shape.
Every response includes provider attestation headers so you can independently verify the hardware that served your request.
`POST /v1/responses` is an alias for the OpenAI Responses API. The coordinator auto-detects whether the body uses `input` (Responses format) or `messages` (Chat Completions format) and routes through the same handler.
Authentication
All inference endpoints require a Bearer token. Pass your API key in the `Authorization` header.
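As a minimal sketch, here is an authenticated request built with the standard library. The key is a placeholder, and the `/v1/chat/completions` path is assumed from OpenAI compatibility:

```python
import json
import urllib.request

API_KEY = "sk-darkbloom-placeholder"  # replace with your real key

req = urllib.request.Request(
    "https://api.darkbloom.dev/v1/chat/completions",
    data=json.dumps({
        "model": "qwen3.5-27b-claude-opus-8bit",
        "messages": [{"role": "user", "content": "Hello"}],
    }).encode(),
    headers={
        "Authorization": f"Bearer {API_KEY}",  # Bearer token auth
        "Content-Type": "application/json",
    },
    method="POST",
)
# urllib.request.urlopen(req) would send it; omitted here.
```

Any OpenAI SDK sets this header for you automatically once you pass your key as the API key.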
Request
- `model` (string): The model ID to use for this request. See Models for the list of available model IDs. Example: `"qwen3.5-27b-claude-opus-8bit"`
- `messages` (array): The conversation history as an array of message objects. Each object must have a `role` and a `content` field. `role` is `"system"`, `"user"`, or `"assistant"`; `content` is the message text (string).
- `stream` (boolean): When `true`, the response is delivered as a stream of Server-Sent Events (SSE). Each event carries a `data:` field with a JSON chunk. The stream ends with `data: [DONE]`.
- `max_tokens` (integer): Maximum number of tokens to generate. If not set, the coordinator injects a default of 8192 to bound the worst-case cost. Set this explicitly if you need longer generations.
- `temperature` (number): Sampling temperature between 0 and 2. Higher values make output more random; lower values make it more deterministic.
- `top_p` (number): Nucleus sampling parameter. The model considers only the tokens comprising the top `top_p` probability mass.
- `stop` (string or array): One or more sequences at which generation stops. The stop sequence itself is not included in the output.
- `seed` (integer): Fixed random seed for reproducible outputs. Two requests with the same seed and parameters will produce identical results when served by the same provider.
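Putting the parameters together, a request body might look like the following sketch (the model ID is the example from above; all values are illustrative):

```python
import json

payload = {
    "model": "qwen3.5-27b-claude-opus-8bit",
    "messages": [
        {"role": "system", "content": "You are a concise assistant."},
        {"role": "user", "content": "Summarize SSE in one sentence."},
    ],
    "stream": False,
    "max_tokens": 1024,   # explicit cap; the coordinator defaults to 8192 if omitted
    "temperature": 0.7,   # 0-2; lower values are more deterministic
    "top_p": 0.9,
    "stop": ["\n\n"],     # generation halts here; the sequence is not returned
    "seed": 42,           # same seed + params + provider => identical output
}

body = json.dumps(payload)
```

Only `model` and `messages` carry the conversation; the remaining fields tune sampling and cost.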
Response (non-streaming)
- `id`: Unique identifier for the request.
- `object`: Always `"chat.completion"` for non-streaming responses.
- `choices`: Array of completion choices. Standard requests return exactly one choice.
- `usage`: Token counts for the request.
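A sketch of reading those fields from a decoded body. The nested `message`, `finish_reason`, and `usage` keys are assumed to follow the standard OpenAI shape, and the literal values are illustrative, not real output:

```python
import json

# Illustrative response body following the documented shape.
raw = """{
  "id": "chatcmpl-123",
  "object": "chat.completion",
  "choices": [
    {"index": 0,
     "message": {"role": "assistant", "content": "Hello!"},
     "finish_reason": "stop"}
  ],
  "usage": {"prompt_tokens": 12, "completion_tokens": 3, "total_tokens": 15}
}"""

resp = json.loads(raw)
assert resp["object"] == "chat.completion"        # non-streaming marker
text = resp["choices"][0]["message"]["content"]   # standard requests: one choice
tokens = resp["usage"]["total_tokens"]            # prompt + completion tokens
```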
Streaming
When `stream: true`, the response is a sequence of SSE events. In addition to the content chunks and the terminating `data: [DONE]`, the coordinator may emit a final event carrying the provider's Secure Enclave signature over the response hash.
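One way to consume such a stream, sketched against an in-memory transcript rather than a live connection; the chunk shape is assumed to follow the standard OpenAI delta format:

```python
import json

# Illustrative SSE transcript; a real client reads lines from the HTTP response.
transcript = [
    'data: {"choices": [{"delta": {"content": "Hel"}}]}',
    'data: {"choices": [{"delta": {"content": "lo"}}]}',
    "data: [DONE]",
]

pieces = []
for line in transcript:
    if not line.startswith("data: "):
        continue                  # skip blank keep-alives and comments
    data = line[len("data: "):]
    if data == "[DONE]":
        break                     # end-of-stream sentinel
    chunk = json.loads(data)
    delta = chunk["choices"][0]["delta"]
    if "content" in delta:
        pieces.append(delta["content"])

print("".join(pieces))  # prints "Hello"
```

A robust client should also tolerate events it does not recognize, such as the optional final signature event.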
Response headers
Every response (streaming and non-streaming) includes attestation headers that describe the provider that served the request.

| Header | Description |
|---|---|
| `X-Provider-Attested` | `"true"` if the provider passed the most recent challenge-response attestation cycle |
| `X-Provider-Trust-Level` | `"self_signed"` or `"hardware"`, the provider's verified trust level |
| `X-Provider-Chip` | Apple Silicon chip name, e.g. `"Apple M3 Max"` |
| `X-Provider-Id` | Internal provider identifier |
| `X-Provider-Model` | Mac hardware model identifier |
| `X-Provider-Serial` | Provider device serial number |
| `X-Provider-Secure-Enclave` | `"true"` if the provider has a Secure Enclave |
| `X-Provider-Mda-Verified` | `"true"` if Apple MDA hardware attestation was verified |
| `X-Attestation-Se-Public-Key` | Provider's Secure Enclave P-256 public key (base64) |
| `X-Attestation-Device-Serial` | Device serial matching the attestation record |
| `X-Inference-Job-ID` | Job UUID for this request, useful for correlating with usage records (streaming only) |
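A sketch of gating on these headers client-side. The header names come from the table above; the values and the `is_hardware_attested` helper are illustrative:

```python
# Illustrative header map; a real client reads these from the HTTP response.
headers = {
    "X-Provider-Attested": "true",
    "X-Provider-Trust-Level": "hardware",
    "X-Provider-Chip": "Apple M3 Max",
    "X-Provider-Secure-Enclave": "true",
}

def is_hardware_attested(h):
    """True only if the provider passed attestation at the hardware trust level."""
    return (h.get("X-Provider-Attested") == "true"
            and h.get("X-Provider-Trust-Level") == "hardware")

print(is_hardware_attested(headers))  # prints True
```

A stricter policy could also require `X-Provider-Mda-Verified` before trusting a response.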