

POST /v1/chat/completions is the primary inference endpoint. It is fully compatible with the OpenAI Chat Completions API, so you can point any OpenAI SDK at Darkbloom by changing the base URL — no other code changes required. The endpoint supports both streaming (server-sent events) and non-streaming responses.
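
For example, with the official openai Python package (a minimal sketch; the placeholder key and model ID follow the curl examples below):

from openai import OpenAI

# Point the OpenAI SDK at Darkbloom by changing only the base URL.
# The key is a placeholder in the format shown in the examples below.
client = OpenAI(
    base_url="https://api.darkbloom.dev/v1",
    api_key="eigeninference-your-key-here",
)

resp = client.chat.completions.create(
    model="qwen3.5-27b-claude-opus-8bit",
    messages=[{"role": "user", "content": "Explain gradient descent in one paragraph."}],
)
print(resp.choices[0].message.content)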

Request parameters

model
string, required
The model ID to use for this request. See Models for the full list of available IDs.

messages
object[], required
The conversation history as an array of message objects. Each message has a role (system, user, or assistant) and a content string.

stream
boolean, default: false
When true, the response is delivered as a stream of server-sent events. Each event contains a delta chunk in the standard OpenAI SSE format, and the stream ends with data: [DONE].

max_tokens
number, default: 8192
Maximum number of tokens to generate. If you do not set this, the coordinator injects a default of 8192 to bound the worst-case cost and ensure the pre-flight balance reservation covers the full generation.

temperature
number, default: 1.0
Sampling temperature between 0 and 2. Lower values produce more deterministic output.

Examples

Basic request

curl https://api.darkbloom.dev/v1/chat/completions \
  -H "Authorization: Bearer eigeninference-your-key-here" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen3.5-27b-claude-opus-8bit",
    "messages": [
      {"role": "user", "content": "Explain gradient descent in one paragraph."}
    ]
  }'

Streaming

curl https://api.darkbloom.dev/v1/chat/completions \
  -H "Authorization: Bearer eigeninference-your-key-here" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen3.5-27b-claude-opus-8bit",
    "messages": [
      {"role": "user", "content": "Write a haiku about distributed systems."}
    ],
    "stream": true
  }'
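
If you are consuming the stream from Python rather than curl, the openai SDK handles the SSE framing for you. A minimal sketch, assuming the same placeholder key and model ID as above:

from openai import OpenAI

client = OpenAI(
    base_url="https://api.darkbloom.dev/v1",
    api_key="eigeninference-your-key-here",
)

# stream=True makes the SDK parse the server-sent events and yield delta
# chunks until the terminating data: [DONE] event.
stream = client.chat.completions.create(
    model="qwen3.5-27b-claude-opus-8bit",
    messages=[{"role": "user", "content": "Write a haiku about distributed systems."}],
    stream=True,
)
for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
print()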

With a system prompt

curl https://api.darkbloom.dev/v1/chat/completions \
  -H "Authorization: Bearer eigeninference-your-key-here" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "mlx-community/Qwen3.5-122B-A10B-8bit",
    "messages": [
      {"role": "system", "content": "You are a concise technical assistant. Answer in bullet points only."},
      {"role": "user", "content": "What are the trade-offs of using SSE vs WebSockets?"}
    ],
    "max_tokens": 512,
    "temperature": 0.3
  }'

Response headers

Every response from the inference endpoint includes trust metadata headers that let you verify which provider handled the request and inspect their attestation status.
x-provider-attested
true if the provider passed the most recent attestation challenge

x-provider-trust-level
self_signed or hardware; see Trust levels

x-provider-chip
Apple Silicon chip model (e.g. M3 Ultra)

x-provider-serial
Provider machine serial number

x-se-signature
Secure Enclave signature over the response hash

x-response-hash
SHA-256 hash of the response body, signed by the Secure Enclave
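
To inspect these headers from the openai Python SDK, you can use its with_raw_response helper, which exposes the underlying HTTP response (a sketch, using the same placeholders as the examples above):

from openai import OpenAI

client = OpenAI(
    base_url="https://api.darkbloom.dev/v1",
    api_key="eigeninference-your-key-here",
)

# with_raw_response returns the raw HTTP response, so the trust metadata
# headers are available alongside the parsed completion.
raw = client.chat.completions.with_raw_response.create(
    model="qwen3.5-27b-claude-opus-8bit",
    messages=[{"role": "user", "content": "ping"}],
)
print(raw.headers.get("x-provider-attested"))
print(raw.headers.get("x-provider-trust-level"))
print(raw.headers.get("x-provider-serial"))
completion = raw.parse()  # the usual ChatCompletion object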
You can use x-provider-serial to pin requests to a specific provider machine you have independently verified. Pass it as provider_serial in the request body and the coordinator will route only to that machine.
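
Because provider_serial is not a standard OpenAI parameter, the openai SDK will not accept it as a keyword argument; pass it through extra_body instead (a sketch; the serial value is a hypothetical placeholder for one you have verified yourself):

from openai import OpenAI

client = OpenAI(
    base_url="https://api.darkbloom.dev/v1",
    api_key="eigeninference-your-key-here",
)

# extra_body merges extra fields into the JSON request body, so the
# coordinator sees provider_serial and routes only to that machine.
# "C02XXXXXXXXX" is a hypothetical serial, not a real provider.
resp = client.chat.completions.create(
    model="qwen3.5-27b-claude-opus-8bit",
    messages=[{"role": "user", "content": "ping"}],
    extra_body={"provider_serial": "C02XXXXXXXXX"},
)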
Other endpoints

POST /v1/responses
OpenAI Responses API alias. The coordinator auto-detects the input vs messages field shape and routes to the same underlying handler.

POST /v1/completions
Legacy text completions endpoint. Accepts a prompt string instead of a messages array. This is the older OpenAI /v1/completions format and is not recommended for new integrations.

POST /v1/messages
Anthropic Messages API compatible endpoint. Use this if you are working with the Anthropic Python or TypeScript SDK. See the note below.
To use the Anthropic SDK, set base_url="https://api.darkbloom.dev" (without /v1) and pass your Darkbloom API key as api_key. The Anthropic SDK appends /v1/messages automatically.
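
For example, with the anthropic Python package (a minimal sketch; the model ID and key are the same placeholders used above):

import anthropic

# Note the base URL has no /v1 suffix; the Anthropic SDK appends
# /v1/messages itself.
client = anthropic.Anthropic(
    base_url="https://api.darkbloom.dev",
    api_key="eigeninference-your-key-here",
)

message = client.messages.create(
    model="qwen3.5-27b-claude-opus-8bit",
    max_tokens=512,
    messages=[{"role": "user", "content": "What are the trade-offs of SSE vs WebSockets?"}],
)
print(message.content[0].text)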

Thinking tags

Some models emit extended thinking wrapped in <think>...</think> tags. These tags are stripped from responses by default before the content reaches you. The final content field contains only the model’s visible output.