The chat completions endpoint is the primary way to run inference on Darkbloom. It is fully compatible with the OpenAI Chat Completions API, so any OpenAI SDK works by setting the base URL to https://api.darkbloom.dev/v1. The coordinator routes your request to a hardware-attested Apple Silicon provider, encrypts the request body with the provider's X25519 key before forwarding it, and returns the response through the standard OpenAI response shape.
Every response includes provider attestation headers so you can independently verify the hardware that served your request.
`POST /v1/responses` is an alias for the OpenAI Responses API. The coordinator auto-detects whether the body uses `input` (Responses format) or `messages` (Chat Completions format) and routes through the same handler.
Authentication
All inference endpoints require a Bearer token. Pass your API key in the `Authorization` header.
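As a minimal sketch, here is an authenticated request built with the standard library. The key is a placeholder, and the `/v1/chat/completions` path is assumed from OpenAI compatibility:

```python
import json
import urllib.request

API_KEY = "sk-darkbloom-placeholder"  # replace with your real key

req = urllib.request.Request(
    "https://api.darkbloom.dev/v1/chat/completions",
    data=json.dumps({
        "model": "qwen3.5-27b-claude-opus-8bit",
        "messages": [{"role": "user", "content": "Hello"}],
    }).encode(),
    headers={
        "Authorization": f"Bearer {API_KEY}",  # Bearer token auth
        "Content-Type": "application/json",
    },
    method="POST",
)
# urllib.request.urlopen(req) would send it; omitted here.
```

Any OpenAI SDK sets this header for you automatically once you pass your key as the API key.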
Request
- `model` (string): The model ID to use for this request. See Models for the list of available model IDs. Example: `"qwen3.5-27b-claude-opus-8bit"`
- `messages` (array): The conversation history as an array of message objects. Each object must have a `role` and a `content` field. `role` is `"system"`, `"user"`, or `"assistant"`; `content` is the message text (string).
- `stream` (boolean): When `true`, the response is delivered as a stream of Server-Sent Events (SSE). Each event carries a `data:` field with a JSON chunk. The stream ends with `data: [DONE]`.
- `max_tokens` (integer): Maximum number of tokens to generate. If not set, the coordinator injects a default of 8192 to bound the worst-case cost. Set this explicitly if you need longer generations.
- `temperature` (number): Sampling temperature between 0 and 2. Higher values make output more random; lower values make it more deterministic.
- `top_p` (number): Nucleus sampling parameter. The model considers only the tokens comprising the top `top_p` probability mass.
- `stop` (string or array): One or more sequences at which generation stops. The stop sequence itself is not included in the output.
- `seed` (integer): Fixed random seed for reproducible outputs. Two requests with the same seed and parameters will produce identical results when served by the same provider.
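Putting the parameters together, a request body might look like the following sketch (the model ID is the example from above; all values are illustrative):

```python
import json

payload = {
    "model": "qwen3.5-27b-claude-opus-8bit",
    "messages": [
        {"role": "system", "content": "You are a concise assistant."},
        {"role": "user", "content": "Summarize SSE in one sentence."},
    ],
    "stream": False,
    "max_tokens": 1024,   # explicit cap; the coordinator defaults to 8192 if omitted
    "temperature": 0.7,   # 0-2; lower values are more deterministic
    "top_p": 0.9,
    "stop": ["\n\n"],     # generation halts here; the sequence is not returned
    "seed": 42,           # same seed + params + provider => identical output
}

body = json.dumps(payload)
```

Only `model` and `messages` carry the conversation; the remaining fields tune sampling and cost.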
Response (non-streaming)
- `id`: Unique identifier for the request.
- `object`: Always `"chat.completion"` for non-streaming responses.
- `choices`: Array of completion choices. Standard requests return exactly one choice.
- `usage`: Token counts for the request.
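A sketch of reading those fields from a decoded body. The nested `message`, `finish_reason`, and `usage` keys are assumed to follow the standard OpenAI shape, and the literal values are illustrative, not real output:

```python
import json

# Illustrative response body following the documented shape.
raw = """{
  "id": "chatcmpl-123",
  "object": "chat.completion",
  "choices": [
    {"index": 0,
     "message": {"role": "assistant", "content": "Hello!"},
     "finish_reason": "stop"}
  ],
  "usage": {"prompt_tokens": 12, "completion_tokens": 3, "total_tokens": 15}
}"""

resp = json.loads(raw)
assert resp["object"] == "chat.completion"        # non-streaming marker
text = resp["choices"][0]["message"]["content"]   # standard requests: one choice
tokens = resp["usage"]["total_tokens"]            # prompt + completion tokens
```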
Streaming
When `stream: true`, the response is a sequence of SSE events. In addition to the content chunks and the terminating `data: [DONE]`, the coordinator may emit a final event carrying the provider's Secure Enclave signature over the response hash.
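One way to consume such a stream, sketched against an in-memory transcript rather than a live connection; the chunk shape is assumed to follow the standard OpenAI delta format:

```python
import json

# Illustrative SSE transcript; a real client reads lines from the HTTP response.
transcript = [
    'data: {"choices": [{"delta": {"content": "Hel"}}]}',
    'data: {"choices": [{"delta": {"content": "lo"}}]}',
    "data: [DONE]",
]

pieces = []
for line in transcript:
    if not line.startswith("data: "):
        continue                  # skip blank keep-alives and comments
    data = line[len("data: "):]
    if data == "[DONE]":
        break                     # end-of-stream sentinel
    chunk = json.loads(data)
    delta = chunk["choices"][0]["delta"]
    if "content" in delta:
        pieces.append(delta["content"])

print("".join(pieces))  # prints "Hello"
```

A robust client should also tolerate events it does not recognize, such as the optional final signature event.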
Response headers
Every response (streaming and non-streaming) includes attestation headers that describe the provider that served the request.

| Header | Description |
|---|---|
| `X-Provider-Attested` | `"true"` if the provider passed the most recent challenge-response attestation cycle |
| `X-Provider-Trust-Level` | `"self_signed"` or `"hardware"`, the provider's verified trust level |
| `X-Provider-Chip` | Apple Silicon chip name, e.g. `"Apple M3 Max"` |
| `X-Provider-Id` | Internal provider identifier |
| `X-Provider-Model` | Mac hardware model identifier |
| `X-Provider-Serial` | Provider device serial number |
| `X-Provider-Secure-Enclave` | `"true"` if the provider has a Secure Enclave |
| `X-Provider-Mda-Verified` | `"true"` if Apple MDA hardware attestation was verified |
| `X-Attestation-Se-Public-Key` | Provider's Secure Enclave P-256 public key (base64) |
| `X-Attestation-Device-Serial` | Device serial matching the attestation record |
| `X-Inference-Job-ID` | Job UUID for this request, useful for correlating with usage records (streaming only) |
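A sketch of gating on these headers client-side. The header names come from the table above; the values and the `is_hardware_attested` helper are illustrative:

```python
# Illustrative header map; a real client reads these from the HTTP response.
headers = {
    "X-Provider-Attested": "true",
    "X-Provider-Trust-Level": "hardware",
    "X-Provider-Chip": "Apple M3 Max",
    "X-Provider-Secure-Enclave": "true",
}

def is_hardware_attested(h):
    """True only if the provider passed attestation at the hardware trust level."""
    return (h.get("X-Provider-Attested") == "true"
            and h.get("X-Provider-Trust-Level") == "hardware")

print(is_hardware_attested(headers))  # prints True
```

A stricter policy could also require `X-Provider-Mda-Verified` before trusting a response.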