Darkbloom exposes an OpenAI-compatible API at https://api.darkbloom.dev/v1. If you’ve used the OpenAI SDK before, the only change is the base URL and your API key — everything else works the same way. This guide walks you through getting a key, making your first request, and exploring what’s available.
1. Get an API key

Go to darkbloom.dev and sign up for an account. Once you’re logged in, open the console and create an API key. Your key will start with eigeninference-.
Store your API key somewhere safe. You won’t be able to view it again after closing the creation dialog.
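One common way to keep the key out of source control is to read it from an environment variable. A minimal sketch, assuming you export the key under a name like DARKBLOOM_API_KEY (the variable name here is just an example, not something the platform mandates):

```python
import os

def load_api_key(env_var: str = "DARKBLOOM_API_KEY") -> str:
    # Read the key from the environment so it never lives in source code.
    # The variable name is illustrative; use whatever your deployment expects.
    key = os.environ.get(env_var)
    if not key:
        raise RuntimeError(f"{env_var} is not set")
    return key
```

You can then pass load_api_key() as the api_key argument when constructing the client below.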
2. Send your first chat completion

Point any OpenAI-compatible SDK at https://api.darkbloom.dev/v1 and pass your API key. The request format is identical to the OpenAI Chat Completions API.

from openai import OpenAI

client = OpenAI(
    base_url="https://api.darkbloom.dev/v1",
    api_key="eigeninference-..."  # replace with your key
)

response = client.chat.completions.create(
    model="qwen3.5-27b-claude-opus-8bit",
    messages=[
        {"role": "user", "content": "Explain the difference between a mutex and a semaphore."}
    ]
)

print(response.choices[0].message.content)

qwen3.5-27b-claude-opus-8bit is a good general-purpose starting point: a frontier-quality reasoning model distilled from Claude Opus. See the available models below for the full catalog.
3. List available models

To see which models are currently online and accepting requests, call GET /v1/models.

models = client.models.list()

for model in models.data:
    print(model.id)

The response lists only models that have at least one provider currently online. If a model isn't in the list, no provider is serving it right now.
Model ID                              | Best for
qwen3.5-27b-claude-opus-8bit          | Frontier reasoning, Claude Opus distilled
mlx-community/gemma-4-26b-a4b-it-8bit | Fast multimodal requests
mlx-community/Trinity-Mini-8bit       | Fast agentic inference
mlx-community/Qwen3.5-122B-A10B-8bit  | Highest quality reasoning
mlx-community/MiniMax-M2.5-8bit       | State-of-the-art coding, ~100 tok/s
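Because the list only contains models that are online right now, you can use it as a guard before sending a request. A small sketch, assuming the OpenAI-compatible client configured in step 2 (the pick_model helper is ours, not part of any SDK):

```python
def pick_model(client, preferred: str, fallback: str) -> str:
    # Collect the IDs of models currently being served,
    # then fall back if the preferred model is offline.
    online = {model.id for model in client.models.list().data}
    return preferred if preferred in online else fallback
```

For example, pick_model(client, "qwen3.5-27b-claude-opus-8bit", "mlx-community/Trinity-Mini-8bit") returns the preferred ID only when a provider is serving it.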
4. Stream a response

Add stream=True (Python) or stream: true (Node.js) to receive tokens as they’re generated instead of waiting for the full response.

stream = client.chat.completions.create(
    model="qwen3.5-27b-claude-opus-8bit",
    messages=[
        {"role": "user", "content": "Write a haiku about distributed systems."}
    ],
    stream=True
)

for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)

The streaming response uses server-sent events, the same format as the OpenAI API. Each line is prefixed with data: and the stream ends with data: [DONE].
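If you are consuming the stream without an SDK, each event line has to be unframed by hand. A minimal parser sketch for a single SSE line (the helper name is ours, for illustration only):

```python
import json

def parse_sse_line(line: str):
    """Decode one server-sent-events line from the stream.

    Returns the parsed JSON chunk, or None for blank lines
    and for the final 'data: [DONE]' sentinel.
    """
    line = line.strip()
    if not line.startswith("data: "):
        return None
    payload = line[len("data: "):]
    if payload == "[DONE]":
        return None
    return json.loads(payload)
```

Feed each line of the HTTP response body through this function and stop reading once you hit the [DONE] sentinel.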

Using the Anthropic Messages API

Darkbloom also supports the Anthropic Messages API format at /v1/messages. Use this if your code is already written for the Anthropic SDK or if you prefer the system parameter as a top-level field.

curl https://api.darkbloom.dev/v1/messages \
  -H "Authorization: Bearer eigeninference-..." \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen3.5-27b-claude-opus-8bit",
    "max_tokens": 1024,
    "messages": [
      {"role": "user", "content": "What is the capital of France?"}
    ]
  }'
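Note that the Messages format returns content as a list of typed blocks rather than the choices[0].message.content shape used above. A small helper for pulling the text out of a parsed response body (the function is ours, shown as an illustration of the standard Messages response shape):

```python
def message_text(response: dict) -> str:
    # The Messages API returns `content` as a list of blocks;
    # concatenate the text of every block of type "text".
    return "".join(
        block["text"]
        for block in response.get("content", [])
        if block.get("type") == "text"
    )
```

For the curl request above, message_text(json.loads(body)) would yield the model's answer as a plain string.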

Darkbloom is an experimental research prototype. Do not use it in production applications.