Documentation Index
Fetch the complete documentation index at: https://docs.darkbloom.dev/llms.txt
Use this file to discover all available pages before exploring further.
GET /v1/models returns the list of models currently available on the Darkbloom network. The response follows the OpenAI models list format, with additional fields for trust level, provider count, and hardware metadata. All listed models have been verified by the coordinator and have at least one active provider online.
Request
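The original curl example did not survive extraction. As a stand-in, here is a minimal Python sketch of the same request. The base URL `https://api.darkbloom.dev/v1` and the Bearer-token auth scheme are assumptions (these docs only confirm the `/v1/models` path), so check your actual endpoint and credentials:

```python
import json
import urllib.request

BASE_URL = "https://api.darkbloom.dev/v1"  # assumed base URL, not confirmed by these docs
API_KEY = "YOUR_API_KEY"                   # placeholder credential

# GET /v1/models takes no body; only the auth header is needed (assumed Bearer scheme).
req = urllib.request.Request(
    BASE_URL + "/models",
    headers={"Authorization": f"Bearer {API_KEY}"},
)
# models = json.load(urllib.request.urlopen(req))  # uncomment to make the live call
print(req.full_url)
```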
Response format
| Field | Type | Description |
|---|---|---|
| object | string | Always "list". |
| data | array | Array of model objects. |
Available models
The catalog currently includes the following models. All are quantized to 8-bit for efficient Apple Silicon inference.

Qwen3.5 27B Claude Opus 8-bit
Model ID: qwen3.5-27b-claude-opus-8bit
A 27-billion-parameter dense model distilled from Claude Opus. Delivers frontier-quality reasoning at a fraction of the compute cost of the full Opus model. Well-suited for complex reasoning, analysis, and code generation tasks that benefit from extended thinking.
| Property | Value |
|---|---|
| Architecture | 27B dense |
| Quantization | 8-bit |
| Min provider RAM | 36 GB |
| Input price | $0.10 / 1M tokens |
| Output price | $0.78 / 1M tokens |
Gemma 4 26B 8-bit
Model ID: mlx-community/gemma-4-26b-a4b-it-8bit
Google’s Gemma 4 in a 26-billion-parameter mixture-of-experts configuration with only 4 billion parameters active per forward pass. Fast and memory-efficient, with multimodal instruction following. A good default for general-purpose tasks where cost and latency matter.
| Property | Value |
|---|---|
| Architecture | 26B MoE, 4B active |
| Quantization | 8-bit |
| Min provider RAM | 36 GB |
| Input price | $0.065 / 1M tokens |
| Output price | $0.20 / 1M tokens |
Trinity Mini 8-bit
Model ID: mlx-community/Trinity-Mini-8bit
A 27-billion-parameter adaptive mixture-of-experts model optimized for agentic use cases — tool use, multi-step reasoning, and long-context tasks. The adaptive routing keeps active parameter count low while maintaining quality on structured tasks.
| Property | Value |
|---|---|
| Architecture | 27B Adaptive MoE |
| Quantization | 8-bit |
| Min provider RAM | 48 GB |
Qwen3.5 122B MoE 8-bit
Model ID: mlx-community/Qwen3.5-122B-A10B-8bit
The highest-quality model in the catalog. 122 billion total parameters with 10 billion active per token — delivering near-full-model quality at significantly reduced inference cost. Best for tasks where output quality is the primary constraint.
| Property | Value |
|---|---|
| Architecture | 122B MoE, 10B active |
| Quantization | 8-bit |
| Min provider RAM | 128 GB |
| Input price | $0.13 / 1M tokens |
| Output price | $1.04 / 1M tokens |
MiniMax M2.5 8-bit
Model ID: mlx-community/MiniMax-M2.5-8bit
A state-of-the-art coding and reasoning model with 239 billion total parameters and 11 billion active per token. Achieves approximately 100 tokens per second on Apple Silicon, making it competitive with much smaller models on throughput while delivering top-tier coding quality.
| Property | Value |
|---|---|
| Architecture | 239B MoE, 11B active |
| Quantization | 8-bit |
| Min provider RAM | 256 GB |
| Input price | $0.06 / 1M tokens |
| Output price | $0.50 / 1M tokens |
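The per-million-token prices above translate directly into per-request cost: tokens ÷ 1,000,000 × price. A quick sketch (Trinity Mini is omitted because its prices are not listed in the catalog):

```python
# Prices in dollars per 1M tokens (input, output), from the catalog tables above.
PRICES = {
    "qwen3.5-27b-claude-opus-8bit":          (0.10, 0.78),
    "mlx-community/gemma-4-26b-a4b-it-8bit": (0.065, 0.20),
    "mlx-community/Qwen3.5-122B-A10B-8bit":  (0.13, 1.04),
    "mlx-community/MiniMax-M2.5-8bit":       (0.06, 0.50),
}

def request_cost(model_id: str, input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of one request: tokens / 1M, times the per-1M price."""
    inp, out = PRICES[model_id]
    return (input_tokens * inp + output_tokens * out) / 1_000_000

# 8,000 prompt tokens + 1,000 completion tokens on MiniMax M2.5:
cost = request_cost("mlx-community/MiniMax-M2.5-8bit", 8_000, 1_000)
print(f"${cost:.6f}")  # → $0.000980
```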
Choosing a model
General-purpose tasks
Start with mlx-community/gemma-4-26b-a4b-it-8bit. It has the lowest output cost, runs on the widest range of provider hardware (36 GB+), and is fast enough for interactive use.
Complex reasoning and analysis
Use qwen3.5-27b-claude-opus-8bit for tasks requiring multi-step logic, careful analysis, or nuanced writing. The Claude Opus distillation gives it reasoning depth beyond its parameter count.
Best possible output quality
Use mlx-community/Qwen3.5-122B-A10B-8bit. With 122B total parameters, it produces the highest-quality output in the catalog across most benchmarks.
Coding and software tasks
Use mlx-community/MiniMax-M2.5-8bit. It was trained for coding tasks and achieves approximately 100 tokens per second, which makes it practical for long code generation.
Agentic and tool-use workflows
Use mlx-community/Trinity-Mini-8bit. Its adaptive MoE routing is tuned for the structured reasoning patterns that appear in tool-use and multi-step agent loops.

The provider_count field tells you how many providers are currently online for each model. A count of zero means the model is in the catalog but no providers are serving it right now; your request will queue, and the coordinator retries up to three times before returning an error.
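Because a zero provider_count means your request will queue, a client can filter the catalog for models that are actually being served before choosing one. A minimal sketch over a parsed /v1/models response:

```python
def online_models(models_response: dict) -> list[str]:
    """IDs of catalog models with at least one active provider."""
    return [
        m["id"]
        for m in models_response["data"]
        if m.get("provider_count", 0) > 0
    ]

# Hand-written sample response for illustration.
resp = {
    "object": "list",
    "data": [
        {"id": "mlx-community/Trinity-Mini-8bit", "provider_count": 0},
        {"id": "mlx-community/MiniMax-M2.5-8bit", "provider_count": 2},
    ],
}
print(online_models(resp))  # → ['mlx-community/MiniMax-M2.5-8bit']
```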