
GET /v1/models returns the list of models currently available on the Darkbloom network. The response follows the OpenAI models list format, with additional fields for trust level, provider count, and hardware metadata. All listed models have been verified by the coordinator; the provider_count field on each entry shows how many providers are currently serving it (see below).

Request

curl
curl https://api.darkbloom.dev/v1/models \
  -H "Authorization: Bearer eigeninference-your-key-here"

Response format

object (string): Always "list".
data (object[]): Array of model objects.
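
For concreteness, a hypothetical response body is sketched below. The object/data envelope is the standard OpenAI list shape; the extension field names trust_level and hardware are illustrative guesses, and only provider_count is named elsewhere on this page:

json
{
  "object": "list",
  "data": [
    {
      "id": "mlx-community/gemma-4-26b-a4b-it-8bit",
      "object": "model",
      "created": 1735689600,
      "owned_by": "darkbloom",
      "provider_count": 3,
      "trust_level": "verified",
      "hardware": { "min_ram_gb": 36 }
    }
  ]
}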

Available models

The catalog currently includes the following models. All are quantized to 8-bit for efficient Apple Silicon inference.

Qwen3.5 27B Claude Opus 8-bit

Model ID: qwen3.5-27b-claude-opus-8bit

A 27-billion-parameter dense model distilled from Claude Opus. Delivers frontier-quality reasoning at a fraction of the compute cost of the full Opus model. Well-suited for complex reasoning, analysis, and code generation tasks that benefit from extended thinking.

Property          Value
Architecture      27B dense
Quantization      8-bit
Min provider RAM  36 GB
Input price       $0.10 / 1M tokens
Output price      $0.78 / 1M tokens

Gemma 4 26B 8-bit

Model ID: mlx-community/gemma-4-26b-a4b-it-8bit

Google’s Gemma 4 in a 26-billion-parameter mixture-of-experts configuration with only 4 billion parameters active per forward pass. Fast and memory-efficient, with multimodal instruction following. A good default for general-purpose tasks where cost and latency matter.

Property          Value
Architecture      26B MoE, 4B active
Quantization      8-bit
Min provider RAM  36 GB
Input price       $0.065 / 1M tokens
Output price      $0.20 / 1M tokens
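
To make the pricing concrete, a quick back-of-the-envelope at the Gemma 4 rates above (the token counts are made up for illustration):

python
# Example: one request with 2,000 input tokens and 500 output tokens
# at $0.065 / 1M input and $0.20 / 1M output.
input_cost = 2_000 / 1_000_000 * 0.065    # $0.000130
output_cost = 500 / 1_000_000 * 0.20      # $0.000100
print(f"total: ${input_cost + output_cost:.6f}")   # total: $0.000230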

Trinity Mini 8-bit

Model ID: mlx-community/Trinity-Mini-8bit

A 27-billion-parameter adaptive mixture-of-experts model optimized for agentic use cases: tool use, multi-step reasoning, and long-context tasks. The adaptive routing keeps active parameter count low while maintaining quality on structured tasks.

Property          Value
Architecture      27B Adaptive MoE
Quantization      8-bit
Min provider RAM  48 GB

Qwen3.5 122B MoE 8-bit

Model ID: mlx-community/Qwen3.5-122B-A10B-8bit

The highest-quality model in the catalog: 122 billion total parameters with 10 billion active per token, delivering near-full-model quality at significantly reduced inference cost. Best for tasks where output quality is the primary constraint.

Property          Value
Architecture      122B MoE, 10B active
Quantization      8-bit
Min provider RAM  128 GB
Input price       $0.13 / 1M tokens
Output price      $1.04 / 1M tokens

MiniMax M2.5 8-bit

Model ID: mlx-community/MiniMax-M2.5-8bit

A state-of-the-art coding and reasoning model with 239 billion total parameters and 11 billion active per token. Achieves approximately 100 tokens per second on Apple Silicon, making it competitive with much smaller models on throughput while delivering top-tier coding quality.

Property          Value
Architecture      239B MoE, 11B active
Quantization      8-bit
Min provider RAM  256 GB
Input price       $0.06 / 1M tokens
Output price      $0.50 / 1M tokens

Choosing a model

Start with mlx-community/gemma-4-26b-a4b-it-8bit. It has the lowest output cost, runs on the widest range of provider hardware (36 GB+), and is fast enough for interactive use.
Use qwen3.5-27b-claude-opus-8bit for tasks requiring multi-step logic, careful analysis, or nuanced writing. The Claude Opus distillation gives it reasoning depth beyond its parameter count.
Use mlx-community/Qwen3.5-122B-A10B-8bit when output quality is the primary constraint. With 122 billion total parameters, it produces the highest-quality output in the catalog across most benchmarks.
Use mlx-community/MiniMax-M2.5-8bit for coding-heavy workloads. It was trained for coding tasks and sustains approximately 100 tokens per second, which makes it practical for long code generation.
Use mlx-community/Trinity-Mini-8bit for agentic workloads. Its adaptive MoE routing is tuned for the structured reasoning patterns that appear in tool-use and multi-step agent loops.
The provider_count field tells you how many providers are currently online for each model. A count of zero means the model is in the catalog but no providers are serving it right now; requests to it will queue, and the coordinator retries up to three times before returning an error.
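
Before routing traffic, you can filter the catalog down to models with at least one live provider. A sketch using the requests library, assuming the provider_count field described above; the DARKBLOOM_API_KEY environment variable name is hypothetical:

python
import os

import requests

resp = requests.get(
    "https://api.darkbloom.dev/v1/models",
    headers={"Authorization": f"Bearer {os.environ['DARKBLOOM_API_KEY']}"},
    timeout=10,
)
resp.raise_for_status()

# Keep only models currently served by at least one provider.
live = [m["id"] for m in resp.json()["data"] if m.get("provider_count", 0) > 0]
print("Models with active providers:", live)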