Documentation Index
Fetch the complete documentation index at: https://docs.darkbloom.dev/llms.txt
Use this file to discover all available pages before exploring further.
GET /v1/models returns the list of models currently available on the Darkbloom network. The response follows the OpenAI models list format, with additional fields for trust level, provider count, and hardware metadata. All listed models have been verified by the coordinator and have at least one active provider online.
Request
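The original curl example did not survive extraction. As a stand-in, here is a minimal Python sketch of the same request. The base URL `https://api.darkbloom.dev/v1` and the Bearer-token auth scheme are assumptions (these docs only confirm the `/v1/models` path), so check your actual endpoint and credentials:

```python
import json
import urllib.request

BASE_URL = "https://api.darkbloom.dev/v1"  # assumed base URL, not confirmed by these docs
API_KEY = "YOUR_API_KEY"                   # placeholder credential

# GET /v1/models takes no body; only the auth header is needed (assumed Bearer scheme).
req = urllib.request.Request(
    BASE_URL + "/models",
    headers={"Authorization": f"Bearer {API_KEY}"},
)
# models = json.load(urllib.request.urlopen(req))  # uncomment to make the live call
print(req.full_url)
```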
Response format
| Field | Type | Description |
|---|---|---|
| object | string | Always "list". |
| data | array | Array of model objects. |
Available models
The catalog currently includes the following models. All are quantized to 8-bit for efficient Apple Silicon inference.

Qwen3.5 27B Claude Opus 8-bit
Model ID: qwen3.5-27b-claude-opus-8bit
A 27-billion-parameter dense model distilled from Claude Opus. Delivers frontier-quality reasoning at a fraction of the compute cost of the full Opus model. Well-suited for complex reasoning, analysis, and code generation tasks that benefit from extended thinking.
| Property | Value |
|---|---|
| Architecture | 27B dense |
| Quantization | 8-bit |
| Min provider RAM | 36 GB |
| Input price | $0.10 / 1M tokens |
| Output price | $0.78 / 1M tokens |
Gemma 4 26B 8-bit
Model ID: mlx-community/gemma-4-26b-a4b-it-8bit
Google’s Gemma 4 in a 26-billion-parameter mixture-of-experts configuration with only 4 billion parameters active per forward pass. Fast and memory-efficient, with multimodal instruction following. A good default for general-purpose tasks where cost and latency matter.
| Property | Value |
|---|---|
| Architecture | 26B MoE, 4B active |
| Quantization | 8-bit |
| Min provider RAM | 36 GB |
| Input price | $0.065 / 1M tokens |
| Output price | $0.20 / 1M tokens |
Trinity Mini 8-bit
Model ID: mlx-community/Trinity-Mini-8bit
A 27-billion-parameter adaptive mixture-of-experts model optimized for agentic use cases — tool use, multi-step reasoning, and long-context tasks. The adaptive routing keeps active parameter count low while maintaining quality on structured tasks.
| Property | Value |
|---|---|
| Architecture | 27B Adaptive MoE |
| Quantization | 8-bit |
| Min provider RAM | 48 GB |
Qwen3.5 122B MoE 8-bit
Model ID: mlx-community/Qwen3.5-122B-A10B-8bit
The highest-quality model in the catalog. 122 billion total parameters with 10 billion active per token — delivering near-full-model quality at significantly reduced inference cost. Best for tasks where output quality is the primary constraint.
| Property | Value |
|---|---|
| Architecture | 122B MoE, 10B active |
| Quantization | 8-bit |
| Min provider RAM | 128 GB |
| Input price | $0.13 / 1M tokens |
| Output price | $1.04 / 1M tokens |
MiniMax M2.5 8-bit
Model ID: mlx-community/MiniMax-M2.5-8bit
A state-of-the-art coding and reasoning model with 239 billion total parameters and 11 billion active per token. Achieves approximately 100 tokens per second on Apple Silicon, making it competitive with much smaller models on throughput while delivering top-tier coding quality.
| Property | Value |
|---|---|
| Architecture | 239B MoE, 11B active |
| Quantization | 8-bit |
| Min provider RAM | 256 GB |
| Input price | $0.06 / 1M tokens |
| Output price | $0.50 / 1M tokens |
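The per-million-token prices above translate directly into per-request cost: tokens ÷ 1,000,000 × price. A quick sketch (Trinity Mini is omitted because its prices are not listed in the catalog):

```python
# Prices in dollars per 1M tokens (input, output), from the catalog tables above.
PRICES = {
    "qwen3.5-27b-claude-opus-8bit":          (0.10, 0.78),
    "mlx-community/gemma-4-26b-a4b-it-8bit": (0.065, 0.20),
    "mlx-community/Qwen3.5-122B-A10B-8bit":  (0.13, 1.04),
    "mlx-community/MiniMax-M2.5-8bit":       (0.06, 0.50),
}

def request_cost(model_id: str, input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of one request: tokens / 1M, times the per-1M price."""
    inp, out = PRICES[model_id]
    return (input_tokens * inp + output_tokens * out) / 1_000_000

# 8,000 prompt tokens + 1,000 completion tokens on MiniMax M2.5:
cost = request_cost("mlx-community/MiniMax-M2.5-8bit", 8_000, 1_000)
print(f"${cost:.6f}")  # → $0.000980
```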
Choosing a model
General-purpose tasks
Start with mlx-community/gemma-4-26b-a4b-it-8bit. It has the lowest output cost, runs on the widest range of provider hardware (36 GB+), and is fast enough for interactive use.
Complex reasoning and analysis
Use qwen3.5-27b-claude-opus-8bit for tasks requiring multi-step logic, careful analysis, or nuanced writing. The Claude Opus distillation gives it reasoning depth beyond its parameter count.
Best possible output quality
Use mlx-community/Qwen3.5-122B-A10B-8bit. With 122B total parameters, it produces the highest-quality output in the catalog across most benchmarks.
Coding and software tasks
Use mlx-community/MiniMax-M2.5-8bit. It was trained for coding tasks and achieves approximately 100 tokens per second, which makes it practical for long code generation.
Agentic and tool-use workflows
Use mlx-community/Trinity-Mini-8bit. Its adaptive MoE routing is tuned for the structured reasoning patterns that appear in tool-use and multi-step agent loops.

The provider_count field tells you how many providers are currently online for each model. A count of zero means the model is in the catalog but no providers are serving it right now; your request will queue, and the coordinator retries up to three times before returning an error.
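Because a zero provider_count means your request will queue, a client can filter the catalog for models that are actually being served before choosing one. A minimal sketch over a parsed /v1/models response:

```python
def online_models(models_response: dict) -> list[str]:
    """IDs of catalog models with at least one active provider."""
    return [
        m["id"]
        for m in models_response["data"]
        if m.get("provider_count", 0) > 0
    ]

# Hand-written sample response for illustration.
resp = {
    "object": "list",
    "data": [
        {"id": "mlx-community/Trinity-Mini-8bit", "provider_count": 0},
        {"id": "mlx-community/MiniMax-M2.5-8bit", "provider_count": 2},
    ],
}
print(online_models(resp))  # → ['mlx-community/MiniMax-M2.5-8bit']
```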