Qwen2.5 14B Instruct
Alibaba Cloud · Cross-Lingual Order
RTX 3090 + 64GB host
llama.cpp b4000 · Q5_K_M
HumanEval
MBPP
MultiPL-E
LiveCodeBench
Eight layers migrated to host memory. Activity reduced to 23 pulses/second. Code-generation behaviors exceptional in C++ and Python environments. Rust task failures attributed to context exhaustion, not capability deficiency.
Mistral Small 24B Instruct
Mistral AI · European Clade
Dual RTX 3090 NVLink
vLLM 0.6.5 · BF16 native
MATH
GSM8K
BBH
Native 24B architecture occupies 48GB aggregate without compression. 89 pulses/second — highest recorded activity. Chain-of-thought behaviors significantly exceed all compressed specimens. Inter-GPU latency via NVLink negligible.