FACILITY 7

AI Model Testing Laboratory

Domestic Inference Benchmarking — Industrial Standard

Model Test Results

FORM 734-B

DWG: AMTL-2025-001

Date / Shift	Model Specification	Hardware Configuration	Test Protocols	Measured Output	Engineer's Notes
2025-01-15 03:47 / Night Shift	Llama 3.1 8B Instruct; Meta AI Consortium	NVIDIA GeForce RTX 3090; Ollama v0.4.0 runtime; Q4_K_M quantization; 24GB GDDR6X	MMLU-Pro, GPQA, IFEval	Retrieve Data	VRAM utilization peaked at 14.2GB. Throughput measured at 47 tok/s sustained. Six-hour continuous operation without thermal throttling or OOM termination. Quantization loss within acceptable parameters for reasoning tasks.
2025-01-12 22:15 / Night Shift	Qwen2.5 14B Instruct; Alibaba DAMO Academy	RTX 3090 + 64GB System RAM; llama.cpp build 4000; Q5_K_M quantization; Mixed GPU/CPU inference	HumanEval, MBPP, MultiPL-E, LiveCodeBench	Retrieve Data	Eight transformer layers offloaded to system memory. Throughput degraded to 23 tok/s with CPU bottleneck. Code generation capabilities exceptional in C++ and Python domains. Rust compilation tasks failed due to context window exhaustion, not model incompetence.
2025-01-08 17:33 / Day Shift	Mistral Small 24B Instruct; Mistral AI (Paris, France)	Dual RTX 3090 SLI-NVLink; vLLM 0.6.5 engine; BF16 native precision; 48GB aggregate VRAM	MATH, GSM8K, BBH	Retrieve Data	Native 24B parameter architecture fits within 48GB aggregate without compression. 89 tok/s sustained, the highest throughput recorded. Chain-of-thought reasoning substantially outperforms all quantized alternatives tested to date. NVLink inter-GPU latency negligible.
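
Supplementary calculation: a rough sketch of the arithmetic behind the "fits within 48GB aggregate" note in the 2025-01-08 entry. It counts weight storage only and ignores KV cache, activations, and runtime overhead. The function names are illustrative, the parameter count is the nominal 24e9 taken from the model name, and reading "48GB" as 2 x 24 GiB is an assumption; none of this is part of the logged test protocol.

```python
GIB = 2**30  # bytes per GiB


def weight_footprint_gib(n_params: float, bits_per_weight: float) -> float:
    """Approximate size of the model weights alone, in GiB."""
    return n_params * bits_per_weight / 8 / GIB


def headroom_gib(n_params: float, bits_per_weight: float, vram_gib: float) -> float:
    """VRAM left after loading weights; negative means the weights do not fit."""
    return vram_gib - weight_footprint_gib(n_params, bits_per_weight)


if __name__ == "__main__":
    # Assumption: nominal 24e9 parameters at 16 bits (BF16), two 24 GiB cards.
    print(f"weights:  {weight_footprint_gib(24e9, 16):.1f} GiB")
    print(f"headroom: {headroom_gib(24e9, 16, 48):.1f} GiB")
```

Under these assumptions the BF16 weights come to about 44.7 GiB, leaving roughly 3.3 GiB across both cards for everything else. The same function can be reused for the quantized runs by passing that format's effective bits per weight, which varies by quantization scheme and is not recorded in this log.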

Auxiliary Testing Chamber

Roleplaying evaluation protocols, creative generation stress tests, character consistency verification procedures, and experimental prompt engineering trials. These materials resist standard quantitative analysis but contribute essential qualitative data to the model assessment matrix.
