CASE LOG: AI-LAB-2024 · CLEARANCE: UNRESTRICTED

Local Model Evaluation Terminal

Methodical benchmarking · Evidence tracking · Redaction-free analysis

EVIDENCE DOSSIER: MODEL TESTS

Date | Model | Resource Used | Tests | Result | Notes
2024-05-12 | Llama-3-8B-Instruct | Ollama + RTX 3060 12GB | MMLU · TruthfulQA · Code | View Output | Exceptional instruction compliance [REF: SFT-V3]. Minor hallucination on low-resource languages [FLAG: LOW-PR].
2024-05-14 | Phi-3-mini-4k | LM Studio + Apple M2 | Roleplay consistency · JSON parsing | View Output | Remarkably efficient for 3.8B parameters [NOTE: MID-RIG OPTIMAL]. Context window stable at 3840 tokens.
2024-05-18 | Mistral-7B-v0.3 | TGI WebUI + Dual 2080Ti | Math reasoning · Long-context | View Output | Crisp numerical logic [VERIFIED]. Context decay noted past 4k tokens [ACTION: TRUNCATE].
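
FIELD NOTE: The TRUNCATE action logged for the long-context test could be sketched roughly as below. This is a minimal illustration, not the procedure actually used in the case log. The function name `truncate_tokens`, the 4096-token budget (one reading of "4k"), and the `keep_head` option for preserving a leading system prompt are all assumptions introduced here, and the token list stands in for whatever tokenizer the rig actually uses.

```python
def truncate_tokens(tokens, budget=4096, keep_head=0):
    """Return at most `budget` tokens.

    Hypothetical helper: keeps the first `keep_head` tokens (e.g. a system
    prompt) plus the most recent tokens that fit in the remaining budget.
    The 4096 default is an assumed reading of "4k" from the log.
    """
    if budget < 0 or keep_head < 0 or keep_head > budget:
        raise ValueError("need 0 <= keep_head <= budget")

    # Nothing to do when the input already fits.
    if len(tokens) <= budget:
        return list(tokens)

    tail = budget - keep_head
    head = list(tokens[:keep_head])
    # Guard tail == 0 explicitly: tokens[-0:] would return the whole list.
    recent = list(tokens[-tail:]) if tail > 0 else []
    return head + recent


# Example: 5000 stand-in token ids, trimmed to the assumed 4096 budget
# while preserving the first 16 tokens.
trimmed = truncate_tokens(list(range(5000)), budget=4096, keep_head=16)
```

Dropping the middle of the context instead of the start is a design choice: it keeps any leading instructions and the most recent turns, which is usually what a chat-style test depends on. Other policies (keeping only the tail, or summarizing the dropped span) would be equally valid for this log entry.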

ACCESS REDACTED DOSSIER

Unconventional stress tests, extended roleplay transcripts, and edge-case benchmarks logged outside the primary index.

Open Case File