Case File: AI Model Evaluations

EVIDENCE DOSSIER: MODEL TESTS

Date	Model	Resource Used	Tests	Result	Notes
2024-05-12	Llama-3-8B-Instruct	Ollama + RTX 3060 12GB	MMLU · TruthfulQA · Code	View Output	Exceptional instruction compliance [REF: SFT-V3]. Minor hallucination on low-resource languages [FLAG: LOW-PR].
2024-05-14	Phi-3-mini-4k	LM Studio + Apple M2	Roleplay consistency · JSON parsing	View Output	Remarkably efficient for a 3.8B-parameter model [NOTE: MID-RIG OPTIMAL]. Context window stable at 3840 tokens.
2024-05-18	Mistral-7B-v0.3	TGI WebUI + Dual 2080Ti	Math reasoning · Long-context	View Output	Crisp numerical logic [VERIFIED]. Context decay noted past 4k tokens [ACTION: TRUNCATE].

Unconventional stress tests, extended roleplay transcripts, and edge-case benchmarks are logged outside the primary index.
