EVIDENCE DOSSIER: MODEL TESTS
| Date | Model | Runtime + Hardware | Tests | Result | Notes |
|---|---|---|---|---|---|
| 2024-05-12 | Llama-3-8B-Instruct | Ollama + RTX 3060 12GB | MMLU · TruthfulQA · Code | View Output | Exceptional instruction compliance [REF: SFT-V3]. Minor hallucination on low-resource languages [FLAG: LOW-PR]. |
| 2024-05-14 | Phi-3-mini-4k | LM Studio + Apple M2 | Roleplay consistency · JSON parsing | View Output | Remarkably efficient for 3.8B parameters [NOTE: MID-RIG OPTIMAL]. Context window stable at 3840 tokens. |
| 2024-05-18 | Mistral-7B-v0.3 | TGI WebUI + Dual 2080Ti | Math reasoning · Long-context | View Output | Crisp numerical logic [VERIFIED]. Context decay noted past 4k tokens [ACTION: TRUNCATE]. |
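
The [ACTION: TRUNCATE] flag on the Mistral-7B-v0.3 entry could be applied roughly as follows. This is a minimal sketch, not the procedure used in the logged tests: the function name `truncate_to_budget`, the 4096-token budget, and the 256-token head are hypothetical choices for illustration. It also assumes the prompt is already a list of tokens; a whitespace split stands in for a real tokenizer in the usage lines.

```python
def truncate_to_budget(tokens, max_tokens=4096, keep_head=256):
    """Fit a token list into max_tokens (hypothetical helper).

    Keeps the first keep_head tokens (e.g. a system prompt) plus the most
    recent tokens, and drops the middle, where decay was observed to matter.
    """
    if max_tokens <= 0:
        raise ValueError("max_tokens must be positive")
    if len(tokens) <= max_tokens:
        return list(tokens)  # already within budget, nothing to drop
    head = min(keep_head, max_tokens)
    tail = max_tokens - head
    # Guard tail == 0: tokens[-0:] would return the whole list.
    recent = list(tokens[-tail:]) if tail > 0 else []
    return list(tokens[:head]) + recent


# Usage: whitespace split is only an approximation of real tokenization.
prompt_tokens = ("word " * 5000).split()
trimmed = truncate_to_budget(prompt_tokens)
print(len(trimmed))
```

Keeping the head as well as the tail is one possible design choice: it preserves instructions at the start of the prompt while still favoring recent context. Dropping only the oldest tokens would be simpler if no leading instructions need to survive.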
ACCESS REDACTED DOSSIER
Unconventional stress tests, extended roleplay transcripts, and edge-case benchmarks logged outside the primary index.