We test obsessively because trust requires proof.
We haven't found another open-source memory system that publishes stress-test results across multiple LLM models. We do, because the system was designed to work with ANY model, and we need to prove it actually does.
We gave 4 different AIs a folder with DARA and told them to break it. 52 autonomous tests, 7 evaluation blocks. No human intervention. No cherry-picking.
| Model | Platform | Score | % |
|---|---|---|---|
| Opus 4.6 | Claude Cowork | 670/700 | 95.7% |
| Sonnet 4.6 | Claude Cowork | 652/700 | 93.1% |
| DeepSeek V4 | TypingMind (no shell) | 650/700 | 92.9% |
| Opus 4.7 | Claude Cowork | 613/700 | 87.6% |
Click any block to see what we actually tested.
Beyond testing with AIs, the compiler itself has 103 automated tests that run instantly. They test every function in isolation: does the checksum calculator produce correct hashes? Does the deduplication engine catch near-identical content? Does the auto-fix leave valid files untouched?
This means every code change to the compiler is verified automatically — if something breaks, it's caught before it reaches your system.
103 tests · 100% pass rate · Runs in <2 seconds
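As a rough illustration of what "testing every function in isolation" looks like, here is a minimal sketch in Python. The function names (`compute_checksum`, `is_duplicate`) and the token-overlap similarity logic are hypothetical stand-ins for this example, not DARA's actual API:

```python
# Illustrative sketch only: these stand-ins approximate the kinds of
# isolated checks described above; they are not DARA's real functions.
import hashlib

def compute_checksum(text: str) -> str:
    """Stand-in for the compiler's checksum calculator (assumed SHA-256)."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

def is_duplicate(a: str, b: str, threshold: float = 0.9) -> bool:
    """Stand-in for the deduplication engine: crude token-overlap similarity."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta or not tb:
        return False
    return len(ta & tb) / len(ta | tb) >= threshold

# Isolation tests in the spirit of the suite described above:
def test_checksum_is_deterministic():
    assert compute_checksum("hello") == compute_checksum("hello")

def test_checksum_detects_change():
    assert compute_checksum("hello") != compute_checksum("hello!")

def test_dedup_catches_identical_content():
    assert is_duplicate("user prefers dark mode", "user prefers dark mode")

def test_dedup_allows_distinct_content():
    assert not is_duplicate("user prefers dark mode", "project deadline is Friday")
```

Each test exercises one function with no shared state, which is what lets the whole suite run in seconds and pinpoint exactly which function regressed.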
4 models. 52 tests each. Zero system failures. Zero data corruption. The architecture works — across every model, every platform we tested. Ready to use today.