Models · M/01
Mira-Q2 · live on Hugging Face
Small models. Private by design. They read clinical documents on your hardware.
Dilr Mira is a class of specialised small language models (~3B parameters) that turn scans, lab reports and claim forms into source-grounded, schema-valid JSON — running fully inside your perimeter. The data never leaves the building.
Updated June 2026 · Latest release: Mira-Q2
DISCHARGE
pt c/o SOB x3d, h/o HTN + DM
Hb 9.2 g/dL (L) Na 138 Creat 1.4
denies chest pain, no known allergies
on T. Metf 500 BD PO, Amlo 5 OD
BP 160/90 triage → 142/88 by phy
M/02
Live · Apache-2.0
Mira-Q2. Measured on 782 documents, including the hard ones.
| Eval set | N | Type | JSON validity [interval] |
|---|---|---|---|
| test_gold | 200 | Held-out, training distribution | 100.0% [100.0–100.0] |
| synthetic_v2 | 150 | Different formatting dialect | 100.0% [100.0–100.0] |
| extraction_relevant | 150 | Real physician docs, on-schema | 94.7% [90.7–98.0] |
| mtsamples | 282 | Real physician docs, 39 specialties | 85.8% [81.9–89.7] |
100% on training-distribution data. 86% on general real physician prose. That gap is published, not hidden — and closing it is exactly what the next generation is for.
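The page does not say how the bracketed intervals were computed. One common choice for a pass rate like JSON validity is a percentile bootstrap, sketched below. The 95% level, the resample count and the count of 242 valid outputs (the integer that rounds to 85.8% of 282) are all assumptions made for illustration.

```python
import random

def bootstrap_ci(successes, n, resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for a pass rate of `successes` out of `n`."""
    rng = random.Random(seed)
    outcomes = [1] * successes + [0] * (n - successes)
    # Resample the n pass/fail outcomes with replacement and record each rate.
    rates = sorted(
        sum(rng.choices(outcomes, k=n)) / n for _ in range(resamples)
    )
    lo = rates[int((alpha / 2) * resamples)]
    hi = rates[int((1 - alpha / 2) * resamples) - 1]
    return successes / n, lo, hi

# Assumed count: 242 of 282 mtsamples outputs valid (rounds to 85.8%).
point, lo, hi = bootstrap_ci(242, 282)
print(f"{point:.1%} [{lo:.1%}, {hi:.1%}]")
```

When every document passes, every resample passes too, so the interval collapses to a single point. That is why the two 100.0% rows have zero-width intervals.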
- Base model: Qwen2.5-3B-Instruct
- Adapter: QLoRA, r=16
- Training data: 8,400 examples (6,400 gold-by-construction + 2,000 schema variants)
- Train / eval loss: 0.132 / 0.142 (overfit gap 0.010)
- Vocabulary grounding: real ICD-10 codes · NLM drug names · curated lab reference ranges
- License: Apache-2.0
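To give a feel for what r=16 means: a LoRA adapter leaves a weight matrix frozen and learns a rank-r update made of two thin matrices. The sketch below counts the trainable parameters this adds to one matrix. The 2048 × 2048 layer size is hypothetical, and the page does not say which layers the Mira-Q2 adapter targets.

```python
def lora_added_params(d_in: int, d_out: int, r: int) -> int:
    """Trainable parameters LoRA adds to one (d_out x d_in) weight matrix.

    The update is B @ A, with A of shape (r, d_in) and B of shape (d_out, r).
    """
    return r * d_in + d_out * r

# Hypothetical layer size, chosen only for illustration.
d_in = d_out = 2048
full = d_in * d_out
added = lora_added_params(d_in, d_out, r=16)
print(added, full, f"{added / full:.2%}")
```

For a layer of that size, the adapter trains only a small fraction of the weights a full fine-tune would touch.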
View the model, eval files and full scorecard on Hugging Face ↗
M/03
In development
Mira-3. The enterprise generation: four bets, one trust layer.
Multilingual.
Hinglish and code-switched clinical text first, then Spanish and Portuguese — the documents English-only medical APIs can’t read.
Your schema, zero-shot.
Point Mira at any JSON schema and get exactly those fields back — built for India’s NHCX/FHIR claim profiles, no per-template setup.
PII pre-patch.
Identifiers detected and replaced before the model reads the document, with a reversible vault. The model never sees a name.
Twice the speed, half the cost.
Plus a smaller routed sibling for CPU-only fleets.
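The PII pre-patch idea above can be sketched in a few lines. Mira-3 is still in development, so this is not its implementation: the regex patterns, placeholder format and sample note are all invented for illustration, and detecting names would need more than regular expressions.

```python
import re

# Illustrative patterns only; a real detector would need far broader coverage.
PATTERNS = {
    "PHONE": re.compile(r"\b\d{10}\b"),
    "MRN": re.compile(r"\bMRN[- ]?\d{6,}\b"),
}

def pre_patch(text):
    """Replace identifiers with placeholders; return patched text and the vault."""
    vault = {}
    for label, pattern in PATTERNS.items():
        def substitute(match, label=label):
            token = f"[{label}_{len(vault) + 1}]"
            vault[token] = match.group(0)  # remember the original value
            return token
        text = pattern.sub(substitute, text)
    return text, vault

def restore(text, vault):
    """Put the original identifiers back, e.g. after human review."""
    for token, original in vault.items():
        text = text.replace(token, original)
    return text

# Made-up note; the identifiers are not real.
note = "Pt MRN-004417, callback 9876543210, c/o SOB x3d."
patched, vault = pre_patch(note)
print(patched)
print(restore(patched, vault) == note)
```

Only the patched text would go to the model. The vault stays on the caller's side, which is what makes the substitution reversible.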
Trust pack
In development · Research directions
One extraction engine. Many document worlds.
Mira’s architecture is schema-agnostic — the clinical model is generation one of a method, not a one-off. The same verifier-gated extraction is under research for other regulated document worlds.
Research directions — clinical extraction is what’s shipped today.
Deployment · local-first
The whole model fits on one ordinary computer.
Download
Pull the open model from Hugging Face — Apache-2.0, ~2 GB on disk in 4-bit.
Load
One Python call. CPU is enough; a GPU just makes it faster.
Extract
Paste a document, receive schema-valid JSON.
No cluster, no GPU requirement, no API key, no per-page meter. The full quickstart lives on the model page.
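The "~2 GB in 4-bit" figure can be sanity-checked with quick arithmetic. The sketch below counts raw weight storage only and takes the page's "~3B parameters" as a round 3 billion. The gap between the raw figure and ~2 GB is presumably quantization metadata and tensors kept at higher precision, but that is an assumption.

```python
def weight_size_gb(n_params: float, bits_per_weight: float) -> float:
    """Raw weight storage in gigabytes (10**9 bytes), ignoring all overhead."""
    return n_params * bits_per_weight / 8 / 1e9

n_params = 3e9  # "~3B parameters" from the page; the exact count is not stated
size_4bit = weight_size_gb(n_params, 4)    # 4-bit weights
size_16bit = weight_size_gb(n_params, 16)  # 16-bit weights, for comparison
print(size_4bit, size_16bit)
```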
# 01 · download + install
$ pip install transformers torch
# 02 · load — one call
$ python
>>> from transformers import pipeline
>>> mira = pipeline("text-generation", model="dilr/Mira-Q2")
# 03 · extract
>>> mira(open("discharge_note.txt").read())
{"document_type": "discharge_summary", ...}
Governance · audit-ready by design
Every output gated. Every record accountable.
Mira is built for teams whose auditors read the logs. Nothing ships on model output alone.
- schema ✓
- grounding ✓
- zero-leak ✓
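The three gates above could look roughly like this as a deterministic check. This is a minimal sketch, not Mira's verifier: the required fields, the verbatim-substring grounding rule and the blocked-identifier list are all hypothetical.

```python
import json

# Hypothetical schema: field name -> expected Python type.
REQUIRED_FIELDS = {"document_type": str, "medications": list}

def verify(raw_output, source_text, blocked_identifiers):
    """Return a list of gate failures; an empty list means all gates passed."""
    try:
        record = json.loads(raw_output)
    except json.JSONDecodeError:
        return ["schema: output is not valid JSON"]
    if not isinstance(record, dict):
        return ["schema: top level is not an object"]

    failures = []
    # Gate 1, schema: every required field is present with the expected type.
    for field, expected in REQUIRED_FIELDS.items():
        if not isinstance(record.get(field), expected):
            failures.append(f"schema: {field}")
    # Gate 2, grounding: each medication string appears verbatim in the source.
    meds = record.get("medications")
    for med in meds if isinstance(meds, list) else []:
        if not isinstance(med, str) or med.lower() not in source_text.lower():
            failures.append(f"grounding: {med}")
    # Gate 3, zero-leak: no blocked identifier appears anywhere in the output.
    for identifier in blocked_identifiers:
        if identifier in raw_output:
            failures.append("zero-leak: identifier present in output")
    return failures

source = "on T. Metf 500 BD PO, Amlo 5 OD"
good = '{"document_type": "discharge_summary", "medications": ["Metf 500 BD", "Amlo 5 OD"]}'
bad = '{"document_type": "discharge_summary", "medications": ["Aspirin 75 OD"]}'
print(verify(good, source, ["9876543210"]))
print(verify(bad, source, ["9876543210"]))
```

Because every check is a plain rule with no model in the loop, the same input always produces the same verdict, which is what makes the log useful to an auditor.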
Governance is a dilr.ai discipline, not a feature — see how we build audit-ready operating models →
Bring Mira the document nobody else can read.
Try the open model on Hugging Face, or talk to us about a private, schema-matched deployment for your claims and clinical documents.
FAQ
Frequently asked, plainly answered.
What is Dilr Mira?
Dilr Mira is a class of small, specialised language models (~3B parameters) that read clinical documents — lab reports, discharge summaries, claim forms — and return source-grounded, schema-valid JSON. The current release, Mira-Q2, is open on Hugging Face under Apache-2.0.
Does my data leave my infrastructure?
No. Mira runs entirely on your own hardware — on-premises, even air-gapped, even CPU-only. No document, field or identifier is ever sent to a third-party API.
What does it cost to run?
The model is free and open (Apache-2.0). Running it costs only the hardware you already own — no per-page fees, no API meter, no egress. Private, schema-matched deployments are priced per engagement.
Which documents does it read today?
Mira-Q2 is strongest on the types it was trained on: lab reports, medication lists, discharge summaries, pathology reports, intake forms and progress notes, in English. Broader document types and languages are in development for Mira-3.
Can Mira make clinical decisions on its own?
No. Every output is a draft for human review, gated by a deterministic verifier that checks schema validity, source grounding and identifier leakage. Mira is an extraction tool, not a medical device.