Models · M/01

Mira-Q2 · live on Hugging Face

Small models. Private by design. They read clinical documents — on your hardware.

Dilr Mira is a class of specialised small language models (~3B parameters) that turn scans, lab reports and claim forms into source-grounded, schema-valid JSON — running fully inside your perimeter. The data never leaves the building.

Updated June 2026 · Latest release: Mira-Q2

input · scanned + messy
DISCHARGE — pt c/o SOB x3d, h/o HTN + DM
Hb 9.2 g/dL (L)   Na 138   Creat 1.4
denies chest pain, no known allergies
on T. Metf 500 BD PO, Amlo 5 OD
BP 160/90 triage → 142/88 by phy
output · schema-valid JSON
document_type: "discharge_summary"
symptoms: ["shortness of breath ×3d"]
labs.hemoglobin: "9.2 g/dL" · flag: low
diagnoses: ["hypertension", "diabetes"]
allergies: [] · denial captured
medications: ["Metformin 500 BD", "Amlodipine 5 OD"]
vitals.bp: "142/88" · conflict noted
verifier: schema ✓ · 7/7 fields grounded · 0 identifiers leaked
Real input → grounded output. Every value traceable to its source span.
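
One way to make "traceable to its source span" concrete is to store character offsets next to each value. The sketch below is illustrative only: the `GroundedField` type and the `is_grounded` helper are hypothetical names, not Mira's published output format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GroundedField:
    """A value plus the source span it was read from (hypothetical layout)."""
    name: str
    value: str
    start: int     # inclusive character offset into the source text
    end: int       # exclusive character offset
    evidence: str  # the exact source text the value was derived from


def is_grounded(field: GroundedField, source: str) -> bool:
    """True when the recorded span really contains the recorded evidence."""
    return source[field.start:field.end] == field.evidence


source = "Hb 9.2 g/dL (L)   Na 138   Creat 1.4"
start = source.index("9.2 g/dL")
hb = GroundedField("labs.hemoglobin", "9.2 g/dL", start, start + len("9.2 g/dL"), "9.2 g/dL")
print(is_grounded(hb, source))
```

A check like this catches a span that points at the wrong place, but it cannot judge whether a normalised value (for example "shortness of breath" for "SOB") is a faithful reading of its evidence.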

M/02

Live · Apache-2.0

Mira-Q2. Measured on 782 documents — including the hard ones.

782 documents evaluated, across 4 test sets
0 identifier leaks, all 782 documents
1.000 field-F1 on held-out gold (95% CI 0.999–1.0)
JSON validity by eval set · 95% confidence intervals
Eval set             N    Type                                 JSON validity score [95% CI]
test_gold            200  Held-out, training distribution      100.0% [1.0–1.0]
synthetic_v2         150  Different formatting dialect         100.0% [1.0–1.0]
extraction_relevant  150  Real physician docs, on-schema       94.7% [90.7–98.0]
mtsamples            282  Real physician docs, 39 specialties  85.8% [81.9–89.7]

Qwen2.5-3B zero-shot  0%    No training; invents its own schema.
Mira-Q1               98%   3,438 training examples · 50-example eval.
Mira-Q2               100%  8,400 training examples · 200-example eval · field-F1 1.000.

100% on training-distribution data. 86% on general real physician prose. That gap is published, not hidden — and closing it is exactly what the next generation is for.
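
The page does not say how the 95% intervals were computed. A percentile bootstrap is one common choice, and the sketch below shows how such an interval can be obtained for a pass rate. The count of 142 valid outputs out of 150 is inferred from the reported 94.7% and is an assumption, as are the function name, the seed and the number of resamples.

```python
import random


def bootstrap_ci(successes: int, n: int, reps: int = 2000,
                 alpha: float = 0.05, seed: int = 0):
    """Percentile bootstrap interval for a pass rate (illustrative sketch)."""
    rng = random.Random(seed)
    outcomes = [1] * successes + [0] * (n - successes)
    # Resample the n outcomes with replacement and record each resampled rate.
    rates = sorted(sum(rng.choices(outcomes, k=n)) / n for _ in range(reps))
    lower = rates[int((alpha / 2) * reps)]
    upper = rates[int((1 - alpha / 2) * reps) - 1]
    return successes / n, lower, upper


point, lower, upper = bootstrap_ci(142, 150)  # 142 is inferred, not published
print(f"{point:.1%} [{lower:.1%}, {upper:.1%}]")
```

When every document passes, each resample also passes, so the interval collapses to a single point. That is consistent with the [1.0–1.0] entries for the two synthetic sets.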

Base model: Qwen2.5-3B-Instruct
Adapter: QLoRA, r=16
Training data: 8,400 examples (6,400 gold-by-construction + 2,000 schema variants)
Train / eval loss: 0.132 / 0.142 (overfit gap 0.010)
Vocabulary grounding: Real ICD-10 codes · NLM drug names · curated lab reference ranges
License: Apache-2.0

View the model, eval files and full scorecard on Hugging Face ↗

M/03

In development

Mira-3. The enterprise generation — four bets, one trust layer.

  1. Multilingual.

    Hinglish and code-switched clinical text first, then Spanish and Portuguese — the documents English-only medical APIs can’t read.

  2. Your schema, zero-shot.

    Point Mira at any JSON schema and get exactly those fields back — built for India’s NHCX/FHIR claim profiles, no per-template setup.

  3. PII pre-patch.

    Identifiers detected and replaced before the model reads the document, with a reversible vault. The model never sees a name.

  4. Twice the speed, half the cost.

    Plus a smaller routed sibling for CPU-only fleets.
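
The PII pre-patch in bet 3 can be pictured as a replace-then-restore round trip. The sketch below is a minimal illustration under loud assumptions: the two regex patterns, the placeholder format and the in-memory `vault` dict are all hypothetical, and detecting names would need a proper entity detector rather than regexes.

```python
import re

# Illustrative patterns only; a real detector would cover far more identifier types.
PATTERNS = {
    "PHONE": re.compile(r"\b\d{10}\b"),
    "MRN": re.compile(r"\bMRN[:\s]*\d+\b"),
}


def pre_patch(text: str):
    """Replace identifiers with placeholders; return patched text and the vault."""
    vault = {}
    for label, pattern in PATTERNS.items():
        def swap(match, label=label):
            token = f"[{label}_{len(vault) + 1}]"
            vault[token] = match.group(0)  # remember the original for later restore
            return token
        text = pattern.sub(swap, text)
    return text, vault


def restore(text: str, vault: dict) -> str:
    """Put the original identifiers back into text that contains placeholders."""
    for token, original in vault.items():
        text = text.replace(token, original)
    return text


note = "MRN: 448812, contact 9876543210, c/o SOB x3d"
patched, vault = pre_patch(note)
print(patched)
print(restore(patched, vault) == note)
```

Only `patched` would be handed to the model in this arrangement; the vault stays outside it and is used after extraction.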

Trust pack

In development
Per-field statistical guarantees · Signed extraction receipts · Click-to-evidence source spans · Deterministic replay · ML bill of materials
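
Signed extraction receipts are still in development and no format has been published. As a rough idea of what such a receipt could look like, the sketch below hashes the document and the output, then signs both with an HMAC. Every name here (`sign_receipt`, `verify_receipt`, the field names, the demo key) is hypothetical.

```python
import hashlib
import hmac
import json


def _digest(payload: dict, key: bytes) -> str:
    """HMAC-SHA256 over a canonical JSON encoding of the payload."""
    body = json.dumps(payload, sort_keys=True).encode()
    return hmac.new(key, body, hashlib.sha256).hexdigest()


def sign_receipt(document: str, output: dict, key: bytes) -> dict:
    """Bind one document to one extraction output under a secret key."""
    payload = {
        "document_sha256": hashlib.sha256(document.encode()).hexdigest(),
        "output_sha256": hashlib.sha256(
            json.dumps(output, sort_keys=True).encode()
        ).hexdigest(),
    }
    return {**payload, "signature": _digest(payload, key)}


def verify_receipt(receipt: dict, key: bytes) -> bool:
    """Recompute the signature and compare in constant time."""
    payload = {k: v for k, v in receipt.items() if k != "signature"}
    return hmac.compare_digest(_digest(payload, key), receipt["signature"])


key = b"demo-key-not-for-production"
receipt = sign_receipt("Hb 9.2 g/dL (L)", {"labs": {"hemoglobin": "9.2 g/dL"}}, key)
print(verify_receipt(receipt, key))
```

An HMAC only proves the receipt came from someone holding the key; a receipt that third parties can check would need an asymmetric signature instead.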

Research directions

One extraction engine. Many document worlds.

Mira’s architecture is schema-agnostic — the clinical model is generation one of a method, not a one-off. The same verifier-gated extraction is under research for other regulated document worlds.

schema-as-input core
Insurance claims · Prior-authorisation · KYC & onboarding · Invoices & receipts · Lab networks · Legal intake

Research directions — clinical extraction is what’s shipped today.

Deployment · local-first

The whole model fits on one ordinary computer.

  1. Download

    Pull the open model from Hugging Face — Apache-2.0, ~2 GB on disk in 4-bit.

  2. Load

    One Python call. CPU is enough; a GPU just makes it faster.

  3. Extract

    Paste a document, receive schema-valid JSON.

No cluster, no GPU requirement, no API key, no per-page meter. The full quickstart lives on the model page.

quickstart — mira-q2
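# 01 · download + install (transformers needs a backend such as torch)
$ pip install transformers torch

# 02 · load, one call
$ python
>>> from transformers import pipeline
>>> mira = pipeline("text-generation", model="dilr/Mira-Q2")

# 03 · extract (the pipeline returns a list of dicts; the token cap is an assumed value)
>>> note = open("discharge_note.txt").read()
>>> out = mira(note, max_new_tokens=512, return_full_text=False)
>>> print(out[0]["generated_text"])
{"document_type": "discharge_summary", ...}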
3B parameters
~2 GB on disk (4-bit)
CPU is all it needs
0 bytes leave the machine
Cloud API: document → internet → vendor → ?
Mira: document → your machine → JSON.

Governance · audit-ready by design

Every output gated. Every record accountable.

Mira is built for teams whose auditors read the logs. Nothing ships on model output alone.

record / 01 · Document
model / 02 · Mira
gate / 03 · Verifier
  • schema ✓
  • grounding ✓
  • zero-leak ✓
gate / 04 · Human review
ledger / 05 · Audit log
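
The verifier gate can be read as three deterministic checks run on every model output. The sketch below is a simplified stand-in, not Mira's verifier: the required-field map, the `evidence` list of source quotes and the single identifier regex are all assumptions made for illustration.

```python
import json
import re

# Hypothetical schema: field name -> expected JSON type.
REQUIRED = {"document_type": str, "symptoms": list, "allergies": list, "evidence": list}
IDENTIFIER = re.compile(r"\b\d{10}\b")  # stand-in for a real identifier detector


def verify(raw_output: str, source: str) -> dict:
    """Run the three gates and report each result separately."""
    checks = dict.fromkeys(("schema", "grounding", "zero_leak"), False)
    try:
        record = json.loads(raw_output)
    except json.JSONDecodeError:
        return checks  # unparseable output fails every gate
    if not isinstance(record, dict):
        return checks
    checks["schema"] = all(isinstance(record.get(k), t) for k, t in REQUIRED.items())
    quotes = record.get("evidence") or []
    # Grounding here means every evidence quote appears verbatim in the source.
    checks["grounding"] = bool(quotes) and all(
        isinstance(q, str) and q in source for q in quotes
    )
    checks["zero_leak"] = IDENTIFIER.search(raw_output) is None
    return checks


source = "pt c/o SOB x3d, denies chest pain, no known allergies"
good = json.dumps({
    "document_type": "discharge_summary",
    "symptoms": ["shortness of breath x3d"],
    "allergies": [],
    "evidence": ["SOB x3d", "no known allergies"],
})
print(verify(good, source))
```

In a pipeline shaped like the one above, a record would reach human review only when all three values are true, and the check results themselves would be written to the audit log.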
Proof, today
  • 0 identifier leaks across 782 evaluated documents
  • Every field source-grounded
  • Outputs are drafts for human review, never autonomous clinical decisions
  • Apache-2.0: weights you can inspect
In development
Signed receipts · Deterministic replay · Externally-anchored audit chain
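
An audit chain is usually built by having each log entry commit to the hash of the one before it, so that editing any past entry breaks every later link. The sketch below shows that idea with SHA-256. It is a generic illustration, not Mira's design: the entry layout and the function names are hypothetical.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def _entry_hash(prev: str, event: dict) -> str:
    body = json.dumps({"prev": prev, "event": event}, sort_keys=True)
    return hashlib.sha256(body.encode()).hexdigest()


def append_entry(log: list, event: dict) -> dict:
    """Append an event that commits to the hash of the previous entry."""
    prev = log[-1]["hash"] if log else GENESIS
    entry = {"prev": prev, "event": event, "hash": _entry_hash(prev, event)}
    log.append(entry)
    return entry


def chain_is_intact(log: list) -> bool:
    """Recompute every link; any edited or reordered entry breaks the chain."""
    prev = GENESIS
    for entry in log:
        if entry["prev"] != prev or entry["hash"] != _entry_hash(prev, entry["event"]):
            return False
        prev = entry["hash"]
    return True


log = []
append_entry(log, {"record": "doc-001", "verdict": "passed"})
append_entry(log, {"record": "doc-002", "verdict": "needs_review"})
print(chain_is_intact(log))
```

External anchoring would then mean publishing the latest hash somewhere the operator cannot rewrite, so the whole chain up to that point becomes checkable by an outside party.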

Bring Mira the document nobody else can read.

Try the open model on Hugging Face, or talk to us about a private, schema-matched deployment for your claims and clinical documents.

FAQ

Frequently asked, plainly answered.

What is Dilr Mira?

Dilr Mira is a class of small, specialised language models (~3B parameters) that read clinical documents — lab reports, discharge summaries, claim forms — and return source-grounded, schema-valid JSON. The current release, Mira-Q2, is open on Hugging Face under Apache-2.0.

Does my data leave my infrastructure?

No. Mira runs entirely on your own hardware — on-premises, even air-gapped, even CPU-only. No document, field or identifier is ever sent to a third-party API.

What does it cost to run?

The model is free and open (Apache-2.0). Running it costs only the hardware you already own — no per-page fees, no API meter, no egress. Private, schema-matched deployments are priced per engagement.

Which documents does it read today?

Mira-Q2 is strongest on the types it was trained on: lab reports, medication lists, discharge summaries, pathology reports, intake forms and progress notes, in English. Broader document types and languages are in development for Mira-3.

Can Mira make clinical decisions on its own?

No. Every output is a draft for human review, gated by a deterministic verifier that checks schema validity, source grounding and identifier leakage. Mira is an extraction tool, not a medical device.