Methodology

How AutoICD API turns clinical text into structured codes, and how we measure whether it gets them right.

Coding engine

The runtime is provider-agnostic. By default it calls Gemini 2.5 Flash Lite via the Google Gen AI SDK; setting LLM_PROVIDER=claude swaps in Anthropic Claude through a thin adapter. Only one file imports any provider-specific SDK, so adding a new model requires just a single adapter file and a registry entry.
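The adapter-plus-registry pattern can be sketched as below. This is an illustrative sketch, not the production code: the names LlmAdapter, registerAdapter, resolveAdapter, and completeJson are assumptions, and the registered adapter is a stub standing in for the real Google Gen AI SDK wrapper.

```typescript
// Hypothetical sketch of the single-file provider adapter pattern.
// All names here are illustrative, not the real AutoICD API internals.

interface LlmAdapter {
  name: string;
  // Returns structured JSON from the model for a given prompt.
  completeJson(prompt: string): Promise<unknown>;
}

// Each provider contributes one adapter factory; only that adapter's
// file would import the provider's SDK.
const registry = new Map<string, () => LlmAdapter>();

function registerAdapter(name: string, factory: () => LlmAdapter): void {
  registry.set(name, factory);
}

function resolveAdapter(): LlmAdapter {
  // LLM_PROVIDER selects the adapter; unset falls back to the default.
  const name = process.env.LLM_PROVIDER ?? "gemini";
  const factory = registry.get(name);
  if (!factory) throw new Error(`Unknown LLM_PROVIDER: ${name}`);
  return factory();
}

// Example registration (a real adapter would wrap the Google Gen AI SDK):
registerAdapter("gemini", () => ({
  name: "gemini-2.5-flash-lite",
  completeJson: async (prompt) => ({ echoed: prompt }), // stub
}));
```

Adding a new provider under this pattern means writing one adapter file that satisfies the interface and one registerAdapter call, which matches the "single adapter file and a registry entry" claim above.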

Every /v1/code request runs a two-pass strategy:

  1. Pass 1 (NER). The LLM extracts conditions from the clinical text with negation and context flags (current, history, family, uncertain). The schema is structured JSON, not free text.
  2. Term index lookup. Each extracted condition is normalized and matched against a pre-built term index assembled from SNOMED CT, UMLS, WHO, manual overrides, and scraped synonyms. A hit short-circuits Pass 2 and returns the cached code directly.
  3. Pass 2 (code lookup). Only conditions that miss the cache go to a second LLM call, which assigns an ICD-10-CM code with a confidence score. Every code is verified against the local code table; an invalid code triggers one retry with corrective feedback.
  4. Postprocessing. Codes are validated, snapped to billable specificity where appropriate, and a deterministic text-match step picks the most specific child code consistent with the documentation. Companion codes (code-also and use-additional-code) are auto-attached.
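The cache short-circuit between the two passes can be sketched as follows. The types, the sample index entries, and the normalize and codeCondition helpers are all assumptions for illustration; the real term index is built from the licensed sources listed later on this page.

```typescript
// Illustrative sketch of the term-index short-circuit between Pass 1 and
// Pass 2. Names and schema are assumptions, not the production API.

type Context = "current" | "history" | "family" | "uncertain";

interface ExtractedCondition {
  term: string;      // condition text from Pass 1 (NER)
  negated: boolean;  // negation flag from Pass 1
  context: Context;  // context flag from Pass 1
}

interface CodedCondition extends ExtractedCondition {
  code: string;
  source: "term-index" | "llm";
  confidence?: number; // only present for LLM-assigned codes
}

// Tiny stand-in for the pre-built term index (SNOMED CT, UMLS, WHO,
// manual overrides, scraped synonyms in the real system).
const termIndex = new Map<string, string>([
  ["type 2 diabetes mellitus", "E11.9"],
  ["essential hypertension", "I10"],
]);

// Simple normalization: trim, lowercase, collapse whitespace.
const normalize = (t: string) => t.trim().toLowerCase().replace(/\s+/g, " ");

async function codeCondition(
  cond: ExtractedCondition,
  pass2: (term: string) => Promise<{ code: string; confidence: number }>,
): Promise<CodedCondition> {
  const hit = termIndex.get(normalize(cond.term));
  if (hit) {
    // Cache hit: return the indexed code and skip Pass 2 entirely.
    return { ...cond, code: hit, source: "term-index" };
  }
  // Cache miss: fall through to the second LLM call.
  const { code, confidence } = await pass2(cond.term);
  return { ...cond, code, confidence, source: "llm" };
}
```

The design point the sketch illustrates: only cache misses pay for a second LLM call, so common, well-indexed conditions are coded deterministically and cheaply.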

Benchmark

We track accuracy with a 640-case TypeScript benchmark suite drawn from CDI examples, public discharge summaries, and edge cases the team has hand-curated. Each case carries gold-standard ICD-10-CM codes, including expected negation and context outcomes.

The locked benchmark configuration runs on the full suite with --term-hints and --verify-codes enabled, against the latest commit on main. Results are written to benchmark_results/ with model, branch, commit, and cost metadata so any regression can be traced back to a specific change.
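A minimal sketch of how a case might be scored and the pass rate rolled up, assuming an exact-set-match pass criterion; the field names (goldCodes, predictedCodes) and the scoring rule are illustrative assumptions, not the suite's actual schema.

```typescript
// Hypothetical shape of one benchmark case result and the pass-rate rollup.
// Field names and the exact-match rule are assumptions for illustration.

interface CaseResult {
  id: string;
  goldCodes: string[];      // expected ICD-10-CM codes for the case
  predictedCodes: string[]; // codes the engine produced
}

// A case passes only if the predicted set exactly matches the gold set.
function casePasses(r: CaseResult): boolean {
  const gold = new Set(r.goldCodes);
  return (
    r.predictedCodes.length === gold.size &&
    r.predictedCodes.every((c) => gold.has(c))
  );
}

// Pass rate as a percentage, rounded to one decimal place (e.g. 88.4).
function passRate(results: CaseResult[]): number {
  const passed = results.filter(casePasses).length;
  return Math.round((passed / results.length) * 1000) / 10;
}
```

A run under this scheme would emit the rounded percentage alongside the model, branch, commit, and cost metadata described above.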

Latest pass rate: 88.4% on the 640-case suite (post-fix run, 2026-05-05). Prior baselines and per-case diffs are kept in the repo for audit.

Data sources

Every reference page and cross-reference is built from licensed primary sources and refreshed on each upstream release.

  • ICD-10-CM
    FY2025 release, CDC NCHS
  • ICD-11 MMS
    WHO 2026-01 release
  • SNOMED CT
    US Edition, NLM
  • UMLS Metathesaurus
    MRCONSO, NLM
  • RxNorm
    NLM monthly release
  • LOINC
    v2.78, Regenstrief
  • ICF
    WHO, International Classification of Functioning, Disability and Health

HIPAA and PHI handling

Clinical text is processed in memory only. No PHI is written to disk, persisted in the application database, or used to train any model. A signed Business Associate Agreement is available on the Pro and Enterprise tiers; see the HIPAA page and the Security page for the full data flow and architectural controls.

Questions about how we evaluate accuracy or pick a model? Email the team.