AI Diagnosis for Acute Medicine

Instead of 2 Days: AI Analyzes Complete Brain MRIs in 3 Seconds

12 February 2026, 11:20 a.m. | Elektronik Medical (uh)
Detecting a stroke in just 3 seconds, localizing it precisely, and characterizing it: AI-based diagnosis with physician-level accuracy can not only save lives in emergency rooms but also simplify triage.

An AI model from Michigan analyzes MRI studies with over 30 sequences in just three seconds on a single GPU – as accurate as experienced radiologists. A compression algorithm shrinks gigantic 3D volumes into manageable token packages. The system is designed to relieve the burden on emergency rooms.


While patients currently wait a median of two days for MRI results, the AI system "Prima" promises a revolution: 3 seconds for a complete brain MRI analysis. The key is not larger models, but a clever tokenization strategy that reduces the data volume by a factor of 16—without any loss of accuracy.

16x Compression Makes All the Difference

A single brain MRI study consists of 20 to 40 different sequences, each with hundreds of 2D slices. If each pixel were fed individually into a transformer model, this would result in over 32,000 tokens per study—too many for efficient real-time processing.

The solution developed by the Michigan researchers is a vector-quantized variational autoencoder (VQ-VAE) that compresses three-dimensional MRI patches of 32×32×4 voxels into a single embedding vector. The system uses a codebook with 8,192 learned entries that optimizes the balance between reconstruction quality and compression rate. Hundreds of thousands of voxels are thus reduced to just 256 tokens per sequence—a reduction to one-sixteenth of the original data volume, which brings computing times to GPU-compatible levels.
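
To make the idea concrete, here is a minimal sketch of such a vector-quantized bottleneck in PyTorch. Only the patch size (32×32×4 voxels) and the codebook size (8,192 entries) come from the article; the class name, the embedding width of 256, and the straight-through trick are generic VQ-VAE ingredients, not details confirmed for Prima.

```python
import torch
import torch.nn as nn


class VectorQuantizer(nn.Module):
    """Minimal VQ bottleneck: maps each encoder output to its nearest codebook entry."""

    def __init__(self, num_codes: int = 8192, dim: int = 256):
        super().__init__()
        # 8,192 learned codebook entries, as described in the article;
        # the embedding dimension (256) is an assumption for illustration.
        self.codebook = nn.Embedding(num_codes, dim)

    def forward(self, z: torch.Tensor) -> tuple[torch.Tensor, torch.Tensor]:
        # z: (batch, dim) latent vectors, one per 32x32x4-voxel patch,
        # produced by a 3D convolutional encoder (not shown here).
        dists = torch.cdist(z, self.codebook.weight)   # (batch, num_codes)
        indices = dists.argmin(dim=-1)                 # nearest code per patch
        quantized = self.codebook(indices)             # (batch, dim)
        # Straight-through estimator so gradients flow back to the encoder.
        quantized = z + (quantized - z).detach()
        return quantized, indices


# Hypothetical usage: one MRI sequence cut into 256 patches of 32x32x4 voxels,
# already encoded to 256-dimensional latents.
latents = torch.randn(256, 256)
vq = VectorQuantizer()
tokens, token_ids = vq(latents)
print(tokens.shape, token_ids.shape)  # torch.Size([256, 256]) torch.Size([256])
```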

Hierarchical Vision Transformers for Multi-Level Data

Prima is based on a two-stage transformer architecture that was specifically developed for medical data with multiple hierarchical levels. The first level consists of a sequence encoder (ViT_seq) with 15 transformer layers, each with 16 attention heads, which processes the 256 volume tokens of a single MRI sequence. Thirty-two register tokens store the compressed features, while a character-level transformer integrates metadata such as the sequence name ("Ax_T2_FLAIR") in parallel.
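
A rough sketch of a sequence encoder with register tokens could look like the following; the layer count, head count, and number of register tokens follow the article, while the embedding width, class names, and the omission of the metadata transformer are simplifying assumptions.

```python
import torch
import torch.nn as nn


class SequenceEncoder(nn.Module):
    """Sketch of a ViT_seq-style encoder: 256 volume tokens plus 32 register tokens."""

    def __init__(self, dim: int = 512, depth: int = 15, heads: int = 16,
                 num_registers: int = 32):
        super().__init__()
        # Learnable register tokens that collect sequence-level features.
        self.registers = nn.Parameter(torch.zeros(1, num_registers, dim))
        layer = nn.TransformerEncoderLayer(
            d_model=dim, nhead=heads, batch_first=True, norm_first=True)
        self.blocks = nn.TransformerEncoder(layer, num_layers=depth)

    def forward(self, volume_tokens: torch.Tensor) -> torch.Tensor:
        # volume_tokens: (batch, 256, dim) - one token per compressed 3D patch.
        b = volume_tokens.size(0)
        regs = self.registers.expand(b, -1, -1)
        x = torch.cat([regs, volume_tokens], dim=1)   # prepend register tokens
        x = self.blocks(x)
        # Return only the register tokens as the sequence summary.
        return x[:, : self.registers.size(1)]


# Hypothetical usage: two MRI sequences, each compressed to 256 tokens of width 512.
seq_tokens = torch.randn(2, 256, 512)
encoder = SequenceEncoder()
summary = encoder(seq_tokens)
print(summary.shape)  # torch.Size([2, 32, 512])
```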

On the second level, a study encoder (ViT_st) with 4 transformer layers and 8 heads aggregates the information from all 20 to 40 sequences of a complete study. Here, 10 additional register tokens are used for study-level features. The entire architecture requires only 56.6 million trainable parameters – compact enough for use on individual GPUs.
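
The study level can be sketched in the same spirit; only the 4 layers, 8 heads, and 10 register tokens come from the article, everything else (embedding width, pooling of the sequence summaries) is assumed for illustration.

```python
import torch
import torch.nn as nn

# Hypothetical study-level aggregation: each of the 20-40 sequence summaries
# (here reduced to one pooled vector per sequence) is treated as a token,
# with 10 extra register tokens carrying the study-level representation.
dim, num_sequences = 512, 30
study_registers = nn.Parameter(torch.zeros(1, 10, dim))
layer = nn.TransformerEncoderLayer(d_model=dim, nhead=8,
                                   batch_first=True, norm_first=True)
study_encoder = nn.TransformerEncoder(layer, num_layers=4)

sequence_summaries = torch.randn(1, num_sequences, dim)   # pooled ViT_seq outputs
tokens = torch.cat([study_registers, sequence_summaries], dim=1)
study_embedding = study_encoder(tokens)[:, :10].mean(dim=1)  # (1, dim) study vector
print(study_embedding.shape)  # torch.Size([1, 512])
```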

Training follows the CLIP principle: through contrastive learning, the vision encoders learn to represent MRI volumes so that they align with the associated radiology reports in the embedding space. Text processing is handled by a GPT-2-based encoder.
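
The CLIP-style objective reduces to a symmetric cross-entropy over the similarity matrix of study and report embeddings. The sketch below illustrates the principle; batch size, embedding width, and temperature are illustrative assumptions, not values from the paper.

```python
import torch
import torch.nn.functional as F


def clip_loss(study_emb: torch.Tensor, report_emb: torch.Tensor,
              temperature: float = 0.07) -> torch.Tensor:
    """Symmetric contrastive loss pairing each MRI study with its radiology report."""
    # L2-normalize both embedding sets, then compare all pairs.
    img = F.normalize(study_emb, dim=-1)
    txt = F.normalize(report_emb, dim=-1)
    logits = img @ txt.t() / temperature              # (batch, batch) similarity matrix
    targets = torch.arange(len(img), device=img.device)
    # Matching study/report pairs sit on the diagonal.
    return (F.cross_entropy(logits, targets) +
            F.cross_entropy(logits.t(), targets)) / 2


# Hypothetical usage with a batch of 8 study/report embedding pairs of width 512.
loss = clip_loss(torch.randn(8, 512), torch.randn(8, 512))
print(loss.item())
```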

The Training Data

The researchers trained Prima on an unprecedented amount of data from the Michigan Medicine Health System: 5.6 million MRI sequences from 188,000 patients examined between 2012 and 2024. This corresponds to 362 million individual slices or 3.2 billion volume tokens after compression. By comparison, previous deep learning models for medical imaging typically worked with 10,000 to 100,000 studies – Prima uses a database that is 50 to 100 times larger.

Leading the Way in Emergency Detection

In three separate test sets, Prima significantly outperformed all comparative models. In detecting 24 acute findings such as cerebral hemorrhages or strokes, the system achieved an AUROC (area under the ROC curve) of 82.1 percent, while the best alternative model reached only 63.4 percent – an increase of 18.7 percentage points. Inference takes only 3 seconds for a complete brain MRI study with over 30 sequences on a single GPU, while radiologists today need a median of 2.25 days – a threefold increase from 0.75 days in 2012.
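
For readers who want to reproduce such an evaluation on their own data, the AUROC metric can be computed per finding with scikit-learn; the labels, scores, and finding names below are random stand-ins, not the study's data.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
findings = ["intracranial_hemorrhage", "acute_stroke", "mass_effect"]  # illustrative labels

# Stand-in ground truth (0/1) and model scores for 1,000 hypothetical studies.
y_true = rng.integers(0, 2, size=(1000, len(findings)))
y_score = rng.random(size=(1000, len(findings)))

for i, name in enumerate(findings):
    auc = roc_auc_score(y_true[:, i], y_score[:, i])
    print(f"{name}: AUROC = {auc:.3f}")

# Macro average across findings (the article reports 82.1 percent across 24 acute findings).
print("macro AUROC:", roc_auc_score(y_true, y_score, average="macro"))
```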

Its bias robustness is particularly noteworthy: Prima shows consistent true positive rates across all demographic groups, regardless of age, gender, ethnicity, insurance status, or geographic origin. The researchers found no statistically significant differences in performance between urban and rural patients—an important aspect for fair use in healthcare.
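
Such a bias check amounts to comparing true positive rates across subgroups; the following minimal sketch uses made-up data and a single illustrative attribute (urban vs. rural).

```python
import numpy as np
import pandas as pd

# Made-up predictions for a single finding, plus a demographic attribute per study.
rng = np.random.default_rng(1)
df = pd.DataFrame({
    "y_true": rng.integers(0, 2, 5000),
    "y_pred": rng.integers(0, 2, 5000),
    "region": rng.choice(["urban", "rural"], 5000),
})

# True positive rate (sensitivity) per subgroup: TP / (TP + FN).
positives = df[df["y_true"] == 1]
tpr_by_group = positives.groupby("region")["y_pred"].mean()
print(tpr_by_group)
```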

Practical: One GPU Is Sufficient

Prima does not require a high-performance cluster but runs on a single modern GPU; the researchers trained on NVIDIA A100 hardware. The compact architecture with 56.6 million parameters even allows edge deployment in smaller hospitals or telemedicine scenarios where no extensive computing infrastructure is available.

The team led by lead author Todd Hollon from the Department of Neurosurgery is already planning to scale up to other modalities. In the future, the system will also process CT scans, X-rays, and ultrasound images—using the same architecture.

Technical Specifications

Model architecture: Hierarchical Vision Transformer with VQ-VAE tokenization
Parameters: 56.6 million (vision encoder)
Compression: 32×32×4 voxels → 1 token (factor of 16)
Codebook: 8,192 entries
Sequence encoder: 15 layers, 16 heads, 32 register tokens
Study encoder: 4 layers, 8 heads, 10 register tokens
Training: CLIP-based contrastive learning with GPT-2 text encoder
Inference: 3 seconds/study on a single GPU
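
Bundled into a single configuration object, the published hyperparameters might be captured as follows; the class and field names are hypothetical and for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PrimaConfig:
    """Hypothetical configuration bundling the hyperparameters listed above."""
    patch_voxels: tuple = (32, 32, 4)     # VQ-VAE patch size
    codebook_size: int = 8192             # learned codebook entries
    seq_layers: int = 15                  # sequence encoder (ViT_seq)
    seq_heads: int = 16
    seq_registers: int = 32
    study_layers: int = 4                 # study encoder (ViT_st)
    study_heads: int = 8
    study_registers: int = 10
    vision_params_millions: float = 56.6  # trainable parameters (vision encoder)


print(PrimaConfig())
```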

From Research to the Clinic

Prima is still undergoing clinical validation. The team is planning prospective studies to test its integration into clinical workflows. The open research code is intended to enable other institutions to validate and adapt the model using their own data.

The developers' vision is an AI triage system that identifies and prioritizes critical cases within seconds of the scan, while routine cases continue to be assessed by radiologists. This could not only drastically reduce waiting times, but also save lives by enabling emergencies such as cerebral hemorrhages or acute strokes to be detected and treated immediately. Initial discussions with the FDA regarding regulatory approval are already underway. (uh)
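
The triage idea itself is easy to express in code: each incoming study receives a model-derived acuity score, and critical cases jump the reading queue. The sketch below is purely illustrative and not part of the published system.

```python
import heapq

# Hypothetical worklist: (study_id, model score for the most acute finding).
incoming = [("study_A", 0.12), ("study_B", 0.97), ("study_C", 0.55)]

# Max-heap via negated scores: the most acute studies are read first.
queue = [(-score, study_id) for study_id, score in incoming]
heapq.heapify(queue)

while queue:
    neg_score, study_id = heapq.heappop(queue)
    print(f"{study_id}: acuity {-neg_score:.2f}")
```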

[1] Todd C. Hollon et al., "Learning neuroimaging models from health system-scale data", Nature Biomedical Engineering, February 2025 (PMC11838732)

