Latest Medical Imaging Research Papers
The newest Medical Imaging papers from across the field — arXiv, NeurIPS, CVPR, Nature, and more — refreshed daily and ranked by relevance. Distill AI tracks Medical Imaging so you don’t have to: get the standout work delivered to your inbox every morning, with 2-sentence summaries and the option to chat with any paper.
Recent papers
- WorldOlympiad: Can Your World Model Survive a Triathlon? · Yuke Zhao, Wangbo Zhao, Weijie Wang, Zeyu Zhang et al. · arXiv · Jun 9, 2026
We introduce WorldOlympiad, a benchmark for diagnosing video-based world models across physical faithfulness, geometric consistency, and interaction fidelity. While existing benchmarks often focus on visual quality, semantic alignment, or s…
- Multimodal Brain Tumour Classification Using Feature Fusion · Wajih ul Islam, Muhammad Yaqoob, Javed Ali Khan, Volker Steuber · arXiv · Jun 9, 2026
Clinicians diagnose brain tumors by synthesizing patient symptoms, medical history, and quantitative imaging data from modalities such as MRI and CT scans into a unified clinical judgement. However, most deep learning models rely on MRI/CT …
- U-TTT: Towards Generalizable PET Image Denoising via Test-Time Training · Zhiwen Yang, Jiayin Li, Hao Lu, Hui Zhang et al. · arXiv · Jun 9, 2026
Existing deep learning models for Positron Emission Tomography (PET) image denoising often suffer from severe performance degradation under distribution shifts, fundamentally restricting their robust clinical deployment. This lack of genera…
- Cranio-Diff: Diffusion-based Cross-domain Craniofacial Reconstruction with 2D X-ray Skull Guidance and Structural Identity Constraints · Ravi Shankar Prasad, Naresh Gurjar, Shashank Baghel, Chirag et al. · arXiv · Jun 8, 2026
The state-of-the-art generative models, such as CycleGAN, Pix2Pix, and diffusion models, have demonstrated remarkable performance in the face generation task. However, they fail to effectively capture cross-modality semantic information in c…
- RealDocBench: A Benchmark for Field-Level QA and Layout Understanding on Real-World Regulated Documents · Ameya Joshi, Joon Kim, Gus Eggert, Joseph Bajor et al. · arXiv · Jun 5, 2026
Document parsing systems are increasingly deployed in high-stakes, regulated workflows such as mortgage underwriting, financial reporting, supply-chain logistics, and clinical records. Yet most public benchmarks evaluate parsers on clean ac…
- Mind the Gap: Disentangling Performance Bottlenecks in Video Instance Segmentation · Danial Hamdi, Fardin Ayar, Mahdi Javanmardi · arXiv · Jun 5, 2026
In Video Instance Segmentation (VIS), classification, segmentation, and tracking objectives are jointly evaluated, but their individual contributions to performance loss remain opaque. We introduce a diagnostic framework that formulates ide…
- Impact of Synthetic Lesional MR Images in Automated Focal Cortical Dysplasia Detection in Low-Data Scenarios · Prabhjot Kaur, Hakim Ouaalam, Sedat Kandemirli, Sanjay P. Prabhu et al. · arXiv · Jun 5, 2026
Background and Purpose: Automated detection of focal cortical dysplasia (FCD) requires large volumes of voxelwise lesion-delineated MRI data, which are difficult to acquire. This study aims to generate synthetic MRI data exhibiting FCD, ass…
- Mitosis Detection in the Wild: Multi-Tumor and Context-Aware Generalization in the MIDOG 2025 Challenge · Marc Aubreville, Jonas Ammeling, Sweta Banerjee, Viktoria Weiss et al. · arXiv · Jun 5, 2026
Automated mitosis detection is a well-established task in computational pathology. While previous benchmarks focused on scanner-induced domain shift, clinical "real-world" application requires models to be robust across the vast variance to…
- In-Context Multiple Instance Learning · Alexander Möllers, Marvin Sextro, Julius Hense, Gabriel Dernbach et al. · arXiv · Jun 4, 2026
Multiple Instance Learning (MIL) addresses problems where supervision is available at the level of bags of instances and has been successfully applied in fields ranging from computational pathology to satellite imagery. Nevertheless, existi…
- EasyLens: A Training-Free Plug-and-Play Subtle-Lesion Representation Amplifier for Medical Vision-Language Models · Qiwei Zeng, Hao Wang, Jinghao Lin, Shuchang Ye et al. · arXiv · Jun 4, 2026
Medical vision-language models (VLMs) have shown increasing potential for clinical image interpretation, including lesion detection and report generation. However, their practical utility remains limited by insufficient sensitivity to subtl…
- Moment-Video: Diagnosing Temporal Fidelity of Video MLLMs on Momentary Visual Events · Xiaolin Liu, Yilun Zhu, Xiangyu Zhao, Xuehui Wang et al. · arXiv · Jun 1, 2026
Video multimodal large language models (MLLMs) have made rapid progress on general and long-form video understanding, yet their ability to preserve brief answer-critical visual evidence remains underexplored. Many practical questions are de…
- Deep Learning Strain Estimation: Is Physics-Based Simulation the Solution? · Thierry Judge, Nicolas Duchateau, Andreas Østvik, Khuram Faraz et al. · arXiv · May 27, 2026
Speckle tracking echocardiography (STE) is the clinical standard for myocardial strain estimation. Despite good performance on global strain (GLS), its accuracy for regional strain remains limited, even though this biomarker is highly relev…
- A Multimodal 3D Foundation Model for Light Sheet Fluorescence Microscopy Enables Few-Shot Segmentation, Classification, and Deblurring · Adina Scheinfeld, Haotan Zhang, Shang Mu, Rudolf L. M. van Herten et al. · arXiv · May 25, 2026
Light sheet fluorescence microscopy (LSM) enables high-resolution, three-dimensional (3D) imaging of biological specimens, providing rich volumetric data for studying cellular organization, pathology, and vascular networks. However, the siz…
- Which Way Did It Move? Diagnosing and Overcoming Directional Motion Blindness in Video-LLMs · Jongseo Lee, Hyuntak Lee, Sunghun Kim, Sooa Kim et al. · arXiv · May 21, 2026
Video Large Language Models (Video-LLMs) have made rapid progress on temporal video understanding, yet many fail at a basic perceptual primitive: signed image-plane motion direction. On simple videos of a single object moving left, right, u…
- Synthetic Data Alone is Enough? Rethinking Data Scarcity in Pediatric Rare Disease Recognition · Ganlin Feng, Yuxi Long, Erin Lou, Lianghong Chen et al. · arXiv · May 21, 2026
Children with rare genetic diseases often exhibit distinctive facial phenotypes, yet developing computer vision systems for early diagnosis remains challenging due to extreme data scarcity, privacy constraints, and limited data sharing in p…
- Availability of individual patient data - a comparison of the BMJ with other major medical journals · Alison Avenell, Dorothy Bishop · medRxiv · May 21, 2026
Background: In 2024, the BMJ updated its data-sharing policy for clinical trials, requiring deidentified individual patient data (IPD) to be openly deposited prior to publication. We considered whether data-sharing increased after i…
- EgoExoMem: Cross-View Memory Reasoning over Synchronized Egocentric and Exocentric Videos · Ruiping Liu, Junwei Zheng, Yufan Chen, Di Wen et al. · arXiv · May 18, 2026
Egocentric memory is widely used in embodied intelligence, but it may be insufficient for comprehensive spatial-temporal reasoning. Inspired by human recall from both field and observer perspectives, we introduce EgoExoMem, the first benchm…
- Real‐world comprehensive care of people living with schizophrenia: recommendations across different settings and clinical stages · Paolo Fusar‐Poli, Toby Pillinger, Robert A. McCutcheon, Thara Rangaswamy et al. · World Psychiatry · May 15, 2026
The clinical management of a complex disorder such as schizophrenia remains a significant challenge worldwide. This disorder requires a comprehensive, integrated and personalized care that blends multiple approaches, and the real-world avai…
- Quantitative Video World Model Evaluation for Geometric-Consistency · Jiaxin Wu, Yihao Pi, Yinling Zhang, Yuheng Li et al. · arXiv · May 14, 2026
Generative video models are increasingly studied as implicit world models, yet evaluating whether they produce physically plausible 3D structure and motion remains challenging. Most existing video evaluation pipelines rely heavily on human …
- Evidential Reasoning Advances Interpretable Real-World Disease Screening · Chenyu Lian, Hong-Yu Zhou, Jing Qin · arXiv · May 14, 2026
Disease screening is critical for early detection and timely intervention in clinical practice. However, most current screening models for medical images suffer from limited interpretability and suboptimal performance. They often lack effec…
- Beyond Localization: A Comprehensive Diagnosis of Perspective-Conditioned Spatial Reasoning in MLLMs from Omnidirectional Images · Yuangong Chen, Wai Keung Wong, Jiaxing Li, Ioannis Patras et al. · arXiv · May 12, 2026
Multimodal Large Language Models (MLLMs) show strong visual perception, yet remain limited in reasoning about space under changing viewpoints. We study this challenge as Perspective-Conditioned Spatial Reasoning (PCSR) in 360-degree omnidir…
- Polyendocrine metabolic ovarian syndrome, the new name for polycystic ovary syndrome: a multistep global consensus process · Helena J Teede, Mahnaz Bahri Khomami, Rachel Morman, Joop S E Laven et al. · The Lancet · May 12, 2026
Polyendocrine metabolic ovarian syndrome (PMOS), previously named polycystic ovary syndrome (PCOS), affects one in eight women. However, the term PCOS is inaccurate, implying pathological ovarian cysts, obscuring diverse endocrine and metab…
- Counterfactual Stress Testing for Image Classification Models · Moritz Stammel, Fabio De Sousa Ribeiro, Raghav Mehta, Mélanie Roschewitz et al. · arXiv · May 11, 2026
Deep learning models in medical imaging often fail when deployed in new clinical environments due to distribution shifts in demographics, scanner hardware, or acquisition protocols. A central challenge is underspecification, where models wi…
- Geometry-aware Prototype Learning for Cross-domain Few-shot Medical Image Segmentation · Feifan Song, Yuntian Bo, Haofeng Zhang · arXiv · May 11, 2026
Cross-domain few-shot medical image segmentation (CD-FSMIS) requires a model to generalise simultaneously to novel anatomical categories and unseen imaging domains from only a handful of annotated examples. Existing prototypical approaches …
- Verification Mirage: Mapping the Reliability Boundary of Self-Verification in Medical VQA · Ruinan Jin, Beidi Zhao, Myeongkyun Kang, Qiong Zhang et al. · arXiv · May 11, 2026
Self-verification, re-invoking the same vision language model (VLM) in a fresh context to check its own generated answer, is increasingly used as a default safety layer for medical visual question answering (VQA). We argue that this practic…
- PET-Adapter: Test-Time Domain Adaptation for Full and Limited-Angle PET Image Reconstruction · Rüveyda Yilmaz, Yuli Wu, Johannes Stegmaier, Volkmar Schulz · arXiv · May 8, 2026
Positron Emission Tomography (PET) image reconstruction is inherently challenged by Poisson noise and physical degradation factors, which are further exacerbated in limited-angle acquisitions. While deep learning methods demonstrate promisi…
- Uncertainty Quantification for Cardiac Shape Reconstruction with Deep Signed Distance Functions via MCMC methods · Jan Verhülsdonk, Thomas Grandits, Francisco Sahli Costabal, Thomas Beiert et al. · arXiv · May 8, 2026
Atlas-based approaches allow high-quality, patient-specific shape reconstructions of cardiac anatomy from sparse and/or noisy data such as point clouds. However, these methods are mainly prior-driven, so the impact of uncertainty can be lar…
- TimeLesSeg: Unified Contrast-Agnostic Cross-Sectional and Longitudinal MS Lesion Segmentation via a Stochastic Generative Model · Vicent Caselles-Ballester, Eloy Martínez-Heras, Giuseppe Pontillo, Zoe Mendelsohn et al. · arXiv · May 8, 2026
Multiple sclerosis (MS) expresses substantial clinical and radiological heterogeneity, which poses significant challenges for automatic lesion segmentation. The current deep learning-based SOTA is highly susceptible to changes in both distr…
- MedHorizon: Towards Long-context Medical Video Understanding in the Wild · Bodong Du, Bowen Liu, Yang Yu, Xinpeng Ding et al. · arXiv · May 7, 2026
Medical multimodal large language models (MLLMs) have advanced image understanding and short-video analysis, but real clinical review often requires full-procedure video understanding. Unlike general long videos, medical procedures contain …
- 3D MRI Image Pretraining via Controllable 2D Slice Navigation Task · Yu Wang, Qingchao Chen · arXiv · May 7, 2026
Self-supervised pretraining has become the mainstream approach for learning MRI representations from unlabeled scans. However, most existing objectives still treat each scan primarily as static aggregations of slices, patches or volumes. We…