Latest Image Segmentation Research Papers
The newest Image Segmentation papers from across the field — arXiv, NeurIPS, CVPR, Nature, and more — refreshed daily and ranked by relevance. Distill AI tracks Image Segmentation so you don’t have to: get the standout work delivered to your inbox every morning, with 2-sentence summaries and the option to chat with any paper.
Recent papers
- FADA: Accessible fetal ultrasound interpretation and annotation with a selectively distilled unified vision-language model
Mahmood Alzubaidi, Uzair Shah, Raden Muaz, Ines Abbes et al. · arXiv · Jun 9, 2026
A global shortage of trained sonographers limits prenatal ultrasound screening in low- and middle-income countries, where over half of pregnant women receive no skilled sonography. Current deep learning approaches address detection, segment…
- An Uncertainty Estimation Framework for Dose Accumulation in Adaptive Radiotherapy: Application to CBCT-Guided Radiotherapy for Cervical Cancer
Cedric Hemon, Delphine Lebret, Jean-Claude Nunes, Valentin Boussot et al. · arXiv · Jun 9, 2026
Background and purpose: oART enables daily plan adaptation to interfraction anatomical variations, but cumulative dose estimation remains limited by DIR, segmentation, and anatomical uncertainties. We introduce IMPACT-DoseAcc, an uncertaint…
- IPSM-Bench: A New Intermediate Phase Segmentation Benchmark in Microstructure Images of Zinc-Based Absorbable Biomaterials
Jinglin Xu, Shangyan Zhao, Jiabo Wang, Xinghong Mu et al. · arXiv · Jun 9, 2026
Zinc-based alloys are indispensable emerging absorbable metallic biomaterials, and their macroscopic performance is governed by microstructural characteristics. Intermediate phases, which are key microstructural constituents, are pivotal in regulating …
- Echo-Memory: A Controlled Study of Memory in Action World Models
Wayne King, Zeyue Xue, Yuxuan Bian, Jie Huang et al. · arXiv · Jun 8, 2026
We present Echo-Memory, a controlled study of memory mechanisms in action-conditioned world models. These models generate multi-segment videos from a first frame, text prompt, and camera-action sequence, but their central failure i…
- Visual Prompting Meets Feature Reconstruction-Based Anomaly Detection with Dual-Teacher Supervision
Mateo Diaz-Bone, Daniel Caraballo, Florian Scheidegger, Thomas Frick et al. · arXiv · Jun 8, 2026
Recent Anomaly Detection methods achieve perfect detection and segmentation scores on well-established datasets, such as MVTec. However, many of these methods face challenges when foundational assumptions - such as consistent object scale, …
- Adversarial Attack and Disturbance Detection by Hadamard-Coded Output Representations for Object Detection and Semantic Segmentation
Lucas Görnhardt, Timo Bartels, Niklas Schwarz, Tim Fingscheidt · arXiv · Jun 8, 2026
Conventional one-hot encodings often yield poorly calibrated models, being overconfident under attack, and letting entropy-based detection algorithms fail. Previous image classification works have demonstrated that Hadamard-coded output rep…
- Training-Free Generalized Few-Shot Segmentation through Open-Vocabulary Semantic Arbitration
Silas Kwabla Gah, Ebenezer Owusu · arXiv · Jun 8, 2026
Generalized Few-Shot Semantic Segmentation (GFSS) has traditionally been approached as a representation-learning problem, requiring task-specific adaptation to incorporate novel classes from limited support examples. Recent foundation model…
- vesselFM-CT: Segmenting All Blood Vessels in CT Images for System-Level Cardiovascular Analysis
Bastian Wittmann, Chinmay Prabhakar, Suprosanna Shit, Bjoern Menze · arXiv · Jun 8, 2026
The vascular network in the human body is characterized by blood vessels exhibiting drastic structural variations in radius, length, topological properties, and branching patterns. This heterogeneity, together with location-specific anatomi…
- Reason Twice: Segmentation via Candidate Discovery and Comparative Reasoning
Xinyan Gao, Haoran Hao, Xiangyu Yue · arXiv · Jun 8, 2026
The rapid development of pretrained foundation models has enabled more general image segmentation. Multimodal large language models (MLLMs) have been widely explored for image segmentation with complex queries that require high-level reason…
- Mind the Gap: Disentangling Performance Bottlenecks in Video Instance Segmentation
Danial Hamdi, Fardin Ayar, Mahdi Javanmardi · arXiv · Jun 5, 2026
In Video Instance Segmentation (VIS), classification, segmentation, and tracking objectives are jointly evaluated, but their individual contributions to performance loss remain opaque. We introduce a diagnostic framework that formulates ide…
- Geometric-Aware Hypergraph Reasoning for Novel Class Discovery in Point Cloud Segmentation
Zihao Zhang, Aming Wu, Yang Li, Yahong Han et al. · arXiv · Jun 5, 2026
Novel class discovery in point cloud segmentation aims to transfer knowledge from known classes to automatically identify and segment unlabeled novel classes in point clouds. Existing methods mainly rely on pairwise associations for class a…
- Detecting Temporally Localized Manipulations in Authentic Video Streams
Okan Umur, Ali Emre Güşlü, Ibrahim Delibasoglu · arXiv · Jun 5, 2026
The rapid advancement of video editing and generative artificial intelligence technologies has made realistic video manipulation increasingly accessible. Although existing datasets have significantly advanced research in deepfake detection,…
- PAR3D: A Unified 3D-MLLM with Part-Aware Representation for Scene Understanding
Shaohui Dai, Yansong Qu, You Shen, Shengchuan Zhang et al. · arXiv · Jun 4, 2026
Recent advances in 3D multimodal large language models (3D-MLLMs) have enabled unified solutions for 3D scene understanding tasks, including visual question answering, captioning, and referring segmentation. However, existing 3D-MLLMs remai…
- Comparison of Deep Learning Frameworks For Rice Disease Mapping From UAV Multispectral Imaging
Yadav Raj Ghimire, Jagrati Talreja, Tewodros Syum Gebre, Timothy Agboada et al. · arXiv · Jun 4, 2026
In this study, UAV multispectral imagery is used to segment the severity of bacterial leaf blight (BLB) in rice using convolutional neural networks (CNNs) and transformer-based models. The evaluated architectures include U-Net with a ResNet…
- Towards One-to-Many Temporal Grounding
Qi Xu, Yue Tan, Shihao Chen, Jiahao Meng et al. · arXiv · Jun 4, 2026
Temporal Grounding (TG) aims to localize video segments corresponding to a textual query. Prior research predominantly focuses on single-segment retrieval. Real-world scenarios, however, often require localizing multiple disjoint segments f…
- SC-MFJ: A Simple Haptic Quality Metric for Medical Image Segmentation
Souraj Adhikary, Negar Chabi, Andre Mastmeyer · arXiv · Jun 4, 2026
Standard segmentation metrics such as Dice and Hausdorff distance measure geometric overlap but say nothing about whether a segmented surface is suitable for haptic rendering in surgical simulation. We propose SC-MFJ (Surface-Constrained Me…
- MS-DKC: A Dataset Knowledge Card Framework for Designing and Adapting Medical Image Segmentation Models
Tariq M. Khan, Syed Saud Naqvi, Thantrira Porntaveetus, Hamid Alinejad-Rokny et al. · arXiv · Jun 4, 2026
Medical image segmentation is often framed as a search for stronger architectures, but this can obscure a more fundamental question: what does the dataset require from the model? In medical imaging, this requirement is shaped by foreground …
- Edge Prediction for Roof Wireframe Reconstruction with Transformers
Gustav Hanning, Ludvig Dillén, Jonathan Astermark, Johanna Lidholm et al. · arXiv · Jun 1, 2026
This paper presents a competitive solution to the S23DR Challenge 2026, which aims to reconstruct 3D house roof wireframe models from sparse SfM point clouds and ground-level semantic segmentations and depth maps. Our proposed method utiliz…
- Explainable Forensics of Manipulated Segments in Untrimmed Long Videos
Yue Feng, Jingjing Li, Qijia Lu, Wei Ji et al. · arXiv · Jun 1, 2026
The rapid advancement of AI-driven video generation has transformed content creation, while simultaneously increasing the risk of misinformation through localized manipulations in long-form videos. Existing video forensic methods predominan…
- GMOS: Grounding Moving Object Segmentation in 3D Space and Time
Junyu Xie, Tengda Han, Weidi Xie, Andrew Zisserman · arXiv · May 28, 2026
Moving Object Segmentation (MOS) aims to discover, segment, and track objects that move independently of the camera. Current MOS methods, however, exhibit two fundamental limitations: they rely on pre-computed 2D auxiliary modalities such a…
- Cycle Consistency in Video Object-Centric Learning
Rongzhen Zhao, Zhiyuan Li, Ruonan Wei, Juho Kannala et al. · arXiv · May 28, 2026
Self-supervised video Object-Centric Learning (OCL) aims to discover distinct objects and associate them across time, whereas self-supervised Multi-Object Tracking (MOT) focuses on associating pre-defined object detections or segmentations.…
- xModel-KD: Cross-modal Knowledge Distillation for 3D Scene Perception using LiDAR
Thenukan Pathmanathan, Kanchan Keisham, Thangarajah Akilan · arXiv · May 28, 2026
Point cloud segmentation is a fundamental task in 3D scene understanding. Its progress is constrained by the high cost and time required for dense 3D annotations, making labeled samples difficult to obtain. Beyond annotation scarcity, diffe…
- A Multiscale Kinetic Framework for Image Segmentation: From Particle Systems to Continuum Models
Horacio Tettamanti, Giulia Guicciardi, Mattia Zanella · arXiv · May 27, 2026
In this work, we present a multiscale kinetic framework for consensus-based image segmentation. By interpreting an image as a system of interacting particles, each pixel is characterised by its spatial position and an internal feature encod…
- Resolution-free neural surrogates for geometric parameterization and mapping with spatially varying fields
Yanwen Huang, Lok Ming Lui, Gary P. T. Choi · arXiv · May 27, 2026
Many imaging problems require computing spatial transformations induced by spatially varying intensity, feature, or density fields. Canonical examples include distortion correction, deformable image registration, atlas-based segmentation, a…
- Toward Semantic-Agnostic and Shape-Aware Vision-Language Segmentation Models
Corentin Seutin, Mohamed Amine Ettaki, Michaël Clément, Pierrick Coupé et al. · arXiv · May 27, 2026
Vision-language segmentation models have recently achieved strong performance by leveraging high-level semantic object categories expressed in natural language. However, this semantic dependence limits their ability to reason about intrinsi…
- InstructSAM: Segment Any Instance with Any Instructions
Yuqian Yuan, Wentong Li, Zhaocheng Li, Yutong Lin et al. · arXiv · May 25, 2026
In this paper, we introduce InstructSAM, a unified and streamlined framework designed for multi-instance segmentation under arbitrary instructions. We formulate instruction-driven instance segmentation as a set-structured query prediction …
- Pixel-Level Pavement Distress Assessment Using Instance Segmentation
Logan Dewick, Bibesh Pyakurel, Kong Pheng Yang, Nazim Choudhury et al. · arXiv · May 25, 2026
Automated pavement distress assessment requires more than image-level classification or coarse bounding box detection, demanding precise localization of thin, branching, and irregular cracks to achieve the geometric precision necessary for …
- A Multimodal 3D Foundation Model for Light Sheet Fluorescence Microscopy Enables Few-Shot Segmentation, Classification, and Deblurring
Adina Scheinfeld, Haotan Zhang, Shang Mu, Rudolf L. M. van Herten et al. · arXiv · May 25, 2026
Light sheet fluorescence microscopy (LSM) enables high-resolution, three-dimensional (3D) imaging of biological specimens, providing rich volumetric data for studying cellular organization, pathology, and vascular networks. However, the siz…
- EchoPilot: Training-Free Ultrasound Video Segmentation via Scale-Space Semantic Prompting and Reliability-Gated Memory
Ruiqiang Xiao, Zhaohu Xing, Yijun Yang, Zhenyan Han et al. · arXiv · May 25, 2026
Ultrasound video segmentation is clinically valuable yet difficult due to speckle noise, weak boundaries, and rapid anatomical deformation. Recent promptable foundation models enable point-guided segmentation, but their direct deployment in…
- AgentGrounder: Zero-Shot 3D Visual Pointcloud Grounding using Multimodal Language Models
Cuong Huynh, Maxim Popov, Denis Gridusov, Sergey Kolyubin · arXiv · May 25, 2026
3D Visual Grounding (3DVG) is an essential capability for embodied AI, requiring agents to localize objects in 3D scenes based on natural language descriptions. Recent zero-shot methods leverage 2D vision-language models (LVLMs). However, t…