Latest Image Classification Research Papers
The newest Image Classification papers from across the field — arXiv, NeurIPS, CVPR, Nature, and more — refreshed daily and ranked by relevance. Distill AI tracks Image Classification so you don’t have to: get the standout work delivered to your inbox every morning, with 2-sentence summaries and the option to chat with any paper.
Recent papers
- Adversarial Attack and Disturbance Detection by Hadamard-Coded Output Representations for Object Detection and Semantic Segmentation · Lucas Görnhardt, Timo Bartels, Niklas Schwarz, Tim Fingscheidt · arXiv · Jun 8, 2026
Conventional one-hot encodings often yield poorly calibrated models, being overconfident under attack, and letting entropy-based detection algorithms fail. Previous image classification works have demonstrated that Hadamard-coded output rep…
- ToolFG: Towards Well-Grounded Fine-Grained Image Classification · Yu Xue, Haoxuan Qu, Zhuoling Li, Yihang Lou et al. · arXiv · Jun 1, 2026
Fine-grained image classification (FGIC) has broad applications and has attracted significant research attention. In this paper, we explore a novel paradigm for solving FGIC by proposing ToolFG, the first tool-integrated MLLM-based…
- BiasEdit: A Training-Free Bias-Detect-and-Edit Framework for Learning Fair Visual Classifiers · Jungwook Seo, Yoonsik Park, Changmin Lee, Sungyong Baik · arXiv · May 27, 2026
Visual data from the Web power image classifiers, which often underpin many web services, such as recommendation and content moderation. However, the raw Web data often contain spurious correlations and social biases, and neural networks ar…
- EgoExoMem: Cross-View Memory Reasoning over Synchronized Egocentric and Exocentric Videos · Ruiping Liu, Junwei Zheng, Yufan Chen, Di Wen et al. · arXiv · May 18, 2026
Egocentric memory is widely used in embodied intelligence, but it may be insufficient for comprehensive spatial-temporal reasoning. Inspired by human recall from both field and observer perspectives, we introduce EgoExoMem, the first benchm…
- A Large-Scale Study on the Accuracy vs Cost Trade-offs of Training and Evaluation Settings in Fine-Grained Image Recognition · Edwin Arkel Rios, Augusto Christian Surya, Oswin Gosal, Fernando Mikael et al. · arXiv · May 18, 2026
Prior work on fine-grained image recognition (FGIR) has established the importance of the backbone selection, but has neglected the accuracy-vs-cost trade-offs under different training and evaluation settings. In this work we conduct a larg…
- CrossView Suite: Harnessing Cross-view Spatial Intelligence of MLLMs with Dataset, Model and Benchmark · Wei Wang, Yuqian Yuan, Tianwei Lin, Wenqiao Zhang et al. · arXiv · May 18, 2026
Spatial intelligence requires multimodal large language models (MLLMs) to move beyond single-view perception and reason consistently about objects, visibility, geometry, and interactions across multiple viewpoints. However, progress in cros…
- Counterfactual Stress Testing for Image Classification Models · Moritz Stammel, Fabio De Sousa Ribeiro, Raghav Mehta, Mélanie Roschewitz et al. · arXiv · May 11, 2026
Deep learning models in medical imaging often fail when deployed in new clinical environments due to distribution shifts in demographics, scanner hardware, or acquisition protocols. A central challenge is underspecification, where models wi…
- Empirical Evidence for Simply Connected Decision Regions in Image Classifiers · Arjhun Swaminathan, Mete Akgün · arXiv · May 7, 2026
Understanding the topology of decision regions is central to explaining the inner workings of deep neural networks. Prior empirical work has provided evidence that these regions are path connected. We study a stronger topological question: …
- A unified Benchmark for Multi-Frame Image Restoration under Severe Refractive Warping · Maxim V. Shugaev, Md Reshad Ul Hoque, Bridget Kennedy, Joseph T. Riley et al. · arXiv · May 6, 2026
Video sequences captured through refractive dynamic media, such as turbulent air or a water surface, often suffer from severe geometric distortions and temporal instability. While recent advances address mild atmospheric turbulence, no exis…
- Attention-Based Chaotic Self-Supervision for Medical Image Classification · Joao Batista Florindo, Amanda Pontes de Oliveira Ornelas · arXiv · May 6, 2026
Deep learning models for medical image classification usually achieve promising results but typically rely on large, annotated datasets or standard transfer learning from ImageNet. Self-Supervised Learning (SSL) has emerged as a powerful al…
- A Robust Unsupervised Domain Adaptation Framework for Medical Image Classification Using RKHS-MMD · Sapna Sachan, Rakesh Kumar Sanodiya, Amulya Kumar Mahto · arXiv · May 5, 2026
Labeling medical images is a major bottleneck in the field of medical imaging, as it requires domain-specific expertise, and it gets further complicated due to variability across different medical centers and different imaging devices. Such…
- Seeing Realism from Simulation: Efficient Video Transfer for Vision-Language-Action Data Augmentation · Chenyu Hui, Xiaodi Huang, Siyu Xu, Yunke Wang et al. · arXiv · May 4, 2026
Vision-language-action (VLA) models typically rely on large-scale real-world videos, whereas simulated data, despite being inexpensive and highly parallelizable to collect, often suffers from a substantial visual domain gap and limited envi…
- Robust Deepfake Detection: Mitigating Spatial Attention Drift via Calibrated Complementary Ensembles · Minh-Khoa Le-Phan, Minh-Hoang Le, Trong-Le Do, Minh-Triet Tran · arXiv · Apr 28, 2026
Current deepfake detection models achieve state-of-the-art performance on pristine academic datasets but suffer severe spatial attention drift under real-world compound degradations, such as blurring and severe lossy compression. To address…
- Quantum-Inspired Robust and Scalable SAR Object Classification · Maximilian Scharf, Marco Trenti, Felix Bock, Padraig Davidson et al. · arXiv · Apr 28, 2026
SAR image classification naturally has to deal with huge noise and a high dynamic range particularly requiring robust classification models. Additionally, the deployment of these models on edge devices, such as drones and military aircraft,…
- SARU: A Shadow-Aware and Removal Unified Framework for Remote Sensing Images with New Benchmarks · Zi-Yang Bo, Wei Lu, Hongruixuan Chen, Si-Bao Chen et al. · arXiv · Apr 28, 2026
Shadows are a prevalent problem in remote sensing imagery (RSI), degrading visual quality and severely limiting the performance of downstream tasks like object detection and semantic segmentation. Most prior works treat shadow detection and…
- Adapting TrOCR for Printed Tigrinya Text Recognition: Word-Aware Loss Weighting for Cross-Script Transfer Learning · Yonatan Haile Medhanie, Yuanhua Ni · arXiv · Apr 22, 2026
Transformer-based OCR models have shown strong performance on Latin and CJK scripts, but their application to African syllabic writing systems remains limited. We present the first adaptation of TrOCR for printed Tigrinya using the Ge'ez sc…
- RSRCC: A Remote Sensing Regional Change Comprehension Benchmark Constructed via Retrieval-Augmented Best-of-N Ranking · Roie Kazoom, Yotam Gigi, George Leifman, Tomer Shekel et al. · arXiv · Apr 22, 2026
Traditional change detection identifies where changes occur, but does not explain what changed in natural language. Existing remote sensing change captioning datasets typically describe overall image-level differences, leaving fine-grained …
- VLA Foundry: A Unified Framework for Training Vision-Language-Action Models · Jean Mercat, Sedrick Keh, Kushal Arora, Isabella Huang et al. · arXiv · Apr 21, 2026
We present VLA Foundry, an open-source framework that unifies LLM, VLM, and VLA training in a single codebase. Most open-source VLA efforts specialize on the action training stage, often stitching together incompatible pretraining pipelines…
- Informative Data Reweighting for Image Classification · Yancheng Wang, Ping Li, Alvin C Silva, Teresa Wu et al. · ICLR 2026 DeLTa Workshop Poster · Mar 3, 2026
Deep Neural Networks (DNNs) have achieved remarkable success in image classification tasks. However, their training typically requires large-scale, high-quality labeled datasets, which may be scarce or infeasible to obtain in certain comput…
- DAL: Dynamic Angular Loss for Imbalanced Medical Image Classification · Salman Mohammad, Furkan Kasım, Manuel Günther · Submitted to MIDL 2026 · Nov 28, 2025
Class imbalance remains a major obstacle in medical image classification, where rare but clinically important classes are often overshadowed by majority categories. Despite current strategies for reducing bias, including angular-margin los…
- A review of hyperspectral image classification based on graph neural networks · Xiaofeng Zhao, Junyi Ma, Lei Wang, Zhili Zhang et al. · Artificial Intelligence Review · Mar 17, 2025
Hyperspectral images provide rich spectral-spatial information but pose significant classification challenges due to high dimensionality, noise, mixed pixels, and limited labeled samples. Graph Neural Networks (GNNs) have emerged as a promi…
- Improving pneumonia diagnosis with high-accuracy CNN-Based chest X-ray image classification and integrated gradient · Jalal Rabbah, Mohammed Ridouani, L. Hassouni · Biomedical Signal Processing and Control · Mar 1, 2025
- AI-Powered Lung Cancer Detection: Assessing VGG16 and CNN Architectures for CT Scan Image Classification · Rapeepat Klangbunrueang, Pongsathon Pookduang, Wirapong Chansanam, Tassanee Lunrasri · Informatics · Feb 11, 2025
Lung cancer is a leading cause of mortality worldwide, and early detection is crucial in improving treatment outcomes and reducing death rates. However, diagnosing medical images, such as Computed Tomography scans (CT scans), is complex and…
- Emerging Developments in Real-Time Edge AIoT for Agricultural Image Classification · M. Pintus, Felice Colucci, Fabio Maggio · IoT · Feb 10, 2025
Advances in deep learning (DL) models and next-generation edge devices enable real-time image classification, driving a transition from the traditional, purely cloud-centric IoT approach to edge-based AIoT, with cloud resources reserved for…
- CNN-Transformer and Channel-Spatial Attention based network for hyperspectral image classification with few samples · Chuan Fu, Tianyuan Zhou, Tan Guo, Qikui Zhu et al. · Neural Networks · Feb 1, 2025
Hyperspectral image classification is an important foundational technology in the field of Earth observation and remote sensing. In recent years, deep learning has achieved a series of remarkable achievements in this area. These deep learni…
- Hyperspectral Image Classification via Cascaded Spatial Cross-Attention Network · Bo Zhang, Yaxiong Chen, Shengwu Xiong, Xiaoqiang Lu · IEEE Transactions on Image Processing · Jan 29, 2025
In hyperspectral images (HSIs), different land cover (LC) classes have distinct reflective characteristics at various wavelengths. Therefore, relying on only a few bands to distinguish all LC classes often leads to information loss, resulti…
- SPECIAL: Zero-shot Hyperspectral Image Classification With CLIP · Li Pang, Jing Yao, Kaiyu Li, Xiangyong Cao · arXiv.org · Jan 27, 2025
Hyperspectral image (HSI) classification aims to categorize each pixel in an HSI into a specific land cover class, which is crucial for applications such as remote sensing, environmental monitoring, and agriculture. Although deep learning-b…
- Vision Transformers for Image Classification: A Comparative Survey · Yaoli Wang, Yaojun Deng, Yuanjin Zheng, Pratik Chattopadhyay et al. · Technologies · Jan 12, 2025
Transformers were initially introduced for natural language processing, leveraging the self-attention mechanism. They require minimal inductive biases in their design and can function effectively as set-based architectures. Additionally, tr…
- MambaHSI: Spatial–Spectral Mamba for Hyperspectral Image Classification · Yapeng Li, Yong Luo, Lefei Zhang, Zengmao Wang et al. · IEEE Transactions on Geoscience and Remote Sensing · Jan 9, 2025
Transformer has been extensively explored for hyperspectral image (HSI) classification. However, transformer poses challenges in terms of speed and memory usage because of its quadratic computational complexity. Recently, the Mamba model ha…
- Ensemble genetic and CNN model-based image classification by enhancing hyperparameter tuning · Wajahat Hussain, Muhammad Faheem Mushtaq, Mobeen Shahroz, Urooj Akram et al. · Scientific Reports · Jan 6, 2025
Model optimization is a problem of great concern and challenge for developing an image classification model. In image classification, selecting the appropriate hyperparameters can substantially boost the model’s ability to learn intricate p…