Latest Self-Supervised Learning Research Papers
The newest Self-Supervised Learning papers from across the field — arXiv, NeurIPS, CVPR, Nature, and more — refreshed daily and ranked by relevance. Distill AI tracks Self-Supervised Learning so you don’t have to: get the standout work delivered to your inbox every morning, with 2-sentence summaries and the option to chat with any paper.
Recent papers
- OncoTraj: a public benchmark for longitudinal resistance prediction in EGFR-mutant non-small-cell lung cancer on osimertinib
Abhijoy Sarkar, Aarchi Singh Thakur · arXiv · Jun 9, 2026
Resistance to first-line osimertinib in EGFR-mutant non-small-cell lung cancer (NSCLC) is the canonical example of predictable clonal evolution under therapeutic pressure, yet no public benchmark exists for training or evaluating computatio…
- Perturbative Contrastive Physical Learning
Kyungeun Kim, Amanuel Anteneh, Israel Klich, Olivier Pfister et al. · arXiv · Jun 8, 2026
Responses to perturbations are key to understanding physical systems. The ability to contrast such responses by comparing how a system reacts under slightly different conditions provides a mechanism for learning. Here, we introduce Perturba…
- Evaluating the Representation Space of Diffusion Models via Self-Supervised Principles
Xiao Li, Yixuan Jia, Zekai Zhang, Xiang Li et al. · arXiv · Jun 8, 2026
Diffusion models have demonstrated remarkable generative capabilities and have also emerged as powerful self-supervised representation learners, yet the connection between these two abilities remains less explored. Drawing inspiration from …
- Beyond Binary: Speech Representations Across the Cognitive Score Hierarchy
Serli Kopar, Roshan Prakash Rane, Christian Mychajliw, Lydia Federmann et al. · arXiv · May 26, 2026
This study examines the relationship between speech representations and the hierarchical structure of cognitive assessment in mild cognitive impairment. Utilizing 5,754 German neuropsychological assessment recordings, we evaluate six cognit…
- FoundObj: Self-supervised Foundation Models as Rewards for Label-free 3D Object Segmentation
Zihui Zhang, Zhixuan Sun, Yafei Yang, Jinxi Li et al. · arXiv · May 26, 2026
We address the challenging task of 3D object segmentation in complex scene point clouds without relying on any scene-level human annotations during training. Existing methods are typically constrained to identifying simple objects, primaril…
- Self-supervised Point Cloud Mining for Surface Anomaly Detection in Additive Manufacturing
Hao Wang, Yujing Yang, Chen Kan · Journal of Computing and In... · May 8, 2026
With rapid advances in 3-dimensional (3D) metrology, point cloud data are increasingly available for surface quality inspection in additive manufacturing (AM). Compared to images, point clouds capture richer geometric information f…
- Concept-Based Abductive and Contrastive Explanations for Behaviors of Vision Models
Ronaldo Canizales, Divya Gopinath, Corina Păsăreanu, Ravi Mangal · arXiv · May 7, 2026
*Concept-based explanations* offer a promising approach for explaining the predictions of deep neural networks in terms of high-level, human-understandable concepts. However, existing methods either do not establish a causal connection betw…
- Rollout Pass-Rate Control: Steering Binary-Reward RL Toward Its Most Informative Regime
Tianshu Zhu, Wenyu Zhang, Xiaoying Zuo, Lun Tian et al. · arXiv · May 6, 2026
SWE-bench-style agentic reinforcement learning relies on expensive stateful trajectories, yet substantial compute is wasted on sampled rollout groups with skewed pass rates, where binary rewards provide a weak contrastive signal. We frame t…
- PHALAR: Phasors for Learned Musical Audio Representations
Davide Marincione, Michele Mancusi, Giorgio Strano, Luca Cerovaz et al. · arXiv · May 5, 2026
Stem retrieval, the task of matching missing stems to a given audio submix, is a key challenge currently limited by models that discard temporal information. We introduce PHALAR, a contrastive framework achieving a relative accuracy increas…
- Dimensionality-Aware Anomaly Detection in Learned Representations of Self-Supervised Speech Models
Sandra Arcos-Holzinger, Sarah M. Erfani, James Bailey, Sanjeev Khudanpur · arXiv · May 4, 2026
Self-supervised speech models (S3Ms) achieve strong downstream performance, yet their learned representations remain poorly understood under natural and adversarial perturbations. Prior studies rely on representation similarity or global di…
- Turning the TIDE: Cross-Architecture Distillation for Diffusion Large Language Models
Gongbo Zhang, Wen Wang, Ye Tian, Li Yuan · arXiv · Apr 29, 2026
Diffusion large language models (dLLMs) offer parallel decoding and bidirectional context, but state-of-the-art dLLMs require billions of parameters for competitive performance. While existing distillation methods for dLLMs reduce inference…
- Beyond Patient Invariance: Learning Cardiac Dynamics via Action-Conditioned JEPAs
Jose Geraldo Fernandes, Luiz Facury, Pedro Robles Dutenhefner, Wagner Meira · arXiv · Apr 24, 2026
Self-supervised learning in healthcare has largely relied on invariance-based objectives, which maximize similarity between different views of the same patient. While effective for static anatomy, this paradigm is fundamentally misaligned w…
- VLA Foundry: A Unified Framework for Training Vision-Language-Action Models
Jean Mercat, Sedrick Keh, Kushal Arora, Isabella Huang et al. · arXiv · Apr 21, 2026
We present VLA Foundry, an open-source framework that unifies LLM, VLM, and VLA training in a single codebase. Most open-source VLA efforts specialize on the action training stage, often stitching together incompatible pretraining pipelines…
- Disentangling Damage from Operational Variability: A Label-Free Self-Supervised Representation Learning Framework for Output-Only Structural Damage Identification
Xudong Jian, Charikleia Stoura, Simon Scandella, Eleni Chatzi · arXiv · Apr 21, 2026
Damage identification is a core task in structural health monitoring. In practice, however, its reliability is often compromised by confounding non-damage effects, such as variations in excitation and environmental conditions, which can ind…
- Joint-Centric Dual Contrastive Alignment with Structure-Preserving and Information-Balanced Regularization
Habibeh Naderi, Behrouz Haji Soleimani, Stan Matwin · arXiv · Apr 17, 2026
We propose HILBERT (HIerarchical Long-sequence Balanced Embedding with Reciprocal contrastive Training), a cross-attentive multimodal framework for learning document-level audio-text representations from long, segmented sequences in low-res…
- Assessing the Potential of Masked Autoencoder Foundation Models in Predicting Downhole Metrics from Surface Drilling Data
Aleksander Berezowski, Hassan Hassanzadeh, Gouri Ginde · arXiv · Apr 16, 2026
Oil and gas drilling operations generate extensive time-series data from surface sensors, yet accurate real-time prediction of critical downhole metrics remains challenging due to the scarcity of labelled downhole measurements. This systema…
- HorusEye: a self-supervised foundation model for generalizable X-ray tomography restoration
Yuetan Chu, Longxi Zhou, Gongning Luo, Kai Kang et al. · Nature Computational Science · Mar 27, 2026
X-ray tomography is widely used across scientific and clinical domains, yet image degradation remains a major obstacle to reliable analysis, particularly under low-dose or data-scarce conditions. Existing restoration methods are typically d…
- On the Alignment Between Supervised and Self-Supervised Contrastive Learning
Achleshwar Luthra, Priyadarsi Mishra, Tomer Galanti · ICLR 2026 Poster · Jan 26, 2026
Self-supervised contrastive learning (CL) has achieved remarkable empirical success, often producing representations that rival supervised pre-training on downstream tasks. Recent theory explains this by showing that the CL loss closely app…
- LeJEPA: Provable and Scalable Self-Supervised Learning Without the Heuristics
Randall Balestriero, Yann LeCun · arXiv.org · Nov 11, 2025
Learning manipulable representations of the world and its dynamics is central to AI. Joint-Embedding Predictive Architectures (JEPAs) offer a promising blueprint, but lack of practical guidance and theory has led to ad-hoc R&D. We present a…
- Concerto: Joint 2D-3D Self-Supervised Learning Emerges Spatial Representations
Yujia Zhang, Xiaoyang Wu, Yixing Lao, Chengyao Wang et al. · arXiv.org · Oct 27, 2025
Humans learn abstract concepts through multisensory synergy, and once formed, such representations can often be recalled from a single modality. Inspired by this principle, we introduce Concerto, a minimalist simulation of human concept lea…
- SSNet: Flexible and Robust Channel Extrapolation for Fluid Antenna Systems Enabled by a Self-Supervised Learning Framework
Yuan Gao, Yiming Liu, Runze Yu, Shengli Liu et al. · IEEE Journal on Selected Areas in Communications · Sep 22, 2025
Fluid antenna systems (FAS) signify a pivotal advancement in 6G communication by enhancing spectral efficiency and robustness. However, obtaining accurate channel state information (CSI) in FAS poses challenges due to its complex physical s…
- Self-supervised learning in drug discovery
Yangyang Chen, Zixu Wang, Jianmin Wang, Yanyi Chu et al. · Science China Information Sciences · Jun 23, 2025
- Self-supervised learning of molecular representations from millions of tandem mass spectra using DreaMS
Roman Bushuiev, Anton Bushuiev, Raman Samusevich, Corinna Brungs et al. · Nature Biotechnology · May 23, 2025
Characterizing biological and environmental samples at a molecular level primarily uses tandem mass spectroscopy (MS/MS), yet the interpretation of tandem mass spectra from untargeted metabolomics experiments remains a challenge. Existing c…
- Joint Embedding vs Reconstruction: Provable Benefits of Latent Space Prediction for Self Supervised Learning
Hugues van Assel, Mark Ibrahim, Tommaso Biancalani, Aviv Regev et al. · arXiv.org · May 18, 2025
Reconstruction and joint embedding have emerged as two leading paradigms in Self Supervised Learning (SSL). Reconstruction methods focus on recovering the original sample from a different view in input space. On the other hand, joint embedd…
- Physics-driven self-supervised learning for fast high-resolution robust 3D reconstruction of light-field microscopy
Zhi Lu, Manchang Jin, Shuai Chen, Xiaoge Wang et al. · Nature Methods · May 12, 2025
Light-field microscopy (LFM) and its variants have significantly advanced intravital high-speed 3D imaging. However, their practical applications remain limited due to trade-offs among processing speed, fidelity, and generalization in exist…
- Locate 3D: Real-World Object Localization via Self-Supervised Learning in 3D
Sergio Arnaud, Paul Mcvay, Ada Martin, Arjun Majumdar et al. · International Conference on Machine Learning · Apr 19, 2025
We present LOCATE 3D, a model for localizing objects in 3D scenes from referring expressions like "the small coffee table between the sofa and the lamp." LOCATE 3D sets a new state-of-the-art on standard referential grounding benchmarks and s…
- MENTOR: Multi-level Self-supervised Learning for Multimodal Recommendation
Jinfeng Xu, Zheyu Chen, Shuo Yang, Jinze Li et al. · AAAI Conference on Artificial Intelligence · Apr 11, 2025
As multimedia information proliferates, multimodal recommendation systems have garnered significant attention. These systems leverage multimodal information to alleviate the data sparsity issue inherent in recommendation systems, thereby en…
- Self-supervised learning for vehicle bearing fault diagnosis based on time–frequency dual-domain contrast and fusion
Deqiang He, Yuan Xu, Haimeng Sun, Zhenzhen Jin et al. · Nonlinear dynamics · Apr 4, 2025
- Joint Supervised and Self-supervised Learning for MRI Reconstruction
George Yiasemis, Nikita Moriakov, Clara I. Sánchez, Jan-Jakob Sonke et al. · MIDL 2025 Poster · Mar 27, 2025
Magnetic Resonance Imaging (MRI) is a crucial modality, but its inherently slow acquisition process poses challenges in obtaining fully-sampled $k$-space data under motion. The lack of fully-sampled acquisitions, serving as ground truths, c…
- Sonata: Self-Supervised Learning of Reliable Point Representations
Xiaoyang Wu, Daniel DeTone, Duncan P. Frost, Tianwei Shen et al. · Computer Vision and Pattern Recognition · Mar 20, 2025
In this paper, we question whether we have a reliable self-supervised point cloud model that can be used for diverse 3D tasks via simple linear probing, even with limited data and minimal computation. We find that existing 3D self-supervise…