Latest Statistical ML Research Papers
The newest Statistical ML papers from across the field — arXiv, NeurIPS, CVPR, Nature, and more — refreshed daily and ranked by relevance. Distill AI tracks Statistical ML so you don’t have to: get the standout work delivered to your inbox every morning, with 2-sentence summaries and the option to chat with any paper.
Recent papers
- Itô maps for any-step SDEs · Zhengkai Pan, Peter Potaptchik, Wenxi Yao, Michael S. Albergo et al. · arXiv · Jun 9, 2026
Recent one-step generative models accelerate sampling by learning deterministic flow maps of the underlying dynamics. These methods rely on learning from ordinary differential equations, leaving open how to define an exact distillation proc…
- Data assimilation for subsurface flow using latent diffusion model parameterization: performance of ensemble-Kalman and Monte Carlo techniques · Guido Di Federico, Wenchao Teng, Louis J. Durlofsky · arXiv · Jun 9, 2026
Data assimilation (DA) in subsurface flow entails calibrating model parameters to match observed data, typically at wells, while preserving geological realism. Latent diffusion models (LDMs) provide efficient mappings from high-dimensional …
- Conformal Prediction for Dyadic Regression Under Complex Missingness · Robert Lunde, Minjie Yang, Elizaveta Levina, Ji Zhu · arXiv · Jun 9, 2026
We develop a framework for conformal prediction in dyadic regression problems under complex missingness mechanisms. At the theoretical level, we establish super-uniformity of conformal prediction under distributional invariance conditions w…
- Data-Driven Dynamic Assortment in Online Platforms: Learning about Two Sides · Rahul Roy, Nur Sunar, Jayashankar M. Swaminathan · arXiv · Jun 9, 2026
We study a dynamic assortment problem on a two-sided service platform with incomplete information and heterogeneous customers in a discrete-time setting. In each period, a customer arrives seeking service, and the platform chooses an assort…
- Limitations of Learning Tanh Neural Networks with Finite Precision · Philipp Grohs, Matěj Trödler · arXiv · Jun 9, 2026
We investigate limitations of learning $\tanh$ neural networks from point evaluations under finite-precision computations and $L^p$ accuracy guarantees, building on Berner, Grohs, and Voigtländer (2023). Our approach is based on a novel con…
- Flexible Kernels for Protein Property Prediction · Martin Jankowiak, Yerdos Ordabayev, Rudraksh Tuwani, Henry N. Ward et al. · arXiv · Jun 9, 2026
Despite its importance to applications in protein design, predicting protein properties like binding affinity and thermostability from sparse experimental data remains a significant challenge. Accordingly, we introduce a class of sequence k…
- Generalized Conformal Predictive Systems Under Distributional Shifts · Jef Jonkers, Johanna Ziegel · arXiv · Jun 9, 2026
Conformal predictive systems (CPS) output calibrated bands of CDFs under exchangeability. We extend generalized CPS to non-exchangeable settings by encoding distributional shifts through observation-specific permutation weights. This yields…
- Express Language Modeling · Albert Gong, Annabelle Michael Carrell, Raaz Dwivedi, Lester Mackey · arXiv · Jun 9, 2026
We introduce a new tool, Express, for converting a non-causal attention approximation into a causal approximation with matching approximation guarantees. When combined with the state-of-the-art Thinformer approximation, Express improves upo…
- Range Penalization: Theoretical Insights with Applications in Federated Learning · Yiyuan She, Zhaojun Hu, Yifan Sun · arXiv · Jun 9, 2026
This paper introduces range regularization for federated learning with linear systematic components to enhance statistical accuracy and induce cross-client regularity conducive to quantization, coding, and resource efficiency. Our approach …
- Conservation Laws from Data Symmetry in Neural Networks · Jakob Galley, Vahid Shahverdi, Axel Flinth · arXiv · Jun 9, 2026
We explore whether intrinsic symmetries of the training data lead to conserved quantities during gradient-flow training of neural networks. Under the assumption that the loss function is analytic and non-polynomial, we prove that data symme…
- Human-AI Teaming Through the Lens of Calibration · Eric Nalisnick, Chi Zhang, Sophia Qian, Yixin Wang · arXiv · Jun 9, 2026
We study models for human-AI teaming through the lens of statistical calibration. We assume the team consists of an AI model and a human, both of which are calibrated with respect to some partitioning of the feature space, and expose how …
- SPACR: Single-Pass Adaptive Training of Uncertainty-Aware Conformal Regressors · Soundouss Messoudi, Sylvain Rousseau, Sébastien Destercke · arXiv · Jun 9, 2026
Conformal Prediction (CP) provides robust uncertainty guarantees for predictive models, but is typically applied post hoc, which misaligns model training with the conformal goal of producing efficient (i.e., narrow) intervals. We propose SPA…
- Deterministic Denominator Design for Localized Tamed Stochastic-Gradient Langevin Dynamics · Yiwei Zhou, Ziheng Chen · arXiv · Jun 9, 2026
Tamed stochastic-gradient Langevin dynamics (SGLD) stabilizes large drifts by adding a denominator to the update. If this denominator uses the same stochastic-gradient sample as the update step, it can also change the conditional mean drift…
- Advancing the State-of-the-Art in Empirical Privacy Auditing · Nicole Mitchell, Galen Andrew, Arun Ganesh, Brendan McMahan et al. · arXiv · Jun 9, 2026
Parameter-efficient fine-tuning of large language models (LLMs) can exhibit problematic memorization of individual training examples. Empirical privacy auditing (EPA) quantifies this risk by measuring realistic data leakage on membership in…
- A Mean-Field Analysis of Multi-Head Self-Attention under Cross-Entropy Training · Cheng Huan, Hongfwei Yuan · arXiv · Jun 9, 2026
This paper develops a mean-field theory for a simplified single-layer causal multi-head self-attention model trained by cross-entropy minimization. Each attention head is treated as a particle in parameter space, and the empirical law of th…
- Near-Exponential Convergence Rates for kNN Classification based on Boltzmann Margin · Luyuan Yang, Shayan Shafaei, Chao Lan · arXiv · Jun 9, 2026
Convergence-rate analysis for classifiers is often conducted under either Tsybakov margin or Massart margin. The former is a relatively weak condition that typically yields polynomial rates, while the latter is substantially stronger but ca…
- Rank Collapse, Fixed Points, and the Renormalization Group Structure of MLP Residual Networks · Parviz Haggi-Mani, Irina Rish · arXiv · Jun 9, 2026
The analogy between deep neural network forward passes and renormalization group (RG) flows has been repeatedly noted in the literature, but existing treatments remain qualitative: depth is described as a coarse-graining scale, attention is…
- $k$-Nearest Neighbors in Gromov--Wasserstein Space · Kaitlyn Hohmeier, Nicolas Fraiman, Caroline Moosmueller · arXiv · Jun 9, 2026
The Gromov--Wasserstein (GW) distance provides a framework for comparing metric measure spaces, regardless of their underlying structure or geometry. For network-based data, it enables direct comparisons of graphs with different numbers of …
- Intrinsic Footpoint-invariant Riemannian Cross-covariance · Carlos Soto, Cheng Wang, Zipan Huang, Xiaoyu Chen · arXiv · Jun 8, 2026
Covariance estimation yields a fundamental second-order statistic underlying representation learning, dimension reduction, and dependence modeling. While covariance has been well understood in Euclidean spaces, it is ill-defined for random …
- Decision-Calibrated Conformal Uncertainty for Pacing Decisions in Streaming Advertising · Prashant Shekhar, Caroline Howard · arXiv · Jun 8, 2026
We develop a decision-calibrated conformal framework for pacing decisions in streaming advertising. Pacing depends on uncertain future inventory, demand pressure, incremental response, and member-experience load. Instead of calibrating a ge…
- Robust Active Learning for Few-Shot Example Selection in Text-to-SQL · Arash Pourhabib · arXiv · Jun 8, 2026
Few-shot example retrieval is the dominant paradigm for grounding large language models (LLMs) in domain-specific text-to-SQL systems. However, the quality of the annotated example bank directly governs system accuracy, and expert annotatio…
- Weighted universal approximation of differentiable maps on infinite-dimensional manifolds · Philipp Schmocker, Josef Teichmann · arXiv · Jun 8, 2026
We generalize the universal approximation theorem for functional input neural networks (FNN) to differentiable maps by including the approximation of the derivatives. An FNN maps the input from a possibly infinite-dimensional weighted manifo…
- Bandits for Efficient Experimentation: Adapting to Control Group, Preferences, and Context Drifts · Udvas Das, Waris Radji, Debabrota Basu, Odalric-Ambrym Maillard · arXiv · Jun 8, 2026
We consider a variant of the linear contextual stochastic multi-armed bandits, where the learner must provide recommendations to a group of users, each having its personalized preference vector, and in the presence of context distributions …
- In-Context Learning for Latent Space Bayesian Optimization · Tuan A. Vu, Harri Lähdesmäki, Julien Martinelli · arXiv · Jun 8, 2026
Bayesian optimization (BO) is a central tool for sample-efficient design, and latent-space Bayesian optimization (LSBO) extends it to structured objects such as molecules and proteins. In parallel, tabular foundation models such as TabPFN a…
- On Choosing the $μ$ Parameter in Gaussian Differential Privacy · Bogdan Kulynych, Antti Honkela · arXiv · Jun 8, 2026
Recent work argues for using Gaussian differential privacy (GDP) to report the privacy guarantees in privacy-preserving machine learning. We provide principled mappings from pure-DP $\varepsilon$ to GDP $μ$ by matching the worst-case succes…
- Report the Floor: A Training-Free Conformal Interval Is a Mandatory Baseline for Probabilistic Time-Series Forecasting · Valery Manokhin · arXiv · Jun 8, 2026
Probabilistic forecasters are increasingly learned, yet the baselines they are compared against are often weak or omitted. We show that the simplest possible conformal interval - a last-value point forecast wrapped in a finite-sample split-…
- SAILS: Surrogate-based Analysis of Interactions via Local Effect Smooths · Timo Heiß, Julia Herbinger, Bernd Bischl, Giuseppe Casalicchio · arXiv · Jun 8, 2026
Feature interactions drive much of the predictive power of machine learning models, yet existing explanation methods only detect and quantify interactions without revealing their functional form, or visualize only restricted interaction typ…
- BSTabDiff: Block-Subunit Diffusion Priors for High-Dimensional Tabular Data Generation · Al Zadid Sultan Bin Habib, Md Younus Ahamed, Prashnna Gyawali, Gianfranco Doretto et al. · arXiv · Jun 8, 2026
High-Dimensional Low-Sample Size (HDLSS) tabular domains (e.g., omics) are characterized by $n \ll m$, where $n$ = number of samples, and $m$ = number of features. Such domains often exhibit strong local correlation groups, sparse cross-gro…
- Asymptotic Optimality of Thompson Sampling for Risk-Averse Bandits with Sub-Gaussian Rewards · Joel Q. L. Chang · arXiv · Jun 8, 2026
We prove that $ρ\text{-}\mathrm{NPTS}_{\mathrm{SG}}$, an anchor-free nonparametric Thompson Sampling algorithm for risk-averse bandits, achieves regret matching the instance-dependent lower bound to leading order in $\log n$, establishing i…
- INFUSER: Influence-Guided Self-Evolution Improves Reasoning · Siyu Chen, Miao Lu, Beining Wu, Heejune Sheen et al. · arXiv · Jun 8, 2026
Self-evolution offers a scalable path to stronger reasoning: a pretrained language model improves itself with only minimal external supervision. Yet existing methods either depend on extensively curated or teacher-generated training data, o…