Latest Tabular Data Research Papers

The newest Tabular Data papers from across the field — arXiv, NeurIPS, CVPR, Nature, and more — refreshed daily and ranked by relevance. Distill AI tracks Tabular Data so you don’t have to: get the standout work delivered to your inbox every morning, with 2-sentence summaries and the option to chat with any paper.

Recent papers

Learning Latent Structure in High-Dimensional Data via Geometry and Graphs
Haozhe Chen · Utah State Research and Sch... · Aug 1, 2026
Modern datasets often contain many measured variables for each observation, such as gene-expression levels, brain activity signals, or features in tabular data. These data are also often noisy, meaning that useful patterns are mixed with me…
Application of artificial intelligence models in the identification of severe scrub typhus
Chunyu Chen, Xiangling Liu, Ce Yang, Chuncheng Ma et al. · Frontiers in Artificial Int... · Jul 22, 2026
This retrospective study enrolled 492 patients with scrub typhus in Jiangmen from 2013 to 2025. Clinical and laboratory data were analyzed using univariate logistic regression and LASSO regression to identify risk factors for severe illness…
Toward Auditable Fraud Detection: Combining Graph Features, Model Explanations, and Agentic Case Investigation
Rahil Sharma · arXiv · Jul 21, 2026
Fraud detection systems must scale with rising transaction volume while remaining explainable and reviewable. We study a layered pipeline on the PaySim dataset that combines a gradient-boosted classifier, graph-derived structural features, …
In-Context Time Series Classification with Random Convolutional Features
Joscha Cüppers, Jilles Vreeken · arXiv · Jul 21, 2026
Time series classification is central to domains like medical signal analysis, industrial monitoring, and sensor-based activity recognition, where class information manifests as localized shapes, specific frequencies, temporal shifts, or co…
Impact of AI-Driven Techniques in Software Impact Analysis
Hamed J. Fawareh, Abdulrhman Alkhmali, Mohammad A. Hassan · WSEAS TRANSACTIONS ON COMPU... · Jul 20, 2026
Software defect prediction (SDP) has become extremely important for improving software quality. Recent progress in machine learning and ensemble learning has greatly improved defect prediction models. In this resea…
When Do Simple Formulas Beat Ensembles? A 31-Dataset Benchmark for Interpretable Binary Classification
Andrew Bond · Zenodo (CERN European Organ... · Jul 19, 2026
When can a short algebraic formula replace a gradient-boosted ensemble for binary classification on tabular data? We answer this question empirically on 31 UCI/OpenML datasets using a unified protocol: each dataset is evaluated with 200 5-f…
A Unified Multi-Modal Framework for Crop Stress Detection: Combining Tabular Environmental Data and Leaf-Image Classification
Ridhesh Shripad Walavalkar, Avishka Shrivastav, Shreyansh Kesharwani, Astuti Priya · OpenAlex · Jul 18, 2026
Accurate crop stress detection is essential for precision agriculture; however, most existing approaches rely on binary labels that collapse distinct stress processes (water deficit, nutrient deficiency, disease, and pest damage) into a singl…
Case Definition Companion Guide for Acute Intraocular Inflammation (aIOI)
Barbara Law, Hammad Ali, Miriam Sturkenboom, Marta Rojo Villaescusa · Zenodo (CERN European Organ... · Jul 17, 2026
This document collates into a single document all SPEAC Acute Intraocular Inflammation (aIOI) resources (Risk factors, background rates, ICD9/10-CM & MedDRA codes), tools (data abstraction & interpretation form, tabular summary of key case …
Conservative plausibility-filtered SMOTE for credit card fraud detection under extreme class imbalance
Fray L. Becerra-Suarez, Paolo P. Arones-Perez, Frederik F. Zamora-Del-Aguila, Manuel G. Forero · Frontiers in Artificial Int... · Jul 17, 2026
Detecting credit card fraud is a complex tabular learning problem due to the underrepresentation of fraudulent events, extreme class imbalance, and scattered minority areas. To address this limitation, a conservative minority synthesis stra…
Explanation-Stable Anomaly Detection for Imbalanced Heterogeneous Tabular Data under Distribution Shift
Yunjie Ke, Qi Guo, Di Huang, Qingqing Cheng et al. · OpenAlex · Jul 16, 2026
Interpretable machine learning for identifying determinants of high hypertension burden under extreme heat vulnerability: evidence from Maryland, USA
Binbin Peng · Frontiers in Public Health · Jul 15, 2026
Extreme heat poses increasing risks to cardiovascular health, yet fine-scale determinants of heat-related hypertension burden remain insufficiently understood. This study examines high hypertension burden in a heat-vulnerability context acr…
An Explainable Comparative Study Of Regression Models For Multi-Season Crop Yield Prediction In Maharashtra Using Historical Tabular Data
Ashray Bagde · Zenodo (CERN European Organ... · Jul 14, 2026
Accurate crop yield prediction is essential for food security planning, farmer decision-making, and agricultural policy in India. While machine learning has shown promise in this domain, most existing studies either rely on sensor-based inp…
Where to Intervene? Benchmarking Fairness-Aware Learning on Differentially Private Synthetic Tabular Data
Vinicius Gabriel Angelozzi Verona de Resende, Héber Hwang Arcolezi · Proceedings on Privacy Enha... · Jul 14, 2026
Machine learning models are increasingly deployed in high-stakes domains, raising concerns about both privacy and fairness. Differential Privacy (DP) has become a gold standard for privacy-preserving data analysis, while fairness-aware mech…
TabPFN-Based Prediction of Concrete Compressive Strength
Zhihao Zhao, Jinjin Wang, Guohui Ma, Mingjie Han · Buildings · Jul 13, 2026
The use of supplementary cementitious materials such as fly ash can reduce environmental impacts and improve the sustainability of concrete construction. However, the nonlinear interactions among mixture design parameters make accurate pred…
Machine learning-guided discovery of a bioactive hydrogel for spatiotemporally orchestrated tendon-to-bone healing
Wencai Liu, Yuhao Yu, Hui Xu, Weiming Lin et al. · Bioactive Materials · Jul 13, 2026
The rational design of biomaterials for complex tissue regeneration, such as the tendon-to-bone interface (TBI), is hindered by an immense combinatorial space that makes empirical optimization impractical. To address this challenge, a machi…
IRaMuTeQ in Qualitative Research: Preparing the Corpus and Applying Methodological Best Practices
Clarissa Coelho Vieira Guimarães · Zenodo (CERN European Organ... · Jul 13, 2026
IRAMUTEQ® is free software for multivariate statistical analysis of textual and tabular data, widely used in research in the Humanities, Social Sciences, and Health Sciences. This presentation covers the preparation of the corpus…
Monitoring Carbonation Levels in Beverage Production Using X-Bar/R-Charts and Tabular CUSUM: A Statistical Process Control Study at Seven-Up Bottling Company, Kaduna, Nigeria
Chinedu Samuel Onyedika · Kwaghe International Journa... · Jul 11, 2026
Carbonation is a defining sensory attribute of carbonated soft drinks and a critical determinant of product quality because carbon dioxide (CO₂) volume directly influences taste, perceived freshness, mouthfeel, and consumer acceptability. D…
Enhancing Short-Term Electrical Load Forecasting Using SARIMA, XGBoost, LSTM, and VMD-Based Signal Decomposition
Pratima Patel, Seema Pal · International Journal for R... · Jul 11, 2026
Short-term electric load forecasting is essential for stable power system management, yet remains inherently uncertain due to volatile demand patterns, weather variability, and irregular consumption behaviour. The proliferation of smart met…
Towards Conversational Dataset Retrieval: A Survey
Lisa-Yao Gan, Johanna Walker, Elena Simperl, Klaus Diepold · Information Systems Frontiers · Jul 11, 2026
Large language models (LLMs) have sparked renewed interest in Conversational Information Retrieval (CIR). Within this shift, Conversational Dataset Retrieval (CDR) is emerging as a new subfield that focuses on using natural, contex…
Future changes in climate suitability, yields, and calorie optimization of four global staple food crops
Lucia Mumo, Christian L. E. Franzke, Vecchia P Ravinandrasana, Moses Ojara et al. · Scientific Reports · Jul 10, 2026
Achieving “zero hunger,” the second sustainable development goal, is challenging due to climate change, weather and climate extremes, and unabated human population growth. Understanding likely changes in spatial crop suitability and yield f…
EdgeRefine: Privacy-Utility Balance for Graphs via Jaccard Sampling under Edge Differential Privacy
Wenxiu Ding, Muzhi Liu, Zheng Yan, Mingjun Wang et al. · arXiv · Jul 9, 2026
Graph Neural Networks (GNNs) have shown considerable success in learning from graph-structured data, but their use in privacy-sensitive areas remains difficult because graph structure can leak sensitive link information. To satisfy edge-lev…
A Physics-Based Residual Ensemble Approach to Address the Limitations of GBDT Extrapolation
Zulman Gani arif · OpenAlex · Jul 9, 2026
Gradient Boosting Decision Trees (GBDT) are widely utilized in engineering for their efficiency on tabular data, yet they suffer from a fundamental architectural flaw: a structural inability to perform linear extrapolation, often resulting …
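Komparasi Algoritma Random Forest dan XGBoost untuk Prediksi Risiko Penyakit Kronis Berdasarkan Data Konsumsi Kalori Harian [Comparison of Random Forest and XGBoost Algorithms for Predicting Chronic Disease Risk Based on Daily Calorie Consumption Data]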
Rizky Aditya · Figshare · Jul 9, 2026
Poorly controlled daily calorie consumption is one of the main triggers of chronic diseases such as obesity and type 2 diabetes mellitus. This study aims to build a risk prediction model…
Predicting Glare-Related Traffic Outcomes with Transformer-Based Explainable Tabular Deep Learning
Shriyank Somvanshi, Anika Baitullah, Sharif Ahmed Rafat, Subasish Das · Data Science for Transporta... · Jul 8, 2026
Glare from both low sun angles under daytime conditions and headlight exposure under nighttime conditions degrades driver visibility and remains a persistent challenge for safe roadway operation, yet the mechanisms governing outcom…
Explainability in action: A metric-driven assessment of local explanations for healthcare tabular models
M. Atif Qureshi, Abdul Aziz Noor, Awais Manzoor, Muhammad Deedahwar Mazhar Qureshi et al. · PLoS ONE · Jul 8, 2026
Explainable AI (XAI) is increasingly used in clinical machine learning, yet quantitative evaluation of explanation quality is often reported inconsistently across methods and datasets. We present a reproducible, metric-driven framework for …
Canopy: A Heterograph Foundation Model for Metabolic Engineering
Jake Bowden, Laurence Legon, Satnam Surae · arXiv · Jul 7, 2026
Designing microbial strains that produce high-value chemicals at commercially viable titers remains a central challenge in metabolic engineering. Existing computational approaches either rely on stoichiometric constraint-based models that c…
