Latest Privacy-Preserving ML Research Papers
The newest Privacy-Preserving ML papers from across the field — arXiv, NeurIPS, CVPR, Nature, and more — refreshed daily and ranked by relevance. Distill AI tracks Privacy-Preserving ML so you don’t have to: get the standout work delivered to your inbox every morning, with 2-sentence summaries and the option to chat with any paper.
Recent papers
- Landseer: Exploring the Machine Learning Defense Landscape
Ayushi Sharma, Rosemary Agbozo, Santiago Torres-Arias, Zahra Ghodsi · arXiv · May 26, 2026
Machine learning systems face diverse threats that undermine robustness, privacy, and fairness. Although many defenses have been proposed, each typically addresses a single risk in isolation. Real-world deployments, however, require these d…
- Practical Anonymous Two-Party Gradient Boosting Decision Tree
Huang Chenyu, Zhang Fan, Du Minxin, Chow Sherman SM et al. · arXiv · May 26, 2026
Structured data is well handled by gradient-boosted decision trees (GBDT), which are usually trained on vertically partitioned features across mutually distrustful parties. High speed and interpretability make GBDTs popular in finance and h…
- Privacy-Preserving Screening for Record Linkage
Chenyu Huang, Fan Zhang, Huangxun Chen, Yongjun Zhao et al. · arXiv · May 26, 2026
In an era dominated by big data and machine learning, establishing valuable data collaboration has never been more critical. However, such collaborations must operate under regulatory and legal constraints. Two-party Privacy-Preserving Reco…
- PRAG End-to-End Privacy-Preserving Retrieval-Augmented Generation
Zhijun Li, Minghui Xu, Huayi Qi, Wenxuan Yu et al. · arXiv · Apr 29, 2026
Retrieval-Augmented Generation (RAG) is essential for enhancing Large Language Models (LLMs) with external knowledge, but its reliance on cloud environments exposes sensitive data to privacy risks. Existing privacy-preserving solutions ofte…
- Differentially Private Contrastive Learning via Bounding Group-level Contribution
Kecen Li, Chen Gong, Zinan Lin, Tianhao Wang et al. · arXiv · Apr 29, 2026
Differentially private (DP) contrastive learning aims to learn general-purpose representations from sensitive data, alleviating the privacy leakage concerns of organizations deploying or sharing embedding models trained on private user cont…
- Privacy-Preserving Clothing Classification using Vision Transformer for Thermal Comfort Estimation
Tatsuya Chuman, Yousuke Udagawa, Hitoshi Kiya · arXiv · Apr 29, 2026
A privacy-preserving clothing classification scheme is presented to enable secure occupant-centric control (OCC) systems. Although the utilization of camera images for HVAC control has been widely studied to optimize thermal comfort, privac…
- A Survey on Split Learning for LLM Fine-Tuning: Models, Systems, and Privacy Optimizations
Zihan Liu, Yizhen Wang, Rui Wang, Xiu Tang et al. · arXiv · Apr 27, 2026
Fine-tuning unlocks large language models (LLMs) for specialized applications, but its high computational cost often puts it out of reach for resource-constrained organizations. While cloud platforms could provide the needed resources, data…
- X-NegoBox: An Explainable Privacy-Budget Negotiation Framework for Secure Peer-to-Peer Energy Data Exchange
Poushali Sengupta, Sabita Maharjan, Frank Eliassen, Yan Zhang · arXiv · Apr 27, 2026
The decentralization of modern energy systems is transforming consumers into prosumers who continuously exchange data with aggregators, peers, and market operators. While such data is essential for peer-to-peer trading, demand response, and…
- Listen to the Voices of Everyday Users: Democratizing Privacy Ratings for Sensitive Data Access in Mobile Apps
Liu Wang, Tianshu Zhou, Haoyu Wang, Yi Wang · arXiv · Apr 27, 2026
Mobile apps frequently request excessive data access, raising significant privacy concerns. While regulations like GDPR emphasize data minimization, they provide limited guidance on concretely defining and enforcing necessary data access. E…
- LLM-CEG: Extending the Classification Error Gauge Framework for Privacy Auditing of Large Language Models
Kato Mivule · arXiv · Apr 26, 2026
This paper extends the Classification Error Gauge (x-CEG) framework, originally developed for measuring the privacy-utility trade-off in tabular datasets, to privacy auditing of Large Language Models (LLMs). We propose LLM-CEG, a systematic…
- Spore: Efficient and Training-Free Privacy Extraction Attack on LLMs via Inference-Time Hybrid Probing
Yu Cui, Ruiqing Yue, Hang Fu, Sicheng Pan et al. · arXiv · Apr 26, 2026
With the wide adoption of personal AI assistants such as OpenClaw, privacy leakage in user interaction contexts with large language model (LLM) agents has become a critical issue. Existing privacy attacks against LLMs primarily target train…
- Rényi Pufferfish Privacy with Gaussian-based Priors: From Single Gaussian to Mixture Model
Wenjin Yang, Ni Ding, Zijian Zhang, Zhen Li et al. · arXiv · Apr 26, 2026
Rényi Pufferfish Privacy (RPP) provides a Rényi divergence-based privacy framework for correlated data, but existing $\infty$-Wasserstein mechanisms are often conservative and sacrifice data utility. We study Gaussian mechanisms for RPP und…
- CyberCane: Neuro-Symbolic RAG for Privacy-Preserving Phishing Detection with Formal Ontology Reasoning
Safayat Bin Hakim, Aniqa Afzal, Qi Zhao, Vigna Majmundar et al. · arXiv · Apr 26, 2026
Privacy-critical domains require phishing detection systems that satisfy contradictory constraints: near-zero false positives to prevent workflow disruption, transparent explanations for non-expert staff, strict regulatory compliance prohib…
- Analysis of Personal Data Exposure in Thailand
Suphannee Sivakorn, Sasawat Malaivongs, Nuttaya Rujiratanapat · arXiv · Apr 26, 2026
In the digital era, personal data, particularly sensitive identifiers such as the Social Security Number and National Identification Number, have become a highly valuable asset, raising significant concerns regarding privacy and security. T…
- Benchmarking the Utility of Privacy-Preserving Cox Regression Under Data-Driven Clipping Bounds: A Multi-Dataset Simulation Study
Keita Fukuyama, Yukiko Mori, Tomohiro Kuroda, Hiroaki Kikuchi · arXiv · Apr 23, 2026
Differential privacy (DP) is a mathematical framework that guarantees individual privacy; however, systematic evaluation of its impact on statistical utility in survival analyses remains limited. In this study, we systematically evaluated t…
- Differentially Private De-identification of Dutch Clinical Notes: A Comparative Evaluation
Michele Miranda, Xinlan Yan, Nishant Mishra, Rachel Murphy et al. · arXiv · Apr 23, 2026
Protecting patient privacy in clinical narratives is essential for enabling secondary use of healthcare data under regulations such as GDPR and HIPAA. While manual de-identification remains the gold standard, it is costly and slow, motivati…
- Differentially Private Clustered Federated Learning with Privacy-Preserving Initialization and Normality-Driven Aggregation
Jie Xu, Haaris Mehmood, Rogier Van Dalen, Karthikeyan Saravanan et al. · arXiv · Apr 22, 2026
Federated learning (FL) enables training of a global model while keeping raw data on end-devices. Despite this, FL has been shown to leak private user information, and thus in practice it is often coupled with methods such as differential privac…
- Towards Secure Logging: Characterizing and Benchmarking Logging Code Security Issues with LLMs
He Yang Yuan, Xin Wang, Kundi Yao, An Ran Chen et al. · arXiv · Apr 22, 2026
Logging code plays an important role in software systems by recording key events and behaviors, which are essential for debugging and monitoring. However, insecure logging practices can inadvertently expose sensitive information or enable a…
- Federated Learning over Blockchain-Enabled Cloud Infrastructure
Saloni Garg, Amit Sagtani, Kamal Kant Hiran · arXiv · Apr 21, 2026
The rise of IoT devices and the uptake of cloud computing have informed a new era of data-driven intelligence. Traditional centralized machine learning models that require a large volume of data to be stored in a single location have theref…
- DECIFR: Domain-Aware Exfiltration of Circuit Information from Federated Gradient Reconstruction
Gijung Lee, Wavid Bowman, Olivia P. Dizon-Paradis, Reiner N. Dizon-Paradis et al. · arXiv · Apr 21, 2026
Federated Learning (FL) is a promising approach for multiparty collaboration as a privacy-preserving technique in hardware assurance, but its security against adversaries with domain-specific knowledge is underexplored. This paper demonstra…
- A Data-Free Membership Inference Attack on Federated Learning in Hardware Assurance
Gijung Lee, Wavid Bowman, Olivia P. Dizon-Paradis, Reiner N. Dizon-Paradis et al. · arXiv · Apr 21, 2026
Federated Learning (FL) is an emerging solution to the data scarcity problem for training deep learning models in hardware assurance. While FL is designed to enhance privacy by not sharing raw data, it remains vulnerable to Membership Infer…
- Efficient Arithmetic-and-Comparison Homomorphic Encryption with Space Switching
Erwin Eko Wahyudi, Yan Solihin, Qian Lou · arXiv · Apr 21, 2026
Fully homomorphic encryption (FHE) enables computation on encrypted data without decryption, making it central to privacy-preserving applications. However, no existing scheme efficiently supports both arithmetic and comparison operations in…
- An AI Agent Execution Environment to Safeguard User Data
Robert Stanley, Avi Verma, Lillian Tsai, Konstantinos Kallas et al. · arXiv · Apr 21, 2026
AI agents promise to serve as general-purpose personal assistants for their users, which requires them to have access to private user data (e.g., personal and financial information). This poses a serious risk to security and privacy. Advers…
- Secure Storage and Privacy-Preserving Scanpath Comparison via Garbled Circuits in Eye Tracking
Suleyman Ozdel, Amr Nader, Yasmeen Abdrabou, Enkelejda Kasneci · arXiv · Apr 21, 2026
With the growing use of eye tracking on VR and mobile platforms, gaze data is increasing. While scanpath comparison is important to gaze behavior analysis, existing methods lack privacy-preserving capabilities for real-world use. We present…
- Sherpa.ai Privacy-Preserving Multi-Party Entity Alignment without Intersection Disclosure for Noisy Identifiers
Daniel M. Jimenez-Gutierrez, Enrique Zuazua, Georgios Kellaris, Joaquin Del Rio et al. · arXiv · Apr 21, 2026
Federated Learning (FL) enables collaborative model training among multiple parties without centralizing raw data. There are two main paradigms in FL: Horizontal FL (HFL), where all participants share the same feature space but hold differe…
- DP-FlogTinyLLM: Differentially private federated log anomaly detection using Tiny LLMs
Isaiah Thompson, Tanmay Sen, Ritwik Bhattacharya · arXiv · Apr 21, 2026
Modern distributed systems generate massive volumes of log data that are critical for detecting anomalies and cyber threats. However, in real world settings, these logs are often distributed across multiple organizations and cannot be centr…
- Synthetic Data Meets Finance: Generative Models for Privacy Preserving Analytics
Yongbin Yang, Jingyun Yang · Journal of Banking and Fina... · Apr 21, 2026
The financial industry faces increasing pressure from privacy regulations, including the General Data Protection Regulation (GDPR) and sector-specific compliance frameworks, which restrict access to sensitive transaction data critical for t…
- The Personalization Paradox in AI-Driven Tourism E-Commerce: Psychological Reactance, Threat-Substitution, and the Moderating Role of Privacy Concerns
Hongmei Duan, Ahmad Yahya Dawod, Guochao Wan · Journal of theoretical and ... · Apr 21, 2026
AI-driven personalization (AIP) has become a core mechanism of digital commerce platforms, yet its psychological consequences remain theoretically fragmented. Drawing on the Stimulus–Organism–Response (SOR) framework and Psychological React…
- Blockchain-Driven AI-Enhanced Post-Quantum Multivariate Identity-based Signature and Privacy-Preserving Data Aggregation Scheme for Fog-enabled Flying Ad-Hoc Networks
Sufian Al majmaie, Ghazal Ghajari, Niraj Prasad Bhatta, Fathi Amsaad · arXiv · Apr 20, 2026
The integration of Fog Computing with Flying Ad-Hoc Networks (FANETs) offers promising capabilities for decentralized, low-latency intelligence in UAV-based applications. However, the distributed nature, mobility, and resource constraints o…
- No More Guessing: a Verifiable Gradient Inversion Attack in Federated Learning
Francesco Diana, Chuan Xu, André Nusser, Giovanni Neglia · arXiv · Apr 16, 2026
Gradient inversion attacks threaten client privacy in federated learning by reconstructing training samples from clients' shared gradients. Gradients aggregate contributions from multiple records and existing attacks may fail to disentangle…