Latest Retrieval-Augmented Generation Research Papers
The newest Retrieval-Augmented Generation papers from across the field — arXiv, NeurIPS, CVPR, Nature, and more — refreshed daily and ranked by relevance. Distill AI tracks Retrieval-Augmented Generation so you don’t have to: get the standout work delivered to your inbox every morning, with 2-sentence summaries and the option to chat with any paper.
Recent papers
- Trace Only What You Need: Structure-Aware On-Demand Hypergraph Memory for Long-Document Question Answering · Xiangjun Zai, Xingyu Tan, Chen Chen, Xiaoyang Wang et al. · arXiv · Jun 9, 2026
Long-document question answering (QA) requires large language models (LLMs) to reason over evidence scattered across lengthy documents, where answers often depend on event order, section-level context, and cross-part evidence connections. A…
- Infini Memory: Maintainable Topic Documents for Long-Term LLM Agent Memory · Suozhao Ji, Baodong Wu, Zehao Wang, Lei Xia et al. · arXiv · Jun 9, 2026
Long-term LLM agents need persistent memory that can track changing facts and provide relevant evidence across sessions. Existing memory systems often store observations as isolated records, summaries, or indexed fragments, which makes evid…
- BrainSurgery: Reproducible and Reliable Declarative Weight Manipulations for Model Editing and Upcycling · Gianluca Barmina, Annemette Broch Pirchert, Andrea Blasi Núñez, Lukas Galke Poech et al. · arXiv · Jun 8, 2026
As deep learning models scale, managing, inspecting, and modifying large checkpoints has become increasingly challenging. Researchers often need to alter model weights for layer restructuring, precision casting, low-rank factorization, and …
- Clinically Grounded Privacy Evaluation of Medical LMs · Sasha Ronaghi, Sana Tonekaboni, Lena Stempfle, Vivian Utti et al. · arXiv · Jun 8, 2026
Medical language models (LMs) can memorize and reproduce protected health information, but privacy evaluations often focus on recovery of training text rather than disclosure under realistic threat models. We introduce a clinically grounded…
- DECSELFMASK: Leveraging Unlabeled Text via Self-Relevance-Guided Masking for Decoder-Only Classification · Pietro Ferrazzi, Matteo Merler, Giovanni Bonetta, Alberto Lavelli et al. · arXiv · Jun 8, 2026
Classification tasks require annotated data, which can often be expensive, time-consuming, or even unfeasible to collect. This is the case of the medical domain, where large datasets often have few annotated examples. To address this, we pr…
- AbstRAG: Learning to Abstract for Retrieval Problems · Lei Xu, Xin Quan, Daniel Pedronette, André Freitas · arXiv · Jun 8, 2026
Retrieval-augmented generation often fails when the query, the document evidence, and the user's intent are expressed at different levels of abstraction. A query may ask about a class, a relation, or an event, while the document only states…
- HKVM-RAG: Key-Value-Separated Hypergraph Evidence Organization for Multi-Hop RAG · Mingyu Zhang, Ying Ma · arXiv · Jun 5, 2026
Multi-hop RAG poses a data-engineering problem beyond passage matching: under fixed retrieval budgets, a system must organize retrieved text into evidence units that expose answer chains. Dense retrievers score passages independently, while…
- Code2LoRA: Hypernetwork-Generated Adapters for Code Language Models under Software Evolution · Liliana Hotsko, Yinxi Li, Yuntian Deng, Pengyu Nie · arXiv · Jun 4, 2026
Code language models need repository-level context to resolve imports, APIs, and project conventions. Existing methods inject this knowledge as long inputs (retrieved through RAG or dependency analysis) or through per-repository fine-tuning…
- Self-Augmenting Retrieval for Diffusion Language Models · Paul Jünger, Justin Lovelace, Linxi Zhao, Dongyoung Go et al. · arXiv · Jun 4, 2026
Discrete diffusion language models generate text by iteratively denoising an entire response in parallel. At each step, they predict tentative tokens for every masked position, committing the confident predictions to the output and discardi…
- USAD 2.0: Scaling Representation Distillation for Universal Audio Understanding · Heng-Jui Chang, Alexander H. Liu, Saurabhchand Bhati, Mrudula Athi et al. · arXiv · Jun 4, 2026
Audio encoders are critical to modern audio applications as large language models (LLMs) increasingly rely on a single encoder for diverse inputs. While self-supervised learning (SSL) has yielded strong domain-specific encoders like speech …
- Revisiting Lexicon Evaluation in Unsupervised Word Discovery · Simon Malan, Danel Slabbert, Herman Kamper · arXiv · Jun 4, 2026
Building a lexicon from discovered word-like units is a central goal in zero-resource speech processing. But do our evaluations provide a trustworthy indication of lexicon quality? A common metric, normalized edit distance, averages the pho…
- IA-RAG: Interval-Algebra-Driven Temporal Reasoning for Dynamic Knowledge Retrieval · Xiaoman Wang, Yaoze Zhang, Wenzhuo Fan, Hongwei Zhang et al. · arXiv · Jun 4, 2026
Retrieval-Augmented Generation (RAG) has shown strong effectiveness in grounding Large Language Models (LLMs) with external knowledge. However, existing RAG and Graph RAG frameworks largely treat knowledge as static or associate time with c…
- AutoForest: Automatically Generating Forest Plots from Biomedical Studies with End-to-End Evidence Extraction and Synthesis · Massimiliano Pronesti, Angelo Miculescu, Mohsin Kapdi, Paul Flanagan et al. · arXiv · Jun 1, 2026
Systematic reviews rely on forest plots to synthesise quantitative evidence across biomedical studies, but generating them remains a fragmented and labour-intensive process. Researchers must interpret complex clinical texts, manually extrac…
- WAXAL-NET: Finetuned Edge ASR Across 19 African Languages · Victor Tolulope Olufemi, Oreoluwa Babatunde, Ramsey Njema, Bolarinwa Gbotemi et al. · arXiv · Jun 1, 2026
We evaluate whether compact domain-specialized ASR models can outperform massively multilingual foundation models for conversational African speech across 19 languages in the WAXAL corpus. Fine-tuned edge models achieve a macro-averaged WER…
- TVIR: Building Deep Research Agents Towards Text–Visual Interleaved Report Generation · Xinkai Ma, Zhiqi Bai, Dingling Zhang, Pei Liu et al. · arXiv · Jun 1, 2026
Deep Research Agents have shown strong capability in multi-step information retrieval, reasoning, and long-form report generation, but existing benchmarks and systems remain predominantly text-centric, with limited evaluation of whether vis…
- When Knowledge Is Not Free: Cost-Aware Evidence Selection in Retrieval-Augmented Generation · Mingyan Wu, Han Yang, Omer Ben-Porat, Yftah Ziser · arXiv · Jun 1, 2026
Retrieval-Augmented Generation (RAG) typically assumes that external knowledge is free, but many high-quality sources are paywalled, licensed, restricted, or otherwise costly to access. We introduce cost-aware RAG, a setting where retrieved…
- Qwen-VLA: Unifying Vision-Language-Action Modeling across Tasks, Environments, and Robot Embodiments · Qiuyue Wang, Mingsheng Li, Jian Guan, Jinhui Ye et al. · arXiv · May 28, 2026
Embodied intelligence is often studied through specialized models for individual tasks such as manipulation or navigation, resulting in fragmented capabilities and limited generalization across tasks, environments, and robot embodiments. In…
- Loong: A Human-Like Long Document Translation Agent with Observe-and-Act Adaptive Context Selection · Yutong Wang, Xuebo Liu, Derek F. Wong, Zhilin Li et al. · arXiv · May 28, 2026
Document-level translation remains one of the most challenging tasks for large language models, which are constrained by limited context windows that impede global cohesion, while simultaneously suffering from redundant contextual informati…
- CommunityFact: A Dynamic, Multilingual, Multi-domain Benchmark for Misinformation Detection in the Wild · Sahajpreet Singh, Insyirah Mujtahid, Min-Yen Kan, Kokil Jaidka · arXiv · May 28, 2026
Misinformation verification increasingly occurs in public, fast-moving, and multilingual online settings, where static benchmarks provide an incomplete measure of model reliability. We introduce CommunityFact, a refreshable benchmark for mi…
- OmniVerifier-M1: Multimodal Meta-Verifier with Explicit Structured Recalibration · Xinchen Zhang, Bowei Liu, Jiale Liu, Chufan Shi et al. · arXiv · May 27, 2026
Visual outcomes are increasingly central to multimodal large language models, making reliable and fine-grained verification essential for scaling generalist foundation models. In this work, we investigate multimodal meta-verification, which…
- Extrapolative Weight Averaging Reveals Correctness-Efficiency Frontiers in Code RL · Kunhao Zheng, Pierre Chambon, Juliette Decugis, Jonas Gehring et al. · arXiv · May 27, 2026
Linear interpolation between fine-tuned checkpoints has been shown to trace the Pareto front between competing objectives, but whether extrapolative weight averaging can extend such frontiers to new checkpoints useful at inference time, wit…
- GraphSteal: Structural Knowledge Stealing from Graph RAG via Traversal Reconstruction · Jinze Gu, Qinghua Mao, Xi Lin, Jun Wu · arXiv · May 27, 2026
Retrieval-Augmented Generation (RAG) enhances LLMs by grounding generation in query-relevant external evidence. Beyond unstructured text corpora, Graph RAG integrates knowledge graphs into the retrieval pipeline, enabling LLMs to access ent…
- Adaptive Multimodal Agents-Based Framework for Automatic Workflow Execution · Susanna Cifani, Mario Luca Bernardi, Marta Cimitile · arXiv · May 27, 2026
Modern information systems require autonomous agents capable of navigating complex workflows, yet current methodologies often struggle with the transition from structured metadata parsing to general environmental perception. While the integ…
- Reading Citations from Attention: Faithful Source Attribution in Retrieval Augmented Generation · ACL ARR 2026 May Submission · May 26, 2026
Fine-grained citations are essential for trustworthy retrieval-augmented generation (RAG), but most systems ask models to generate citation markers as output tokens, adding formatting burden and potentially obscuring actual evidence use. We…
- Private Retrieval Augmented Generation via Random Projection · ACL ARR 2026 May Submission · May 25, 2026
Retrieval-Augmented Generation (RAG) enhances the capabilities of large language models (LLMs) by querying external structured knowledge. However, it can also introduce privacy risks by leaking sensitive information from the retrieval datab…
- Retrieval-Augmented Detection of Potentially Abusive Clauses in Chilean Terms of Service · Christoffer Loeffler, Tomás Rey Pizarro, Daniel Ignacio Miranda Vásquez, Andrea Martínez Freile · arXiv · May 25, 2026
Online Terms of Service often function as contracts of adhesion, creating asymmetries that may expose consumers to potentially abusive clauses. In Chile, assessing such clauses is legally challenging because some provisions clearly violate …
- MAGIC: Multimodal Alignment & Grounding-aware Instruction Coreset for Vision-Language Models · Shristi Das Biswas, Kaushik Roy · arXiv · May 25, 2026
Instruction tuning of large vision-language models (LVLMs) increasingly depends on massive multimodal corpora, yet these datasets contain samples with substantial redundancy, low visual dependency, and highly imbalanced coverage of multimod…
- What Makes a Medical Checker Trainable? Diagnosing Signal Collapse and Reward Hacking in Checker-Guided RAG for Biomedical QA · Yuelyu Ji, Min Gu Kwak, Hang Zhang, Xizhi Wu et al. · arXiv · May 25, 2026
Medical RAG needs evidence-grounded claims, so plugging a claim-level NLI checker into retrieval-augmented RL is intuitive. We find that the checker's output distribution during training, not its held-out accuracy, decides wh…
- Mitigating Provenance-Role Collapse in Long-Term Agents via Typed Memory Representation · Zhengda Jin, Bingbing Wang, Jing Li, Ruifeng Xu et al. · arXiv · May 25, 2026
Long-term memory is essential for persistent LLM agents, yet prevailing architectures store historical interactions as unstructured, flat text. This unconstrained storage induces provenance-role collapse, a critical failure mode where agent…
- SAMark: A Self-Anchored Text Watermarking with Paragraph-Level Paraphrase Robustness · Jiahao Huo, Wenjie Qu, Yibo Yan, Kening Zheng et al. · arXiv · May 25, 2026
Semantic-level watermarking (SWM) improves robustness against text modifications by treating sentences as the basic unit. However, robustness to paragraph-level paraphrasing remains difficult because such attacks globally disrupt watermark …