Latest Multi-Agent Systems Research Papers
The newest Multi-Agent Systems papers from across the field — arXiv, NeurIPS, CVPR, Nature, and more — refreshed daily and ranked by relevance. Distill AI tracks Multi-Agent Systems so you don’t have to: get the standout work delivered to your inbox every morning, with 2-sentence summaries and the option to chat with any paper.
Recent papers
- Ma'aks: manually-curated parallel dataset for Arabic text sentiment swap · Lang. Resour. Evaluation 2026 · Dec 31, 2026
The advancement of NLP has made significant strides in sentiment style transfer, modifying the linguistic style of a text while preserving its content. However, most existing datasets are non-parallel and focus on English, neglecting low-re…
- LLM-Mediated Demand Response Coordination in Smart Microgrids · J. de Curtò, I. de Zarzà · arXiv · Jun 9, 2026
Effective demand response in smart microgrids requires prosumers to cooperate voluntarily under strategic self-interest, a coordination problem structurally equivalent to a repeated Prisoner's Dilemma on a social network. This paper present…
- Decentralized Multi-Agent Systems with Shared Context · Yuzhen Mao, Azalia Mirhoseini · arXiv · Jun 9, 2026
Multi-agent systems (MAS) can scale large language model reasoning at test time by decomposing complex problems into parallel subtasks. However, most existing MAS rely on centralized orchestration, where a main agent assigns work, collects …
- SkillAxe: Sharpening LLM-Authored Agent Skills Through Evaluation-Guided Self-Refinement · Srishti Gautam, Arjun Radhakrishna, Sumit Gulwani · arXiv · Jun 9, 2026
Skill documents, structured natural-language instructions that guide Large Language Model (LLM) agents, are critical to modern agent frameworks, yet LLMs struggle to write skills that actually work. On SkillsBench, human-authored skills imp…
- Decoupling Thought from Speech: Knowledge-Grounded Counterfactual Reasoning for Resilient Multi-Agent Argumentation · Jakub Masłowski, Jarosław A. Chudziak · arXiv · Jun 9, 2026
Multi-agent debate frameworks have been shown to improve large language model performance in convergent tasks, but they are currently optimized in a way that heavily favors final output accuracy rather than stability of the process. During …
- Game-Theoretic Multi-Agent Control for Robust Contextual Reasoning in LLMs · Saeid Jamshidi, Amin Nikanjam, Arghavan Moradi Dakhel, Kawser Wazed Nafi et al. · arXiv · Jun 9, 2026
Large Language Models (LLMs) in multi-turn interactions maintain evolving context rather than generating isolated responses, making them vulnerable to prompt-injection and context-poisoning attacks in which locally plausible adversarial fra…
- What Spatial Memory Must Store: Occlusion as the Test for Language-Agent Memory · Doeon Kwon, Junho Bang · arXiv · Jun 9, 2026
Language-agent "memory palace" systems anchor each memory to a world coordinate, on the intuition that geometry adds something text cannot. We make that intuition testable and report three results. First, the memory-palace default of foldin…
- Deployment-Time Memorization in Foundation-Model Agents · Lei, Chen, Guilin Zhang, Kai Zhao et al. · arXiv · Jun 8, 2026
Foundation-model agents are increasingly long-lived systems that remember users across interactions, making memorization an explicit deployment-time function rather than solely a property of model weights. Existing work addresses parametric…
- FASE: Fast Adaptive Semantic Entropy for Code Quality · Shizhe Lin, Ladan Tahvildari · arXiv · Jun 8, 2026
Multi-agent code generation offers a promising paradigm for autonomous software development by simulating the human software engineering lifecycle. However, system reliability remains hindered by LLM hallucinations and error propagation acr…
- Revisiting mesoscopic traffic flow simulation in SUMO: Limitations, analysis, and an alternative · Ying-Chuan Ni, Alina Akopian, Anastasios Kouvelas, Michail A. Makridis · arXiv · Jun 8, 2026
Mesoscopic traffic flow models combine the merits of both macroscopic and microscopic models by capturing individual vehicle behavior in great detail and retaining computational efficiency. At the time of this study, the mesoscopic mod…
- Performance Evaluation of Social Learning · Felice Scala, Marco Carpentiero, Vincenzo Matta, Ali H. Sayed · arXiv · Jun 8, 2026
Social Learning is a decentralized decision-making paradigm in which spatially dispersed agents collect streaming observations regulated by one of a finite number of models (the hypotheses). The agents are interested in assigning probabilit…
- Autonomous Incident Resolution at Hyperscale: An Agentic AI Architecture for Network Operations · Arun Malik · arXiv · Jun 8, 2026
Cloud network infrastructure at hyperscale presents unique operational challenges where traditional human-driven incident response cannot keep pace with the volume, velocity, and complexity of failures. This paper presents an agentic AI arc…
- A Multi-Agent System for IPMSM Design Optimization via an FEA-AI Hybrid Approach · Jinseong Han, Sunwoong Yang, Namwoo Kang · arXiv · Jun 8, 2026
Interior permanent magnet synchronous motor (IPMSM) design requires balancing conflicting objectives and multi-physics constraints, while modern optimization workflows face three bottlenecks: manual problem setup, high finite element analys…
- Hardening Agent Benchmarks with Adversarial Hacker-Fixer Loops · Ziqian Zhong, Ivgeni Segal, Ivan Bercovich, Shashwat Saxena et al. · arXiv · Jun 8, 2026
Agent benchmarks score submissions with outcome verifiers that are typically hand-written and brittle, leaving them open to reward hacking. We audit 1,968 tasks across five terminal-agent benchmarks and find 323 (16%) hackable by frontier m…
- PerspectiveGap: A Benchmark for Multi-Agent Orchestration Prompting · Youran Sun, Xingyu Ren, Kejia Zhang, Xinpeng Liu et al. · arXiv · Jun 7, 2026
Real-world LLM applications are moving beyond single-agent workflows toward orchestrated multi-agent systems, yet current models still struggle to determine what each sub-agent needs to know. To measure this, we introduce PerspectiveGap, a …
- RAILS: Verification-Native Clearing For Agentic Commerce · Adrian de Valois-Franklin, Alex Bogdan · arXiv · Jun 7, 2026
Autonomous agents negotiate, purchase, deploy code, and move funds, but no neutral mechanism determines whether they met their delegated obligation, who is responsible when they did not, or which settlement action follows. This is the agent…
- Is Telehealth Better Used to Treat Patients or Help Other Physicians Treat Patients? An Agent-Based Modeling Study of Healthcare Provision · Michael Chary · arXiv · Jun 7, 2026
Telehealth, the delivery of medical care remotely, is hoped to increase access to specialty services or decrease health care utilization. Physicians can provide telehealth to each other or to patients. Specialists often treat complex patien…
- Quantitative Promise Theory: Intentionality and Inference in Autonomous Agents · Mark Burgess · arXiv · Jun 7, 2026
I discuss some quantitative representations of Promise Theory for processes involving autonomous agents. Agent models are common in software systems, machine learning, and biology, for example, but may also apply to physics and other forms …
- The Consistency Illusion: How Multi-Agent Debate Hides Reasoning Misalignment · Xiaoyang Wang, Christopher C. Yang · arXiv · Jun 7, 2026
Multi-agent LLM systems for medical question answering often treat consensus as a reliability signal: if multiple agents agree on an answer, it is presumed trustworthy. However, answer-level consensus does not entail reasoning-level alignme…
- SceneConductor: 3D Scene Generation from Single Image with Multi-Agent Orchestration · Jeonghwan Kim, Yushi Lan, Yongwei Chen, Hieu Trung Nguyen et al. · arXiv · Jun 7, 2026
Generating complete 3D scenes from a single image requires inferring globally consistent geometry, object relationships, and environmental context from inherently ambiguous visual evidence. Despite recent progress in joint layout-and-mesh g…
- Co-GLANCE: Uncertainty-Aware Active Perception for Heterogeneous Robot Teaming · Michal P. Podolinsky, Neel P. Bhatt, Pranay Samineni, Rohan Siva et al. · arXiv · Jun 7, 2026
Perceptual uncertainty is a central challenge for heterogeneous robot teams operating in unstructured outdoor environments, where no single viewpoint affords reliable scene understanding. Perceptual uncertainty, arising from sources such as…
- Emergence World: A Platform for Evaluating Long-Horizon Multi-Agent Autonomy · Deepak Akkil, Ravi Kokku, Karthik Vikram, Tamer Abuelsaad et al. · arXiv · Jun 6, 2026
Most evaluations of LLM agents look like exams: a discrete task, a clean environment, a score in minutes or hours. We argue that this approach is mismatched with the deployment conditions of autonomous systems, where the relevant timescale …
- Benchmarking Open-Ended Multi-Agent Coordination in Language Agents · Kale-ab Abebe Tessera, Andras Szecsenyi, Cameron Barker, Alexander Rutherford et al. · arXiv · Jun 6, 2026
As language models are increasingly deployed as autonomous agents, they must coordinate with others over long horizons in open-ended interactive tasks. Yet existing evaluations rarely test these demands together, instead emphasising single-…
- To Nuke or Not to Nuke: LLMs' (Missing) Ethical Reasoning and Actions in a High-Stakes Decision-Making Simulation · John Chen, Sihan Cheng, Can Gurkan, H M Abdul Fattah · arXiv · Jun 6, 2026
Large language models (LLMs) are increasingly deployed as long-horizon agents with decision-making capacities. While LLMs can show ethical competence on dilemmas such as trolley problems, this competence may not translate to complex, agenti…
- Toward Human-Centered Multi-Agent Systems: Integrating Cognition, Culture, Values, and Cooperation in AI Agents · Safia Baloch, Rahemeen Khan · arXiv · Jun 6, 2026
The emergence of large language model (LLM)-based agents and multi-agent systems has enabled a shift from narrow task automation to more autonomous decision-making. Despite progress in language generation, planning, tool use, and coordinati…
- Silent Failure in LLM Agent Systems: The Entropy Principle and the Inevitable Disorder of Autonomous Agents · Dexing Liu · arXiv · Jun 6, 2026
Large Language Model (LLM) agent systems suffer from failures that occur without external triggers -- no injection, no adversarial input, no resource exhaustion. These silent failures -- unexpected deviations from intended behavior under no…
- PACE: Anytime-Valid Acceptance Tests for Self-Evolving Agents · Zayx Shawn · arXiv · Jun 6, 2026
Self-evolving agents improve by repeatedly proposing changes to their own prompts, skills, or workflows and keeping those that score higher on a small held-out set. Almost all effort has gone into the proposer that generates candidates; we …
- Continual Quadruped Robots Coordination via Semantic Skill Discovery · Daoqing Wang, Yuchen Xiao, Weixuan Huang, Zhilong Zhang et al. · arXiv · Jun 6, 2026
Multi-quadruped coordination has attracted increasing attention due to its enhanced payload capacity, broader contact coverage, and improved adaptability to challenging tasks. Existing methods for multi-quadruped manipulation typically focu…
- SKILL.nb: Selective Formalization and Gated Execution for Durable Agent Workflows · Amine El Hattami, Nicolas Chapados, Christopher Pal · arXiv · Jun 6, 2026
AI agents increasingly turn past experience into reusable artifacts such as code, workflows, and procedural memories. Reuse can improve efficiency, but it also creates a lifecycle reliability problem: artifacts that succeed once may fail un…
- Voting Protocols as Coordination Mechanisms for Role-Constrained Multi-Agent Tutoring Systems · Eric S. Qiu, Joyce Gill · arXiv · Jun 6, 2026
Agentic tutoring systems introduce a coordination challenge: multiple agents may propose different but reasonable interventions, yet only one response can be delivered to the learner. In this paper, we study how voting protocols shape coope…