Latest Robotics Research Papers
The newest Robotics papers from across the field — arXiv, NeurIPS, CVPR, Nature, and more — refreshed daily and ranked by relevance. Distill AI tracks Robotics so you don’t have to: get the standout work delivered to your inbox every morning, with 2-sentence summaries and the option to chat with any paper.
Recent papers
- RoadSceneVQA: Benchmarking Visual Question Answering in Roadside Perception Systems for Intelligent Transportation System · AAAI 2026 · Dec 31, 2026
Current roadside perception systems mainly focus on instance-level perception, which falls short in enabling interaction via natural language and reasoning about traffic behaviors in context. To bridge this gap, we introduce RoadSceneVQA, a …
- TacForeSight: Force-Guided Tactile World Model for Contact-Rich Manipulation · Yujie Zang, Yuhang Zheng, Xian Nie, Yupeng Zheng et al. · arXiv · Jun 9, 2026
Contact-rich manipulation requires robots to continuously perceive and regulate evolving physical interactions under dynamic contact transitions or complex surface geometries. Recent imitation learning methods improve contact-aware control …
- JOIN: Anchor-Grasp-Conditioned Joining via Opposition, Inference, and Navigation for Bimanual Assistive Manipulation · Drake Moore, Matt Cheng, Xiang Zhi Tan, Taşkın Padır · arXiv · Jun 9, 2026
Assistive mobility and manipulation platforms have received increasing attention as a means of restoring independence to individuals with disabilities. While effective for many basic activities of daily living (ADLs), a significant percenta…
- EM-Fall: Embodied mmWave Sensing for Day-and-Night Fall Detection on Humanoid Robots · Yanshuo Lu, Yuxuan Hu, Shenghai Yuan, Xinyu Zhou et al. · arXiv · Jun 9, 2026
Falls are one of the leading causes of injury and hospitalization among elderly individuals, making reliable fall awareness an essential capability for safety monitoring in residential environments. However, existing fall detection systems …
- RoboNaldo: Accurate, Stable and Powerful Humanoid Soccer Shooting via Motion-Guided Curriculum Reinforcement Learning · Yichao Zhong, Yidan Lu, Yuhang Lu, Tianyang Tang et al. · arXiv · Jun 9, 2026
Elite humanoid soccer shooting requires whole-body stability, high-impulse whole-body interactions, and accuracy to targets. Motion tracking-driven reinforcement learning (RL) provides stability in whole-body movement coordination, but a fi…
- A Distributed Multi-UGV Exploration Framework With Loop-Aware Planning and Descriptor-Aided Localization in Resource-Limited Environments · Zhiwei Li, Haiou Liu, Xijun Zhao, Ji Li et al. · arXiv · Jun 9, 2026
Robust and efficient cooperative exploration with multiple unmanned ground vehicles (UGVs) in unknown, GPS-denied, and bandwidth-limited environments without prior maps remains challenging, as localization drift degrades map consistency and …
- Generation of Diverse and Functional Robot Designs using Superquadrics Parametrisation and Quality-Diversity · Leni Le Goff, Simon Smith, Emma Hart · arXiv · Jun 9, 2026
Generative design of robots requires navigating a vast search-space, encompassing physical configurations and behavioural parameters. Evolutionary Algorithms (EAs) have shown promising results, but often converge prematurely to a small set …
- A Spiking Neural Architecture for Coordinating Arm and Locomotor Control · Lea Steffen, Kathryn Simone, Graeme Damberger, Travis DeWolf et al. · arXiv · Jun 9, 2026
Spiking Neural Networks (SNNs) coupled with neuromorphic hardware offer energy-efficient solutions for humanoid robot control. However, existing SNN-based motor control systems address bipedal locomotion and arm control in isolation, leavin…
- Diffusion Forcing Planner: History-Annealed Planning with Time-Dependent Guidance for Autonomous Driving · Zehan Zhang, Neng Zhang, Yaoyi Li, Jia Cai et al. · arXiv · Jun 9, 2026
Learning-based motion planners, despite recent progress, often suffer from temporal inconsistency. Small perturbations across frames can accumulate into unstable trajectories, degrading comfort and safety in closed-loop driving. Several met…
- Multi-UAV Active Sensing with Information Gain-based Planning and Belief Fusion · S. Habibi, L. Marques · arXiv · Jun 9, 2026
Unmanned aerial vehicles (UAVs) are increasingly used for active sensing and information gathering in spatially distributed environments. Their performance, however, is constrained by limited flight time, sensing uncertainty, and the trade-…
- Language-Driven Cost Optimization for Autonomous Driving · Diego Martinez-Baselga, Khaled Mustafa, Javier Alonso-Mora · arXiv · Jun 9, 2026
The driving behavior of autonomous vehicles is typically governed by the cost function of their motion planner, which encodes objectives such as speed tracking, smoothness, lane keeping, and collision avoidance. However, tuning the paramete…
- Resilient Navigation for Autonomous Farm Robots by Leveraging Jerk-Augmented Models with IMU-Only Disturbance Rejection · Batu Candan, Mohammed Atallah, Simone Servadio, Saeed Arabi · arXiv · Jun 9, 2026
Precise state estimation for navigation of autonomous agricultural robots is often compromised by sensor outages (GNSS/LiDAR/Visual) and high-frequency vibrations inherent in off-road environments. This paper proposes a robust navigation al…
- AllDayNav: Lifelong Navigation via Real-World Reinforcement Learning · Hang Yin, Yinan Liang, Jiazhao Zhang, Jiahang Liu et al. · arXiv · Jun 9, 2026
Lifelong embodied navigation in dynamic environments requires robots to form persistent scene understanding from fragmentary observations, which remains difficult for existing methods that rely on explicit maps or scene graphs and struggle …
- Task Robustness via Re-Labelling Vision-Action Robot Data · Artur Kuramshin, Özgür Aslan, Cyrus Neary, Glen Berseth · arXiv · Jun 9, 2026
The recent trend in scaling models for robot learning has resulted in impressive policies that can perform various manipulation tasks and generalize to novel scenarios. However, these policies continue to struggle with following instruction…
- AgniNav: Configuration-Driven Cross-Embodiment Local Planning for Robot Navigation · Tianhao Zang, Siwei Cheng, Haidong Huang, Shanze Wang et al. · arXiv · Jun 9, 2026
Monocular local navigation is attractive for lightweight robots, but existing vision-based policies often couple perception to a specific body, camera height, and footprint, making transfer from wheeled bases to legged platforms dependent o…
- MV-Actor: Aligning Multi-View Semantics and Spatial Awareness for Bimanual Manipulation · Yinchen Tian, Huan Li, Muyao Peng, Xi Wang et al. · arXiv · Jun 9, 2026
Robotic manipulation has been widely applied in industrial scenarios. Compared with single-arm manipulation, bimanual manipulation is equipped with multiple cameras to capture information from different viewpoints. However, existing multi-v…
- Embodiment-conditioned Generalist Control for Multirotor Aerial Robots · Orestis Konstantaropoulos, Welf Rehberg, Mihir Kulkarni, Kostas Alexis · arXiv · Jun 9, 2026
We present a generalist position control policy capable of controlling arbitrary multirotor configurations of a certain rotor count (e.g., hexarotors or quadrotors) with a single set of network weights. The policy is conditioned on a physic…
- An Exposure-Time-Aligned Primary-Path Architecture for Autonomous-Driving ECUs · Toru Saito, Yuki Hagura, Tatsuya Konishi, Satoru Mizusawa et al. · arXiv · Jun 9, 2026
While end-to-end (E2E) autonomous driving has become the dominant research direction, production vehicles continue to rely on modular multi-NN pipelines for a non-trivial transitional period. The subject of this paper is the design of an ar…
- Gradient based Bilevel for Inverse Optimal Control, a Riemannian approach · Ahmed-Manaf Dahmani, Vincent Bonnet, David Daney, François Charpillet · arXiv · Jun 9, 2026
Inverse Optimal Control (IOC) aims to recover the cost function that explains observed trajectories as solutions of an optimal control problem. Classical IOC formulations rely on bilevel optimization, which repeatedly solves a nested optima…
- GUIDE: Goal-Initialized Directional Understanding for End-to-End Visual Navigation · Liang Wang, Jin Jin, KanZhong Yao, YiBin Wu et al. · arXiv · Jun 9, 2026
Learning-based visual navigation for legged robots typically relies on continuous goal updates from hierarchical state estimation to provide a persistent directional reference. This reliance incurs additional sensory and computational overh…
- IMPACT: Learning Internal-Model Predictive Control for Forceful Robotic Manipulation · Jiawei Gao, Chaoqi Liu, Peilin Wu, Haonan Chen et al. · arXiv · Jun 9, 2026
Real-world robotic manipulation tasks often involve forceful interactions with the environment, such as using tools of varying weights, transporting objects with different masses, and performing contact-rich tasks like table wiping. Previou…
- Bridging Semantics and Physical Execution: A Neuro-Symbolic Framework for Multi-Pair Robotic Assembly · Xinyi Li, Aiguo Song, Linhu Wei, Huijun Li · arXiv · Jun 9, 2026
Multi-pair robotic assembly in unstructured environments faces spatial interference and contact uncertainties. Existing paradigms fail to bridge cognitive decision-making and physical execution, as they either encounter state-space explosio…
- MemoryVLA++: Temporal Modeling via Memory and Imagination in Vision-Language-Action Models · Hao Shi, Weiye Li, Bin Xie, Yulin Wang et al. · arXiv · Jun 8, 2026
Temporal modeling is essential for robotic manipulation, as effective control requires both memory of past interactions and imagination of future states. However, most VLA models rely primarily on the current observation and therefore strug…
- iMaC: Translating Actions into Motion and Contact Images for Embodied World Models · Zhenyu Wu, Xiuwei Xu, Yukun Zhou, Yifan Li et al. · arXiv · Jun 8, 2026
Embodied world models have emerged as a pivotal paradigm for visual robotic decision-making and interactive environment simulation. However, conventional embodied frameworks rely on low-dimensional structured action vectors (e.g., joint ang…
- AHA-WAM: Asynchronous Horizon-Adaptive World-Action Modeling with Observation-Guided Context Routing · Jisong Cai, Long Ling, Shiwei Chu, Zhongshan Liu et al. · arXiv · Jun 8, 2026
World-action models have emerged as a promising paradigm for robot manipulation, jointly modeling visual scene dynamics and actions to inject physical priors into policy learning. However, existing world-action models couple world predictio…
- SynManDex: Synthesizing Human-like Dexterous Grasps from Synthetic Human Pre-Grasps · Yanming Shao, Zanxin Chen, Wenwei Lin, Mingjie Zhou et al. · arXiv · Jun 8, 2026
Human hand-object interactions encode functional intent, but direct transfer to robotic hands often fails under morphology, contact, and reachability constraints. We present SynManDex, a synthetic pipeline that uses generated human pre-gras…
- AetheRock: An Arm-Worn Robot Teaching System for Force-Guided Vision-Tactile Learning · Hong Li, Yue Xu, Yihan Tang, Yankang Dong et al. · arXiv · Jun 8, 2026
Force and tactile sensing are indispensable in contact-rich manipulation. However, force-aware robot learning faces critical challenges due to the incompatible assembly of tactile and force sensors in handheld or wearable devices. To addres…
- Difference-Aware Retrieval Policies for Imitation Learning · Quinn Pfeifer, Ethan Pronovost, Paarth Shah, Khimya Khetarpal et al. · arXiv · Jun 8, 2026
Parametric imitation learning via behavior cloning can suffer from poor generalization to out-of-distribution states due to compounding errors during deployment. We show that reusing the training data during inference via a semi-parametric …
- Your Model Already Knows: Attention-Guided Safety Filter for Vision-Language-Action Models · Seongbin Park, Fan Zhang, Baharan Mirzasoleiman, Shahriar Talebi et al. · arXiv · Jun 8, 2026
Vision-Language-Action (VLA) models have demonstrated impressive end-to-end performance across a variety of robotic manipulation tasks. However, these policies offer no guarantees against collisions with task-irrelevant objects in the scene…
- ProbeAct: Probe-Guided Training-Free Failure Recovery in Vision-Language-Action Models · Fan Zhang, Seongbin Park, Baharan Mirzasoleiman, Shariar Talebi et al. · arXiv · Jun 8, 2026
Vision-Language-Action (VLA) models demonstrate strong performance on language-conditioned robotic manipulation within their training distribution, yet their generalization capabilities remain fundamentally limited. They lack the rob…