Latest 3D Vision & NeRFs Research Papers

The newest 3D Vision & NeRFs papers from across the field — arXiv, NeurIPS, CVPR, Nature, and more — refreshed daily and ranked by relevance. Distill AI tracks 3D Vision & NeRFs so you don’t have to: get the standout work delivered to your inbox every morning, with 2-sentence summaries and the option to chat with any paper.

Recent papers

First Assessment of the Capabilities of BIOMASS Tomographic Data for 3D Forest Structure Mapping
Matteo Pardini, Roman Guliaev, Noelia Romero Puig, Konstantinos Papathanassiou · elib (German Aerospace Center) · Aug 1, 2026
Synthetic Aperture Radar Tomography (TomoSAR) reconstructs vertical profiles of the full 3D distribution of the backscattered radar power (or reflectivity) in natural media from SAR images acquired with a slight variation of look-angle [1].…
A deep learning workflow for 3D segmentation and reconstruction of cell lines from confocal microscopy for Monte Carlo Radiobiology Simulations
Hamid Ladjal, Binjamyn Mairesse, Gersende Alphonse, Céline Malesys et al. · HAL (Le Centre pour la Comm... · Jul 26, 2026
3D-Aware VLMs with Implicit and Explicit Geometries
Wenhao Li, Xueying Jiang, Quanhao Qian, Deli Zhao et al. · arXiv · Jul 23, 2026
Despite rapid progress, most existing vision-language models (VLMs) built from 2D visual inputs often struggle when handling various 3D tasks that require fine-grained spatial understanding and reasoning. To bridge this gap, we present VLM-…
Texture++: Elevating 3D Asset Texture Resolution with a Region-Aware Diffusion Model
Shuaiwei Wang, Shi Li, Jieting Xu, Yuchi Huo et al. · arXiv · Jul 23, 2026
Numerous 3D assets are discarded due to low texture resolution, while current super-resolution models ignore texture maps and focus on natural images. An efficient and generalizable texture super-resolution model can revitalize a large corp…
GrainGS: Gradient-Decoupled Gaussian Splatting for Efficient Dynamic Novel View Synthesis
Jiahao He, Yihua Shao, Zhengkai Zhao, Pan Gao et al. · arXiv · Jul 23, 2026
Dynamic scene reconstruction with 3D Gaussian Splatting requires a balance between fine-grained motion modeling, structural stability, and compact representation. Existing per-primitive methods provide flexible local deformation but often s…
DAPM: UAV Monocular Depth Estimation from Any Height, Pitch, Roll and FOV
Tong Ling, Wenhui Diao, Yingchao Feng, Hanbo Bi et al. · arXiv · Jul 23, 2026
Monocular depth estimation is a fundamental prerequisite for 3D reconstruction and autonomous navigation in Unmanned Aerial Vehicles (UAVs). In practical deployments, UAVs operate under highly dynamic camera poses characterized by continuou…
Learning-based Seam Correspondence Reconstruction in Sewing Patterns
Zhendong Wang, Jintong Wang, Chen Liu, Yao Jin et al. · arXiv · Jul 23, 2026
Digital sewing patterns typically consist of disjoint 2D panels without explicit stitch annotations, making downstream 3D modeling reliant on labor-intensive expert specification. In this paper, we present a graph-based learning framework t…
Loss Landscape Topology Reveals Why Simple Baselines are Competitive at 3D Point Cloud Segmentation Under Class Imbalance
Antonis Savva, Christos Kyrkou, Theocharis Theocharides · arXiv · Jul 23, 2026
Semantic segmentation of 3D point clouds faces severe class imbalance, yet the effectiveness of specialized imbalance-aware methods from 2D computer vision remains unclear in 3D contexts. We systematically evaluate 11 imbalance mitigation a…
Geo3R: Mitigating Spatial Reasoning Hallucination in Multimodal Large Language Models
Mingyu Wang, Weilin Jin, Wenbo Li, Haoyang Huang et al. · arXiv · Jul 23, 2026
Despite remarkable progress in visual understanding, Multimodal Large Language Models (MLLMs) remain prone to hallucinations when reasoning about spatial relationships, often producing judgments that contradict the true 3D structure of the …
WAT3R: Feedforward Underwater 3D Reconstruction
Jiayi Xu, Jiahao Lu, Ziqiang Zheng, Yihao Tan et al. · arXiv · Jul 23, 2026
Reliable feedforward underwater 3D reconstruction remains challenging due to severe light attenuation and backscattering, which degrade visual quality and disrupt feature consistency across views, leading to inaccurate multi-view geometry. …
Sparse Concept Channels in Frozen 3D CT Vision Encoders
Farhad Nooralahzadeh, Lea Bogensperger, Christian Bluethgen, Michael Krauthammer · arXiv · Jul 23, 2026
Large vision-language models are becoming increasingly dominant in 3D medical image interpretation, but we rarely know which internal units encode clinical findings or where that information lives in th…
RECO: Region-Aware Compensation for Extrinsic Perturbations in Roadside 3D Detection
Junsheng Du, Zhaocheng He, Yuhuan Lu · arXiv · Jul 23, 2026
In intelligent transportation systems, roadside 3D object detection provides wide-area perception crucial for traffic understanding, cooperative early warning, and safe autonomous driving. However, existing methods suffer from high sensitiv…
FA-LAM: Focus-Aware Large Avatar Model for One-Shot 4D Animatable Gaussian Head
Yingdong Hu, Yisheng He, Yiming Jiang, Zehong Lin et al. · arXiv · Jul 23, 2026
We propose FA-LAM, a Focus-Aware Large Avatar Model for one-shot animatable Gaussian head creation, while simultaneously enabling static 3D and dynamic 4D full-head recovery. The core of our method lies in a thorough analysis of the attenti…
Engine-Native Editable 3D World Reconstruction with Objects and Lighting
Junhao Chen, Xinghao Chen, Henghaofan Zhang, Zihao Qiao et al. · arXiv · Jul 23, 2026
Editable 3D scene creation requires object instances and lights that can be inspected, moved, and imported into standard engines, yet existing single-image methods largely stop at room-scale geometry, baked/global illumination, or text-driv…
SubSplat: High-Resolution Pixel-aligned 3DGS via Sub-pixel Gaussian Reparameterization
Jiun Lee, Jaekwang Kim, Sangmin Lee · arXiv · Jul 23, 2026
Pixel-aligned Gaussian splatting enables efficient and generalizable novel-view synthesis. However, high-resolution rendering faces a critical trade-off where increasing input resolution improves detail at the expense of quadratically risin…
3D-GIMP: When 3D Gaussian Inpainting Meets PatchMatch
Xuening Tian, Dieter Schmalstieg, Shohei Mori · arXiv · Jul 22, 2026
Recent advances in 3D scene editing have leveraged iterative diffusion models to update input views. However, this process is computationally expensive and struggles to produce sharp details. Meanwhile, "hallucination drift" frequently in…
ATSplat: Compact Feed-forward 3D Gaussian Splatting with Adaptive Token Expansion
Cho In, Jeonghwan Cho, Mijin Yoo, Gim Hee Lee et al. · arXiv · Jul 22, 2026
3D Gaussian Splatting (3DGS) achieves high-quality novel-view synthesis by optimizing freely placed primitives in 3D and adaptively densifying them in under-reconstructed regions. However, this scene-adaptive capacity allocation is largely …
GaussianSeed: Hierarchical Gaussian Seeding for High-Resolution 3D Occupancy Prediction
Xinzhuo Li, Xianghui Pan, Jiayuan Du, Wei Wei et al. · arXiv · Jul 22, 2026
Vision-centric 3D occupancy prediction provides dense scene representations essential for autonomous driving and robotic navigation, yet existing methods struggle to scale to high voxel resolutions due to prohibitive computational costs. To…
A Systematic Benchmark of Intensity Normalisation Methods for 3D Knee MRI Segmentation and Cross-Domain Generalisability
Oliver Mills, Philip Conaghan, Samuel Relton · arXiv · Jul 22, 2026
Robust out-of-the-box performance is essential for the clinical deployment of deep learning models in medical imaging. An important but underexplored factor affecting model generalisability is intensity normalisation, particularly for magne…
STEREOFLOW: Progressive Stereo Matching with StereoDiT and Transition Flow Matching
Hao Wang, Haoran Geng, Xiaotong Yang, Jing Tang et al. · arXiv · Jul 22, 2026
Stereo matching is a fundamental task in 3D reconstruction. Despite remarkable advances, the prevailing paradigms formulate stereo matching as a deterministic regression problem, collapsing the multimodal distribution modeling into a single…
StrokeSeg2: Stroke Lesion Segmentation in Clinical Research Workflows
Youwan Mahé, Axel Plessis, Stéphanie Leplaideur, Elise Bannier et al. · arXiv · Jul 22, 2026
Deep learning frameworks like nnU-Net achieve state-of-the-art brain lesion segmentation performance but remain difficult to deploy in clinical research environments due to, among other reasons, software dependencies and computational requir…
KineBench: Benchmarking Embodied World Models via IDM-Free Kinematic Grounding
Zeyu Liu, Zhangzhe Zhu, Yang Zhang, Chenyou Fan et al. · arXiv · Jul 22, 2026
Evaluating the physical consistency of embodied world models (EWMs) is a critical open challenge. While closed-loop evaluation via simulator rollouts offers a more faithful assessment of physical plausibility than open-loop alternatives, exi…
Look Before You Edit: Attention-Guided Camera Placement and Multi-View Alignment for 3D Gaussian Splatting Editing
Jaeyeon Park, Taeho Kang, Youngki Lee · arXiv · Jul 22, 2026
Text-driven 3D scene editing with 3D Gaussian Splatting (3DGS) typically applies a 2D diffusion editor to views rendered from fixed training cameras, limiting both the spatial coverage of edits and the user's freedom to target specific obje…
Extending a Large View Synthesis Model for Multi-view Panoptic Segmentation
Kwonyoung Ryu, In-Jae Lee, Jonghyun Jin, Hyunjee Lee et al. · arXiv · Jul 22, 2026
Large view synthesis models synthesize novel views through cross-view attention without explicit 3D representations, and recent studies have shown that they learn accurate spatial correspondence from RGB supervision alone. We observe that t…
A Unified Tokenization Framework for Pain Recognition using Heterogeneous 3D Modalities
Stefanos Gkikas, Christian Arzate Cruz, Valentina Becchetti, Muhammad Umar Khan et al. · arXiv · Jul 22, 2026
Pain is a complex and pervasive phenomenon affecting a large percentage of the population, and accurate assessment is essential for effective clinical management and intervention. Computational pain recognition systems enable continuous mon…
Point-Selection Fine-Tuning Framework for Robust Point Cloud Classification
Da Li, Chang Ma, Dongfu Yin · arXiv · Jul 22, 2026
Noisy and corrupted points can substantially degrade point cloud recognition performance, especially under challenging corruption settings. In particular, full fine-tuning of 3D pre-trained models may amplify the influence of outliers and o…
Ethical Challenges of Bioprinted and Lab-Grown Skin Substitutes in Burn Care
Joshua Khorsandi, Brian Mansoury, Abu-Bakr Ahmed, Liahm Blank et al. · Journal of Burn Care & Rese... · Jul 22, 2026
Severe burns continue to impose significant global morbidity and mortality, particularly where surgical and critical-care resources are limited. As survival has improved, attention has shifted toward optimizing reconstruction. Biop…
AUTOMATA - Reference enriched 3D data (D5.3)
Arthur Leck, Clément Joubert · Zenodo (CERN European Organ... · Jul 22, 2026
This deliverable presents the main results of Work Package 5 and the implementation of a software-supported workflow for generating enriched 3D models of archaeological artefacts. The principal result of D5.3 consists of the software compon…
D3VL: Understanding Driving Scenes from 3D Time Series Data and Video with Language Models
Heesang Han, A. Lynn Abbott, Abhijit Sarkar · arXiv · Jul 21, 2026
Recent advances in Multimodal Large Language Models (MLLMs) have triggered the development of end-to-end MLLMs for autonomous driving. However, the main emphasis to date has been for MLLMs using 2D images and videos. In contrast, this paper…
IGGT4D: Streaming 4D Instance-Grounded Geometry Transformer
Zhengyu Zou, Hao Li, Kuixuan Jiao, Liu Liu et al. · arXiv · Jul 21, 2026
Real-world spatial intelligence requires agents to understand scenes from continuous video streams, where objects move, persist, disappear, and reappear over time. While recent spatial foundation models have enabled generalizable feed-forwa…
