Latest Face Recognition Research Papers
The newest Face Recognition papers from across the field — arXiv, NeurIPS, CVPR, Nature, and more — refreshed daily and ranked by relevance. Distill AI tracks Face Recognition so you don’t have to: get the standout work delivered to your inbox every morning, with 2-sentence summaries and the option to chat with any paper.
Recent papers
- A History-Aware Visually Grounded Critic for Computer Use Agents · Jaewoo Lee, Zaid Khan, Archiki Prasad, Justin Chih-Yao Chen et al. · arXiv · Jun 9, 2026
Various test-time interventions for Computer Use Agents (CUAs), including critic models, have been developed to improve performance through pre-execution action evaluation in complex Graphical User Interface (GUI) environments. However, exi…
- IPSM-Bench: A New Intermediate Phase Segmentation Benchmark in Microstructure Images of Zinc-Based Absorbable Biomaterials · Jinglin Xu, Shangyan Zhao, Jiabo Wang, Xinghong Mu et al. · arXiv · Jun 9, 2026
Zinc-based alloys are indispensable emerging absorbable metallic biomaterials, and their macroscopic performance is governed by microstructural characteristics. Intermediate phases, key microstructural constituents, are pivotal in regulating …
- End-to-End Optimization of Incoherent Imaging for Classification Under Detector-Limited Readout · Archer Wang, Joshua Chen, Sachin Vaidya, Marin Soljačić · arXiv · Jun 8, 2026
End-to-end co-optimization of optical front-ends (e.g. metasurfaces) and neural network back-ends has been widely applied to imaging tasks, yet a formalism characterizing when and why such systems outperform conventional lens-based imaging …
- Cranio-Diff: Diffusion-based Cross-domain Craniofacial Reconstruction with 2D X-ray Skull Guidance and Structural Identity Constraints · Ravi Shankar Prasad, Naresh Gurjar, Shashank Baghel, Chirag et al. · arXiv · Jun 8, 2026
State-of-the-art generative models, such as CycleGAN, Pix2Pix, and diffusion models, have demonstrated remarkable performance in the face generation task. However, they fail to effectively capture cross-modality semantic information in c…
- Visual Prompting Meets Feature Reconstruction-Based Anomaly Detection with Dual-Teacher Supervision · Mateo Diaz-Bone, Daniel Caraballo, Florian Scheidegger, Thomas Frick et al. · arXiv · Jun 8, 2026
Recent Anomaly Detection methods achieve perfect detection and segmentation scores on well-established datasets, such as MVTec. However, many of these methods face challenges when foundational assumptions - such as consistent object scale, …
- StoryVideoQA: Scaling Deep Video Understanding with a Large-Scale, Multi-Genre and Auto-Generated Dataset · Zhengqian Wu, Zhixian Liu, Aodong Chen, Jingyang Zhang et al. · IJCV · Jun 4, 2026
Video question answering (VideoQA) aims to answer questions about given videos. While existing approaches excel on factoid VideoQA, they struggle with deep video understanding (DVU), which requires the comprehension of complex storylines. T…
- AdaCodec: A Predictive Visual Code for Video MLLMs · Haowen Hou, Zhen Huang, Zheming Liang, Qingyi Si et al. · arXiv · Jun 1, 2026
Video is temporally redundant: adjacent frames usually share most objects, background, and layout. Yet existing video multimodal large language models (video MLLMs) usually encode each sampled frame as an independent RGB image, causing visu…
- Modeling Depth Ambiguity: A Mixture-Density Representation for Flying-Point-Free Depth Estimation · Siyuan Bian, Congrong Xu, Jun Gao · arXiv · Jun 1, 2026
Despite advances in depth estimation, flying points remain a persistent failure mode: near object boundaries, depth estimators often predict spurious 3D points in the empty space between foreground and background surfaces. We trace this art…
- AFUN: Towards an Affordance Foundation Model for Functionality Understanding · Zhaoning Wang, Yi Zhong, Jiawei Fu, Henrik I. Christensen et al. · arXiv · Jun 1, 2026
Affordance understanding bridges visual perception and physical action, serving as an explainable interface for robot manipulation in open and unstructured real-world environments. Yet, building an affordance foundation model that not only …
- Not All Points Are Equal: Uncertainty-Aware 4D LiDAR Scene Synthesis · Xiang Xu, Alan Liang, Youquan Liu, Xian Sun et al. · arXiv · Jun 1, 2026
Constructing faithful 4D worlds from LiDAR-acquired sequences is crucial for embodied AI, yet current generative frameworks apply uniform modeling capacity across all spatial regions. This ignores that perceptual difficulty varies dramatica…
- City-Mesh3R: Simulation-Ready City-Scale 3D Mesh Reconstruction from Multi-View Images · Sayan Paul, Sourav Ghosh, Siddharth Katageri, Soumyadip Maity et al. · arXiv · May 28, 2026
City-scale 3D surface reconstruction from multiview images for downstream 3D simulation poses highly challenging problems due to the scale and complexity of urban scenes. Existing city-scale 3D reconstruction methods based on NeRF, Gaussia…
- Self-Prophetic Decoding to Unlock Visual Search in LVLMs · Zhendong He, Qiyuan Dai, Guanbin Li, Liang Lin et al. · arXiv · May 27, 2026
Large Vision-Language Models (LVLMs) are rapidly evolving toward true multimodal reasoning, with visual search representing a concrete instantiation of the thinking-with-images paradigm. However, LVLM visual search faces two key challenges:…
- SeeGroup: Multi-Layer Depth Estimation of Transparent Surfaces via Self-Determined Grouping · Hongyu Wen, Jia Deng · arXiv · May 27, 2026
Transparent objects are common in daily life, and it is important to understand their multilayer depth, including the transparent surface and the objects behind it. Existing methods for multilayer depth typically extend single-layer predict…
- Measuring Frame Evolution: Smoothed Temporal Framing Trajectories in Complex Policy Debates · Philip Leifeld, Kristijan Garic · JCMS Journal of Common Mark... · May 27, 2026
The European Union faces long-term governance challenges in contested domains, such as migration management, health data sharing, and facial recognition technology. Across these fields, political debates are shaped by shifting ways…
- TriSplat: Simulation-Ready Feed-Forward 3D Scene Reconstruction · Weijie Wang, Zimu Li, Jinchuan Shi, Zeyu Zhang et al. · arXiv · May 25, 2026
Sparse-view 3D reconstruction is increasingly addressed with feed-forward splatting networks that predict explicit primitives directly from images. Yet most existing methods remain centered on Gaussian primitives and expose surfaces only in…
- Helix4D: Complex 4D Mesh Generation · Jiraphon Yenphraphai, Jianqi Chen, Jian Wang, Gordon Qian et al. · arXiv · May 25, 2026
Current video-to-4D methods struggle with complex topology changes, transparent materials, thin structures, and inner surfaces. We present Helix4D, a dynamic mesh generation framework by inheriting the expressive representation of Trellis2,…
- InstructSAM: Segment Any Instance with Any Instructions · Yuqian Yuan, Wentong Li, Zhaocheng Li, Yutong Lin et al. · arXiv · May 25, 2026
In this paper, we introduce InstructSAM, a unified and streamlined framework designed for multi-instance segmentation under arbitrary instructions. We formulate instruction-driven instance segmentation as a set-structured query prediction …
- Look Both Ways Before You Cross: Lifting Cross Fields From 2D Visual Priors · Dale Decatur, Jacob Serfaty, Oded Stein, Amir Vaxman et al. · arXiv · May 25, 2026
We present CrossLift, a technique for computing cross fields on meshes guided by visual features in images. We leverage powerful text-to-image priors that are capable of synthesizing images of feature-aligned quad meshes in 2D. We extract t…
- Synthetic Data Alone is Enough? Rethinking Data Scarcity in Pediatric Rare Disease Recognition · Ganlin Feng, Yuxi Long, Erin Lou, Lianghong Chen et al. · arXiv · May 21, 2026
Children with rare genetic diseases often exhibit distinctive facial phenotypes, yet developing computer vision systems for early diagnosis remains challenging due to extreme data scarcity, privacy constraints, and limited data sharing in p…
- EgoExoMem: Cross-View Memory Reasoning over Synchronized Egocentric and Exocentric Videos · Ruiping Liu, Junwei Zheng, Yufan Chen, Di Wen et al. · arXiv · May 18, 2026
Egocentric memory is widely used in embodied intelligence, but it may be insufficient for comprehensive spatial-temporal reasoning. Inspired by human recall from both field and observer perspectives, we introduce EgoExoMem, the first benchm…
- Better Together: Evaluating the Complementarity of Earth Embedding Models · Thijs L van der Plas, Jacob JW Bakermans, Vishal Nedungadi, Gabrielė Tijūnaitytė et al. · arXiv · May 18, 2026
Earth embedding models transform Earth observation data into embeddings uniquely tied to locations on the Earth's surface. These models are typically evaluated in isolation, comparing the downstream task performance across different Earth e…
- MementoGUI: Learning Agentic Multimodal Memory Control for Long-Horizon GUI Agents · Ziyun Zeng, Hang Hua, Bocheng Zou, Mu Cai et al. · arXiv · May 18, 2026
Recent GUI agents have made substantial progress in visual grounding and action prediction, yet they remain brittle in long-horizon tasks that require maintaining task state across many interface transitions. Existing agents typically rely …
- CoralLite: μCT Reconstruction of Coral Colonies from Individual Corallites · Jess Jones, Leonardo Bertini, Kenneth Johnson, Erica Hendy et al. · arXiv · May 14, 2026
The life history of an individual coral is archived within the accreting skeleton of the colony. While reef-forming coral colonies (e.g. massive Porites sp.) may live for hundreds of years and deposit calcareous structures many metre…
- Revisiting Photometric Ambiguity for Accurate Gaussian-Splatting Surface Reconstruction · Jiahe Li, Jiawei Zhang, Xiao Bai, Jin Zheng et al. · arXiv · May 12, 2026
Surface reconstruction with differentiable rendering has achieved impressive performance in recent years, yet the pervasive photometric ambiguities have strictly bottlenecked existing approaches. This paper presents AmbiSuR, a framework tha…
- CAD-feature enhanced machine learning for manufacturing effort estimation on sheet metal bending parts · Matteo Ballegeer, Toon Van Camp, Willem Jaspers, Alp Bayar et al. · arXiv · May 12, 2026
Graph-based machine learning has emerged as a promising approach for manufacturability analysis by learning directly from CAD models represented as Boundary Representations (B-reps), exploiting both surface geometry and topological connecti…
- BabelDOC: Better Layout-Preserving PDF Translation via Intermediate Representation · Qi Yang, Xiangyao Ma, Xiao Wang, Hao Wang et al. · arXiv · May 11, 2026
As global cross-lingual communication intensifies, language barriers in visually rich documents such as PDFs remain a practical bottleneck. Existing document translation pipelines face a tension between linguistic processing and layout pres…
- PhyGround: Benchmarking Physical Reasoning in Generative World Models · Juyi Lin, Arash Akbari, Yumei He, Lin Zhao et al. · arXiv · May 11, 2026
Generative world models are increasingly used for video generation, where learned simulators are expected to capture the physical rules that govern real-world dynamics. However, evaluating whether generated videos actually follow these rule…
- Framing Artificial Intelligence: Public Discourse on Facial Recognition in the European Union and the United States · Kerem Öge, Manuel Quintin · JCMS Journal of Common Mark... · May 11, 2026
To what extent is AI regulation influenced by frames and discourse coalitions? To address this question, we use complex systems and framing theories to analyse public discourse on facial recognition in the European Union (EU) and t…
- EmambaIR: Efficient Visual State Space Model for Event-guided Image Reconstruction · Wei Yu, Yunhang Qian · arXiv · May 8, 2026
Recent event-based image reconstruction methods predominantly rely on Convolutional Neural Networks (CNNs) and Vision Transformers (ViTs) to process complementary event information. However, these architectures face fundamental limitations:…
- MoCoTalk: Multi-Conditional Diffusion with Adaptive Router for Controllable Talking Head Generation · Xinyan Ye, Jiankang Deng, Abbas Edalat · arXiv · May 8, 2026
Talking-head generation requires joint modeling of identity, head pose, facial expression, and mouth dynamics. Existing methods typically address only a subset of these factors, and rely on fixed-weight or heuristic fusion when multiple con…