Recovering Parametric Scenes from Very Few Time-of-Flight Pixels
ICCV 2025

Teaser Image

Figure: We introduce a method for recovering the geometry of parametric 3D scenes, such as the 6D pose of a known object, from a distributed set of very few (e.g., 15) diffuse (i.e., wide field-of-view) single-pixel ToF sensors. Methods based on traditional depth sensors perform poorly in this low-pixel-count regime due to their sparse spatial coverage. Our method outperforms a point cloud-based baseline by utilizing the entirety of the data recovered by each diffuse ToF sensor.
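To make the baseline contrast concrete, here is a minimal sketch of the reduction a point cloud pipeline applies to a diffuse pixel's measurement: the full time-resolved histogram is collapsed to a single depth (here, via the peak bin), discarding the rest of the waveform that our method consumes. The function name, arguments, and peak-picking rule are illustrative assumptions, not the paper's API.

```python
import numpy as np

def histogram_to_depth(hist, bin_width_s, c=3e8):
    """Collapse one diffuse pixel's time-resolved photon-count
    histogram to a single depth, as a point cloud-based baseline
    would (hypothetical sketch; peak-bin picking is an assumption)."""
    peak = int(np.argmax(hist))
    # round-trip time at the bin center, halved to get one-way depth
    return 0.5 * c * (peak + 0.5) * bin_width_s
```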

Abstract

We aim to recover the geometry of 3D parametric scenes using very few depth measurements from low-cost, commercially available time-of-flight sensors. These sensors offer very low spatial resolution (i.e., a single pixel), but each pixel images a wide field of view and captures detailed time-of-flight data in the form of time-resolved photon counts. This time-of-flight data encodes rich scene information and thus enables recovery of simple scenes from sparse measurements. We investigate the feasibility of using a distributed set of few measurements (e.g., as few as 15 pixels) to recover the geometry of simple parametric scenes with a strong prior, such as estimating the 6D pose of a known object. To achieve this, we design a method that utilizes both feedforward prediction to infer scene parameters and differentiable rendering within an analysis-by-synthesis framework to refine the scene parameter estimate. We develop hardware prototypes and demonstrate that our method effectively recovers object pose given an untextured 3D model in both simulations and controlled real-world captures, and we show promising initial results for other parametric scenes. We additionally conduct experiments to explore the limits and capabilities of our imaging solution.
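The following sketch illustrates how a diffuse single-pixel sensor's time-resolved photon counts arise and why they encode scene geometry: every visible surface point within the wide field of view contributes returns at its round-trip time. The radiometric model (Lambertian surfaces, co-located source and detector, 1/r^4 falloff) and all names are simplifying assumptions for illustration, not the paper's exact forward model.

```python
import numpy as np

C = 3e8  # speed of light (m/s)

def simulate_transient(points, normals, sensor_pos, fov_deg=60.0,
                       n_bins=128, bin_width_s=1e-10):
    """Accumulate a time-resolved histogram for one diffuse single-pixel
    ToF sensor, assuming Lambertian points, a co-located emitter and
    detector aimed along +z, and inverse-square falloff each way."""
    hist = np.zeros(n_bins)
    v = points - sensor_pos                       # sensor-to-point vectors
    r = np.linalg.norm(v, axis=1)
    d = v / r[:, None]                            # unit viewing directions
    in_fov = d[:, 2] > np.cos(np.radians(fov_deg / 2))
    cos_theta = np.clip(-(normals * d).sum(axis=1), 0.0, None)  # foreshortening
    flux = cos_theta / np.maximum(r, 1e-6) ** 4                 # round-trip falloff
    t = 2.0 * r / C                                             # round-trip time
    bins = (t / bin_width_s).astype(int)
    valid = in_fov & (bins < n_bins)
    np.add.at(hist, bins[valid], flux[valid])     # bin photon contributions
    return hist
```

Under this toy model, two poses of the same object produce distinct histogram shapes even at a single pixel, which is the information the point cloud reduction above throws away.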

Method

Method Architecture Diagram


We present an analysis-by-synthesis approach. Our method integrates (1) a learning-based feedforward model that predicts an initial estimate of the scene parameters; (2) a differentiable renderer that synthesizes sensor measurements from scene parameters under the parametric model; and (3) an optimization-based refiner that iteratively updates the scene parameters by comparing rendered measurements against captured ones through the differentiable renderer. To address the scarcity of real-world imaging data, we reuse our renderer to generate a large-scale synthetic dataset for training the feedforward model, and we explore its ability to transfer to real-world captures.
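A minimal sketch of step (3), assuming PyTorch and a hypothetical `render_hists` callable standing in for the differentiable renderer; the MSE objective and Adam optimizer are illustrative choices, not necessarily those used in the paper.

```python
import torch

def refine_pose(pose_init, measured_hists, render_hists,
                n_iters=200, lr=1e-2):
    """Analysis-by-synthesis refinement (sketch): starting from the
    feedforward estimate, optimize scene parameters so rendered
    histograms match measured ones. `render_hists` must be
    differentiable with respect to the pose parameters."""
    pose = pose_init.detach().clone().requires_grad_(True)  # e.g., translation + axis-angle
    opt = torch.optim.Adam([pose], lr=lr)
    for _ in range(n_iters):
        opt.zero_grad()
        pred = render_hists(pose)        # (n_sensors, n_bins) synthetic measurements
        loss = torch.nn.functional.mse_loss(pred, measured_hists)
        loss.backward()                  # gradients flow through the renderer
        opt.step()
    return pose.detach()
```

The same renderer, driven with randomly sampled scene parameters, can also produce the synthetic training set for the feedforward model mentioned above.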

Results: 6D Pose Estimation with 15 ToF Pixels

6D Pose Estimation Results on 3D Printed Objects
6D Pose Estimation Results on YCB Objects

Results: Hand Pose Estimation with 8 ToF Pixels

Hand Pose Estimation Results

Citation


The website template was borrowed from Michaël Gharbi.