POCO: 3D Pose and Shape Estimation with Confidence

Open Access
Authors
  • S.K. Dwivedi
  • C. Schmid
  • H. Yi
  • M.J. Black
Publication date 2024
Book title 2024 International Conference on 3D Vision
Book subtitle 3DV 2024: 18-21 March 2024, Davos, Switzerland: proceedings
ISBN
  • 9798350362466
ISBN (electronic)
  • 9798350362459
Event 11th International Conference on 3D Vision
Pages (from-to) 85-95
Publisher Piscataway, NJ: IEEE Computer Society
Organisations
  • Faculty of Science (FNWI) - Informatics Institute (IVI)
Abstract
The regression of 3D Human Pose and Shape (HPS) from an image is becoming increasingly accurate. This makes the results useful for downstream tasks like human action recognition or 3D graphics. Yet, no regressor is perfect, and accuracy can be affected by ambiguous image evidence or by poses and appearance that are unseen during training. Most current HPS regressors, however, do not report the confidence of their outputs, meaning that downstream tasks cannot differentiate accurate estimates from inaccurate ones. To address this, we develop POCO, a novel framework for training HPS regressors to estimate not only a 3D human body but also its confidence, in a single feed-forward pass. Specifically, POCO estimates both the 3D body pose and a per-sample variance. The key idea is to introduce a Dual Conditioning Strategy (DCS) for regressing uncertainty that is highly correlated with pose reconstruction quality. The POCO framework can be applied to any HPS regressor, and here we evaluate it by modifying HMR, PARE, and CLIFF. In all cases, training the network to reason about uncertainty helps it learn to estimate 3D pose more accurately. While this was not our goal, the improvement is modest but consistent. Our main motivation is to provide uncertainty estimates for downstream tasks; we demonstrate this in two ways: (1) We use the confidence estimates to bootstrap HPS training. Given unlabeled image data, we take the confident estimates of a POCO-trained regressor as pseudo ground truth. Retraining with this automatically curated data improves accuracy. (2) We exploit uncertainty in video pose estimation by automatically identifying uncertain frames (e.g., due to occlusion) and inpainting these from confident frames.
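The per-sample variance and bootstrapping ideas in the abstract can be illustrated with a generic heteroscedastic-regression sketch. The loss below is the standard Gaussian negative log-likelihood with a learned log-variance (not POCO's actual Dual Conditioning Strategy, whose details are in the paper); the function names, the toy data, and the confidence threshold are all hypothetical.

```python
import numpy as np

def gaussian_nll(pred_pose, gt_pose, log_var):
    """Heteroscedastic Gaussian NLL: the predicted per-sample variance
    down-weights the squared pose error, while the +log_var term keeps
    the network from claiming high uncertainty everywhere.
    (Generic sketch; not POCO's exact formulation.)"""
    sq_err = np.sum((pred_pose - gt_pose) ** 2, axis=-1)
    return np.exp(-log_var) * sq_err + log_var

def select_pseudo_labels(poses, log_vars, threshold):
    """Bootstrapping step on unlabeled data: keep only confident
    estimates (low predicted variance) as pseudo ground truth
    for retraining.  The threshold here is illustrative."""
    keep = log_vars < threshold
    return poses[keep]

# Toy example: sample 0 is accurate and confident, sample 1 is neither.
gt = np.zeros((2, 3))
pred = np.array([[0.01, 0.0, 0.0],
                 [0.50, 0.5, 0.5]])
log_var = np.array([-2.0, 1.0])  # low predicted variance for sample 0

losses = gaussian_nll(pred, gt, log_var)          # sample 0 gets lower loss
confident = select_pseudo_labels(pred, log_var, threshold=0.0)  # keeps sample 0
```

A well-calibrated model minimizes this loss only when its predicted variance tracks its actual error, which is why the confident subset is a reasonable source of pseudo labels.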
Document type Conference contribution
Note With supplemental items
Language English
Published at https://doi.org/10.48550/arXiv.2308.12965 and https://doi.org/10.1109/3DV62453.2024.00115
Project page https://poco.is.tue.mpg.de/
Other links https://www.proceedings.com/74990.html