PonderV2: Pave the Way for 3D Foundation Model with A Universal Pre-training Paradigm

Source: Shanghai Artificial Intelligence Laboratory | 2023-10-25

Haoyi Zhu¹,⁴∗, Honghui Yang¹,³∗, Xiaoyang Wu¹,²∗, Di Huang¹∗, Sha Zhang¹,⁴, Xianglong He¹, Tong He¹†, Hengshuang Zhao², Chunhua Shen³, Yu Qiao¹, Wanli Ouyang¹

¹Shanghai Artificial Intelligence Laboratory ²The University of Hong Kong ³Zhejiang University ⁴University of Science and Technology of China

Abstract

In contrast to numerous NLP and 2D computer vision foundational models, learning a robust and highly generalized 3D foundational model poses considerably greater challenges. This is primarily due to the inherent data variability and the diversity of downstream tasks. In this paper, we introduce a comprehensive 3D pre-training framework designed to facilitate the acquisition of efficient 3D representations, thereby establishing a pathway to 3D foundational models. Motivated by the fact that informative 3D features should be able to encode rich geometry and appearance cues that can be utilized to render realistic images, we propose a novel universal paradigm to learn point cloud representations by differentiable neural rendering, serving as a bridge between 3D and 2D worlds. We train a point cloud encoder within a devised volumetric neural renderer by comparing the rendered images with the real images. Notably, our approach demonstrates the seamless integration of the learned 3D encoder into diverse downstream tasks. These tasks encompass not only high-level challenges such as 3D detection and segmentation but also low-level objectives like 3D reconstruction and image synthesis, spanning both indoor and outdoor scenarios. We also show that the proposed universal methodology can pre-train a 2D backbone, surpassing conventional pre-training methods by a large margin. For the first time, PonderV2 achieves state-of-the-art performance on 11 indoor and outdoor benchmarks. The consistent improvements in various settings imply the effectiveness of the proposed method. Code and models will be made available at https://github.com/OpenGVLab/PonderV2.
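The rendering-based objective described in the abstract can be illustrated with a minimal sketch. It is not the PonderV2 renderer: it assumes the standard density-based emission-absorption model of volume rendering, and the names `composite_ray` and `photometric_loss` are hypothetical. In pre-training, the per-sample densities and colors would be decoded from features produced by the point cloud encoder; here they are plain arrays, so the sketch only shows how samples along one ray are composited into a pixel and compared with the real image.

```python
import numpy as np

def composite_ray(sigma, color, delta):
    """Alpha-composite samples along one ray (emission-absorption model).

    sigma: (N,) non-negative densities at the samples (assumed inputs)
    color: (N, 3) RGB values at the samples (assumed inputs)
    delta: (N,) or scalar spacing between consecutive samples
    """
    # Opacity contributed by each ray segment.
    alpha = 1.0 - np.exp(-sigma * delta)
    # Transmittance: fraction of light that reaches sample i unoccluded.
    trans = np.concatenate([[1.0], np.cumprod(1.0 - alpha)[:-1]])
    weights = trans * alpha
    # Rendered pixel color is the weighted sum of sample colors.
    rgb = (weights[:, None] * color).sum(axis=0)
    return rgb, weights

def photometric_loss(rendered, target):
    """Mean squared error between a rendered pixel and the real pixel."""
    return float(np.mean((rendered - target) ** 2))

# Toy ray: empty space, then an opaque green surface, then a hidden blue sample.
sigma = np.array([0.0, 1e9, 5.0])
color = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]])
rgb, weights = composite_ray(sigma, color, delta=1.0)
loss = photometric_loss(rgb, np.array([0.0, 1.0, 0.0]))
```

Because every step is differentiable, a loss of this kind can in principle be backpropagated through the renderer to the encoder that predicts the densities and colors, which is the sense in which rendering bridges 3D features and 2D images.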
