BEVFormer v2: Adapting Modern Image Backbones to Bird’s-Eye-View Recognition via Perspective Supervision

Published in: arXiv

Chenyu Yang¹*, Yuntao Chen²*, Hao Tian³*, Chenxin Tao¹, Xizhou Zhu³, Zhaoxiang Zhang²˒⁴, Gao Huang¹,

Hongyang Li⁵, Yu Qiao⁵, Lewei Lu³, Jie Zhou¹, Jifeng Dai¹˒⁵ ✉

¹Tsinghua University; ²Centre for Artificial Intelligence and Robotics, HKISI CAS; ³SenseTime Research; ⁴Institute of Automation, Chinese Academy of Sciences (CASIA); ⁵Shanghai Artificial Intelligence Laboratory

{yangcy19, tcx20}@mails.tsinghua.edu.cn, chenyuntao08@gmail.com, tianhao2@senseauto.com

{zhuwalter, luotto}@sensetime.com, zhaoxiang.zhang@ia.ac.cn

{gaohuang, jzhou, daijifeng}@tsinghua.edu.cn, {lihongyang, qiaoyu}@pjlab.org.cn

Abstract

We present a novel bird’s-eye-view (BEV) detector with perspective supervision, which converges faster and better suits modern image backbones. Existing state-of-the-art BEV detectors are often tied to certain depth-pretrained backbones such as VoVNet, which hinders the synergy between rapidly advancing image backbones and BEV detectors. To address this limitation, we focus on easing the optimization of BEV detectors by introducing perspective-view supervision. To this end, we propose a two-stage BEV detector in which proposals from the perspective head are fed into the bird’s-eye-view head for final predictions. To evaluate the effectiveness of our model, we conduct extensive ablation studies on the form of supervision and the generality of the proposed detector. The proposed method is verified with a wide spectrum of traditional and modern image backbones and achieves new state-of-the-art results on the large-scale nuScenes dataset. The code will be released soon.
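The two-stage idea above (perspective-head proposals feeding the BEV head) can be illustrated with a minimal sketch. It assumes that the perspective head emits 3D box centers with confidence scores in the ego frame, and that the BEV head consumes object queries through normalized 2D reference points on a square BEV grid. The top-k proposals are turned into reference points and prepended to a set of learned query references. All names here (`Proposal`, `to_bev_reference`, `build_hybrid_queries`, `bev_range`, `top_k`) are hypothetical and chosen for illustration. This is not the authors' implementation, and the paper's actual query design may differ.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple

# (x_min, y_min, x_max, y_max) of the BEV grid in ego-frame meters (assumed layout)
BevRange = Tuple[float, float, float, float]


@dataclass
class Proposal:
    """Hypothetical 3D proposal from the perspective head (ego frame, meters)."""
    x: float
    y: float
    z: float
    score: float


def to_bev_reference(p: Proposal, bev_range: BevRange) -> Tuple[float, float]:
    """Map an ego-frame (x, y) center to a [0, 1] BEV reference point, clamped to the grid."""
    x_min, y_min, x_max, y_max = bev_range
    u = (p.x - x_min) / (x_max - x_min)
    v = (p.y - y_min) / (y_max - y_min)
    return (min(max(u, 0.0), 1.0), min(max(v, 0.0), 1.0))


def build_hybrid_queries(
    proposals: Sequence[Proposal],
    learned_refs: Sequence[Tuple[float, float]],
    top_k: int,
    bev_range: BevRange,
) -> List[Tuple[float, float]]:
    """Keep the top_k highest-scoring perspective proposals and prepend their
    BEV reference points to the learned query reference points."""
    ranked = sorted(proposals, key=lambda p: p.score, reverse=True)[:top_k]
    proposal_refs = [to_bev_reference(p, bev_range) for p in ranked]
    return proposal_refs + list(learned_refs)


if __name__ == "__main__":
    bev_range = (-51.2, -51.2, 51.2, 51.2)
    proposals = [
        Proposal(0.0, 0.0, 0.5, 0.9),
        Proposal(25.6, -25.6, 0.8, 0.3),
        Proposal(60.0, 10.0, 1.0, 0.7),  # outside the grid in x, so u is clamped to 1.0
    ]
    learned = [(0.5, 0.5), (0.1, 0.9)]
    queries = build_hybrid_queries(proposals, learned, top_k=2, bev_range=bev_range)
    print(queries)
```

Placing the proposal-derived references first is only a convention of this sketch. The point is that the BEV head starts from locations the perspective head already believes contain objects, while the learned queries can still cover objects the first stage missed.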
