Shanghai AI Laboratory, VideoChat: Chat-Centric Video Understanding

VideoChat: Chat-Centric Video Understanding

2023-06-05

In this study, we begin exploring video understanding by introducing VideoChat, an end-to-end chat-centric video understanding system. It integrates video foundation models and large language models via a learnable neural interface, and excels at spatiotemporal reasoning, event localization, and causal relationship inference. To instruction-tune this system, we propose a video-centric instruction dataset composed of thousands of videos paired with detailed descriptions and conversations. The dataset emphasizes spatiotemporal reasoning and causal relationships, making it a valuable asset for training chat-centric video understanding systems. Preliminary qualitative experiments reveal our system’s potential across a broad spectrum of video applications and set the standard for future research. Our code and data are available at https://github.com/OpenGVLab/Ask-Anything.
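As a rough illustration of what "a learnable neural interface" between a video model and a language model can look like, the sketch below projects video features into a language model's embedding space and prepends them to the text token embeddings. This is only a minimal sketch under stated assumptions: the abstract does not specify the interface, so the single linear projection, the function names `project_video_tokens` and `build_llm_input`, and all dimensions are hypothetical and are not taken from the VideoChat code.

```python
import random

def project_video_tokens(video_feats, weight, bias):
    """Linearly map each video token (length d_video) to length d_llm.

    weight is a d_video x d_llm matrix stored as a list of rows.
    Assumption: a single linear layer stands in for the learnable interface.
    """
    columns = list(zip(*weight))  # d_llm columns, each of length d_video
    return [
        [sum(x * w for x, w in zip(token, col)) + b
         for col, b in zip(columns, bias)]
        for token in video_feats
    ]

def build_llm_input(video_feats, text_embeds, weight, bias):
    """Prepend the projected video tokens to the text token embeddings."""
    return project_video_tokens(video_feats, weight, bias) + text_embeds

# Hypothetical sizes, chosen only for illustration.
random.seed(0)
d_video, d_llm = 8, 16
video_feats = [[random.gauss(0, 1) for _ in range(d_video)] for _ in range(4)]
text_embeds = [[random.gauss(0, 1) for _ in range(d_llm)] for _ in range(5)]
weight = [[random.gauss(0, 0.02) for _ in range(d_llm)] for _ in range(d_video)]
bias = [0.0] * d_llm

llm_input = build_llm_input(video_feats, text_embeds, weight, bias)
print(len(llm_input), len(llm_input[0]))
```

In a trainable system, `weight` and `bias` would be the parameters updated during instruction tuning; whether the video and language models themselves are frozen or updated is not stated in the abstract.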

KunChang Li*¹,⁴, Yinan He*¹, Yi Wang*†¹, Yizhuo Li¹,³, Wenhai Wang¹

Ping Luo³, Yali Wang⁴,¹, Limin Wang²,¹, Yu Qiao¹

¹OpenGVLab, Shanghai AI Laboratory ²Nanjing University ³The University of Hong Kong ⁴Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences

https://github.com/OpenGVLab/Ask-Anything

