Research | Shanghai Artificial Intelligence Laboratory

Research

Research Areas

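Fundamental Theory of Artificial Intelligence

Conduct research on the frontier fundamental theory of artificial intelligence, including machine learning, reinforcement learning, deep learning, knowledge computing, causal inference, and information security; pursue interdisciplinary AI research and explore new data-driven paradigms for scientific research.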

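Open AI Platforms

Build new AI platforms for big data, algorithms, and computing power that comprehensively support fundamental and applied AI research.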

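AI Foundational Software and Hardware Systems

Develop foundational AI software and hardware systems that underpin the technology ecosystem, including foundational software such as next-generation AI training frameworks, programming languages, and compilers, and foundational hardware such as AI chips and sensors.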

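AI Applications

Explore applications of AI technology in sectors such as cities, transportation, healthcare, education, culture and tourism, finance, and manufacturing; track emerging fields; and develop common technology platforms.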

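Core AI Technologies

Develop next-generation AI technologies, including computer vision, natural language processing, speech processing, decision intelligence, intelligent robotics, urban computing, computer graphics, and digital twins.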

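AI Ethics and Policy

Address the economic, social, ethical, legal, security, privacy, and data governance issues that AI may raise, propose solutions, and provide policy references.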

Publications

Venue: arXiv 2024

MVBench: A Comprehensive Multi-modal Video Understanding Benchmark

With the rapid development of Multi-modal Large Language Models (MLLMs), a number of diagnostic benchmarks have recently emerged to evaluate the comprehension capabilities of these models.

Venue: arXiv 2024

Implicit Event-RGBD Neural SLAM

Implicit neural SLAM has achieved remarkable progress recently. Nevertheless, existing methods face significant challenges in non-ideal scenarios, such as motion blur or lighting variation, which often leads to issues like convergence failures, localization drifts, and distorted mapping.

Venue: arXiv 2024

GS-SLAM: Dense Visual SLAM with 3D Gaussian Splatting

In this paper, we introduce GS-SLAM that first utilizes 3D Gaussian representation in the Simultaneous Localization and Mapping (SLAM) system. It facilitates a better balance between efficiency and accuracy.

Venue: arXiv 2024

OPERA: Alleviating Hallucination in Multi-Modal Large Language Models via Over-Trust Penalty and Retrospection-Allocation

Hallucination, posed as a pervasive challenge of multimodal large language models (MLLMs), has significantly impeded their real-world usage that demands precise judgment. Existing methods mitigate this issue with either training with specific designed data or inferencing with external knowledge from other sources, incurring inevitable additional costs.

Venue: CVPR 2024

Efficient Deformable ConvNets: Rethinking Dynamic and Sparse Operator for Vision Applications

We introduce Deformable Convolution v4 (DCNv4), a highly efficient and effective operator designed for a broad spectrum of vision applications. DCNv4 addresses the limitations of its predecessor, DCNv3, with two key enhancements

Venue: arXiv 2024

ReZero: Boosting MCTS-based Algorithms by Just-in-Time and Speedy Reanalyze

MCTS-based algorithms, such as MuZero and its derivatives, have achieved widespread success in various decision-making domains. These algorithms employ the reanalyze process to enhance sample efficiency, albeit at the expense of significant wall-clock time consumption.

Venue: CVPR 2024

LMDrive: Closed-Loop End-to-End Driving with Large Language Models

Despite significant recent progress in the field of autonomous driving, modern methods still struggle and can incur serious accidents when encountering long-tail unforeseen events and challenging urban scenarios. On the one hand, large language models (LLM) have shown impressive reasoning capabilities that approach “Artificial General Intelligence”.

Venue: NeurIPS 2024

LightZero: A Unified Benchmark for Monte Carlo Tree Search in General Sequential Decision Scenarios

Building agents based on tree-search planning capabilities with learned models has achieved remarkable success in classic decision-making problems, such as Go and Atari. However, it has been deemed challenging or even infeasible to extend Monte Carlo Tree Search (MCTS) based algorithms to diverse real-world applications, especially when these environments involve complex action spaces and significant simulation costs, or inherent stochasticity.

Venue: arXiv 2024

Geometry-enhanced Pretraining on Interatomic Potentials

Machine learning interatomic potentials (MLIPs) describe the interactions between atoms in materials and molecules by learning them from a reference database generated by ab initio calculations. MLIPs can accurately and efficiently predict such interactions and have been applied to various fields of physical science. However, high-performance MLIPs rely on a large amount of labelled data, which are costly to obtain by ab initio calculations.

Venue: arXiv 2024

From GPT-4 to Gemini and Beyond: Assessing the Landscape of MLLMs on Generalizability, Trustworthiness and Causality through Four Modalities

Multi-modal Large Language Models (MLLMs) have shown impressive abilities in generating reasonable responses with...

Venue:

Evaluation Report on Financial Large Model Applications (Summary Edition)

Venue: arXiv 2023

GPT4Point: A Unified Framework for Point-Language Understanding and Generation

Multimodal Large Language Models (MLLMs) have excelled in 2D image-text comprehension and image generation, but their understanding of the 3D world is notably deficient, limiting progress in 3D language understanding and generation.

comm@pjlab.org.cn

L1 Building, International Media Port, 129 Longwen Road, Xuhui District, Shanghai

ICP Filing No.: 沪ICP备2021009351号-1
