Research

Research Directions

Fundamental AI Theory

Conduct research on frontier fundamental theories of AI, including machine learning, reinforcement learning, deep learning, knowledge computing, causal reasoning, and information security; pursue interdisciplinary AI research and explore new data-driven paradigms for scientific discovery.

Open AI Platforms

Build new AI platforms for big data, algorithms, and computing power to fully support fundamental and applied AI research.

Foundational AI Software and Hardware Systems

Develop foundational AI software and hardware systems that form the basis of the technology ecosystem, including next-generation AI training frameworks, programming languages, and compilers on the software side, and AI chips and sensors on the hardware side.

AI Applications

Explore applications of AI technology in industries such as urban management, transportation, healthcare, education, culture and tourism, finance, and manufacturing; track emerging fields and develop common technology platforms.

Core AI Technologies

Develop next-generation AI technologies, including computer vision, natural language processing, speech processing, decision intelligence, intelligent robotics, urban computing, computer graphics, and digital twins.

AI Ethics and Policy

Address the economic, social, ethical, legal, security, privacy, and data-governance issues that AI may raise; propose solutions and provide policy recommendations.

Academic Achievements

Published in: Nature Communications, 2024

AI-driven projection tomography with multicore fibre-optic cell rotation

Optical tomography has emerged as a non-invasive imaging method, providing three-dimensional insights into subcellular structures and thereby enabling a deeper understanding of cellular functions, interactions, and processes. Conventional optical tomography methods are constrained by a limited illumination scanning range, leading to anisotropic resolution and incomplete imaging of cellular structures. To overcome this problem, we employ a compact multi-core fibre-optic cell rotator system that facilitates precise optical manipulation of cells within a microfluidic chip, achieving full-angle projection tomography with isotropic resolution. Moreover, we demonstrate an AI-driven tomographic reconstruction workflow, which can be a paradigm shift from conventional computational methods, often demanding manual processing, to a fully autonomous process. The performance of the proposed cell rotation tomography approach is validated through the three-dimensional reconstruction of cell phantoms and HL60 human cancer cells. The versatility of this learning-based tomographic reconstruction workflow paves the way for its broad application across diverse tomographic imaging modalities, including but not limited to flow cytometry tomography and acoustic rotation tomography. Therefore, this AI-driven approach can propel advancements in cell biology, aiding in the inception of pioneering therapeutics, and augmenting early-stage cancer diagnostics.
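The full-angle acquisition described above can be illustrated with a minimal numpy sketch of parallel-beam projection: a 2D phantom slice is sampled on a rotated grid and summed along one axis for each rotation angle, yielding a sinogram that covers the whole 180° range. The function names and the toy phantom are illustrative only, not the paper's implementation.

```python
import numpy as np

def project(img, theta):
    """Parallel-beam projection of a 2D slice at angle theta (radians):
    sample the image on a grid rotated by theta, then sum along one axis."""
    n = img.shape[0]
    c = (n - 1) / 2.0
    ys, xs = np.mgrid[0:n, 0:n] - c
    xr = np.cos(theta) * xs - np.sin(theta) * ys + c
    yr = np.sin(theta) * xs + np.cos(theta) * ys + c
    xi = np.clip(np.round(xr).astype(int), 0, n - 1)
    yi = np.clip(np.round(yr).astype(int), 0, n - 1)
    return img[yi, xi].sum(axis=0)

# Full-angle sinogram: projections over the entire 180° range, which is
# what rotating the cell itself (rather than the illumination) enables.
phantom = np.zeros((32, 32))
phantom[12:20, 12:20] = 1.0                       # toy "cell"
angles = np.linspace(0, np.pi, 36, endpoint=False)
sinogram = np.stack([project(phantom, t) for t in angles])
```

A limited-angle system would only be able to fill part of this sinogram, which is the source of the anisotropic resolution the paper sets out to eliminate.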

Published in: arXiv, 2023

PonderV2: Pave the Way for 3D Foundation Model with A Universal Pre-training Paradigm

In contrast to numerous NLP and 2D computer vision foundational models, the learning of a robust and highly generalized 3D foundational model poses considerably greater challenges. This is primarily due to the inherent data variability and the diversity of downstream tasks. In this paper, we introduce a comprehensive 3D pre-training framework designed to facilitate the acquisition of efficient 3D representations, thereby establishing a pathway to 3D foundational models. Motivated by the fact that informative 3D features should be able to encode rich geometry and appearance cues that can be utilized to render realistic images, we propose a novel universal paradigm to learn point cloud representations by differentiable neural rendering, serving as a bridge between 3D and 2D worlds. We train a point cloud encoder within a devised volumetric neural renderer by comparing the rendered images with the real images. Notably, our approach demonstrates the seamless integration of the learned 3D encoder into diverse downstream tasks. These tasks encompass not only high-level challenges such as 3D detection and segmentation but also low-level objectives like 3D reconstruction and image synthesis, spanning both indoor and outdoor scenarios. Besides, we also illustrate the capability of pre-training a 2D backbone using the proposed universal methodology, surpassing conventional pre-training methods by a large margin. For the first time, PonderV2 achieves state-of-the-art performance on 11 indoor and outdoor benchmarks. The consistent improvements in various settings imply the effectiveness of the proposed method. Code and models will be made available at https://github.com/OpenGVLab/PonderV2.
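The pre-training objective described above (render images from encoded point-cloud features and compare them with real images) can be sketched in a few lines of numpy. The toy scatter-based "encoder", the two-channel density/radiance layout, and the depth-wise compositing are illustrative assumptions, not PonderV2's actual architecture.

```python
import numpy as np

def encode_points(points, feats, grid=8):
    """Toy stand-in for the learnable point-cloud encoder:
    scatter per-point features into a voxel grid and average."""
    vol = np.zeros((grid, grid, grid, feats.shape[1]))
    cnt = np.zeros((grid, grid, grid, 1))
    idx = np.clip((points * grid).astype(int), 0, grid - 1)
    for (x, y, z), f in zip(idx, feats):
        vol[x, y, z] += f
        cnt[x, y, z] += 1
    return vol / np.maximum(cnt, 1)

def render_volume(vol):
    """Toy volumetric renderer: alpha-composite features along the z axis
    to produce a 2D image (channel 0 = density, channel 1 = radiance)."""
    alpha = 1.0 - np.exp(-vol[..., 0])
    trans = np.cumprod(np.concatenate(
        [np.ones_like(alpha[..., :1]), 1.0 - alpha[..., :-1]], axis=-1), axis=-1)
    return (trans * alpha * vol[..., 1]).sum(axis=-1)

rng = np.random.default_rng(0)
pts = rng.random((200, 3))                 # toy point cloud in [0, 1)^3
feats = rng.random((200, 2))
img = render_volume(encode_points(pts, feats))
target = np.full_like(img, 0.1)            # a "real" reference image
loss = np.mean((img - target) ** 2)        # the pre-training signal
```

In the real framework this loss would be backpropagated through a differentiable renderer into the encoder; here the point is only that the rendered-vs-real comparison bridges the 3D and 2D domains.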

Published in: arXiv, 2023

OmniObject3D: Large-Vocabulary 3D Object Dataset for Realistic Perception, Reconstruction and Generation

Recent advances in modeling 3D objects mostly rely on synthetic datasets due to the lack of large-scale real-scanned 3D databases. To facilitate the development of 3D perception, reconstruction, and generation in the real world, we propose OmniObject3D, a large-vocabulary 3D object dataset with massive high-quality real-scanned 3D objects. OmniObject3D has several appealing properties: 1) Large Vocabulary: It comprises 6,000 scanned objects in 190 daily categories, sharing common classes with popular 2D datasets (e.g., ImageNet and LVIS), benefiting the pursuit of generalizable 3D representations. 2) Rich Annotations: Each 3D object is captured with both 2D and 3D sensors, providing textured meshes, point clouds, multi-view rendered images, and multiple real-captured videos. 3) Realistic Scans: The professional scanners support high-quality object scans with precise shapes and realistic appearances. With the vast exploration space offered by OmniObject3D, we carefully set up four evaluation tracks: a) robust 3D perception, b) novel-view synthesis, c) neural surface reconstruction, and d) 3D object generation. Extensive studies are performed on these four benchmarks, revealing new observations, challenges, and opportunities for future research in realistic 3D vision. Project page: https://omniobject3d.github.io/

Published in: arXiv, 2023

FengWu: Pushing the Skillful Global Medium-range Weather Forecast Beyond 10 Days Lead

We present FengWu, an advanced data-driven global medium-range weather forecast system based on Artificial Intelligence (AI). Different from existing data-driven weather forecast methods, FengWu solves the medium-range forecast problem from a multi-modal and multi-task perspective. Specifically, a deep learning architecture equipped with model-specific encoder-decoders and a cross-modal fusion Transformer is elaborately designed, which is learned under the supervision of an uncertainty loss to balance the optimization of different predictors in a region-adaptive manner. Besides this, a replay buffer mechanism is introduced to improve medium-range forecast performance. With 39-year data training based on the ERA5 reanalysis, FengWu is able to accurately reproduce the atmospheric dynamics and predict the future land and atmosphere states at 37 vertical levels on a 0.25° latitude-longitude resolution. Hindcasts of 6-hourly weather in 2018 based on ERA5 demonstrate that FengWu performs better than GraphCast in predicting 80% of the 880 reported predictands, e.g., reducing the root mean square error (RMSE) of 10-day lead global z500 prediction from 733 to 651 m²/s². In addition, the inference cost of each iteration is merely 600 ms on NVIDIA Tesla A100 hardware. The results suggest that FengWu can significantly improve the forecast skill and extend the skillful global medium-range weather forecast out to 10.75 days lead (with ACC of z500 > 0.6) for the first time.
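The uncertainty loss mentioned above, which balances predictors such as z500 and surface temperature that live on very different numeric scales, can be sketched with the common learned log-variance formulation. The exact per-region, per-variable form used by FengWu is not specified in the abstract, so the formula below is an assumption for illustration, as are the variable names and toy values.

```python
import numpy as np

def uncertainty_loss(preds, targets, log_var):
    """Multi-task loss with a learned log-variance per predictor:
    L = sum_i exp(-s_i) * MSE_i + s_i, with s_i = log(sigma_i^2).
    High-uncertainty tasks are down-weighted; the +s_i term keeps
    the learned variances from growing without bound."""
    total = 0.0
    for name, pred in preds.items():
        mse = np.mean((pred - targets[name]) ** 2)
        s = log_var[name]
        total += np.exp(-s) * mse + s
    return total

# Two toy predictors at very different scales (geopotential vs. temperature).
preds   = {"z500": np.array([5100.0, 5210.0]), "t2m": np.array([288.0, 291.0])}
targets = {"z500": np.array([5120.0, 5200.0]), "t2m": np.array([287.5, 290.0])}
log_var = {"z500": np.log(250.0), "t2m": np.log(1.0)}
loss = uncertainty_loss(preds, targets, log_var)
```

In a real training loop the `log_var` entries would be learnable parameters optimized jointly with the network, so the balance between predictors is learned rather than hand-tuned.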

Published in: Lancet Digital Health, 2022

Spatially aware graph neural networks and cross-level molecular profile prediction in colon cancer histopathology: a retrospective multi-cohort study

The model was developed on 459 colon tumour whole-slide images from TCGA-COAD, and externally validated on 165 rectum tumour whole-slide images from TCGA-READ and 161 colon tumour whole-slide images from CPTAC-COAD. For TCGA cohorts, our method accurately predicted the molecular classes of the gene mutations (area under the curve [AUCs] from 82·54 [95% CI 77·41–87·14] to 87·08 [83·28–90·82] on TCGA-COAD, and AUCs from 70·46 [61·37–79·61] to 81·80 [72·20–89·70] on TCGA-READ), along with genes with copy number alterations (AUCs from 81·98 [73·34–89·68] to 90·55 [86·02–94·89] on TCGA-COAD, and AUCs from 62·05 [48·94–73·46] to 76·48 [64·78–86·71] on TCGA-READ), microsatellite instability (MSI) status classification (AUC 83·92 [77·41–87·59] on TCGA-COAD, and AUC 61·28 [53·28–67·93] on TCGA-READ), and protein expressions (AUCs from 85·57 [81·16–89·44] to 89·64 [86·29–93·19] on TCGA-COAD, and AUCs from 51·77 [42·53–61·83] to 59·79 [50·79–68·57] on TCGA-READ). For the CPTAC-COAD cohort, our model predicted a panel of gene mutations with AUC values from 63·74 (95% CI 52·92–75·37) to 82·90 (73·69–90·71), genes with copy number alterations with AUC values from 62·39 (51·37–73·76) to 86·08 (79·67–91·74), and MSI status prediction with AUC value of 73·15 (63·21–83·13).
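The "spatially aware" part of the method can be sketched as building a k-nearest-neighbour graph over tissue-patch coordinates on the slide and aggregating neighbour features with each patch's own. The k-NN construction, mean aggregation, and ReLU step below are a generic message-passing sketch, not the paper's specific architecture; all names and shapes are illustrative.

```python
import numpy as np

def knn_graph(coords, k=3):
    """Connect each tissue patch to its k nearest neighbours in slide space."""
    d = np.linalg.norm(coords[:, None] - coords[None, :], axis=-1)
    np.fill_diagonal(d, np.inf)               # no self-edges
    return np.argsort(d, axis=1)[:, :k]       # (n, k) neighbour indices

def message_pass(feats, nbrs, w_self, w_nbr):
    """One spatially aware aggregation step: combine each patch's features
    with the mean of its neighbours' features, then apply ReLU."""
    agg = feats[nbrs].mean(axis=1)            # (n, d) neighbour mean
    return np.maximum(feats @ w_self + agg @ w_nbr, 0.0)

rng = np.random.default_rng(1)
coords = rng.random((5, 2))                   # patch centres on the slide
feats = rng.random((5, 4))                    # per-patch CNN features
nbrs = knn_graph(coords, k=2)
out = message_pass(feats, nbrs, rng.random((4, 3)), rng.random((4, 3)))
```

A slide-level molecular prediction would then pool the node embeddings (e.g. mean pooling) and feed them to a classification head, one per molecular target.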

comm@pjlab.org.cn

37-38F, West Bund International AI Center, 701 Yunjin Road, Xuhui District, Shanghai

ICP Filing: 沪ICP备2021009351号-1