Research

A Large-scale Synthetic Pathological Dataset for Deep Learning-enabled Segmentation of Breast Cancer

Published in: Scientific Data

Kexin Ding¹, Mu Zhou², He Wang³, Olivier Gevaert⁴, Dimitris Metaxas⁵ & Shaoting Zhang⁶


Abstract

The success of training computer-vision models relies heavily on large-scale, real-world annotated images. Yet such annotation-ready datasets are difficult to curate in pathology because of privacy protection and the heavy annotation burden. To aid computational pathology, synthetic data generation, curation, and annotation offer a cost-effective means of quickly achieving the data diversity needed to boost model performance at different stages. In this study, we introduce a large-scale synthetic pathological image dataset paired with annotations for nuclei semantic segmentation, termed Synthetic Nuclei and annOtation Wizard (SNOW). SNOW is developed through a standardized workflow that applies an off-the-shelf image generator and nuclei annotator. In total, the dataset contains 20k image tiles and 1,448,522 annotated nuclei and is released under a CC-BY license. We show that SNOW can be used in both supervised and semi-supervised training scenarios. Extensive results suggest that models trained on synthetic data are competitive under a variety of training settings, broadening the use of synthetic images to enhance downstream data-driven clinical tasks.
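As a rough illustration of the generate-then-annotate idea in the abstract, the following sketch pairs each synthetic tile with a machine-made nuclei mask and counts the nuclei. Everything in it is an assumption for illustration only: `toy_generator` and `toy_annotator` are hypothetical stand-ins (dark squares on a bright background, and a threshold-plus-connected-components labeler), not the image generator or nuclei annotator used to build SNOW, and `build_dataset` and `Sample` are invented names.

```python
import random
from collections import deque
from dataclasses import dataclass
from typing import Callable, List, Tuple

Tile = List[List[float]]  # grayscale intensities in [0, 1]
Mask = List[List[int]]    # 1 = nucleus pixel, 0 = background


@dataclass
class Sample:
    tile: Tile
    mask: Mask
    num_nuclei: int


def toy_generator(seed: int, size: int = 32, n_blobs: int = 3) -> Tile:
    """Hypothetical stand-in for an image generator.

    Draws n_blobs dark 4x4 'nuclei' on a bright background, one per randomly
    chosen 8x8 grid cell, so blobs never touch each other.
    """
    rng = random.Random(seed)
    cells_per_side = size // 8
    tile = [[0.9] * size for _ in range(size)]
    for cell in rng.sample(range(cells_per_side ** 2), n_blobs):
        top = 8 * (cell // cells_per_side) + 2
        left = 8 * (cell % cells_per_side) + 2
        for i in range(top, top + 4):
            for j in range(left, left + 4):
                tile[i][j] = 0.2
    return tile


def toy_annotator(tile: Tile, threshold: float = 0.5) -> Tuple[Mask, int]:
    """Hypothetical stand-in for a nuclei annotator.

    Marks pixels darker than `threshold` as nucleus (semantic mask), then counts
    4-connected components with a breadth-first search.
    """
    h, w = len(tile), len(tile[0])
    mask = [[1 if tile[i][j] < threshold else 0 for j in range(w)] for i in range(h)]
    seen = [[False] * w for _ in range(h)]
    count = 0
    for i in range(h):
        for j in range(w):
            if mask[i][j] and not seen[i][j]:
                count += 1  # found a new, unvisited nucleus
                seen[i][j] = True
                queue = deque([(i, j)])
                while queue:
                    y, x = queue.popleft()
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = y + dy, x + dx
                        if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((ny, nx))
    return mask, count


def build_dataset(generate: Callable[[int], Tile],
                  annotate: Callable[[Tile], Tuple[Mask, int]],
                  n_tiles: int) -> List[Sample]:
    """Generate n_tiles synthetic tiles and pair each with its automatic mask."""
    samples = []
    for seed in range(n_tiles):
        tile = generate(seed)
        mask, n = annotate(tile)
        samples.append(Sample(tile, mask, n))
    return samples


if __name__ == "__main__":
    data = build_dataset(toy_generator, toy_annotator, n_tiles=10)
    total = sum(s.num_nuclei for s in data)
    print(f"{len(data)} tiles, {total} nuclei, {total / len(data):.1f} per tile")
```

The same counting step yields dataset-level statistics. For scale, if the 20k tiles are exactly 20,000, SNOW's 1,448,522 annotated nuclei average roughly 72 per tile.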


¹Department of Computer Science, University of North Carolina at Charlotte, Charlotte, NC, 28262, USA.
²Sensebrain Research, San Jose, CA, 95131, USA.
³Department of Pathology, Yale University, New Haven, CT, 06520, USA.
⁴Stanford Center for Biomedical Informatics Research, Department of Medicine and Biomedical Data Science, Stanford University, Stanford, CA, 94305, USA.
⁵Department of Computer Science, Rutgers University, New Brunswick, NJ, 08901, USA.
⁶Shanghai Artificial Intelligence Laboratory, Shanghai, 200232, China.

✉ e-mail: zhangshaoting@pjlab.org.cn
