This collection provides public access to a 3D pathology dataset of prostate cancer, allowing researchers to further investigate various 3D tissue structures and their correlation with prostate cancer patient outcomes (biochemical recurrence). These 3D tissue structures are revealed through: (1) an H&E-analog stain, (2) synthetically generated immunofluorescence staining of CK8 (targeting the luminal epithelial cells of all prostate glands), and (3) 3D segmentation masks of the gland lumen, epithelium, and stromal regions of prostate biopsies. This data collection will promote research in the field of computational 3D pathology for clinical decision support.
In this TCIA collection, we provide the fused OTLS images (H&E-analog staining) at 2x-downsampled resolution, the synthetic cytokeratin-8 (CK8) immunofluorescence images at 2x-downsampled resolution, the 3D semantic segmentation masks of glands at 4x-downsampled resolution, the clinical data for patient outcomes (biochemical recurrence), and the coordinates for the cancer-enriched regions of each biopsy. All datasets are from the 50 patient cases studied in this publication: [W. Xie et al., Cancer Research, 2022]. The Python code for the deep-learning models, and for 3D glandular segmentations based on synthetic-CK8 datasets, is available on GitHub at https://github.com/WeisiX/ITAS3D.
Note that the 3D pathology datasets provided in this collection were generated in Dr. Jonathan Liu’s lab at the University of Washington with a custom open-top light-sheet (OTLS) microscope developed by the lab [A.K. Glaser et al., Nature Communications, 2019]. There is no clinical metadata within the imaging files and all patients are referred to with coded identifiers. All of the clinical outcomes data provided in this collection have already been published within the supplement of [W. Xie et al., Cancer Research, 2022].
We would like to acknowledge the individuals and institutions that have provided data for this collection:
Funding support was provided by the Department of Defense (DoD) Prostate Cancer Research Program (PCRP) through W81XWH-18-10358 (J.T.C. Liu, L.D. True and J.C. Vaughan), W81XWH-19-1-0589 (N.P. Reder), W81XWH-15-1-0558 (A. Madabhushi) and W81XWH-20-1-0851 (A. Madabhushi and J.T.C. Liu).
Support was also provided by the National Cancer Institute (NCI) through K99 CA240681 (A.K. Glaser), R01CA244170 (J.T.C. Liu), U24CA199374 (A. Madabhushi), R01CA249992 (A. Madabhushi), R01CA202752 (A. Madabhushi), R01CA208236 (A. Madabhushi), R01CA216579 (A. Madabhushi), R01CA220581 (A. Madabhushi), R01CA257612 (A. Madabhushi), U01CA239055 (A. Madabhushi), U01CA248226 (A. Madabhushi), and U54CA254566 (A. Madabhushi).
Additional support was provided by the National Heart, Lung and Blood Institute (NHLBI) through R01HL151277 (A. Madabhushi), the National Institute of Biomedical Imaging and Bioengineering (NIBIB) through R01EB031002 (J.T.C. Liu) and R43EB028736 (A. Madabhushi), the National Institute of Mental Health through R01MH115767 (J.C. Vaughan), the VA Merit Review Award IBX004121A from the United States Department of Veterans Affairs (A. Madabhushi), the National Science Foundation (NSF) 1934292 HDR: I-DIRSE-FW (J.T.C. Liu), the NSF Graduate Research Fellowships DGE-1762114 (K.W. Bishop) and DGE-1762114 (L. Barner), the Nancy and Buster Alvord Endowment (C.D. Keene), and the Prostate Cancer Foundation Young Investigator Award (N.P. Reder).
The training and inference of the deep learning models were facilitated by the advanced computational, storage, and networking infrastructure provided by the Hyak supercomputer system, as funded in part by the student technology fee (STF) at the University of Washington.
Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the National Science Foundation, the National Institutes of Health, the Department of Defense, the Department of Veterans Affairs, or the United States Government.
Additional publications relating to the data
- Renier, Nicolas, et al. "iDISCO: a simple, rapid method to immunolabel large tissue samples for volume imaging." Cell 159.4 (2014): 896-910.
- Glaser, Adam K., et al. "Multi-immersion open-top light-sheet microscope for high-throughput imaging of cleared tissues." Nature Communications 10.1 (2019): 1-8.
|Data Type||Download all or Query/Filter||License|
|Tissue Slide Images (HDF5, TIFF, XML; 3.8 TB)||(Download requires Aspera plugin)|
|Clinical data (CSV)|
Additional Resources for this Dataset
- The Python code for the deep-learning models, and for 3D glandular segmentations based on synthetic-CK8 datasets, is available on GitHub at https://github.com/WeisiX/ITAS3D.
|Pathology Image Statistics|
|Hierarchical Data Format V5 (HDF5), TIFF stack|
|Number of Patients|
|Number of Images|
|Image Size (TB)||3.8|
Our 3D imaging method is entirely slide-free and non-destructive. We image intact tissue specimens (prostate biopsies) that have been labeled with a fluorescent analog of H&E staining and optically cleared to make them transparent to light. The imaging is performed with a custom-developed in-house open-top light-sheet (OTLS) microscope [A.K. Glaser et al., Nature Communications, 2019].
Our OTLS microscope uses an sCMOS camera to collect images of optically cleared tissues in a slice-by-slice manner as we translate the specimens through an illumination “light sheet.” The camera generates raw 16-bit grayscale TIFF images that are assembled into volumetric imaging “tiles” within the RAM of the acquisition computer. These imaging tiles are like volumetric bricks of data, as shown in Fig. 1G of [A.K. Glaser et al., Nature Communications, 2019]. Each tile is then saved to the hard drive in the form of a multi-resolution HDF5 file. As adjacent imaging tiles are scanned within the specimen, they are appended to the same HDF5 file on the hard drive. The HDF5 file is similar to a 3D TIFF stack, except that it contains multiple downsampled versions of the dataset so that users can quickly visualize or access the 3D data at whatever resolution they need. The HDF5 file is also “chunked” so that smaller volumetric regions of interest can be accessed quickly without reading the whole volume. The HDF5 file has an associated XML file that contains microscope metadata (e.g., stage coordinates for each tile and acquisition parameters).
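To illustrate why a multi-resolution, chunked layout makes region-of-interest access fast, the following minimal Python sketch computes the shape of each resolution level and the number of chunks that a small region of interest touches. The volume shape, downsampling factors, chunk shape, and function names are all hypothetical values chosen for illustration; they are not taken from the files in this collection, and the rounding convention used by the actual files may differ.

```python
from math import ceil


def pyramid_shapes(full_shape, factors):
    """Return the (z, y, x) shape of each resolution level.

    Assumes each level is the full-resolution shape divided by an isotropic
    factor and rounded up (an illustrative convention, not a file spec).
    """
    return [tuple(ceil(n / f) for n in full_shape) for f in factors]


def chunks_for_roi(roi_start, roi_stop, chunk_shape):
    """Return, per axis, the half-open range of chunk indices an ROI overlaps."""
    return [
        (start // c, ceil(stop / c))
        for start, stop, c in zip(roi_start, roi_stop, chunk_shape)
    ]


def count_chunks(chunk_ranges):
    """Multiply the per-axis chunk counts to get the total number of chunks."""
    total = 1
    for lo, hi in chunk_ranges:
        total *= hi - lo
    return total


# Hypothetical example values (z, y, x), not measured from the collection.
full_shape = (512, 2048, 2048)
chunk_shape = (64, 64, 64)

levels = pyramid_shapes(full_shape, factors=[1, 2, 4, 8])

# A 64 x 128 x 128 voxel region of interest at full resolution.
roi_ranges = chunks_for_roi((100, 500, 500), (164, 628, 628), chunk_shape)
roi_chunks = count_chunks(roi_ranges)

# Every chunk in the volume, for comparison.
all_ranges = chunks_for_roi((0, 0, 0), full_shape, chunk_shape)
all_chunks = count_chunks(all_ranges)

print(levels)
print(f"ROI touches {roi_chunks} of {all_chunks} chunks")
```

With these example values, only a small fraction of the chunks has to be read from disk to retrieve the region, which is the practical benefit of chunking described above.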
Before performing computational analysis of the imaging data, we “fuse” the 3D imaging data contained in the raw HDF5 file. This is done with an open-source program called “BigStitcher” that finely aligns all of the volumetric imaging tiles, shears the dataset, and blends the edges of those imaging tiles such that a “seamless” 3D image is generated. In this project, we also downsample the imaging data by a factor of 2X in all three dimensions during this fusion process. The fused dataset is saved as an uncompressed HDF5 file and is used as an input for our downstream ITAS3D segmentation pipeline [W. Xie et al., Cancer Research, 2022].
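The effect of downsampling by 2X in all three dimensions can be sketched with NumPy as a 2 x 2 x 2 block average, which reduces the voxel count by a factor of 8. This is only an illustration of the operation under that assumption; it is not BigStitcher's implementation, and the function name `downsample_2x` is hypothetical.

```python
import numpy as np


def downsample_2x(volume):
    """Average non-overlapping 2x2x2 blocks of a 3D (z, y, x) volume.

    Odd-sized dimensions are trimmed by one voxel so that every block is
    complete. The result is rounded and cast back to the input dtype.
    """
    z, y, x = (n - n % 2 for n in volume.shape)
    trimmed = volume[:z, :y, :x].astype(np.float32)
    blocks = trimmed.reshape(z // 2, 2, y // 2, 2, x // 2, 2)
    averaged = blocks.mean(axis=(1, 3, 5))
    return np.rint(averaged).astype(volume.dtype)


# Small synthetic 16-bit volume standing in for one imaging channel.
rng = np.random.default_rng(0)
volume = rng.integers(0, 65535, size=(8, 10, 12), dtype=np.uint16)
small = downsample_2x(volume)
print(volume.shape, "->", small.shape)
```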
The fused HDF5 datasets (2X downsampled compared to the original data) are the rawest form of the imaging data that is provided in this TCIA collection. These are two-channel grayscale volumetric datasets in which one channel is the fluorescent nuclear stain (TO-PRO-3) and the second channel is the eosin stain. The TCIA collection also contains synthetic CK8 immunofluorescence datasets (2X downsampled compared to the original data) that are generated through a deep-learning image-translation model [W. Xie et al., Cancer Research, 2022]. The CK8 stain labels the luminal epithelial cells of all prostate glands (benign or cancerous). Finally, we provide the segmentation masks of the lumen, epithelial, and stromal compartments of each biopsy (4X downsampled compared to the original data).
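Because the segmentation masks are 4X downsampled while the image volumes are 2X downsampled, one mask voxel spans 2 image voxels along each axis. A minimal sketch of bringing a mask onto the image grid is shown below, using nearest-neighbor repetition so that label values are never blended. The function names, the label values, and the handling of off-by-one shape differences (edge padding, then cropping) are assumptions made for illustration and are not part of the ITAS3D code.

```python
import numpy as np


def upsample_labels_2x(mask):
    """Repeat every voxel twice along each axis (nearest-neighbor upsampling)."""
    for axis in range(mask.ndim):
        mask = np.repeat(mask, 2, axis=axis)
    return mask


def match_mask_to_image(mask_4x, image_shape):
    """Upsample a 4X-downsampled mask to the grid of a 2X-downsampled image.

    If rounding during downsampling left the shapes slightly different, the
    mask is edge-padded where it is too small and cropped where it is too
    large (an assumed convention for this sketch).
    """
    up = upsample_labels_2x(mask_4x)
    pad = [(0, max(0, n - m)) for n, m in zip(image_shape, up.shape)]
    up = np.pad(up, pad, mode="edge")
    return up[tuple(slice(0, n) for n in image_shape)]


# Hypothetical label values; the actual encoding in the files may differ.
mask_4x = np.array([[[1, 2], [3, 1]]], dtype=np.uint8)  # shape (1, 2, 2)
aligned = match_mask_to_image(mask_4x, image_shape=(3, 4, 5))
print(aligned.shape)
```

Nearest-neighbor repetition is used here instead of interpolation because interpolating between integer class labels would create values that do not correspond to any tissue compartment.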
Citations & Data Usage Policy
Users must abide by the TCIA Data Usage Policy and Restrictions. Attribution should include references to the following citations:
Xie, W., Reder, N. P., Koyuncu, C. F., Leo, P., Hawley, S., Huang, H., Mao, C., Postupna, N., Kang, S., Serafin, R., Gao, G., Han, Q., Bishop, K., Barner, L., Fu, P., Wright, J., Keene, C., Vaughan, J., Janowczyk, A., … Liu, J. (2023). 3D pathology of prostate biopsies with biochemical recurrence outcomes: raw H&E-analog datasets and image translation-assisted segmentation in 3D (ITAS3D) datasets (PCa_Bx_3Dpathology) (Version 1) [Data set]. The Cancer Imaging Archive. https://doi.org/10.7937/44MA-GX21
Xie, W., Reder, N. P., Koyuncu, C., Leo, P., Hawley, S., Huang, H., Mao, C., Postupna, N., Kang, S., Serafin, R., Gao, G., Han, Q., Bishop, K. W., Barner, L. A., Fu, P., Wright, J. L., Keene, C. D., Vaughan, J. C., Janowczyk, A., … Liu, J. T. C. (2021). Prostate Cancer Risk Stratification via Nondestructive 3D Pathology with Deep Learning–Assisted Gland Analysis. In Cancer Research (Vol. 82, Issue 2, pp. 334–345). American Association for Cancer Research (AACR). https://doi.org/10.1158/0008-5472.can-21-2843
Clark, K., Vendt, B., Smith, K., Freymann, J., Kirby, J., Koppel, P., Moore, S., Phillips, S., Maffitt, D., Pringle, M., Tarbox, L., & Prior, F. (2013). The Cancer Imaging Archive (TCIA): Maintaining and Operating a Public Information Repository. In Journal of Digital Imaging (Vol. 26, Issue 6, pp. 1045–1057). Springer Science and Business Media LLC. https://doi.org/10.1007/s10278-013-9622-7
Other Publications Using This Data
TCIA maintains a list of publications that leverage TCIA data. If you have a manuscript you'd like to add, please contact the TCIA Helpdesk.