Summary
Breast cancer is among the most common cancers and a common cause of death among women. Over 39 million breast cancer screening exams are performed every year and are among the most common radiological tests. This creates a high need for accurate image interpretation. Machine learning has shown promise in interpretation of medical images. However, limited data for training and validation remains an issue.
Here, we share a curated dataset of digital breast tomosynthesis images that includes normal, actionable, biopsy-proven benign, and biopsy-proven cancer cases. The dataset contains four components: (1) DICOM images, (2) image metadata, (3) a spreadsheet indicating which group each case belongs to, and (4) annotation boxes indicating lesion locations. A detailed description of this dataset can be found in the following paper:
Publication Citation
Please reference this paper if you use this dataset. Version 1 of the dataset contains only a subset of all data described in the paper above. More data will be shared in subsequent versions.
Acknowledgements
We would like to acknowledge the individuals and institutions that have provided data for this collection:
Duke University Hospital/Duke University, Durham, NC, USA
We would like to acknowledge all those who contributed to the curation of this dataset.
This work was supported by a grant from the NIH: 1 R01 EB021360 (PI: Mazurowski).
Data Access
Click the Download button to save a ".tcia" manifest file to your computer, which you must open with the NBIA Data Retriever. Click the Search button to open our Data Portal, where you can browse the data collection and/or download a subset of its contents.
Data Type | Download all or Query/Filter |
---|---|
Images (DICOM, XX.X GB) DBT | (Search button will not work until the data are ready to be released) |
Image Metadata (csv) | |
Boxes indicating lesion locations (csv) | |
Spreadsheet indicating which group each case belongs to (see the paper for details on the groups) (csv) | |
Click the Versions tab for more info about data releases.
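Once downloaded, the CSV files can be combined to look up, for each annotated lesion box, the group its case belongs to. The following is a minimal sketch in Python using only the standard library. The column names (`PatientID`, `StudyUID`, `View`, `Group`, `X`, `Y`, `Width`, `Height`) and the sample rows are hypothetical placeholders, not the actual schema of the released files; check the headers of the downloaded files before adapting it.

```python
import csv
import io

# Hypothetical stand-ins for two of the downloaded CSV files.
# Column names and values are illustrative assumptions only.
LABELS_CSV = """PatientID,StudyUID,View,Group
P001,S001,lcc,normal
P002,S002,rmlo,cancer
P003,S003,lcc,benign
"""

BOXES_CSV = """PatientID,StudyUID,View,X,Y,Width,Height
P002,S002,rmlo,120,340,80,60
P003,S003,lcc,400,210,55,70
"""


def read_rows(text):
    """Parse CSV text into a list of dicts, one per data row."""
    return list(csv.DictReader(io.StringIO(text)))


def case_key(row):
    """Identify a case by its (patient, study, view) triple."""
    return (row["PatientID"], row["StudyUID"], row["View"])


def attach_groups(boxes, labels):
    """Return a copy of each box row with the group of its case added."""
    group_by_case = {case_key(row): row["Group"] for row in labels}
    return [
        {**box, "Group": group_by_case.get(case_key(box), "unknown")}
        for box in boxes
    ]


labels = read_rows(LABELS_CSV)
boxes = read_rows(BOXES_CSV)
annotated = attach_groups(boxes, labels)

for row in annotated:
    print(row["PatientID"], row["View"], row["Group"], row["X"], row["Y"])
```

To run this against real files, replace `io.StringIO(text)` with a file opened via `open(path, newline="")` and adjust the key columns to match the actual headers.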
Detailed Description
Image Statistics | |
---|---|
Modalities | DBT |
Number of Participants | 693 |
Number of Studies | 700 |
Number of Series | 2596 |
Number of Images | 2596 |
Image Size (GB, compressed) | Added when data released |
Citations & Data Usage Policy
Data Citation
Buda, M., Saha, A., Li, N., Mazurowski, M.A. (2020). Data from the Breast Cancer Screening DBT. The Cancer Imaging Archive. http://doi.org (Coming soon).
Publication Citation
Buda, M., Saha, A., Walsh, R., Ghate, S., Li, N., Święcicki, A., Lo, J.Y., Mazurowski, M.A., Detection of masses and architectural distortions in digital breast tomosynthesis: a publicly available dataset of 5,060 patients and a deep learning model. arXiv preprint https://arxiv.org/abs/2011.07995.
TCIA Citation
Clark K, Vendt B, Smith K, Freymann J, Kirby J, Koppel P, Moore S, Phillips S, Maffitt D, Pringle M, Tarbox L, Prior F. The Cancer Imaging Archive (TCIA): Maintaining and Operating a Public Information Repository, Journal of Digital Imaging, Volume 26, Number 6, December, 2013, pp 1045-1057. DOI: 10.1007/s10278-013-9622-7
Other Publications Using This Data
TCIA maintains a list of publications which leverage TCIA data. If you have a manuscript you'd like to add please contact the TCIA Helpdesk.
Version 1 (Current): Updated 2020/mm/dd
Data Type | Download all or Query/Filter |
---|---|
Images (DICOM, XX.X GB) DBT | (Requires NBIA Data Retriever.) (Search button will not work until the data are ready to be released) |
Image Metadata (csv) | |
Boxes indicating lesion locations (csv) | |
Spreadsheet indicating which group each case belongs to (see the paper for details on the groups) (csv) | |