Summary
Breast cancer is among the most common cancers and a common cause of death among women. Over 39 million breast cancer screening exams are performed every year and are among the most common radiological tests. This creates a high need for accurate image interpretation. Machine learning has shown promise in interpretation of medical images. However, limited data for training and validation remains an issue.
Here, we share a curated dataset of digital breast tomosynthesis images that includes normal, actionable, biopsy-proven benign, and biopsy-proven cancer cases.
A detailed description of this dataset can be found in the following paper:
M. Buda, A. Saha, R. Walsh, S. Ghate, N. Li, A. Święcicki, J. Y. Lo, M. A. Mazurowski, Detection of masses and architectural distortions in digital breast tomosynthesis: a publicly available dataset of 5,060 patients and a deep learning model. arXiv preprint arXiv:2011.07995.
Please reference this paper if you use this dataset. Please note that version 1 of the dataset contains only a subset of all data described in the paper above. More data will be shared in subsequent versions.
The dataset contains three components: (1) DICOM images, (2) a spreadsheet indicating which group each case belongs to, and (3) annotation boxes.
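As a rough illustration of how the components listed above might fit together, the sketch below joins a labels spreadsheet with an annotation-box file using only the Python standard library. The column names (`PatientID`, `StudyUID`, `View`, the four group columns, and the box fields) and all sample rows are assumptions made for illustration; check them against the actual CSV headers before relying on this.

```python
import csv
import io
from collections import defaultdict

# Hypothetical excerpts of the two CSV files. The column names and values
# are assumptions for illustration, not taken from the released files.
LABELS_CSV = """PatientID,StudyUID,View,Normal,Actionable,Benign,Cancer
P-0001,S-1,lcc,1,0,0,0
P-0002,S-2,rmlo,0,0,1,0
P-0003,S-3,lcc,0,0,0,1
"""

BOXES_CSV = """PatientID,StudyUID,View,Slice,X,Y,Width,Height
P-0002,S-2,rmlo,20,100,200,50,60
P-0003,S-3,lcc,15,300,400,80,90
P-0003,S-3,lcc,18,500,600,40,45
"""

# The four case groups named in the dataset description.
GROUPS = ("Normal", "Actionable", "Benign", "Cancer")
BOX_FIELDS = ("Slice", "X", "Y", "Width", "Height")


def load_cases(labels_text, boxes_text):
    """Return one record per labeled view, with its group and any boxes."""
    # Index annotation boxes by (patient, study, view).
    boxes_by_view = defaultdict(list)
    for row in csv.DictReader(io.StringIO(boxes_text)):
        key = (row["PatientID"], row["StudyUID"], row["View"])
        boxes_by_view[key].append({name: int(row[name]) for name in BOX_FIELDS})

    cases = []
    for row in csv.DictReader(io.StringIO(labels_text)):
        key = (row["PatientID"], row["StudyUID"], row["View"])
        # Assumes exactly one group column is set to 1 per row.
        group = next(g for g in GROUPS if row[g] == "1")
        cases.append({"key": key, "group": group, "boxes": boxes_by_view.get(key, [])})
    return cases


cases = load_cases(LABELS_CSV, BOXES_CSV)
for case in cases:
    print(case["key"], case["group"], len(case["boxes"]))
```

Keying both files on the same patient, study, and view triple keeps the join independent of row order. Views without annotations simply receive an empty box list.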
Acknowledgements
We would like to acknowledge the individuals and institutions that have provided data for this collection:
Duke University Hospital/Duke University, Durham, NC, USA
We would like to acknowledge all those who contributed to the curation of this dataset.
This work was supported by a grant from the NIH: 1 R01 EB021360 (PI: Mazurowski).
Data Access
Click the Download button to save a ".tcia" manifest file to your computer, which you must open with the NBIA Data Retriever. Click the Search button to open our Data Portal, where you can browse the data collection and/or download a subset of its contents.
Data Type | Download all or Query/Filter |
---|---|
Images (DICOM, XX.X GB) DBT | (Download and search button not working) |
Annotations (csv) | |
Labels (csv) | |
Click the Versions tab for more info about data releases.
Detailed Description
Image Statistics | |
---|---|
Modalities | DBT |
Number of Participants | 985 |
Number of Studies | 1000 |
Number of Series | 3592 |
Number of Images | 3592 |
Image Size (TB, compressed) | 1.2 |
Citations & Data Usage Policy
Data Citation
DOI goes here. Create using pubhub with information from Collection Approval form
Publication Citation
Buda, M., Saha, A., Walsh, R., Ghate, S., Li, N., Święcicki, A., Lo, J. Y., Mazurowski, M. A., Detection of masses and architectural distortions in digital breast tomosynthesis: a publicly available dataset of 5,060 patients and a deep learning model. arXiv preprint https://arxiv.org/abs/2011.07995.
TCIA Citation
Clark K, Vendt B, Smith K, Freymann J, Kirby J, Koppel P, Moore S, Phillips S, Maffitt D, Pringle M, Tarbox L, Prior F. The Cancer Imaging Archive (TCIA): Maintaining and Operating a Public Information Repository, Journal of Digital Imaging, Volume 26, Number 6, December, 2013, pp 1045-1057. DOI: 10.1007/s10278-013-9622-7
Other Publications Using This Data
TCIA maintains a list of publications which leverage TCIA data. If you have a manuscript you'd like to add please contact the TCIA Helpdesk.
Version 1 (Current): Updated 2020/mm/dd
Data Type | Download all or Query/Filter |
---|---|
Images (DICOM, xx.x GB) | (Requires NBIA Data Retriever.) |
Annotations (CSV) | |
Labels (CSV) | |