Summary
Breast cancer is among the most common cancers and a common cause of death among women. Over 39 million breast cancer screening exams are performed every year and are among the most common radiological tests. This creates a high need for accurate image interpretation. Machine learning has shown promise in interpretation of medical images. However, limited data for training and validation remains an issue.
Here, we share a curated dataset of digital breast tomosynthesis images that includes normal, actionable, biopsy-proven benign, and biopsy-proven cancer cases. The dataset contains four components: (1) DICOM images, (2) a spreadsheet indicating which group each case belongs to, (3) annotation boxes, and (4) image paths for patients/studies/views. A detailed description of this dataset can be found in the following paper; please reference this paper if you use this dataset:
M. Buda, A. Saha, R. Walsh, S. Ghate, N. Li, A. Święcicki, J. Y. Lo, M. A. Mazurowski, Detection of masses and architectural distortions in digital breast tomosynthesis: a publicly available dataset of 5,060 patients and a deep learning model. arXiv preprint arXiv:2011.07995 (https://arxiv.org/abs/2011.07995; DOI: https://doi.org/10.1001/jamanetworkopen.2021.19100).
Additional information and resources related to this dataset can be found here: https://sites.duke.edu/mazurowski/resources/digital-breast-tomosynthesis-database/
Version 1 of the dataset contains only a subset of the data described in the paper above. More data will be shared in subsequent versions.
Acknowledgements
We would like to acknowledge the individuals and institutions that have provided data for this collection:
Duke University Hospital/Duke University, Durham, NC, USA
We would like to acknowledge all those who contributed to the curation of this dataset.
This work was supported by a grant from the NIH: 1 R01 EB021360 (PI: Mazurowski).
...
Data Access
Click the Download button to save a ".tcia" manifest file to your computer, which you must open with the NBIA Data Retriever. Click the Search button to open our Data Portal, where you can browse the data collection and/or download a subset of its contents.
...
Training set Phase 2 - Images (DICOM, 1321 GB)
...
url | https://wiki.cancerimagingarchive.net/download/attachments/64685580/BSC-DBT%20Train%20Phase%202%20manifest.tcia?api=v2 |
---|
...
url | https://wiki.cancerimagingarchive.net/download/attachments/64685580/BCS-DBT%20file-paths-train%20PHASE%202.csv?api=v2 |
---|
...
url | https://wiki.cancerimagingarchive.net/download/attachments/64685580/BCS-DBT%20labels-train%20PHASE%202.csv?api=v2 |
---|
...
url | https://wiki.cancerimagingarchive.net/download/attachments/64685580/BCS-DBT%20boxes-train-v2.csv?api=v2 |
---|
...
Please visit this discussion forum for any questions related to the data: https://www.reddit.com/r/DukeDBTData/
Required Preprocessing of DBT Images
For some of the images, the laterality stored in the DICOM header and/or the image orientation is incorrect. In these instances, the reference standard "truth" boxes are defined with respect to the corrected image orientation. It is therefore crucial to provide your results for images in the correct orientation. Python functions for loading image data from a DICOM file into a 3D array of pixel values in the correct orientation, and for displaying "truth" boxes (if any), are on GitHub. Please see the readme file there for instructions.
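The GitHub functions remain the reference implementation. Purely to illustrate the idea, the sketch below checks whether a volume's content agrees with the laterality implied by its view name and mirrors the volume when it does not. It is not the repository code: the function names are hypothetical, the volume is a plain nested list indexed as [slice][row][column], the edge-intensity heuristic and the horizontal flip are assumptions, and view names are assumed to start with `l` or `r` (such as `lcc` or `rmlo`).

```python
def estimate_image_laterality(slice_2d):
    """Guess 'L' or 'R' from which vertical image edge carries more signal.

    Assumption: the chest-wall side of the breast touches one vertical
    edge of the image, so that edge has the larger summed intensity.
    """
    left_edge = sum(row[0] for row in slice_2d)
    right_edge = sum(row[-1] for row in slice_2d)
    return "L" if left_edge >= right_edge else "R"


def flip_left_right(volume):
    """Mirror every slice of a [slice][row][column] volume horizontally."""
    return [[row[::-1] for row in slice_2d] for slice_2d in volume]


def orient_volume(volume, view):
    """Return the volume, flipped if its content disagrees with the view."""
    expected = view[0].upper()  # assumed naming: 'lcc' -> 'L', 'rmlo' -> 'R'
    if expected not in ("L", "R"):
        raise ValueError(f"unexpected view name: {view!r}")
    if estimate_image_laterality(volume[0]) != expected:
        return flip_left_right(volume)
    return volume


# Tiny synthetic volume: one 2x3 slice with the signal on the right edge.
toy_volume = [[[0, 1, 9], [0, 2, 8]]]
oriented = orient_volume(toy_volume, "lcc")
```

Any bounding box drawn on a flipped volume has to be expressed in the flipped coordinate system as well, which is why results should be reported only for correctly oriented images.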
DBTex Lesion Detection Challenge Predictions
The DBTex lesion detection challenge tasked participating teams with detecting lesions in the BCS-DBT test set. The challenge had two phases: DBTex1 and DBTex2. Here we provide the BCS-DBT lesion predictions made by all participating teams for both phases, for both the BCS-DBT test and validation sets, as “team_predictions_bothphases.zip”. Please see here under “Output format for the DBTex2 Challenge test set results” for a description of how these results are formatted. Finally, when comparing lesion bounding box predictions to the image data, be sure to load the images correctly according to the above “Required Preprocessing of DBT Images”.
If you use these predictions, please reference the DBTex challenge paper:
Konz N, Buda M, Gu H, et al. A Competition, Benchmark, Code, and Data for Using Artificial Intelligence to Detect Lesions in Digital Breast Tomosynthesis. JAMA Netw Open. 2023;6(2):e230524. doi:10.1001/jamanetworkopen.2023.0524
Important considerations when using this dataset
Some scans contain skin markers. These markers can indicate moles, scars, other skin issues, palpable masses, or other areas of concern. The presence of these markers is reflective of clinical practice and a marker does not necessarily indicate a breast lesion.
Each DBT exam typically consists of 4 views (2 per breast: craniocaudal and mediolateral oblique), but for some exams fewer than 4 views may be available for analysis.
Images are stored in compressed DICOM format, with an entire 3D volume (view) per DICOM file. The images can be read with Radiant, MicroDICOM, ImageJ (with the Bio-Formats plug-in), MATLAB, and all the main DICOM toolkits: GDCM, dcmtk, dicom3tools. If using ImageJ, image files might first need to be uncompressed using one of the toolkits (e.g., with GDCM: `gdcmconv -w compressed.dcm uncompressed.dcm`).
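To apply the `gdcmconv -w` command above to a whole download, a small wrapper can help. The following is a minimal sketch, assuming that `gdcmconv` is installed and on the PATH and that the DICOM files end in `.dcm`; the function names and the directory layout are hypothetical. By default it only lists the commands it would run.

```python
from pathlib import Path
import subprocess


def build_gdcmconv_command(src, dst):
    """Return the argument list for decompressing one file with GDCM."""
    return ["gdcmconv", "-w", str(src), str(dst)]


def decompress_tree(src_root, dst_root, dry_run=True):
    """Mirror src_root into dst_root, decompressing every .dcm file.

    With dry_run=True (the default) nothing is executed; the commands
    are only collected and returned for inspection.
    """
    src_root, dst_root = Path(src_root), Path(dst_root)
    commands = []
    for src in sorted(src_root.rglob("*.dcm")):
        dst = dst_root / src.relative_to(src_root)
        cmd = build_gdcmconv_command(src, dst)
        commands.append(cmd)
        if not dry_run:
            dst.parent.mkdir(parents=True, exist_ok=True)
            subprocess.run(cmd, check=True)  # raises if gdcmconv fails
    return commands


example = build_gdcmconv_command("compressed.dcm", "uncompressed.dcm")
```

Writing to a separate output directory keeps the original compressed files intact, which matters given the size of the collection.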
Required preprocessing of DBT images: For some of the images, the laterality stored in the DICOM header and/or the image orientation is incorrect. In these instances, the reference standard "truth" boxes are defined with respect to the corrected image orientation. It is therefore crucial to provide your results for images in the correct orientation. Python functions for loading image data from a DICOM file into a 3D array of pixel values in the correct orientation, and for displaying "truth" boxes (if any), are on GitHub. Please see the readme file there for instructions.
DBTex Challenge
Part of this dataset is used for the DBTex challenge (http://spie-aapm-nci-dair.westus2.cloudapp.azure.com/competitions/4), which contains a total of 22032 breast tomosynthesis scans from 5060 patients from this collection. The challenge dataset is broken down into the following cohorts:
- Total: 22032 scans
- Training: 19148 scans
- Validation: 1163 scans
- Test: 1721 scans
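As a quick consistency check on the cohort sizes listed above, the following short sketch confirms that the three cohorts add up to the stated total and reports each cohort's share. The variable names are illustrative only.

```python
# Cohort sizes as listed above (number of scans).
splits = {"Training": 19148, "Validation": 1163, "Test": 1721}
stated_total = 22032

total = sum(splits.values())
shares = {name: 100 * count / total for name, count in splits.items()}

for name, count in splits.items():
    print(f"{name:<10} {count:>6} scans  {shares[name]:5.1f}%")
print(f"{'Total':<10} {total:>6} scans")
```

The training cohort accounts for roughly 87% of the scans, with the remainder split between validation and test.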
Training set (with truth):
The training set consists of 19148 cases. This dataset will be representative of the technical properties (equipment, acquisition parameters, file format) and the nature of lesions in the validation and test sets. An associated spreadsheet in CSV format will include the DBT scan identifier and the bounding box definition of every lesion.
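The bounding box file can be read with the Python standard library. The sketch below is a minimal example under stated assumptions: the column names (`PatientID`, `StudyUID`, `View`, `Slice`, `X`, `Y`, `Width`, `Height`, `Class`) are assumed rather than taken from this page, so check them against the header of the downloaded CSV, and the rows shown are synthetic, not real cases. `Box` and `load_boxes` are hypothetical names.

```python
import csv
import io
from collections import defaultdict
from typing import NamedTuple


class Box(NamedTuple):
    slice_index: int
    x_min: int
    y_min: int
    x_max: int
    y_max: int
    label: str


def load_boxes(csv_file):
    """Group boxes by (patient, study, view); column names are assumed."""
    boxes = defaultdict(list)
    for row in csv.DictReader(csv_file):
        x, y = int(row["X"]), int(row["Y"])
        w, h = int(row["Width"]), int(row["Height"])
        key = (row["PatientID"], row["StudyUID"], row["View"])
        boxes[key].append(
            Box(int(row["Slice"]), x, y, x + w, y + h, row["Class"])
        )
    return dict(boxes)


# Synthetic rows for illustration only.
SAMPLE_CSV = (
    "PatientID,StudyUID,View,Slice,X,Y,Width,Height,Class\n"
    "P1,S1,rmlo,16,100,200,50,60,benign\n"
    "P1,S1,rmlo,20,300,400,30,40,cancer\n"
    "P2,S2,lcc,5,10,20,30,40,benign\n"
)

boxes_by_scan = load_boxes(io.StringIO(SAMPLE_CSV))
```

For a real file, pass an open file handle instead of the `io.StringIO` object, for example `open(path, newline="")`. Grouping by scan makes it easy to look up all lesions belonging to one DICOM volume.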
Validation set (without truth):
The validation set will consist of 1163 cases. The locations of lesions will not be provided. The validation set needs to be processed, manipulated, and analyzed without human intervention. Validation set output submitted through the online challenge interface will contribute to the challenge leader board.
Test set (without truth):
The test set will consist of 1721 cases. The locations of lesions will not be provided. The test set needs to be processed, manipulated, and analyzed without human intervention.
...
Citations & Data Usage Policy
License: TCIA license 4 (noncommercial)
Buda, M., Saha, A., Walsh, R., Ghate, S., Li, N., Święcicki, A., Lo, J.Y., Yang, J., & Mazurowski, M.A. (2020). Data from the Breast Cancer Screening – Digital Breast Tomosynthesis (BCS-DBT). The Cancer Imaging Archive. DOI: https://doi.org/10.7937/e4wt-cd02
M. Buda, A. Saha, R. Walsh, S. Ghate, N. Li, A. Święcicki, J. Y. Lo, M. A. Mazurowski, Detection of masses and architectural distortions in digital breast tomosynthesis: a publicly available dataset of 5,060 patients and a deep learning model. arXiv preprint arXiv:2011.07995 (https://arxiv.org/abs/2011.07995).
Clark K, Vendt B, Smith K, Freymann J, Kirby J, Koppel P, Moore S, Phillips S, Maffitt D, Pringle M, Tarbox L, Prior F. The Cancer Imaging Archive (TCIA): Maintaining and Operating a Public Information Repository, Journal of Digital Imaging, Volume 26, Number 6, December, 2013, pp 1045-1057. DOI: 10.1007/s10278-013-9622-7
Other Publications Using This Data
TCIA maintains a list of publications that leverage TCIA data. If you have a manuscript you'd like to add, please contact the TCIA Helpdesk.
...
Versions
Version 4 (Current): Updated 2021/05/24
...
Training set Phase 2 - Images (DICOM, 1321 GB)
...
url | https://wiki.cancerimagingarchive.net/download/attachments/64685580/BSC-DBT%20Train%20Phase%202%20manifest.tcia?api=v2 |
---|
...
label | Search |
---|---|
url | https://nbia.cancerimagingarchive.net/nbia-search/?saved-cart=nbia-2791621866040011 |
...
url | https://wiki.cancerimagingarchive.net/download/attachments/64685580/BCS-DBT%20file-paths-train%20PHASE%202.csv?api=v2 |
---|
...
url | https://wiki.cancerimagingarchive.net/download/attachments/64685580/BCS-DBT%20labels-train%20PHASE%202.csv?api=v2 |
---|
...
url | https://wiki.cancerimagingarchive.net/download/attachments/64685580/BCS-DBT%20boxes-train-v2.csv?api=v2 |
---|
...
url | https://wiki.cancerimagingarchive.net/download/attachments/64685580/BSC-DBT%20Validation%20Phase%202%20manifest.tcia?api=v2 |
---|
...
label | Search |
---|---|
url | https://nbia.cancerimagingarchive.net/nbia-search/?saved-cart=nbia-40231621863548077 |
...
url | https://wiki.cancerimagingarchive.net/download/attachments/64685580/BCS-DBT%20file-paths-validation%20PHASE%202.csv?api=v2 |
---|
...
url | https://wiki.cancerimagingarchive.net/download/attachments/64685580/BSC-DBT%20Test%20Phase%202%20manifest.tcia?api=v2 |
---|
...
label | Search |
---|---|
url | https://nbia.cancerimagingarchive.net/nbia-search/?saved-cart=nbia-26591621862922478 |
...