Curated Breast Imaging Subset of Digital Database for Screening Mammography (CBIS-DDSM)



Data Access


  • Images (DICOM, 163.6GB)
    Download: https://wiki.cancerimagingarchive.net/download/attachments/22516629/CBIS-DDSM-All-doiJNLP-zzWs5zfZ.tcia?version=1&modificationDate=1534787024127&api=v2
    Search: https://www.cancerimagingarchive.net/nbia-search/?CollectionCriteria=CBIS-DDSM
    Click the Download button to save a ".tcia" manifest file to your computer, which you must open with the NBIA Data Retriever.
    License: CC BY 3.0
  • Mass-Training-Description (csv)
    Download: https://wiki.cancerimagingarchive.net/download/attachments/22516629/mass_case_description_train_set.csv?version=1&modificationDate=1506796355038&api=v2
    License: CC BY 3.0
  • Calc-Training-Description (csv)
    Download: https://wiki.cancerimagingarchive.net/download/attachments/22516629/calc_case_description_train_set.csv?version=1&modificationDate=1506796349666&api=v2
    License: CC BY 3.0
  • Mass-Test-Description (csv)
    Download: https://wiki.cancerimagingarchive.net/download/attachments/22516629/mass_case_description_test_set.csv?version=1&modificationDate=1506796343175&api=v2
    License: CC BY 3.0
  • Calc-Test-Description (csv)
    Download: https://wiki.cancerimagingarchive.net/download/attachments/22516629/calc_case_description_test_set.csv?version=1&modificationDate=1506796343686&api=v2
    License: CC BY 3.0

Click the Versions tab for more info about data releases.





Detailed Description


Collection Statistics


Modalities: MG
Number of Participants: 1,566*
Number of Studies: 6,775
Number of Series: 6,775
Number of Images: 10,239
Image Size (GB): 163.6


 * The image data for this collection is structured such that each participant has multiple patient IDs. For example, participant 00038 has 10 separate patient IDs, each of which describes the scans it contains (e.g. Calc-Test_P_00038_LEFT_CC, Calc-Test_P_00038_RIGHT_CC_1). As a result, the DICOM metadata appears to list 6,671 participants, but the cohort contains only 1,566 actual participants.
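To count actual participants rather than DICOM patient IDs, the participant number can be extracted from each ID. The following is a minimal sketch, not an official tool. The ID layout in the pattern is an assumption inferred only from the two examples above, and the function names are hypothetical.

```python
import re

# Assumed layout, inferred from the two example IDs in the text:
#   <Mass|Calc>-<Training|Test>_P_<digits>_<LEFT|RIGHT>_<view>[_<number>]
# The meaning of the optional trailing number is not stated in the text.
PATIENT_ID = re.compile(
    r"^(?:Mass|Calc)-(?:Training|Test)_(P_\d+)_(?:LEFT|RIGHT)_[A-Z]+(?:_\d+)?$"
)


def participant_of(patient_id):
    """Return the participant key (e.g. 'P_00038') embedded in a patient ID."""
    match = PATIENT_ID.match(patient_id)
    if match is None:
        raise ValueError(f"unrecognised patient ID: {patient_id!r}")
    return match.group(1)


def count_participants(patient_ids):
    """Count distinct participants across many DICOM patient IDs."""
    return len({participant_of(pid) for pid in patient_ids})
```

Applied to every patient ID in the collection, a count of this kind should agree with the participant total quoted above, provided the assumed layout holds for all IDs.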


The CBIS-DDSM contributors have provided the following additional options for subset download.


  • Mass-Training Full Mammogram Images (DICOM)
    Download: https://wiki.cancerimagingarchive.net/download/attachments/22516629/Mass-Training_full_mammogram_images_1-doiJNLP-wv6aeYDn.tcia?version=1&modificationDate=1534787720182&api=v2
  • Mass-Training ROI and Cropped Images (DICOM)
    Download: https://wiki.cancerimagingarchive.net/download/attachments/22516629/Mass-Training_ROI-mask_and_crpped_images_1-doiJNLP-07gmVj4b.tcia?version=1&modificationDate=1534787720507&api=v2
  • Calc-Training Full Mammogram Images (DICOM)
    Download: https://wiki.cancerimagingarchive.net/download/attachments/22516629/Calc-Training_full_mammogram_images_1-doiJNLP-PrQ05L6k.tcia?version=1&modificationDate=1534787721436&api=v2
  • Calc-Training ROI and Cropped Images (DICOM)
    Download: https://wiki.cancerimagingarchive.net/download/attachments/22516629/Calc-Training_ROI-mask_and_crpped_images-doiJNLP-kTGQKqBk.tcia?version=1&modificationDate=1534787718550&api=v2
  • Mass-Training-Description (csv)
    Download: https://wiki.cancerimagingarchive.net/download/attachments/22516629/mass_case_description_train_set.csv?version=1&modificationDate=1506796355038&api=v2
  • Calc-Training-Description (csv)
    Download: https://wiki.cancerimagingarchive.net/download/attachments/22516629/calc_case_description_train_set.csv?version=1&modificationDate=1506796349666&api=v2
  • Mass-Test Full Mammogram Images (DICOM)
    Download: https://wiki.cancerimagingarchive.net/download/attachments/22516629/Mass-Test_full_mammogram_images-doiJNLP-6ccCrb8t.tcia?version=1&modificationDate=1534787719378&api=v2
  • Mass-Test ROI and Cropped Images (DICOM)
    Download: https://wiki.cancerimagingarchive.net/download/attachments/22516629/Mass-Test_ROI-mask_and_crpped_images-doiJNLP-SmEOyQFn.tcia?version=1&modificationDate=1534787719824&api=v2
  • Calc-Test Full Mammogram Images (DICOM)
    Download: https://wiki.cancerimagingarchive.net/download/attachments/22516629/Calc-Test_full_mammogram_images-doiJNLP-SiXj6kpS.tcia?version=1&modificationDate=1534787720906&api=v2
  • Calc-Test ROI and Cropped Images (DICOM)
    Download: https://wiki.cancerimagingarchive.net/download/attachments/22516629/Calc-Test_ROI-mask_and_crpped_images-doiJNLP-PsjCfTdf.tcia?version=1&modificationDate=1534787718981&api=v2
  • Mass-Test-Description (csv)
    Download: https://wiki.cancerimagingarchive.net/download/attachments/22516629/mass_case_description_test_set.csv?version=1&modificationDate=1506796343175&api=v2
  • Calc-Test-Description (csv)
    Download: https://wiki.cancerimagingarchive.net/download/attachments/22516629/calc_case_description_test_set.csv?version=1&modificationDate=1506796343686&api=v2





The CBIS-DDSM was created from DDSM by undertaking the following specific procedures:


1) Removal of questionable mass cases

Not all DDSM ROI annotations include suspicious lesions, so a trained mammographer reviewed the questionable cases. The review identified 254 images in which a mass was not clearly seen; these images were removed from the final data set.


2) Image Decompression

DDSM images are distributed as lossless JPEG (LJPEG) files, an obsolete image format. The only library capable of decompressing these images is the Stanford PVRG-JPEG Codec v1.1, which was last updated in 1993. To address this, the PVRG-JPEG codec was modified to compile successfully on OS X 10.10.5 (Yosemite) using Apple GCC clang-602.0.53. The decompression code outputs data as 8-bit or 16-bit raw binary bitmaps. Python tools were developed to read this raw data and store it as 16-bit gray scale TIFF files. These files were later converted to DICOM.

This process is entirely lossless and preserves all information from the original DDSM files.
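As an illustration of why widening raw samples into a 16-bit container loses nothing, the sketch below reads a headerless raw bitmap and returns the pixel values unchanged. It is not the contributors' tool: the function name, the `bits` parameter, the big-endian default, and the row-major layout are all assumptions made for this example.

```python
import struct


def raw_to_uint16(data, width, height, bits=8, big_endian=True):
    """Read a headerless raw bitmap and return rows of integer pixel values.

    Values are copied unchanged, so every 8-bit or 16-bit sample fits in a
    16-bit container without loss. Byte order and row-major layout are
    assumptions, not facts taken from the DDSM documentation.
    """
    if bits not in (8, 16):
        raise ValueError("bits must be 8 or 16")
    count = width * height
    expected = count * (bits // 8)
    if len(data) != expected:
        raise ValueError(f"expected {expected} bytes, got {len(data)}")
    code = "B" if bits == 8 else "H"      # unsigned char or unsigned short
    order = ">" if big_endian else "<"
    pixels = struct.unpack(f"{order}{count}{code}", data)
    # Slice the flat pixel tuple into one list per image row.
    return [list(pixels[r * width:(r + 1) * width]) for r in range(height)]
```

Writing the resulting rows to a 16-bit TIFF or DICOM file would need an imaging library and is left out of this sketch.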


3) Image Processing

The original DDSM files were distributed with a set of bash and C tools for Linux that perform image correction and metadata processing. These tools were very difficult to refactor for use on modern systems, so they were re-implemented in Python to be cross-platform and easy for modern users to understand. The images in the DDSM were acquired on several different scanners at different institutions. The DDSM data descriptions provide methods to convert raw pixel data into 64-bit optical density values, which are standardized across all images. Optical density values were then re-mapped to 16-bit gray scale TIFF files. The DDSM automatically clips optical density values to the range 0.05 to 3.0 for noise reduction. This clipping occurs in the CBIS-DDSM as well, but the new tools provide a flag to remove the clipping and retain the original optical density values.
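The clipping and re-mapping step can be sketched as follows. This is a hedged illustration only: the linear mapping, the function names, and the `clip` flag are assumptions for this example, and the scanner-specific conversion from raw pixels to optical density is not reproduced here.

```python
def clip_optical_density(values, lo=0.05, hi=3.0, clip=True):
    """Clamp optical density values to [lo, hi]; pass clip=False to keep them."""
    if not clip:
        return list(values)
    return [min(max(v, lo), hi) for v in values]


def to_uint16(values, lo, hi):
    """Linearly map values in [lo, hi] onto the 16-bit range 0..65535.

    The linear mapping is an assumption; the text only states that optical
    density was re-mapped to 16-bit gray scale.
    """
    if hi <= lo:
        raise ValueError("hi must be greater than lo")
    out = []
    for v in values:
        if not lo <= v <= hi:
            raise ValueError(f"value {v} lies outside [{lo}, {hi}]")
        out.append(round((v - lo) / (hi - lo) * 65535))
    return out
```

With clipping disabled, the caller has to choose `lo` and `hi` from the data itself (for example its minimum and maximum), because unclipped values may fall outside 0.05 to 3.0.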


4) Image Cropping

Several CAD tasks require analyzing only the abnormalities (the portion of the image in the ROI), without the full mammogram image. A set of convenience images is therefore also provided, consisting of focused crops of the abnormalities. Abnormalities were cropped by determining the bounding rectangle of the abnormality with respect to its ROI. The square crops were created by extending the shorter edge of the rectangle to the same length as the longer edge. The centroid of the abnormality is located in the center of these square crops.
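The cropping geometry can be sketched as below. This is a simplified illustration, not the contributors' code: it centers the square on the bounding rectangle rather than on the abnormality centroid, it does not clamp the square to the image borders, and the function names are hypothetical.

```python
def bounding_box(mask):
    """Return (x_min, y_min, x_max, y_max) of the nonzero pixels in a 2D mask."""
    rows = [r for r, row in enumerate(mask) if any(row)]
    cols = [c for c in range(len(mask[0])) if any(row[c] for row in mask)]
    if not rows:
        raise ValueError("mask is empty")
    return min(cols), min(rows), max(cols), max(rows)


def square_crop_box(x_min, y_min, x_max, y_max):
    """Extend the shorter edge of a rectangle so it matches the longer edge.

    Returns (left, top, side). The square stays centered on the rectangle;
    left or top may be negative if the rectangle touches the image border.
    """
    width = x_max - x_min + 1
    height = y_max - y_min + 1
    side = max(width, height)
    left = x_min - (side - width) // 2
    top = y_min - (side - height) // 2
    return left, top, side
```

For example, a rectangle 20 pixels wide and 40 pixels tall becomes a 40 by 40 square whose left edge moves 10 pixels outward.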


5) Updating for precision segmentation

Mass margin and shape have long been proven to be significant indicators for diagnosis in mammography. Because of this, many methods are based on developing mathematical descriptions of the tumor outline. Because these methods depend on accurate ROI segmentation and many of the DDSM-provided annotations are imprecise, a lesion segmentation algorithm (described below) was applied. The algorithm is initialized by the general, original DDSM contours but is able to supply much more accurate ROIs. This was done only for masses, not for calcifications.

Lesion segmentation was accomplished by applying a modification to the local level set framework as presented by Chan and Vese. Level set models follow a non-parametric deformable model and can therefore handle topological changes during evolution. The Chan-Vese model is a region-based method that estimates spatial statistics of image regions and finds a minimal energy where the model best fits the image, resulting in convergence of the contour towards the desired object. This modification of the local framework includes automated evaluation of the local region surrounding each contour point. For low-contrast lesions, a small local region is chosen, which prevents excessive curve evolution. For noisy or heterogeneous lesions, on the other hand, a relatively large local region is assigned to the contour point to prevent the level set contour from converging into local minima. Local frameworks require an initialization of the contour, and thus the original DDSM annotation was used as the level set segmentation initialization.
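To make the "minimal energy" idea concrete, the sketch below evaluates only the region-fitting (data) term of the Chan-Vese energy for a fixed binary mask: the sum of squared deviations from the mean intensity inside the contour plus the same quantity outside it. It is a teaching aid under stated assumptions, not the segmentation used for CBIS-DDSM: it omits the contour length and area regularization terms, performs no level set evolution, and does not include the local-region modification described above. The function name and weights are hypothetical.

```python
def chan_vese_data_energy(image, mask, lambda1=1.0, lambda2=1.0):
    """Region-fitting term of the Chan-Vese energy for a fixed binary mask.

    image and mask are equally sized 2D lists; nonzero mask pixels are
    treated as lying inside the contour.
    """
    inside, outside = [], []
    for image_row, mask_row in zip(image, mask):
        for pixel, flag in zip(image_row, mask_row):
            (inside if flag else outside).append(pixel)
    # c1 and c2 are the mean intensities inside and outside the contour.
    c1 = sum(inside) / len(inside) if inside else 0.0
    c2 = sum(outside) / len(outside) if outside else 0.0
    return (lambda1 * sum((p - c1) ** 2 for p in inside)
            + lambda2 * sum((p - c2) ** 2 for p in outside))
```

A mask that matches a homogeneous bright region yields a lower energy than a misplaced mask, which is the property a level set evolution exploits when it moves the contour.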


6) Standardized Train/Test splits

The data were split into a training set and a testing set based on the BI-RADS category. This allows for an appropriate stratification for researchers working on CADe as well as CADx. The split was obtained using 20% of the cases for testing and the rest for training. The mass cases and the calcification cases were split separately. Here “case” is used to indicate a particular abnormality, seen on both the CC and MLO views.
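A stratified 80/20 split of this kind can be sketched as follows. This does not reproduce the published split, which should be taken from the description CSV files. The function name, the seed, and the input format of (case ID, BI-RADS category) pairs are assumptions for illustration.

```python
import random
from collections import defaultdict


def stratified_split(cases, test_fraction=0.2, seed=0):
    """Split (case_id, birads) pairs into train and test lists of case IDs.

    Each BI-RADS category is shuffled and split separately, so both sets
    keep roughly the same category proportions. Run it once for mass cases
    and once for calcification cases to keep the two groups separate.
    """
    by_category = defaultdict(list)
    for case_id, birads in cases:
        by_category[birads].append(case_id)

    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    train, test = [], []
    for birads in sorted(by_category):
        ids = sorted(by_category[birads])
        rng.shuffle(ids)
        n_test = round(len(ids) * test_fraction)
        test.extend(ids[:n_test])
        train.extend(ids[n_test:])
    return train, test
```

Because the unit being split is the case, all views of one abnormality land in the same set.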







Citations & Data Usage Policy 

Tcia limited license policy

CBIS-DDSM Citation

Sawyer-Lee, R., Gimenez, F., Hoogi, A., & Rubin, D. (2016). Curated Breast Imaging Subset of Digital Database for Screening Mammography (CBIS-DDSM) (Version 1) [Data set]. The Cancer Imaging Archive. https://doi.org/10.7937/K9/TCIA.2016.7O02S9CY


Publication Citation

Lee, R. S., Gimenez, F., Hoogi, A., Miyake, K. K., Gorovoy, M., & Rubin, D. L. (2017). A curated mammography data set for use in computer-aided detection and diagnosis research. In Scientific Data (Vol. 4, Issue 1, Article number 170177). Springer Science and Business Media LLC. https://doi.org/10.1038/sdata.2017.177


TCIA Citation

Clark, K., Vendt, B., Smith, K., Freymann, J., Kirby, J., Koppel, P., Moore, S., Phillips, S., Maffitt, D., Pringle, M., Tarbox, L., & Prior, F. (2013). The Cancer Imaging Archive (TCIA): Maintaining and Operating a Public Information Repository. In Journal of Digital Imaging (Vol. 26, Issue 6, pp. 1045–1057). Springer Science and Business Media LLC. https://doi.org/10.1007/s10278-013-9622-7


Other Publications Using This Data

TCIA maintains a list of publications that leverage our data, including citations of this Collection. If you have a publication you'd like to add please contact the TCIA Helpdesk. Some publications that have used this dataset as a resource include:

  1. Agarwal, O. (2019). An Augmentation in the Diagnostic Potency of Breast Cancer through A Deep Learning Cloud-Based AI Framework to Compute Tumor Malignancy & Risk. International Research Journal of Innovations in Engineering and Technology (IRJIET), 3(6). Retrieved from https://www.irjiet.com/Volume-3/Issue-6-June-2019/An-Augmentation-in-the-Diagnostic-Potency-of-Breast-Cancer-through-a-Deep-Learning-Cloud-Based-A-I-Framework-to-Compute-Tumor-Malignancy-Risk/7 
  2. Agarwal, R., Diaz, O., Lladó, X., Yap, M. H., & Martí, R. (2019). Automatic mass detection in mammograms using deep convolutional neural networks. Journal of Medical Imaging, 6(03), 1. doi: https://doi.org/10.1117/1.jmi.6.3.031409 
  3. Bhalerao, P. B., & Bonde, S. V. (2021). Cuckoo search based multi-objective algorithm with decomposition for detection of masses in mammogram images. International Journal of Information Technology, 13(6), 2215-2226. doi: https://doi.org/10.1007/s41870-021-00805-9 
  4. Cha, K. H., Petrick, N., Pezeshk, A., Graff, C. G., Sharma, D., Badal, A., & Sahiner, B. (2020). Evaluation of data augmentation via synthetic images for improved breast mass detection on mammograms using deep learning. J Med Imaging (Bellingham), 7(1), 012703. doi: https://doi.org/10.1117/1.JMI.7.1.012703 
  5. Divyashree, B. V., & Kumar, G. H. (2021). Breast Cancer Mass Detection in Mammograms Using Gray Difference Weight and MSER Detector. SN Computer Science, 2(2). doi: https://doi.org/10.1007/s42979-021-00452-8 
  6. Duggento, A., Aiello, M., Cavaliere, C., Cascella, G. L., Cascella, D., Conte, G., . . . Toschi, N. (2019). An Ad Hoc Random Initialization Deep Neural Network Architecture for Discriminating Malignant Breast Cancer Lesions in Mammographic Images. Contrast Media Mol Imaging, 2019. doi: https://doi.org/10.1155/2019/5982834 
  7. Farhat, F. G. (2019). A study of machine learning and deep learning models for solving medical imaging problems. (MS Masters). New Jersey Institute of Technology, Retrieved from https://digitalcommons.njit.edu/theses/1653 
  8. Goh, J. Y., & Khang, T. F. (2021). On the classification of simple and complex biological images using Krawtchouk moments and Generalized pseudo-Zernike moments: a case study with fly wing images and breast cancer mammograms. PeerJ Comput Sci, 7, e698. doi: https://doi.org/10.7717/peerj-cs.698 
  9. Hassan, S. a. A., Sayed, M. S., Abdalla, M. I., & Rashwan, M. A. (2020). Breast cancer masses classification using deep convolutional neural networks and transfer learning. Multimedia Tools and Applications, 79(41-42), 30735-30768. doi: https://doi.org/10.1007/s11042-020-09518-w 
  10. Heenaye-Mamode Khan, M., Boodoo-Jahangeer, N., Dullull, W., Nathire, S., Gao, X., Sinha, G. R., & Nagwanshi, K. K. (2021). Multi-class classification of breast cancer abnormalities using Deep Convolutional Neural Network (CNN). PLoS One, 16(8), e0256500. doi: https://doi.org/10.1371/journal.pone.0256500 
  11. Huang, Z., Zhu, X., Ding, M., & Zhang, X. (2020). Medical Image Classification Using a Light-Weighted Hybrid Neural Network Based on PCANet and DenseNet. IEEE Access, 8, 24697-24712. doi: https://doi.org/10.1109/ACCESS.2020.2971225 
  12. Iftikhar, H., Shahid, A. R., Raza, B., & Khan, H. N. (2020). Multi-View Attention-based Late Fusion (MVALF) CADx system for breast cancer using deep learning. Machine Graphics & Vision, 29(1), 55-78. doi: https://doi.org/10.22630/MGV.2020.29.1.4 
  13. Mohamed Aarif, K. O., Sivakumar, P., Mohamed Yousuff, C., & Mohammed Hashim, B. A. (2021). Deep MammoNet: Early Diagnosis of Breast Cancer Using Multi-layer Hierarchical Features of Deep Transfer Learned Convolutional Neural Network. Advanced Machine Learning Approaches in Cancer Prognosis, 204, 317-339. doi: https://doi.org/10.1007/978-3-030-71975-3_12 
  14. Oyetade, I. S., Ayeni, J. O., Ogunde, A. O., Oguntunde, B. O., & Olowookere, T. A. (2021). Hybridized Deep Convolutional Neural Network and Fuzzy Support Vector Machines for Breast Cancer Detection. SN Computer Science, 3(1), 58. doi: https://doi.org/10.1007/s42979-021-00882-4 
  15. Oza, P. R., Sharma, P., & Patel, S. (2022). A Transfer Representation Learning Approach for Breast Cancer Diagnosis from Mammograms using EfficientNet Models. Scalable Computing: Practice and Experience, 23(2), 51-58. doi: https://doi.org/10.12694/scpe.v23i2.1975 
  16. Ratner, A. J. (2019). Accelerating Machine Learning with Training Data Management. (Ph. D Dissertation). Stanford University, https://searchworks.stanford.edu/view/13333383  
  17. Ravitha Rajalakshmi, N., Vidhyapriya, R., Elango, N., & Ramesh, N. (2020). Deeply supervised U‐Net for mass segmentation in digital mammograms. International Journal of Imaging Systems and Technology. doi: https://doi.org/10.1002/ima.22516 
  18. Sanyal, J., Tariq, A., Kurian, A. W., Rubin, D., & Banerjee, I. (2021). Weakly supervised temporal model for prediction of breast cancer distant recurrence. Sci Rep, 11(1), 9461. doi: https://doi.org/10.1038/s41598-021-89033-6 
  19. Sawyer Lee, R., Dunnmon, J. A., He, A., Tang, S., Re, C., & Rubin, D. L. (2021). Comparison of segmentation-free and segmentation-dependent computer-aided diagnosis of breast masses on a public mammography dataset. J Biomed Inform, 113, 103656. doi: https://doi.org/10.1016/j.jbi.2020.103656 
  20. Shen, R., Yao, J., Yan, K., Tian, K., Jiang, C., & Zhou, K. (2020). Unsupervised domain adaptation with adversarial learning for mass detection in mammogram. Neurocomputing. doi: https://doi.org/10.1016/j.neucom.2020.01.099 
  21. Soleimani, H. (2021). Information Fusion of Magnetic Resonance Images and Mammographic Scans for Improved Diagnostic Management of Breast Cancer. (Ph.D. Doctor of Philosophy). University of Waterloo, Waterloo, Ontario, Canada. Retrieved from http://hdl.handle.net/10012/17290  
  22. Sun, Y., & Ji, Y. (2021). AAWS-Net: Anatomy-aware weakly-supervised learning network for breast mass segmentation. PLoS One, 16(8), e0256830. doi: https://doi.org/10.1371/journal.pone.0256830 
  23. Umehara, K., Ota, J., & Ishida, T. (2017). Super-Resolution Imaging of Mammograms Based on the Super-Resolution Convolutional Neural Network. Open Journal of Medical Imaging, 7(04), 180. doi: https://doi.org/10.4236/ojmi.2017.74018 
  24. Vedalankar, A. V., Gupta, S. S., & Manthalkar, R. R. (2021). Addressing architectural distortion in mammogram using AlexNet and support vector machine. Informatics in Medicine Unlocked, 23, 100551. doi: https://doi.org/10.1016/j.imu.2021.100551 
  25. Tang, et al. Five Classifications of Mammography Images Based on Deep Cooperation Convolutional Neural Network. American Scientific Research Journal for Engineering, Technology, and Sciences (ASRJETS), 2019.






Version 1 (Current): Updated 2017/09/14

 


  • Images (DICOM, 163.6GB)
    Download: https://wiki.cancerimagingarchive.net/download/attachments/22516629/CBIS-DDSM-All-doiJNLP-zzWs5zfZ.tcia?version=1&modificationDate=1534787024127&api=v2
    Search: https://www.cancerimagingarchive.net/nbia-search/?CollectionCriteria=CBIS-DDSM
    (Requires the NBIA Data Retriever.)