A DICOM dataset for evaluation of medical image de-identification (Pseudo-PHI-DICOM-Data)


Summary

Open-access or shared research data must comply with HIPAA patient privacy regulations. These regulations require that datasets be de-identified before they are placed in the public domain. Image de-identification is time-consuming, requires significant human resources, and is prone to human error. Automated image de-identification algorithms have been developed, but the research community needs a way to evaluate such tools before they can be widely accepted. That evaluation requires a robust dataset designed for testing de-identification algorithms.

We developed a DICOM dataset that can be used to evaluate the performance of de-identification algorithms. DICOM image information objects were selected from datasets published in TCIA. Synthetic Protected Health Information (PHI) was generated and inserted into selected DICOM data elements to mimic typical clinical imaging exams. The evaluation dataset was then de-identified by a TCIA curation team using standard TCIA tools and procedures. We are publishing both the evaluation dataset (containing synthetic PHI) and the de-identified evaluation dataset (the result of TCIA curation) in advance of a potential competition, sponsored by the National Cancer Institute (NCI), for the evaluation of algorithms that de-identify medical image datasets. The evaluation dataset published here is a subset of a larger evaluation dataset that was created under contract for the National Cancer Institute. This subset is being published to allow researchers to test their de-identification algorithms and to promote standardized procedures for validating automated de-identification.
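One possible way to use the two datasets together is sketched below. It is an illustration only, not the scoring method used by TCIA or NCI. It assumes each DICOM object has already been read into a plain dictionary of element keyword to value (how the headers are read is left open), and the function name `residual_phi` and all sample values are hypothetical. An element counts as leaked when the reference curation changed or removed it but the candidate tool left the original value in place.

```python
from typing import Dict, List

# Hypothetical representation: one DICOM object as {element keyword: value}.
Header = Dict[str, str]


def residual_phi(evaluation: Header, reference: Header, candidate: Header) -> List[str]:
    """Return keywords that the reference curation changed or removed but that
    the candidate output left identical to the evaluation (synthetic PHI) input."""
    leaked = []
    for keyword, original_value in evaluation.items():
        curated = reference.get(keyword)  # None means the element was removed
        if curated == original_value:
            continue  # curation kept the value, so it is not treated as PHI here
        if candidate.get(keyword) == original_value:
            leaked.append(keyword)
    return sorted(leaked)


# Made-up values for illustration only; they are not taken from the dataset.
evaluation = {"PatientName": "DOE^JANE", "PatientID": "12345", "Modality": "CT"}
reference = {"PatientName": "ANON", "PatientID": "Pseudo-ID", "Modality": "CT"}
candidate = {"PatientName": "DOE^JANE", "PatientID": "X-1", "Modality": "CT"}

print(residual_phi(evaluation, reference, candidate))  # ['PatientName']
```

A comparison like this only flags values that were left untouched. It does not judge whether a replacement value is itself acceptable, which would need a more detailed rule set.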

Acknowledgements

We would like to acknowledge the National Cancer Institute for funding and actively participating in the project that generated the evaluation datasets published here, and the TCIA curation team, led by Ms. Geri Blake, who curated the data. The original data came from multiple institutions and multiple TCIA image collections.


Data Access

Data Type | Download all or Query/Filter | License

Images: CR, CT, DX, MG, MR, PT (DICOM, 609 MB)

Evaluation dataset


Download: https://wiki.cancerimagingarchive.net/download/attachments/80969777/Pseudo-Phi-DICOM%20Evaluation%20dataset%20April%207%202021.tcia?api=v2

Search: https://nbia.cancerimagingarchive.net/nbia-search/?MinNumberOfStudiesCriteria=1&PatientCriteria=6670427471,9189822998,9894340694,8989193730,8155012288,571403367,292821506,339833062,3642991663,6774825273,8732322741,7255997752,7361647728,6451050561,292821506,8834647487,6774825273,6451050561,8548156246,4025360156,6614238035,9894340694,6415974217,3209648408,9894340694,8189244869

(Download requires the NBIA Data Retriever)

License: CC BY 4.0

Images (DICOM, 606 MB)

De-identified Evaluation dataset


Download: https://wiki.cancerimagingarchive.net/download/attachments/80969777/Pseudo-PHI-DICOM%20De-id%20Evaluation%20dataset%20April%207%202021.tcia?api=v2

Search: https://nbia.cancerimagingarchive.net/nbia-search/?MinNumberOfStudiesCriteria=1&PatientCriteria=Pseudo-PHI-005,Pseudo-PHI-015,Pseudo-PHI-019,Pseudo-PHI-001,Pseudo-PHI-010,Pseudo-PHI-014,Pseudo-PHI-018,Pseudo-PHI-002,Pseudo-PHI-013,Pseudo-PHI-012,Pseudo-PHI-020,Pseudo-PHI-011,Pseudo-PHI-006,Pseudo-PHI-011,Pseudo-PHI-016,Pseudo-PHI-008,Pseudo-PHI-017,Pseudo-PHI-007,Pseudo-PHI-021,Pseudo-PHI-001,Pseudo-PHI-003,Pseudo-PHI-009,Pseudo-PHI-008,Pseudo-PHI-021,Pseudo-PHI-021,Pseudo-PHI-004

(Download requires the NBIA Data Retriever)

License: CC BY 4.0

Patient Mapping (csv, 0.6 kB)

Evaluation/De-identified


Download: https://wiki.cancerimagingarchive.net/download/attachments/80969777/Pseudo-PHI-DICOM-Dataset%20patid_crosswalk.csv?api=v2

License: CC BY 4.0

UID Mapping (csv, 213 kB)

Evaluation/De-identified


Download: https://wiki.cancerimagingarchive.net/download/attachments/80969777/Pseudo-PHI-DICOM-Dataset%20uid_crosswalk.csv?api=v2

License: CC BY 4.0
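The two mapping files link identifiers in the evaluation dataset to their counterparts in the de-identified dataset. A minimal sketch of loading such a crosswalk into a lookup table follows. The column names `original_id` and `deid_id`, the helper `load_crosswalk`, and the sample rows are all hypothetical; check the header row of the downloaded CSV and substitute the real column names.

```python
import csv
import io
from typing import Dict


def load_crosswalk(text: str, source_col: str, target_col: str) -> Dict[str, str]:
    """Build a {source identifier: target identifier} lookup from CSV text."""
    reader = csv.DictReader(io.StringIO(text))
    return {row[source_col]: row[target_col] for row in reader}


# Placeholder rows standing in for the real file; column names are assumptions.
sample = "original_id,deid_id\nEVAL-A,DEID-A\nEVAL-B,DEID-B\n"

mapping = load_crosswalk(sample, "original_id", "deid_id")
print(mapping["EVAL-A"])  # DEID-A
```

For a downloaded file, the same function can be given the file contents, for example `load_crosswalk(open(path, newline="").read(), ...)`, where `path` is wherever the CSV was saved.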

Click the Versions tab for more info about data releases.

Please contact help@cancerimagingarchive.net  with any questions regarding usage.

Additional Resources for this Dataset

The NCI Cancer Research Data Commons (CRDC) provides access to additional data and a cloud-based data science infrastructure that connects data sets with analytics tools to allow users to share, integrate, analyze, and visualize cancer research data.



Detailed Description

Image Statistics


Modalities: CR, CT, DX, MG, MR, PT

Number of Patients: 42

Number of Studies: 44

Number of Series: 52

Number of Images: 3386

Images Size (GB): 1.2

Each dataset contains 21 patients, 22 studies, and 26 series. Because the patient IDs, Study Instance UIDs, and Series Instance UIDs differ between the two datasets, the totals above count every patient, study, and series twice.
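The double count is simple arithmetic on the per-dataset figures stated above. The short check below shows it; the variable names are illustrative only.

```python
# Per-dataset counts as stated in the text.
per_dataset = {"patients": 21, "studies": 22, "series": 26}

# The evaluation and de-identified datasets use different identifiers,
# so each entity is counted once per dataset.
datasets = 2
combined = {name: count * datasets for name, count in per_dataset.items()}

print(combined)  # {'patients': 42, 'studies': 44, 'series': 52}
```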



Citations & Data Usage Policy


Data Citation

Rutherford, M., Mun, S.K., Levine, B., Bennett, W.C., Smith, K., Farmer, P., Jarosz, J., Wagner, U., Farahani, K., Prior, F. (2021). A DICOM dataset for evaluation of medical image de-identification (Pseudo-PHI-DICOM-Data) [Data set]. The Cancer Imaging Archive. DOI: https://doi.org/10.7937/s17z-r072


Publication Citation

Rutherford, M., Mun, S.K., Levine, B., Bennett, W.C., Smith, K., Farmer, P., Jarosz, J., Wagner, U., Freymann, J., Blake, G., Tarbox, L., Farahani, K., Prior, F. (2021). A DICOM dataset for evaluation of medical image de-identification, Nature Scientific Data. DOI: 10.1038/s41597-021-00967-y


TCIA Citation

Clark K, Vendt B, Smith K, Freymann J, Kirby J, Koppel P, Moore S, Phillips S, Maffitt D, Pringle M, Tarbox L, Prior F. The Cancer Imaging Archive (TCIA): Maintaining and Operating a Public Information Repository, Journal of Digital Imaging, Volume 26, Number 6, December, 2013, pp 1045-1057. DOI: 10.1007/s10278-013-9622-7

Other Publications Using This Data

TCIA maintains a list of publications which leverage TCIA data. If you have a manuscript you'd like to add please contact the TCIA Helpdesk.


Versions

Version 2 (Current): Updated 2021/04/07

Data Type | Download all or Query/Filter

Images (DICOM, 609 MB)

Evaluation dataset


Download: https://wiki.cancerimagingarchive.net/download/attachments/80969777/Pseudo-Phi-DICOM%20Evaluation%20dataset%20April%207%202021.tcia?api=v2

Search: https://nbia.cancerimagingarchive.net/nbia-search/?MinNumberOfStudiesCriteria=1&PatientCriteria=6670427471,9189822998,9894340694,8989193730,8155012288,571403367,292821506,339833062,3642991663,6774825273,8732322741,7255997752,7361647728,6451050561,292821506,8834647487,6774825273,6451050561,8548156246,4025360156,6614238035,9894340694,6415974217,3209648408,9894340694,8189244869



(Download requires the NBIA Data Retriever)

Images (DICOM, 606 MB)

De-identified Evaluation dataset


Download: https://wiki.cancerimagingarchive.net/download/attachments/80969777/Pseudo-PHI-DICOM%20De-id%20Evaluation%20dataset%20April%207%202021.tcia?api=v2

Search: https://nbia.cancerimagingarchive.net/nbia-search/?MinNumberOfStudiesCriteria=1&PatientCriteria=Pseudo-PHI-005,Pseudo-PHI-015,Pseudo-PHI-019,Pseudo-PHI-001,Pseudo-PHI-010,Pseudo-PHI-014,Pseudo-PHI-018,Pseudo-PHI-002,Pseudo-PHI-013,Pseudo-PHI-012,Pseudo-PHI-020,Pseudo-PHI-011,Pseudo-PHI-006,Pseudo-PHI-011,Pseudo-PHI-016,Pseudo-PHI-008,Pseudo-PHI-017,Pseudo-PHI-007,Pseudo-PHI-021,Pseudo-PHI-001,Pseudo-PHI-003,Pseudo-PHI-009,Pseudo-PHI-008,Pseudo-PHI-021,Pseudo-PHI-021,Pseudo-PHI-004



(Download requires the NBIA Data Retriever)

Patient Mapping (csv)

Evaluation/De-identified


Download: https://wiki.cancerimagingarchive.net/download/attachments/80969777/Pseudo-PHI-DICOM-Dataset%20patid_crosswalk.csv?api=v2



UID Mapping (csv)

Evaluation/De-identified


Download: https://wiki.cancerimagingarchive.net/download/attachments/80969777/Pseudo-PHI-DICOM-Dataset%20uid_crosswalk.csv?api=v2



Note: Removed head imaging from 8 series.

Version 1: Updated 2021/01/31

Data Type | Download all or Query/Filter

Images (DICOM, 653 MB)

Evaluation dataset

Search: https://nbia.cancerimagingarchive.net/nbia-search/?MinNumberOfStudiesCriteria=1&PatientCriteria=6670427471,9189822998,9894340694,8989193730,8155012288,571403367,292821506,339833062,3642991663,6774825273,8732322741,7255997752,7361647728,6451050561,292821506,8834647487,6774825273,6451050561,8548156246,4025360156,6614238035,9894340694,6415974217,3209648408,9894340694,8189244869



(Download requires the NBIA Data Retriever)

Images (DICOM, 648 MB)

De-identified Evaluation dataset

Search: https://nbia.cancerimagingarchive.net/nbia-search/?MinNumberOfStudiesCriteria=1&PatientCriteria=Pseudo-PHI-005,Pseudo-PHI-015,Pseudo-PHI-019,Pseudo-PHI-001,Pseudo-PHI-010,Pseudo-PHI-014,Pseudo-PHI-018,Pseudo-PHI-002,Pseudo-PHI-013,Pseudo-PHI-012,Pseudo-PHI-020,Pseudo-PHI-011,Pseudo-PHI-006,Pseudo-PHI-011,Pseudo-PHI-016,Pseudo-PHI-008,Pseudo-PHI-017,Pseudo-PHI-007,Pseudo-PHI-021,Pseudo-PHI-001,Pseudo-PHI-003,Pseudo-PHI-009,Pseudo-PHI-008,Pseudo-PHI-021,Pseudo-PHI-021,Pseudo-PHI-004



(Download requires the NBIA Data Retriever)

Patient Mapping (csv)

Evaluation/De-identified


Download: https://wiki.cancerimagingarchive.net/download/attachments/80969777/Pseudo-PHI-DICOM-Dataset%20patid_crosswalk.csv?api=v2



UID Mapping (csv)

Evaluation/De-identified


Download: https://wiki.cancerimagingarchive.net/download/attachments/80969777/Pseudo-PHI-DICOM-Dataset%20uid_crosswalk.csv?api=v2