Summary

Open-access or shared research data must comply with patient privacy regulations (HIPAA). These regulations require datasets to be de-identified before they can be placed in the public domain. Image de-identification is time-consuming, requires significant human resources, and is prone to human error. Automated image de-identification algorithms have been developed, but the research community needs a method of evaluating such tools before they can be widely accepted. Such evaluation requires a robust dataset against which de-identification algorithms can be tested.

We developed a DICOM dataset that can be used to evaluate the performance of de-identification algorithms. DICOM image information objects were selected from datasets published in The Cancer Imaging Archive (TCIA). Synthetic Protected Health Information (PHI) was generated and inserted into selected DICOM data elements to mimic typical clinical imaging exams. The evaluation dataset was then de-identified by a TCIA curation team using standard TCIA tools and procedures. We are publishing the evaluation dataset (containing synthetic PHI) and the de-identified evaluation dataset (the result of TCIA curation) in advance of a potential competition, sponsored by the National Cancer Institute (NCI), to evaluate algorithms for the de-identification of medical image datasets. The evaluation dataset published here is a subset of a larger evaluation dataset that was created under contract for the NCI. This subset is being published to allow researchers to test their de-identification algorithms and to promote standardized procedures for validating automated de-identification.

We developed a multi-modality DICOM image dataset that can be used to evaluate the performance of automated de-identification pipelines and protocols. A total of 426 previously de-identified radiology cases were selected from TCIA to use as a validation dataset. The set includes CT, MRI, PET, and radiograph images of most body parts, acquired on imaging systems from various vendors. The DICOM Standard Security and System Profile was used to create the validation image dataset along with audit logs from TCIA curation of the images. Synthetic PHI/PII and standardized patient IDs were added to DICOM tags in the validation image dataset to mimic non-de-identified images. The validation test dataset and associated de-identified test dataset for 5% of the 426 subjects are being released with this publication.
This paper describes the validation image dataset creation process, location of associated tables and datasets, and guides for using the dataset. We believe this is the first multi-modality image validation dataset available to the public for use in testing automated image de-identification algorithms.
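
As a rough illustration of how the paired datasets might be used, the sketch below compares the output of a de-identification algorithm against both the evaluation dataset (with synthetic PHI) and the TCIA-curated de-identified dataset. It is a minimal sketch under stated assumptions, not the evaluation procedure used by TCIA or NCI: headers are represented as plain dictionaries keyed by DICOM keyword, and the function names, the list of PHI-bearing keywords, and all values are hypothetical.

```python
from typing import Dict, List

# Hypothetical header representation: DICOM keyword -> string value.
Header = Dict[str, str]


def find_phi_leaks(original: Header, candidate: Header,
                   phi_keywords: List[str]) -> List[str]:
    """Return keywords whose synthetic PHI value survives unchanged."""
    leaks = []
    for keyword in phi_keywords:
        phi_value = original.get(keyword, "")
        # A leak is a non-empty synthetic PHI value still present verbatim.
        if phi_value and candidate.get(keyword, "") == phi_value:
            leaks.append(keyword)
    return leaks


def agreement_with_reference(candidate: Header, reference: Header) -> float:
    """Fraction of reference elements that the candidate reproduces exactly."""
    if not reference:
        return 1.0
    matches = sum(1 for k, v in reference.items() if candidate.get(k) == v)
    return matches / len(reference)


# Illustrative values only; none come from the published dataset.
original = {"PatientName": "DOE^JANE", "PatientID": "SYN-0001", "Modality": "CT"}
reference = {"PatientName": "", "PatientID": "ANON-0001", "Modality": "CT"}
candidate = {"PatientName": "DOE^JANE", "PatientID": "ANON-0001", "Modality": "CT"}

leaks = find_phi_leaks(original, candidate, ["PatientName", "PatientID"])
score = agreement_with_reference(candidate, reference)
print(leaks, round(score, 2))
```

Exact string matching is a deliberately strict choice here. A real evaluation would likely need to allow for acceptable differences between an algorithm's output and the curated reference, such as differently generated replacement identifiers.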

Acknowledgements

We would like to acknowledge the individuals and institutions that have provided data for this collection:

...

Hospital/Institution Name city, state, country - Special thanks to First Last Names, degree PhD, MD, etc from the Department of xxxxxx, Additional Names from same location.

...

The National Cancer Institute, for funding and actively participating in the project that generated the evaluation datasets published here, and the TCIA curation team, led by Ms. Geri Blake, who curated this data. The original data came from multiple institutions and multiple TCIA image collections.

Data Access

Data Type	Download all or Query/Filter
Images (DICOM, XX.X GB): CR, CT, DX, MG, MR, PT	Search (download requires the NBIA Data Retriever)

Buttons are not populated until collection is released.

Click the Versions tab for more info about data releases.

Please contact help@cancerimagingarchive.net with any questions regarding usage.

Detailed Description

Image Statistics
Modalities	CR, CT, DX, MG, MR, PT
Number of Patients	17
Number of Studies	17
Number of Series	20
Number of Images	1823
Images Size (GB)

Citations & Data Usage Policy

Tcia license 4 international

Tcia license 4 noncommercial

Data Citation

Rutherford, M., Mun, S.K., Levine, B., Bennett, W.C., Smith, K., Farmer, P., Jarosz, J., Wagner, U., Farahani, K., Prior, F. (2020). Data from MIDI. The Cancer Imaging Archive. DOI: https://doi.org/10.7937/s17z-r072 (draft, not active).

Publication Citation

We ask on the proposal form if they have ONE traditional publication they'd like users to cite.

Acknowledgement

Only if they ask for special acknowledgments like funding sources, grant numbers, etc in their proposal.

TCIA Citation

Clark K, Vendt B, Smith K, Freymann J, Kirby J, Koppel P, Moore S, Phillips S, Maffitt D, Pringle M, Tarbox L, Prior F. The Cancer Imaging Archive (TCIA): Maintaining and Operating a Public Information Repository, Journal of Digital Imaging, Volume 26, Number 6, December, 2013, pp 1045-1057. DOI: 10.1007/s10278-013-9622-7

Other Publications Using This Data

TCIA maintains a list of publications which leverage TCIA data. If you have a manuscript you'd like to add please contact the TCIA Helpdesk.

Versions

Version 1 (Current): Updated yyyy/mm/dd

Data Type	Download all or Query/Filter
Images (DICOM, xx.x GB)	Search (requires the NBIA Data Retriever)

Buttons are not populated until collection is released.

...
