...

At the direction of the National Cancer Institute (NCI), The Cancer Imaging Archive (TCIA) curates and publishes freely available cancer imaging datasets. These include clinical and pre-clinical radiology and pathology images as well as supporting clinical data (e.g. patient demographics and clinical outcomes), image annotations, and, where available, links to externally hosted genomic and proteomic datasets in other NCI databases. Data comes from NIH programs, clinical trials, and community-submitted proposals to publish data.

The primary objective of TCIA is to provide public access to data in order to enhance transparency and reproducibility in cancer imaging research. To achieve this, TCIA offers data de-identification, curation, and hosting services to relieve individual researchers and institutions of these responsibilities.

...

  • Private Tag Dictionary. TCIA maintains an industry-leading Private Tag Dictionary that contains definitions of vendor-specific private tags by manufacturer, model, and software version, together with a disposition indicating how to process each tag to ensure that it is free of PHI. The dictionary was updated in 2020 at TCIA by reviewing and merging three well-known private tag dictionaries (Grassroots DICOM, DICOM3tools, DCMTK) with TCIA’s private tag dictionary, which contained all private tags encountered by TCIA through 2019. This dictionary is continually updated and maintained at TCIA and is available for public review and use. The dictionary is leveraged as part of the TCIA Posda curation workflow and allows an expanded set of scientifically useful tags to be retained.
  • Private Tag Review Process. When a submitting site sends DICOM data to TCIA, private tags known to be unsafe are removed by the CTP software used to send the data. All other private tags are retained and de-identified at TCIA during curation of the data, following the Retain Safe Private Option of the DICOM de-identification standard. The Retain Safe Private Option allows for the retention of DICOM tags stored in the private fields that are known not to contain PHI. If a new private tag is encountered in the Posda database that does not have a private tag disposition, its values are inspected in relation to the tag description, together with values in the tags, and a disposition is assigned. If there is no existing private tag description, an attempt is made to find a manufacturer’s definition of the tag. If no such description can be found, the disposition is set to remove the tag. TCIA will remove any private tags from the images that are not specified in the private tag dictionary or are defined as containing a form of PHI such as name, SSN, etc. All date and datetime private tags that are retained are offset using the same offset as applied to the standard tags for the image. All private tags containing UIDs are assigned a TCIA root and appended with a hashed value, as is done with the standard tags. This ensures that all references to other images contained within TCIA are maintained. Curators perform a manual inspection of all private tags using Posda reports, and any PHI that may be found is removed, emptied, date offset, or hashed as appropriate.
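
The disposition logic described in the list above can be illustrated with a minimal Python sketch. It is not the Posda or CTP implementation. The disposition names ("keep", "offset_date", "hash_uid", "empty"), the UID root EXAMPLE_ROOT, and the choice of SHA-256 are all assumptions made for illustration; this page does not specify TCIA's actual root or hash function. The sketch only shows the general idea: tags without a disposition are dropped, dates share one offset, and UIDs are remapped deterministically so that cross-references between images stay consistent.

```python
import hashlib
from datetime import datetime, timedelta

# Hypothetical UID root for illustration only; not TCIA's real root.
EXAMPLE_ROOT = "1.2.3.4"


def offset_date(da_value: str, offset_days: int) -> str:
    """Shift a DICOM DA value (YYYYMMDD) by a fixed number of days."""
    shifted = datetime.strptime(da_value, "%Y%m%d") + timedelta(days=offset_days)
    return shifted.strftime("%Y%m%d")


def remap_uid(original_uid: str, root: str = EXAMPLE_ROOT) -> str:
    """Return root + '.' + a decimal digest of the original UID.

    The same input always yields the same output, so references
    between objects remain intact. SHA-256 is an assumption here.
    """
    digest = int(hashlib.sha256(original_uid.encode("ascii")).hexdigest(), 16)
    # DICOM UIDs are limited to 64 characters in total.
    suffix = str(digest)[: 64 - len(root) - 1]
    return f"{root}.{suffix}"


def apply_dispositions(private_tags: dict, dictionary: dict, offset_days: int) -> dict:
    """Apply a per-tag disposition; tags missing from the dictionary are removed."""
    cleaned = {}
    for tag, value in private_tags.items():
        disposition = dictionary.get(tag, "remove")
        if disposition == "keep":
            cleaned[tag] = value
        elif disposition == "offset_date":
            cleaned[tag] = offset_date(value, offset_days)
        elif disposition == "hash_uid":
            cleaned[tag] = remap_uid(value)
        elif disposition == "empty":
            cleaned[tag] = ""
        # Any other disposition, including "remove", drops the tag.
    return cleaned
```

For example, calling apply_dispositions with an offset of -10 days on a tag whose disposition is "offset_date" and whose value is 20200115 yields 20200105, while a tag that is absent from the dictionary does not appear in the result.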

Body Part Examined - When images are made public, a single body part examined, corresponding to the cancer of interest, is assigned to all images.  If the collection consists of sarcoma images (or any other cancer affecting multiple organs within the image collection), there may be multiple body parts assigned, though only one to any series.  In phantom collections, body part examined is simply labeled “PHANTOM”.

All Tags - The TCIA de-identification process ensures that every DICOM tag of every DICOM object is free of the 18 forms of PHI as currently defined by the Safe Harbor Method. At the submitting site, a DICOM PS 3.15 compliant script removes or modifies DICOM tags deemed to be unsafe (see Table 1 for a complete listing). At TCIA, a software routine extracts every unique value found within a collection being curated and prints them to a report. This report is examined by curators, and any actions necessary to remove PHI are applied before moving the images from the Posda Curation Server to the TCIA Public Server. Every DICOM image is inspected by curators for burned-in PHI. Once the images reach the TCIA Public Server, the tags are inspected by two curators for PHI using Posda PHI reports. Images are spot-checked for any burned-in PHI.
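
The unique-value report described above can be sketched roughly as follows. This is an illustration only, not the Posda routine: it assumes each DICOM object has already been read into a plain dict mapping a tag to its value, and the function names unique_value_report and format_report are hypothetical.

```python
from collections import defaultdict


def unique_value_report(objects):
    """Collect every distinct value seen for each tag across a collection.

    `objects` is an iterable of dicts mapping tag -> value
    (an assumed, simplified stand-in for parsed DICOM objects).
    """
    seen = defaultdict(set)
    for obj in objects:
        for tag, value in obj.items():
            seen[tag].add(str(value))
    # Sort tags and values so the report is stable between runs.
    return {tag: sorted(values) for tag, values in sorted(seen.items())}


def format_report(report):
    """Render one 'tag<TAB>value' line per unique tag/value pair."""
    lines = []
    for tag, values in report.items():
        for value in values:
            lines.append(f"{tag}\t{value}")
    return "\n".join(lines)
```

Because each distinct value appears only once, a curator reviews a list whose length depends on the variety of values in the collection rather than on the number of images.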

The following table details the de-identification performed at the submitting site by way of a TCIA-supplied de-identification script.  All other tags not mentioned in the table below are reviewed and cleaned if necessary during our Posda curation. 

...