...

  1. If a collection of images is produced by equipment from different manufacturers, the sets of private elements you want to retain and discard may collide. For example, element (0009, 1001) from manufacturer A may contain an important physical parameter, while that same element from manufacturer B may contain PHI.
  2. If the collection has images that are created by an acquisition modality and then modified by another application (PACS, workstation), a private group may have multiple reserved blocks. Also, one cannot assume that the original creator will always have chosen reserved block 0010.
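
Both problems above can be handled by keying each private element on its Private Creator string rather than on the raw tag. The following is a minimal sketch in plain Python, assuming a dataset is represented as a dictionary from (group, element) to value. The creator strings, the rule table, and the function names are hypothetical and are not taken from CTP or any vendor conformance statement.

```python
from typing import Dict, Optional, Tuple

Tag = Tuple[int, int]  # (group, element)

# Hypothetical rule table keyed by (group, private creator, low byte of element).
# The creator strings and actions are invented for illustration only.
RULES: Dict[Tuple[int, str, int], str] = {
    (0x0009, "VENDOR_A", 0x01): "keep",    # physical parameter
    (0x0009, "VENDOR_B", 0x01): "remove",  # may contain PHI
}


def private_creator(dataset: Dict[Tag, object], tag: Tag) -> Optional[str]:
    """Return the Private Creator string that reserves the block holding `tag`."""
    group, element = tag
    if group % 2 == 0 or element < 0x1000:
        return None  # not a private data element
    block = element >> 8  # 0x10 for element 0x1001, 0x11 for element 0x1101
    creator = dataset.get((group, block))  # creator lives at (gggg, 00xx)
    return str(creator).strip() if creator is not None else None


def action_for(dataset: Dict[Tag, object], tag: Tag, default: str = "remove") -> str:
    """Look up the action for a private element, ignoring which block it sits in."""
    creator = private_creator(dataset, tag)
    if creator is None:
        return default
    return RULES.get((tag[0], creator, tag[1] & 0xFF), default)


# Same tag, different manufacturers.
image_a = {(0x0009, 0x0010): "VENDOR_A", (0x0009, 0x1001): "1.5"}
image_b = {(0x0009, 0x0010): "VENDOR_B", (0x0009, 0x1001): "DOE^JOHN"}
# A second application reserved block 0010, so VENDOR_A landed in block 0011.
image_c = {
    (0x0009, 0x0010): "PACS_X",
    (0x0009, 0x0011): "VENDOR_A",
    (0x0009, 0x1101): "1.5",
}

print(action_for(image_a, (0x0009, 0x1001)))
print(action_for(image_b, (0x0009, 0x1001)))
print(action_for(image_c, (0x0009, 0x1101)))
```

Defaulting to "remove" when the creator is missing or unknown is a conservative choice made for this sketch; a real workflow might instead flag such elements for manual review.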

TCIA De-identification Work Flow

...

DICOM Basic Attribute Confidentiality Profile

DICOM Standards Committee Working Group 18 wrote Supplement 142, which is now incorporated into the published DICOM Standard. The Attribute Confidentiality Profile (DICOM PS 3.15: Appendix E) provides a standard for image de-identification and a process that reduces the complexity of safely de‐identifying DICOM image data, while remaining flexible enough for scenarios that require preserving certain information needed for quality control and analysis essential to research. This is achieved by providing a number of Application Level Confidentiality Profiles, which include a Basic Profile along with a number of Option Profiles. These profiles provide the instructions needed to safely clean DICOM elements that may contain PHI. The full DICOM Standard, including Part 15, is available at the NEMA web site: http://medical.nema.org/standard.html. The original Supplement 142 guidance document can be obtained at ftp://medical.nema.org/medical/dicom/final/sup142_ft.doc. We recommend using the published standard, as it will be updated with any change proposals.

Appendix E of PS 3.15 documents a system for protecting attributes. We quote a small section of the document.

The Attributes listed in Table E.1-1 for each profile are contained in Standard IODs, or may be contained in Standard Extended IODs. An implementation claiming conformance to an Application Level Confidentiality Profile as a de-identifier shall protect or retain all instances of the Attributes listed in Table E.1-1, whether contained in the main dataset or embedded in an Item of a Sequence of Items. The following action codes are used in the table:

– D – replace with a non-zero length value that may be a dummy value and consistent with the VR

– Z – replace with a zero length value, or a non-zero length value that may be a dummy value and consistent with the VR

– X – remove

– K – keep (unchanged for non-sequence attributes, cleaned for sequences)

– C – clean, that is replace with values of similar meaning known not to contain identifying information and consistent with the VR

– U – replace with a non-zero length UID that is internally consistent within a set of Instances

– Z/D – Z unless D is required to maintain IOD conformance (Type 2 versus Type 1)

– X/Z – X unless Z is required to maintain IOD conformance (Type 3 versus Type 2)

– X/D – X unless D is required to maintain IOD conformance (Type 3 versus Type 1)

– X/Z/D – X unless Z or D is required to maintain IOD conformance (Type 3 versus Type 2 versus Type 1)

– X/Z/U* – X unless Z or replacement of contained instance UIDs (U) is required to maintain IOD conformance (Type 3 versus Type 2 versus Type 1 sequences containing UID references)

PS 3.15: E.2 then defines the Basic Application Level Confidentiality Profile, which describes how to apply the scheme above, together with a number of options that determine the scope of protection provided. These definitions allow a system to follow a standard procedure and to document its behavior in a standard way.
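
The simple action codes quoted above (D, Z, X, K, U) can be sketched as a small dispatcher. This is an illustrative sketch only, not how CTP implements the profile: the dataset is a plain dictionary, the profile excerpt is a hand-picked illustration that should be checked against Table E.1-1, the dummy value ignores the VR, and the hash-based UID remapping under the 2.25 root is an assumption chosen because it stays consistent across a set of instances. The compound codes (Z/D, X/Z, and so on) and C require knowledge of the IOD or of the content, so this sketch rejects them.

```python
import hashlib
from typing import Dict, Tuple

Tag = Tuple[int, int]  # (group, element)

# Illustrative excerpt only; the authoritative entries are in PS 3.15 Table E.1-1.
PROFILE_EXCERPT: Dict[Tag, str] = {
    (0x0010, 0x0010): "Z",  # Patient's Name
    (0x0010, 0x1000): "X",  # Other Patient IDs
    (0x0020, 0x000D): "U",  # Study Instance UID
    (0x0040, 0xA123): "D",  # Person Name
}


def remap_uid(uid: str) -> str:
    """Map a UID to a replacement that is the same every time it is seen."""
    digest = hashlib.sha256(uid.encode("ascii")).digest()
    # Assumption: derive a 128-bit number and place it under the 2.25 root.
    return "2.25." + str(int.from_bytes(digest[:16], "big"))


def apply_action(dataset: Dict[Tag, object], tag: Tag, code: str,
                 dummy: str = "REMOVED") -> None:
    """Apply one simple action code to one element, in place."""
    if tag not in dataset:
        return
    if code == "X":
        del dataset[tag]                      # remove
    elif code == "Z":
        dataset[tag] = ""                     # zero-length value
    elif code == "D":
        dataset[tag] = dummy                  # dummy value (VR not checked here)
    elif code == "U":
        dataset[tag] = remap_uid(str(dataset[tag]))
    elif code == "K":
        pass                                  # keep unchanged
    else:
        raise ValueError(f"action code {code!r} is not handled by this sketch")


def deidentify(dataset: Dict[Tag, object],
               profile: Dict[Tag, str] = PROFILE_EXCERPT) -> Dict[Tag, object]:
    """Return a copy of `dataset` with the profile's actions applied."""
    result = dict(dataset)
    for tag, code in profile.items():
        apply_action(result, tag, code)
    return result


original = {
    (0x0010, 0x0010): "DOE^JOHN",
    (0x0010, 0x1000): "12345",
    (0x0020, 0x000D): "1.2.840.1",
    (0x0040, 0xA123): "SMITH^JANE",
    (0x0028, 0x0010): 512,  # Rows: not in the excerpt, left untouched
}
cleaned = deidentify(original)
```

Because the UID replacement is derived from the original value, every instance that referenced the same study receives the same new Study Instance UID, which is the internal consistency the U code asks for.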

Software Tools

CTP

TCIA utilizes the RSNA Clinical Trials Processor (CTP) software in conjunction with caBIG's National Biomedical Imaging Archive (NBIA) to de‐identify and host the images in the archive. The Cancer Imaging Program's Informatics Team has been working closely with the developer of CTP since 2009 to incorporate support for this standard as it was being defined by WG18. A full summary and timeline of this project can be found at https://wiki.nci.nih.gov/display/CIP/Incorporation+of+DICOM+WG18+Supplement+142+into+CTP.

...

  • Clean Descriptors Option: Removal of identification information from descriptive tags which contain unstructured plain text values over which an operator has control
  • Retain Modified Longitudinal Temporal Information Options: Modification of tags that contain dates or times
  • Retain Patient Characteristics Option: Retention of physical characteristics of the patient that are descriptive rather than identifying information (e.g. metabolic measures, body weight, etc.)
  • Retain Device Identity Option: Retention of information about the characteristics of the device used to perform the acquisition
  • Retain Safe Private Option: Retention of Private Attributes confirmed not to contain PHI

DICOM Tag Sniffer

To make it easier to implement some of the "clean" instructions specified in DICOM PS 3.15, a new tool was developed to inspect for potential PHI both the DICOM elements that allow free text entry by a technician and the Private Tags. The tool scans a folder and its subfolders for DICOM objects and produces several different outputs, depending on the mode used and the input profiles. It reads each DICOM object and iterates through every public and private element, then uses the profiles below to determine whether to retain the value of the element for later inspection:

  • Confidentiality Profile: One input profile corresponds to the entries in Table E.1-1 in DICOM PS 3.15. We list the attributes in the table and the coded values according to the table entries. When scanning the DICOM objects, each public element is checked against the data in the profile. If the element is found in the profile, the software knows whether it should record the element value for later inspection or ignore it. For example, if the DICOM profile indicates the element is to be deleted, there is no reason to review the value in that element.
  • The Confidentiality Profile input is augmented with elements that are known to contain physical parameters such as rows, columns or pixel spacing. Rather than tell the software to ignore values with a specific value representation, we list those elements explicitly.
  • Modality Software Profile: This input profile describes the private elements that the manufacturer documents in its conformance statement. This file takes into account the Private Creator Data Elements described above and has a code table for indicating program actions (record the value, ignore the value, ...).
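
The record-or-ignore decision driven by the two profiles above can be sketched as follows. This is not the Tag Sniffer's actual code or file format: the profile contents, the "record"/"ignore" action names, the function names, and the choice to record anything not listed in a profile are all assumptions made for illustration.

```python
from typing import Dict, Iterable, List, Optional, Tuple

Tag = Tuple[int, int]                       # (group, element)
Element = Tuple[Tag, Optional[str], str]    # (tag, private creator or None, value)

# Hypothetical Confidentiality Profile: public tag -> action.
PUBLIC_PROFILE: Dict[Tag, str] = {
    (0x0010, 0x1000): "ignore",  # deleted by the profile, so no review needed
    (0x0008, 0x1030): "record",  # free-text description, review for PHI
    (0x0028, 0x0010): "ignore",  # Rows: physical parameter, listed explicitly
    (0x0028, 0x0011): "ignore",  # Columns: physical parameter, listed explicitly
}

# Hypothetical Modality Software Profile:
# (group, private creator, low byte of element) -> action.
PRIVATE_PROFILE: Dict[Tuple[int, str, int], str] = {
    (0x0009, "VENDOR_A", 0x01): "ignore",  # documented physical parameter
    (0x0009, "VENDOR_A", 0x02): "record",  # documented free-text field
}


def should_record(tag: Tag, creator: Optional[str],
                  public_profile: Dict[Tag, str],
                  private_profile: Dict[Tuple[int, str, int], str]) -> bool:
    """Decide whether an element's value is kept for later inspection."""
    group, element = tag
    if group % 2 == 1:  # odd group number means a private element
        key = (group, creator or "", element & 0xFF)
        action = private_profile.get(key, "record")  # assumption: unknown -> record
    else:
        action = public_profile.get(tag, "record")   # assumption: unknown -> record
    return action == "record"


def scan(elements: Iterable[Element],
         public_profile: Dict[Tag, str] = PUBLIC_PROFILE,
         private_profile: Dict[Tuple[int, str, int], str] = PRIVATE_PROFILE,
         ) -> List[Tuple[Tag, str]]:
    """Return the (tag, value) pairs that a reviewer should look at."""
    return [
        (tag, value)
        for tag, creator, value in elements
        if should_record(tag, creator, public_profile, private_profile)
    ]


elements = [
    ((0x0010, 0x1000), None, "12345"),
    ((0x0008, 0x1030), None, "CHEST CT FOR J DOE"),
    ((0x0028, 0x0010), None, "512"),
    ((0x0009, 0x1001), "VENDOR_A", "1.5"),
    ((0x0009, 0x1002), "VENDOR_A", "operator note"),
    ((0x0009, 0x1001), "VENDOR_B", "unknown content"),
]
to_review = scan(elements)
```

Recording undocumented elements by default errs on the side of showing the reviewer too much rather than too little, which seems the safer failure mode when the goal is to find PHI.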

The tool's outputs are relevant at different stages of the curation and image publication process:

  • Element Inventory: The set of DICOM tags found in the image set. The output includes only the hexadecimal tags (xxxx, yyyy) and no values. All public and private tags are listed, but each is listed only once. The Confidentiality Profile and Modality Software Profile are not consulted, as no values are retained for review.
  • Element Values, Pre-Deidentification: We want to examine element values to determine how to configure CTP scripts for proper de-identification. As mentioned above, we want to retain as many elements as possible while not exposing PHI. We also do not want to review all element values in all DICOM objects. We use a Confidentiality Profile that corresponds to the DICOM Basic Application Confidentiality Profile and a Modality Software Profile that properly describes the private elements in the DICOM objects.
  • Element Values, Final Review: In this mode, we review the values in the DICOM objects just before publication. The data has already been de-identified, and we analyze it as a final check. We use a different Confidentiality Profile and a different Modality Software Profile. In the Confidentiality Profile, we list only elements that we know are physical parameters (rows, columns, ...) and do not include the DICOM references from PS 3.15, Table E.1-1. That directs the software to record all other element values. Likewise, the Modality Software Profile used directs the software to record all values for later analysis.
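
The Element Inventory mode described above amounts to collecting the distinct tags across all objects and discarding the values. A minimal sketch, assuming each DICOM object has already been read into a dictionary keyed by (group, element); the function name and the uppercase "(xxxx, yyyy)" output format are choices made for this example rather than the tool's documented output.

```python
from typing import Dict, Iterable, List, Set, Tuple

Tag = Tuple[int, int]  # (group, element)


def element_inventory(objects: Iterable[Dict[Tag, object]]) -> List[str]:
    """List every tag seen in any object exactly once, without values."""
    seen: Set[Tag] = set()
    for dataset in objects:
        seen.update(dataset.keys())
    # Sort numerically by group, then element, and format as four hex digits each.
    return [f"({group:04X}, {element:04X})" for group, element in sorted(seen)]


objects = [
    {(0x0010, 0x0010): "DOE^JOHN", (0x0028, 0x0010): 512},
    {(0x0010, 0x0010): "ROE^JANE", (0x0009, 0x1001): "1.5"},
]
inventory = element_inventory(objects)
print(inventory)
# ['(0009, 1001)', '(0010, 0010)', '(0028, 0010)']
```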

We believe this tool may be useful to the rest of the research community, so it has been made freely available as an open source application. We have also created documentation on how researchers could use it in the context of their own projects:

TCIA De-identification Work Flow

The TCIA provides standards‐based curation support to ensure safe and thorough de‐identification of all images in the archive per federal HIPAA and HITECH regulations. To achieve this compliance without stripping the data of its scientific utility, TCIA staff perform a redundant, thorough de‐identification and analysis procedure based on guidance provided by the industry experts in DICOM Standards Committee Working Group 18.

After initial testing, TCIA image curators individually inspect every image, in both the DICOM tags and the image pixels, to ensure there is no PHI. Changes to the de‐identification procedure are made as appropriate to correct any potential issues found by our curation team. After the image submissions are complete, the curation team again inspects every image in the full data set to ensure regulatory compliance. Only after this inspection is complete are the images made available to the general public. For general information on what to expect as an image provider, please see our web site at http://www.cancerimagingarchive.net/provider.html.

...