The Cancer Imaging Archive (TCIA) staff has accumulated a wealth of knowledge on best practices and procedures for DICOM image de-identification in the process of maintaining our archive. In order to share this information with the wider research community we are maintaining the following knowledge base. This is a living document and will continue to be updated as we learn from our experiences. If you have feedback or questions please contact us at email@example.com.
Here are some presentations and papers that provide an overview of various aspects of DICOM de-identification and the official Supplement 142 de-identification standards:
The DICOM Standards Committee's Working Group 18 wrote Supplement 142, which is now incorporated into the published DICOM Standard. The Attribute Confidentiality Profile (DICOM PS 3.15: Appendix E) provides a standard for image de-identification. It defines a process that reduces the complexity of safely de-identifying DICOM image data, while leaving flexibility for scenarios that require preserving information needed for quality control and for analysis that is essential to research. This is achieved through a number of Application Level Confidentiality Profiles, which include a Basic Profile along with a number of Option Profiles. These profiles provide the instructions needed to safely clean DICOM elements that may contain PHI. The DICOM Standard, including Part 15, is available at the NEMA web site: http://medical.nema.org/standard.html. The original Supplement 142 guidance document can be obtained at ftp://medical.nema.org/medical/dicom/final/sup142_ft.doc. We recommend using the published standard above, as it will be updated with any change proposals.
Appendix E of PS 3.15 documents a system for protecting attributes. We quote a small section of the document.
The Attributes listed in Table E.1-1 for each profile are contained in Standard IODs, or may be contained in Standard Extended IODs. An implementation claiming conformance to an Application Level Confidentiality Profile as a de-identifier shall protect or retain all instances of the Attributes listed in Table E.1-1, whether contained in the main dataset or embedded in an Item of a Sequence of Items. The following action codes are used in the table:
– D – replace with a non-zero length value that may be a dummy value and consistent with the VR
– Z – replace with a zero length value, or a non-zero length value that may be a dummy value and consistent with the VR
– X – remove
– K – keep (unchanged for non-sequence attributes, cleaned for sequences)
– C – clean, that is replace with values of similar meaning known not to contain identifying information and consistent with the VR
– U – replace with a non-zero length UID that is internally consistent within a set of Instances
– Z/D – Z unless D is required to maintain IOD conformance (Type 2 versus Type 1)
– X/Z – X unless Z is required to maintain IOD conformance (Type 3 versus Type 2)
– X/D – X unless D is required to maintain IOD conformance (Type 3 versus Type 1)
– X/Z/D – X unless Z or D is required to maintain IOD conformance (Type 3 versus Type 2 versus Type 1)
– X/Z/U* – X unless Z or replacement of contained instance UIDs (U) is required to maintain IOD conformance (Type 3 versus Type 2 versus Type 1 sequences containing UID references)
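The U action asks for replacement UIDs that stay consistent across a set of instances, so that references between objects still resolve after de-identification. The following is a minimal Python sketch of one way to do that, not a description of how any particular tool implements it. The class name UidRemapper is hypothetical. Building the replacement from the "2.25." root plus a UUID written as a decimal integer is our assumption for the example; that UID form is described in DICOM PS 3.5, Annex B.

```python
import uuid


class UidRemapper:
    """Hypothetical helper: one replacement UID per original UID."""

    def __init__(self):
        # original UID -> replacement UID, shared by every instance in the set
        self._mapping = {}

    def remap(self, original_uid: str) -> str:
        # Reuse an earlier replacement so that cross-references between
        # instances (study, series, referenced SOP instance) stay valid.
        if original_uid not in self._mapping:
            # "2.25." + decimal UUID: at most 44 characters, within the
            # 64-character limit for DICOM UIDs.
            self._mapping[original_uid] = "2.25." + str(uuid.uuid4().int)
        return self._mapping[original_uid]
```

The same UidRemapper object would need to be used for every instance in the set; creating a new one per file would break the internal consistency that the U action requires.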
PS 3.15: E.2 then defines the Basic Application Level Confidentiality Profile which describes how to apply the scheme above with a number of options that determine the scope of protection that is provided. These definitions allow a system to follow a standard procedure and document in a standard way the behavior of that system.
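As an illustration of how the D, Z and X codes and their compound forms could be applied, here is a minimal Python sketch. It is an assumption-laden example rather than a conformant de-identifier: the dataset is a plain dictionary keyed by (group, element), the function names resolve_action and apply_action are hypothetical, the dummy value is not checked against the VR, and the codes and Types in the rules table are placeholders chosen for the example, not values copied from Table E.1-1. The C, U and X/Z/U* actions are deliberately left out.

```python
def resolve_action(code: str, attribute_type: str) -> str:
    """Reduce a compound code such as "X/Z/D" to a single D, Z or X.

    Only the pairings named in the action code list are handled
    (Type 1 with D, Type 2 with Z, otherwise the first option).
    """
    options = code.split("/")
    if attribute_type == "1" and "D" in options:
        return "D"   # Type 1: attribute must be present with a value
    if attribute_type == "2" and "Z" in options:
        return "Z"   # Type 2: attribute must be present, may be empty
    return options[0]


def apply_action(dataset: dict, tag: tuple, action: str, dummy: str = "REMOVED") -> None:
    """Apply a single-letter action to one attribute of a dict-based dataset."""
    if action == "D":
        dataset[tag] = dummy          # non-zero length dummy value
    elif action == "Z":
        dataset[tag] = ""             # zero length value
    elif action == "X":
        dataset.pop(tag, None)        # remove the attribute
    elif action == "K":
        pass                          # keep unchanged (non-sequence case only)
    else:
        raise NotImplementedError(f"action {action!r} is outside this sketch")


# Placeholder data and rules, for illustration only.
dataset = {
    (0x0010, 0x0010): "DOE^JANE",
    (0x0008, 0x0080): "EXAMPLE HOSPITAL",
    (0x0010, 0x1000): "ALT-123",
}
rules = {
    (0x0010, 0x0010): ("Z", "2"),
    (0x0008, 0x0080): ("X/Z/D", "3"),
    (0x0010, 0x1000): ("X", "3"),
}
for tag, (code, attribute_type) in rules.items():
    apply_action(dataset, tag, resolve_action(code, attribute_type))
```

After the loop, only (0x0010, 0x0010) remains in the dictionary, with an empty value; the other two attributes were resolved to X and removed.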
It is desirable to retain DICOM private data elements that contain parameters describing the acquisition while removing elements containing PHI. Performing this task requires understanding the mechanism defined by DICOM to support private elements. DICOM PS 3.5, section 7.8.1 states:
It is possible that multiple implementors may define Private Elements with the same (odd) group number. To avoid conflicts, Private Elements shall be assigned Private Data Element Tags according to the following rules.
a) Private Creator Data Elements numbered (gggg,0010-00FF) (gggg is odd) shall be used to reserve a block of Elements with Group Number gggg for use by an individual implementor. The implementor shall insert an identification code in the first unused (unassigned) Element in this series to reserve a block of Private Elements. The VR of the private identification code shall be LO (Long String) and the VM shall be equal to 1.
b) Private Creator Data Element (gggg,0010), is a Type 1 Data Element that identifies the implementor reserving element (gggg,1000-10FF), Private Creator Data Element (gggg,0011) identifies the implementor reserving elements (gggg,1100-11FF), and so on, until Private Creator Data Element (gggg,00FF) identifies the implementor reserving elements (gggg,FF00-FFFF).
c) Encoders of Private Data Elements shall be able to dynamically assign private data to any available (unreserved) block(s) within the Private group, and specify this assignment through the blocks corresponding Private Creator Data Element(s). Decoders of Private Data shall be able to accept reserved blocks with a given Private Creator identification code at any position within the Private group specified by the blocks corresponding Private Creator Data Element.
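Rules a) through c) can be illustrated with a short Python sketch of how an encoder might reserve a block. This is only an example under stated assumptions: the dataset is a plain dictionary keyed by (group, element), and the function names reserve_private_block and private_tag are hypothetical, not part of any DICOM toolkit.

```python
def reserve_private_block(dataset: dict, group: int, creator: str) -> int:
    """Return the block number (0x10 to 0xFF) reserved for `creator` in `group`."""
    if group % 2 == 0:
        raise ValueError("private groups have odd group numbers")
    creator_tags = [(group, block) for block in range(0x10, 0x100)]
    # Reuse a block this creator has already reserved, wherever it sits.
    for tag in creator_tags:
        if dataset.get(tag) == creator:
            return tag[1]
    # Otherwise claim the first unused Private Creator Data Element.
    for tag in creator_tags:
        if tag not in dataset:
            dataset[tag] = creator
            return tag[1]
    raise RuntimeError("no unreserved block left in this private group")


def private_tag(group: int, block: int, offset: int) -> tuple:
    """Tag of the element at `offset` (0x00 to 0xFF) inside a reserved block."""
    return (group, (block << 8) | offset)
```

For instance, if (0x0009, 0x0010) already holds one implementor's identification code, a second implementor calling reserve_private_block on group 0x0009 is given block 0x11, and its elements then fall in (0009,1100-11FF).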
We will use data in group 0009 as a practical example. The table below shows an example of data that could be included in group 0009.
(0009, 0010) Private Creator Element, value "ACME"
(0009, 1001) Density Standard Deviation
In the example, the element with tag (0009, 0010) is a private creator element with value "ACME". That reserves a block of elements for this manufacturer. The element (0009, 1001) is part of that block; the 10 in the element tag (1001) corresponds to the 10 that is in the tag of the Private Creator Element (0009, 0010).
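The relationship between a private element and its Private Creator Element can be expressed as a small calculation on the tag. The Python sketch below assumes a dataset held as a plain dictionary keyed by (group, element); the function names and the value stored for (0009, 1001) are hypothetical and only serve the example.

```python
def private_creator_tag(tag: tuple) -> tuple:
    """Return the tag of the Private Creator Element that reserves `tag`."""
    group, element = tag
    if group % 2 == 0:
        raise ValueError("not a private group (group number is even)")
    if not 0x1000 <= element <= 0xFFFF:
        raise ValueError("element is not inside a reserved private block")
    # The high byte of the element number is the block: 0x1001 -> 0x10.
    return (group, element >> 8)


def identify_private_element(dataset: dict, tag: tuple) -> tuple:
    """Return (creator value, offset within the block) for a private element."""
    creator = dataset.get(private_creator_tag(tag))
    offset = tag[1] & 0xFF
    return (creator, offset)


example = {
    (0x0009, 0x0010): "ACME",   # Private Creator Element
    (0x0009, 0x1001): "12.5",   # placeholder value, not taken from the source
}
creator, offset = identify_private_element(example, (0x0009, 0x1001))
```

Here creator is "ACME" and offset is 0x01. Matching on the pair (creator, offset) rather than on the full tag is one way for a de-identifier to recognize the same private element even when a different block number was used in another image.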
This only becomes complex when different manufacturers want to store information in the same private group. When this occurs in a single image, the creator of the image reserves a block (for example, 0010). When a second application wants to add data to that same group, it detects the block written by the creator and reserves a separate block (for example, 0011). The creator is not required to start at block 0010, but that appears to be common practice. Likewise, the second or third application is not required to use 0011 or 0012. Based on this encoding scheme, some observations are:
As discussed above, medical manufacturers include private elements in their DICOM images to convey information not defined in the DICOM Standard. This section documents the information we have gathered by reading appropriate conformance statements.
The sections below describe information by manufacturer. That information is encoded in files that describe the private elements created by those manufacturers. Those files are part of the run time environment of the Tag Sniffer and are maintained in our forge: https://mirgforge.wustl.edu/gf/project/dicomtagsniffer/scmsvn/?action=browse&path=%2Ftrunk%2Fdeploy%2Fprofiles%2Fdevice-profiles%2F
GE Discovery CT
GE Discovery MR
GE Discovery PT
GE HiSpeed CT
GE LightSpeed CT
GE Signa MR series
Philips Achieva MR series
Philips Aura CT
Philips Brilliance CT
Siemens Numaris MR
Siemens Syngo MR
Toshiba Aquilion CT
TCIA utilizes the RSNA Clinical Trials Processor (CTP) software in conjunction with caBIG's National Biomedical Imaging Archive (NBIA) to de-identify and host the images in the archive. The Cancer Imaging Program's Informatics Team has been working closely with the developer of CTP since 2009 to incorporate support for this standard as it was being defined by WG18. A full summary and timeline of this project can be found at https://wiki.nci.nih.gov/display/CIP/Incorporation+of+DICOM+WG18+Supplement+142+into+CTP.
CTP provides an interface that allows any combination of the profiles to be applied to a set of images, and it supports an audit trail for retroactively tracking the de-identification that was applied. For images that are submitted to TCIA, the staff begins with the Basic Application Level Confidentiality Profile (which is the most aggressive) in combination with the following options:
To make it easier to implement some of the "clean" instructions specified in DICOM PS 3.15, a new tool was developed to inspect, for potential PHI, the contents of Private Tags and of DICOM elements that allow free-text entry by a technician. This tool scans a folder and its subfolders for DICOM objects and produces several different outputs, depending on the mode used and the input profiles. The software reads each DICOM object and iterates through each public and private element. The software then uses the profiles below to determine whether to retain the value of the element for later inspection:
These outputs are relevant at different stages of the curation and image publication process.
We believe this tool might be useful to the rest of the research community, so it has been made freely available as an open source application. We have also created documentation on how researchers could use it in the context of their own projects.
TCIA provides standards-based curation support to ensure safe and thorough de-identification of all images in the archive per federal HIPAA and HITECH regulations. To achieve this compliance without stripping the data of its scientific utility, TCIA staff perform a redundant, thorough de-identification and analysis procedure based on guidance provided by the industry experts in DICOM Standards Committee Working Group 18. Each collection submitted for publication is analyzed and de-identified as a whole using the steps listed below. All steps are completed before the collection is released for publication.
Only after this inspection is complete are the images made available to the general public. For general information on what to expect as an image provider please see our web site at http://www.cancerimagingarchive.net/provider.html.