Overview
Updated: 30 June, 2015
Following industry best practices, TCIA uses a standards-based approach to de-identification of DICOM images to ensure that images are free of protected health information (PHI). The TCIA de-identification process ensures that the HIPAA de-identification standard is met by following the Safe Harbor Method as defined in section 164.514(b)(2) of the HIPAA Privacy Rule. The standard for de-identification of DICOM objects is defined by the DICOM Standard PS 3.15-2011 Digital Imaging and Communications in Medicine (DICOM), Part 15: Security and System Management Profiles (ftp://medical.nema.org/medical/dicom/2016b/output/html/part15.html#chapter_E). At the submitting site, a DICOM PS 3.15 compliant script removes or modifies DICOM tags deemed to be unsafe (see Table 1 for a complete listing). TCIA incorporates the “Basic Application Confidentiality Profile”, amended by inclusion of the following profile options: Clean Pixel Data Option, Clean Descriptors Option, Retain Longitudinal With Modified Dates Option, Retain Patient Characteristics Option, Retain Device Identity Option, and Retain Safe Private Option. The de-identification rules applied to each object are recorded by TCIA in the DICOM sequence De-identification Method Code Sequence [0012,0064] by entering the Code Value, Coding Scheme Designator, and Code Meaning for each profile and option that was applied to the DICOM object during de-identification. The DICOM standard for de-identification of objects defines a minimum set of elements that must be de-identified to comply with the standard. It is up to the user doing the de-identification to ensure that PHI is removed or cleaned according to the laws and practices in place at the time de-identification occurs.
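As a rough illustration of how the applied profile and options might be recorded, the sketch below builds one item per code. The numeric code values are the ones listed for tag 00120064 in Table 1; the code meanings are taken from the DICOM controlled terminology as best recalled and should be checked against PS 3.16. The dictionary layout and the function name are assumptions made for this example, not the format TCIA's tools use.

```python
# Code values as listed for tag 00120064 in Table 1. The meanings are the
# DICOM (DCM) code meanings as recalled; verify them against PS 3.16.
APPLIED_PROFILES = [
    ("113100", "Basic Application Confidentiality Profile"),
    ("113101", "Clean Pixel Data Option"),
    ("113105", "Clean Descriptors Option"),
    ("113107", "Retain Longitudinal Temporal Information Modified Dates Option"),
    ("113108", "Retain Patient Characteristics Option"),
    ("113109", "Retain Device Identity Option"),
    ("113111", "Retain Safe Private Option"),
]


def build_method_code_sequence(profiles=APPLIED_PROFILES):
    """Return one sequence item (a plain dict) per applied profile or option."""
    return [
        {
            "CodeValue": value,
            "CodingSchemeDesignator": "DCM",
            "CodeMeaning": meaning,
        }
        for value, meaning in profiles
    ]


items = build_method_code_sequence()
print(len(items), items[0]["CodeValue"])
```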
Details
Base level de-identification
The Basic Application Confidentiality Profile requires that Patient Name and Patient ID are either blanked or modified. TCIA incorporates an ID mapping between the original Patient ID and the ID that the images will have within TCIA. The mapping table is created at the image submitting site and the mapping is performed before the images leave the site's host computer, so TCIA never sees the original Patient ID. The remapped Patient ID is also written to the Patient Name field, in case a DICOM viewer or application used by a TCIA user who downloads the data requires a Patient Name to be present. To show that the Patient Identity has been removed, the term “YES” is written into DICOM tag 00120062 “PatientIdentityRemoved”.
In general, the Basic Application Confidentiality Profile specifies removal or modification of any tag that by definition would contain PHI that could be used either alone or together with other information to uniquely identify a subject. Detailed geographic information, dates, exam identifiers, patient demographics, free text entry fields, vendor private tags, etc. are removed to minimize the possibility of uniquely identifying an individual. The options to the DICOM de-identification standard allow for retention of information that helps make the data scientifically valuable, but as more options are added the chance of retaining PHI increases and a rigorous de-identification process must be followed.
Exam Identifiers - DICOM makes extensive use of unique identifiers (UIDs) that could be used to identify a subject if a user had access to the PACS system at the institution where the images originated. The Basic Application Confidentiality Profile requires that all UIDs be removed or modified. TCIA uses its own root UID, appends an 8-digit string in the form of xxxx.yyyy (where xxxx is related to the collection and yyyy is related to a submitting site) and then appends a hashed value of the original UID. UIDs have no special meaning other than serving as unique identifiers, and the only reason TCIA adds the 8-digit string is to minimize the possibility of two images being assigned the same UID as images come from many different sites. This technique ensures that images stay associated with the appropriate series, study, and subject, and that references between secondary capture images, structured reports, PET/CT, etc. remain valid references to images within TCIA. Any image resubmitted to TCIA will have the same UID, which avoids the same image appearing twice with different identifiers. Original accession numbers are hashed with a 16 bit string to prevent linking of DICOM objects back to the submitting site.
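The UID scheme described above can be sketched as follows. This is only an illustration: the root `1.2.3.4` is a placeholder (the real TCIA root is not given here), SHA-256 is an assumed hash function, and the restriction of the collection and site codes to 1000-9999 is a simplification so that no UID component starts with a zero. The 64-character limit on UIDs comes from the DICOM standard.

```python
import hashlib

EXAMPLE_ROOT = "1.2.3.4"  # placeholder root, not the real TCIA root
MAX_UID_LENGTH = 64       # DICOM limits a UID to 64 characters


def remap_uid(original_uid, collection_code, site_code, root=EXAMPLE_ROOT):
    """Build root.xxxx.yyyy.<digits derived from a hash of the original UID>."""
    for code in (collection_code, site_code):
        if not 1000 <= code <= 9999:
            raise ValueError("this sketch assumes four-digit codes without leading zeros")
    prefix = f"{root}.{collection_code}.{site_code}."
    digest = hashlib.sha256(original_uid.encode("ascii")).hexdigest()
    digits = str(int(digest, 16))  # decimal digits only, never a leading zero
    return prefix + digits[: MAX_UID_LENGTH - len(prefix)]


# The same original UID always maps to the same new UID, so references between
# objects stay consistent and a resubmitted image keeps its identifier.
print(remap_uid("1.2.840.113619.2.55.3.1", 1234, 5678))
```

Because the mapping is a pure function of the original UID, every reference to a given study, series, or instance is rewritten to the same new value wherever it appears.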
Dates - The Retain Longitudinal With Modified Dates Option allows dates to be retained as long as they are modified from the original date. Date and Date-Time fields in TCIA DICOM image headers have been offset based on a random number, but the longitudinal relationship between dates is maintained. Therefore, a researcher won’t know the precise date the scan occurred, but if a follow-up scan was performed 120 days later, that same 120-day difference between scans of a subject will exist in the TCIA images. Dates that occur in DICOM tags other than Date or Date-Time fields are removed. An example of this would be a date entered into the Series Description field. If the date is associated with a library for Code Meaning, then that date is preserved, as it would be required to look up the meaning in the correct version of the library. To show that the dates have been modified, the term “MODIFIED” is written into DICOM tag 00280303 “LongitudinalTemporalInformationModified”.
Patient Demographics - The Retain Patient Characteristics Option allows keeping some patient demographics for research purposes. The allowed fields are Patient’s Sex, Patient’s Age, Patient’s Size, Patient’s Weight, Ethnic Group, Smoking Status, and Pregnancy Status. If a subject is over 90 years of age, then the age must be listed as 90+. Allergies, Patient State (the patient's condition, not where they live), Pre-Medication, and Special Needs are defined by the DICOM standard as “clean” and are kept by TCIA and examined for PHI along with all tags during curation. Other patient demographics such as birthdate, address, religious affiliations, etc. are removed or emptied.
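A minimal sketch of the date-offset idea, assuming a single fixed offset is applied to every date belonging to one subject. The offset value, the function names, and the example dates are all hypothetical; the document only states that the offset is based on a random number.

```python
from datetime import datetime, timedelta

DICOM_DATE_FORMAT = "%Y%m%d"  # DICOM DA values are written as YYYYMMDD


def shift_date(dicom_date, offset_days):
    """Shift a DICOM date string by a fixed number of days."""
    shifted = datetime.strptime(dicom_date, DICOM_DATE_FORMAT) + timedelta(days=offset_days)
    return shifted.strftime(DICOM_DATE_FORMAT)


def days_between(first, second):
    """Number of days from one DICOM date string to another."""
    delta = datetime.strptime(second, DICOM_DATE_FORMAT) - datetime.strptime(first, DICOM_DATE_FORMAT)
    return delta.days


offset = -37  # hypothetical per-subject offset
baseline, follow_up = "20100105", "20100505"  # 120 days apart
shifted_baseline = shift_date(baseline, offset)
shifted_follow_up = shift_date(follow_up, offset)

# The absolute dates change, but the interval between scans does not.
print(days_between(baseline, follow_up), days_between(shifted_baseline, shifted_follow_up))
```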
The names of health care providers including staff, hospital name, assigned IDs etc. are removed from the DICOM objects in cases where there is enough detail to identify an individual or facility where the scan was done.
Free Text - The Clean Descriptors Option allows for DICOM tags where free text could be entered by a technician to be kept. The following tags fall under that option and are all kept, inspected, and cleaned of PHI by TCIA during the curation process: Allergies, Patient State, Study Description, Series Description, Admitting Diagnoses Description, Admitting Diagnoses Code Sequence, Derivation Description, Identifying Comments, Medical Alerts, Occupation, Additional Patient’s History, Patient Comments, Contrast Bolus Agent, Protocol Name, Acquisition Device Processing Description, Acquisition Comments, Acquisition Protocol Description, Contribution Description, Image Comments, Frame Comments, Reason for Study, Requested Procedure Description, Requested Contrast Agent, Study Comments, Discharge Diagnosis Description, Service Episode Description, Visit Comments, Scheduled Procedure Step Description, Performed Procedure Step Description, Comments on Performed Procedure Step, Requested Procedure Comments, Reason for Imaging Service Request, Imaging Service Request Comments, Interpretation Text, Interpretation Diagnosis Description, Impressions, and Results Comments. The TCIA de-identification script run at the submitting sites removes the field “Request Attributes Sequence” as that tag typically contains PHI and provides no scientific value. Many of these fields contain information valuable to research and are important to retain. For images that are submitted with missing Series Descriptions, TCIA will add text to Series Descriptions to help researchers during TCIA image searches. 
When a missing series description is encountered, TCIA staff will use the following approach: Enter “LOCALIZER” if the ImageType contains the word localizer; Enter “Contrast” and then append the value contained in Contrast Bolus Agent if a value is present; if Contrast Bolus Agent is missing or empty other tags will be examined to see if a series was scanned with contrast (The Image Comments field is often used by sites to denote contrast); if the Image is an MR then TCIA will map the Scanning Sequence parameters into the Series Description; if none of those conditions apply then TCIA will map Scan Options or simply enter “none” into the Series Description field.
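The fallback order described above could be expressed roughly as follows. This is a sketch over a plain dictionary of header values, not TCIA's curation tooling. In particular, the document does not say which "other tags" are examined or how, so the check for the word "contrast" in Image Comments is a naive stand-in (it would also match "without contrast"), and the space between “Contrast” and the agent name is an assumption.

```python
def fill_series_description(header):
    """Suggest a Series Description for a header (a plain dict) that lacks one."""
    image_type = header.get("ImageType") or []
    if isinstance(image_type, str):
        image_type = [image_type]
    if any("LOCALIZER" in str(value).upper() for value in image_type):
        return "LOCALIZER"

    agent = (header.get("ContrastBolusAgent") or "").strip()
    if agent:
        return f"Contrast {agent}"

    # Assumed stand-in for "other tags will be examined".
    if "contrast" in (header.get("ImageComments") or "").lower():
        return "Contrast"

    if header.get("Modality") == "MR" and header.get("ScanningSequence"):
        return str(header["ScanningSequence"])

    return header.get("ScanOptions") or "none"


print(fill_series_description({"ImageType": ["ORIGINAL", "PRIMARY", "LOCALIZER"]}))
print(fill_series_description({"Modality": "CT"}))
```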
Devices - The Retain Device Identity Option of the DICOM de-identification standard allows for the retention of information related to the scanner used. The option allows for the following relevant tags to be retained: Station Name, Device Serial Number, Device UID, Plate ID, Generator ID, Cassette ID, Gantry ID, Detector ID, Scheduled Study Location, Scheduled Study Location AE Title, Scheduled Station AE Title, Scheduled Station Name, Scheduled Procedure Step Location, Performed Station AE Title, Performed Station Name, Performed Station Name Code Sequence, Scheduled Station Name Code Sequence, Scheduled Station Geographic Location Code Sequence, and Performed Station Geographic Location Code Sequence. TCIA removes Station Name as part of its de-identification process as Station Name often contains information related to the site where the scan occurred. The other tags listed above are retained if they are found to be free of PHI after TCIA curation of the submitted DICOM objects.
Private Tags - When a submitting site sends DICOM data to TCIA, all private tags are retained and then de-identified by TCIA during curation of the data according to the Retain Safe Private Option. The Retain Safe Private Option allows for the retention of DICOM tags stored in the private fields. These fields are extensively used by DICOM vendors to store information about the scans. To claim conformance to the DICOM standard, vendors must publish a DICOM conformance statement that defines the standard and private tags that are used by their particular equipment. These conformance statements are typically made available for download on the vendors' websites. Unfortunately, there are cases where vendors do not make the conformance statement for a piece of equipment publicly available or do not adequately define what is stored in the private tags. In TCIA the private DICOM elements are de-identified according to the rules contained in a de-identification knowledge base maintained by the TCIA team at the University of Arkansas for Medical Sciences. This knowledge base defines rules for de-identification of private tags based on a vendor’s conformance statement for each scanner and software version. The manufacturer, manufacturer model, modality, and software version are extracted from each series submitted. The TCIA de-identification knowledge base is checked for a conformance statement matching these data. If none is found, TCIA locates the conformance statement and adds it to the knowledge base. TCIA will remove any private tags from the images that are not specified in the conformance statement or are defined as containing a form of PHI such as name, SSN, etc. All date and datetime private tags that are retained are offset using the same offset as applied to the standard tags for the image. All private tags containing UIDs are assigned a TCIA root and appended with a hashed value as done with the standard tags. This ensures all references to other images contained within TCIA are maintained. A manual inspection of all private tags is performed using tagSniffer reports, and any PHI that may be found is removed, emptied, date offset, or hashed as appropriate.
Body Part Examined - When images are made public, a single body part examined, corresponding to the cancer of interest, is assigned to all images. If the collection consists of sarcoma images (or any other cancer affecting multiple organs within the image collection), there may be multiple body parts assigned, though only one to any series. In phantom collections, body part examined is simply labeled “PHANTOM”.
All Tags - The TCIA de-identification process ensures that every DICOM tag of every DICOM object is free of the 18 forms of PHI as currently defined by the Safe Harbor Method. At the submitting site, a DICOM PS 3.15 compliant script removes or modifies DICOM tags deemed to be unsafe (see Table 1 for a complete listing). At TCIA, a software routine known as tagSniffer extracts every unique value found within a collection being curated and prints them to a report. This report is examined by curators, and any actions necessary to remove PHI are applied when moving the images from the Intake server to the Public Server. Every DICOM image is inspected by curators for burned-in PHI. Once the images reach the Public Server, the tags are inspected by two curators for PHI using new tagSniffer reports. Images are spot checked for any burned-in PHI.
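To make the idea of a unique-value report concrete, the sketch below collects every distinct value seen for each tag across a collection. It is not the tagSniffer implementation, whose internals are not described here; headers are represented as plain dictionaries and the function name is made up for this example.

```python
from collections import defaultdict


def unique_values_report(headers):
    """Map each tag name to the sorted list of distinct values seen in a collection."""
    seen = defaultdict(set)
    for header in headers:
        for tag, value in header.items():
            seen[tag].add(str(value))
    return {tag: sorted(values) for tag, values in sorted(seen.items())}


collection = [
    {"PatientSex": "F", "StudyDescription": "CT CHEST"},
    {"PatientSex": "M", "StudyDescription": "CT CHEST"},
    {"PatientSex": "F"},
]
for tag, values in unique_values_report(collection).items():
    print(tag, values)
```

Reviewing distinct values rather than every header keeps the curator's reading load proportional to the variety in a collection instead of its size.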
The following table details the de-identification performed at the submitting site by way of a TCIA supplied de-identification script.
Table 1
Tag | Name | Action |
---|---|---|
00080050 | AccessionNumber | hash |
00184000 | AcquisitionComments | keep |
00400555 | AcquisitionContextSeq | remove |
00080022 | AcquisitionDate | incrementdate |
0008002a | AcquisitionDatetime | incrementdate |
00181400 | AcquisitionDeviceProcessingDescription | keep |
00189424 | AcquisitionProtocolDescription | keep |
00080032 | AcquisitionTime | keep |
00404035 | ActualHumanPerformersSequence | remove |
001021b0 | AdditionalPatientHistory | keep |
00380010 | AdmissionID | remove |
00380020 | AdmittingDate | incrementdate |
00081084 | AdmittingDiagnosesCodeSeq | keep |
00081080 | AdmittingDiagnosesDescription | keep |
00380021 | AdmittingTime | keep |
00102110 | Allergies | keep |
40000010 | Arbitrary | remove |
0040a078 | AuthorObserverSequence | remove |
00130010 | BlockOwner | CTP |
00180015 | BodyPartExamined | BODYPART |
00101081 | BranchOfService | remove |
00280301 | BurnedInAnnotation | keep |
00181007 | CassetteID | keep |
00400280 | CommentsOnPPS | keep |
00209161 | ConcatenationUID | hashuid |
00403001 | ConfidentialityPatientData | remove |
00700086 | ContentCreatorsIdCodeSeq | remove |
00700084 | ContentCreatorsName | empty |
00080023 | ContentDate | incrementdate |
0040a730 | ContentSeq | remove |
00080033 | ContentTime | keep |
0008010d | ContextGroupExtensionCreatorUID | hashuid |
00180010 | ContrastBolusAgent | keep |
0018a003 | ContributionDescription | keep |
00102150 | CountryOfResidence | remove |
00089123 | CreatorVersionUID | hashuid |
00380300 | CurrentPatientLocation | remove |
00080025 | CurveDate | incrementdate |
Group | curves | remove |
00080035 | CurveTime | keep |
0040a07c | CustodialOrganizationSeq | remove |
fffcfffc | DataSetTrailingPadding | remove |
00181200 | DateofLastCalibration | incrementdate |
0018700c | DateofLastDetectorCalibration | incrementdate |
00181012 | DateOfSecondaryCapture | incrementdate |
00120063 | DeIdentificationMethod | {Per DICOM PS 3.15 AnnexE. Details in 0012,0064} |
00120064 | DeIdentificationMethodCodeSequence | 113100/113101/113105/113107/113108/113109/113111 |
00082111 | DerivationDescription | keep |
0018700a | DetectorID | keep |
00181000 | DeviceSerialNumber | keep |
00181002 | DeviceUID | keep |
fffafffa | DigitalSignaturesSeq | remove |
04000100 | DigitalSignatureUID | remove |
00209164 | DimensionOrganizationUID | hashuid |
00380040 | DischargeDiagnosisDescription | keep |
4008011a | DistributionAddress | remove |
40080119 | DistributionName | remove |
300a0013 | DoseReferenceUID | hashuid |
00102160 | EthnicGroup | keep |
00080058 | FailedSOPInstanceUIDList | hashuid |
0070031a | FiducialUID | hashuid |
00402017 | FillerOrderNumber | empty |
00209158 | FrameComments | keep |
00200052 | FrameOfReferenceUID | hashuid |
00181008 | GantryID | keep |
00181005 | GeneratorID | keep |
00700001 | GraphicAnnotationSequence | remove |
00404037 | HumanPerformersName | remove |
00404036 | HumanPerformersOrganization | remove |
00880200 | IconImageSequence | remove |
00084000 | IdentifyingComments | keep |
00204000 | ImageComments | keep |
00284000 | ImagePresentationComments | remove |
00402400 | ImagingServiceRequestComments | keep |
40080300 | Impressions | keep |
00080012 | InstanceCreationDate | incrementdate |
00080014 | InstanceCreatorUID | hashuid |
00080081 | InstitutionAddress | remove |
00081040 | InstitutionalDepartmentName | remove |
00080082 | InstitutionCodeSequence | remove |
00080080 | InstitutionName | remove |
00101050 | InsurancePlanIdentification | remove |
00401011 | IntendedRecipientsOfResultsIDSequence | remove |
40080111 | InterpretationApproverSequence | remove |
4008010c | InterpretationAuthor | remove |
40080115 | InterpretationDiagnosisDescription | keep |
40080202 | InterpretationIdIssuer | remove |
40080102 | InterpretationRecorder | remove |
4008010b | InterpretationText | keep |
4008010a | InterpretationTranscriber | remove |
00083010 | IrradiationEventUID | hashuid |
00380011 | IssuerOfAdmissionID | remove |
00100021 | IssuerOfPatientID | remove |
00380061 | IssuerOfServiceEpisodeId | remove |
00281214 | LargePaletteColorLUTUid | hashuid |
001021d0 | LastMenstrualDate | incrementdate |
00280303 | LongitudinalTemporalInformationModified | MODIFIED |
04000404 | MAC | remove |
00080070 | Manufacturer | keep |
00081090 | ManufacturerModelName | keep |
00102000 | MedicalAlerts | keep |
00101090 | MedicalRecordLocator | remove |
00101080 | MilitaryRank | remove |
04000550 | ModifiedAttributesSequence | remove |
00203406 | ModifiedImageDescription | remove |
00203401 | ModifyingDeviceID | remove |
00203404 | ModifyingDeviceManufacturer | remove |
00081060 | NameOfPhysicianReadingStudy | remove |
00401010 | NamesOfIntendedRecipientsOfResults | remove |
00102180 | Occupation | keep |
00081070 | OperatorName | remove |
00081072 | OperatorsIdentificationSeq | remove |
00402010 | OrderCallbackPhoneNumber | remove |
00402008 | OrderEnteredBy | remove |
00402009 | OrderEntererLocation | remove |
04000561 | OriginalAttributesSequence | remove |
00101000 | OtherPatientIDs | remove |
00101002 | OtherPatientIDsSeq | remove |
00101001 | OtherPatientNames | remove |
00080024 | OverlayDate | incrementdate |
Group | overlays | remove |
00080034 | OverlayTime | keep |
00281199 | PaletteColorLUTUID | hashuid |
0040a07a | ParticipantSequence | remove |
00101040 | PatientAddress | remove |
00101010 | PatientAge | keep |
00100030 | PatientBirthDate | empty |
00101005 | PatientBirthName | remove |
00100032 | PatientBirthTime | remove |
00104000 | PatientComments | keep |
00100020 | PatientID | Re-Mapped |
00120062 | PatientIdentityRemoved | YES |
00380400 | PatientInstitutionResidence | remove |
00100050 | PatientInsurancePlanCodeSeq | remove |
00101060 | PatientMotherBirthName | remove |
00100010 | PatientName | Re-Mapped |
00102154 | PatientPhoneNumbers | remove |
00100101 | PatientPrimaryLanguageCodeSeq | remove |
00100102 | PatientPrimaryLanguageModifierCodeSeq | remove |
001021f0 | PatientReligiousPreference | remove |
00100040 | PatientSex | keep |
00102203 | PatientSexNeutered | keep |
00101020 | PatientSize | keep |
00380500 | PatientState | keep |
00401004 | PatientTransportArrangements | remove |
00101030 | PatientWeight | keep |
00400243 | PerformedLocation | remove |
00400241 | PerformedStationAET | keep |
00404030 | PerformedStationGeoLocCodeSeq | keep |
00400242 | PerformedStationName | keep |
00404028 | PerformedStationNameCodeSeq | keep |
00081052 | PerformingPhysicianIdSeq | remove |
00081050 | PerformingPhysicianName | remove |
00400250 | PerformProcedureStepEndDate | incrementdate |
00401102 | PersonAddress | remove |
00401101 | PersonIdCodeSequence | remove |
0040a123 | PersonName | empty |
00401103 | PersonTelephoneNumbers | remove |
40080114 | PhysicianApprovingInterpretation | remove |
00081048 | PhysicianOfRecord | remove |
00081049 | PhysicianOfRecordIdSeq | remove |
00081062 | PhysicianReadingStudyIdSeq | remove |
00402016 | PlaceOrderNumberOfImagingServiceReq | empty |
00181004 | PlateID | keep |
00400254 | PPSDescription | keep |
00400253 | PPSID | remove |
00400244 | PPSStartDate | incrementdate |
00400245 | PPSStartTime | keep |
001021c0 | PregnancyStatus | keep |
00400012 | PreMedication | keep |
Group | privategroups | keep |
00131010 | ProjectName | always |
00181030 | ProtocolName | keep |
00540016 | Radiopharmaceutical Information Sequence | process |
00181078 | Radiopharmaceutical Start DateTime | incrementdate |
00181079 | Radiopharmaceutical Stop DateTime | incrementdate |
00402001 | ReasonForImagingServiceRequest | keep |
00321030 | ReasonforStudy | keep |
04000402 | RefDigitalSignatureSeq | remove |
30060024 | ReferencedFrameOfReferenceUID | hashuid |
00380004 | ReferencedPatientAliasSeq | remove |
00080092 | ReferringPhysicianAddress | remove |
00080090 | ReferringPhysicianName | empty |
00080094 | ReferringPhysicianPhoneNumbers | remove |
00080096 | ReferringPhysiciansIDSeq | remove |
00404023 | RefGenPurposeSchedProcStepTransUID | hashuid |
00081140 | RefImageSeq | remove |
00081120 | RefPatientSeq | remove |
00081111 | RefPPSSeq | remove |
00081150 | RefSOPClassUID | keep |
04000403 | RefSOPInstanceMACSeq | remove |
00081155 | RefSOPInstanceUID | hashuid |
00081110 | RefStudySeq | remove |
00102152 | RegionOfResidence | remove |
300600c2 | RelatedFrameOfReferenceUID | hashuid |
00400275 | RequestAttributesSeq | remove |
00321070 | RequestedContrastAgent | keep |
00401400 | RequestedProcedureComments | keep |
00321060 | RequestedProcedureDescription | keep |
00401001 | RequestedProcedureID | remove |
00401005 | RequestedProcedureLocation | remove |
00321032 | RequestingPhysician | remove |
00321033 | RequestingService | remove |
00102299 | ResponsibleOrganization | remove |
00102297 | ResponsiblePerson | remove |
40084000 | ResultComments | keep |
40080118 | ResultsDistributionListSeq | remove |
40080042 | ResultsIDIssuer | remove |
300e0008 | ReviewerName | remove |
00404034 | ScheduledHumanPerformersSeq | remove |
0038001e | ScheduledPatientInstitutionResidence | remove |
0040000b | ScheduledPerformingPhysicianIDSeq | remove |
00400006 | ScheduledPerformingPhysicianName | remove |
00400001 | ScheduledStationAET | keep |
00404027 | ScheduledStationGeographicLocCodeSeq | keep |
00400010 | ScheduledStationName | keep |
00404025 | ScheduledStationNameCodeSeq | keep |
00321020 | ScheduledStudyLocation | keep |
00321021 | ScheduledStudyLocationAET | keep |
00321000 | ScheduledStudyStartDate | incrementdate |
00080021 | SeriesDate | incrementdate |
0008103e | SeriesDescription | keep |
0020000e | SeriesInstanceUID | hashuid |
00080031 | SeriesTime | keep |
00380062 | ServiceEpisodeDescription | keep |
00380060 | ServiceEpisodeID | remove |
00131013 | SiteID | SITEID |
00131012 | SiteName | SITENAME |
001021a0 | SmokingStatus | keep |
00181020 | SoftwareVersion | keep |
00080018 | SOPInstanceUID | hashuid |
00082112 | SourceImageSeq | remove |
00380050 | SpecialNeeds | keep |
00400007 | SPSDescription | keep |
00400004 | SPSEndDate | incrementdate |
00400005 | SPSEndTime | keep |
00400011 | SPSLocation | keep |
00400002 | SPSStartDate | incrementdate |
00400003 | SPSStartTime | keep |
00081010 | StationName | remove |
00880140 | StorageMediaFilesetUID | hashuid |
30060008 | StructureSetDate | incrementdate |
00321040 | StudyArrivalDate | incrementdate |
00324000 | StudyComments | keep |
00321050 | StudyCompletionDate | incrementdate |
00080020 | StudyDate | incrementdate |
00081030 | StudyDescription | keep |
00200010 | StudyID | empty |
00320012 | StudyIDIssuer | remove |
0020000d | StudyInstanceUID | hashuid |
00080030 | StudyTime | keep |
00200200 | SynchronizationFrameOfReferenceUID | hashuid |
0040db0d | TemplateExtensionCreatorUID | hashuid |
0040db0c | TemplateExtensionOrganizationUID | hashuid |
40004000 | TextComments | remove |
20300020 | TextString | remove |
00080201 | TimezoneOffsetFromUTC | remove |
00880910 | TopicAuthor | remove |
00880912 | TopicKeyWords | remove |
00880906 | TopicSubject | remove |
00880904 | TopicTitle | remove |
00081195 | TransactionUID | hashuid |
00131011 | TrialName | PROJECTNAME |
0040a124 | UID | hashuid |
Group | unspecifiedelements | keep |
0040a088 | VerifyingObserverIdentificationCodeSeq | remove |
0040a075 | VerifyingObserverName | empty |
0040a073 | VerifyingObserverSequence | remove |
0040a027 | VerifyingOrganization | remove |
00384000 | VisitComments | keep |
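The way a rule table like Table 1 drives de-identification can be sketched with a small dispatcher. Only a handful of rows are copied here, and only the remove, empty, keep, hash, and incrementdate actions are modelled. The header is a plain dictionary keyed by tag, SHA-256 truncated to 16 characters is an assumed stand-in for the hash action, and date shifting is limited to plain YYYYMMDD values. None of this reproduces the actual script supplied by TCIA.

```python
import hashlib
from datetime import datetime, timedelta

# A few rows copied from Table 1 (tag -> action).
RULES = {
    "00080050": "hash",           # AccessionNumber
    "00080020": "incrementdate",  # StudyDate
    "00080080": "remove",         # InstitutionName
    "00080090": "empty",          # ReferringPhysicianName
    "00081030": "keep",           # StudyDescription
}


def apply_rules(header, rules, offset_days):
    """Return a new header dict with the listed action applied to each tag."""
    result = {}
    for tag, value in header.items():
        # Table 1 keeps unspecified elements, so "keep" is the default here.
        action = rules.get(tag, "keep")
        if action == "remove":
            continue
        if action == "empty":
            result[tag] = ""
        elif action == "incrementdate":
            shifted = datetime.strptime(value, "%Y%m%d") + timedelta(days=offset_days)
            result[tag] = shifted.strftime("%Y%m%d")
        elif action == "hash":
            result[tag] = hashlib.sha256(value.encode("utf-8")).hexdigest()[:16]
        else:  # "keep", plus any action this sketch does not model
            result[tag] = value
    return result


example = {
    "00080050": "A12345",
    "00080020": "20100105",
    "00080080": "Example Hospital",
    "00080090": "DOE^JANE",
    "00081030": "CT CHEST",
}
print(apply_rules(example, RULES, offset_days=-5))
```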
More Details regarding TCIA de-identification may be found at the following links: De-Identification Rules.
The Cancer Imaging Archive (TCIA) staff has accumulated a wealth of knowledge on best practices and procedures for DICOM image de-identification in the process of maintaining our archive. In order to share this information with the wider research community we are maintaining the following knowledge base. This is a living document and will continue to be updated as we learn from our experiences. If you have feedback or questions please contact us at feedback@cancerimagingarchive.net.
Background Information
DICOM Private Data Elements
It is desirable to retain DICOM private data elements that contain parameters describing the acquisition while removing elements containing PHI. Performing this task requires understanding the mechanism defined by DICOM to support private elements. DICOM PS 3.5, section 7.8.1 states:
It is possible that multiple implementors may define Private Elements with the same (odd) group number. To avoid conflicts, Private Elements shall be assigned Private Data Element Tags according to the following rules.
a) Private Creator Data Elements numbered (gggg,0010-00FF) (gggg is odd) shall be used to reserve a block of Elements with Group Number gggg for use by an individual implementor. The implementor shall insert an identification code in the first unused (unassigned) Element in this series to reserve a block of Private Elements. The VR of the private identification code shall be LO (Long String) and the VM shall be equal to 1.
b) Private Creator Data Element (gggg,0010), is a Type 1 Data Element that identifies the implementor reserving element (gggg,1000-10FF), Private Creator Data Element (gggg,0011) identifies the implementor reserving elements (gggg,1100-11FF), and so on, until Private Creator Data Element (gggg,00FF) identifies the implementor reserving elements (gggg,FF00-FFFF).
c) Encoders of Private Data Elements shall be able to dynamically assign private data to any available (unreserved) block(s) within the Private group, and specify this assignment through the blocks corresponding Private Creator Data Element(s). Decoders of Private Data shall be able to accept reserved blocks with a given Private Creator identification code at any position within the Private group specified by the blocks corresponding Private Creator Data Element.
We will use data in group 0009 as a practical example. The table below shows an example of data that could be included in group 0009.
Tag | Description | Value |
---|---|---|
0009, 0010 | Private Creator Element | ACME |
0009, 1001 | Average Density | 15.5 |
0009, 1002 | Density Standard Deviation | 2.2 |
In the example, the element with tag (0009, 0010) is a private creator element with value "ACME". That reserves a block of elements for this manufacturer. The element (0009, 1001) is part of that block; the 10 in the element tag (1001) corresponds to the 10 that is in the tag of the Private Creator Element (0009, 0010).
This only becomes complex when different manufacturers want to use the same reserved block to store information. When this occurs in a single image, the creator of the image reserves a block (for example, 0010). When a second application wants to add data to that same group, it detects the block written by the creator and creates a separate block (for example, 0011). The creator is not required to start at block 0010, but that appears to be common practice. The second or third application is not required to use 0011 or 0012. Based on this encoding scheme, some observations are:
- If a collection of images is produced by equipment from different manufacturers, you may have collisions in the sets of private elements you want to retain and discard. For example, element (0009, 1001) from manufacturer A may contain an important physical parameter while that same element from manufacturer B may contain PHI.
- If the collection has images that are created by an acquisition modality and are then modified by another application (PACS, workstation), a private group may have multiple reserved blocks. Also, one cannot assume that the original creator will have always chosen reserved block 0010.
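One way to act on these observations is to identify a private element by its Private Creator value and its offset within the reserved block, rather than by its raw tag. The sketch below follows the block arithmetic from PS 3.5 section 7.8.1 quoted above; the function names and the dictionary-based header are illustrative assumptions.

```python
def creator_tag_for(group, element):
    """Return the tag of the Private Creator element that reserves `element`."""
    if group % 2 == 0:
        raise ValueError("private elements live in odd-numbered groups")
    block = element >> 8  # high byte, e.g. 0x10 for element 0x1001
    if not 0x10 <= block <= 0xFF:
        raise ValueError("element is not inside a reserved private block")
    return (group, block)


def private_key(header, group, element):
    """Return (creator, group, offset within block) for a private element."""
    creator = header.get(creator_tag_for(group, element))
    return (creator, group, element & 0xFF)


# ACME reserved block 0x10 in one image and block 0x11 in another.
first = {(0x0009, 0x0010): "ACME", (0x0009, 0x1001): "15.5"}
second = {(0x0009, 0x0010): "OTHER", (0x0009, 0x0011): "ACME", (0x0009, 0x1101): "15.5"}

# Both lookups give the same key even though the raw tags differ.
print(private_key(first, 0x0009, 0x1001) == private_key(second, 0x0009, 0x1101))
```

Keying rules on the creator and block offset lets a retain or discard decision for one manufacturer's element apply regardless of which block that manufacturer happened to reserve.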
DICOM Basic Attribute Confidentiality Profile
DICOM standards committee Working Group 18 wrote Supplement 142, which is now incorporated into the published DICOM Standard. The Attribute Confidentiality Profile (DICOM PS 3.15: Annex E) provides a standard for image de-identification and a process that reduces the complexity involved in safely de-identifying DICOM image data, while providing flexibility for scenarios that require preserving certain information needed for quality control and analysis essential to research. This is achieved by providing a number of Application Level Confidentiality Profiles, which include a Basic Profile along with a number of Option Profiles. These profiles provide the necessary instructions for how to safely clean DICOM elements which may contain PHI. The DICOM Standard, including Part 15, is available at the NEMA web site: http://medical.nema.org/standard.html The original Supplement 142 guidance document can be obtained at ftp://medical.nema.org/medical/dicom/final/sup142_ft.doc. We recommend you use the published standard above as it will be updated with any change proposals.
Annex E of PS 3.15 documents a system for protecting attributes. We quote a small section of the document.
The Attributes listed in Table E.1-1 for each profile are contained in Standard IODs, or may be contained in Standard Extended IODs. An implementation claiming conformance to an Application Level Confidentiality Profile as a de-identifier shall protect or retain all instances of the Attributes listed in Table E.1-1, whether contained in the main dataset or embedded in an Item of a Sequence of Items. The following action codes are used in the table:
– D – replace with a non-zero length value that may be a dummy value and consistent with the VR
– Z – replace with a zero length value, or a non-zero length value that may be a dummy value and consistent with the VR
– X – remove
– K – keep (unchanged for non-sequence attributes, cleaned for sequences)
– C – clean, that is replace with values of similar meaning known not to contain identifying information and consistent with the VR
– U – replace with a non-zero length UID that is internally consistent within a set of Instances
– Z/D – Z unless D is required to maintain IOD conformance (Type 2 versus Type 1)
– X/Z – X unless Z is required to maintain IOD conformance (Type 3 versus Type 2)
– X/D – X unless D is required to maintain IOD conformance (Type 3 versus Type 1)
– X/Z/D – X unless Z or D is required to maintain IOD conformance (Type 3 versus Type 2 versus Type 1)
– X/Z/U* - X unless Z or replacement of contained instance UIDs (U) is required to maintain IOD conformance (Type 3 versus Type 2 versus Type 1 sequences containing UID references)
PS 3.15: E.2 then defines the Basic Application Level Confidentiality Profile, which describes how to apply the scheme above, together with a number of options that determine the scope of protection provided. These definitions allow a system to follow a standard procedure and to document its behavior in a standard way.
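The compound action codes quoted above (for example X/Z/D) resolve to a single action once the Type of the attribute in the target IOD is known. The following is a minimal sketch of that resolution, not the logic used by any particular tool. The function name `resolve_action` and the way the attribute Type is passed in are assumptions made for illustration.

```python
# Illustrative sketch only: collapse a compound action code from
# Table E.1-1 into one action, given the attribute's Type in the IOD.
# The function name and signature are hypothetical.

def resolve_action(code: str, attr_type: str) -> str:
    """Return the single action (D, Z, X, K, C or U) to apply.

    code      -- action code as written in the table, e.g. "X/Z/D"
    attr_type -- "1", "2" or "3" ("1C"/"2C" are treated as "1"/"2")
    """
    code = code.strip()
    attr_type = attr_type.rstrip("C")

    if code == "X/Z/U*":
        # Sequences holding UID references: remove if optional (Type 3),
        # empty if Type 2, otherwise replace the contained instance UIDs.
        return {"3": "X", "2": "Z", "1": "U"}[attr_type]

    options = code.split("/")
    if len(options) == 1:
        return options[0]  # simple codes need no resolution

    # Actions each Type tolerates without breaking IOD conformance:
    # Type 3 may be removed, Type 2 may be empty, Type 1 needs a value.
    allowed = {"3": ("X", "Z", "D"), "2": ("Z", "D"), "1": ("D",)}[attr_type]

    # Options are listed from most to least aggressive, so the first
    # permitted one is the action to apply.
    for action in options:
        if action in allowed:
            return action
    raise ValueError(f"code {code!r} has no action valid for Type {attr_type}")


print(resolve_action("X/Z/D", "2"))
```

With this sketch, an X/Z/D attribute is removed when it is Type 3, emptied when it is Type 2, and given a dummy value when it is Type 1.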
Software Tools
CTP
TCIA utilizes the RSNA Clinical Trials Processor (CTP) software in conjunction with caBIG's National Biomedical Imaging Archive (NBIA) to de-identify and host the images in the archive. The Cancer Imaging Program's Informatics Team has been working closely with the developer of CTP since 2009 to incorporate support for this standard as it was being defined by WG18. A full summary and timeline of this project can be found at https://wiki.nci.nih.gov/display/CIP/Incorporation+of+DICOM+WG18+Supplement+142+into+CTP.
CTP provides an interface for applying any combination of the profiles to a set of images, and it supports an audit trail for retroactively tracking the de-identification that was applied. For images submitted to TCIA, staff begin with the Basic Application Confidentiality Profile (which is the most aggressive) in combination with the following options:
- Clean Descriptors Option: Removal of identification information from descriptive tags which contain unstructured plain text values over which an operator has control
- Retain Longitudinal Temporal Information with Modified Dates Option: Modification of tags that contain dates or times
- Retain Patient Characteristics Option: Retention of physical characteristics of the patient that are descriptive rather than identifying information (e.g. metabolic measures, body weight, etc.)
- Retain Device Identity Option: Retention of information about the characteristics of the device used to perform the acquisition
- Retain Safe Private Option: Retention of Private Attributes confirmed not to contain PHI
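One way to picture how the options listed above amend the Basic Profile is as an overlay on a per-attribute table: an enabled option that has an entry for an attribute replaces the Basic Profile action. The sketch below assumes that simple model and does not show how CTP implements it. The table rows are a small illustrative excerpt and should be checked against the published Table E.1-1; the option keys and the function `effective_action` are hypothetical names.

```python
# Illustrative sketch only. Rows mimic the layout of Table E.1-1:
# one Basic Profile action per attribute, plus overrides per option.
# Verify any row against the published standard before relying on it.
BASIC = "basic"

PROFILE_TABLE = {
    "PatientName":      {BASIC: "Z"},
    "StudyDate":        {BASIC: "Z", "retain_modified_dates": "C"},
    "StudyDescription": {BASIC: "X", "clean_descriptors": "C"},
    "PatientWeight":    {BASIC: "X", "retain_patient_characteristics": "K"},
    "StationName":      {BASIC: "X/Z/D", "retain_device_identity": "K"},
}

# Hypothetical keys for the options listed above.
TCIA_OPTIONS = [
    "clean_descriptors",
    "retain_modified_dates",
    "retain_patient_characteristics",
    "retain_device_identity",
    "retain_safe_private",
]


def effective_action(keyword, enabled_options):
    """Return the action for one attribute under the enabled options.

    Assumption: at most one enabled option has an entry for a given
    attribute, so the first match wins; otherwise the Basic Profile
    action applies.
    """
    row = PROFILE_TABLE[keyword]
    for option in enabled_options:
        if option in row:
            return row[option]
    return row[BASIC]


for name in PROFILE_TABLE:
    print(name, effective_action(name, []), effective_action(name, TCIA_OPTIONS))
```

In this model the Basic Profile alone removes Patient Weight, while enabling the Retain Patient Characteristics Option keeps it; Patient Name is unaffected by any of the options.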
DICOM Tag Sniffer
To make it easier to implement some of the "clean" instructions specified in DICOM PS 3.15, a new tool was developed to inspect, for potential PHI, the contents of Private Tags and of DICOM elements that allow free-text entry by a technician. The tool scans a folder and its subfolders for DICOM objects and produces several different outputs, depending on the mode used and the input profiles. The software reads each DICOM object and iterates through each public and private element. It then uses the profiles below to determine whether to retain the value of the element for later inspection:
- Confidentiality Profile: One input profile corresponds to the entries in table E.1-1 in DICOM PS 3.15. We list the attributes in the table and the coded values according to the table entries. When scanning the DICOM objects, each public element is checked against the data in the profile. If the element is found in the profile, the software knows if it should record the element value for later inspection or if the software can ignore it. For example, if the DICOM profile indicates the element is to be deleted, there is no reason to review the value in that element.
- The Confidentiality Profile input is augmented with elements that are known to contain physical parameters such as rows, columns or pixel spacing. Rather than tell the software to ignore values with a specific value representation, we list those elements explicitly.
- Modality Software Profile: This input profile describes the private elements that are documented in the conformance statement by the manufacturer. This file takes into account the Private Creator Data Elements described above and has a code table for indicating program actions (record the value, ignore the value, ...).
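The per-element decision described above can be sketched as a lookup against the two profiles. This is not the Tag Sniffer source code. The dictionary layout, the `record`/`ignore` codes, the function name `should_record`, and the choice to record any element missing from a profile are all assumptions made for illustration.

```python
# Illustrative sketch only: decide whether an element's value should be
# recorded for later inspection. Profile layout and names are hypothetical.
RECORD, IGNORE = "record", "ignore"

# Confidentiality Profile: Table E.1-1 style codes for public elements,
# augmented with explicitly listed physical parameters to ignore.
CONF_PROFILE = {
    (0x0010, 0x0010): "Z",     # Patient Name: value is worth reviewing
    (0x0008, 0x1030): "X",     # Study Description: deleted, nothing to review
    (0x0028, 0x0010): IGNORE,  # Rows: physical parameter
    (0x0028, 0x0011): IGNORE,  # Columns: physical parameter
}

# Modality Software Profile: actions for documented private elements.
# Simplification: keyed by tag only, without the Private Creator value.
MODALITY_PROFILE = {
    (0x0029, 0x1010): IGNORE,
    (0x0029, 0x1020): RECORD,
}


def should_record(tag, confidentiality_profile, modality_profile):
    """Return True if the value at `tag` should be kept for inspection."""
    group, _element = tag
    if group % 2 == 1:
        # Odd group numbers mark private elements. Assumption: an
        # undocumented private element is recorded to be safe.
        return modality_profile.get(tag, RECORD) == RECORD
    action = confidentiality_profile.get(tag)
    if action is None:
        # Assumption: public elements absent from the profile are recorded.
        return True
    # Elements that will be deleted, or that are known physical
    # parameters, need no review.
    return action not in ("X", IGNORE)


print(should_record((0x0010, 0x0010), CONF_PROFILE, MODALITY_PROFILE))
```

Recording anything that is not explicitly covered is the conservative choice: it costs reviewer time but avoids silently skipping an element that might hold PHI.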
The tool produces the following outputs, which are relevant at different stages of the curation and image publication process.
- Element Inventory: The set of DICOM tags found in the image set. Only the hexadecimal tags (xxxx,yyyy) are listed, with no values. All public and private tags are included, but each is listed only once. The Confidentiality Profile and Modality Software Profile are not consulted, as no values are retained for review.
- Element Values, Pre-Deidentification: We want to examine element values to determine how to configure CTP scripts for proper de-identification. As mentioned above, we want to retain as many elements as possible while not exposing PHI. We also do not want to review all element values in all DICOM objects. We use a Confidentiality Profile that corresponds to the DICOM Basic Application Confidentiality Profile and a Modality Software Profile that properly describes the private elements in the DICOM objects.
- Element Values, Final Review: In this mode, we review the values in the DICOM objects just before publication. The data have already been de-identified, and we analyze them as a final check. In this mode, we use a different Confidentiality Profile and a different Modality Software Profile. The Confidentiality Profile lists only elements that we know are physical parameters (rows, columns, ...) and does not include the DICOM references from PS 3.15, Table E.1-1. That directs the software to record all other element values. Likewise, the Modality Software Profile used directs the software to record all values for later analysis.
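The Element Inventory output described above amounts to collecting each distinct tag once across all objects. The sketch below shows that idea on in-memory data rather than on real DICOM files, so that it stays self-contained. The representation of an object as a dictionary from (group, element) to value, and the function name `element_inventory`, are assumptions made for illustration.

```python
# Illustrative sketch only: build an element inventory from objects that
# are modelled as dicts mapping (group, element) -> value. A real scan
# would parse DICOM files with a DICOM library instead.

def element_inventory(objects):
    """Return every distinct tag as '(gggg,eeee)', sorted, without values."""
    seen = set()
    for elements in objects:
        seen.update(elements.keys())  # values are deliberately dropped
    return [f"({group:04x},{element:04x})" for group, element in sorted(seen)]


# Two hypothetical objects sharing some tags; values never reach the output.
objects = [
    {(0x0010, 0x0010): "DOE^JANE", (0x0028, 0x0010): 512},
    {(0x0010, 0x0010): "DOE^JANE", (0x0029, 0x1010): b"\x00\x01"},
]

for line in element_inventory(objects):
    print(line)
```

Because no values are kept, this output can be shared for review without exposing PHI, which matches the note that neither profile is consulted in this mode.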
We believe this tool may be useful to the rest of the research community, so it has been made freely available as an open source application. We have also created documentation on how researchers could use it in the context of their own projects:
TCIA De-identification Work Flow
TCIA provides standards-based curation support to ensure safe and thorough de-identification of all images in the archive per federal HIPAA and HITECH regulations. To achieve this compliance without stripping the data of its scientific utility, TCIA staff perform a redundant, thorough de-identification and analysis procedure based on guidance provided by the industry experts in DICOM Standards Committee Working Group 18.
After initial testing, TCIA image curators individually inspect every image, in both the DICOM tags and the image pixels, to ensure there is no PHI. Changes to the de-identification procedure are made as appropriate to correct any potential issues found by our curation team. After the image submissions are complete, the curation team again inspects every image in the full data set to ensure regulatory compliance. Only after this inspection is complete are the images made available to the general public. For general information on what to expect as an image provider, please see our web site at http://www.cancerimagingarchive.net/provider.html.
Software Tools for DICOM analysis
To make it easier to implement some of the "clean" instructions specified in Supplement 142, a new tool was developed to inspect, for potential PHI, the contents of Private Tags and of DICOM elements that allow free-text entry by a technician. We believe this tool may be useful to the rest of the research community, so it has been made freely available as an open source application. We have also created documentation on how researchers could use it in the context of their own projects:
Additional tools are currently being developed and these will also be released and documented here once they are stable.