Overview
Updated: 30 June, 2015
Following industry best practices, TCIA uses a standards-based approach to de-identification of DICOM images to ensure that images are free of protected health information (PHI). The TCIA de-identification process ensures that the HIPAA de-identification standard is met by following the Safe Harbor Method as defined in section 164.514(b)(2) of the HIPAA Privacy Rule. The standard for de-identification of DICOM objects is defined by the DICOM Standard PS 3.15-2011 Digital Imaging and Communications in Medicine (DICOM), Part 15: Security and System Management Profiles (ftp://medical.nema.org/medical/dicom/2016b/output/html/part15.html#chapter_E). At the submitting site, a DICOM PS 3.15 compliant script removes or modifies DICOM tags deemed to be unsafe (see Table 1 for a complete listing). TCIA incorporates the “Basic Application Confidentiality Profile”, amended by the following profile options: Clean Pixel Data Option, Clean Descriptors Option, Retain Longitudinal With Modified Dates Option, Retain Patient Characteristics Option, Retain Device Identity Option, and Retain Safe Private Option. TCIA records the de-identification rules applied to each object in the DICOM De-identification Method Code Sequence [0012,0064] by entering the Code Value, Coding Scheme Designator, and Code Meaning for each profile and option that was applied to the DICOM object during de-identification. The DICOM standard for de-identification of objects defines a minimum set of elements that must be de-identified to comply with the standard. It is up to the user doing the de-identification to ensure that PHI is removed or cleaned according to the laws and practices in place at the time de-identification occurs.
Details
Base level de-identification
The Basic Application Confidentiality Profile requires that Patient Name and Patient ID are either blanked or modified. TCIA incorporates an ID mapping between the original Patient ID and the ID that the images will have within TCIA. The mapping table is created at the image submitting site and the mapping is performed before the images leave the site’s host computer, so TCIA never sees the original Patient ID. The remapped Patient ID is also written into the Patient Name field. This covers the case where a DICOM viewer or application used by the TCIA user who downloaded the data requires a Patient Name to be present. To show that the Patient Identity has been removed, the term “YES” is written into DICOM tag 00120062 “PatientIdentityRemoved”.
In general, the Basic Application Confidentiality Profile specifies removal or modification of any tag that by definition would contain PHI that could be used either alone or together with other information to uniquely identify a subject. Removal of detailed geographic information, dates, exam identifiers, patient demographics, free text entry fields, vendor private tags, etc. is done to minimize the possibility of uniquely identifying an individual. The options to the DICOM de-identification standard allow retention of information that helps make the data scientifically valuable, but as more options are added the chance of retaining PHI increases and a rigorous de-identification process must be followed.
Exam Identifiers - DICOM makes extensive use of unique identifiers (UIDs) that could be used to identify a subject if a user had access to the PACS system at the institution where the images originated. The Basic Application Confidentiality Profile requires that all UIDs be removed or modified. TCIA uses its own root UID, appends an 8-digit string in the form of xxxx.yyyy (where xxxx is related to the collection and yyyy is related to a submitting site) and then appends a hashed value of the original UID. UIDs have no special meaning other than serving as unique identifiers, and the only reason TCIA adds the 8-digit string is to minimize the possibility of two images being assigned the same UID, as images come from many different sites. This technique ensures that images stay associated with the appropriate series, study, and subject, and that references between secondary capture images, structured reports, PET/CT, etc. remain valid references to images within TCIA. Any image resubmitted to TCIA will have the same UID, which avoids the same image appearing twice with a different identifier. Original accession numbers are hashed with a 16-bit string to prevent linking of DICOM objects back to the submitting site.
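The UID scheme described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the root `1.2.3.4`, the example collection and site codes, and the choice of MD5 rendered as a decimal integer are all hypothetical, since the text does not name the actual root or hash function used by TCIA.

```python
import hashlib

# Hypothetical root; the real TCIA root UID is not given in the text.
ASSUMED_ROOT = "1.2.3.4"


def remap_uid(original_uid: str, collection_code: str, site_code: str,
              root: str = ASSUMED_ROOT) -> str:
    """Build a replacement UID as root + xxxx.yyyy + hash of the original UID."""
    for code in (collection_code, site_code):
        if len(code) != 4 or not code.isdigit():
            raise ValueError("collection and site codes must be 4 digits each")
    # Assumption: MD5 rendered as a decimal integer, so the result contains
    # only the digits allowed in a UID component.
    digest = hashlib.md5(original_uid.encode("ascii")).hexdigest()
    hashed = str(int(digest, 16))
    prefix = f"{root}.{collection_code}.{site_code}."
    # DICOM UIDs are limited to 64 characters, so the hashed part is truncated.
    return (prefix + hashed)[:64]


original = "1.2.840.113619.2.55.3.1"
print(remap_uid(original, "1001", "2002"))
```

Because the mapping is a pure function of the original UID, resubmitting the same image yields the same replacement UID, and every reference to that UID from another object is rewritten to the same value.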
Dates - The Retain Longitudinal With Modified Dates Option allows dates to be retained as long as they are modified from the original date. Date and Date-Time fields in TCIA DICOM image headers have been offset based on a random number, but the longitudinal relationship between dates is maintained. Therefore, a researcher won’t know the precise date the scan occurred, but if a follow up scan was performed 120 days later, that same 120 day difference between scans of a subject will exist in the TCIA images. Dates that occur in DICOM tags other than Date or Date-Time fields are removed. An example of this would be a date entered into the Series Description field. If the date is associated with a library for Code Meaning then that date is preserved as the date would be required to look up the meaning in the correct version of the library. To show that the dates have been modified, the term “MODIFIED” is written into DICOM tag 00280303 “LongitudinalTemporalInformationModified”.
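The interval-preserving property can be illustrated with a short Python sketch. It assumes a single fixed offset (in days) is applied to every date of one subject; the offset value and the function name are hypothetical and are not taken from the TCIA scripts.

```python
from datetime import datetime, timedelta


def shift_da(value: str, offset_days: int) -> str:
    """Shift a DICOM DA value (YYYYMMDD) by a fixed number of days."""
    shifted = datetime.strptime(value, "%Y%m%d") + timedelta(days=offset_days)
    return shifted.strftime("%Y%m%d")


# Hypothetical per-subject offset; in practice it would be derived from a
# random number and reused for every date belonging to that subject.
OFFSET_DAYS = -37

baseline = shift_da("20100115", OFFSET_DAYS)
follow_up = shift_da("20100515", OFFSET_DAYS)

# The two original scans are 120 days apart, and so are the shifted ones.
gap = datetime.strptime(follow_up, "%Y%m%d") - datetime.strptime(baseline, "%Y%m%d")
print(baseline, follow_up, gap.days)
```

Applying the same offset to all date and date-time fields of a subject is what keeps the 120-day gap intact while hiding the true scan dates.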
Patient Demographics – The Retain Patient Characteristics Option allows some patient demographics to be kept for research purposes. The allowed fields are Patient’s Sex, Patient’s Age, Patient’s Size, Patient’s Weight, Ethnic Group, Smoking Status, and Pregnancy Status. If a subject is over 90 years of age, then the age must be listed as 90+. Allergies, Patient State (the patient’s condition, not where they live), Pre-Medication, and Special Needs are defined by the DICOM standard as “clean” and are kept by TCIA and examined for PHI along with all tags during curation. Other patient demographics such as birthdate, address, religious affiliations, etc. are removed or emptied.
The names of health care providers including staff, hospital name, assigned IDs etc. are removed from the DICOM objects in cases where there is enough detail to identify an individual or facility where the scan was done.
Free Text - The Clean Descriptors Option allows for DICOM tags where free text could be entered by a technician to be kept. The following tags fall under that option and are all kept, inspected, and cleaned of PHI by TCIA during the curation process: Allergies, Patient State, Study Description, Series Description, Admitting Diagnoses Description, Admitting Diagnoses Code Sequence, Derivation Description, Identifying Comments, Medical Alerts, Occupation, Additional Patient’s History, Patient Comments, Contrast Bolus Agent, Protocol Name, Acquisition Device Processing Description, Acquisition Comments, Acquisition Protocol Description, Contribution Description, Image Comments, Frame Comments, Reason for Study, Requested Procedure Description, Requested Contrast Agent, Study Comments, Discharge Diagnosis Description, Service Episode Description, Visit Comments, Scheduled Procedure Step Description, Performed Procedure Step Description, Comments on Performed Procedure Step, Requested Procedure Comments, Reason for Imaging Service Request, Imaging Service Request Comments, Interpretation Text, Interpretation Diagnosis Description, Impressions, and Results Comments. The TCIA de-identification script run at the submitting sites removes the field “Request Attributes Sequence” as that tag typically contains PHI and provides no scientific value. Many of these fields contain information valuable to research and are important to retain. For images that are submitted with missing Series Descriptions, TCIA will add text to Series Descriptions to help researchers during TCIA image searches. 
When a missing series description is encountered, TCIA staff will use the following approach: Enter “LOCALIZER” if the ImageType contains the word localizer; Enter “Contrast” and then append the value contained in Contrast Bolus Agent if a value is present; if Contrast Bolus Agent is missing or empty other tags will be examined to see if a series was scanned with contrast (The Image Comments field is often used by sites to denote contrast); if the Image is an MR then TCIA will map the Scanning Sequence parameters into the Series Description; if none of those conditions apply then TCIA will map Scan Options or simply enter “none” into the Series Description field.
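The fallback order above can be written as a small decision function. The Python sketch below is a simplification under stated assumptions: the header is represented as a plain dictionary keyed by DICOM keyword, the function name is hypothetical, and the step where staff examine other tags (such as Image Comments) for signs of contrast is left out because it relies on human judgment.

```python
def _as_text(value) -> str:
    """Render a single- or multi-valued header value as one string."""
    if value is None:
        return ""
    if isinstance(value, (list, tuple)):
        return " ".join(str(v) for v in value).strip()
    return str(value).strip()


def fill_series_description(tags: dict) -> str:
    """Return the existing Series Description, or a fallback built from other tags."""
    existing = _as_text(tags.get("SeriesDescription"))
    if existing:
        return existing
    # 1. Localizer images are labeled as such.
    if "localizer" in _as_text(tags.get("ImageType")).lower():
        return "LOCALIZER"
    # 2. A recorded contrast agent is appended to the word "Contrast".
    agent = _as_text(tags.get("ContrastBolusAgent"))
    if agent:
        return f"Contrast {agent}"
    # (Manual step omitted: inspecting other tags for evidence of contrast.)
    # 3. MR images use the Scanning Sequence parameters.
    if _as_text(tags.get("Modality")) == "MR":
        sequence = _as_text(tags.get("ScanningSequence"))
        if sequence:
            return sequence
    # 4. Otherwise fall back to Scan Options, or "none" if that is empty too.
    return _as_text(tags.get("ScanOptions")) or "none"


print(fill_series_description({"ImageType": ["ORIGINAL", "PRIMARY", "LOCALIZER"]}))
print(fill_series_description({"Modality": "MR", "ScanningSequence": ["SE", "IR"]}))
```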
Devices - The Retain Device Identity Option of the DICOM de-identification standard allows for the retention of information related to the scanner used. The option allows for the following relevant tags to be retained: Station Name, Device Serial Number, Device UID, Plate ID, Generator ID, Cassette ID, Gantry ID, Detector ID, Scheduled Study Location, Scheduled Study Location AE Title, Scheduled Station AE Title, Scheduled Station Name, Scheduled Procedure Step Location, Performed Station AE Title, Performed Station Name, Performed Station Name Code Sequence, Scheduled Station Name Code Sequence, Scheduled Station Geographic Location Code Sequence, and Performed Station Geographic Location Code Sequence. TCIA removes Station Name as part of its de-identification process as Station Name often contains information related to the site where the scan occurred. The other tags listed above are retained if they are found to be free of PHI after TCIA curation of the submitted DICOM objects.
Private Tags - When a submitting site sends DICOM data to TCIA, all private tags are retained and then de-identified by TCIA during curation of the data according to the Retain Safe Private Option. The Retain Safe Private Option allows for the retention of DICOM tags stored in the private fields. These fields are extensively used by DICOM vendors to store information about the scans. To claim conformance to the DICOM standard, vendors must publish a DICOM conformance statement that defines the standard and private tags that are used by their particular equipment. These conformance statements are typically made available for download on the vendors’ websites. Unfortunately, there are cases where vendors do not make the conformance statement for a piece of equipment publicly available or do not adequately define what is stored in the private tags. In TCIA the private DICOM elements are de-identified according to the rules contained in a de-identification knowledge base maintained by the TCIA team at the University of Arkansas for Medical Sciences. This knowledge base defines rules for de-identification of private tags based on a vendor’s conformance statement for each scanner and software version. The manufacturer, manufacturer model, modality, and software version are extracted from each series submitted. The TCIA de-identification knowledge base is checked for a conformance statement matching these data. If none is found, TCIA locates the conformance statement and adds it to the knowledge base. TCIA will remove any private tags from the images that are not specified in the conformance statement or are defined as containing a form of PHI such as name, SSN, etc. All date and datetime private tags that are retained are offset using the same offset as applied to the standard tags for the image. All private tags containing UIDs are assigned a TCIA root and appended with a hashed value as done with the standard tags. 
This ensures all references to other images contained within TCIA are maintained. A manual inspection of all private tags is performed using tagSniffer reports and any PHI that may be found is removed, emptied, date offset, or hashed as appropriate.
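The knowledge-base lookup described above can be sketched as follows. The data layout, the action names (`keep`, `offset_date`, `hash_uid`, `remove`) and the function name are all hypothetical; the sketch only shows the two rules stated in the text: a scanner without a recorded conformance statement cannot be processed until one is added, and any private element not documented for that scanner is removed.

```python
def select_private_elements(scanner, private_elements, knowledge_base):
    """Return the private elements to retain, paired with the action to apply.

    scanner:          (manufacturer, model, modality, software_version)
    private_elements: {(creator, group, offset): value}
    knowledge_base:   {scanner: {(creator, group, offset): action}}
    """
    if scanner not in knowledge_base:
        # The conformance statement has to be located and added first.
        raise LookupError(f"no conformance statement recorded for {scanner}")
    rules = knowledge_base[scanner]
    retained = {}
    for key, value in private_elements.items():
        # Undocumented elements default to removal.
        action = rules.get(key, "remove")
        if action != "remove":
            retained[key] = (action, value)
    return retained


scanner = ("ACME", "Scanner 1", "CT", "1.0")
knowledge_base = {
    scanner: {
        ("ACME", 0x0009, 0x01): "keep",         # documented physical parameter
        ("ACME", 0x0009, 0x02): "offset_date",  # documented date
        ("ACME", 0x0009, 0x03): "remove",       # documented as containing a name
    }
}
elements = {
    ("ACME", 0x0009, 0x01): "15.5",
    ("ACME", 0x0009, 0x02): "20100115",
    ("ACME", 0x0009, 0x03): "DOE^JOHN",
    ("ACME", 0x0009, 0x04): "undocumented",
}
print(select_private_elements(scanner, elements, knowledge_base))
```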
Body Part Examined - When images are made public, a single body part examined, corresponding to the cancer of interest, is assigned to all images. If the collection consists of sarcoma images (or any other cancer affecting multiple organs within the image collection), there may be multiple body parts assigned, though only one to any series. In phantom collections, body part examined is simply labeled “PHANTOM”.
All Tags - The TCIA de-identification process ensures that every DICOM tag of every DICOM object is free of the 18 forms of PHI as currently defined by the Safe Harbor Method. At the submitting site, a DICOM PS 3.15 compliant script removes or modifies DICOM tags deemed to be unsafe (see Table 1 for a complete listing). At TCIA, a software routine known as tagSniffer extracts every unique value found within a collection being curated and prints them to a report. This report is examined by curators, and any actions necessary to remove PHI are applied when moving the images from the Intake server to the Public Server. Every DICOM image is inspected by curators for burned-in PHI. Once the images reach the Public Server, the tags are inspected by two curators for PHI using new tagSniffer reports. Images are spot checked for any burned-in PHI.
The following table details the de-identification performed at the submitting site by way of a TCIA supplied de-identification script.
Table 1
Tag | Name | Action |
00080050 | AccessionNumber | hash |
00184000 | AcquisitionComments | keep |
00400555 | AcquisitionContextSeq | remove |
00080022 | AcquisitionDate | incrementdate |
0008002a | AcquisitionDatetime | incrementdate |
00181400 | AcquisitionDeviceProcessingDescription | keep |
00189424 | AcquisitionProtocolDescription | keep |
00080032 | AcquisitionTime | keep |
00404035 | ActualHumanPerformersSequence | remove |
001021b0 | AdditionalPatientHistory | keep |
00380010 | AdmissionID | remove |
00380020 | AdmittingDate | incrementdate |
00081084 | AdmittingDiagnosesCodeSeq | keep |
00081080 | AdmittingDiagnosesDescription | keep |
00380021 | AdmittingTime | keep |
00102110 | Allergies | keep |
40000010 | Arbitrary | remove |
0040a078 | AuthorObserverSequence | remove |
00130010 | BlockOwner | CTP |
00180015 | BodyPartExamined | BODYPART |
00101081 | BranchOfService | remove |
00280301 | BurnedInAnnotation | keep |
00181007 | CassetteID | keep |
00400280 | CommentsOnPPS | keep |
00209161 | ConcatenationUID | hashuid |
00403001 | ConfidentialityPatientData | remove |
00700086 | ContentCreatorsIdCodeSeq | remove |
00700084 | ContentCreatorsName | empty |
00080023 | ContentDate | incrementdate |
0040a730 | ContentSeq | remove |
00080033 | ContentTime | keep |
0008010d | ContextGroupExtensionCreatorUID | hashuid |
00180010 | ContrastBolusAgent | keep |
0018a003 | ContributionDescription | keep |
00102150 | CountryOfResidence | remove |
00089123 | CreatorVersionUID | hashuid |
00380300 | CurrentPatientLocation | remove |
00080025 | CurveDate | incrementdate |
Group | curves | remove |
00080035 | CurveTime | keep |
0040a07c | CustodialOrganizationSeq | remove |
fffcfffc | DataSetTrailingPadding | remove |
00181200 | DateofLastCalibration | incrementdate |
0018700c | DateofLastDetectorCalibration | incrementdate |
00181012 | DateOfSecondaryCapture | incrementdate |
00120063 | DeIdentificationMethod | {Per DICOM PS 3.15 AnnexE. Details in 0012,0064} |
00120064 | DeIdentificationMethodCodeSequence | 113100/113101/113105/113107/113108/113109/113111 |
00082111 | DerivationDescription | keep |
0018700a | DetectorID | keep |
00181000 | DeviceSerialNumber | keep |
00181002 | DeviceUID | keep |
fffafffa | DigitalSignaturesSeq | remove |
04000100 | DigitalSignatureUID | remove |
00209164 | DimensionOrganizationUID | hashuid |
00380040 | DischargeDiagnosisDescription | keep |
4008011a | DistributionAddress | remove |
40080119 | DistributionName | remove |
300a0013 | DoseReferenceUID | hashuid |
00102160 | EthnicGroup | keep |
00080058 | FailedSOPInstanceUIDList | hashuid |
0070031a | FiducialUID | hashuid |
00402017 | FillerOrderNumber | empty |
00209158 | FrameComments | keep |
00200052 | FrameOfReferenceUID | hashuid |
00181008 | GantryID | keep |
00181005 | GeneratorID | keep |
00700001 | GraphicAnnotationSequence | remove |
00404037 | HumanPerformersName | remove |
00404036 | HumanPerformersOrganization | remove |
00880200 | IconImageSequence | remove |
00084000 | IdentifyingComments | keep |
00204000 | ImageComments | keep |
00284000 | ImagePresentationComments | remove |
00402400 | ImagingServiceRequestComments | keep |
40080300 | Impressions | keep |
00080012 | InstanceCreationDate | incrementdate |
00080014 | InstanceCreatorUID | hashuid |
00080081 | InstitutionAddress | remove |
00081040 | InstitutionalDepartmentName | remove |
00080082 | InstitutionCodeSequence | remove |
00080080 | InstitutionName | remove |
00101050 | InsurancePlanIdentification | remove |
00401011 | IntendedRecipientsOfResultsIDSequence | remove |
40080111 | InterpretationApproverSequence | remove |
4008010c | InterpretationAuthor | remove |
40080115 | InterpretationDiagnosisDescription | keep |
40080202 | InterpretationIdIssuer | remove |
40080102 | InterpretationRecorder | remove |
4008010b | InterpretationText | keep |
4008010a | InterpretationTranscriber | remove |
00083010 | IrradiationEventUID | hashuid |
00380011 | IssuerOfAdmissionID | remove |
00100021 | IssuerOfPatientID | remove |
00380061 | IssuerOfServiceEpisodeId | remove |
00281214 | LargePaletteColorLUTUid | hashuid |
001021d0 | LastMenstrualDate | incrementdate |
00280303 | LongitudinalTemporalInformationModified | MODIFIED |
04000404 | MAC | remove |
00080070 | Manufacturer | keep |
00081090 | ManufacturerModelName | keep |
00102000 | MedicalAlerts | keep |
00101090 | MedicalRecordLocator | remove |
00101080 | MilitaryRank | remove |
04000550 | ModifiedAttributesSequence | remove |
00203406 | ModifiedImageDescription | remove |
00203401 | ModifyingDeviceID | remove |
00203404 | ModifyingDeviceManufacturer | remove |
00081060 | NameOfPhysicianReadingStudy | remove |
00401010 | NamesOfIntendedRecipientsOfResults | remove |
00102180 | Occupation | keep |
00081070 | OperatorName | remove |
00081072 | OperatorsIdentificationSeq | remove |
00402010 | OrderCallbackPhoneNumber | remove |
00402008 | OrderEnteredBy | remove |
00402009 | OrderEntererLocation | remove |
04000561 | OriginalAttributesSequence | remove |
00101000 | OtherPatientIDs | remove |
00101002 | OtherPatientIDsSeq | remove |
00101001 | OtherPatientNames | remove |
00080024 | OverlayDate | incrementdate |
Group | overlays | remove |
00080034 | OverlayTime | keep |
00281199 | PaletteColorLUTUID | hashuid |
0040a07a | ParticipantSequence | remove |
00101040 | PatientAddress | remove |
00101010 | PatientAge | keep |
00100030 | PatientBirthDate | empty |
00101005 | PatientBirthName | remove |
00100032 | PatientBirthTime | remove |
00104000 | PatientComments | keep |
00100020 | PatientID | Re-Mapped |
00120062 | PatientIdentityRemoved | YES |
00380400 | PatientInstitutionResidence | remove |
00100050 | PatientInsurancePlanCodeSeq | remove |
00101060 | PatientMotherBirthName | remove |
00100010 | PatientName | Re-Mapped |
00102154 | PatientPhoneNumbers | remove |
00100101 | PatientPrimaryLanguageCodeSeq | remove |
00100102 | PatientPrimaryLanguageModifierCodeSeq | remove |
001021f0 | PatientReligiousPreference | remove |
00100040 | PatientSex | keep |
00102203 | PatientSexNeutered | keep |
00101020 | PatientSize | keep |
00380500 | PatientState | keep |
00401004 | PatientTransportArrangements | remove |
00101030 | PatientWeight | keep |
00400243 | PerformedLocation | remove |
00400241 | PerformedStationAET | keep |
00404030 | PerformedStationGeoLocCodeSeq | keep |
00400242 | PerformedStationName | keep |
00404028 | PerformedStationNameCodeSeq | keep |
00081052 | PerformingPhysicianIdSeq | remove |
00081050 | PerformingPhysicianName | remove |
00400250 | PerformProcedureStepEndDate | incrementdate |
00401102 | PersonAddress | remove |
00401101 | PersonIdCodeSequence | remove |
0040a123 | PersonName | empty |
00401103 | PersonTelephoneNumbers | remove |
40080114 | PhysicianApprovingInterpretation | remove |
00081048 | PhysicianOfRecord | remove |
00081049 | PhysicianOfRecordIdSeq | remove |
00081062 | PhysicianReadingStudyIdSeq | remove |
00402016 | PlaceOrderNumberOfImagingServiceReq | empty |
00181004 | PlateID | keep |
00400254 | PPSDescription | keep |
00400253 | PPSID | remove |
00400244 | PPSStartDate | incrementdate |
00400245 | PPSStartTime | keep |
001021c0 | PregnancyStatus | keep |
00400012 | PreMedication | keep |
Group | privategroups | keep |
00131010 | ProjectName | always |
00181030 | ProtocolName | keep |
00540016 | Radiopharmaceutical Information Sequence | process |
00181078 | Radiopharmaceutical Start DateTime | incrementdate |
00181079 | Radiopharmaceutical Stop DateTime | incrementdate |
00402001 | ReasonForImagingServiceRequest | keep |
00321030 | ReasonforStudy | keep |
04000402 | RefDigitalSignatureSeq | remove |
30060024 | ReferencedFrameOfReferenceUID | hashuid |
00380004 | ReferencedPatientAliasSeq | remove |
00080092 | ReferringPhysicianAddress | remove |
00080090 | ReferringPhysicianName | empty |
00080094 | ReferringPhysicianPhoneNumbers | remove |
00080096 | ReferringPhysiciansIDSeq | remove |
00404023 | RefGenPurposeSchedProcStepTransUID | hashuid |
00081140 | RefImageSeq | remove |
00081120 | RefPatientSeq | remove |
00081111 | RefPPSSeq | remove |
00081150 | RefSOPClassUID | keep |
04000403 | RefSOPInstanceMACSeq | remove |
00081155 | RefSOPInstanceUID | hashuid |
00081110 | RefStudySeq | remove |
00102152 | RegionOfResidence | remove |
300600c2 | RelatedFrameOfReferenceUID | hashuid |
00400275 | RequestAttributesSeq | remove |
00321070 | RequestedContrastAgent | keep |
00401400 | RequestedProcedureComments | keep |
00321060 | RequestedProcedureDescription | keep |
00401001 | RequestedProcedureID | remove |
00401005 | RequestedProcedureLocation | remove |
00321032 | RequestingPhysician | remove |
00321033 | RequestingService | remove |
00102299 | ResponsibleOrganization | remove |
00102297 | ResponsiblePerson | remove |
40084000 | ResultComments | keep |
40080118 | ResultsDistributionListSeq | remove |
40080042 | ResultsIDIssuer | remove |
300e0008 | ReviewerName | remove |
00404034 | ScheduledHumanPerformersSeq | remove |
0038001e | ScheduledPatientInstitutionResidence | remove |
0040000b | ScheduledPerformingPhysicianIDSeq | remove |
00400006 | ScheduledPerformingPhysicianName | remove |
00400001 | ScheduledStationAET | keep |
00404027 | ScheduledStationGeographicLocCodeSeq | keep |
00400010 | ScheduledStationName | keep |
00404025 | ScheduledStationNameCodeSeq | keep |
00321020 | ScheduledStudyLocation | keep |
00321021 | ScheduledStudyLocationAET | keep |
00321000 | ScheduledStudyStartDate | incrementdate |
00080021 | SeriesDate | incrementdate |
0008103e | SeriesDescription | keep |
0020000e | SeriesInstanceUID | hashuid |
00080031 | SeriesTime | keep |
00380062 | ServiceEpisodeDescription | keep |
00380060 | ServiceEpisodeID | remove |
00131013 | SiteID | SITEID |
00131012 | SiteName | SITENAME |
001021a0 | SmokingStatus | keep |
00181020 | SoftwareVersion | keep |
00080018 | SOPInstanceUID | hashuid |
00082112 | SourceImageSeq | remove |
00380050 | SpecialNeeds | keep |
00400007 | SPSDescription | keep |
00400004 | SPSEndDate | incrementdate |
00400005 | SPSEndTime | keep |
00400011 | SPSLocation | keep |
00400002 | SPSStartDate | incrementdate |
00400003 | SPSStartTime | keep |
00081010 | StationName | remove |
00880140 | StorageMediaFilesetUID | hashuid |
30060008 | StructureSetDate | incrementdate |
00321040 | StudyArrivalDate | incrementdate |
00324000 | StudyComments | keep |
00321050 | StudyCompletionDate | incrementdate |
00080020 | StudyDate | incrementdate |
00081030 | StudyDescription | keep |
00200010 | StudyID | empty |
00320012 | StudyIDIssuer | remove |
0020000d | StudyInstanceUID | hashuid |
00080030 | StudyTime | keep |
00200200 | SynchronizationFrameOfReferenceUID | hashuid |
0040db0d | TemplateExtensionCreatorUID | hashuid |
0040db0c | TemplateExtensionOrganizationUID | hashuid |
40004000 | TextComments | remove |
20300020 | TextString | remove |
00080201 | TimezoneOffsetFromUTC | remove |
00880910 | TopicAuthor | remove |
00880912 | TopicKeyWords | remove |
00880906 | TopicSubject | remove |
00880904 | TopicTitle | remove |
00081195 | TransactionUID | hashuid |
00131011 | TrialName | PROJECTNAME |
0040a124 | UID | hashuid |
Group | unspecifiedelements | keep |
0040a088 | VerifyingObserverIdentificationCodeSeq | remove |
0040a075 | VerifyingObserverName | empty |
0040a073 | VerifyingObserverSequence | remove |
0040a027 | VerifyingOrganization | remove |
00384000 | VisitComments | keep |
More details regarding TCIA de-identification may be found at the following link: De-Identification Rules.
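The Action column of Table 1 can be read as a dispatch table. The Python sketch below applies a handful of rows copied from the table to a header represented as a plain dictionary. It is not the TCIA script: the UID root, the use of MD5, and the 16-character truncation of the hashed accession number are assumptions made for illustration, and the site-specific actions (for example BODYPART, SITEID, Re-Mapped) are left out.

```python
import hashlib
from datetime import datetime, timedelta

# A few rows copied from Table 1; the full table has many more entries.
ACTIONS = {
    "00080050": "hash",           # AccessionNumber
    "00080020": "incrementdate",  # StudyDate
    "00080080": "remove",         # InstitutionName
    "00100030": "empty",          # PatientBirthDate
    "0008103e": "keep",           # SeriesDescription
    "0020000d": "hashuid",        # StudyInstanceUID
}


def _digest(value: str) -> str:
    # Assumption: MD5 stands in for whatever hash the real script uses.
    return hashlib.md5(value.encode("utf-8")).hexdigest()


def apply_actions(dataset: dict, offset_days: int, uid_root: str = "1.2.3.4") -> dict:
    """Apply Table 1 style actions to a {tag: value} mapping."""
    out = {}
    for tag, value in dataset.items():
        # Table 1 lists unspecified elements as "keep".
        action = ACTIONS.get(tag.lower(), "keep")
        if action == "remove":
            continue
        if action == "empty":
            out[tag] = ""
        elif action == "hash":
            out[tag] = _digest(value)[:16]
        elif action == "hashuid":
            out[tag] = f"{uid_root}.{int(_digest(value), 16)}"[:64]
        elif action == "incrementdate":
            shifted = datetime.strptime(value, "%Y%m%d") + timedelta(days=offset_days)
            out[tag] = shifted.strftime("%Y%m%d")
        else:
            out[tag] = value
    return out


header = {
    "00080050": "ACC12345",
    "00080020": "20100115",
    "00080080": "General Hospital",
    "00100030": "19500101",
    "0008103e": "AXIAL T1",
    "0020000d": "1.2.840.113619.2.55.3.1",
}
print(apply_actions(header, offset_days=-37))
```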
The Cancer Imaging Archive (TCIA) adheres strictly to Health Insurance Portability and Accountability Act (HIPAA) regulations. Since publicly accessible databases and image archives using actual human images must not contain protected health information (PHI), de-identification is used to cleanse Digital Imaging and Communications in Medicine (DICOM) images so no PHI remains.
TCIA staff have accumulated a wealth of knowledge on best practices and procedures for DICOM image de-identification. We’re sharing this information with the wider research community by maintaining this knowledge base as a living document that will be updated as we learn from our experiences. If you have feedback or questions, please contact us at feedback@cancerimagingarchive.net.
DICOM Basic Attribute Confidentiality Profile
The DICOM standards committee Working Group 18 (WG18) wrote Supplement 142, now incorporated into the published DICOM Standard. The Attribute Confidentiality Profile (DICOM PS 3.15: Annex E) provides a standard for image de-identification, reducing the complexity involved in safely de-identifying DICOM image data while retaining the flexibility to preserve certain information for essential quality control and analysis. Application Level Confidentiality Profiles, which include a Basic Profile along with a number of Option Profiles, provide instructions for how to safely clean DICOM elements which may contain PHI. The DICOM Standard, including Part 15, is available at the NEMA web site. We recommend using the published standard, as it is updated with any change proposals. We have also ported the contents of Table E.1-1 into XLS format for easy access.
Annex E of PS 3.15 documents a system for protecting attributes; a section is reproduced below:
The Attributes listed in Table E.1-1 for each profile are contained in Standard IODs, or may be contained in Standard Extended IODs. An implementation claiming conformance to an Application Level Confidentiality Profile as a de-identifier shall protect or retain all instances of the Attributes listed in Table E.1-1, whether contained in the main dataset or embedded in an Item of a Sequence of Items. The following action codes are used in the table:
– D – replace with a non-zero length value that may be a dummy value and consistent with the VR
– Z – replace with a zero length value, or a non-zero length value that may be a dummy value and consistent with the VR
– X – remove
– K – keep (unchanged for non-sequence attributes, cleaned for sequences)
– C – clean, that is replace with values of similar meaning known not to contain identifying information and consistent with the VR
– U – replace with a non-zero length UID that is internally consistent within a set of Instances
– Z/D – Z unless D is required to maintain IOD conformance (Type 2 versus Type 1)
– X/Z – X unless Z is required to maintain IOD conformance (Type 3 versus Type 2)
– X/D – X unless D is required to maintain IOD conformance (Type 3 versus Type 1)
– X/Z/D – X unless Z or D is required to maintain IOD conformance (Type 3 versus Type 2 versus Type 1)
– X/Z/U* – X unless Z or replacement of contained instance UIDs (U) is required to maintain IOD conformance (Type 3 versus Type 2 versus Type 1 sequences containing UID references)
PS 3.15: E.2 then defines the Basic Application Level Confidentiality Profile, which describes how to apply the scheme above with a number of options that determine the level of protection provided. These definitions allow a system to follow a standard procedure and document the behavior of that system in a standard way.
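The compound action codes quoted above (Z/D, X/Z, X/D, X/Z/D, X/Z/U*) resolve to a single action once the attribute's Type in the IOD is known. The Python sketch below shows one way to express that. The function and table names are hypothetical, conditional types (1C, 2C) are treated like their unconditional counterparts as a simplifying assumption, and the handling of combinations the quoted text does not name (for example X/D on a Type 2 attribute) is likewise an assumption.

```python
# Actions that leave the attribute in a state valid for each Type, in order
# of preference. Type 1 needs a non-empty value (D or U), Type 2 must be
# present but may be empty (Z preferred), Type 3 may simply be removed (X).
ACCEPTABLE = {
    "1": ("D", "U"),
    "2": ("Z", "D", "U"),
    "3": ("X", "Z", "D", "U"),
}


def resolve_action(code: str, attribute_type: str) -> str:
    """Resolve a Table E.1-1 action code for an attribute of the given Type."""
    options = code.rstrip("*").split("/")
    if len(options) == 1:
        return options[0]          # simple codes such as D, Z, X, K, C, U
    base_type = attribute_type[0]  # assumption: 1C behaves as 1, 2C as 2
    for candidate in ACCEPTABLE[base_type]:
        if candidate in options:
            return candidate
    raise ValueError(f"{code} offers no action valid for Type {attribute_type}")


for attr_type in ("1", "2", "3"):
    print(attr_type, resolve_action("X/Z/D", attr_type))
```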
DICOM Private Data Elements
We typically retain DICOM private data elements containing parameters that describe the acquisition while removing PHI. Performing this task requires understanding the mechanism defined by DICOM to support private elements. DICOM PS 3.5, section 7.8.1 states:
It is possible that multiple implementors may define Private Elements with the same (odd) group number. To avoid conflicts, Private Elements shall be assigned Private Data Element Tags according to the following rules.
...
We use data in group 0009 as an example in the table below:
Tag | Description | Value |
0009, 0010 | Private Creator Element | ACME |
0009, 1001 | Average Density | 15.5 |
0009, 1002 | Density Standard Deviation | 2.2 |
In the example, the element with tag (0009, 0010) is a private creator element with value "ACME". That reserves a block of elements for this manufacturer. The element (0009, 1001) is part of that block; the 10 in the element tag (1001) corresponds to the 10 in the tag of the Private Creator Element (0009, 0010).
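The relationship between a private data element and its creator can be computed directly from the tag. The Python sketch below uses the example table above, with the header represented as a dictionary keyed by (group, element) pairs. The function names are hypothetical.

```python
def find_private_creator(elements: dict, group: int, element: int):
    """Return the Private Creator value that owns a private data element."""
    if group % 2 == 0:
        raise ValueError("private elements live in odd-numbered groups")
    block = element >> 8  # high byte of the element number: 0x1001 -> 0x10
    if not 0x10 <= block <= 0xFF:
        raise ValueError("not a private data element (expected xx00 to xxFF with xx >= 10)")
    # The creator for block xx is stored at (group, 00xx).
    return elements.get((group, block))


def portable_key(elements: dict, group: int, element: int):
    """Key that stays stable even if another file stores the block elsewhere."""
    creator = find_private_creator(elements, group, element)
    return (creator, group, element & 0xFF)


header = {
    (0x0009, 0x0010): "ACME",  # Private Creator Element
    (0x0009, 0x1001): "15.5",  # Average Density
    (0x0009, 0x1002): "2.2",   # Density Standard Deviation
}
print(find_private_creator(header, 0x0009, 0x1001))
print(portable_key(header, 0x0009, 0x1002))
```

Keying rules on the creator string and the low byte of the element, rather than on the raw tag, is what allows the same rule to apply when a creator's block number differs from one file to the next.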
This becomes complex when different manufacturers want to use the same reserved block to store information. When this occurs in a single image, the creator of the image reserves a block (for example, 0010). When a second application wants to add data to that same group, it detects the block written by the creator and creates a separate block (for example, 0011). The creator is not required to start at block 0010, but that appears to be common practice. The second or third application is not required to use 0011 or 0012. Based on this encoding scheme, some observations are:
...
DICOM Tag Sniffer
To simplify implementation of the "clean" instructions specified in DICOM PS 3.15, a new tool was developed to help inspect, for potential PHI, the contents of Private Tags and of DICOM elements that allow free text entry by a technician. This tool scans a folder, including subfolders, for DICOM objects and produces several different outputs that depend on the mode used and the input profiles. The software reads each DICOM object and iterates through each public and private element. The software then uses the profiles below to determine whether or not to retain the value of the element for later inspection:
- Confidentiality Profile: One input profile corresponds to the entries in table E.1-1 in DICOM PS 3.15. We list the attributes in the table and coded values according to table entries. When scanning DICOM objects, each public element is checked against the data in the profile. If the element is found in the profile, the software knows whether to record the element value for later inspection or ignore it. For example, if the DICOM profile indicates the element is to be deleted, there is no reason to review the value in that element.
- The Confidentiality Profile input is augmented with elements known to contain physical parameters such as rows, columns, or pixel spacing. Rather than tell the software to ignore values with a specific value representation, we list those elements explicitly.
- Modality Software Profile: This input profile describes private elements documented in the conformance statement by the manufacturer. This file takes into account the Private Creator Data Elements described above and has a code table for indicating program actions (record the value, ignore the value, etc).
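The profile-driven decision described above can be sketched as a loop over elements. This Python illustration is not the Tag Sniffer itself: objects are plain dictionaries keyed by tag, only a Confidentiality Profile is modeled (the Modality Software Profile for private elements would work the same way with creator-based keys), and recording the values of elements missing from the profile is an assumption.

```python
from collections import defaultdict

RECORD, IGNORE = "record", "ignore"


def sniff(objects, confidentiality_profile, default=RECORD):
    """Collect the unique values of every element the profile says to review.

    objects: iterable of {tag: value} mappings, one per DICOM object.
    confidentiality_profile: {tag: RECORD or IGNORE}.
    """
    report = defaultdict(set)
    for obj in objects:
        for tag, value in obj.items():
            if confidentiality_profile.get(tag, default) == RECORD:
                report[tag].add(str(value))
    # Each unique value is listed once per tag, ready for a curator to review.
    return {tag: sorted(values) for tag, values in report.items()}


profile = {
    "00080080": IGNORE,  # InstitutionName: removed by the profile, nothing to review
    "00280010": IGNORE,  # Rows: known physical parameter
    "0008103e": RECORD,  # SeriesDescription: free text, must be reviewed
}
objects = [
    {"00080080": "General Hospital", "00280010": 512, "0008103e": "AXIAL T1"},
    {"00080080": "General Hospital", "00280010": 512, "0008103e": "AXIAL T1"},
    {"00280010": 256, "0008103e": "SAG T2 DOE"},
]
print(sniff(objects, profile))
```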
These outputs are relevant at different stages of the curation and image publication process:
- Element Inventory: The element inventory consists of a set of DICOM tags found in the image set. The DICOM tags include only hexadecimal tags (xxxx, yyyy) and no values. All public and private tags are listed once. The Confidentiality Profile and Modality Software Profile are not consulted, as no values are retained for review.
- Element Values Before De-identification: Element values are examined to determine how to configure CTP scripts for proper de-identification. As mentioned above, we want to retain as many elements as possible while not exposing PHI. We also do not want to review all element values in all DICOM objects. We use a Confidentiality Profile that corresponds to the DICOM Basic Application Confidentiality Profile and a Modality Software Profile that properly describes private elements in DICOM objects.
- Element Values, Final Review: This mode reviews the values in DICOM objects after the data is de-identified, as a final check before publication. It uses a different Confidentiality Profile and a different Modality Software Profile. For the Confidentiality Profile, we list only elements that are known physical parameters (rows, columns, etc.) and do not include the DICOM references from PS 3.15, Table E.1-1. Because those references are omitted, the software records the values of all other public elements. Likewise, the Modality Software Profile directs the software to record all values for later analysis.
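As a rough illustration of the Element Inventory output, the sketch below collects every tag seen across a set of objects and lists each one once in (xxxx,yyyy) form. It assumes each DICOM object has already been reduced to a list of (group, element) pairs; the function names and the uppercase hexadecimal formatting are choices made for this example, not a description of the tool's real output format.

```python
from typing import Iterable, List, Set, Tuple

Tag = Tuple[int, int]  # (group, element)


def format_tag(tag: Tag) -> str:
    """Render a tag as four hexadecimal digits per half: (xxxx,yyyy)."""
    return f"({tag[0]:04X},{tag[1]:04X})"


def element_inventory(objects: Iterable[Iterable[Tag]]) -> List[str]:
    """List every tag found in any object exactly once, with no values."""
    seen: Set[Tag] = set()
    for tags in objects:
        seen.update(tags)
    # Sorting by (group, element) gives a stable, readable order.
    return [format_tag(tag) for tag in sorted(seen)]


if __name__ == "__main__":
    collection = [
        [(0x0010, 0x0010), (0x0008, 0x1030)],
        [(0x0010, 0x0010), (0x0009, 0x1001)],
    ]
    for line in element_inventory(collection):
        print(line)
    # (0008,1030)
    # (0009,1001)
    # (0010,0010)
```

No profile is consulted here, which matches the description of this mode: only tags are kept, so there are no values to review.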
This tool can be useful to the rest of the research community, so it has been made freely available as an open source application. We have also created documentation for its use in the context of a researcher’s own projects.
Private Element Knowledge Base Query Application
Data recorded in the documents above are also available through a web-based application with query capabilities. Researchers who obtain images through TCIA or by other means are welcome to search the database to find definitions for private elements.
Manufacturer Specific Private Tags
As discussed above, medical manufacturers include private elements in their DICOM images to convey information not defined in the DICOM Standard. This section documents the information we have gathered from conformance statements.
The sections below describe information by manufacturer. That information is encoded in files that describe the private elements created by those manufacturers. Those files are part of the run-time environment of the Tag Sniffer and are maintained in our forge:
The information in the documents below is also available through a web-based tool with query functions. That tool is found here:
GE Medical Systems
Document | Date |
---|---|
| | 12/26/2012 |
| | 12/24/2012 |
| | 4/11/2012 |
| | 12/27/2012 |
| | 12/27/2012 |
| | 12/27/2012 |
| | 7/12/2012 |
| | 3/28/2012 |
Philips
Document | Date |
---|---|
| | 12/30/2012 |
| | 7/17/2012 |
| | 7/16/2012 |
Siemens
Document | Date |
---|---|
| | 7/6/2012 |
| | 7/17/2012 |
| | 12/29/2012 |
Toshiba
Document | Date |
---|---|
| | 7/6/2012 |
| | 7/18/2012 |
Software Tools
CTP
- Download the software
- Sample configuration files
- Read the documentation
- Join the User Group mailing list
- Extending/Contributing CTP source code
TCIA uses the Radiological Society of North America (RSNA) CTP software in conjunction with caBIG's National Biomedical Imaging Archive (NBIA) to de-identify and host the images in the archive. The Cancer Imaging Program's Informatics Team has been working closely with the developer of CTP since 2009 to incorporate support for this standard as it was being defined by WG18. A summary and timeline of this project are available here: https://wiki.nci.nih.gov/display/CIP/Incorporation+of+DICOM+WG18+Supplement+142+into+CTP.
CTP provides an interface that allows application of any combination of profiles to a set of images, and allows for an audit trail to retroactively track applied de-identification. When images are submitted to TCIA, the staff begins with the Basic Application Confidentiality Profile (which is the most aggressive) in combination with the following options:
...
TCIA De-identification Work Flow
TCIA provides standards-based curation support to ensure safe and thorough de-identification of all images in the archive per federal HIPAA and Health Information Technology for Economic and Clinical Health (HITECH) Act regulations. To achieve this compliance without stripping the data of its scientific utility, TCIA performs a thorough de-identification and analysis procedure based on guidance from the DICOM standards committee WG18. Each collection submitted for publication is analyzed and de-identified as a whole using the steps listed below. All steps are completed before the collection is released for publication.
1. Each image in the collection is visually inspected to guarantee no PHI is burned into the pixel data.
2. Tag Sniffer is used to review the collection and produce an Element Inventory annotated with data from the DICOM Basic Application Confidentiality Profile and our set of Modality Software Profiles. This produces the list of DICOM elements found in the collection with a simple annotation scheme:
   - One of the Basic Application Confidentiality Profile codes that indicates the DICOM scheme for de-identification (if the element is listed by DICOM).
   - A simple code from our Modality Software Profile (No PHI: Retain, PHI: Delete, Not Sure: Review).
   - No code, indicating the element is not registered.
3. The Tag Sniffer output of element values before de-identification is also generated, containing the set of elements in the collection and all values that need to be reviewed for PHI. If the Basic Application Confidentiality Profile or applicable Modality Software Profile indicates the attribute is to be cleaned or that the attribute is a physical parameter that does not contain PHI, there is no need to review that element at this step. We know that our de-identification script will process the element properly.
4. Information from steps 2 and 3 is combined to create a CTP de-identification script for the collection. In the event of multiple scanners from different manufacturers, we might create and apply different scripts based on manufacturer.
5. The CTP de-identification script (or scripts) is applied to the image collection and a separate copy of the images is created, retaining the original set in case we need to repeat a step.
6. Tag Sniffer is used to review the de-identified images and create the Final Review Output. This more complete output is reviewed by analysts to guarantee no PHI is carried forward after de-identification. Both public and private elements are included in the output for review.
7. If any errors are detected in de-identification in step 6, the CTP script is adjusted and the image set is processed again starting at step 5.
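The repeat-until-clean loop formed by the last three steps can be sketched as follows. This is a deliberately simplified model, not CTP or Tag Sniffer code: an image is a dictionary of tag to value, the "script" is just a set of tags to remove (a real CTP script can also clean or modify values), the `review` callable stands in for the analysts' final review, and `max_rounds` is an arbitrary safety limit added for the example.

```python
from typing import Callable, Dict, List, Set, Tuple

Tag = Tuple[int, int]
Image = Dict[Tag, str]


def deidentify(images: List[Image], script: Set[Tag]) -> List[Image]:
    """Build a new, de-identified copy; the originals are never modified."""
    return [{t: v for t, v in img.items() if t not in script} for img in images]


def process_collection(
    originals: List[Image],
    script: Set[Tag],
    review: Callable[[List[Image]], Set[Tag]],
    max_rounds: int = 5,
) -> Tuple[List[Image], Set[Tag]]:
    """Apply the script, review the result, adjust, and repeat if needed."""
    for _ in range(max_rounds):
        cleaned = deidentify(originals, script)  # always start from originals
        flagged = review(cleaned)                # tags still carrying PHI
        if not flagged:
            return cleaned, script               # safe to publish
        script = script | flagged                # adjust the script, go again
    raise RuntimeError("collection still fails final review")


if __name__ == "__main__":
    originals = [{
        (0x0010, 0x0010): "DOE^JANE",
        (0x0008, 0x1030): "CHEST CT FOR J DOE",
        (0x0028, 0x0010): "512",
    }]

    def toy_review(images: List[Image]) -> Set[Tag]:
        # Stand-in for human review: flag any value mentioning "DOE".
        return {t for img in images for t, v in img.items() if "DOE" in v}

    cleaned, final_script = process_collection(
        originals, {(0x0010, 0x0010)}, toy_review
    )
    print(cleaned)  # [{(40, 16): '512'}]
```

Re-running from the untouched originals on every round mirrors the reason given above for keeping the original set: any step can be repeated without compounding earlier changes.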
Only after this inspection is complete are the images made available to the general public. For information on what to expect as an image provider, please see our web site at http://www.cancerimagingarchive.net/provider.html.
Background Information
Here are some presentations and papers that provide an overview of various aspects of DICOM de-identification and the official Supplement 142 de-identification standards:
...