This document describes The Cancer Imaging Archive's (TCIA) protocol for data collection, de-identification, and curation, so that submitting sites can review the protocol and be comfortable with it before agreeing to use the established procedures for these activities. The protocol also assures the research community that this process is carried out under the supervision of the University of Arkansas for Medical Sciences (UAMS) Institutional Review Board (IRB # 205568).
Private Tags - DICOM vendors use private tags extensively to store information about scans, and researchers sometimes need that information to make use of the data. Unfortunately, in many cases vendors do not make the conformance statement for a piece of equipment publicly available, or do not adequately define what is stored in the private tags. When a submitting site sends DICOM data to TCIA, all private tags are retained; TCIA then de-identifies them during curation according to the Retain Safe Private Option, which allows for the retention of DICOM tags stored in the private fields. TCIA uses a private tag dictionary maintained by the Posda curation toolkit to decide the disposition of each vendor-written private tag. The Posda private tag dictionary is a compilation of 4 well-known private tag dictionaries: the TCIA De-identification Knowledge Base (DeID KB), Grassroots DICOM, DICOM3tools, and DCMTK. Adding the other 3 dictionaries expands the set of scientifically useful tags that can be retained. To implement the new Posda private tag dictionary, TCIA resolved any discrepancies among the 4 included dictionaries and assigned dispositions to all private tags ever seen by Posda. Unique values seen within the private tags were inspected to ensure that dispositions were correctly assigned. If a private tag that has no disposition is encountered in the Posda database, the values found in the tag are inspected in relation to the tag description and a disposition is assigned. If there is no existing private tag description, an attempt is made to find a manufacturer's definition of the tag. If no such description can be found, the disposition is set to remove the tag. TCIA removes from the images any private tag that is not specified in the private tag dictionary or that is defined as containing a form of PHI such as name, SSN, etc.
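The lookup rule described above (retain a private tag only if the dictionary assigns it a safe disposition, and remove anything unknown) can be sketched in a few lines of Python. This is a minimal illustration, not the Posda implementation: the disposition labels, the dictionary layout, the function names, and the example tags are all assumptions made for the sketch.

```python
# Hypothetical disposition labels; the actual Posda vocabulary may differ.
KEEP = "keep"
REMOVE = "remove"
OFFSET_DATE = "offset_date"
HASH_UID = "hash_uid"

# Illustrative entries only, keyed by (private creator, group, element).
# These are made-up tags, not entries from the real Posda dictionary.
PRIVATE_TAG_DISPOSITIONS = {
    ("EXAMPLE VENDOR", 0x0019, 0x10): KEEP,
    ("EXAMPLE VENDOR", 0x0019, 0x11): OFFSET_DATE,
    ("EXAMPLE VENDOR", 0x0019, 0x12): HASH_UID,
    ("EXAMPLE VENDOR", 0x0019, 0x13): REMOVE,  # e.g. defined as containing PHI
}


def disposition_for(creator, group, element):
    """Return the disposition for a private tag; unknown tags are removed."""
    return PRIVATE_TAG_DISPOSITIONS.get((creator, group, element), REMOVE)


def filter_private_tags(tags):
    """Drop every tag whose disposition is REMOVE.

    `tags` maps (creator, group, element) -> value. The result maps the same
    key to a (disposition, value) pair for each tag that is retained.
    """
    retained = {}
    for key, value in tags.items():
        disposition = disposition_for(*key)
        if disposition != REMOVE:
            retained[key] = (disposition, value)
    return retained
```

Defaulting to removal means a tag that was never reviewed cannot be retained by accident, which mirrors the rule that tags absent from the dictionary are removed.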
All retained date and datetime private tags are shifted using the same offset that is applied to the standard tags for the image. All private tags containing UIDs are assigned a TCIA root with a hashed value appended, as is done with the standard tags. This ensures that all references to other images contained within TCIA are maintained. All private tags are also manually inspected using tagSniffer reports, and any PHI found is removed, emptied, date-offset, or hashed as appropriate.
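The two transformations described above can be illustrated with a short Python sketch. It rests on several assumptions that the protocol does not specify: the root shown is a placeholder rather than the actual TCIA root, the hash function (a truncated SHA-256) is chosen only for the example, and the function names are hypothetical. The property being demonstrated is that the same original UID always maps to the same new UID, which is what keeps cross-references between images intact.

```python
import hashlib
from datetime import datetime, timedelta

# Placeholder root for illustration; NOT the actual TCIA UID root.
HYPOTHETICAL_ROOT = "1.2.3.4"


def offset_dicom_date(value, offset_days):
    """Shift a DICOM DA value (YYYYMMDD) by a fixed number of days."""
    shifted = datetime.strptime(value, "%Y%m%d") + timedelta(days=offset_days)
    return shifted.strftime("%Y%m%d")


def hash_uid(original_uid, root=HYPOTHETICAL_ROOT):
    """Map a UID to root + '.' + a decimal hash of the original UID.

    The hash choice is an assumption of this sketch. Truncating the digest to
    16 bytes keeps the suffix at 39 digits or fewer, so with a short root the
    result stays within the 64-character limit for DICOM UIDs.
    """
    digest = hashlib.sha256(original_uid.encode("ascii")).digest()[:16]
    suffix = str(int.from_bytes(digest, "big"))
    new_uid = f"{root}.{suffix}"
    if len(new_uid) > 64:
        raise ValueError("root is too long to produce a valid DICOM UID")
    return new_uid
```

Applying the same `offset_days` to every date in an image, standard or private, preserves the intervals between dates, and because `hash_uid` is deterministic, a UID that appears in both a standard tag and a private tag is rewritten identically in both places.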
Body Part Examined - When images are made public, a single Body Part Examined value, corresponding to the cancer of interest, is assigned to all images. If the collection consists of sarcoma images (or images of any other cancer that affects multiple organs within the collection), multiple body parts may be assigned, though only one is assigned to any given series. In phantom collections, Body Part Examined is simply labeled “PHANTOM”.