At the National Cancer Institute's (NCI) direction, personnel from The Cancer Imaging Archive (TCIA) collect and curate clinical and pre-clinical (animal studies) Radiology and Pathology images, clinical trial data (including patient demographics and clinical outcomes), annotations, image-derived features, and other types of clinical research data (e.g., gene expression profiles). Data comes from NIH programs, funded research and clinical trials. The ultimate goal of TCIA is to make the data publicly available to increase transparency and reproducibility in cancer imaging research. To facilitate this goal, TCIA provides data de-identification, curation and hosting services to remove these burdens from individual investigators and institutions.
This document provides details of The Cancer Imaging Archive's (TCIA) protocol for data collection, de-identification and curation so that submitting sites are comfortable with the protocol before agreeing to use the established procedures to accomplish these activities. The Department of Biomedical Informatics at the University of Arkansas for Medical Sciences (UAMS) hosts The Cancer Imaging Archive for the National Cancer Institute (NCI); this process is therefore implemented under the supervision of the UAMS Institutional Review Board (IRB # 205568).
Obtaining permission to share your data
The value of TCIA increases as we receive new data sharing proposals from the research community. Researchers with the following objectives are encouraged to submit an application to publish their data:
- Meeting the data sharing requirements set forth by the National Cancer Institute (NCI) for grant or contract awards
- Meeting the data sharing requirements set forth by a peer reviewed journal for publication
- Sharing data that could stimulate discoveries in emerging areas of cancer imaging research (e.g. radiogenomics,
- Sharing data to be used in challenge competitions or for benchmarking and validating analysis techniques in image processing
We do not charge a fee for sharing your data through TCIA except in rare circumstances where proposals are extremely large. TCIA is funded by the National Cancer Institute, therefore all applications must have relevance to cancer research. Applications are reviewed monthly by the TCIA Advisory Group to assess their utility to the TCIA user community. A strong preference is given to data sets which can be fully public and do not impose any special access or usage restrictions. Proposals which contain supporting non-image data (e.g. patient outcomes, training classifiers/labels, tumor segmentations) are highly preferred to those which lack these characteristics.
If approved, submitting sites must sign our TCIA UAMS Data Transfer Agreement before data collection is initiated. For cases where submitters are not legally permitted to allow commercial use of their data, please see the TCIA Non-Commercial Data Submission Agreement. NCI and TCIA strongly discourage prohibiting commercial use, as this significantly hinders those who wish to use TCIA data to translate research into clinical patient care. See our Data Usage Policies and Restrictions page for more information about the rules for downstream use of these data.
Data Curation Overview
All data is fully de-identified in accordance with international standards, US laws and UAMS IRB protocol requirements. TCIA then makes the data freely and openly available under the Creative Commons licensing as shown below:
“This collection is freely available to browse, download, and use for commercial, scientific and educational purposes as outlined in the Creative Commons Attribution 3.0 Unported License. See TCIA's Data Usage Policies and Restrictions for additional details. Questions may be directed to email@example.com.”
Prior to uploading data to TCIA, the submitter must obtain IRB approval (or ethics board equivalent) from their institution allowing them to submit a de-identified version of their data to TCIA. The NCI and the submitting site IRB are jointly responsible for reviewing the consent under which data to be submitted to TCIA was originally collected.
All data is anonymized to the fullest extent possible at the submitting site, and then encrypted prior to transmission to UAMS. All incoming data is captured in a quarantine system and treated as if it contains PHI. All TCIA personnel are trained in HIPAA regulations and procedures. TCIA servers are managed by UAMS IT as if they were UAMS clinical systems. Once the full analysis and de-identification is complete, data is moved to a separate public repository and made available to the research community. This process has been reviewed by the UAMS Chief Security Officer.
A TCIA submission expert will work with an Imaging point of contact from your site to receive your data. The submission expert will provide instructions to clean common locations where PHI might exist (e.g. file names, slide labels) and a link to upload the data into our secure UAMS Box system. Upon receipt, the slides are visually reviewed for burned-in PHI, incorrectly labeled slides, and scan quality issues, and filenames are checked against slide labels. Metadata fields are also reviewed to ensure they contain no PHI. Slide types supported include: Aperio (.svs, .tif), Hamamatsu (.vms, .vmu, .ndpi), Leica (.scn), MIRAX (.mrxs), Philips (.tiff), Sakura (.svslide), Trestle (.tif), Ventana (.bif, .tif) and Generic tiled TIFF (.tif).
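As an illustration only, a submitting site could pre-check a batch of slide files against the extensions listed above before uploading. This is a minimal sketch, not a TCIA-provided tool; the function name `unsupported_slides` and the sample file names are hypothetical, and an accepted extension does not guarantee that a file is a valid slide of that type.

```python
from pathlib import Path

# Extensions taken from the supported slide types listed above.
SUPPORTED_SLIDE_EXTENSIONS = {
    ".svs", ".tif", ".tiff", ".vms", ".vmu", ".ndpi",
    ".scn", ".mrxs", ".svslide", ".bif",
}


def unsupported_slides(filenames):
    """Return the file names whose extension is not in the supported list.

    The comparison is case-insensitive, so "slide.NDPI" is accepted.
    """
    return [
        name
        for name in filenames
        if Path(name).suffix.lower() not in SUPPORTED_SLIDE_EXTENSIONS
    ]


if __name__ == "__main__":
    # Hypothetical batch; only the extension is inspected.
    batch = ["case01.svs", "case02.NDPI", "case03.jpg", "notes.txt"]
    print(unsupported_slides(batch))
```

A check like this only catches unsupported formats early; the visual and metadata PHI review described above still applies to every file.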
A TCIA submission expert will work with an Imaging point of contact from your site. The expert will provide all the required tools for de-identifying and sending your imaging data and will answer any questions you have throughout the process. These tools have been approved by NIH and comply with the Digital Imaging and Communications in Medicine (DICOM) international standard for medical image de-identification. Following industry best practices, TCIA uses a standards-based approach to de-identification of DICOM images and non-image data to ensure that all data made publicly available are free of protected health information (PHI). The TCIA de-identification process ensures that the HIPAA de-identification standard is met by following the Safe Harbor Method as defined in section 164.514(b)(2) of the HIPAA Privacy Rule, using the following tools:
- Clinical Trials Processor (CTP)
- CTP is a stand-alone image processing application for DICOM (radiology) data. Our curation teams provide this tool to submitters in order to perform the initial de-identification of DICOM images before they are transferred to TCIA.
- CTP Wizard
- CTP Wizard is a custom extension of CTP developed by TCIA. It provides an interactive graphical user interface which can be used to more easily walk submitters through the process of importing, de-identifying, and exporting their DICOM data.
- Posda
- Posda is an open source application for the archival and curation of DICOM datasets. After receiving the data from submitters, our curation teams use Posda to perform additional quality checks and ensure all data was completely de-identified. It allows users to import DICOM data while tracking the date and time received. Users can also prioritize multiple data submission streams based on assigned priority, identify and resolve duplicate unique identifiers (UIDs) submitted with different image data or metadata, and check and edit data for DICOM conformance, consistency, and referential integrity.
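The duplicate-UID check mentioned above can be pictured with a small sketch. This is not Posda's implementation; it only illustrates the idea of flagging a UID that arrives more than once with differing content. The function name `find_conflicting_uids`, the `(uid, payload)` record shape, and the sample UIDs are assumptions made for the example, and `payload` stands in for whatever image data and metadata a real check would compare.

```python
import hashlib
from collections import defaultdict


def find_conflicting_uids(records):
    """Return the sorted UIDs that were received with differing content.

    `records` is an iterable of (uid, payload_bytes) pairs. A UID that
    appears several times with identical bytes is a harmless resend; a UID
    that appears with two or more distinct payloads needs to be resolved.
    """
    digests_by_uid = defaultdict(set)
    for uid, payload in records:
        # Hash the payload so only a short digest is kept per object.
        digests_by_uid[uid].add(hashlib.sha256(payload).hexdigest())
    return sorted(
        uid for uid, digests in digests_by_uid.items() if len(digests) > 1
    )


if __name__ == "__main__":
    # Hypothetical submissions: the second UID arrives with two versions.
    received = [
        ("1.2.3.1", b"image-a"),
        ("1.2.3.1", b"image-a"),
        ("1.2.3.2", b"image-b"),
        ("1.2.3.2", b"image-b-edited"),
    ]
    print(find_conflicting_uids(received))
```

Grouping by UID and comparing digests keeps memory use low for large submissions, since the payload bytes themselves are not retained.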
DICOM De-identification Details
Following industry best practices, TCIA uses a standards-based approach to de-identification of DICOM images to ensure that images are free of protected health information (PHI). The TCIA de-identification process ensures that the HIPAA de-identification standard is met by following the Safe Harbor Method as defined in section 164.514(b)(2) of the HIPAA Privacy Rule. The standard for de-identification of DICOM objects is defined by the Attribute Confidentiality Profiles in DICOM PS 3.15, Annex E. At the submitting site, a DICOM PS 3.15 compliant script removes or modifies DICOM tags deemed to be unsafe (see Table 1 for a complete listing). TCIA incorporates the “Basic Application Confidentiality Profile”, amended by the following profile options: Clean Pixel Data Option, Clean Descriptors Option, Retain Longitudinal With Modified Dates Option, Retain Patient Characteristics Option, Retain Device Identity Option, and Retain Safe Private Option. The de-identification rules applied to each object are recorded by TCIA in the DICOM De-identification Method Code Sequence [0012,0064] by entering the Code Value, Coding Scheme Designator, and Code Meaning for each profile and option that was applied to the DICOM object during de-identification. The DICOM standard for de-identification defines a minimum set of elements that must be de-identified to comply with the standard. It is up to the user performing the de-identification to ensure that PHI is removed or cleaned according to the laws and practices in place at the time de-identification occurs.
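To make the recording step concrete, the sketch below builds one sequence item (Code Value, Coding Scheme Designator, Code Meaning) for the profile and each option named above. It is an illustration under stated assumptions, not TCIA's de-identification script: the `CodeItem` class and `method_code_sequence` function are hypothetical, plain Python dictionaries stand in for DICOM sequence items, and the code values and meanings are given as they are understood from the DICOM controlled terminology (DCM scheme) and should be verified against the current edition of the standard before use.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CodeItem:
    code_value: str
    coding_scheme_designator: str
    code_meaning: str


# Assumed DCM code values for the profile and options listed in the text;
# check them against the current DICOM standard before relying on them.
APPLIED_PROFILES = [
    CodeItem("113100", "DCM", "Basic Application Confidentiality Profile"),
    CodeItem("113101", "DCM", "Clean Pixel Data Option"),
    CodeItem("113105", "DCM", "Clean Descriptors Option"),
    CodeItem(
        "113107",
        "DCM",
        "Retain Longitudinal Temporal Information Modified Dates Option",
    ),
    CodeItem("113108", "DCM", "Retain Patient Characteristics Option"),
    CodeItem("113109", "DCM", "Retain Device Identity Option"),
    CodeItem("113111", "DCM", "Retain Safe Private Option"),
]


def method_code_sequence(items):
    """Render code items as a list of dicts, one per sequence item."""
    return [
        {
            "CodeValue": item.code_value,
            "CodingSchemeDesignator": item.coding_scheme_designator,
            "CodeMeaning": item.code_meaning,
        }
        for item in items
    ]


if __name__ == "__main__":
    for entry in method_code_sequence(APPLIED_PROFILES):
        print(entry)
```

Recording one item per applied profile or option lets a downstream user see from the object itself which de-identification rules were applied.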
The following table details the de-identification performed at the submitting site by way of a TCIA-supplied de-identification script. All other tags not mentioned in the table below are reviewed and, if necessary, cleaned during our Posda curation.
Table 1 - DICOM Tags Modified or Removed at the source site (18 forms of PHI as currently defined by the Safe Harbor Method, DICOM PS 3.15 compliant)