
Overview

The Cancer Imaging Archive (TCIA) is a publisher of cancer-related data. Each Collection that TCIA publishes is issued a Digital Object Identifier (DOI) through DataCite so that its contents become discoverable and the associated metadata is made available to the community. DataCite provides a REST API that can be used to search metadata according to its published schema. DataCite Commons is a web search interface for the PID Graph, the graph formed by scholarly resources such as publications, datasets, people, and research organizations, together with their connections.

The DataCite REST API can be used to programmatically access Collection metadata such as DOIs, titles, and abstracts. Please note that this API was not developed by TCIA and is not supported through the TCIA help desk. Please refer to the documentation below for how to use the DataCite REST API to query TCIA metadata. See https://support.datacite.org/ for any technical questions. The TCIA Helpdesk may be able to assist if your inquiry is related to the content of the data itself.

Official DataCite Documentation

The Cancer Imaging Archive is identified within DataCite as

  • prefix-id = 10.7937
  • client-id = sml.tcia
  • provider-id = tciar
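
As a rough illustration of how these identifiers can be used, the sketch below builds a DataCite REST API URL that lists DOIs registered under the TCIA prefix. It assumes the `prefix`, `page[size]`, and `page[number]` query parameters of the `/dois` endpoint; the function names `build_tcia_query` and `fetch_json` are hypothetical helpers introduced here, not part of any TCIA or DataCite library.

```python
import json
import urllib.request
from urllib.parse import urlencode

# Base endpoint of the DataCite REST API for DOI records.
DATACITE_DOIS = "https://api.datacite.org/dois"


def build_tcia_query(prefix="10.7937", page_size=25, page_number=1):
    """Build a URL listing DOIs under a prefix (hypothetical helper).

    Assumption: the /dois endpoint accepts `prefix`, `page[size]`
    and `page[number]` as query parameters.
    """
    params = {
        "prefix": prefix,
        "page[size]": page_size,
        "page[number]": page_number,
    }
    # urlencode percent-encodes the square brackets in the parameter names.
    return f"{DATACITE_DOIS}?{urlencode(params)}"


def fetch_json(url, timeout=30):
    """Download a URL and parse the response body as JSON (hypothetical helper)."""
    request = urllib.request.Request(
        url, headers={"Accept": "application/vnd.api+json"}
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


if __name__ == "__main__":
    # Only prints the URL; call fetch_json(url) to actually query DataCite.
    print(build_tcia_query(page_size=10))
```

Keeping URL construction separate from the network call makes it easy to inspect or log the query before sending it.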

Properties

TCIA Metadata in DataCite

TCIA utilizes the following Properties of the DataCite schema.

Table 1: DataCite Mandatory Properties

ID    Property             Description
1     Identifier           DOI of the Dataset
2     Creator              Authors of the Dataset, preferably with ORCID iD
3     Title                Published Title of the Dataset
4     Publisher            The Cancer Imaging Archive
5     PublicationYear      The Year the Dataset was published in TCIA
10    ResourceType         Dataset; Equivalent to a TCIA Collection

Table 2: DataCite Recommended and Optional Properties

ID    Property             Description
11*   AlternateIdentifier  TCIA "Short Name" for the Dataset. These short names appear in various places such as https://www.cancerimagingarchive.net/collections/ and https://www.cancerimagingarchive.net/tcia-analysis-results/
15*   Version              The Current Version of the Dataset
16*   Rights               Licensing Information
17*   Description          Dataset Abstract

...

* indicates properties that are "Recommended and Optional" per the DataCite Schema, whereas the others are required to create a DOI.

TCIA-Utils

The tcia_utils package contains functions to simplify common tasks one might perform when interacting with The Cancer Imaging Archive (TCIA) via Python. Issues with this package should be submitted at https://github.com/kirbyju/tcia_utils/issues. The package can be installed with this pip command:

pip install tcia_utils

To import functions related to DataCite:

from tcia_utils import datacite

An example notebook demonstrating tcia_utils functionality with DataCite's API can be found at https://github.com/kirbyju/TCIA_Notebooks/blob/main/TCIA_DataCite_Queries.ipynb. 

Example Queries


Retrieve a single DataCite record in JSON format.

For this example we are using a Published Collection called "Pseudo-PHI-DICOM-Data":

https://api.datacite.org/dois/10.7937/s17z-r072
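
The response to a request like the one above is a JSON document. The sketch below shows one way to pull the properties listed in the tables out of such a record once it has been parsed into a Python dictionary. It assumes the record follows the JSON:API layout used by the DataCite REST API (`data` → `attributes`, with fields such as `titles`, `creators`, `rightsList`, and `descriptions`); `summarize_record` is a hypothetical helper, and the sample dictionary contains placeholder values rather than the real contents of the record.

```python
def summarize_record(record):
    """Extract a few commonly used fields from a parsed DataCite record.

    Assumption: `record` has the shape {"data": {"attributes": {...}}}.
    Missing fields are returned as None or as empty lists.
    """
    attrs = record.get("data", {}).get("attributes", {})
    titles = attrs.get("titles") or []
    creators = attrs.get("creators") or []
    rights = attrs.get("rightsList") or []
    descriptions = attrs.get("descriptions") or []
    return {
        "doi": attrs.get("doi"),
        "title": titles[0].get("title") if titles else None,
        "creators": [c.get("name") for c in creators],
        "publisher": attrs.get("publisher"),
        "publication_year": attrs.get("publicationYear"),
        "version": attrs.get("version"),
        "rights": [r.get("rights") for r in rights],
        "abstract": descriptions[0].get("description") if descriptions else None,
    }


# Placeholder record for illustration only; apart from the DOI and title
# taken from the example above, the values are made up.
SAMPLE_RECORD = {
    "data": {
        "attributes": {
            "doi": "10.7937/s17z-r072",
            "titles": [{"title": "Pseudo-PHI-DICOM-Data"}],
            "creators": [{"name": "Example, Author"}],
            "publisher": "The Cancer Imaging Archive",
            "descriptions": [{"description": "Placeholder abstract."}],
        }
    }
}

if __name__ == "__main__":
    for key, value in summarize_record(SAMPLE_RECORD).items():
        print(f"{key}: {value}")
```

Using `.get()` with fallbacks keeps the helper tolerant of records that omit the Recommended and Optional properties.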

...