
Finding labeled datasets on TCIA

Many collections on TCIA contain "labels" which can be used for training and testing artificial intelligence models.  However, users who are not familiar with medical imaging may need help identifying and using the data types suited to common deep learning tasks such as image classification, object detection, and object segmentation.  This page summarizes key information about finding image labels that may not be obvious to researchers without a background in radiology or histopathology.  It also provides links to third-party resources that may be helpful to data scientists who are new to working with medical images.

Object Segmentation

Many collections on TCIA contain some kind of "ground truth" labels which provide information about object location and boundaries in the images.  In radiology images these are typically created by one or more radiologists hand-drawing boundaries around structures such as the patient's tumor(s) or organs on each image.  These data can be shared in a few different file formats.

  1. DICOM supports these data through its SEG (Segmentation) and RTSTRUCT (Radiotherapy Structure Set) modalities.
  2. Many popular open-source tools export these labels in other formats, including NIfTI, NRRD, and MHA.
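As a small illustration of the non-DICOM formats, NRRD files begin with a plain-text header describing the label volume before the voxel data.  The sketch below parses only simple "key: value" fields, not the full NRRD specification, so treat it as an assumption-laden starting point rather than a complete reader:

```python
# Minimal sketch of reading an NRRD header (the plain-text block that
# precedes the voxel data in segmentations exported by tools such as
# 3D Slicer).  Handles only simple "key: value" fields, NOT the full
# NRRD specification.

def parse_nrrd_header(text):
    """Parse the key/value fields of an NRRD header into a dict."""
    header = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):   # skip blanks and comments
            continue
        if line.startswith("NRRD"):            # magic line, e.g. "NRRD0004"
            header["version"] = line
            continue
        if ":" in line:
            key, _, value = line.partition(":")
            header[key.strip()] = value.strip().lstrip("=").strip()
    return header

sample = """NRRD0004
# Complete NRRD file format specification at:
# http://teem.sourceforge.net/nrrd/format.html
type: unsigned char
dimension: 3
sizes: 512 512 128
encoding: gzip
"""

hdr = parse_nrrd_header(sample)
print(hdr["sizes"])   # the label volume's dimensions
```

In practice, libraries such as SimpleITK or pynrrd read these files (header and voxel data) directly; the point here is just that the label geometry is discoverable from the header alone.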

On TCIA you can find these data in a couple of ways.  

  1. For Collections, look for SEG or RTSTRUCT in the modality column to determine which collections include DICOM segmentations or contours.  You can also filter for "Image Analyses" in the supporting data column.  If a collection lists "Image Analyses" but does not include SEG or RTSTRUCT among its modalities, the analysis is typically in some other format; this could be segmentation data in NIfTI/NRRD/MHA formats, but it might also be another kind of analysis, such as image classification.
  2. For Analysis Results of existing TCIA collections it is more straightforward: use the filter above the table to search for "segmentations", which matches any occurrence in the Analysis Artifacts column.

Image Classification

TCIA includes a wealth of non-image data that could be used to label images for classification purposes, for example:

  1. Clinical data (outcomes, stage)
  2. Cancer type (e.g., LGG vs. GBM)
  3. Genomic/Proteomic subtypes
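Any of these non-image fields can be turned into classification labels by joining on patient ID.  A minimal sketch, assuming a hypothetical clinical spreadsheet with PatientID and Diagnosis columns (real TCIA clinical files vary by collection):

```python
# Sketch: turn TCIA clinical data into image-classification labels by
# mapping each PatientID to a class (here, made-up records in the
# style of an LGG-vs-GBM task).  Column names and patient IDs are
# illustrative, not taken from a real collection.
import csv
import io

clinical_csv = """PatientID,Diagnosis
TCGA-02-0001,GBM
TCGA-CS-4941,LGG
TCGA-02-0003,GBM
"""

# PatientID -> class label lookup
labels = {row["PatientID"]: row["Diagnosis"]
          for row in csv.DictReader(io.StringIO(clinical_csv))}

def label_for_patient(patient_id):
    """Return the classification label for a patient's images, if known."""
    return labels.get(patient_id)

print(label_for_patient("TCGA-CS-4941"))
```

Because all images from one patient share a label here, train/validation/test splits should be made at the patient level to avoid leakage between sets.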

Suggested Deep Learning Parameters for TCIA Result Submission

Deep Learning parameters are critical for researchers seeking to reproduce Deep Learning experiments. However, the completeness and format of the parameters reported in manuscripts are usually left to the authors' discretion. This document proposes a list of essential Deep Learning parameters to include in the TCIA results submission process. The goal is to capture these parameters explicitly in a common format so that TCIA users can easily reproduce and compare analysis results derived from TCIA data.

List of Deep Learning Parameters

  1. Deep Neural Network (DNN) Name - for example, VGG16, ResNet-101, UNet, etc., or a link to GitHub repository or manuscript for customized DNNs if applicable.

  2. Data Augmentation Methods - for example, color augmentation (HSV or RGB color space), transformation, noise, GAN, patch generation, downsizing parameters, etc.

  3. Training, Validation, and Testing Set Configuration - for example, the number of samples in each set, the total number of samples, etc.

  4. Hyperparameters - for example, learning rate, early stopping, batch size, number of epochs, etc.

  5. Training Statistics - for example, wall time spent in training, accuracy metrics and whether the average or best score is reported, etc.

  6. Training Environment - for example, GPU type, number of GPUs, number of nodes, Deep Learning framework used (e.g., TensorFlow or PyTorch), etc.
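One way the six parameter groups above could be captured "in a common format" is a single machine-readable record per submission.  The field names below are illustrative, not an official TCIA submission schema, and the values are made up:

```python
# Sketch: the six proposed parameter groups expressed as one JSON
# record.  Field names are illustrative, NOT an official TCIA schema;
# all values are invented for the example.
import json

submission = {
    "network": {"name": "UNet", "repository": None},          # group 1
    "data_augmentation": ["horizontal flip",                  # group 2
                          "HSV color jitter"],
    "dataset_split": {"train": 800,                           # group 3
                      "validation": 100,
                      "test": 100},
    "hyperparameters": {"learning_rate": 1e-4,                # group 4
                        "batch_size": 16,
                        "epochs": 50,
                        "early_stopping": True},
    "training_statistics": {"wall_time_hours": 6.5,           # group 5
                            "metric": "Dice",
                            "reported_score": "best"},
    "environment": {"framework": "PyTorch",                   # group 6
                    "gpu": "V100",
                    "num_gpus": 4,
                    "num_nodes": 1},
}

print(json.dumps(submission, indent=2))
```

A flat record like this makes it straightforward to validate that a submission covers all six groups and to compare parameters across submissions.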

Other useful stuff

  1. https://www.youtube.com/watch?v=-XUKq3B4sdw - how a radiologist interprets lung CTs
  2. https://www.kaggle.com/gzuidhof/full-preprocessing-tutorial - how to pre-process images for deep learning
  3. https://theaisummer.com/medical-image-coordinates/ - DICOM deep learning for medical imaging novices
  4. https://developer.nvidia.com/clara-medical-imaging - NVIDIA package for simplifying deep learning tasks in medical imaging
  5. https://forums.fast.ai/t/fastai-v2-has-a-medical-imaging-submodule/56117 - FastAI package for simplifying deep learning in medical imaging
  6. "TCIA as a Centralized Data Resource for Development of AI" from RSNA 2019
  7. https://www.kaggle.com/marcovasquez/basic-eda-data-visualization - RSNA intracranial hemorrhage detection guide
  8. https://github.com/RSNA/AI-Deep-Learning-Lab - RSNA 2019 deep learning course
  9. https://github.com/RSNA/MagiciansCorner - notebooks, datasets, and other content for the Radiology: AI series "Magician's Corner" by Brad Erickson