
Finding labeled datasets on TCIA

Many collections on TCIA contain "labels" that can be used for training and testing artificial intelligence models.  However, users who are not familiar with medical imaging may benefit from assistance in learning how to identify and utilize data types suitable for common deep learning tasks such as image classification, object detection, and object segmentation.  This page seeks to summarize key information about finding image labels that may not be obvious to researchers without a background in radiology or histopathology.

Object Segmentation

Many collections on TCIA contain some kind of "ground truth" labels that provide information about object locations and boundaries in the images.  In radiology images, these are typically created by one or more radiologists hand-drawing boundaries around objects such as the patient's tumor(s) or organs on each image.  These data can be shared in a few different file formats:

  1. DICOM provides support for these kinds of data using the SEG and RTSTRUCT modalities.
  2. Many popular open source tools export these labels in other formats, including NIfTI, NRRD, and MHA (a short loading sketch follows this list).
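
Below is a minimal sketch of loading labels in each of these formats. The file names are placeholders, and it assumes the nibabel, SimpleITK, and pydicom packages are installed; for DICOM SEG, libraries such as highdicom or pydicom-seg can additionally map mask frames back to image slices.

    # A minimal sketch of loading segmentation labels in the formats above.
    # File names are placeholders for illustration only.
    import nibabel as nib     # NIfTI files
    import SimpleITK as sitk  # NRRD and MHA files
    import pydicom            # DICOM SEG files

    # NIfTI: a voxel array of integer label values, typically (x, y, z)
    nifti_mask = nib.load("tumor_mask.nii.gz").get_fdata()

    # NRRD / MHA: SimpleITK reads both formats with the same call
    itk_image = sitk.ReadImage("tumor_mask.nrrd")
    nrrd_mask = sitk.GetArrayFromImage(itk_image)  # numpy array, (z, y, x)

    # DICOM SEG: binary mask frames are stored in the pixel data; libraries
    # such as highdicom or pydicom-seg handle the frame-to-slice mapping
    seg = pydicom.dcmread("seg.dcm")
    seg_frames = seg.pixel_array  # (frames, rows, cols)

    print(nifti_mask.shape, nrrd_mask.shape, seg_frames.shape)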

Image Classification

TCIA includes a wealth of non-image data that can be used to derive image classification labels; a short sketch of building labels from clinical data follows this list.

  1. Clinical data (e.g., patient outcomes, tumor stage)
  2. Distinguishing between cancer types (e.g., LGG vs. GBM)
  3. Genomic/proteomic subtypes
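
The following is a minimal sketch of turning clinical data into classification labels. The CSV name and its columns ("PatientID", "Diagnosis") are hypothetical; real TCIA collections name their clinical fields differently.

    # A minimal sketch of deriving classification labels from clinical data.
    # The file and column names below are hypothetical placeholders.
    import pandas as pd

    clinical = pd.read_csv("clinical_data.csv")

    # Map each diagnosis string to an integer class, e.g. LGG -> 0, GBM -> 1
    # (diagnoses missing from the map will become NaN and should be reviewed)
    class_map = {"LGG": 0, "GBM": 1}
    clinical["label"] = clinical["Diagnosis"].map(class_map)

    # PatientID -> class label, ready to pair with the matching images
    labels = dict(zip(clinical["PatientID"], clinical["label"]))
    print(labels)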

Suggested Deep Learning Parameters for TCIA Result Submission

Deep Learning parameters are critical for researchers to reproduce Deep Learning experiments. However, the completeness and format of the parameters reported in manuscripts are usually left to the author's discretion. This page proposes a list of essential Deep Learning parameters to include in the TCIA results submission process. The goal is to capture these parameters explicitly in a common format so that TCIA users can easily reproduce and compare analysis results derived from TCIA data.

List of Deep Learning Parameters

  1. Deep Neural Network (DNN) Name - for example, VGG16, ResNet-101, UNet, etc., or a link to GitHub repository or manuscript for customized DNNs if applicable.

  2. Data Augmentation Methods - for example, color augmentation (HSV or RGB color space), geometric transformations, added noise, GAN-based synthesis, patch generation, downsizing parameters, etc.

  3. Training, Validation, and Testing Set Configuration - for example, the number of samples in each set, the total number of samples, etc.

  4. Hyperparameters - for example, learning rate, early stopping, batch size, number of epochs, etc.

  5. Training Statistics - for example, wall time spent in training, accuracy metrics and whether the average or best score is reported, etc.

  6. Training Environment - for example, GPU type, number of GPUs, number of nodes, Deep Learning framework used (e.g., TensorFlow or PyTorch), etc. A sketch of capturing these parameters in a common format follows this list.
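
As one possible common format, the sketch below records the six parameter groups above in a single JSON file. All values shown are illustrative only, not a required schema.

    # A minimal sketch of capturing the parameters above in a
    # machine-readable format (all values are illustrative only).
    import json

    submission_params = {
        "dnn_name": "ResNet-101",  # or a GitHub/manuscript link for custom DNNs
        "data_augmentation": ["HSV color jitter", "random rotation", "Gaussian noise"],
        "dataset_split": {"train": 800, "validation": 100, "test": 100, "total": 1000},
        "hyperparameters": {
            "learning_rate": 1e-4,
            "batch_size": 32,
            "epochs": 50,
            "early_stopping": "patience=5 on validation loss",
        },
        "training_statistics": {
            "wall_time_hours": 6.5,
            "metric": "best validation accuracy",
            "value": 0.91,
        },
        "training_environment": {
            "framework": "PyTorch 2.1",
            "gpu_type": "NVIDIA A100",
            "num_gpus": 4,
            "num_nodes": 1,
        },
    }

    # Write the parameters alongside the submitted results
    with open("tcia_dl_parameters.json", "w") as f:
        json.dump(submission_params, f, indent=2)

Submitting such a file alongside results would make it straightforward for other TCIA users to re-run an experiment with identical settings.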

Other useful stuff

  1. https://www.youtube.com/watch?v=-XUKq3B4sdw - how a radiologist interprets lung CTs
  2. https://www.kaggle.com/gzuidhof/full-preprocessing-tutorial - how to pre-process images for deep learning
  3. https://developer.nvidia.com/clara-medical-imaging
  4. https://forums.fast.ai/t/fastai-v2-has-a-medical-imaging-submodule/56117