Finding labeled datasets on TCIA
TCIA data includes two major kinds of "labels": segmentations (pixel data) and classifications (clinical/omic data).
From the home page, look for SEG/RTSTRUCT data for segmentations, and for image analyses, clinical, or omics data for classifications.
...
Many collections on TCIA contain "labels" which can be used for training and testing artificial intelligence models. However, users who are not familiar with medical imaging may need help identifying and using the data types suited to common deep learning tasks such as image classification, object detection, and object segmentation. This page summarizes key information about finding image labels that may not be obvious to researchers without a background in radiology or histopathology.
Object Segmentation
Many collections on TCIA contain some kind of "ground truth" labels which provide information about object location and boundaries in the images. In radiology images these are typically created by one or more radiologists hand-drawing boundaries around objects such as the patient's tumor(s) or organs on each image. These kinds of data can be shared using a few different file formats.
- DICOM supports these kinds of data through the SEG (segmentation) and RTSTRUCT (radiotherapy structure set) modalities.
- Many popular open source tools export these labels in other formats. Popular formats include NIfTI, NRRD, and MetaImage (MHA).
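As a rough illustration of how the formats above might be recognized in practice, the following minimal sketch filters a list of series metadata records by their DICOM Modality value and guesses the label format of non-DICOM files from their file names. The record layout, the function names, and the suffix table are assumptions made for this example; they are not part of any TCIA tool or API.

```python
from pathlib import Path

# Modality values that DICOM uses for segmentation-style objects.
SEGMENTATION_MODALITIES = {"SEG", "RTSTRUCT"}

# File suffixes for the non-DICOM formats named above (illustrative, not exhaustive).
LABEL_SUFFIXES = {
    ".nii": "NIfTI",
    ".nii.gz": "NIfTI",
    ".nrrd": "NRRD",
    ".mha": "MetaImage",
}


def segmentation_series(series_records):
    """Return the records whose Modality marks them as segmentation objects.

    Each record is assumed to be a dict with a "Modality" key (hypothetical layout).
    """
    return [
        record for record in series_records
        if record.get("Modality", "").upper() in SEGMENTATION_MODALITIES
    ]


def label_format(filename):
    """Guess the label file format from the file name, or return None if unrecognized."""
    name = Path(filename).name.lower()
    for suffix, format_name in LABEL_SUFFIXES.items():
        if name.endswith(suffix):
            return format_name
    return None
```

For example, passing a list of series records through `segmentation_series` keeps only the SEG and RTSTRUCT entries, and `label_format("tumor_mask.nii.gz")` identifies the file as NIfTI. Checking the file name is only a heuristic; the file contents are what actually determine the format.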
Image Classification
TCIA includes a wealth of non-image data that could be used as labels for image classification, for example:
- Clinical data (outcomes, stage)
- Cancer type (for example, lower-grade glioma (LGG) vs. glioblastoma (GBM))
- Genomic/Proteomic subtypes
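One way to turn non-image data like this into classification labels is to join a clinical table to the images by patient identifier. The sketch below does that for a two-class LGG vs. GBM problem. The CSV contents, the column names (`patient_id`, `diagnosis`), and the function names are hypothetical and chosen only for illustration; real TCIA clinical files vary by collection.

```python
import csv
import io

# Hypothetical clinical table; the column names and values are assumptions.
CLINICAL_CSV = """patient_id,diagnosis
P001,GBM
P002,LGG
P003,GBM
"""

# Map each diagnosis to an integer class index for a two-class problem.
CLASS_INDEX = {"LGG": 0, "GBM": 1}


def load_labels(csv_text, id_column="patient_id", label_column="diagnosis"):
    """Build {patient_id: class_index} from CSV text, skipping unknown diagnoses."""
    labels = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        diagnosis = row[label_column].strip().upper()
        if diagnosis in CLASS_INDEX:
            labels[row[id_column].strip()] = CLASS_INDEX[diagnosis]
    return labels


def pair_images_with_labels(image_paths_by_patient, labels):
    """Return (image_path, class_index) pairs for patients that have a label."""
    pairs = []
    for patient_id, paths in sorted(image_paths_by_patient.items()):
        if patient_id in labels:
            pairs.extend((path, labels[patient_id]) for path in paths)
    return pairs
```

Patients without a recognized diagnosis are dropped rather than given a default class, so a missing clinical entry does not silently become a training label.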
Suggested Deep Learning Parameters for TCIA Result Submission
...
- https://www.youtube.com/watch?v=-XUKq3B4sdw - how a radiologist interprets lung CTs
- https://www.kaggle.com/gzuidhof/full-preprocessing-tutorial - how to pre-process images for deep learning
- https://developer.nvidia.com/clara-medical-imaging
- https://forums.fast.ai/t/fastai-v2-has-a-medical-imaging-submodule/56117