Supervised machine learning (ML) algorithms require labeled data for training and validation. What counts as labeled data is a complex question that depends on the problem being addressed. For example, an algorithm designed to analyze screening chest CT images and estimate the probability of a positive screening result (the probability that the person has cancer) may only need an outcome measure such as a pathology-confirmed cancer diagnosis, whereas a researcher focused on improving image segmentation would need accurate expert segmentations of the appropriate regions of interest. TCIA continues to improve its support for Artificial Intelligence and Machine Learning based research by asking data submitters for more detailed information and by improving our approaches to helping researchers find the data they need.

Finding annotated datasets on TCIA

...

TCIA includes a wealth of non-image data that could be utilized for image classification purposes.  

  1. Image Analyses - Filter the table for "Image Analyses" on the Browse Collections page to find datasets with this type of information. Image Analyses could include expert-derived image annotations (e.g. Where is the tumor located? What is the shape of the tumor?) or quantitative imaging features (e.g. What is the tumor volume? What is the texture of the tumor?). In the case of "radiomics" and other quantitative imaging features, it is critical to use standardized image feature definitions such as those outlined in this publication.
  2. Clinical data (e.g. outcomes, stage) - Filter the table for "clinical" on the Browse Collections page to find datasets with this type of information.
  3. Distinguishing between cancer types (e.g. low grade vs high grade gliomas) - Cancer Type is one of the columns on the Browse Collections page, making it easy to filter or search for datasets by this criterion.
  4. Genomic/Proteomic subtypes - Filter the table for "genomics" or "proteomics" on the Browse Collections page to find datasets with this type of information. In most cases you will need to retrieve specific details about the patients' genomic/proteomic profiles from external databases such as NCI's Genomic Data Commons or Proteomic Data Commons. Please note these websites are not supported by TCIA staff, but we do coordinate with the teams that operate these archives to ensure that common patient identifiers are used, which enables you to link these data to TCIA images.
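Because common patient identifiers are shared across these archives, linking records comes down to a join on that identifier. The following is a minimal sketch of that idea, not an official TCIA or GDC workflow. It assumes you have already exported image metadata and genomic metadata as lists of records. The field names (`patient_id`, `series`, `subtype`) and all values are hypothetical placeholders.

```python
from collections import defaultdict


def link_by_patient_id(image_records, genomic_records, key="patient_id"):
    """Group image and genomic records that share the same patient identifier.

    Only patients present in BOTH inputs are returned.
    """
    # Index the genomic records by patient identifier for fast lookup.
    genomic_by_patient = defaultdict(list)
    for rec in genomic_records:
        genomic_by_patient[rec[key]].append(rec)

    linked = {}
    for rec in image_records:
        pid = rec[key]
        if pid in genomic_by_patient:
            entry = linked.setdefault(
                pid, {"images": [], "genomics": genomic_by_patient[pid]}
            )
            entry["images"].append(rec)
    return linked


# Hypothetical placeholder records; real exports will use different fields.
image_records = [
    {"patient_id": "PATIENT-001", "series": "series-a"},
    {"patient_id": "PATIENT-001", "series": "series-b"},
    {"patient_id": "PATIENT-002", "series": "series-c"},
]
genomic_records = [
    {"patient_id": "PATIENT-001", "subtype": "subtype-x"},
    {"patient_id": "PATIENT-003", "subtype": "subtype-y"},
]

linked = link_by_patient_id(image_records, genomic_records)
for pid, entry in linked.items():
    print(pid, len(entry["images"]), "image series,",
          len(entry["genomics"]), "genomic record(s)")
```

Patients that appear in only one of the two sources are dropped here. Depending on your study design you may prefer to keep them and flag the missing side instead.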

...

  1. Deep Neural Network (DNN) Name - for example, VGG16, ResNet-101, UNet, etc., or a link to the GitHub repository or manuscript for customized DNNs, if applicable.
  2. Data Augmentation Methods - for example, color augmentation (HSV or RGB color space), transformation, noise, GAN, patch generation, downsizing parameters, etc.
  3. Training, Validation, and Testing Set Configuration - for example, number of samples in each set, total number of samples, etc.
  4. Hyperparameters - for example, learning rate, early stopping, batch size, number of epochs, etc.
  5. Training Statistics - for example, wall time spent in training, accuracy metrics and whether the average score or best score is reported, etc.
  6. Training Environment - for example, GPU type, Deep Learning framework used (such as TensorFlow or PyTorch), number of GPUs, number of nodes, etc.
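One way to keep the six items above together is to record them in a single structured file alongside the trained model. The sketch below is only an illustration of that idea, not a format required by TCIA. The class name `TrainingReport`, its field names, and every value shown are hypothetical placeholders.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class TrainingReport:
    """Hypothetical container mirroring the six reporting items above."""
    dnn_name: str                # 1. DNN name or link to repository/manuscript
    data_augmentation: list      # 2. augmentation methods applied
    split_sizes: dict            # 3. samples per training/validation/testing set
    hyperparameters: dict        # 4. learning rate, batch size, epochs, ...
    training_statistics: dict    # 5. wall time, which score is reported, ...
    training_environment: dict   # 6. GPU type, framework, GPU/node counts

    def total_samples(self) -> int:
        # Total number of samples across all configured sets.
        return sum(self.split_sizes.values())

    def to_json(self) -> str:
        # Serialize to JSON so the report can be shared with the dataset.
        return json.dumps(asdict(self), indent=2, sort_keys=True)


# All values below are placeholders, not results from any real experiment.
report = TrainingReport(
    dnn_name="ResNet-101",
    data_augmentation=["color augmentation (HSV)", "noise"],
    split_sizes={"training": 700, "validation": 150, "testing": 150},
    hyperparameters={"learning_rate": 1e-4, "batch_size": 16,
                     "epochs": 50, "early_stopping": True},
    training_statistics={"wall_time_hours": 6.5, "reported_score": "best"},
    training_environment={"framework": "PyTorch", "gpu_type": "placeholder",
                          "num_gpus": 1, "num_nodes": 1},
)

print(report.to_json())
print("total samples:", report.total_samples())
```

A plain JSON file like this is easy to review by hand and to parse later, which is why it is used here. Any structured format that captures the same items would serve equally well.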

We also encourage you to review the following papers:

Publication Citation
Checklist for Artificial Intelligence in Medical Imaging (CLAIM): A Guide for Authors and Reviewers. John Mongan, Linda Moy, and Charles E. Kahn, Jr. Radiology: Artificial Intelligence 2020; 2(2).


Publication Citation
The T.R.U.E. checklist for identifying impactful AI-based findings in nuclear medicine: is it True? Is it Reproducible? Is it Useful? Is it Explainable? Irene Buvat, Fanny Orlhac.

Third party tips and tutorials for applying deep learning to medical imaging data

...