CT volumes with multiple organ segmentations

Summary

Our dataset consists of 140 computed tomography (CT) scans, each with five organs labeled in 3D: lung, bones, liver, kidneys and bladder.

Patients were included based on the presence of lesions in one or more of the labeled organs. Most of the images exhibit liver lesions, both benign and malignant. Some also exhibit metastatic disease in other organs such as bones and lungs.

The images come from a wide variety of sources, including abdominal and full-body; contrast and non-contrast; low-dose and high-dose CT scans. 131 images are dedicated CTs; the remaining 9 are the CT component taken from PET-CT exams. This makes the dataset ideal for training and evaluating organ segmentation algorithms, which ought to perform well in a wide variety of imaging conditions.

The dataset is divided into a training set consisting of 119 CT scans, and a testing set consisting of the remaining 21. For the training set, the lungs and bones were automatically segmented by morphological image processing algorithms. The source code for these algorithms will be made publicly available. For the testing set, the lungs and bones were segmented manually by a human reader. All other organs were segmented manually in both the training and testing sets. Manual segmentations were done with ITK-SNAP, starting with semi-automatic active contour segmentation followed by manual clean-up.

Deep learning has the potential to make enormous advances in medical imaging analysis, but training these models requires large, diverse, painstakingly annotated datasets. With 140 CT scans from a variety of sources, our dataset would be one of the largest of its kind in TCIA. To our knowledge, ours is the only one to include annotations of five different organ classes. Our dataset includes large and easily located organs such as the lungs, as well as small and difficult ones like the bladder. We hope the dataset will enable widespread adoption of multi-class organ segmentation and competitive benchmarking of various computational approaches.

The creators used the dataset to successfully train a CT organ segmenter which is in active use in research projects at Stanford University.

Acknowledgements

This work was supported in part by grants from the National Cancer Institute, National Institutes of Health, 1U01CA190214 and 1U01CA187947.

Data Access

Click the Download button to save a ".tcia" manifest file to your computer, which you must open with the NBIA Data Retriever. Click the Search button to open our Data Portal, where you can browse the data collection and/or download a subset of its contents.

Data Type	Download all or Query/Filter
Images (16.9 GB across 4 zips)	Download volumes 0-49 (4.7 GB) from Box; Download volumes 50-99 (5.3 GB) from Box; Download volumes 100-139 (6.6 GB) from Box; Download labels and README (259 MB) from Box

Click the Versions tab for more info about data releases.

Detailed Description

Image Statistics
Modalities	NIfTI CT and segmentations
Number of Patients	140
Number of Studies	140
Number of Series	280
Number of Images	280
Image Size (GB)	16.9

CTs and segmentations are saved in NIfTI-1 (.nii.gz) format. Each NIfTI-1 volume file stores the entire CT volume in Hounsfield units. Segmentations are in patient-native space (no change in registration).
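As a quick sanity check after download, the header of a .nii.gz file can be inspected without loading the voxel data. The sketch below is not part of the dataset's tooling: the function name `read_nifti1_header` and the returned dictionary keys are illustrative assumptions, and only the fixed field offsets of the NIfTI-1 header are relied on. Datatype code 16 corresponds to the 32-bit floating point storage that the README describes.

```python
import gzip
import struct

# Subset of the datatype codes defined by the NIfTI-1 standard.
NIFTI_DTYPES = {2: "uint8", 4: "int16", 8: "int32", 16: "float32", 64: "float64"}


def read_nifti1_header(path):
    """Return shape, voxel spacing and datatype from a .nii.gz file (hypothetical helper)."""
    with gzip.open(path, "rb") as f:
        hdr = f.read(348)  # a NIfTI-1 header is always 348 bytes
    if len(hdr) < 348:
        raise ValueError("file too short to hold a NIfTI-1 header")
    # The first field, sizeof_hdr, must equal 348; use it to detect byte order.
    for endian in ("<", ">"):
        if struct.unpack(endian + "i", hdr[0:4])[0] == 348:
            break
    else:
        raise ValueError("not a NIfTI-1 file")
    dim = struct.unpack(endian + "8h", hdr[40:56])        # dim[0] is the number of dimensions
    datatype, bitpix = struct.unpack(endian + "2h", hdr[70:74])
    pixdim = struct.unpack(endian + "8f", hdr[76:108])    # pixdim[1..] are voxel sizes
    ndim = dim[0]
    return {
        "shape": dim[1:1 + ndim],
        "spacing": pixdim[1:1 + ndim],
        "dtype": NIFTI_DTYPES.get(datatype, "code %d" % datatype),
        "bitpix": bitpix,
    }
```

For routine work a full NIfTI reader is the more practical choice; this only illustrates what the format guarantees about each file.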

Note: several volumes appear to be left-right flipped relative to others. Please contact the authors or help@cancerimagingarchive.net if this causes confusion.

Source code for finding the bones and lungs, written in MATLAB and C, is available on GitHub: https://github.com/bbrister/ctOrganSegmentation

Please see README.txt, which is also bundled in the zip with the label files.

README content

DATA FORMAT

All files are stored in NIfTI-1 format with 32-bit floating point data.

Images are stored as 'volume-XX.nii.gz' where XX is the case number.
All images are CT scans, under a wide variety of imaging conditions including high-dose and low-dose, with and without contrast, abdominal, neck-to-pelvis and whole-body. Many patients exhibit cancer lesions, especially in the liver, but they were not selected according to any specific disease criteria. Numeric values are in Hounsfield units.

Segmentations are stored as 'labels-XX.nii.gz', where XX is the same number as the corresponding volume file. Organs are encoded as follows:

0: Background (None of the following organs)

1: Liver

2: Bladder

3: Lungs

4: Kidneys

5: Bone

6: Brain
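The encoding above maps directly onto a lookup table. The following is a minimal sketch, assuming the label voxels have already been read into a flat sequence of numbers by some NIfTI reader; the names `ORGAN_LABELS` and `count_organ_voxels` are illustrative and do not come from the dataset's own code. Because the files store 32-bit floats, values are rounded to integers before lookup.

```python
from collections import Counter

# Label values as listed in the README.
ORGAN_LABELS = {
    0: "Background",
    1: "Liver",
    2: "Bladder",
    3: "Lungs",
    4: "Kidneys",
    5: "Bone",
    6: "Brain",
}


def count_organ_voxels(label_values):
    """Count voxels per organ name from an iterable of label values."""
    counts = Counter()
    for value in label_values:
        code = int(round(float(value)))  # labels may arrive as floats
        if code not in ORGAN_LABELS:
            raise ValueError("unexpected label value: %r" % (value,))
        counts[ORGAN_LABELS[code]] += 1
    return dict(counts)
```

Raising on an unknown value, rather than silently ignoring it, makes a corrupted or mismatched label file easy to spot.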

TEST AND TRAIN SPLITS

All organ masks were generated either (A) semi-automatically using ITK-SNAP, or (B) automatically using morphological algorithms. ITK-SNAP is a popular open-source program for medical image segmentation. Semi-automatic segmentation consists of manual editing with the 3D paintbrush tool, followed by refinement with active contours.

The first 21 volumes (case numbers 0-20) constitute the TESTING split. All organs in these volumes have been labeled with method (A). Bones were first labeled with method (B), then the result was refined with method (A).

The remaining volumes constitute the TRAINING split. For these volumes, both lungs and bones were labeled with method (B). These masks suffice for training a deep neural network, but should not be considered reliable for evaluation.

All other organs were labeled with method (A) for both the training and testing splits. For these organs, there is no difference in label accuracy between the two splits.
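The split can be reproduced from the file names alone. This sketch pairs each 'volume-XX.nii.gz' with its 'labels-XX.nii.gz' and assigns case numbers 0-20 to the testing split, as described above. The function name `pair_and_split` is an illustrative assumption, and the case number is parsed from the name rather than generated, so no assumption is made about zero-padding.

```python
import re

TEST_CASES = range(0, 21)  # case numbers 0-20, per the README
_NAME = re.compile(r"^(volume|labels)-(\d+)\.nii\.gz$")


def pair_and_split(filenames):
    """Pair volume/label files by case number and split into test/train."""
    cases = {}
    for name in filenames:
        m = _NAME.match(name)
        if m:  # ignore anything that is not a volume or label file
            cases.setdefault(int(m.group(2)), {})[m.group(1)] = name
    split = {"test": [], "train": []}
    for case in sorted(cases):
        files = cases[case]
        if "volume" in files and "labels" in files:  # skip unpaired cases
            key = "test" if case in TEST_CASES else "train"
            split[key].append((files["volume"], files["labels"]))
    return split
```

The function expects bare file names, for example the output of `os.listdir` on a folder holding the unzipped volumes and labels.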

CREDITS

These data were annotated between 2018 and 2019 by:
-Blaine Rister
-Kaushik Shivakumar

131 of the original images came from the Liver Tumor Segmentation Challenge (LiTS). Please see the challenge website (https://competitions.codalab.org/competitions/17094) for the credits for these images. Most of the liver masks for these images came from this challenge, although some were annotated by the above.
9 additional images were added from PET-CT patients from Stanford Healthcare, so that this additional imaging modality could be represented in the training and evaluation data.

Please direct questions to Blaine Rister by email at blaine@stanford.edu.

CITATIONS

Please refer to the following paper to cite this data:
- The Liver Tumor Segmentation Benchmark (LiTS). arXiv:1901.04056 (https://arxiv.org/abs/1901.04056)

Citations & Data Usage Policy

These collections are freely available to browse, download, and use for commercial, scientific and educational purposes as outlined in the Creative Commons Attribution 3.0 Unported License. Questions may be directed to help@cancerimagingarchive.net. Please be sure to acknowledge both this data set and TCIA in publications by including the following citations in your work:

Data Citation

Blaine Rister, Kaushik Shivakumar, Tomomi Nobashi and Daniel L. Rubin. (2019)

Acknowledgement

CT organ segmentation using GPU data augmentation, unsupervised labels and IOU loss. Blaine Rister, Darvin Yi, Kaushik Shivakumar, Tomomi Nobashi and Daniel L. Rubin. https://arxiv.org/abs/1811.11226
The Liver Tumor Segmentation Benchmark (LiTS). arXiv:1901.04056. https://arxiv.org/abs/1901.04056

TCIA Citation

Clark K, Vendt B, Smith K, Freymann J, Kirby J, Koppel P, Moore S, Phillips S, Maffitt D, Pringle M, Tarbox L, Prior F. The Cancer Imaging Archive (TCIA): Maintaining and Operating a Public Information Repository, Journal of Digital Imaging, Volume 26, Number 6, December, 2013, pp 1045-1057. DOI: 10.1007/s10278-013-9622-7

Other Publications Using This Data

TCIA maintains a list of publications which leverage TCIA data. If you have a manuscript you'd like to add please contact the TCIA Helpdesk.

Version 1: Updated 2018/10/24