Summary

This dataset consists of 140 computed tomography (CT) scans, each with five organs labeled in 3D: lungs, bones, liver, kidneys and bladder. The brain is also labeled in the minority of scans that show it.

Patients were included based on the presence of lesions in one or more of the labeled organs. Most of the images exhibit liver lesions, both benign and malignant. Some also exhibit metastatic disease in other organs such as bones and lungs.

The images come from a wide variety of sources, including abdominal and full-body, contrast and non-contrast, and low-dose and high-dose CT scans. Of these, 130 images are dedicated CTs; the remaining 10 are the CT component taken from PET-CT exams. This variety makes the dataset well suited to training and evaluating organ segmentation algorithms, which ought to perform well in a wide variety of imaging conditions.

The data are divided into a testing set of 21 CT scans and a training set of the remaining 119. For the training set, the lungs and bones were automatically segmented by morphological image processing algorithms. For the testing set, the lungs and bones were segmented manually by a human reader. All other organs were segmented manually in both the training and testing sets. Manual segmentations were done with ITK-SNAP (https://www.itksnap.org), starting with semi-automatic active contour segmentation followed by manual clean-up.

Deep learning has the potential to make enormous advances in medical imaging analysis, but training these models requires large, diverse, painstakingly-annotated datasets. With 140 CT scans from a variety of sources, our dataset would be one of the largest of its kind in TCIA. To our knowledge, ours is the only one to include annotations of 5 different organ classes. Our dataset includes large and easily-located organs such as the lungs, as well as small and difficult ones like the bladder. We hope the dataset will enable widespread adoption of multi-class organ segmentation and competitive benchmarking of various computational approaches.
The creators used the dataset to successfully train a CT organ segmenter which is in active use in research projects at Stanford University.

The source code for the morphological algorithms is available at:
- https://github.com/bbrister/ctOrganSegmentation.git

Many images were borrowed from the Liver Tumor Segmentation (LiTS) challenge, which the organizers have generously allowed us to distribute. For more information, see the following website and paper:
- https://lits-challenge.com
- Arxiv [1901.04056] The Liver Tumor Segmentation Benchmark (LiTS) (https://arxiv.org/abs/1901.04056)

Acknowledgements

This work was supported in part by grants from the National Cancer Institute, National Institutes of Health, 1U01CA190214 and 1U01CA187947.

Data Access

Click the Download button to save a ".tcia" manifest file to your computer, which you must open with the NBIA Data Retriever. Click the Search button to open our Data Portal, where you can browse the data collection and/or download a subset of its contents.

Data Type	Download all or Query/Filter
Images (16.9 GB across 4 zips)	Download volumes 0-49 (4.7 GB) from Box; Download volumes 50-99 (5.3 GB) from Box; Download volumes 100-139 (6.6 GB) from Box; Download labels and README (259 MB) from Box

Click the Versions tab for more info about data releases.

Detailed Description

Image Statistics
Modalities	NIfTI CT and segmentations
Number of Patients	140
Number of Studies	140
Number of Series	280
Number of Images	280
Image Size (GB)	16.9

CTs and segmentations are saved in Nifti-1 (.nii.gz) format. Each Nifti-1 file stores the entire CT volume in Hounsfield units. Segmentations are in patient-native space (no change in registration).

Note: several volumes appear to be left-right flipped relative to others. Please contact the authors or help@cancerimagingarchive.net if this causes confusion.

The source code for the morphological algorithms (bone and lung segmentation), written in MATLAB and C, is available here: https://github.com/bbrister/ctOrganSegmentation.git

Please also see README.txt, which is bundled in the zip with the label files.

README content

DATA FORMAT

All files are stored in Nifti-1 format with 32-bit floating point data.
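As a quick sanity check on the storage format, the sketch below reads the fixed 348-byte NIfTI-1 header from a gzipped file using only the Python standard library. The function name `read_nifti1_header` and the returned dictionary keys are illustrative choices, not part of the dataset or its README; the byte offsets and the float32 datatype code (16) follow the NIfTI-1 header layout. For actual image loading, an established NIfTI reader is likely the better choice.

```python
import gzip
import struct

NIFTI1_HEADER_SIZE = 348   # fixed size of a NIfTI-1 header, in bytes
NIFTI_TYPE_FLOAT32 = 16    # NIfTI-1 datatype code for 32-bit floating point


def read_nifti1_header(path):
    """Return basic geometry and datatype fields from a .nii.gz file.

    Hypothetical helper for illustration; it only parses a few header fields.
    """
    with gzip.open(path, "rb") as f:
        hdr = f.read(NIFTI1_HEADER_SIZE)
    if len(hdr) < NIFTI1_HEADER_SIZE:
        raise ValueError("file too short to contain a NIfTI-1 header")

    # sizeof_hdr (offset 0) must equal 348; use it to detect byte order.
    for endian in ("<", ">"):
        if struct.unpack_from(endian + "i", hdr, 0)[0] == NIFTI1_HEADER_SIZE:
            break
    else:
        raise ValueError("not a NIfTI-1 header")

    dim = struct.unpack_from(endian + "8h", hdr, 40)       # dim[0] = number of axes
    datatype, bitpix = struct.unpack_from(endian + "2h", hdr, 70)
    pixdim = struct.unpack_from(endian + "8f", hdr, 76)    # pixdim[1..] = voxel sizes

    ndim = dim[0]
    if not 1 <= ndim <= 7:
        raise ValueError("invalid number of dimensions: %d" % ndim)

    return {
        "shape": tuple(dim[1:1 + ndim]),
        "voxel_size": tuple(pixdim[1:1 + ndim]),
        "datatype": datatype,
        "bitpix": bitpix,
        "is_float32": datatype == NIFTI_TYPE_FLOAT32 and bitpix == 32,
    }
```

Calling `read_nifti1_header("volume-0.nii.gz")` on a downloaded file should report `is_float32` as true if the file matches the format described here.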

Images are stored as 'volume-XX.nii.gz' where XX is the case number.
All images are CT scans, under a wide variety of imaging conditions including high-dose and low-dose, with and without contrast, abdominal, neck-to-pelvis and whole-body. Many patients exhibit cancer lesions, especially in the liver, but they were not selected according to any specific disease criteria. Numeric values are in Hounsfield units.

Segmentations are stored as 'labels-XX.nii.gz', where XX is the same number as the corresponding volume file.
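The naming convention above makes it straightforward to pair each scan with its segmentation. The following is a minimal sketch that assumes the volume and label zips have been extracted into the same directory (the downloads themselves arrive as separate zips); the function name `pair_volumes_with_labels` is hypothetical.

```python
import re
from pathlib import Path

# Matches 'volume-XX.nii.gz' and captures the case number XX.
VOLUME_PATTERN = re.compile(r"^volume-(\d+)\.nii\.gz$")


def pair_volumes_with_labels(data_dir):
    """Map case number -> (volume path, label path) for cases that have both files."""
    data_dir = Path(data_dir)
    pairs = {}
    for volume_path in data_dir.iterdir():
        match = VOLUME_PATTERN.match(volume_path.name)
        if match is None:
            continue  # skip README.txt, label files, and anything else
        label_path = data_dir / ("labels-%s.nii.gz" % match.group(1))
        if label_path.exists():
            pairs[int(match.group(1))] = (volume_path, label_path)
    # Return the cases in ascending case-number order.
    return dict(sorted(pairs.items()))
```

Volumes without a matching label file are silently skipped here; a stricter script might raise an error instead.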

Organs are encoded as follows:

0: Background (None of the following organs)

1: Liver

2: Bladder

3: Lungs

4: Kidneys

5: Bone

6: Brain
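The encoding above can be captured as a lookup table. The sketch below is illustrative only: the names `ORGAN_LABELS` and `organ_voxel_counts` are not from the README. Because the files are stored as 32-bit floating point, the sketch rounds each label value to an integer before looking it up.

```python
from collections import Counter

# Label codes as listed in the README.
ORGAN_LABELS = {
    0: "background",
    1: "liver",
    2: "bladder",
    3: "lungs",
    4: "kidneys",
    5: "bone",
    6: "brain",
}


def organ_voxel_counts(label_values):
    """Count voxels per organ from a flat iterable of label values."""
    counts = Counter()
    for value in label_values:
        code = int(round(value))  # labels are stored as floats, e.g. 3.0
        if code not in ORGAN_LABELS:
            raise ValueError("unexpected label value: %r" % (value,))
        counts[ORGAN_LABELS[code]] += 1
    return dict(counts)
```

For example, passing the flattened voxel data of one 'labels-XX.nii.gz' file gives a quick check of which organs are present in that case, such as whether the brain is labeled.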

TEST AND TRAIN SPLITS

All organ masks were generated either (A) semi-automatically using ITK-SNAP, or (B) automatically using morphological algorithms. ITK-SNAP is a popular open-source program for medical image segmentation. Semi-automatic segmentation consists of manual editing with the 3D paintbrush tool, followed by refinement with active contours.

The first 21 volumes (case numbers 0-20) constitute the TESTING split. All organs in these volumes have been labeled with method (A). Bones were first labeled with method (B), then the result was refined with method (A).

The remaining volumes constitute the TRAINING split. For these volumes, both lungs and bones were labeled with method (B). These masks suffice for training a deep neural network, but should not be considered reliable for evaluation.

All other organs were labeled with method (A) for both the training and testing splits. For these organs, there is no difference in label accuracy between the two splits.
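The split described above depends only on the case number, so it can be reproduced with a few lines of code. This is a sketch, assuming the volumes are numbered 0-139 as in the download listing; the function names are hypothetical.

```python
NUM_CASES = 140              # volumes are numbered 0-139
TESTING_CASES = range(0, 21)  # case numbers 0-20, per the README


def split_for_case(case_number):
    """Return 'testing' or 'training' for a given case number."""
    if not 0 <= case_number < NUM_CASES:
        raise ValueError("case number out of range: %d" % case_number)
    return "testing" if case_number in TESTING_CASES else "training"


def cases_in_split(split):
    """List all case numbers belonging to the named split."""
    return [c for c in range(NUM_CASES) if split_for_case(c) == split]
```

When evaluating lung or bone segmentation, restricting to `cases_in_split("testing")` avoids scoring against the automatically generated training masks.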

CREDITS

These data were annotated in 2018-2019 by:
- Blaine Rister
- Kaushik Shivakumar

131 of the original images were borrowed from the Liver Tumor Segmentation (LiTS) challenge, which the organizers have generously allowed us to distribute. For more information, see the following website and paper:
- https://lits-challenge.com
- Arxiv [1901.04056] The Liver Tumor Segmentation Benchmark (LiTS) (https://arxiv.org/abs/1901.04056)

Please see the challenge website (https://competitions.codalab.org/competitions/17094) for the credits for these images. Most of the liver masks for these images also came from LiTS, although some were annotated by the annotators listed above.
9 additional CT images were added from PET-CT patients from Stanford Healthcare, so that this additional imaging modality could be represented in the training and evaluation data.

Please direct questions to Blaine Rister by email at blaine@stanford.edu.

CITATIONS

Please refer to the following paper to cite this data:
- Arxiv [1901.04056] The Liver Tumor Segmentation Benchmark (LiTS) (https://arxiv.org/abs/1901.04056)

Citations & Data Usage Policy

These collections are freely available to browse, download, and use for commercial, scientific and educational purposes as outlined in the Creative Commons Attribution 3.0 Unported License. Questions may be directed to help@cancerimagingarchive.net. Please be sure to acknowledge both this data set and TCIA in publications by including the following citations in your work:

Data Citation

Blaine Rister, Kaushik Shivakumar, Tomomi Nobashi and Daniel L. Rubin. (2019)

Acknowledgement

CT organ segmentation using GPU data augmentation, unsupervised labels and IOU loss. Blaine Rister, Darvin Yi, Kaushik Shivakumar, Tomomi Nobashi and Daniel L. Rubin. https://arxiv.org/abs/1811.11226
Arxiv [1901.04056] The Liver Tumor Segmentation Benchmark (LiTS) https://arxiv.org/abs/1901.04056

TCIA Citation

Clark K, Vendt B, Smith K, Freymann J, Kirby J, Koppel P, Moore S, Phillips S, Maffitt D, Pringle M, Tarbox L, Prior F. The Cancer Imaging Archive (TCIA): Maintaining and Operating a Public Information Repository, Journal of Digital Imaging, Volume 26, Number 6, December, 2013, pp 1045-1057. DOI: 10.1007/s10278-013-9622-7

Other Publications Using This Data

TCIA maintains a list of publications which leverage TCIA data. If you have a manuscript you'd like to add please contact the TCIA Helpdesk.

Versions

Version 1: Updated 2018/10/24

Added new subjects.
