
Summary

The Lung Image Database Consortium image collection (LIDC-IDRI) consists of diagnostic and lung cancer screening thoracic computed tomography (CT) scans with marked-up annotated lesions. It is a web-accessible international resource for development, training, and evaluation of computer-assisted diagnostic (CAD) methods for lung cancer detection and diagnosis. Initiated by the National Cancer Institute (NCI), further advanced by the Foundation for the National Institutes of Health (FNIH), and accompanied by the Food and Drug Administration (FDA) through active participation, this public-private partnership demonstrates the success of a consortium founded on a consensus-based process.

Seven academic centers and eight medical imaging companies collaborated to create this data set, which contains 1018 cases. Each subject includes images from a clinical thoracic CT scan and an associated XML file that records the results of a two-phase image annotation process performed by four experienced thoracic radiologists. In the initial blinded-read phase, each radiologist independently reviewed each CT scan and marked lesions belonging to one of three categories ("nodule ≥ 3 mm," "nodule < 3 mm," and "non-nodule ≥ 3 mm"). In the subsequent unblinded-read phase, each radiologist independently reviewed their own marks along with the anonymized marks of the three other radiologists to render a final opinion. The goal of this process was to identify as completely as possible all lung nodules in each CT scan without requiring forced consensus.

 

 

Data Access

Choosing the Download option will provide you with a file to launch the TCIA Download Manager to download the entire collection. If you want to browse or filter the data to select only specific scans/studies please use the Search By Collection option.

Data types available (Download all or Query/Filter):

  • Images (DICOM, 124GB)
  • Radiologist Annotations/Segmentations (XML)
  • Nodule Size List (web)
  • Nodule Counts by Patient (XLS)
  • Patient Diagnoses (XLS)

Click the Versions tab for more info about data releases.

Detailed Description

Collection Statistics

updated 3/21/2012

Modalities: CT (computed tomography), DX (digital radiography), CR (computed radiography)

Number of Patients: 1,010

Number of Studies: 1,308

Number of Series: 1,018 CT; 290 CR/DX

Number of Images: 244,527

Image Size (GB): 124

 

Reader Annotation and Markup

The links in this section describe how to use the XML annotation files that are packaged with the images in The Cancer Imaging Archive. The option to include annotation files in the download is enabled by default, so the XML described here will be included when downloading the LIDC-IDRI images unless you specifically uncheck this option. If you are only interested in the XML files, or you have already downloaded the images, you can obtain them here:

The following documentation explains the format and other relevant information about the XML annotation and markup files:

Annotation and Markup Issues/Comments

  1. Please note that it is not currently possible to visualize this annotation and markup on top of the images themselves. The Cancer Imaging Program is exploring the possibility of converting this XML into a standardized format so that it could be visualized in commonly available workstations, but no definite estimate of completion is available at this time.
  2. For a subset of approximately 100 cases from among the initial 399 cases released, inconsistent rating systems were used among the 5 sites with regard to the spiculation and lobulation characteristics of lesions identified as nodules > 3 mm. The XML nodule characteristics data as it exists for some cases will be impacted by this error. We apologize for any inconvenience.
  3. Also note that the XML files do not store radiologist annotations in a manner that allows for a comparison of individual radiologist reads across cases (i.e., the first reader recorded in the XML file of one CT scan will not necessarily be the same radiologist as the first reader recorded in the XML file of another CT scan).
  4. March 2010: Contrary to previous documentation, the correct ordering for the subjective nodule lobulation and nodule spiculation rating scales stored in the XML files is 1=none to 5=marked. The issue of consistency noted above still remains to be corrected.
  5. Note: On 2012-03-21 the XML associated with patient LIDC-IDRI-0101 was updated with a corrected version of the file.
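As a rough illustration of notes 3 and 4 above, the sketch below reads the lobulation and spiculation ratings from an annotation file, grouped by reading session. The sample XML is a hypothetical, heavily trimmed stand-in, and the element names and namespace used here (readingSession, unblindedReadNodule, characteristics, and so on) are assumptions that should be checked against the format documentation before use. The function name read_ratings is likewise made up for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed annotation file. Element names and the namespace
# are assumptions for illustration, not a statement of the real schema.
SAMPLE_XML = """\
<LidcReadMessage xmlns="http://www.nih.gov">
  <readingSession>
    <unblindedReadNodule>
      <noduleID>Nodule 001</noduleID>
      <characteristics>
        <lobulation>1</lobulation>
        <spiculation>5</spiculation>
      </characteristics>
    </unblindedReadNodule>
  </readingSession>
  <readingSession>
    <unblindedReadNodule>
      <noduleID>Nodule A</noduleID>
      <characteristics>
        <lobulation>2</lobulation>
        <spiculation>4</spiculation>
      </characteristics>
    </unblindedReadNodule>
    <unblindedReadNodule>
      <noduleID>Nodule B</noduleID>
    </unblindedReadNodule>
  </readingSession>
</LidcReadMessage>
"""

NS = {"lidc": "http://www.nih.gov"}  # assumed namespace


def read_ratings(xml_text):
    """Return one list per reading session of
    (nodule_id, lobulation, spiculation) tuples.

    Ratings are on the 1=none to 5=marked scale. The session index is only
    meaningful within a single file: session 0 in one scan is not
    necessarily the same radiologist as session 0 in another.
    """
    root = ET.fromstring(xml_text)
    sessions = []
    for session in root.findall("lidc:readingSession", NS):
        rows = []
        for nodule in session.findall("lidc:unblindedReadNodule", NS):
            traits = nodule.find("lidc:characteristics", NS)
            if traits is None:
                continue  # this mark carries no subjective ratings
            rows.append((
                nodule.findtext("lidc:noduleID", default="", namespaces=NS),
                int(traits.findtext("lidc:lobulation", namespaces=NS)),
                int(traits.findtext("lidc:spiculation", namespaces=NS)),
            ))
        sessions.append(rows)
    return sessions


for index, rows in enumerate(read_ratings(SAMPLE_XML)):
    print("session", index, rows)
```

Because readers are not identified consistently across files, per-reader statistics computed by session position should not be pooled across scans.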

Nodule-Specific Details

Diagnosis Data

For a limited set of cases, LIDC sites were able to identify diagnostic data associated with the case. 

  • tcia-diagnosis-data-2012-04-20.xls
  • Note: This project has concluded and we are not able to obtain any additional diagnosis data beyond what is available in the above link.

Data was collected for as many cases as possible and is associated at two levels:

  1. Diagnosis at the patient level (diagnosis is associated with the patient)
  2. Diagnosis at the nodule level (where possible)

At each level, data was provided as to whether the nodule was:

  1. Unknown (no data is available)
  2. Benign or non-malignant disease
  3. A malignancy that is a primary lung cancer
  4. A metastatic lesion that is associated with an extra-thoracic primary malignancy

For each lesion, there is also information provided as to how the diagnosis was established including options such as:

  1. unknown - not clear how diagnosis was established
  2. review of radiological images to show 2 years of stable nodule
  3. biopsy
  4. surgical resection
  5. progression or response
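The categories above could be modeled in code along the following lines. This is only a sketch: the class and field names are made up for illustration, and it deliberately uses descriptive labels rather than numeric codes, since the codes used in the spreadsheet are not stated here and should be taken from the file itself.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Diagnosis(Enum):
    UNKNOWN = "unknown (no data is available)"
    BENIGN = "benign or non-malignant disease"
    PRIMARY_LUNG_CANCER = "malignancy that is a primary lung cancer"
    METASTATIC = "metastatic lesion, extra-thoracic primary malignancy"


class DiagnosisMethod(Enum):
    UNKNOWN = "unknown"
    STABLE_2_YEARS = "radiological review showing 2 years of stable nodule"
    BIOPSY = "biopsy"
    SURGICAL_RESECTION = "surgical resection"
    PROGRESSION_OR_RESPONSE = "progression or response"


@dataclass(frozen=True)
class DiagnosisRecord:
    """One diagnosis entry (hypothetical layout, not the spreadsheet's)."""
    patient_id: str
    nodule_id: Optional[str]  # None for a patient-level diagnosis
    diagnosis: Diagnosis
    method: DiagnosisMethod

    @property
    def is_malignant(self) -> bool:
        # Both primary and metastatic lesions count as malignant.
        return self.diagnosis in (
            Diagnosis.PRIMARY_LUNG_CANCER,
            Diagnosis.METASTATIC,
        )


# Usage with made-up identifiers:
patient_level = DiagnosisRecord(
    "example-patient", None, Diagnosis.BENIGN, DiagnosisMethod.STABLE_2_YEARS
)
nodule_level = DiagnosisRecord(
    "example-patient", "nodule-1", Diagnosis.METASTATIC, DiagnosisMethod.BIOPSY
)
print(patient_level.is_malignant, nodule_level.is_malignant)
```

Keeping the nodule identifier optional lets the same record type cover both the patient level and the nodule level.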

Software

MAX

MAX ("multi-purpose application for XML") performs nodule matching and pmap generation based on the XML files provided with the LIDC/IDRI Database. It also performs certain QA and QC tasks and other XML-related tasks.

MAX is written in Perl and was developed under RedHat Linux. It has been run under Windows.

Downloading MAX and its associated files implies acceptance of the following notice (also available here and in the distro as a text file):

DISCLAIMER: MAX is not guaranteed to process all input correctly. Possible errors include (but are not limited to) the inability to process correctly some types of nodule ambiguity (where nodule ambiguity refers to overlap between nodule markings having complicated shapes or to overlap between a nodule marking and a non-nodule mark).

Download the distro (max-V107.tgz); view/download ReadMe.txt (a text file that is also included in the distro).

LIDC 2 Image Toolbox (Matlab)

This tool is a community contribution developed by Thomas Lampert.  It is designed for extracting individual annotations from the XML files and converting them, and the DICOM images, into TIF format for easier processing in Matlab (LIDC-IDRI dataset).  It is available for download from: https://sites.google.com/site/tomalampert/code

Citations & Data Usage Policy 

This collection is freely available to browse, download, and use for commercial, scientific and educational purposes as outlined in the Creative Commons Attribution 3.0 Unported License.  See TCIA's Data Usage Policies and Restrictions for additional details. Questions may be directed to help@cancerimagingarchive.net.

Please be sure to include the following citations and attributions in your work if you use this data set:

LIDC-IDRI Citation

Smith K, Clark K, Bennett W, Nolan T, Kirby J, Wolfsberger M, Moulton J, Vendt B, Freymann J.  Data from LIDC-IDRI.  http://dx.doi.org/10.7937/K9/TCIA.2015.LO9QL9SX

TCIA Citation

Clark K, Vendt B, Smith K, Freymann J, Kirby J, Koppel P, Moore S, Phillips S, Maffitt D, Pringle M, Tarbox L, Prior F. The Cancer Imaging Archive (TCIA): Maintaining and Operating a Public Information Repository, Journal of Digital Imaging, Volume 26, Number 6, December, 2013, pp 1045-1057. (paper)

Publication Citation

Armato SG III, McLennan G, Bidaut L, McNitt-Gray MF, Meyer CR, Reeves AP, Zhao B, Aberle DR, Henschke CI, Hoffman EA, Kazerooni EA, MacMahon H, van Beek EJR, Yankelevitz D, et al.: The Lung Image Database Consortium (LIDC) and Image Database Resource Initiative (IDRI): A completed reference database of lung nodules on CT scans. Medical Physics, 38: 915-931, 2011. (paper)

In addition, please be sure to include the following attribution in any publications or grant applications along with references to appropriate LIDC publications:

The authors acknowledge the National Cancer Institute and the Foundation for the National Institutes of Health, and their critical role in the creation of the free publicly available LIDC/IDRI Database used in this study.

Other Publications Using This Data

See the LIDC-IDRI section on our Publications page for other work leveraging this collection. If you have a publication you'd like to add please contact the TCIA Helpdesk.

Version 3 (Current): Updated 2015/07/27

Data types available (Download all or Query/Filter):

  • Images (DICOM, 124GB)
  • Radiologist Annotations/Segmentations (XML)
  • Nodule Size List (web)
  • Nodule Counts by Patient (XLS)
  • Patient Diagnoses (XLS)

 

Prior to 7/27/2015, many of the series in the LIDC-IDRI collection had inconsistent values in the DICOM Frame of Reference UID, tag (0020,0052). Each image had its own unique Frame of Reference UID, whereas the value should be the same for every image in a series. This has been corrected. In addition, the following tags, which were present but should not have been, were removed: (0020,0200) Synchronization Frame of Reference, (3006,0024) Referenced Frame of Reference, and (3006,00c2) Related Frame of Reference.
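Users holding copies downloaded before this fix may want to check whether a series is affected. The sketch below is one possible check, not a TCIA tool: it assumes the Series Instance UID (0020,000E) and Frame of Reference UID (0020,0052) have already been read from each image header with a DICOM library of your choice, and the function name and sample UIDs are made up for illustration.

```python
from collections import defaultdict


def inconsistent_series(headers):
    """Return {series_uid: set of frame-of-reference UIDs} for every series
    whose images do not all share one Frame of Reference UID.

    `headers` is an iterable of (series_instance_uid, frame_of_reference_uid)
    pairs, one pair per image.
    """
    frames_by_series = defaultdict(set)
    for series_uid, frame_uid in headers:
        frames_by_series[series_uid].add(frame_uid)
    return {
        series_uid: frames
        for series_uid, frames in frames_by_series.items()
        if len(frames) > 1
    }


# Made-up UIDs: series "1.2.3" is consistent, series "1.2.4" is not.
sample_headers = [
    ("1.2.3", "9.9.1"),
    ("1.2.3", "9.9.1"),
    ("1.2.4", "9.9.2"),
    ("1.2.4", "9.9.3"),
]
print(inconsistent_series(sample_headers))
```

An empty result means every series in the input uses a single Frame of Reference UID, which is the expected state for data downloaded after the correction.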

Version 2: Updated 2012/03/21

On 2012-03-21 the XML associated with patient LIDC-IDRI-0101 was updated with a corrected version of the file. The old version is still available if needed for audit purposes.

Version 1:

There was a "pilot release" of 399 cases of the LIDC CT data via the NCI CBIIT installation of NBIA. The LIDC-IDRI collection contained on TCIA is the complete data set of all 1,010 patients which includes all 399 pilot CT cases plus the additional 611 patient CTs and all 290 corresponding chest x-rays. A table which allows mapping between the old NBIA IDs and new TCIA IDs can be downloaded for those who have obtained and analyzed the older data.

 For a subset of approximately 100 cases from among the initial 399 cases released, inconsistent rating systems were used among the 5 sites with regard to the spiculation and lobulation characteristics of lesions identified as nodules > 3 mm. The XML nodule characteristics data as it exists for some cases will be impacted by this error. We apologize for any inconvenience.

 Contrary to previous documentation (prior to March 2010), the correct ordering for the subjective nodule lobulation and nodule spiculation rating scales stored in the XML files is 1=none to 5=marked. The issue of consistency noted above still remains to be corrected.

 
