
LIDC XML file documentation

For the First 30 cases

Version 1.0

January 11, 2006

(NOTE BELOW ADDED Jan 15, 2009)

Please note: We recently (Fall 2008) discovered that, for a subset of cases, inconsistent rating systems were used among the 5 sites for the spiculation and lobulation characteristics of lesions identified as nodules > 3 mm. The XML nodule characteristics data for some cases are affected by this error. We apologize for any inconvenience and are in the process of correcting this situation.

LIDC team

Jan 2009

 

Index

I. Overview

II. LIDC Data Collection Process

  1. Inclusion Criteria
  2. Communication Protocol
  3. Categories of Objects to Mark
  4. Results of the Marking Process
  5. Format and Content of XML Data Files

Appendix 1 – Example Image

Example XML file with comments

I. Overview

The purpose of this document is to describe in some detail the contents of the xml files containing the reader annotations that resulted from the LIDC data collection/marking process. The document first gives some background on the LIDC reader marking process (the blinded and unblinded reading process) so that the data in the xml files can be interpreted correctly. Next, it describes the format and contents of the xml files themselves. Finally, it provides an example case: a DICOM image, its xml file, and the resulting presentation of the annotations as an overlay.

 

In the near future, links will be provided to tools or pieces of computer software code that have been developed either by the LIDC or by users of the database who have agreed to supply their own tools/code to the LIDC.

 

II. LIDC Data Collection Process

 

The LIDC collected data under tightly specified inclusion criteria, communication protocols and markup protocols. Additional background information on the LIDC, its mission and several of its internal reports can be found either at http://imaging.cancer.gov/programsandresources/InformationSystems/LIDC

or in the following publication:


Armato SG 3rd, McLennan G, McNitt-Gray MF, Meyer CR, Yankelevitz D, Aberle DR, Henschke CI, Hoffman EA, Kazerooni EA, MacMahon H, Reeves AP, Croft BY, Clarke LP; Lung Image Database Consortium Research Group. Lung image database consortium: developing a resource for the medical imaging research community. Radiology. 2004 Sep;232(3):739-48.

 

The inclusion criteria, communication protocol and markup protocol are described below.

 

 

  1. Inclusion Criteria
    1. Inclusion criteria for cases

The cases were nominated according to scan inclusion criteria which included the following:

  1. CT scan of the lungs only – no other modalities or anatomic regions were included in this database (Chest X-rays will be included in a future release).
  2. Full chest screening or diagnostic CT examination or a limited-anatomy diagnostic CT examination
  3. Reconstruction interval and collimation both < 3 mm
  4. Scans may include high levels of noise or streak, motion, or metal artifacts (this will be characterized in the assessment of image quality)
  5. Other pathology may be present, unless it is spatially contiguous with nodules and substantially interferes with their visual interpretation
    2. Inclusion criteria for nodules

Nominated cases could have 0 to 6 nodules, where:

  1. The term “nodule” represents a spectrum of abnormalities (irrespective of presumed histology), which is itself a subset of a broader spectrum of abnormalities termed “focal abnormality”; a lesion should be considered a “nodule” if it satisfies the definition of “nodule” (the most essential component of which is its “nodular” morphology; the remaining components will be determined by the visual nodule library)
  2. Number of nodules with longest diameter between 3-30 mm should not exceed 6 (at least on the initially contributed scan of any particular patient)
  3. Nodules may represent primary lung cancers, metastatic disease, or non-cancerous processes
  2. Communication Protocol

The LIDC used a two-phase reading process. In the first (blinded) phase, multiple readers (N = 4 as of January 2006) read and annotated each case independently. That is, all readers read the same cases, but the readings were done asynchronously and independently. After this first, blinded reading was complete, the results were compiled and sent back to the same readers so that each could see both their own markings and the markings of the other three readers. In the second (unblinded) phase, each reader again independently read each case, this time with the benefit of knowing what the other readers had marked, and then made a final decision about the markings for that case.
  3. Categories of Objects to Mark

The LIDC, after careful consideration, decided to have readers mark three categories of objects. These categories were based on object size and the LIDC’s desire to mark both nodules (those objects that meet the nodule criteria above) as well as objects that did not meet the nodule criteria, but that might be confused with nodules. This last category (non-nodules described below) was provided to assist CAD developers by explicitly marking objects that do not represent potential cancers. 

Please note that object size was determined with electronic calipers, which were used to estimate the lesion’s longest diameter in the section demonstrating the greatest extent of the lesion. The lesion’s axial extent was not considered in lesion sizing.
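The sizing rule can be illustrated with a short sketch: measure the longest in-plane distance on each section and keep the largest, ignoring axial (z) extent. Readers actually used electronic calipers on the images, whereas this sketch works from contour points, so treat it only as an approximation. The function name `longest_diameter_mm` and its input shape are hypothetical; `pixel_spacing_mm` is assumed to be (row spacing, column spacing) as stored in the DICOM Pixel Spacing attribute (0028,0030).

```python
import math
from itertools import combinations
from typing import Dict, List, Tuple

def longest_diameter_mm(sections: Dict[float, List[Tuple[int, int]]],
                        pixel_spacing_mm: Tuple[float, float]) -> float:
    """Largest in-plane point-to-point distance over all sections, in mm.

    sections maps a z position to that section's (x, y) points. Distances are
    measured within each section only, so the lesion's axial extent is ignored.
    pixel_spacing_mm is (row spacing, column spacing), i.e. (dy, dx).
    """
    dy, dx = pixel_spacing_mm
    best = 0.0
    for points in sections.values():
        # Brute-force all point pairs on this section (fine for contour-sized inputs).
        for (x1, y1), (x2, y2) in combinations(points, 2):
            best = max(best, math.hypot((x2 - x1) * dx, (y2 - y1) * dy))
    return best

if __name__ == "__main__":
    # Made-up contour points on two sections, 0.7 mm square pixels.
    sections = {-125.0: [(370, 248), (380, 248), (375, 244)],
                -126.25: [(372, 248), (378, 248)]}
    print(round(longest_diameter_mm(sections, (0.7, 0.7)), 2))
```

Note that the reader’s outline sits on the first pixel outside the nodule, so distances taken from contour points will tend to run slightly larger than the nodule itself.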

Therefore, the resulting categories are:

  1. Nodules > 3 mm diameter (but < 30 mm diameter)
  • For these nodules, each reader drew a complete outline around the nodule in every section in which it appeared, placing the outline on the first pixel outside the nodule.
  • “Regions of exclusion” within the nodule were eliminated by constructing another outline; here the pixels that make up the exclusion outline are considered part of the region of exclusion.
  • Finally, each reader was asked to subjectively assess several characteristics of the nodule (each characteristic on a 1-5 scale):
    • Subtlety – difficulty of detection
    • Internal structure – expected internal composition of the nodule (soft tissue, fluid, fat, air)
    • Calcification – pattern of calcification, if present
    • Sphericity – the three-dimensional shape of the nodule in terms of its roundness
    • Margin – how well defined the margins of the nodule are
    • Spiculation – amount of spiculation present in the nodule
    • Texture – internal texture or composition of the nodule in terms of solid and ground-glass components
    • Malignancy – the radiologist’s subjective assessment of the likelihood of malignancy of this nodule, ASSUMING a 60-year-old male smoker
  2. Nodules < 3 mm diameter
  • Indicate only the approximate three-dimensional center-of-mass of any such nodule of indeterminate nature (note: NO contour provided).
  • If the opacity is clearly benign, it is not marked.
  • No subjective assessment of characteristics

 

  3. Non-nodules > 3 mm diameter
  • Indicate only the approximate three-dimensional center-of-mass of any such non-nodule (note: NO contour provided).
  • No subjective assessment of characteristics
  • Indicate a “non-nodule nexus” region with a mark at its approximate three-dimensional center-of-mass or at a prominent focus of the nexus (note that the database will make no distinction between the single mark assigned to a “non-nodule nexus” and the single mark assigned to a single lesion considered a “non-nodule > 3 mm”); nodules within a non-nodule nexus may be indicated separately in accordance
  • A mass that exceeds 30 mm should be marked as a “non-nodule > 3 mm”

NOTE: Non-nodules < 3 mm were not marked.
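The three marking categories, together with the two “not marked” cases, amount to a small decision rule. The sketch below encodes that rule for illustration only. The function name `lidc_category` is hypothetical, and treating exactly 3 mm and exactly 30 mm as belonging to the nodule > 3 mm category is an assumption, since the protocol text only uses “<” and “>”.

```python
from typing import Optional, Tuple

def lidc_category(is_nodule: bool, longest_diameter_mm: float,
                  clearly_benign: bool = False) -> Tuple[Optional[str], str]:
    """Return (LIDC category, what the reader records) for one lesion.

    Hypothetical helper. Boundary handling at exactly 3 mm and 30 mm is an
    assumption; the protocol text does not say how those values were treated.
    """
    d = longest_diameter_mm
    if is_nodule and 3.0 <= d <= 30.0:
        return ("nodule > 3 mm", "outline on every section + characteristics")
    if is_nodule and d < 3.0:
        if clearly_benign:
            return (None, "not marked")  # clearly benign opacities are skipped
        return ("nodule < 3 mm", "approximate 3D center-of-mass only")
    if d >= 3.0:
        # Non-nodules >= 3 mm and masses exceeding 30 mm both land here.
        return ("non-nodule > 3 mm", "approximate 3D center-of-mass only")
    return (None, "not marked")  # non-nodules < 3 mm are not marked

if __name__ == "__main__":
    print(lidc_category(True, 12.0))
    print(lidc_category(True, 45.0))   # a mass > 30 mm
    print(lidc_category(False, 1.5))   # a small non-nodule
```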

  4. Results of the Marking Process

For each nominated case, the LIDC marking process was performed.  The results of that process for the first 30 cases are:

  1. Anonymized DICOM image data for the entire CT scan series, along with the preserved DICOM header information (minus the information removed for anonymization)
  2. For a specific CT series, the unblinded read results from each of the four readers are combined into one xml file. This file (described in detail below) contains:
    1. Descriptions of all nodules > 3 mm (contours and subjective assessments),
    2. For all nodules < 3mm, the approximate centroids are provided, and
    3. For all non-nodules, the approximate centroids or nexus locations are provided. 

Note that any reader information has been anonymized as well, but each xml file contains the results of all four readers.

  5. Format and Content of XML Data Files

The overall format of the xml files is:

  1. A file header that contains information including the
    •  LIDC xml version number
    •  type of result contained in the xml file (these are all unblinded read results)
    • Series instance UID (0020,000E) of the CT series that this xml file is linked to.
  2. The results from each of the four radiologists’ unblinded “reading sessions,” where a reading session consists of the set of markings made by a single reader in a single phase (for these xml files, the unblinded reading phase). Each of the four reading sessions follows the same format:
    • <readingSession> indicates the beginning of this reading session. There will be up to four for each CT series.
    • Annotation version number and reader id (here all reader ids are set to “anon”, but there were four distinct readers for each case)
    • For each nodule > 3 mm marked by this reader, both the nodule characteristics and the complete roi boundary (which has > 1 point) are reported in the following manner:
      1. <unblindedReadNodule>  indicates the beginning of an unblinded read nodule section
      2. a nodule id – a unique id for the nodule marked by this reader
      3. the radiologist-assessed characteristics of the nodule (described above under Categories of Objects to Mark)
      4. Nodule Contour ROI – this is the description of the complete three dimensional contour of the nodule (remembering that the radiologist was instructed to mark the first voxel outside the nodule). 

Things to NOTE:

The overall format is to report each z position (the longitudinal direction, given by imageZposition and imageSOP_UID below) on which the nodule is visualized and, within each z position, the x and y coordinates of the connected boundary points in that x-y plane (note that (0,0) is the upper left of the image).

Note also that the LIDC allowed the radiologists to describe regions of exclusion (primarily regions of air within a nodule), so an “inclusion” tag was developed: when the “inclusion” value is “true”, the roi being described is considered part of the nodule; when the “inclusion” value is “false”, the roi being described is excluded from (subtracted from) the nodule.

Thus, the format of the nodule contour is:

  1. roi – the <roi> tag indicates the beginning of a two dimensional contour description. 
  2. Image Z position – this is the table position recorded in the third element of the triplet in the DICOM header (0020,0032) for the image slice on which this portion of the nodule is visualized and contoured.
  3. imageSOP_UID – this is the unique identifier in the DICOM header (0008,0018) for this image slice. Note that this field also specifies the name of the corresponding image file (e.g. 1.3.6.1.4.1.9328.50.3.1892.dcm), so the user can open that file to see the image on which the nodule is visualized.
  4. Inclusion – “True” means that the roi that follows is considered part of the nodule; “False” means that the roi that follows should be subtracted from the nodule.
  5. Edge Map. What follows is the list of (x,y) pairs that describe the connected set of points of the nodule contour within the x-y plane determined by the z position above. The format for each point is:

<edgeMap> beginning of edge map point

<xCoord>377</xCoord> x coordinate

<yCoord>248</yCoord> y coordinate

</edgeMap> end of this edge map point

<edgeMap> next edge map point

<xCoord>378</xCoord> x coordinate

<yCoord>247</yCoord> y coordinate

</edgeMap> end of this edge map point

  6. </roi> indicates the end of this two-dimensional contour and may be followed by another two-dimensional contour description.
      5. </unblindedReadNodule> indicates the end of this nodule’s description.
  • For each nodule < 3 mm marked by this reader, ONLY a single point is reported - and no characteristics.  Please NOTE that the distinction between nodules > 3 mm and nodules < 3 mm is that nodules < 3 mm have only a single roi point and do not have any radiologist assessed characteristics.  The data for nodules < 3mm is therefore a subset of the data for nodules > 3mm and is as follows:
    1. <unblindedReadNodule>  indicates the beginning of an unblinded read nodule section as above
    2. a nodule id – a unique id for the nodule marked by this reader
    3. Nodule Contour ROI –  For nodules < 3mm, this will consist of a single z position and a single edge map point and represents the approximate centroid of that nodule
      1. roi – same as for nodule > 3mm;  indicates the beginning of the two dimensional contour description. 
      2. Image Z position – same as for nodules > 3 mm; the table position recorded in the third element of the DICOM header (0020,0032) for the image slice on which this nodule < 3 mm is visualized and marked.
      3. imageSOP_UID – same as for nodules > 3 mm; this is the unique identifier in the DICOM header (0008,0018) for this image slice
      4. Inclusion – same as for nodules > 3 mm, but this should always be “TRUE” for nodules < 3 mm.
      5. Edge Map. Same as for nodules > 3 mm, but this will be a single edge map point representing the approximate centroid of the nodule < 3 mm as marked by this reader. The format for this point is:

<edgeMap> beginning of edge map point

<xCoord>377</xCoord> x coordinate

<yCoord>248</yCoord> y coordinate

</edgeMap> end of this edge map point

      6. </roi> indicates the end of this nodule roi.
    4. </unblindedReadNodule> indicates the end of this nodule description.
  • For each non-nodule > 3 mm marked by this reader, ONLY a single point is reported, with no characteristics. The data for non-nodules > 3 mm are similar to the data recorded for nodules < 3 mm, but can be uniquely identified by the <nonNodule> tag. This description uses data structures similar to those of the nodules described above:
    1. <nonNodule> indicates the beginning of a non-nodule > 3 mm description
    2. a non-nodule id – a unique id for the non-nodule > 3 mm marked by this reader
    3. Image Z position – the meaning is the same as for nodules: the table position recorded in the third element of the DICOM header (0020,0032) for the image slice on which this non-nodule > 3 mm is visualized and marked.
    4. imageSOP_UID – the meaning is the same as for nodules; this is the unique identifier in the DICOM header (0008,0018) for the image slice on which this non-nodule is marked.
    5. Locus – unique to non-nodules (and used in place of “edge map”); indicates that the indicator point of the non-nodule follows:

<locus> beginning of non-nodule description

<xCoord>215</xCoord> x coordinate location of non-nodule

<yCoord>312</yCoord> y coordinate location of non-nodule

</locus> end of non-nodule description

 

    6. </nonNodule> indicates the end of this non-nodule > 3 mm description

 

  • </readingSession> indicates the end of that reader’s session. There can be up to four reading sessions for each CT series.
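Taken together, a reading session holds three kinds of entries. The sketch below, using Python’s standard xml.etree.ElementTree, walks each <readingSession> and sorts its entries into nodules > 3 mm (lists of ROIs), nodules < 3 mm (one roi holding one edge map point, as described above) and non-nodules (read from <locus>). It is a sketch under assumptions: the tag spellings imageZposition, imageSOP_UID and inclusion follow the field names in this document but are not quoted from a real file; the root element name and namespace URI are placeholders; tags are matched by local name so a namespace does not matter; the header, reader id, nodule ids and characteristics are ignored; and all sample values other than the coordinates quoted in this document are made up.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from typing import List, Tuple

def _local(tag: str) -> str:
    """Drop any '{namespace}' prefix so tags match with or without a namespace."""
    return tag.rsplit("}", 1)[-1]

def _kids(elem: ET.Element, name: str) -> List[ET.Element]:
    return [c for c in elem if _local(c.tag) == name]

def _text(elem: ET.Element, name: str) -> str:
    found = _kids(elem, name)
    return (found[0].text or "").strip() if found else ""

def _point(elem: ET.Element) -> Tuple[int, int]:
    return int(_text(elem, "xCoord")), int(_text(elem, "yCoord"))

@dataclass
class Roi:
    z: float                       # imageZposition of the slice
    sop_uid: str                   # imageSOP_UID; image file is sop_uid + ".dcm"
    inclusion: bool                # False: subtract this region from the nodule
    points: List[Tuple[int, int]]  # edge map (x, y) pairs, (0, 0) = upper left

@dataclass
class Session:
    large_nodules: List[List[Roi]] = field(default_factory=list)  # nodules > 3 mm
    small_nodules: List[Roi] = field(default_factory=list)        # nodules < 3 mm
    non_nodules: List[Tuple[float, str, int, int]] = field(default_factory=list)  # (z, uid, x, y)

def _parse_roi(roi: ET.Element) -> Roi:
    return Roi(z=float(_text(roi, "imageZposition")),
               sop_uid=_text(roi, "imageSOP_UID"),
               inclusion=_text(roi, "inclusion").lower() == "true",
               points=[_point(e) for e in _kids(roi, "edgeMap")])

def parse_sessions(xml_text: str) -> List[Session]:
    root = ET.fromstring(xml_text)
    sessions = []
    for s in (e for e in root.iter() if _local(e.tag) == "readingSession"):
        out = Session()
        for nodule in _kids(s, "unblindedReadNodule"):
            rois = [_parse_roi(r) for r in _kids(nodule, "roi")]
            # Single-point rule: one roi holding one edge map point = nodule < 3 mm.
            if len(rois) == 1 and len(rois[0].points) == 1:
                out.small_nodules.append(rois[0])
            else:
                out.large_nodules.append(rois)
        for nn in _kids(s, "nonNodule"):
            x, y = _point(_kids(nn, "locus")[0])
            out.non_nodules.append((float(_text(nn, "imageZposition")),
                                    _text(nn, "imageSOP_UID"), x, y))
        sessions.append(out)
    return sessions

SAMPLE = """<LidcReadMessage xmlns="http://example.org/placeholder">
 <readingSession>
  <unblindedReadNodule>
   <roi><imageZposition>-125.0</imageZposition>
    <imageSOP_UID>SOP-UID-A</imageSOP_UID><inclusion>TRUE</inclusion>
    <edgeMap><xCoord>377</xCoord><yCoord>248</yCoord></edgeMap>
    <edgeMap><xCoord>378</xCoord><yCoord>247</yCoord></edgeMap></roi>
  </unblindedReadNodule>
  <unblindedReadNodule>
   <roi><imageZposition>-98.5</imageZposition>
    <imageSOP_UID>SOP-UID-B</imageSOP_UID><inclusion>TRUE</inclusion>
    <edgeMap><xCoord>300</xCoord><yCoord>200</yCoord></edgeMap></roi>
  </unblindedReadNodule>
  <nonNodule><imageZposition>-140.0</imageZposition>
   <imageSOP_UID>SOP-UID-C</imageSOP_UID>
   <locus><xCoord>215</xCoord><yCoord>312</yCoord></locus></nonNodule>
 </readingSession>
</LidcReadMessage>"""

if __name__ == "__main__":
    for i, s in enumerate(parse_sessions(SAMPLE), start=1):
        print(f"session {i}: {len(s.large_nodules)} nodule(s) > 3 mm, "
              f"{len(s.small_nodules)} nodule(s) < 3 mm, {len(s.non_nodules)} non-nodule(s)")
```

To open the slice for a given ROI, the image file name follows from its SOP Instance UID (roi.sop_uid + ".dcm"), as described for imageSOP_UID above.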

An example image with displayed markings is provided in Appendix 1.

In addition, an example xml file with some added comments is provided in an additional document.


Appendix 1 – Example image with displayed markings

The following images are from the LIDC first 30 cases.

Case:  1.3.6.1.4.1.9328.50.3.1888

The original folder from the LIDC ftp site is: 

ftp://ncicbfalcon.nci.nih.gov/lidc/LIDC_Release_0001/1.3.6.1.4.1.9328.50.3.1888

Original image file: 1.3.6.1.4.1.9328.50.3.1908.dcm

The associated xml file is: 1.3.6.1.4.1.9328.50.3.1888.xml

   

[Figure: left, original image with nodule; right, the same image with the nodule > 3 mm ROI from the xml file displayed]

 

Note that the xml file contains points that describe the contour of the nodule (i.e. only its border points), while the display here (for visibility and ease of understanding) shows the entire region of the nodule on this image (all border points and all internal points).
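To produce a display like the one above from the xml data, each ROI’s border points have to be turned into a filled region, and any ROI with inclusion = “false” subtracted. The sketch below uses a plain even-odd (ray-casting) point-in-polygon test over each contour’s bounding box; this is one common technique, not necessarily the method used to produce the figure. It assumes the edge map points are listed in order around the contour and that all ROIs passed in belong to the same slice; the function names are hypothetical.

```python
from typing import Iterable, List, Set, Tuple

Pixel = Tuple[int, int]

def fill_contour(points: List[Pixel]) -> Set[Pixel]:
    """All border points plus every pixel inside the closed contour (even-odd rule)."""
    region = set(points)  # border points are always part of the displayed region
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    n = len(points)
    for y in range(min(ys), max(ys) + 1):
        for x in range(min(xs), max(xs) + 1):
            inside = False
            j = n - 1
            for i in range(n):  # toggle on each contour edge crossed by a ray toward +x
                (xi, yi), (xj, yj) = points[i], points[j]
                if (yi > y) != (yj > y):
                    if x < xi + (y - yi) * (xj - xi) / (yj - yi):
                        inside = not inside
                j = i
            if inside:
                region.add((x, y))
    return region

def displayed_region(rois: Iterable[Tuple[bool, List[Pixel]]]) -> Set[Pixel]:
    """Union of filled inclusion ROIs minus filled exclusion ROIs on one slice.

    Exclusion outlines count as part of the excluded region, per the marking protocol.
    """
    keep: Set[Pixel] = set()
    drop: Set[Pixel] = set()
    for inclusion, points in rois:
        (keep if inclusion else drop).update(fill_contour(points))
    return keep - drop

if __name__ == "__main__":
    # Border of a 7 x 7 square listed in order, with a 1-pixel exclusion at its center.
    border = ([(x, 0) for x in range(7)] + [(6, y) for y in range(1, 7)]
              + [(x, 6) for x in range(5, -1, -1)] + [(0, y) for y in range(5, 0, -1)])
    print(len(displayed_region([(True, border), (False, [(3, 3)])])))  # 48
```

Because the nodule outline lies on the first pixel outside the nodule, the filled region (like the display) includes one ring of pixels just beyond the nodule itself. When writing the result into an image array, keep in mind that x is the column and y the row with (0,0) at the upper left, so pixel (x, y) corresponds to array[y, x].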