A Large-Scale CT and PET/CT Dataset for Lung Cancer Diagnosis (Lung-PET-CT-Dx)

Summary

Our dataset consists of three parts: raw DICOM data, JPG images converted from the raw DICOM data, and non-image data, which includes sex, age, and history, along with some DICOM files and the XML annotation files. The XML annotation files use a format widely adopted in deep learning and machine learning research. They were provided by five radiologists and two deep learning researchers, which makes the dataset a useful resource for developing medical diagnosis algorithms.
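The page does not say how the JPG images were derived from the raw DICOM data. The following is a hedged sketch that assumes standard CT conventions. A usual first step is to convert stored pixel values to Hounsfield units (HU) using the DICOM Rescale Slope (0028,1053) and Rescale Intercept (0028,1052) attributes. The function name `to_hounsfield` is hypothetical, and the slope and intercept in the example are illustrative values, not values read from this dataset.

```python
from typing import List, Sequence


def to_hounsfield(stored_values: Sequence[int],
                  rescale_slope: float,
                  rescale_intercept: float) -> List[float]:
    """Convert stored CT pixel values to Hounsfield units.

    Applies the DICOM modality rescale: HU = stored * slope + intercept.
    Slope and intercept come from each image's Rescale Slope (0028,1053)
    and Rescale Intercept (0028,1052) attributes.
    """
    return [v * rescale_slope + rescale_intercept for v in stored_values]


# Hypothetical slope/intercept, chosen for illustration only.
print(to_hounsfield([0, 1024, 2024], rescale_slope=1.0, rescale_intercept=-1024.0))
# [-1024.0, 0.0, 1000.0]
```

When reading files with pydicom, these two values are exposed as `ds.RescaleSlope` and `ds.RescaleIntercept`, and the stored values as `ds.pixel_array`.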

Gene expression data and pathologist reports are available for the patients. The images were analyzed in both mediastinum (window width, 350 HU; level, 40 HU) and lung (window width, 1,400 HU; level, –700 HU) settings. Reconstructions were made with a 2 mm slice thickness in lung settings. The CT slice interval varies from 0.625 mm to 5 mm. Scanning modes include plain scan, contrast scan, and 3D reconstruction. All cases were confirmed by pathological diagnosis. Tumor locations were labeled in the DICOM images, and the annotations are saved as XML files in PASCAL VOC format in "Annotation Files with Hashed Filenames". Users can parse the annotations with the PASCAL Development Toolkit: https://pypi.org/project/pascal-voc-tools/
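The mediastinum and lung settings above are window width/level pairs. Below is a minimal sketch of a simple linear window. It clips HU values to the range level ± width/2 and scales the result to 0–255 grey levels, which is the kind of mapping needed to render CT data as 8-bit images. This is a common simplification, not the exact linear VOI LUT formula in DICOM PS3.3. The names `WINDOWS` and `apply_window` are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

# (window width, window level) in HU, as quoted in the dataset description.
WINDOWS: Dict[str, Tuple[float, float]] = {
    "mediastinum": (350.0, 40.0),
    "lung": (1400.0, -700.0),
}


def apply_window(hu_values: Sequence[float], width: float, level: float) -> List[int]:
    """Map HU values to 0-255 grey levels with a simple linear window/level."""
    if width <= 0:
        raise ValueError("window width must be positive")
    lower = level - width / 2.0
    upper = level + width / 2.0
    grey = []
    for hu in hu_values:
        clipped = min(max(hu, lower), upper)  # values outside the window saturate
        grey.append(round((clipped - lower) / (upper - lower) * 255))
    return grey


print(apply_window([-1000.0, 40.0, 500.0], *WINDOWS["mediastinum"]))  # [0, 128, 255]
print(apply_window([-1400.0, -700.0, 0.0], *WINDOWS["lung"]))         # [0, 128, 255]
```

The same voxel looks very different under the two presets. For example, -700 HU is black in the mediastinum window (which starts at -135 HU) but mid-grey in the lung window. This is why lung parenchyma and soft tissue are usually reviewed separately.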

We provide JPG images and XML annotation files in PASCAL VOC format, which is widely used in deep learning and machine learning research. The annotation files were provided by five doctors and two deep learning researchers. In addition, all cases were confirmed by pathology, which supports the accuracy of the labels, and the standard annotation format makes the dataset easy to use. The dataset can serve as a useful tool and data resource for developing deep-learning-based medical diagnosis algorithms, and more broadly as an effective tool for promoting medical diagnosis.
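PASCAL VOC annotations are plain XML. Each `<object>` element holds a class `<name>` and a `<bndbox>` with `xmin`, `ymin`, `xmax`, and `ymax`. The page points to the PASCAL Development Toolkit for parsing. As a swapped-in alternative that needs only the Python standard library, here is a minimal sketch using `xml.etree.ElementTree`. It assumes the standard VOC element names. `Box`, `parse_voc`, and the sample annotation (its filename, label, and coordinates) are hypothetical and not taken from the dataset.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import List


@dataclass
class Box:
    """One bounding box from a PASCAL VOC <object> element."""
    label: str
    xmin: float
    ymin: float
    xmax: float
    ymax: float


def parse_voc(xml_text: str) -> List[Box]:
    """Return every bounding box in a PASCAL VOC annotation string."""
    root = ET.fromstring(xml_text)
    boxes: List[Box] = []
    for obj in root.iter("object"):
        bndbox = obj.find("bndbox")
        if bndbox is None:
            continue  # skip objects that carry no bounding box
        coords = []
        for tag in ("xmin", "ymin", "xmax", "ymax"):
            text = bndbox.findtext(tag)
            if text is None:
                raise ValueError(f"<bndbox> is missing <{tag}>")
            coords.append(float(text))  # float() also accepts integer strings
        boxes.append(Box(obj.findtext("name", default=""), *coords))
    return boxes


# Hypothetical annotation for illustration; real files come from the dataset.
SAMPLE = """<annotation>
  <filename>example.dcm</filename>
  <object>
    <name>example_label</name>
    <bndbox><xmin>120</xmin><ymin>85</ymin><xmax>164</xmax><ymax>131</ymax></bndbox>
  </object>
</annotation>"""

for box in parse_voc(SAMPLE):
    print(box)
```

For files on disk, read each XML file into a string first, or call `ET.parse(path).getroot()` and walk the tree the same way.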


Acknowledgements

We would like to acknowledge the individuals and institutions that have provided data for this collection:

...