Child pages
  • Computed Tomography Images from Large Head and Neck Cohort (RADCURE)


...

Excerpt

The RADCURE dataset was collected clinically for radiation therapy treatment planning and retrospectively reconstructed for quantitative imaging research.  

Acquisition and Validation Methods: RADCURE comprises data for 3,346 patients and contains computed tomography (CT) images with corresponding normal and non-normal tissue contours. CT scans were collected using systems from three different manufacturers. Standard clinical imaging protocols were followed, and contours were generated and reviewed at weekly quality assurance rounds. RADCURE imaging and structure set data were extracted from our institution’s radiation treatment planning and oncology systems using an in-house data mining and processing system. The images are linked to clinical data for each patient, which include demographic, clinical, and treatment information based on the 7th edition TNM staging system. The median patient age is 63 years, and 80% of the patients in the final dataset are male. Oropharyngeal cancer makes up 50% of the population, with larynx, nasopharynx, and hypopharynx cancers comprising 25%, 12%, and 5%, respectively. Median follow-up was 5 years, with 60% of the patients alive at last follow-up.

Data Format and Usage Notes: During extraction of images and contours from our institution’s radiation treatment planning and oncology systems, the data were converted to DICOM and RTSTRUCT formats, respectively. To improve the usability of the RTSTRUCT files, individual contour names were standardized for primary tumor volumes and 29 organs-at-risk. Demographic, clinical, and treatment information is provided as a comma-separated values (CSV) file. This dataset is a superset of the Radiomic Biomarkers in Oropharyngeal Carcinoma (OPC-Radiomics) dataset and fully encapsulates all previous data; this dataset replaces the OPC-Radiomics dataset. The RTSTRUCTs from OPC-Radiomics have been standardized to adhere to the TG-263 nomenclature. Ages of 90 years or greater are considered protected health information (PHI) and are set to 90 years to protect patient privacy. Dates in both the radiological and clinical metadata were offset by an undisclosed number of days for anonymization; this should be taken into account in downstream analysis. The TG-263-standardized RTSTRUCTs include only the GTVp (primary gross tumor volume) contours. Patients without corresponding GTVp contours will not have RTSTRUCTs.
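The age cap described above affects how age summaries should be read: a recorded value of 90 means "90 or older", not exactly 90. The following is a minimal Python sketch of one way to flag capped ages when reading the clinical CSV. The column names (`patient_id`, `age`) and the example rows are hypothetical and are not taken from the released file; check the actual CSV header before adapting it.

```python
import csv
import io
import statistics

AGE_CAP = 90  # per the usage notes: ages of 90 or greater are recorded as 90


def load_clinical_rows(csv_text):
    """Parse clinical CSV text into a list of dicts, one per patient."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def summarize_age(rows, age_column="age"):
    """Return the median age and the number of patients at the age cap.

    `age_column` is a hypothetical column name. Rows with an empty age
    field are skipped rather than treated as zero.
    """
    ages = [float(row[age_column]) for row in rows if row[age_column] != ""]
    n_capped = sum(1 for age in ages if age >= AGE_CAP)
    return {
        "median_age": statistics.median(ages),
        "n_capped": n_capped,  # these patients are "90 or older"
        "n": len(ages),
    }


# Hypothetical example rows, for illustration only.
example_csv = "patient_id,age\nP001,63\nP002,90\nP003,57\n"
summary = summarize_age(load_clinical_rows(example_csv))
```

Because the median is insensitive to a small number of capped values, it is usually a safer summary here than the mean, which the cap biases downward.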

Potential Applications: The availability of imaging, clinical, demographic, and treatment data in RADCURE makes it a viable option for a variety of quantitative image analysis research initiatives. These include the application of machine learning or artificial intelligence methods to expedite routine clinical practices, discover new non-invasive biomarkers, or develop prognostic models.

Several studies have tried to address this data cleaning challenge using different approaches (Ger et al. 2018; Gjesteby et al. 2016). Recently, a convolutional neural network (CNN) was used to detect patient CT volumes containing artifacts, with a precision-recall area under the curve (AUC) of 0.92 and an accuracy of 98.4% (Welch et al. 2020). However, to the authors’ knowledge, no work has been done to differentiate between dental artifacts (DAs) of different magnitudes or to quantify how the location of DAs could affect the quantitative imaging features used to train radiomic models. Furthermore, previous DA detection studies have classified hand-drawn regions of interest (ROIs) as DA positive or DA negative (REF) but have not examined the correlation between the radiomic features in a given ROI and its distance from the DA source. These methods, even if effective at screening datasets for artifacts, could cause large amounts of data to be unnecessarily marked as unclean, even though the artifacts do not affect radiomic features uniformly across the patient’s image volume. In this study, we propose a novel two-step combinatorial algorithm to detect DAs on a per-slice basis in CT image datasets. Conventional image processing methods based on histogram thresholding and the CT sinogram are combined with a previously published CNN to create a three-class DA classifier and DA location detector for large radiomic datasets. This algorithm works on patient CT volumes with minimal preprocessing or manual annotation. Finally, we examined the correlation between quantitative imaging features and the physical distance between the DA and the gross tumour volume (GTV).
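To make the idea of per-slice, threshold-based screening concrete, the sketch below labels each slice of a CT volume by the fraction of pixels at or above a high Hounsfield unit (HU) value. It is not the algorithm proposed in the study, which also uses the CT sinogram and a CNN. The 3000 HU threshold, the two cutoff fractions, and the label names ("none", "weak", "strong") are all illustrative assumptions, not values from the source.

```python
def bright_fraction(slice_hu, hu_threshold=3000):
    """Fraction of pixels in one 2D slice at or above hu_threshold.

    `slice_hu` is a list of rows, each a list of HU values.
    The default threshold is an assumed, illustrative value.
    """
    total = 0
    bright = 0
    for row in slice_hu:
        for value in row:
            total += 1
            if value >= hu_threshold:
                bright += 1
    return bright / total if total else 0.0


def classify_slice(slice_hu, weak_cutoff=0.01, strong_cutoff=0.05):
    """Assign one of three hypothetical labels to a slice.

    The cutoffs are assumptions chosen for this example only.
    """
    fraction = bright_fraction(slice_hu)
    if fraction >= strong_cutoff:
        return "strong"
    if fraction >= weak_cutoff:
        return "weak"
    return "none"


def classify_volume(volume_hu):
    """Label every slice, so unaffected slices can still be used."""
    return [classify_slice(slice_hu) for slice_hu in volume_hu]


def make_slice(n_bright, size=10, background=40, metal=3071):
    """Build a synthetic size x size slice with n_bright metal-like pixels."""
    flat = [metal] * n_bright + [background] * (size * size - n_bright)
    return [flat[i * size:(i + 1) * size] for i in range(size)]


# Synthetic volume: a clean slice, then 2% and 10% bright pixels.
example_volume = [make_slice(0), make_slice(2), make_slice(10)]
labels = classify_volume(example_volume)
```

Labelling slices individually, rather than flagging a whole volume, reflects the point made above: an artifact in a few slices need not disqualify the rest of the scan.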

Inclusion: The dataset used for this study consists of 3,346 head and neck cancer CT image volumes collected from 2005 to 2017 from patients treated with definitive radiation therapy (RT) at the University Health Network (UHN) in Toronto, Canada.

...