Summary
Excerpt |
---|
Sparsely Annotated Region and Organ Segmentation (SAROS) contributes a large, heterogeneous semantic segmentation annotation dataset for existing CT imaging cases on TCIA. The goal of this dataset is to provide high-quality annotations for building body composition analysis tools (see [Koitka 2020: https://doi.org/10.1007/s00330-020-07147-3]). Existing in-house segmentation models were used to generate annotation candidates on randomly selected cases. All generated annotations were manually reviewed and corrected by medical residents and students on every fifth axial slice, while all other slices were set to an ignore label (numeric value 255).
TCIA Collections
All 900 sample cases, 450 female and 450 male, were randomly selected from the following TCIA collections (number of cases in parentheses): ACRIN-FLT-Breast (32), ACRIN-HNSCC-FDG-PET-CT (48), ACRIN-NSCLC-FDG-PET (129), Anti-PD-1_Lung (12), Anti-PD-1_MELANOMA (2), C4KC-KiTS (175), COVID-19-NY-SBU (1), CPTAC-CM (1), CPTAC-LSCC (3), CPTAC-LUAD (1), CPTAC-PDA (8), CPTAC-UCEC (26), HNSCC (17), Head-Neck Cetuximab (12), LIDC-IDRI (133), Lung-PET-CT-Dx (17), NSCLC Radiogenomics (7), NSCLC-Radiomics (56), NSCLC-Radiomics-Genomics (20), Pancreas-CT (58), QIN-HEADNECK (94), Soft-tissue-Sarcoma (6), TCGA-HNSC (1), TCGA-LIHC (33), TCGA-LUAD (2), TCGA-LUSC (3), TCGA-STAD (2), TCGA-UCEC (1).
The annotations are provided in DICOM and NIfTI format. Both annotation files define foreground labels on the same axial slices and match pixel for pixel. In total, 13 semantic body regions and 6 body part labels were annotated; each label's index corresponds to its numeric value in the segmentation file.
Body Regions
Body Parts
The labels which were modified or require further commentary are listed and explained below:
For reproducibility on downstream tasks, five cross-validation folds and a test set were pre-defined and are described in the provided spreadsheet. Segmentation followed anatomical guidelines strictly and deviated from them only where needed to improve segmentation efficiency.
Benefit to Researchers: Researchers can build upon this data to develop fully automated body composition analysis pipelines or use it for other purposes in medical image segmentation. |
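Because only every fifth axial slice was reviewed and the remaining slices carry the ignore label 255, any evaluation or training loss on these annotations should skip the ignored voxels. The following is a minimal sketch of that idea in plain Python, not the authors' tooling. The function names, the nested-list volume layout (slice, row, column), and the toy data are assumptions made for illustration; in practice the volumes would be loaded from the provided NIfTI or DICOM files into arrays.

```python
IGNORE_LABEL = 255  # value the dataset description assigns to unreviewed slices


def annotated_slice_indices(volume):
    """Return indices of axial slices holding at least one non-ignored voxel."""
    return [
        z for z, axial_slice in enumerate(volume)
        if any(v != IGNORE_LABEL for row in axial_slice for v in row)
    ]


def dice_ignoring_unlabeled(reference, prediction, label):
    """Dice score for one label, skipping voxels the reference marks as ignored."""
    intersection = ref_count = pred_count = 0
    for ref_slice, pred_slice in zip(reference, prediction):
        for ref_row, pred_row in zip(ref_slice, pred_slice):
            for r, p in zip(ref_row, pred_row):
                if r == IGNORE_LABEL:
                    continue  # unreviewed voxel: contributes nothing
                ref_hit, pred_hit = r == label, p == label
                intersection += ref_hit and pred_hit
                ref_count += ref_hit
                pred_count += pred_hit
    denom = ref_count + pred_count
    return 2.0 * intersection / denom if denom else float("nan")


# Toy data (made up): six 2x2 slices, with slices 0 and 5 annotated.
# The real starting offset of the annotated slices is not stated in the text.
reference = [
    [[0, 1], [1, 1]] if z % 5 == 0 else [[IGNORE_LABEL] * 2] * 2
    for z in range(6)
]
prediction = [[[0, 1], [1, 0]] for _ in range(6)]

print(annotated_slice_indices(reference))
print(dice_ignoring_unlabeled(reference, prediction, label=1))
```

Deriving the annotated slices from the mask itself, rather than assuming a fixed offset, avoids depending on where the five-slice stride starts in a given case.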
Acknowledgements
We would like to acknowledge the individuals and institutions that have provided data for this collection:
...