SAROS - A large, heterogeneous, and sparsely annotated segmentation dataset on CT imaging data (SAROS)


Summary

Sparsely Annotated Region and Organ Segmentation (SAROS) contributes a large heterogeneous semantic segmentation annotation dataset for existing CT imaging cases on TCIA. The goal of this dataset is to provide high-quality annotations for building body composition analysis tools (see [Koitka 2020: https://doi.org/10.1007/s00330-020-07147-3]). Existing in-house segmentation models were employed to generate annotation candidates on randomly selected cases. All generated annotations were manually reviewed and corrected by medical residents and students on every fifth axial slice while other slices were set to an ignore label (numeric value 255).
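Because only every fifth axial slice carries reviewed labels, downstream code generally has to exclude voxels with the ignore value 255 from losses and metrics. The following is a minimal sketch with NumPy, assuming the segmentation is already available as an integer array; the helper name `dice_ignoring` and the toy arrays are illustrative and not part of the dataset tooling.

```python
import numpy as np

IGNORE_LABEL = 255  # ignore value stated in the dataset description


def dice_ignoring(pred, target, label, ignore=IGNORE_LABEL):
    """Dice score for one label, computed only on annotated voxels."""
    valid = target != ignore            # True where a reviewed label exists
    p = (pred == label) & valid
    t = (target == label) & valid
    denom = p.sum() + t.sum()
    if denom == 0:
        return float("nan")             # label absent in prediction and target
    return 2.0 * (p & t).sum() / denom


# Toy volume with two axial slices (axis 0); only slice 0 is annotated.
target = np.array([[[1, 1], [0, 0]],
                   [[255, 255], [255, 255]]], dtype=np.uint8)
pred = np.array([[[1, 0], [0, 0]],
                 [[1, 1], [1, 1]]], dtype=np.uint8)

score = dice_ignoring(pred, target, label=1)
print(f"Dice on annotated voxels: {score:.3f}")
```

Predictions on the unannotated slice do not affect the score, which is the intended effect of the ignore label.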

A total of 900 CT series (450 female, 450 male) from 882 patients were randomly selected from the following TCIA collections (number of CT series per collection in parentheses): ACRIN-FLT-Breast (32), ACRIN-HNSCC-FDG-PET/CT (48), ACRIN-NSCLC-FDG-PET (129), Anti-PD-1_Lung (12), Anti-PD-1_MELANOMA (2), C4KC-KiTS (175), COVID-19-NY-SBU (1), CPTAC-CM (1), CPTAC-LSCC (3), CPTAC-LUAD (1), CPTAC-PDA (8), CPTAC-UCEC (26), HNSCC (17), Head-Neck Cetuximab (12), LIDC-IDRI (133), Lung-PET-CT-Dx (17), NSCLC Radiogenomics (7), NSCLC-Radiomics (56), NSCLC-Radiomics-Genomics (20), Pancreas-CT (58), QIN-HEADNECK (94), Soft-tissue-Sarcoma (6), TCGA-HNSC (1), TCGA-LIHC (33), TCGA-LUAD (2), TCGA-LUSC (3), TCGA-STAD (2), TCGA-UCEC (1).
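As a quick consistency check, the per-collection counts listed above can be summed and compared with the stated total of 900 CT series. The dictionary below only transcribes the numbers from the text; the variable names are illustrative.

```python
# Number of CT series per TCIA collection, transcribed from the list above.
CASES_PER_COLLECTION = {
    "ACRIN-FLT-Breast": 32, "ACRIN-HNSCC-FDG-PET/CT": 48,
    "ACRIN-NSCLC-FDG-PET": 129, "Anti-PD-1_Lung": 12,
    "Anti-PD-1_MELANOMA": 2, "C4KC-KiTS": 175, "COVID-19-NY-SBU": 1,
    "CPTAC-CM": 1, "CPTAC-LSCC": 3, "CPTAC-LUAD": 1, "CPTAC-PDA": 8,
    "CPTAC-UCEC": 26, "HNSCC": 17, "Head-Neck Cetuximab": 12,
    "LIDC-IDRI": 133, "Lung-PET-CT-Dx": 17, "NSCLC Radiogenomics": 7,
    "NSCLC-Radiomics": 56, "NSCLC-Radiomics-Genomics": 20,
    "Pancreas-CT": 58, "QIN-HEADNECK": 94, "Soft-tissue-Sarcoma": 6,
    "TCGA-HNSC": 1, "TCGA-LIHC": 33, "TCGA-LUAD": 2, "TCGA-LUSC": 3,
    "TCGA-STAD": 2, "TCGA-UCEC": 1,
}

total_series = sum(CASES_PER_COLLECTION.values())
n_collections = len(CASES_PER_COLLECTION)
print(f"{n_collections} collections, {total_series} CT series")
```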

A script to download and resample the images is provided in our GitHub repository: https://github.com/UMEssen/saros-dataset

The annotations are provided in NIfTI format and were performed on 5 mm slice thickness. The annotation files define foreground labels on the same axial slices and match pixel-perfect. In total, 13 semantic body regions and 6 body part labels were annotated, each with an index that corresponds to a numeric value in the segmentation file.

Body Regions

  1. Subcutaneous Tissue
  2. Muscle
  3. Abdominal Cavity
  4. Thoracic Cavity
  5. Bones
  6. Parotid Glands
  7. Pericardium
  8. Breast Implant
  9. Mediastinum
  10. Brain
  11. Spinal Cord
  12. Thyroid Glands
  13. Submandibular Glands

Body Parts

  1. Torso
  2. Head
  3. Right Leg
  4. Left Leg
  5. Right Arm
  6. Left Arm
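The two numbered lists above can be written down as lookup tables that translate numeric values in a segmentation array into label names. This is a minimal sketch: it assumes that the value 0 denotes background (not stated explicitly above) and that the array has already been read from the NIfTI file. The function name `label_voxel_counts` is illustrative.

```python
import numpy as np

# Index -> name, following the numbered lists above.
BODY_REGIONS = {
    1: "Subcutaneous Tissue", 2: "Muscle", 3: "Abdominal Cavity",
    4: "Thoracic Cavity", 5: "Bones", 6: "Parotid Glands",
    7: "Pericardium", 8: "Breast Implant", 9: "Mediastinum",
    10: "Brain", 11: "Spinal Cord", 12: "Thyroid Glands",
    13: "Submandibular Glands",
}
BODY_PARTS = {
    1: "Torso", 2: "Head", 3: "Right Leg",
    4: "Left Leg", 5: "Right Arm", 6: "Left Arm",
}
BACKGROUND = 0      # assumption: 0 is background
IGNORE_LABEL = 255  # unannotated slices


def label_voxel_counts(seg, names):
    """Count voxels per named label, skipping background and ignore."""
    values, counts = np.unique(seg, return_counts=True)
    result = {}
    for value, count in zip(values.tolist(), counts.tolist()):
        if value in (BACKGROUND, IGNORE_LABEL):
            continue
        result[names.get(value, f"unknown ({value})")] = count
    return result


# Toy 2D example standing in for one body-region segmentation.
toy_seg = np.array([[0, 1, 1],
                    [2, 255, 5]], dtype=np.uint8)
region_counts = label_voxel_counts(toy_seg, BODY_REGIONS)
print(region_counts)
```

With a real file, the array could be obtained with a NIfTI reader such as nibabel, for example `np.asanyarray(nibabel.load(path).dataobj)`.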

The labels which were modified or require further commentary are listed and explained below:

  • Subcutaneous Adipose Tissue: The cutis was included in this label because it is hard to differentiate in 5 mm CT.
  • Muscle: All muscular tissue was segmented contiguously and not separated into single muscles. Thus, fascias and intermuscular fat were included in the label. Inter- and intramuscular fat is subtracted automatically in the process.
  • Abdominal Cavity: This label includes the pelvis. The label does not separate between the positional relationships of the peritoneum.
  • Mediastinum: The International Thymic Malignancy Group (ITMIG) scheme was used for the segmentation guidelines.
  • Head + Neck: The neck is confined by the base of the trapezius muscle.
  • Right + Left Leg: The legs are separated from the torso by the line between the two lowest points of the Rami ossa pubis.
  • Right + Left Arm: The arms are separated from the torso by the diagonal between the most lateral point of the acromion and the tuberculum infraglenoidale.

For reproducibility on downstream tasks, five cross-validation folds and a test set were pre-defined and are described in the provided spreadsheet. Segmentation was conducted strictly in accordance with anatomical guidelines and was modified only where needed to improve segmentation efficiency.
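The pre-defined folds can be turned into training and validation lists with a few lines of standard-library Python. The sketch below is hedged: the column names `id` and `split`, the split values, and the case identifiers are hypothetical placeholders, so they need to be adapted to the actual header of the provided spreadsheet.

```python
import csv
import io
from collections import defaultdict

# Hypothetical excerpt; the real spreadsheet's columns and values may differ.
SAMPLE_CSV = """id,split
case_000,fold-1
case_001,fold-2
case_002,test
case_003,fold-1
"""


def cases_by_split(csv_file, id_col="id", split_col="split"):
    """Group case identifiers by their pre-defined split."""
    groups = defaultdict(list)
    for row in csv.DictReader(csv_file):
        groups[row[split_col]].append(row[id_col])
    return dict(groups)


def train_val_for_fold(groups, val_fold, test_name="test"):
    """Use one fold for validation, the other folds for training."""
    train = [case
             for name, cases in groups.items()
             if name not in (val_fold, test_name)
             for case in cases]
    return train, list(groups.get(val_fold, []))


groups = cases_by_split(io.StringIO(SAMPLE_CSV))
train_ids, val_ids = train_val_for_fold(groups, val_fold="fold-1")
print(train_ids, val_ids)
```

Keeping the test split out of both lists means it is used only for the final evaluation.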

Benefit to Researchers: Researchers can build upon this data to develop fully automated body composition analysis pipelines or use this data for different purposes in medical image segmentation.

Acknowledgements

We would like to acknowledge the individuals and institutions that have provided data for this collection:


To the entire annotation lab team at the Institute for Artificial Intelligence in Medicine (IKIM, University Hospital Essen), we express our profound gratitude for your meticulous efforts in data segmentation. Your dedication ensures accuracy and efficiency, paving the way for this collection. Thank you for your invaluable contribution.

To all collections that shared their data and made it possible for us to prepare the segmentations: thank you! Your contributions made it possible to provide an openly available segmentation dataset for CT-based body composition analysis.


Data Access

Data Type | Download all or Query/Filter | License
SAROS Segmentations (NIfTI, 140 MB) | https://wiki.cancerimagingarchive.net/download/attachments/157287899/SAROS%20Collection%20NIfTI%20files%20-%209-12-2023.zip?api=v2 | CC BY 4.0
Segmentation Information Spreadsheet (CSV, 176 KB) | https://wiki.cancerimagingarchive.net/download/attachments/157287899/Segmentation%20Info_09-29-2023.csv?api=v2 | CC BY 4.0

Click the Versions tab for more info about data releases.

Additional Resources for this Dataset


The following external resources have been made available by the data submitters.  These are not hosted or supported by TCIA, but may be useful to researchers utilizing this collection.

  • Software / Code on GitHub: https://github.com/UMEssen/saros-dataset

    Collections Used in this Third Party Analysis

    Below is a list of the Collections used in these analyses. All source image downloads require the NBIA Data Retriever.

    Source Data Type | Download all or Query/Filter | License
    Source Images (DICOM, 29.7 GB): ACRIN-HNSCC-FDG-PET/CT (48), Anti-PD-1_MELANOMA (2), HNSCC (17), Head-Neck Cetuximab (12), QIN-HEADNECK (94), TCGA-HNSC (1) | https://wiki.cancerimagingarchive.net/download/attachments/157287899/SAROS_SourceImages_TCIARestricted_9-12-2023.tcia?api=v2 | TCIA Restricted License
    Source Images (DICOM, 66.9 GB): ACRIN-FLT-Breast (32), ACRIN-NSCLC-FDG-PET (129), Anti-PD-1_Lung (12), C4KC-KiTS (175), CPTAC-CM (1), CPTAC-LSCC (3), CPTAC-LUAD (1), CPTAC-PDA (8), CPTAC-UCEC (26), LIDC-IDRI (133), NSCLC Radiogenomics (7), Pancreas-CT (58), Soft-tissue-Sarcoma (6), TCGA-LIHC (33), TCGA-LUAD (2), TCGA-LUSC (3), TCGA-STAD (2), TCGA-UCEC (1) | https://wiki.cancerimagingarchive.net/download/attachments/157287899/SAROS_SourceImages_CCby30_9-12-2023.tcia?api=v2 | CC BY 3.0
    Source Images (DICOM, 4.63 GB): NSCLC-Radiomics (56), NSCLC-Radiomics-Genomics (20) | https://wiki.cancerimagingarchive.net/download/attachments/157287899/SAROS_SourceImages_CCbyNC30_9-12-2023.tcia?api=v2 | CC BY-NC 3.0
    Source Images (DICOM, 1.40 GB): COVID-19-NY-SBU (1), Lung-PET-CT-Dx (17) | https://wiki.cancerimagingarchive.net/download/attachments/157287899/SAROS_SourceImages_CCby40_9-12-2023.tcia?api=v2 | CC BY 4.0


    Detailed Description

    Image Statistics | Radiology Image Statistics
    Modalities | CT
    Number of Patients | 882
    Number of Studies | 894
    Number of Series | 900
    Number of Images | 1800
    Images Size (MB) | 140



    Citations & Data Usage Policy

    Data Citation

    Koitka, S., Baldini, G., Kroll, L., van Landeghem, N., Haubold, J., Sung Kim, M., Kleesiek, J., Nensa, F., & Hosch, R. (2023). SAROS - A large, heterogeneous, and sparsely annotated segmentation dataset on CT imaging data (SAROS) (Version 1) [Data set]. The Cancer Imaging Archive. https://doi.org/10.25737/SZ96-ZG60

    TCIA Citation

    Clark, K., Vendt, B., Smith, K., Freymann, J., Kirby, J., Koppel, P., Moore, S., Phillips, S., Maffitt, D., Pringle, M., Tarbox, L., & Prior, F. (2013). The Cancer Imaging Archive (TCIA): Maintaining and Operating a Public Information Repository. In Journal of Digital Imaging (Vol. 26, Issue 6, pp. 1045–1057). Springer Science and Business Media LLC. https://doi.org/10.1007/s10278-013-9622-7

    Other Publications Using This Data

    TCIA maintains a list of publications which leverage TCIA data. If you have a manuscript you'd like to add please contact the TCIA Helpdesk.


    Versions

    Version 1 (Current): Updated 2023/09/28

    Data Type | Download all or Query/Filter | License
    SAROS Segmentations (NIfTI, 140 MB) | https://wiki.cancerimagingarchive.net/download/attachments/157287899/SAROS%20Collection%20NIfTI%20files%20-%209-12-2023.zip?api=v2 | CC BY 4.0
    Segmentation Information Spreadsheet (CSV, 176 KB) | https://wiki.cancerimagingarchive.net/download/attachments/157287899/Segmentation%20Info_09-29-2023.csv?api=v2 | CC BY 4.0