# Breast and FGT MRI Segmentation Training and Test Data

This folder contains supplemental data for the paper "Fully Automated Deep Learning Method for Fibroglandular Tissue Segmentation in Breast MRI". The objective of the study was to develop and validate a fully automated deep-learning method, based on a convolutional neural network (CNN), for breast and fibroglandular tissue (FGT) segmentation and quantification, and to publicly share the annotation masks, radiologist scores, and all code used in the study. The data in this folder are provided in a format that supplements the data in the TCIA database.

The image slices were taken from fat-saturated gradient echo T1-weighted pre-contrast MRI studies in the Duke Breast Cancer MRI dataset (https://doi.org/10.7937/TCIA.e3sv-re93). For each volume, three image slices were selected for breast segmentation and three image slices were selected for FGT segmentation. In order to have a comprehensive MRI dataset, we applied the following selection rules for image slices: 
	- To select images for breast segmentation, the MRI volume was evenly divided into thirds, and one image slice was randomly selected from each third.
	- To select images for FGT segmentation, the MRI volume was evenly divided into fourths; two image slices were randomly selected from the middle half (since most fibroglandular tissue is concentrated in the middle of the breast), and one image slice was randomly selected from the first or last quarter.
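
The selection rules above can be sketched in Python as follows. This is a minimal illustration, not the code used in the study. The function names, the zero-based slice indices, the use of integer division to place the boundaries when the slice count is not divisible by 3 or 4, and the minimum slice counts are all assumptions.

```python
import random


def split_ranges(num_slices, parts):
    """Split indices 0..num_slices-1 into `parts` contiguous, near-equal ranges."""
    bounds = [i * num_slices // parts for i in range(parts + 1)]
    return [range(bounds[i], bounds[i + 1]) for i in range(parts)]


def select_breast_slices(num_slices, rng=random):
    """Pick one random slice index from each third of the volume."""
    if num_slices < 3:
        raise ValueError("need at least 3 slices")
    return [rng.choice(r) for r in split_ranges(num_slices, 3)]


def select_fgt_slices(num_slices, rng=random):
    """Pick two random slices from the middle half and one from the outer quarters."""
    if num_slices < 4:
        raise ValueError("need at least 4 slices")
    q1, q2, q3, q4 = split_ranges(num_slices, 4)
    middle = list(q2) + list(q3)   # second and third quarters
    outer = list(q1) + list(q4)    # first or last quarter
    return sorted(rng.sample(middle, 2) + [rng.choice(outer)])
```

Passing a seeded `random.Random` instance as `rng` makes a selection repeatable. The README does not say whether the original selection was seeded.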

The dataset contains all the data used in the study (training, validation, and test). The validation set was 10% (10 subjects) of the full training set, chosen randomly before model training began, leaving 90 subjects for training. There are 27 test cases, which were used for model verification. Breast radiologist assessments are available only for the test cases and for an additional 23 cases that are not part of the training, validation, or test sets; together these form an extended test set of 50 cases.
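
A random 90/10 split of the kind described above could look like the sketch below. It is an illustration only: the function name, the fixed seed, and the placeholder subject IDs are assumptions, and the actual validation subjects chosen in the study are not reproduced by this code.

```python
import random


def split_train_validation(subject_ids, validation_fraction=0.1, seed=0):
    """Randomly hold out a fraction of subjects for validation."""
    ids = sorted(subject_ids)          # fixed starting order for repeatability
    random.Random(seed).shuffle(ids)   # seed value is an arbitrary assumption
    n_val = round(len(ids) * validation_fraction)
    return ids[n_val:], ids[:n_val]


# Placeholder IDs for illustration; the real ones are listed in train_ids.csv.
example_ids = [f"subject_{i:03d}" for i in range(100)]
train_ids, validation_ids = split_train_validation(example_ids)
print(len(train_ids), len(validation_ids))  # 90 10
```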

The segmentation masks are stored in the DICOM file format. Each pixel in the image array is either 0 or 1, indicating the absence or presence of the target tissue (breast or FGT). All segmentation masks were reviewed and verified by fellowship-trained breast radiologists.

Additionally, this folder contains "segmentation_filepath_mapping.csv", which can be used to map the segmentations in TCIA to the slices that were used to generate them. The slices were looked up by examining the following DICOM header information: PerFrameFunctionalGroupsSequence -> DerivationImageSequence -> SourceImageSequence -> ReferencedSOPInstanceUID.
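
The header lookup above can be sketched as a short traversal. The function name is hypothetical, and the README does not say which library was used to read the headers. The code assumes only an object that exposes the DICOM keywords as attributes and the sequences as iterables, which is how a dataset returned by `pydicom.dcmread` behaves.

```python
def referenced_sop_instance_uids(seg_dataset):
    """Collect every ReferencedSOPInstanceUID reachable from the per-frame groups."""
    uids = []
    for frame in seg_dataset.PerFrameFunctionalGroupsSequence:
        for derivation in frame.DerivationImageSequence:
            for source in derivation.SourceImageSequence:
                uids.append(source.ReferencedSOPInstanceUID)
    return uids


# Possible usage, assuming pydicom is installed and `path` points to a
# segmentation file downloaded from TCIA:
#   import pydicom
#   uids = referenced_sop_instance_uids(pydicom.dcmread(path))
```

Each returned UID identifies one source image slice, which can then be matched against the SOPInstanceUID of the slices in the original MRI series.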

The segmentation mask files in this supplemental folder have the following naming scheme:

Breast_MRI_{patient_ID}_pre_{slice_number}.nrrd
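
A file name in this scheme can be split into its parts with a regular expression. The sketch below is illustrative: the function name and the example file name are made up, and it assumes that the slice number is a plain integer and that the patient ID is kept as a string so any leading zeros survive.

```python
import re

MASK_NAME = re.compile(
    r"^Breast_MRI_(?P<patient_id>.+)_pre_(?P<slice_number>\d+)\.nrrd$"
)


def parse_mask_filename(filename):
    """Return (patient_id, slice_number) from a mask file name."""
    match = MASK_NAME.match(filename)
    if match is None:
        raise ValueError(f"unexpected mask file name: {filename!r}")
    return match.group("patient_id"), int(match.group("slice_number"))


# Hypothetical file name, for illustration only.
print(parse_mask_filename("Breast_MRI_001_pre_45.nrrd"))  # ('001', 45)
```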

# File Directory

.
├── Segmentation_Masks_DICOM                       # Directory containing true segmentation masks
│   ├── breast_train                               # Breast tissue training segmentation masks
│   ├── breast_test                                # Breast tissue testing segmentation masks
│   ├── dense_tissue_train                         # Dense tissue (FGT) training segmentation masks
│   └── dense_tissue_test                          # Dense tissue (FGT) testing segmentation masks
├── train_ids.csv                                  # Subject IDs for full training set
├── test_ids.csv                                   # Subject IDs for test set
├── Breast_Radiologist_Density_Assessments.xlsx    # Excel file containing breast radiologist BI-RADS breast density assessments
├── segmentation_filepath_mapping.csv              # CSV file that maps TCIA segmentation data to subject slices
└── README.txt