Detailed Description

| Image Statistics | |
|---|---|
| Modalities | Pathology |
| Number of Patients | 273 |
| Number of Images | 273 |
| Image Size (GB) | 40 |
Description of data sets:

Yale HER2 cohort: This dataset comprises 188 H&E slides of HER2-positive and HER2-negative invasive breast carcinomas from the Yale Pathology electronic database. All tissues and data were retrieved with permission under Yale Human Investigation Committee protocol #9505008219 to DLR. HER2-positive cases were defined as those with an immunohistochemistry (IHC) score of 3+, or an equivocal (2+) IHC score with subsequent amplification by fluorescence in situ hybridization (FISH), as specified in the American Society of Clinical Oncology/College of American Pathologists (ASCO/CAP) clinical practice guidelines. The H&E slides generated at Yale School of Medicine comprise 93 HER2+ and 95 HER2- slides. The slides were scanned at Yale Pathology Tissue Services and passed a slide quality check before scanning. The tissue samples were scanned with a PerkinElmer Vectra Polaris scanner in bright-field whole-slide mode at 20× magnification in the Rimm lab at Brady Memorial Laboratory.

Yale trastuzumab response cohort: The response cohort cases were also identified by a retrospective search of the Yale Pathology electronic database. Cases included patients who had a pre-treatment breast core biopsy showing HER2-positive invasive breast carcinoma and who then received neoadjuvant targeted therapy with trastuzumab, with or without pertuzumab, before definitive surgery. HER2 positivity was defined as described above for the Yale HER2 cohort. The response to targeted therapy was obtained from the pathology reports of the surgical resection specimens and dichotomized into responders and non-responders. Patients with a complete pathologic response, defined as no residual invasive carcinoma, lymphovascular invasion, or metastatic carcinoma, were designated as responders (n=36). Cases with only residual in situ carcinoma were included in the responder category.
Cases with any amount of residual invasive carcinoma, lymphovascular invasion, or metastatic carcinoma were categorized as non-responders (n=49).

TCGA HER2 cohort: A total of 668 TCGA-BRCA HER2+/- samples with available HER2 status were downloaded from the GDC portal. Our pathology team visually inspected the slides and excluded low-quality samples with tissue folding and those that appeared to be from frozen tissue. A total of 187 samples (92 HER2- and 95 HER2+) were retained for use as an independent test set, and their tumor ROIs were annotated by our pathology team.

Data annotation: A senior breast pathologist annotated the digital slides by circling areas of invasive carcinoma (regions of interest, ROIs) corresponding to the HER2+ or HER2- tumor area (TA). Regions of necrosis, in situ carcinoma, and benign stroma and epithelium were excluded. The annotations, which mark tumor boundaries, were drawn in Aperio ImageScope and exported from it in Extensible Markup Language (XML) format, including the X and Y coordinates of each annotated region. For each slide, we used these coordinates to tile the annotated regions separately from the rest of the image, labeling the tiles as HER2+ or HER2-. These data can be used to train deep-learning classifiers of HER2 status and to predict trastuzumab response. Manual annotation of ROIs can substantially improve prediction accuracy and reduce the need for very large datasets.
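As a rough illustration of the tiling step described under Data annotation, the sketch below parses ROI polygons from an ImageScope XML export and keeps the grid tiles whose centre falls inside each polygon. It assumes the usual ImageScope layout (`Annotations > Annotation > Regions > Region > Vertices > Vertex` with `X`/`Y` attributes) and that coordinates are full-resolution (level-0) pixels; the function names, the 256-pixel tile size, and the centre-in-polygon rule are hypothetical choices for this example, not the procedure used to build this collection.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

Point = Tuple[float, float]


def parse_aperio_regions(xml_text: str) -> List[List[Point]]:
    """Return one polygon (list of (x, y) vertices) per <Region> element.

    Assumes the Aperio ImageScope layout:
    <Annotations><Annotation><Regions><Region><Vertices><Vertex X=".." Y=".."/>
    """
    root = ET.fromstring(xml_text)
    polygons = []
    for region in root.iter("Region"):
        verts = [(float(v.get("X")), float(v.get("Y"))) for v in region.iter("Vertex")]
        if len(verts) >= 3:  # a closed polygon needs at least three vertices
            polygons.append(verts)
    return polygons


def point_in_polygon(x: float, y: float, poly: List[Point]) -> bool:
    """Even-odd (ray casting) test: count edges crossed by a ray towards +x."""
    inside = False
    n = len(poly)
    for i in range(n):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge spans the horizontal line at y (so y1 != y2)
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def tiles_in_roi(poly: List[Point], tile_size: int = 256) -> List[Tuple[int, int]]:
    """Top-left corners of grid tiles whose centre lies inside the polygon.

    The grid is aligned to multiples of tile_size so tiles from different
    regions on the same slide never partially overlap (hypothetical choice).
    """
    xs = [p[0] for p in poly]
    ys = [p[1] for p in poly]
    x0 = (int(min(xs)) // tile_size) * tile_size
    y0 = (int(min(ys)) // tile_size) * tile_size
    x1, y1 = int(max(xs)), int(max(ys))
    tiles = []
    for ty in range(y0, y1, tile_size):
        for tx in range(x0, x1, tile_size):
            cx, cy = tx + tile_size / 2, ty + tile_size / 2
            if point_in_polygon(cx, cy, poly):
                tiles.append((tx, ty))
    return tiles


# Usage with a tiny hand-written export (one rectangular ROI, 1024 x 512 px).
SAMPLE_XML = """<Annotations><Annotation Id="1"><Regions><Region Id="1"><Vertices>
<Vertex X="0" Y="0"/><Vertex X="1024" Y="0"/><Vertex X="1024" Y="512"/><Vertex X="0" Y="512"/>
</Vertices></Region></Regions></Annotation></Annotations>"""

polys = parse_aperio_regions(SAMPLE_XML)
tiles = tiles_in_roi(polys[0], tile_size=256)
# Every tile inherits the slide-level class label (HER2+ or HER2-).
labelled = [(t, "HER2+") for t in tiles]
```

Reading the pixels for each tile could then be done with a whole-slide image reader such as OpenSlide (`OpenSlide.read_region`), provided it supports the slide's file format.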
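The HER2 positivity criterion and the response dichotomization defined for the Yale cohorts can be written as simple labeling rules, which may help when relating clinical fields to class labels. This is a minimal sketch: the function and parameter names are hypothetical, and treating an equivocal 2+ case without a FISH result as undetermined is an assumption not stated in the cohort description.

```python
from typing import Optional


def her2_status(ihc_score: int, fish_amplified: Optional[bool] = None) -> Optional[str]:
    """HER2 label: IHC 3+ is positive; IHC 2+ (equivocal) is positive only
    with FISH amplification; IHC 0 or 1+ is treated as negative."""
    if ihc_score not in (0, 1, 2, 3):
        raise ValueError(f"IHC score must be 0-3, got {ihc_score}")
    if ihc_score == 3:
        return "HER2+"
    if ihc_score == 2:
        if fish_amplified is None:
            return None  # assumption: equivocal without FISH is left undetermined
        return "HER2+" if fish_amplified else "HER2-"
    return "HER2-"


def response_category(residual_invasive: bool, lymphovascular_invasion: bool,
                      metastatic: bool, residual_in_situ: bool = False) -> str:
    """Dichotomized trastuzumab response from the resection pathology findings.

    residual_in_situ is accepted for clarity but never changes the outcome:
    cases with only residual in situ carcinoma count as responders.
    """
    if residual_invasive or lymphovascular_invasion or metastatic:
        return "non-responder"
    return "responder"
```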