iMerit Technology Services, SegMed, and Advocate Health have jointly released a high-quality, annotated breast cancer imaging dataset to accelerate AI research and model development for early detection, diagnosis, and treatment planning. The dataset includes thousands of de-identified cases spanning mammography, ultrasound, and MRI, with expert annotations and pathology correlation, and aims to address the data scarcity and bias that currently limit AI performance in diverse populations.
Glimpse:
Unveiled on January 28, 2026, the open-access breast cancer dataset contains over 5,000 curated cases with multi-modal imaging (mammography, ultrasound, MRI), detailed annotations (lesion location, BI-RADS scoring, pathology correlation), and demographic diversity reflecting real-world patient populations. Released under a permissive license for non-commercial research, it is now publicly available via iMerit’s data hub and SegMed’s platform, with expected use in training and validating AI models for breast cancer screening, risk stratification, and diagnostic assistance.
iMerit Technology Services, SegMed, and Advocate Health have collaborated to release one of the most comprehensive open breast cancer imaging datasets to date, designed specifically to support AI research and innovation in oncology. The announcement, made on January 28, 2026, addresses a critical barrier in medical AI development: the scarcity of large, high-quality, well-annotated datasets that represent diverse patient populations.
The dataset includes more than 5,000 de-identified cases contributed from Advocate Health’s clinical archives and annotated by board-certified radiologists and pathologists using iMerit’s human-in-the-loop annotation platform. It encompasses:
- Full-field digital mammograms (2D and 3D tomosynthesis)
- Breast ultrasound images
- Select breast MRI sequences
- Corresponding pathology reports and BI-RADS assessments
- Rich metadata including age, ethnicity, breast density, family history, and hormonal status
Annotations cover lesion localisation (bounding boxes, segmentation masks), classification (benign/malignant, mass/calcification/asymmetry), BI-RADS scoring, and histopathological correlation where available. The dataset has been carefully curated to include underrepresented groups and challenging cases (dense breasts, subtle findings, early-stage lesions) to help reduce bias and improve model generalisability.
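For researchers planning to work with annotations of this kind, the following is a minimal Python sketch of how one lesion record might be represented and how subgroup representation could be checked. The announcement does not describe the release's actual file format or schema, so every class, field, and function name here (`LesionAnnotation`, `bbox`, `breast_density`, `count_by_density`) is a hypothetical placeholder rather than part of the dataset.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional, Tuple

# NOTE: hypothetical schema for illustration only; the real dataset's
# format is not specified in the announcement.


@dataclass
class LesionAnnotation:
    modality: str                      # e.g. "mammography", "ultrasound", "mri"
    bbox: Tuple[int, int, int, int]    # (x_min, y_min, x_max, y_max) in pixels
    finding: str                       # "mass", "calcification", or "asymmetry"
    birads: int                        # BI-RADS assessment category, 0 through 6
    malignant: Optional[bool] = None   # None when no pathology correlation exists

    def __post_init__(self):
        x_min, y_min, x_max, y_max = self.bbox
        if not (x_min < x_max and y_min < y_max):
            raise ValueError("bbox must have positive width and height")
        if not 0 <= self.birads <= 6:
            raise ValueError("BI-RADS category must be between 0 and 6")


def count_by_density(cases):
    """Count cases per breast-density label to inspect subgroup balance."""
    return Counter(case["breast_density"] for case in cases)
```

Validating records on load and tallying cases per subgroup before training are simple ways to confirm that a model is actually seeing the dense-breast and other challenging cases the curation is meant to provide.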
The release is fully open for non-commercial research use under a permissive license, with data hosted on iMerit’s secure data hub and accessible through SegMed’s AI research platform. Researchers can download the dataset or access it via API for training, validation, and benchmarking of AI models for breast cancer detection, risk prediction, triage, and diagnostic support.
Dr. [Lead Researcher/Spokesperson from Advocate Health] said: “Breast cancer outcomes improve dramatically with early detection, but AI models need diverse, high-quality data to perform reliably across populations. By making this dataset openly available, we’re enabling the global research community to build more accurate, equitable, and clinically useful tools.”
iMerit leadership added: “High-quality annotated data is the fuel for AI innovation in healthcare. This collaboration demonstrates how strategic partnerships can accelerate progress by removing one of the biggest bottlenecks: access to real-world, expertly labelled medical imaging.”
The dataset is expected to become a benchmark resource for breast cancer AI research, similar to earlier landmark datasets like CheXpert or MIMIC-CXR, but with a focus on mammography/ultrasound and diverse representation. It will support development of screening tools, CAD systems, risk stratification models, and explainable AI solutions that could eventually reach clinical deployment.
“Open data is the foundation of trustworthy AI in medicine. By releasing this high-quality breast cancer dataset, we’re empowering researchers worldwide to create tools that can detect cancer earlier and save more lives, especially in underserved populations.”
By HB Team
