Motivation

Breast cancer is one of the leading causes of cancer-related death worldwide, particularly among women. To reduce mortality, mammography has been used as the primary screening tool for early cancer detection since the 1970s.

With the development of Artificial Intelligence, computer-aided detection and diagnosis systems have been developed to help radiologists in breast screening. With the advent of deep learning, many systems in academia and industry claim performance equal or superior to that of human observers. However, the end user (clinician, screening programme, etc.) has no real-world information to substantiate such claims. To bridge the gap between the envisaged results of AI systems and the real screening environment, this Challenge specifically targets the validation of AI systems.

The question in this Challenge is: can current AI systems replace the radiologist in screening?

To answer this question, we propose to use carefully selected, challenging recent screening cases from the UK PERFORMS scheme.

To maintain screening quality in the UK, radiologists are regularly assessed using carefully selected, challenging recent screening cases. The PERFORMS programme is an annually organised scheme that assesses the reporting skills of all UK breast radiologists. Up to 1,000 skilled radiologists participate in this programme each year, and the full-field digital mammography (FFDM) exam cases used are randomly selected from the UK screening exam pool of over 10 million cases.

A detailed comparison between the performance of radiologists and machine systems on the same set of cases can elucidate the advantages and disadvantages of both groups and thereby guide future screening research and the development of computer methods for mammogram interpretation.

Other Challenges of breast screening algorithms have used sets of recent screening cases that have typically been interpreted by only one or a few radiologists. This Challenge instead uses a dataset of known difficult cases specially selected from the UK screening programme. All of these cases have been reported in routine screening practice by one or more screening radiologists and have additionally been (1) read and selected as difficult and challenging by a panel of expert breast screening radiologists in the UK and (2) read and reported by circa 1,000 UK radiologists. Thus, for every case in the Challenge there are expert opinions as well as a range of opinions from the bank of radiologist readers against which the decisions of algorithms can be compared. With this Challenge, we hope to answer the question of the extent to which the performance of UK screening radiologists can be replaced, or enhanced, by Artificial Intelligence in screening.

Materials & Method

We welcome any AI system developer in academia or industry to participate in this Challenge. Developers are not required to upload their systems; only a description of the system/algorithm and the test results need to be submitted.

We have selected cases from our PERFORMS certified case data pool. The same cases have been read by up to 1,000 radiologists from the UK breast screening programme. AI developers need to upload the cancer likelihood score that their algorithm assigns to each examination case. Please note that only processed mammograms are provided, as these are the most common format stored in PACS.
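To make the expected output concrete, the sketch below writes one likelihood score per examination case to a CSV file. The file layout, column names, and case identifiers are illustrative assumptions only; the exact submission format is defined by the Challenge organisers.

```python
# Hypothetical example only: the column names, identifiers, and file layout
# below are assumptions, not the official submission specification.
import csv

def write_submission(case_scores, path="submission.csv"):
    """case_scores: dict mapping an exam-case identifier to the cancer
    likelihood score in [0, 1] produced by the algorithm."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["case_id", "cancer_likelihood"])
        for case_id, score in sorted(case_scores.items()):
            writer.writerow([case_id, f"{score:.6f}"])

# Example: write_submission({"case_0001": 0.12, "case_0002": 0.91})
```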

We will directly compare ROC curves, the partial area under the ROC curve, and the specificity at the average correct recall rate of UK breast radiologists as comparative measurements. A combined score of these measures will be used to rank the algorithms. The winner of this Challenge will be endorsed by the PERFORMS programme as a system aid for screening if the algorithm's score is better than that of the average radiologist. It is planned that the Challenge results will be presented at leading radiology conferences, such as RSNA and ECR, and published in leading journals.
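As a rough illustration of these measures (not the official evaluation code), the sketch below computes the full ROC AUC, a partial AUC over a low false-positive range, and the specificity at a chosen sensitivity. The false-positive cut-off and the radiologist recall rate used here are placeholder values, not parameters defined by the Challenge.

```python
# Minimal sketch of the three measures, assuming binary ground-truth labels
# and per-case likelihood scores. The pAUC range and the radiologist recall
# rate are placeholder assumptions, not values specified by the Challenge.
import numpy as np
from sklearn.metrics import roc_curve, roc_auc_score

def evaluate(y_true, y_score, radiologist_recall=0.90, pauc_fpr_max=0.2):
    fpr, tpr, _ = roc_curve(y_true, y_score)

    # Full area under the ROC curve.
    auc = roc_auc_score(y_true, y_score)

    # Partial AUC over the low false-positive range [0, pauc_fpr_max],
    # via trapezoidal integration of the interpolated ROC curve.
    grid = np.linspace(0.0, pauc_fpr_max, 200)
    pauc = np.trapz(np.interp(grid, fpr, tpr), grid)

    # Specificity (1 - FPR) at the first operating point whose sensitivity
    # reaches the average correct recall rate of the radiologist readers.
    idx = min(int(np.searchsorted(tpr, radiologist_recall)), len(fpr) - 1)
    specificity = 1.0 - fpr[idx]

    return {"auc": auc, "partial_auc": pauc,
            "specificity_at_radiologist_recall": specificity}
```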

Important dates

Challenge Website Launched: August 5, 2019
Validation Data Release (Images + Ground Truth): August 20, 2019
Site Open for Validation Submissions: August 31, 2019
Test Data Release (Images Only): October 5, 2019
Site Open for Test Submissions: October 20, 2019
Results Release: November 5, 2019

Sponsors and partners