Performance Evaluation

The performance is evaluated by measuring the area under the receiver operating characteristic (ROC) curve (AUC) and the specificity at the recall rate of the average UK breast radiologist. The final score is 100 x (AUC + specificity at that recall rate), so the maximum score is 200. Note that the evaluation is performed at breast level: each breast is treated as one independent subject.
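The scoring above can be sketched as follows. The actual recall rate of the average UK breast radiologist is not stated here, so the 0.85 default below is a placeholder assumption, and all function names are illustrative:

```python
import math

def auc(labels, scores):
    """Area under the ROC curve via the Mann-Whitney statistic:
    the probability that a positive breast outscores a negative one
    (ties count as half a win)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def specificity_at_recall(labels, scores, target_recall):
    """Specificity at the lowest threshold whose sensitivity still
    reaches target_recall."""
    pos = sorted((s for s, y in zip(scores, labels) if y == 1), reverse=True)
    k = math.ceil(target_recall * len(pos))   # positives that must be recalled
    threshold = pos[k - 1]                    # score of the k-th best positive
    neg = [s for s, y in zip(scores, labels) if y == 0]
    return sum(s < threshold for s in neg) / len(neg)

def final_score(labels, scores, radiologist_recall=0.85):
    # radiologist_recall=0.85 is a placeholder, not the official value
    return 100 * (auc(labels, scores)
                  + specificity_at_recall(labels, scores, radiologist_recall))
```

A perfect ranking of positives above negatives yields 100 x (1 + 1) = 200, matching the stated maximum.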

Submission Result File

Each participant should upload a result file in ZIP format and a description file in PDF format. The ZIP file must be named "result.zip" and must contain one or two CSV files with the following names:

  • breast.csv: for the breast-level evaluation, which determines the final score.
  • finding.csv: optional; for potential further analysis, and not used for evaluation.

Each line of breast.csv contains a string indicating the exam number, a string "l" or "r" indicating the laterality of the breast, and one number indicating the cancer likelihood for this breast on a scale of 0-100. The values are comma (,) separated. If the first line is a header line, it is silently ignored; any other line not containing a valid exam number will cause the submission to be rejected. The test set exam numbers are 3 characters long, while the actual dataset exam numbers are 6 characters long.
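A minimal reader that mirrors these acceptance rules might look like this; the function name is illustrative, and the default 3-character exam-number length follows the test-set convention described above:

```python
import csv

def read_breast_csv(path, exam_len=3):
    """Parse breast.csv: skip a header line if present, reject any other
    line whose exam number is not a run of digits of the expected length."""
    rows = []
    with open(path, newline="") as f:
        for i, row in enumerate(csv.reader(f)):
            exam, side, likelihood = (field.strip() for field in row)
            if i == 0 and not exam.isdigit():
                continue  # a header line is silently ignored
            if not (exam.isdigit() and len(exam) == exam_len):
                raise ValueError(f"invalid exam number on line {i + 1}: {exam!r}")
            if side not in ("l", "r"):
                raise ValueError(f"invalid laterality on line {i + 1}: {side!r}")
            rows.append((exam, side, float(likelihood)))
    return rows
```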

In finding.csv, the first string is the exam number, the second string is the laterality (l or r), the third string is the projection (mlo or cc), the next two are the finding location x and y coordinates in pixel coordinates of the original image, and the last one is the cancer likelihood for this finding on the scale of 0-100. Here, too, the values are comma separated and a header line will be ignored. Participants are encouraged to submit this second, optional file.

An example breast.csv may look like this (with CSV parsed to table form):

Examid   Laterality  Cancer likelihood
0000001  l           99
0000002  r           100
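Writing the required file with Python's csv module guarantees the comma-separated layout; the predictions below are hypothetical values matching the example table:

```python
import csv

# Hypothetical per-breast predictions: (exam number, laterality, likelihood 0-100).
predictions = [
    ("0000001", "l", 99),
    ("0000002", "r", 100),
]

with open("breast.csv", "w", newline="") as f:
    writer = csv.writer(f)  # comma-separated, as the format requires
    writer.writerow(["Examid", "Laterality", "Cancer likelihood"])  # header is ignored
    for exam, side, likelihood in predictions:
        assert side in ("l", "r")
        assert 0 <= likelihood <= 100
        writer.writerow([exam, side, likelihood])
```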

The finding.csv file may look like this:

Examid   Laterality  Projection  x (pixel)  y (pixel)  Cancer likelihood
0000001  l           mlo         13         15         25
0000001  r           mlo         43         35         90
0000002  l           mlo         14         19         25
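Packaging the two CSV files into the required "result.zip" archive can be scripted as below; the inline CSV contents are placeholder rows taken from the examples above:

```python
import zipfile

# Bundle the required breast.csv and the optional finding.csv into result.zip,
# the exact archive name the submission system expects.
with zipfile.ZipFile("result.zip", "w", zipfile.ZIP_DEFLATED) as zf:
    zf.writestr("breast.csv",
                "Examid,Laterality,Cancer likelihood\n"
                "0000001,l,99\n")
    zf.writestr("finding.csv",
                "Examid,Laterality,Projection,x (pixel),y (pixel),Cancer likelihood\n"
                "0000001,l,mlo,13,15,25\n")
```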

Description of the Computer System

A PDF file describing the computer system shall be submitted together with the result. The PDF file can take the form of a scientific report describing the development of the submitted system (method, results, evaluation, etc.). For commercial systems, it is understandable that participants are not willing to give details of the implementation; in that case the general ideas of the system shall be provided, and the specific version of the product and the associated company shall be mentioned. Contact information shall be included in the description. Participants are encouraged to mention:

  • the (rough) number of positive exams used for training, including how many soft-tissue lesions and how many calcification groups, and the number of negative exams;
  • the vendors of the training exams;
  • a general overall and detailed explanation of the algorithm, including normalization, segmentation, candidate selection, feature computation, patch extraction, network architecture, training strategy, etc.

Data Sets

To reflect real screening situations, the exams used in this challenge come from various vendors (GE, Hologic, Siemens and Sectra). We have prepared two batches of datasets. The first batch of 40 breasts, together with its ground truth information, is provided in the first round to give participants an idea of what the data looks like. The second batch, used for the real evaluation of the systems, consists of 5 groups of datasets; each dataset includes 60 breasts. The proportion of cancers and benign lesions will not be released before the deadline.

Submission Rule

We have two rounds of submission.

The first round is for validating the correctness of the submission file. Participants are encouraged to download the validation cases (40 breasts) and upload a submission file. The score will be computed immediately by the server and shown on our website.

The second round is the real evaluation of the systems in this challenge. To prevent tuning of the algorithms against the ground truth, the results for this round will not be given immediately; the scores of all participants will be released after the submission deadline. There is no limit on the number of submissions from the same team. The latest score in the submission history will be the final score.