Challenge overview

The STOIC2021 challenge will be organised in two phases:

  1. A qualification round with a publicly available sample of the STOIC database.
    The use of this sample of the STOIC database is subject to the CC BY-NC 4.0 license.
  2. A final round, to which the best-performing teams will be invited. This round will use the same test set as the qualification round. For this round, each finalist will upload a Docker container that can train an improved model, using their model from the qualification round as a starting point. The code for this container must be released under a permissive open-source license. The container will be trained on the grand-challenge.org platform, hosted on AWS, in a secure and protected environment, with all scans from the entire STOIC dataset (minus the 1,000 test scans, of course). The resulting solutions will be made available on https://grand-challenge.org/algorithms/ for research use. Access to the STOIC database and to the finalists' Algorithms shall be subject to the signing of a specific agreement between AP-HP and each finalist, which notably specifies the disposition of Intellectual Property. The dispositions of the agreement shall be the same for all finalists.

The focus of the challenge is the prediction of COVID-19 severity at one month. The primary metric is the AUC for severity, computed on COVID-19-positive patients only. In all stages of the challenge, only this primary metric will be used to rank participants.
To provide additional feedback to participants, the AUC for COVID-19 RT-PCR test positivity will be assessed as a secondary metric on the Leaderboard.
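
For concreteness, here is a minimal sketch of how these two metrics could be computed with scikit-learn, assuming a hypothetical predictions table with one row per test patient; the column names (rt_pcr_positive, severe_at_one_month, prob_covid, prob_severe) are illustrative, not the challenge's actual evaluation interface:

    # Minimal sketch of the two ranking metrics; column names are hypothetical
    # and the official evaluation code on grand-challenge.org is authoritative.
    import pandas as pd
    from sklearn.metrics import roc_auc_score

    df = pd.read_csv("predictions.csv")  # one row per test patient (hypothetical file)

    # Primary metric: AUC for severe COVID-19 at one month,
    # computed on RT-PCR-positive patients only.
    positives = df[df["rt_pcr_positive"] == 1]
    auc_severity = roc_auc_score(positives["severe_at_one_month"], positives["prob_severe"])

    # Secondary metric (feedback only): AUC for RT-PCR test positivity, on all patients.
    auc_covid = roc_auc_score(df["rt_pcr_positive"], df["prob_covid"])

    print(f"Primary AUC (severity, COVID-19 positives only): {auc_severity:.3f}")
    print(f"Secondary AUC (RT-PCR positivity): {auc_covid:.3f}")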

Qualification round

Participants can download 2,000 CT scans (~20% of the dataset), randomly selected from the STOIC database, including the corresponding clinical labels (RT-PCR positivity, severity), for training their Algorithms. See the database page for instructions on how to download the public training set.

At the end of the qualification round, all teams must submit a GitHub repository including training and testing code, with a permissive open-source license (e.g., Apache 2.0, MIT). Their performance will be assessed using a test set consisting of ~1,000 extra CT scans.

The test data is kept secure on Grand Challenge (GC) and cannot be downloaded by participants. Participants upload a Docker inference container, in the form of a GC Algorithm, to test their ML model against the test data.
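
To give an idea of what such a container does, the sketch below shows a minimal inference entry point: it reads one CT scan from the container's input directory and writes the two predicted probabilities to the output directory. The /input and /output locations follow the usual GC convention, but the exact interface is defined by the challenge template; the output file names, the weights path, and the load_model/predict helpers here are assumptions.

    # Minimal sketch of an inference entry point for a GC Algorithm container.
    # Paths, output names, and the load_model/predict helpers are assumptions;
    # follow the official challenge template for the actual interface.
    import json
    from pathlib import Path

    import SimpleITK as sitk

    from my_model import load_model, predict  # hypothetical: your own package

    INPUT_DIR = Path("/input/images/ct/")  # assumed input location
    OUTPUT_DIR = Path("/output")

    def main():
        # GC runs the container on one case at a time.
        scan_path = next(INPUT_DIR.glob("*.mha"))
        image = sitk.ReadImage(str(scan_path))

        model = load_model("/opt/algorithm/weights.pth")  # hypothetical weights path
        prob_covid, prob_severe = predict(model, image)

        # Write one JSON file per prediction (names assumed).
        (OUTPUT_DIR / "probability-covid-19.json").write_text(json.dumps(float(prob_covid)))
        (OUTPUT_DIR / "probability-severe-covid-19.json").write_text(json.dumps(float(prob_severe)))

    if __name__ == "__main__":
        main()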

A tutorial on how to submit to the qualification round can be found here.

Qualification Leaderboard (open)

During the qualification round, participants can monitor their Algorithm's performance by submitting to the "Qualification" Leaderboard. This Leaderboard shows the performance of submitted Algorithms on a part of the test set (~200 scans). Teams are allowed one submission per week to this Leaderboard. The processing of a single CT scan by a submitted Algorithm should preferably take no more than 5 minutes on an NVIDIA T4 GPU (16 GB) with 8 CPUs (32 GB of memory).

Qualification (last submission) Leaderboard (open upon request)

Note: Now that STOIC2021 has concluded, submission to this leaderboard is possible upon request to the challenge organizers.

Each team may make one submission to the "Qualification (last submission)" Leaderboard. This submission will be tested on a separate test set (different from the one used for the "Qualification" Leaderboard). Submissions to the "Qualification (last submission)" Leaderboard do not count towards the "Qualification" Leaderboard submission limit.

The best 10 teams on this Leaderboard that outperform the baseline will be invited to the Final phase.

When submitting to the "Qualification (last submission)" Leaderboard, we request that teams additionally submit a PDF of at most one page explaining their methods. This PDF should contain details about the model used, preprocessing methods (if any), data augmentation, training strategy, etc. It will only be visible to the challenge organizers, not to other participants.

We will use these PDFs to document an overview of the methods used in the peer-reviewed article we intend to publish about STOIC2021, and to prepare the form that teams must fill in when submitting to the Final phase (see below).

Final round (closed)

The selected teams will submit their training code to the Final round of the challenge. This code will be used to train their Algorithm on both the public training set (2,000 CT scans + labels) and the private training data (7,000+ CT scans + labels) on the Grand Challenge platform.

Submitting to the Final phase

A tutorial on how to structure your code for submitting to the Final round is available here.

Submissions to the Final phase must be GitHub repositories including:

  • Training and inference code structured as described in the template.
  • A permissive open-source license in the root of the repository.

Teams will additionally be required to fill in a form about the methods they used. This form is still to be released; more details will be posted here later.

Training environment

Each of the finalist teams gets to make one last submission of their training code base, which will be trained on the full training set (9,000+ scans). This full training set consists of:

  • the 2,000 scans in the public training set,
  • the ~200 scans used for the "Qualification" Leaderboard,
  • the ~800 scans used for the "Qualification (last submission)" Leaderboard, and
  • an extra 6,000+ scans from the STOIC database.

The full training set will have the same structure as the public training set, and patients in the public training set will retain their patient IDs in the full training set.
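
In practice this means training code written against the public set's layout should work unchanged on the full set. As a rough sketch (the directory layout and CSV column names below are illustrative and should be verified against the downloaded public training set):

    # Minimal sketch of enumerating training cases; the directory layout and
    # CSV column names are illustrative and should be checked against the
    # downloaded public training set.
    from pathlib import Path
    import pandas as pd

    DATA_DIR = Path("/path/to/training/set")  # public or full set, same layout

    labels = pd.read_csv(DATA_DIR / "metadata" / "reference.csv")
    for _, row in labels.iterrows():
        patient_id = row["PatientID"]  # IDs carry over from the public set
        scan_path = DATA_DIR / "data" / "mha" / f"{patient_id}.mha"
        covid_label, severity_label = row["probCOVID"], row["probSevere"]
        # ... feed (scan_path, covid_label, severity_label) into your pipeline
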
Your algorithm submission will be trained with access to the following resources:

  • two NVIDIA Tesla V100 GPUs with 32 GB of memory each,
  • 16 CPUs with a total of 128 GB of CPU memory, and
  • 120 hours of training time.

If your training algorithm fails within the first hour of training, this will not count as your training run.
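
Given these limits, it can be worth guarding the training loop against the wall-clock budget, checkpointing regularly and stopping cleanly before the 120 hours run out. A minimal sketch, in which the epoch cap, safety margin, and the two helper functions are placeholders:

    # Minimal sketch of a wall-clock guard for the 120-hour budget; the safety
    # margin, epoch cap, and the two helpers are hypothetical placeholders.
    import time

    BUDGET_SECONDS = 120 * 3600  # platform training-time limit
    MARGIN_SECONDS = 2 * 3600    # assumed margin for saving final artifacts

    def train_one_epoch():
        ...  # placeholder: one pass over the training data

    def save_checkpoint(epoch: int):
        ...  # placeholder: persist weights so a crash never loses progress

    start = time.monotonic()
    for epoch in range(10_000):  # generous cap; the time guard does the stopping
        train_one_epoch()
        save_checkpoint(epoch)
        if time.monotonic() - start > BUDGET_SECONDS - MARGIN_SECONDS:
            break  # stop cleanly before the platform limit is reached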

Evaluation

After training, the submissions to the Final phase will be evaluated on the final test set (the remaining ~1,000 cases from the STOIC database). The resulting performances will be posted on the "Final" Leaderboard.

Prizes will be awarded to the best-performing participants in the final round. At the end of the challenge, the Type 2 Algorithms (from the qualification round) and Type 3 Algorithms (from the final round) will be made available on https://grand-challenge.org/algorithms/. We furthermore plan to publish the findings of STOIC2021 in a peer-reviewed article. The three best-performing teams in the Final phase will be invited to collaborate in writing this article. Other teams invited to the Final phase may also be invited to help write it, based on their performance and methodology.