The STOIC Dataset

The STOIC Dataset (see for a full description) contains Computed Tomography (CT) scans from 10,735 patients. For this challenge, one CT scan from each patient has been selected, and the dataset has been divided randomly into a public training set (2,000 patients), a test set (~1,000 patients) and a private training set (7,000+ patients). The CT scans are stored as mha files.


RT-PCR was positive for 6,448 subjects, corresponding to a disease prevalence of 60.0% during the study period. This includes subjects who had a positive RT-PCR within the first week of presentation after a first negative test.

At 1-month follow-up, 964 patients had died (267 after intubation, 697 not intubated) and 611 were alive but had to be intubated at one point.

In total, 24% (1,575/6,448) of the COVID cases were severe (severity defined as either need for intubation at one point or death). The main goal of the STOIC2021 challenge is to predict from the CT scan which patients will develop severe disease. The secondary goal is to predict which patients had a positive RT-PCR.
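As a quick consistency check, the severity figure can be recomputed from the counts quoted above. This is only a sketch of the arithmetic; the variable names are illustrative and are not part of the dataset.

```python
# Counts quoted in the text above.
rt_pcr_positive = 6448
died_after_intubation = 267
died_not_intubated = 697
alive_but_intubated = 611

# Deaths are the sum of the two death subgroups.
died = died_after_intubation + died_not_intubated

# Severe = death or need for intubation at one point.
severe = died + alive_but_intubated
severe_fraction = severe / rt_pcr_positive

print(f"died: {died}")
print(f"severe: {severe}")
print(f"severe fraction of RT-PCR positive cases: {severe_fraction:.1%}")
```

The two death subgroups add up to 964, the severe group to 1,575, and the severe fraction comes out at about 24.4%, which the text rounds to 24%.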

Clinical Data

The following information is available as headers in the mha files:

  • Age category (<40 years/40-50 years/50-60 years/60-70 years/70-80 years/>80 years)
  • Gender: male/female
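The header fields can be inspected without an imaging library, because an mha (MetaImage) file starts with a plain-text header of "Key = Value" lines, ending at the `ElementDataFile` line that precedes the pixel data. The sketch below reads that header into a dictionary using only the Python standard library. The function name `read_mha_header` is hypothetical, and the key names under which age and gender are stored (for example `PatientAge` and `PatientSex`) are an assumption, not something this page confirms; print the full dictionary to see the actual keys.

```python
def read_mha_header(path):
    """Return the text header of a MetaImage (.mha) file as a dict.

    The header consists of "Key = Value" lines. The ElementDataFile
    entry is the last header line; everything after it is pixel data.
    """
    header = {}
    with open(path, "rb") as f:
        for raw_line in f:
            line = raw_line.decode("ascii", errors="replace").strip()
            if not line or "=" not in line:
                continue  # skip blank or malformed lines
            key, _, value = line.partition("=")
            key = key.strip()
            header[key] = value.strip()
            if key == "ElementDataFile":
                break  # stop before the binary pixel data
    return header
```

For example, `read_mha_header("/path/to/destination/scan.mha")` (a placeholder path) returns every header entry, so the age category and gender can be looked up once their key names are known.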

The following two outputs are available for each subject in the data set and need to be predicted:

  • RT-PCR result: binary (positive or negative)
  • Outcome at 1 month: severe or non-severe (severity defined as death or need for intubation)

How to download the public training data

The public training set of 2,000 CT scans is available on the Registry of Open Data on AWS. To download the public training set, please make sure that the latest version of the AWS CLI is installed on your system by following these instructions:

With the AWS CLI installed, you can download the public training set (no AWS account required) by running:

aws s3 cp s3://stoic2021-training/ /path/to/destination/ --recursive --no-sign-request
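If the download needs to be scripted, the same command can be launched from Python. The following is a minimal sketch that assumes the AWS CLI is already installed and on the PATH; the function names `build_download_command` and `download_training_set` are hypothetical helpers, and the bucket URI and flags are copied from the command above.

```python
import subprocess

# Bucket URI taken from the command shown above.
BUCKET_URI = "s3://stoic2021-training/"


def build_download_command(destination):
    """Build the argument list for the AWS CLI download command."""
    return [
        "aws", "s3", "cp",
        BUCKET_URI,
        str(destination),
        "--recursive",        # copy every object in the bucket
        "--no-sign-request",  # anonymous access, no AWS account needed
    ]


def download_training_set(destination):
    """Run the download; raises CalledProcessError if the CLI fails."""
    subprocess.run(build_download_command(destination), check=True)
```

Calling `download_training_set("/path/to/destination/")` runs the same transfer as the shell command. Passing the arguments as a list avoids shell quoting issues when the destination path contains spaces.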

If you have any difficulties downloading the data, please ask a question in the forum.