Note: This is an automated translation (using DeepL) of the original German article.
One of the biggest challenges facing science and research in the COVID-19 pandemic is the lack of reliable and accessible data. This lack has many causes, and it makes it difficult to arrive at the firm findings and reliable conclusions that decision makers need.
To meet this challenge, DEXHELPP, together with dwh GmbH and TU Wien, created a synthetic dataset for scientific research. The work and the methods used to develop it have now been accepted and published as a paper in the prestigious Data Science Journal.
The paper describes how our agent-based model was used to create a dataset that represents the statistical population at the individual level. Historical real-world data served as the basis for it.
Two aspects achieved in this way are especially worth noting:
The first is data protection. Although the result resembles the real data in form and statistical properties, it contains only statistical representatives of the population, not real individuals. This ensures that the scientific community has practical data, but not personal information.
The generated dataset is also richer than the real data in some respects that cannot be measured in real life at all. For example, the data contain information about when and by whom a person was infected. They also contain persons whose infection was in reality not detected at all (asymptomatic cases). This gives further research, including work by other researchers, access to additional data that cannot be recorded in reality.
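To make this concrete, the following is a minimal Python sketch of what a record with such additional information could look like and how it might be queried. It is an illustration only: the field names (`case_id`, `infected_on_day`, `infected_by`, `detected`), the helper functions, and the four toy records are assumptions invented for this example and do not reflect the schema or contents of the published dataset.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class SyntheticCase:
    """Hypothetical record layout; not the schema of the published dataset."""
    case_id: int
    infected_on_day: int         # simulation day on which the infection occurred
    infected_by: Optional[int]   # case_id of the infector, None for a seed case
    detected: bool               # False if the infection never appeared in reporting


def infection_chain(cases: Dict[int, SyntheticCase], case_id: int) -> List[int]:
    """Follow the infected_by links from a case back to its seed case."""
    chain: List[int] = []
    seen = set()
    current: Optional[int] = case_id
    while current is not None and current not in seen:  # guard against cycles
        seen.add(current)
        chain.append(current)
        current = cases[current].infected_by
    return chain


def undetected_share(cases: Dict[int, SyntheticCase]) -> float:
    """Fraction of infections that were never detected."""
    if not cases:
        return 0.0
    return sum(not c.detected for c in cases.values()) / len(cases)


# Toy data, invented purely for illustration.
CASES = {
    c.case_id: c
    for c in [
        SyntheticCase(1, 0, None, True),
        SyntheticCase(2, 3, 1, False),
        SyntheticCase(3, 7, 2, True),
        SyntheticCase(4, 8, 2, False),
    ]
}

if __name__ == "__main__":
    print(infection_chain(CASES, 3))
    print(undetected_share(CASES))
```

In this toy example, case 3 can be traced back through the undetected case 2 to the seed case 1, a link that real case reporting data would not contain.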
N. Popper, M. Zechmeister, D. Brunmeir, C. Rippinger, N. Weibrecht, C. Urach, M. Bicher, G. Schneckenreither, A. Rauber, “Synthetic Reproduction and Augmentation of COVID-19 Case Reporting Data by Agent-Based Simulation”. Data Science Journal, 20(1), 16, 2021. doi: 10.5334/dsj-2021-016