Project 2 – Data & Methods

Short Description

Like Project 6, Project 2 "Data and Methods" addressed the development of methods. It focused on methods for data treatment and preparation, and on techniques to improve the outcome and validity of computer simulations.

Completed Objectives

The core objectives of the project can be summarised by the following research questions:

  • How can outliers be reliably identified in highly complex health-care data sets?
    If a database is distorted by outliers, even the most innovative statistical methods are doomed to fail. Outliers are characterised by significantly unusual values and may arise from data and measurement errors. In any case, they distort statistics and have to be removed before statistical methods are applied. Because health-care data is highly complex and inhomogeneous, identifying outliers is very difficult and required new technology and methods, which were developed in this project.

  • How can we standardise the population and population data?
    This question is closely tied to the reproducibility of data. Because the research findings calculated for dexhelpp decision support must always be consistent, the calculations need to be based on the same underlying virtual population. A corresponding standardisation was developed for this purpose, which allowed us to produce, and publish, our research results.

  • How can we determine whether a computer simulation model depicts the real system?
    If a Matchbox car is regarded as a very simple model of a real car, it is easy to determine the model's level of detail: which functionalities exist (e.g. tires, windows) and which are missing (e.g. the engine). Consequently, its legitimate field of application (the area where the model behaves like the real system) can easily be determined as well. For dynamic computer simulation models for the prediction of diseases, the situation is far more complex: in contrast to the car example, the mechanics and causal relationships of the real system are not perfectly known.

  • How can we determine simulation parameters if they are not directly measurable in the real system?
    Imagine a model of an epidemic that spreads like a water wave: the speed of the wave is doubtless a very important parameter. To simulate the model, the value of this parameter has to be determined for a specific disease, but unfortunately it cannot be measured directly in the real system. However, other elements of the system, such as the length of the disease wave, can be measured. Based on these measurements, the unknown parameter can be estimated by so-called calibration.
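
The idea of calibration can be illustrated with a minimal sketch. It is not the project's implementation: the toy epidemic model, the function names simulate_peak and calibrate, and all numeric values are assumptions made for illustration. The unknown parameter (a transmission rate, standing in for the "speed of the wave") is tuned until a measurable output (here the peak share of infected people, standing in for the measurable properties of the wave) matches an observed value. The search uses a Great Deluge style acceptance rule, one of the heuristics tested in the project, in a common variant that also accepts every improving move.

```python
import random

def simulate_peak(beta, gamma=0.1, days=400, i0=0.001):
    """Toy deterministic SIR model (assumed for illustration).
    Returns the peak share of infected people for transmission rate beta."""
    s, i = 1.0 - i0, i0
    peak = i
    for _ in range(days):
        new_infections = beta * s * i
        new_recoveries = gamma * i
        s -= new_infections
        i += new_infections - new_recoveries
        peak = max(peak, i)
    return peak

def calibrate(observed_peak, lo=0.11, hi=1.0, steps=2000, seed=0):
    """Estimate beta so that the simulated peak matches the observed peak.
    Great Deluge style search: a candidate is accepted if its error is below
    a 'water level' that is lowered linearly, or if it improves the current error."""
    rng = random.Random(seed)

    def cost(beta):
        return abs(simulate_peak(beta) - observed_peak)

    current = rng.uniform(lo, hi)
    current_cost = cost(current)
    best, best_cost = current, current_cost
    level = current_cost        # initial water level
    decay = level / steps       # amount by which the level drops per step
    for _ in range(steps):
        # Propose a nearby parameter value, clamped to the search interval.
        candidate = min(hi, max(lo, current + rng.gauss(0.0, 0.05)))
        candidate_cost = cost(candidate)
        if candidate_cost <= level or candidate_cost <= current_cost:
            current, current_cost = candidate, candidate_cost
            if candidate_cost < best_cost:
                best, best_cost = candidate, candidate_cost
        level -= decay
    return best

# Synthetic check: generate an "observation" from a known parameter value
# and see whether calibration recovers a similar value.
true_beta = 0.3
observed = simulate_peak(true_beta)
estimate = calibrate(observed)
print(round(estimate, 3))
```

In this synthetic check the "measurement" is produced by the model itself, so the true value is known and the estimate can be compared against it. With real data, no such ground truth exists, and several parameter values may reproduce the measurements equally well.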


To address the statistical research questions, simulation-based methods such as bootstrapping and certain sampling algorithms were applied to the health-care data sets obtained from Project 1. To deal with the simulation-related problems, several methods were tested, including the Virtual Overlay Multi-Agent System (VOMAS) as well as genetic and Great Deluge algorithms.
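
As a rough illustration of bootstrapping, the following sketch computes a percentile bootstrap confidence interval for a mean. It is an assumption-laden example rather than the project's code: the function name bootstrap_ci and the sample values (invented lengths of hospital stay in days) are hypothetical.

```python
import random
import statistics

def bootstrap_ci(data, stat=statistics.mean, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for a statistic.
    Resamples the data with replacement and takes empirical quantiles
    of the recomputed statistic."""
    rng = random.Random(seed)
    n = len(data)
    estimates = sorted(
        stat(rng.choices(data, k=n))   # one resample, same size as the data
        for _ in range(n_resamples)
    )
    lower = estimates[int((alpha / 2) * n_resamples)]
    upper = estimates[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper

# Invented example data: lengths of stay in days (not real project data).
stays = [2, 3, 3, 4, 5, 5, 6, 8, 9, 14]
lower, upper = bootstrap_ci(stays)
print(statistics.mean(stays), (lower, upper))
```

The percentile interval is the simplest bootstrap variant. For small or strongly skewed samples, other variants may be more appropriate.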

Achieved Results

Numerous scientific publications as well as bachelor, diploma and PhD theses were produced in the course of this project (see link). The established methods contribute to the preparation and analysis of routine health-care data and increase the quality of simulations. In combination, they allow for reproducible and stakeholder-oriented decision support.