Bacteria Model Fitting

Overview

This app illustrates how to fit a mechanistic dynamical model to data and how to use simulated data to evaluate if it is possible to fit a specific model. It is assumed that you have worked through the Basic Virus Model Fitting app.

Learning Objectives

The Model

Model Overview

The underlying model that is being fit to the data is a version of the Extended Bacteria Model used in the app of that name (and some of the others). See that app for a detailed description of the model.

Model Equations

While we use the underlying Extended Bacteria Model code, for the purpose of this app, we make a simplification by ignoring the adaptive response. In the code, this is implemented by setting all quantities related to the adpative response to 0. This leads to these simplified equations.

The model is implemented as a set of ordinary differential equations:

\[\begin{align} \dot B &= g B (1-\frac{B}{B_{max}}) - d_B B - k_I B I \\ \dot I &= r_I B (1 -\frac{I}{I_{max}} ) - d_I I \\ \end{align}\]

Model Diagram

The model is represented by this flow diagram, with the understanding that the adaptive response, A, and all its processes are not present:

Model Diagram

Model Diagram

Data

For this app, data from Streptococcus pneumoniae infection in mice is used. The data comes from this paper (Schirm et al. 2020). The original study reports are large number of measurements under different conditions. For our purpose, we only use the Strep pneumo bacteria load and the neutrophils in BAL following an untreated infection, which corresponds to panels a) and c) in Figure 4 of the original study. In the original study, those quantities are labeled as P and N. For consistency with our models, we use the labels B for the bacteria and I for the neutrophils.

Model Fitting

For this example, we have data on both (all) model variables. That is rare! We can thus fit the model to both the B and I variables at the same time. When we do that, we need to decide how to weight them. We usually can’t just add them, since they are different quantities measured with different assays, often also having different units. To prevent us adding “apples and oranges”, we need to standardize. The way we do it here is to divide by the maximum measured value for each variable. There are other ways, e.g. by dividing with the standard deviation/variance.

We are again fitting in a frequentist framework and minimize the sum of square errors between data and model. This gives us the following equation which we numerically try to minimize by finding the model parameter values which achieve the best match between model and data:

\[ SSR= \sum_t \left( \frac{Bm_t - Bd_t}{max(Bd_t)}\right) ^2 + \sum_t \left( \frac{Im_t - Id_t}{max(Id_t)}\right) ^2 \]

In our notation, \(Bm_t\) and \(Im_t\) are model-predicted results for bacteria and immune response (neutrophils), while \(Bd_t\) and \(Id_t\) are the data. As is customary, we do the fitting in log-transformed units.

Notes

In prior apps, we discussed the idea of overfitting, i.e. trying to estimate too many parameters given the available data. This is likely happening here. We allow all the parameters to be fit, and the model is fairly flexible. That means there are likely multiple combinations of parameter values that produce (almost) equally good fits (as measured by SSR). In practice, this is a problem one needs to decide how to deal with. One can fix some parameters, simplify the model, or get more data. For this teaching example, we just acknowledge that we might be trying to estimate too many parameters given the data, but we’ll go with it anyway 😁.

What To Do

The model is assumed to run in units of hours.

Task 1

Start with the default settings. Run the model for 100 iterations for all 3 solvers (1-3). At the end of each fit, take a look at both the visual agreement between model and data, and the reported SSR. You’ll probably want to have the plots with a y-axis on the log scale. Note that for this example, we have multiple measurements for each quantity for each time point. In our fitting approach, each such data point contributes to the SSR, so we are in some sense trying to fit average trajectories to the data. You should find that after the fitting, solver 2 produces the lowest SSR. However, the predicted model shows ups and downs in the bacteria variable that we don’t really see in the data. While we’ll find a better model later, it can sometimes happen that the model which fits best based on some statistical quantity, such as SSR, does not look good when doing e.g. visual checks. It’s up to the modeler to decide what to conclude from that. My general suggestion is to try and further improve/tweak the model.

Record

Task 2

Further tasks to come. For now, you are free to explore on your own.

Record

Further Information

For this app, the underlying function running the simulation is called simulate_bacteria_fit. That function repeatedly calls simulate_extendedbacteria_ode.

This app (and all others) are structured such that the Shiny part (the graphical interface you are using) calls one or several underlying R functions which run the simulation for the model of interest and return the results. You can call them directly, without going through the shiny app. Use the help() command for more information on how to use the functions directly. If you go that route, you need to use the results returned from this function and produce useful output (such as a plot) yourself.

You can also download all simulator functions and modify them for your own purposes. Of course to modify these functions, you’ll need to do some coding.

For examples on using the simulators directly and how to modify them, read the package vignette by typing vignette('DSAIRM') into the R console.

The data for this app comes from (Schirm et al. 2020). In that paper, the authors fit a much more complex model to a richer dataset. If you want to learn more about Streptococcus pneumoniae infections in mice and how to model them, or are just generally curious to see a more complex model being fit to data, you should check out that paper.

References

Schirm, Sibylle, Peter Ahnert, Sarah Berger, Geraldine Nouailles, Sandra-Maria Wienhold, Holger Müller-Redetzky, Norbert Suttorp, Markus Loeffler, Martin Witzenrath, and Markus Scholz. 2020. “A Biomathematical Model of Immune Response and Barrier Function in Mice with Pneumococcal Lung Infection.” PLOS ONE 15 (12): e0243147. https://doi.org/10.1371/journal.pone.0243147.