Basics - Modelling and Simulation Cookbook

In this chapter we discuss the basics of modelling and simulation and introduce the most important terminology.

Objectives:

What does one do in modelling and simulation?
What is a model and what is a simulation?
What are common terms in this research field and what do they mean?

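import numpy as np
from matplotlib import pyplot as plt
import holoviews as hv
from IPython.display import display  # display() is predefined in Jupyter; imported here for clarity
hv.extension('bokeh')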

Modelling and Conceptual Model

Modelling is a scientific method for developing a simplified and targeted version of a real system, known as a model. Starting from a specific research problem, the modelling purpose, the relevant aspects of a real-world situation must be identified and combined into an abstract replica of the situation (compare the three characteristics of a model, representation, reduction and pragmatism, from Stachowiak (1973), or the summary in this WordPress article). As depicted in the figure below (adapted from Dori & Mordecai (2025)), there are fundamentally different types of models for different purposes.

In modelling and simulation (M&S), we limit ourselves to formal, immaterial or abstract models, i.e. models that are developed using a specific formalism (graphical, algorithmic, mathematical, or as program code). In M&S, a major distinction is made between conceptual models and implemented or computerised models. The former are described in writing, e.g. as analytical, descriptive or mathematical models or using a mixed approach, while the latter are developed in a computer environment and are executable. In a traditional M&S workflow, an implemented model is always based on one (or more) conceptual models (see Figure 1).

The term modelling process refers to the process of deriving a conceptual model from a given real-life situation. The modeller, i.e. the person who develops the model, usually follows one of several predefined workflows, known as modelling strategies or modelling approaches (e.g. system dynamics, event graphs, Lagrange formalism, etc.). Selecting the most suitable strategy and applying it correctly are crucial for obtaining a model that adequately represents the real situation and fulfils the modelling purpose. To quote George Box: ‘All models are wrong, but some are useful’. Useful models are referred to as valid and can be used to gain insights into the real system, either directly through formal analysis or indirectly after developing a corresponding implemented model. Depending on the modelling strategy, the final result of the modelling process is a conceptual model of a specific model type (e.g. differential equation model, discrete event model, agent-based model, etc.).

Causality, Parameters, Inputs and Outputs

To serve their purpose, models map certain inputs and parameters onto certain model states and outputs, as depicted in Figure 2.

Figure 2: Traditional visualisation of a model with inputs, parameters and outputs.

The distinction between inputs/parameters and outputs/states arises naturally from the causality inherent in the model. This means that the model is designed in such a way that values for inputs and parameters can be freely selected and imply a specific behaviour for the model states and outputs. Mathematically speaking, this means that the model can be interpreted as a function that maps inputs and parameters to states and outputs.

It should be noted that the causality of the model aims to mimic the known causality of the real system. This clearly distinguishes M&S from approaches such as machine learning or statistics, where the causality of the model is not defined manually but is (mainly) learned from data. Therefore, M&S is often referred to as causal or mechanistic modelling. Note that this does not prevent such models from being completely wrong. The modelling process relies on a correct understanding of the real system and on our ability to quantify it correctly. Consequently, where our knowledge of the system is incomplete, we will struggle to develop a valid model, and a purely data-based approach is likely preferable.

We further distinguish between parameters and inputs in terms of what the corresponding quantities describe. A parameter defines a system-intrinsic property that can be considered fixed once its value has been defined (at least within a certain range). An input, however, defines a property of the experimental setup of the modelling study and is usually varied when the model is applied. Both states and outputs causally depend on inputs and parameters, and in principle every state can also be regarded as a model output. In most cases, however, this is neither necessary nor advisable, since it creates unnecessary overhead and complicates formal analysis. Ultimately, the choice of outputs depends on the purpose of the model.

Case Study: Free Fall - Conceptual Model

To illustrate the ideas, we give a quick and simple example from physics:

The initial value problem (IVP)

$$\begin{array}{c}\frac{dx}{dt}(t)=v(t),\\ \frac{dv}{dt}(t)=-g,\end{array}\qquad v(0)=v_0,\quad x(0)=x_0, \tag{1}$$

can be regarded as a model for the position $x(t)$ [m] and velocity $v(t)$ [m/s] of a physical object in a gravitational field with acceleration $g$ [m/s^2]. The model has the analytic solution

$$x(t)=-\frac{gt^2}{2}+v_0t+x_0,\quad v(t)=-gt+v_0. \tag{2}$$
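For reference, the analytic solution (2) follows by integrating (1) twice over $[0,t]$ and inserting the initial values:

$$\begin{aligned}
v(t) &= v_0 + \int_0^t \frac{dv}{ds}(s)\,ds = v_0 + \int_0^t (-g)\,ds = -gt + v_0,\\
x(t) &= x_0 + \int_0^t v(s)\,ds = x_0 + \int_0^t (-gs + v_0)\,ds = -\frac{gt^2}{2} + v_0 t + x_0.
\end{aligned}$$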

Input-Output Analysis

In this model, $g$ can be considered the only model parameter and the initial values $x_0,v_0$ can be considered model inputs. Since nothing can be examined for an indefinite period of time, the examined time frame $[0,t_{end}]$ [s] must also be an input. The model has the states $x(t),v(t)$, but the selection of outputs depends on the problem. It can range from a single position at a specific point in time, e.g. $x(t_{end})$, up to the two time-dependent states $x(t),v(t)$ themselves. In our case, we define $x(t)$ as the output of the model.

The model aims to represent the actual causality of the system, in which the passage of time within the gravitational field causes the body to fall from a given initial position with a given initial velocity. In a purely data-driven approach, we would instead train, e.g., a statistical model on a set of data points $(t_i,x_i)_{i=1}^M$. However, the ansatz for the statistical model (i.e. which regression function, error metric, etc.) would be chosen based on the data (best fit) rather than on insight into the real system.
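To make the contrast concrete, the following sketch (assuming NumPy; the data are synthetic, generated from equation (2) with illustrative values plus Gaussian noise) fits polynomials of degree 1 to 3 purely by least squares, i.e. without using any knowledge of the physics:

```python
import numpy as np

# Synthetic "measurements" (illustrative assumption): generated from
# equation (2) with g = 9.81, x0 = 5, v0 = 6 plus Gaussian noise (std 0.05 m).
rng = np.random.default_rng(seed=0)
t_data = np.linspace(0.0, 2.0, 50)
x_data = -9.81 * t_data**2 / 2 + 6.0 * t_data + 5.0
x_data = x_data + rng.normal(scale=0.05, size=t_data.size)

# Data-driven ansatz: polynomials of increasing degree, judged only by
# their root-mean-square residual on the data (no physics involved).
residuals = {}
for degree in (1, 2, 3):
    coeffs = np.polyfit(t_data, x_data, deg=degree)
    fitted = np.polyval(coeffs, t_data)
    residuals[degree] = float(np.sqrt(np.mean((fitted - x_data) ** 2)))
    print(f"degree {degree}: RMS residual {residuals[degree]:.4f} m")

# Coefficients of the quadratic fit (highest power first)
quad = np.polyfit(t_data, x_data, deg=2)
print("quadratic fit coefficients:", np.round(quad, 2))
```

The cubic polynomial always fits the data at least as well as the quadratic one, so a best-fit criterion alone does not single out the quadratic form; the mechanistic model fixes it by construction.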

Finally, note that the model’s output is not necessarily the variable of interest. Suppose we want to determine the time it takes for a body to fall to the ground from a predefined height. Then we need to embed the model in a top-level process that determines the time input $t_{end}$ for a given position output $x(t_{end})=0$, i.e. input and output switch places. While it is easy to solve the analytical solution for time in this case, this is no longer possible for more complex models. Depending on the exact specification, a problem in which a target output value is to be matched by varying the input can be referred to as, for example, a control or optimisation problem.
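As a sketch of such a top-level process, the time to reach the ground can be obtained either by solving (2) for $t_{end}$ or by treating the model as a black box and searching for the input that yields the target output $x(t_{end})=0$. The function names and the choice of bisection are our own illustrative assumptions; bisection only requires $x_0 \ge 0$ and an upper bound t_max at which the body is already below ground:

```python
import math


def position(t: float, x0: float, v0: float, g: float) -> float:
    """Output x(t) of the free-fall model, equation (2)."""
    return -g * t * t / 2 + v0 * t + x0


def time_to_ground_analytic(x0: float, v0: float, g: float) -> float:
    """Positive root of -g t^2 / 2 + v0 t + x0 = 0."""
    return (v0 + math.sqrt(v0 * v0 + 2 * g * x0)) / g


def time_to_ground_bisection(x0: float, v0: float, g: float,
                             t_max: float = 100.0, tol: float = 1e-10) -> float:
    """Find t_end with x(t_end) = 0 using the model only as a black box.

    Invariant: x(lo) >= 0 (still above ground) and x(hi) < 0 (below ground).
    """
    lo, hi = 0.0, t_max
    if position(hi, x0, v0, g) >= 0:
        raise ValueError("t_max too small: the body has not reached the ground")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if position(mid, x0, v0, g) >= 0:
            lo = mid  # still above ground: root lies to the right
        else:
            hi = mid  # below ground: root lies to the left
    return 0.5 * (lo + hi)


t_bis = time_to_ground_bisection(10.0, 0.0, 9.81)
t_ana = time_to_ground_analytic(10.0, 0.0, 9.81)
print(f"bisection: {t_bis:.6f} s, analytic: {t_ana:.6f} s")
```

The bisection variant never inspects the formula, so it would work unchanged for a model without an analytic solution, at the cost of many simulation runs.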

Model Implementation and Implemented Model

The development of the computerised model is often called model implementation (or model translation in Sargent (2013)). In this process, the modeller, or a programmer with an in-depth understanding of the modelling approach, translates the conceptual model into an executable computer program that performs the defined input-output mapping. This program is often referred to as the simulation or the simulation model.

Depending on the maturity of the project in the simulation lifecycle, the program can vary in its technology readiness level (TRL), ranging from a simple research prototype to a fully portable dashboard for decision support. The more advanced the implemented model is in this respect, the more the actual simulation recedes into the background and the more other aspects of model execution gain importance. This includes, for example, the question of how the input-output mapping is embedded in the overall process (e.g. is the simulation evaluated only once or multiple times, is it integrated into an optimisation process, is it part of a digital twin, etc.). It also concerns the question of how the simulation results are used and evaluated (e.g. statistics, visualisations, animations, automatic reports, etc.).

Even the basic implementation of the simulation, i.e. the translation of the input-output relationship from the conceptual model, is far from trivial, as the programmer is confronted with many decisions. First of all, the programming language or programming environment must be chosen. The spectrum ranges from general-purpose programming languages (e.g. C, C++, Python, Java, etc.) through programming languages with pre-implemented functionalities or libraries for M&S (e.g. SimPy for Python, Repast Simphony for Java, Simulink/Simscape in MATLAB, etc.) to programming environments that are dedicated entirely to modelling and simulation (e.g. AnyLogic, NetLogo, PowerDEVS, VisualPDE, etc.). The latter are called simulators or simulation environments, depending on how well they also support the conceptual modelling process and the output processing.

Furthermore, the programmer may need to make additional abstractions or simplifications in order to implement the model, which may introduce new technical parameters. Examples of this can be found primarily in the field of numerics, where, for example, continuous variables must be approximated by discrete ones. To solve differential equation models on a computer, for example, one needs to choose a numerical solution method and specify its parameters, such as the step width (see the example below).

The number of adjustments correlates strongly with the chosen modelling method. The more the conceptual description language already uses algorithms or algorithmic concepts, the less needs to be adapted; the more compact and mathematical the description language, the more numerical approximations are usually necessary. Extreme examples include, on the one hand, partial differential equation models, where the numerical solution almost surpasses the conceptual modelling in terms of complexity. On the other hand, discrete event models can usually be implemented directly as they are conceptualised without further adaptation. The process of checking whether an implemented model corresponds well with the conceptualised model is called verification.

Case Study: Free Fall - Implemented Model

An implemented model of the free-fall example may look as follows:

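class FreeFallAnalytic:
    """Implemented model based on the analytic solution (2)."""

    def __init__(self, g: float):
        self.g = g  # parameter: gravitational acceleration [m/s^2]

    def x(self, t: float, x0: float, v0: float) -> float:
        # position according to equation (2)
        return -self.g * t * t / 2 + v0 * t + x0

    def run(self, x0: float, v0: float, tend: float) -> tuple[np.ndarray, np.ndarray]:
        # 101 equidistant time points, including both t = 0 and t = tend
        T = np.linspace(0, tend, 101)
        X = np.array([self.x(t, x0, v0) for t in T])
        return T, X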

In FreeFallAnalytic we exploit the (rare) fact that the conceptual model has an analytic solution. Hence, the only necessary adaptation of the conceptual model was the use of a discrete time series $(t_i,x_i)$ instead of the continuous output $x(t)$. This is legitimate because computational analysis of the output (in particular plotting) also relies on time series. Moreover, at the points of the time series, the output of the implemented model is identical (up to machine precision) to the solution of the conceptual model.

If the model could not be solved analytically, we would have to perform a numerical approximation. The simplest method for this is the explicit Euler method, which (without proof) uses the approximation

$$y(t+h)\approx y(t)+h\frac{dy}{dt}(t)=y(t)+h\,f\big(t,y(t)\big), \tag{3}$$

where $y=(x,v)^T$ denotes the vector of states and $f$ the right-hand side of the differential equation, here $f(t,y)=(v,-g)^T$. The parameter $h$ denotes the step width of the numerical solver and must be regarded as an additional (technical) model parameter. A corresponding implemented model may look as follows:

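class FreeFallNumeric:
    """Implemented model using the explicit Euler method (3)."""

    def __init__(self, g: float, h: float):
        self.g = g  # model parameter: gravitational acceleration [m/s^2]
        self.h = h  # technical parameter: step width of the solver [s]

    def f(self, t: float, x: float, v: float) -> tuple[float, float]:
        # right-hand side of (1): dx/dt = v, dv/dt = -g
        return v, -self.g

    def run(self, x0: float, v0: float, tend: float) -> tuple[np.ndarray, np.ndarray]:
        # number of Euler steps until t >= tend; the small tolerance guards
        # against floating-point round-off in tend / h
        n_steps = int(np.ceil(tend / self.h - 1e-9))
        t, x, v = 0.0, x0, v0
        T = [t]
        X = [x]
        for i in range(1, n_steps + 1):
            dx, dv = self.f(t, x, v)
            x = x + self.h * dx  # explicit Euler update of the position
            v = v + self.h * dv  # explicit Euler update of the velocity
            t = i * self.h  # computed from the step index instead of accumulated
            T.append(t)
            X.append(x)
        return np.array(T), np.array(X)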
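Verification, i.e. checking the implemented model against the conceptual model, can be illustrated by comparing the numerical implementation with the analytic solution. The following self-contained sketch re-implements the explicit Euler update of FreeFallNumeric as a plain function (a simplification for brevity, not the class itself) and measures the deviation from equation (2) at $t_{end}$ for decreasing step widths $h$; the inputs $x_0=10$, $v_0=0$, $t_{end}=1$ are illustrative assumptions:

```python
G = 9.81  # assumed parameter value [m/s^2]


def analytic_x(t: float, x0: float, v0: float, g: float = G) -> float:
    """Conceptual model: position according to equation (2)."""
    return -g * t * t / 2 + v0 * t + x0


def euler_x(tend: float, x0: float, v0: float, h: float, g: float = G) -> float:
    """Explicit Euler for dx/dt = v, dv/dt = -g; returns x at t = tend."""
    n_steps = int(round(tend / h))
    x, v = x0, v0
    for _ in range(n_steps):
        x, v = x + h * v, v - h * g  # both updates use the old state
    return x


x0, v0, tend = 10.0, 0.0, 1.0
errors = {}
for h in (0.1, 0.05, 0.025, 0.0125):
    errors[h] = abs(euler_x(tend, x0, v0, h) - analytic_x(tend, x0, v0))
    print(f"h = {h:<7} error at t_end: {errors[h]:.5f} m")
```

Summing the Euler updates shows that, for this model, the error at $t_{end}$ is exactly $g\,t_{end}\,h/2$, so halving $h$ halves the error, as expected for a first-order method. The step width therefore has to be chosen (and reported) like any other model parameter.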

Simulation Experiments

Experimenting with the computerised model on the computer is called simulation. The term can be used as a verb (to simulate) as well as a noun (to perform a simulation). A single evaluation of the input-output mapping is called a simulation run. In each run, quantitative parameter and input values are processed into quantitative outcomes.

The process of assigning values to the parameters is called parametrisation or parameter identification. Note that we distinguish between a parameter and its value: the former is essentially a variable or symbol for a quantity, the latter an actual numeric or categorical value. It is not always possible to determine all parameter values directly from data or other sources of information. Sometimes they have to be fitted by solving a minimisation problem based on known input-output relationships. In M&S, this process is referred to as calibration. The collection of parameter values is usually called a parameter set. In most cases, the parameter set of a simulation model remains fixed once it has been defined (or fitted). However, there are cases in which using multiple parameter sets makes sense. The most obvious one is model analysis: in a so-called parameter variation, parameter values are deliberately varied in order to analyse their influence on the outputs of the simulation model. Another reason may be transferring the model to another causally similar or equivalent system, or changing the system boundaries (e.g. in terms of time, space, etc.). In this case, one parameter set would be valid for each system or system state and would allow simulation studies to be carried out there.
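Calibration can be sketched for the free-fall model. We assume that the inputs $x_0=0$ m and $v_0=10$ m/s are known and that only noisy position measurements are available. The data below are synthetic, generated with $g=9.81$ purely for illustration. The parameter $g$ is then obtained by minimising the sum of squared errors between simulated and observed outputs. Golden-section search is our own choice of optimiser and treats the model as a black box:

```python
import numpy as np


def position(t, x0, v0, g):
    """Analytic free-fall output, equation (2); works element-wise on arrays."""
    return -g * t**2 / 2 + v0 * t + x0


# Synthetic observations (illustrative assumption): true g = 9.81,
# known inputs x0 = 0 m, v0 = 10 m/s, measurement noise std 0.02 m.
rng = np.random.default_rng(seed=1)
t_obs = np.linspace(0.1, 1.5, 15)
x_obs = position(t_obs, 0.0, 10.0, 9.81) + rng.normal(scale=0.02, size=t_obs.size)


def sse(g: float) -> float:
    """Sum of squared errors between simulated and observed outputs."""
    return float(np.sum((position(t_obs, 0.0, 10.0, g) - x_obs) ** 2))


def golden_section(fun, a: float, b: float, tol: float = 1e-8) -> float:
    """Minimise a unimodal function on [a, b] by golden-section search."""
    inv_phi = (np.sqrt(5) - 1) / 2
    c, d = b - inv_phi * (b - a), a + inv_phi * (b - a)
    while b - a > tol:
        if fun(c) < fun(d):  # minimum lies in [a, d]
            b, d = d, c
            c = b - inv_phi * (b - a)
        else:  # minimum lies in [c, b]
            a, c = c, d
            d = a + inv_phi * (b - a)
    return (a + b) / 2


g_hat = golden_section(sse, 5.0, 15.0)
print(f"calibrated g: {g_hat:.3f} m/s^2")
```

Because this particular model is linear in $g$, the minimiser could also be written in closed form. The black-box search is used here because it carries over to models without that property.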

Specifying the input values is part of the experimental design process, which also includes specifying certain properties of the simulation execution, e.g. number of simulation runs, comparison metrics, interfaces, etc. The resulting experimental configuration or setup uniquely defines a so-called simulation scenario. Scenarios are then either meaningful in themselves or can be compared with other simulation scenarios to generate added value with regard to the research question. In so-called simulation-optimisation settings, inputs can also be systematically optimised with regard to target variables. Finally, similar to the parameters, a so-called input variation can also be carried out for analysis purposes in order to better understand the behaviour of the model.
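One way to make scenarios explicit in code is to bundle a parameter set and an input configuration into a single immutable object. This is a minimal sketch under our own assumptions: the Scenario class and the run_scenario function are hypothetical names, and the maximum height serves as an example output. Each scenario is evaluated with one simulation run of the analytic free-fall model, and the scenarios are then compared:

```python
from dataclasses import dataclass

import numpy as np


@dataclass(frozen=True)
class Scenario:
    """Hypothetical container: one parameter set plus one input configuration."""
    name: str
    g: float     # parameter value [m/s^2]
    x0: float    # input: initial position [m]
    v0: float    # input: initial velocity [m/s]
    tend: float  # input: end of the examined time frame [s]


def run_scenario(s: Scenario, n_points: int = 1001) -> float:
    """One simulation run of the analytic free-fall model; returns the maximum height."""
    t = np.linspace(0.0, s.tend, n_points)
    x = -s.g * t**2 / 2 + s.v0 * t + s.x0
    return float(x.max())


# Same inputs, two parameter sets (Earth and Jupiter values from the case study)
scenarios = [
    Scenario("Earth", g=9.81, x0=0.0, v0=10.0, tend=3.0),
    Scenario("Jupiter", g=26.0, x0=0.0, v0=10.0, tend=3.0),
]
results = {s.name: run_scenario(s) for s in scenarios}
for name, h_max in results.items():
    print(f"{name}: maximum height {h_max:.2f} m")
```

Because each Scenario is frozen, a configuration cannot be altered after its results have been computed, which keeps scenario comparisons reproducible.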

Case Study: Free Fall - Simulation Experiments

From other studies, we identify $g=9.81$ [m/s^2] as a reasonable value for the parameter $g$ on Earth. As an example of a different reasonable parameter set, the value $g=26.0$ [m/s^2] would allow the model to be transferred to the planet Jupiter. Since the Earth is neither homogeneously dense nor perfectly round, and its surface elevation varies, it makes sense to perform a parameter variation for $g$ in the range between 9.76 [m/s^2], the value at high altitude close to the equator, and 9.83 [m/s^2], the value at the poles.

from matplotlib.colors import to_hex

gmin = 9.76
gbase = 9.81
gmax = 9.83
cm = plt.get_cmap("plasma")  # colormap for the different values of g
plots = list()
for x0 in [5, 10]:  # some x0 input values
    for dx0 in [6, 10]:  # some dx0 (initial velocity) input values
        plots2 = list()
        # linspace (unlike arange with a float step) reliably yields the 8 values 9.76, ..., 9.83
        for g in np.linspace(gmin, gmax, 8):
            fac = (g - gmin) / (gmax - gmin)
            col = to_hex(cm(fac))  # convert the RGBA tuple to a hex colour string
            mdl = FreeFallAnalytic(g)
            T, X = mdl.run(x0, dx0, 2)
            # round, because of floating-point inaccuracy of the range
            if round(g, 2) in (gmin, gbase, gmax):
                # only show labels for certain values of g
                plots2.append(hv.Curve((T, X), label=f"{g:.02f}").opts(color=col))
            else:
                plots2.append(hv.Curve((T, X)).opts(color=col))
        plots.append(
            hv.Overlay(plots2).opts(
                show_legend=True, title=f"{x0=},{dx0=}", xlabel="time [s]",
                ylabel="height [m]", height=400, max_width=600, responsive=True,
            )
        )
display(hv.Layout(plots).opts(shared_axes=False).cols(2))  # join all plots and show them
mdl1 = FreeFallAnalytic(gmin)
mdl2 = FreeFallAnalytic(gmax)
T1, X1 = mdl1.run(0, 10, 3)
T2, X2 = mdl2.run(0, 10, 3)
print(f"maximum height: {gmin}: {max(X1):.03f}, {gmax}: {max(X2):.03f}")

maximum height: 9.76: 5.123, 9.83: 5.086

The results show minor differences for different values of $g$: an object thrown upwards from the ground with an initial velocity of 10 m/s reaches about 5.12 m at high altitude near the equator ($g=9.76$), i.e. roughly 4 centimetres higher than at the poles ($g=9.83$).
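The simulated maxima can be cross-checked against the conceptual model. Setting $v(t)=0$ in (2) gives the apex time $t^*=v_0/g$, and inserting it into $x(t)$ yields

$$x_{max} = x(t^*) = -\frac{g}{2}\left(\frac{v_0}{g}\right)^2 + \frac{v_0^2}{g} + x_0 = x_0 + \frac{v_0^2}{2g}.$$

For $x_0=0$ and $v_0=10$ m/s this gives $100/(2\cdot 9.76)\approx 5.123$ m and $100/(2\cdot 9.83)\approx 5.086$ m, which matches the printed values to three decimals and corresponds to a difference of about 3.7 cm.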

In a second study, we want to analyse the impact of uncertainty regarding the initial velocity. We perform an input variation for the initial velocity $v_0$ (dx0 in the code) between 3 and 5 [m/s].

from matplotlib.colors import to_hex

g = 9.81
dx0min = 3
dx0max = 5
x0 = 5
mdl = FreeFallAnalytic(g)  # note that the model only needs to be created once
cm = plt.get_cmap("plasma")  # colormap for the different initial velocities
plots = list()
# linspace yields the 21 values 3.0, 3.1, ..., 5.0, including both end points
for dx0 in np.linspace(dx0min, dx0max, 21):
    fac = (dx0 - dx0min) / (dx0max - dx0min)
    col = to_hex(cm(fac))  # convert the RGBA tuple to a hex colour string
    T, X = mdl.run(x0, dx0, 2)
    plots.append(hv.Curve((T, X), label=f"{dx0:.01f}").opts(color=col))
display(hv.Overlay(plots).opts(show_legend=True, legend_cols=3, legend_position="bottom_left", xlabel="time [s]", ylabel="height [m]", height=400, max_width=600, responsive=True))
# compare the two extreme inputs using the same model (g = 9.81)
T1, X1 = mdl.run(x0, dx0min, 3)
T2, X2 = mdl.run(x0, dx0max, 3)
print(f"maximum height: dx0={dx0min}: {max(X1):.02f}, dx0={dx0max}: {max(X2):.02f}")

maximum height: dx0=3: 5.46, dx0=5: 6.27

References

Stachowiak, H. (1973). Allgemeine Modelltheorie. Springer-Verlag.
Dori, D., & Mordecai, Y. (2025). Types of Models. In N. Hutchison (Ed.), The Guide to the Systems Engineering Body of Knowledge (SEBoK), v. 2.13. The Trustees of the Stevens Institute of Technology. https://sebokwiki.org/wiki/Types_of_Models
Sargent, R. G. (2013). Verification and validation of simulation models. Journal of Simulation, 7(1), 12–24. 10.1109/WSC.2010.5679166