Declining exponentials

Author

Andrew Gelman

Published

2022-08-15

Modified

2026-04-07

This notebook includes the code for the Bayesian Workflow book Section 12.4 Using modeling ideas to address computing problems.

1 Introduction

  • Declining exponential: A nonlinear model that comes up in many science and engineering applications, also illustrating the use of bounds in parameter estimation.

  • Sum of declining exponentials: An important class of nonlinear models that can be surprisingly difficult to estimate from data. This example illustrates how Stan can fail, and also the way in which default prior information can add stability to an otherwise intractable problem.

Load packages and set options

library(rprojroot)
root <- has_file(".Bayesian-Workflow-root")$make_fix_file()
library(cmdstanr)
options(mc.cores = 4)
options(digits = 2)
## options(htmltools.dir.version = FALSE)
set.seed(1123)
# utility functions
fround <- function(x, digits) {
  format(round(x, digits), nsmall = digits)
}
print_stan_file <- function(file) {
  code <- readLines(file)
  if (isTRUE(getOption("knitr.in.progress")) &&
        identical(knitr::opts_current$get("results"), "asis")) {
    # In render: emit as-is so Pandoc/Quarto does syntax highlighting
    block <- paste0("```stan", "\n", paste(code, collapse = "\n"), "\n", "```")
    knitr::asis_output(block)
  } else {
    writeLines(code)
  }
}

2 Declining exponential

Let’s fit the following simple model: y = ae^{-bx} + \mathrm{error}. Given data (x,y)_i, the model is y_i = ae^{-bx_i} + \epsilon_i, \text{ for } i=1,\dots,N. We shall assume the errors are independent and normally distributed: \epsilon_i \sim \operatorname{normal}(0,\sigma).

Here is the model in Stan:

print_stan_file(root("declining_exponentials",  "exponential.stan"))
data {
  int N;
  vector[N] x;
  vector[N] y;
}
parameters {
  real a;
  real b;
  real<lower=0> sigma;
}
model {
  y ~ normal(a * exp(-b * x), sigma);
  a ~ normal(0, 10);
  b ~ normal(0, 10);
  sigma ~ normal(0, 10);
}

We have given the parameters a, b, and \sigma normal prior distributions centered at 0 with standard deviation 10. In addition, the parameter \sigma is constrained to be positive. The purpose of the prior distributions is to keep the parameter estimates within a reasonable range, which stabilizes the computation. If we were working on a problem in which we thought that a, b, or \sigma could be much greater than 10, we would want to use a weaker prior distribution.

Another point about the above Stan program: the model for y is vectorized and could instead have been written more explicitly as a loop:

for (i in 1:N) {
  y[i] ~ normal(a*exp(-b*x[i]), sigma);
}

We prefer the vectorized version as it is more compact and it also runs faster in Stan, for reasons discussed elsewhere in this book.
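To see the equivalence concretely, here is a small R sketch (not part of the original notebook) checking that the vectorized and looped computations of the curve agree exactly; in Stan the two forms likewise define the same model, differing only in speed:

```r
# Sketch: vectorized and looped evaluations of the curve give the same result.
set.seed(1)
x <- runif(1e4, 0, 10)
a <- 5
b <- 0.3
m_vec <- a * exp(-b * x)        # vectorized, as in the Stan program
m_loop <- numeric(length(x))    # explicit loop, as in the alternative
for (i in seq_along(x)) {
  m_loop[i] <- a * exp(-b * x[i])
}
all.equal(m_vec, m_loop)        # TRUE
```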

2.1 Simulated data

To demonstrate our exponential model, we fit it to fake data. We’ll simulate N=100 data points with predictors x uniformly distributed between 0 and 10, from the above model with a=5, b=0.3, \sigma=0.5.

a <- 5
b <- 0.3
sigma <- 0.5
N <- 100
x <- runif(N, 0, 10)
y <- rnorm(N, a * exp(-b * x), sigma)

Here is a graph of the true curve and the simulated data:

curve(a * exp(-b * x), from = 0, to = 10, ylim = range(x, y), 
      xlab = "x", ylab = "y", bty = "l", 
      main = "Data and true model", cex.main = 1)
points(x, y, pch = 20, cex = 0.2)
Figure 1

And then we fit the model:

data_1 <- list(N = N, x = x, y = y)
mod_1 <- cmdstan_model(root("declining_exponentials", "exponential.stan"))
fit_1 <- mod_1$sample(data = data_1)

Posterior summary:

print(fit_1)
 variable  mean median   sd  mad    q5   q95 rhat ess_bulk ess_tail
    lp__  20.60  20.91 1.25 0.99 18.25 21.92 1.00     1690     2189
    a      5.15   5.15 0.19 0.19  4.85  5.48 1.00     1487     1445
    b      0.30   0.30 0.02 0.02  0.28  0.33 1.00     1670     1575
    sigma  0.49   0.49 0.04 0.04  0.44  0.55 1.00     2407     2550

Recall that the true parameter values were a=5.0, b=0.3, \sigma=0.5. Here the model is simple enough and the data are clean enough that we can estimate all three of these parameters with reasonable precision from the data, as can be seen from the 90% intervals (q5 to q95) above.

Alternatively, we might want to say ahead of time that we are fitting a declining exponential curve that starts positive and descends to zero. We would thus want to constrain the parameters a and b to be positive, which we can do in the parameters block:

  real<lower=0> a;
  real<lower=0> b;

Otherwise we leave the model unchanged. In this case the results turn out to be very similar:

mod_1b <- cmdstan_model(root("declining_exponentials","exponential_positive.stan"))
fit_1b <- mod_1b$sample(data = data_1)

Posterior summary:

print(fit_1b)
 variable  mean median   sd  mad    q5   q95 rhat ess_bulk ess_tail
    lp__  21.08  21.41 1.24 0.99 18.75 22.39 1.00     1794     2181
    a      5.15   5.15 0.18 0.18  4.85  5.46 1.00     1849     1911
    b      0.30   0.30 0.02 0.01  0.28  0.33 1.00     1786     2002
    sigma  0.49   0.49 0.04 0.03  0.44  0.56 1.00     2628     2442

2.2 Small simulated data

With weaker data, though, the constraints could make a difference. We could experiment on this by doing the same simulation but with just N=10 data points:

N <- 10
x <- runif(N, 0, 10)
y <- rnorm(N, a * exp(-b * x), sigma)
curve(a * exp(-b * x), from = 0, to = 10, ylim = range(x, y),
      xlab = "x", ylab = "y", bty = "l", 
      main = "Just 10 data points", cex.main = 1)
points(x, y, pch = 20, cex = 0.2)
Figure 2

First we fit the unconstrained model:

data_2 <- list(N = N, x = x, y = y)
fit_2 <- mod_1$sample(data = data_2)

Posterior summary:

print(fit_2)
 variable  mean median   sd  mad    q5  q95 rhat ess_bulk ess_tail
    lp__  -0.80  -0.04 2.41 1.32 -5.73 1.24 1.02      241       92
    a      6.00   5.94 2.20 1.06  4.21 8.49 1.02     1122      426
    b      0.53   0.39 0.94 0.08  0.27 0.79 1.02      311       90
    sigma  0.70   0.62 0.32 0.17  0.41 1.31 1.01      213      106

Then we fit the constrained model:

fit_2b <- mod_1b$sample(data = data_2)

Posterior summary:

print(fit_2b)
 variable mean median   sd  mad    q5  q95 rhat ess_bulk ess_tail
    lp__  0.16   0.71 1.93 1.33 -3.88 2.06 1.00      661      326
    a     6.13   5.95 1.85 1.01  4.20 8.21 1.00     1075      862
    b     0.48   0.39 0.78 0.08  0.27 0.67 1.00      890      408
    sigma 0.68   0.61 0.27 0.18  0.40 1.16 1.00      730      518

Different things can happen with different sets of simulated data, but the inference with the positivity constraints will typically be much more stable. Of course, we would only want to constrain the model in this way if we knew that the positivity restriction is appropriate.

2.3 Positive simulated data

Now suppose that the data are also restricted to be positive. Then we need a different error distribution, as the above model with additive normal errors can yield negative data.

Let’s try a multiplicative error, with a lognormal distribution: y_i = ae^{-bx_i} \cdot \epsilon_i, \text{ for } i=1,\dots,N, where \log\epsilon_i \sim \operatorname{normal}(0,\sigma).
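A quick sketch in R (not from the original notebook) confirms that multiplying the curve by exponentiated normal noise is the same as drawing directly from a lognormal distribution whose meanlog is the log of the curve value, which is how the Stan model below expresses it:

```r
# Sketch: multiplicative lognormal error is equivalent to sampling y
# directly from a lognormal with meanlog equal to log of the curve.
set.seed(1)
m <- 5 * exp(-0.3 * 2)    # curve value at x = 2, using a = 5, b = 0.3
sigma <- 0.5
y_mult <- m * exp(rnorm(1e5, 0, sigma))                  # curve times noise
y_lnorm <- rlnorm(1e5, meanlog = log(m), sdlog = sigma)  # direct lognormal
# Both are always positive and share the same median, the curve value m:
c(median(y_mult), median(y_lnorm), m)
```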

Here is the model in Stan:

print_stan_file(root("declining_exponentials", "exponential_positive_lognormal.stan"))
data {
  int N;
  vector[N] x;
  vector[N] y;
}
parameters {
  real<lower=0> a;
  real<lower=0> b;
  real<lower=0> sigma;
}
model {
  vector[N] y_pred = a * exp(-b * x);
  y ~ lognormal(log(y_pred), sigma);
  a ~ normal(0, 10);
  b ~ normal(0, 10);
  sigma ~ normal(0, 10);
}

As before, we can simulate fake data from this model:

a <- 5
b <- 0.3
sigma <- 0.5
N <- 100
x <- runif(N, 0, 10)
epsilon <- exp(rnorm(N, 0, sigma))
y <- a * exp(-b * x) * epsilon
curve(a * exp(-b * x), from = 0, to = 10, ylim = range(x, y), 
      xlab = "x", ylab = "y", bty = "l", 
      main = "Data and true model", cex.main = 1)
points(x, y, pch = 20, cex = 0.2)
Figure 3

We can then fit the model to the simulated data and check that the parameters are approximately recovered:

data_3 <- list(N = N, x = x, y = y)
mod_3 <- cmdstan_model(root("declining_exponentials", "exponential_positive_lognormal.stan"))
fit_3 <- mod_3$sample(data = data_3)

Posterior summary:

print(fit_3)
 variable   mean median   sd  mad     q5    q95 rhat ess_bulk ess_tail
    lp__  -71.40 -71.08 1.22 1.01 -73.83 -70.07 1.00     1504     2068
    a       4.95   4.93 0.49 0.47   4.20   5.80 1.00     1393     1680
    b       0.30   0.30 0.02 0.02   0.27   0.33 1.00     1341     1654
    sigma   0.50   0.49 0.04 0.03   0.44   0.56 1.00     2112     2232

3 Sum of declining exponentials

From the numerical analysis literature, here is an example of an inference problem that appears simple but can be surprisingly difficult. The challenge is to estimate the parameters of a sum of declining exponentials: y = a_1e^{-b_1x} + a_2e^{-b_2x}. This is a form of inverse problem, and it can be challenging to decompose the sum into its two component functions.

This expression, and others like it, arise in many examples, including in pharmacology, where x represents time and y could be the concentration of a drug in the blood of someone who was given a specified dose at time 0. In a simple two-compartment model, the total concentration will look like a sum of declining exponentials.

To set this up as a statistics problem, we add some noise to the system. We want the data to always be positive so our noise will be multiplicative: y_i = (a_1e^{-b_1x_i} + a_2e^{-b_2x_i}) * \epsilon_i, \text{ for } i=1,\dots,N, with lognormally-distributed errors \epsilon.

Here is the model in Stan:

print_stan_file(root("declining_exponentials", "sum_of_exponentials.stan"))
data {
  int N;
  vector[N] x;
  vector[N] y;
}
parameters {
  vector<lower=0>[2] a;
  positive_ordered[2] b;
  real<lower=0> sigma;
}
model {
  vector[N] y_pred = a[1] * exp(-b[1] * x) + a[2] * exp(-b[2] * x);
  y ~ lognormal(log(y_pred), sigma);
}

The coefficients a and the residual standard deviation \sigma are constrained to be positive. The parameters b are also positive—these are supposed to be declining, not increasing, exponentials—and are also constrained to be ordered, so that b_1<b_2. We need this to keep the model identified: Without some sort of restriction, there would be no way from the data to tell which component is labeled 1 and which is 2. So we arbitrarily label the component with lower value of b—that is, the one that declines more slowly—as the first one, and the component with higher value of b to be the second. We added the positive_ordered type to Stan because this sort of identification problem comes up fairly often in applications.
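The label-switching problem is easy to see numerically: swapping the two (a, b) pairs leaves the fitted curve unchanged, so without the ordering constraint the likelihood has two symmetric modes. A small sketch:

```r
# Sketch: swapping the component labels gives the identical curve,
# which is why an ordering constraint is needed for identification.
f <- function(x, a, b) a[1] * exp(-b[1] * x) + a[2] * exp(-b[2] * x)
x <- seq(0, 10, length = 101)
y_12 <- f(x, a = c(1, 0.8), b = c(0.1, 2))   # slow component labeled first
y_21 <- f(x, a = c(0.8, 1), b = c(2, 0.1))   # labels swapped
all.equal(y_12, y_21)                        # TRUE
```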

We’ll try out our Stan model by simulating fake data from a model where the two curves should be cleanly distinguished, setting b_1=0.1 and b_2=2.0, a factor of 20 apart in scale. We’ll simulate 1000 data points where the predictors x are uniformly spaced from 0 to 10, and, somewhat arbitrarily, set a_1=1.0, a_2=0.8, and \sigma=0.2. We can then simulate from the lognormal distribution to generate the data y.

a <- c(1, 0.8)
b <- c(0.1, 2)
N <- 1000
x <- seq(0, 10, length = N)
sigma <- 0.2
epsilon <- exp(rnorm(N, 0, sigma))
y <- (a[1] * exp(-b[1] * x) + a[2] * exp(-b[2] * x)) * epsilon

Here is a graph of the true curve and the simulated data:

data_graph <- function(a, b, sigma, x, y) {
  curve(a[1] * exp(-b[1] * x) + a[2] * exp(-b[2] * x), 
        from = min(x), to = max(x), 
        ylim = c(0, 1.05 * max(y)),  xlim = c(0, max(x)), 
        xlab = "x", ylab = "y", xaxs = "i", yaxs = "i", bty = "l", 
        main = "Data and true model", cex.main = 1)
  points(x, y, pch = 20, cex = 0.2)
  text(max(x), 0.5 * max(y), 
       paste("y = ", fround(a[1], 1), "*exp(", fround(-b[1], 1), "*x) + ",  
             fround(a[2], 1), "*exp(", fround(-b[2], 1), "*x)", sep = ""), adj = 1)
}
data_graph(a, b, sigma, x, y)
Figure 4

And then we fit the model:

data_4 <- list(N = N, x = x, y = y)
mod_4 <- cmdstan_model(root("declining_exponentials", "sum_of_exponentials.stan"))
fit_4 <- mod_4$sample(data = data_4)

Posterior summary:

print(fit_4)
 variable   mean median   sd  mad     q5    q95 rhat ess_bulk ess_tail
    lp__  203.09 203.38 1.55 1.40 200.18 205.02 1.00     1624     2626
    a[1]    0.99   0.99 0.02 0.02   0.96   1.02 1.00     1882     1911
    a[2]    0.86   0.86 0.09 0.09   0.72   1.01 1.00     2657     2382
    b[1]    0.10   0.10 0.00 0.00   0.09   0.10 1.00     1915     2000
    b[2]    2.25   2.23 0.32 0.31   1.76   2.82 1.00     1951     2151
    sigma   0.20   0.20 0.00 0.00   0.19   0.20 1.00     2858     2492

The parameters are recovered well, with the only difficulty being b_2, where the posterior mean is 2.25 while the true value is 2.0, but that is well within the posterior uncertainty. Stan worked just fine on this nonlinear model.

But now let’s make the problem just slightly more difficult. Instead of setting the two parameters b to 0.1 and 2.0, we’ll make them 0.1 and 0.2, so now only a factor of 2 separates the scales of the two declining exponentials.

b <- c(0.1, 0.2)
y_2 <- (a[1] * exp(-b[1] * x) + a[2] * exp(-b[2] * x)) * epsilon
data_5 <- list(N = N, x = x, y = y_2)

This should still be easy to fit in Stan, right? Wrong:

fit_5 <- mod_4$sample(data = data_5)

Posterior summary:

print(fit_5)
 variable     mean   median   sd      mad       q5      q95 rhat ess_bulk ess_tail
    lp__   9.1e+02  9.1e+02 1.78  1.6e+00  9.1e+02  9.1e+02 1.05       88      627
    a[1]   1.7e+00  1.7e+00 0.02  2.0e-02  1.7e+00  1.8e+00 1.01      284      598
    a[2]   5.7e-01  4.7e-01 0.41  4.0e-01  1.0e-01  1.4e+00 1.27       11       72
    b[1]   1.3e-01  1.3e-01 0.00  0.0e+00  1.3e-01  1.3e-01 1.01      314      706
    b[2]  8.9e+307 8.7e+307  Inf 6.5e+307 8.1e+306 1.7e+308 1.01      348      585
    sigma  2.0e-01  2.0e-01 0.00  0.0e+00  1.9e-01  2.0e-01 1.01      310      449

What happened?? It turns out that these two declining exponentials are very hard to detect. Look: here’s a graph of the two-component model for the expected data, y=1.0e^{-0.1x}+0.8e^{-0.2x}:

curve(a[1] * exp(-b[1] * x) + a[2] * exp(-b[2] * x), 
      from = 0, to = 10, ylim = c(0, 1.9), xlim = c(0, 10), 
      xlab = "x", ylab = "y", xaxs = "i", yaxs = "i", bty = "l", 
      main = "True model", cex.main = 1)
text(5.6, 0.7, "y = 1.0*exp(-0.1x) + 0.8*exp(-0.2x)", adj = 1)
Figure 5

And now we’ll overlay a graph of a particular one-component model, y=1.8e^{-0.135x}:

curve(a[1] * exp(-b[1] * x) + a[2] * exp(-b[2] * x), from = 0, to = 10, 
      ylim = c(0, 1.9), xlim = c(0, 10), 
      xlab = "x", ylab = "y", xaxs = "i", yaxs = "i", bty = "l", 
      main = "True model and scarily-close one-component approximation", cex.main = 1)
curve(1.8 * exp(-0.135 * x), from = 0, to = 10, add = TRUE)
text(5.6, 0.7, "y = 1.0*exp(-0.1x) + 0.8*exp(-0.2x)", adj = 1)
text(6.1, 1.3, "y = 1.8*exp(-0.135x)", adj = 1)
Figure 6

The two lines are strikingly close, and it would be essentially impossible to tell them apart from noisy data, even with 1000 measurements. So Stan had trouble recovering the true parameters from the data.
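We can quantify the closeness directly (a quick check using the curves from the figure): the largest gap between the two functions over the whole range is roughly 0.02, both in absolute terms and on the log scale, an order of magnitude smaller than the noise scale \sigma=0.2 on the log scale.

```r
# Sketch: maximum gap between the two-component truth and the
# one-component approximation shown in the figure.
x <- seq(0, 10, length = 1000)
y_two <- 1.0 * exp(-0.1 * x) + 0.8 * exp(-0.2 * x)
y_one <- 1.8 * exp(-0.135 * x)
max(abs(y_two - y_one))            # largest vertical gap, roughly 0.02
max(abs(log(y_two) - log(y_one)))  # roughly 0.02 on the log scale as well
# compare with the noise scale on the log scale, sigma = 0.2
```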

Still, if the parameters are difficult to estimate, this should just show up as high posterior uncertainty. Why did the Stan fit explode? The problem in this case is that only one term in the model was needed to fit these data, so the second term was essentially free, and the parameter b_2 was unbounded: there was nothing stopping it from being estimated as arbitrarily large. This sort of unbounded posterior distribution is called improper (see Bayesian Data Analysis for a more formal definition), and there is no way of drawing simulations from such a distribution, hence Stan does not converge. The simulations drift off to infinity, as there is nothing in the prior or likelihood to keep them from doing so.

To fix the problem, we can add some prior information. Here we shall use our default, which is independent \operatorname{normal}(0,1) prior densities on all the parameters; thus, we add these lines to the model block in the Stan program:

  a ~ normal(0, 1);
  b ~ normal(0, 1);
  sigma ~ normal(0, 1);

For this particular example, all we really need is a prior on b (really, just b_2 because of the ordering), but to demonstrate the point we shall assign default priors to everything. The priors are in addition to the rest of the model; that is, they go on top of the positivity and ordering constraints. So, for example, the prior for \sigma is the positive half of a normal, which is sometimes written as \operatorname{normal}^+(0,1).

We now can fit this new model to our data, and the results are much more stable:

mod_5_with_priors <- cmdstan_model(root("declining_exponentials", "sum_of_exponentials_with_priors.stan"))
fit_5_with_priors <- mod_5_with_priors$sample(data = data_5)

Posterior summary:

print(fit_5_with_priors)
 variable   mean median   sd  mad     q5    q95 rhat ess_bulk ess_tail
    lp__  198.96 199.31 1.80 1.58 195.62 201.14 1.01      766      765
    a[1]    1.35   1.56 0.44 0.23   0.40   1.73 1.05      113      452
    a[2]    0.43   0.24 0.42 0.24   0.03   1.36 1.05      114      445
    b[1]    0.11   0.12 0.02 0.01   0.07   0.13 1.03      336      588
    b[2]    0.58   0.31 0.56 0.24   0.14   1.80 1.03      273      547
    sigma   0.20   0.20 0.00 0.00   0.19   0.20 1.00     1340     1381

The fit is far from perfect—compare to the true parameter values, a_1=1.0, a_2=0.8, b_1=0.1, b_2=0.2—but we have to expect that. As explained above, the data at hand do not identify the parameters, so all we can hope for in a posterior distribution is some summary of uncertainty.

The question then arises, what about those prior distributions? We can think about them in a couple different ways.

From one direction, we can think of scaling. We are using priors centered at 0 with a scale of 1; this can be reasonable if the parameters are on “unit scale,” meaning that we expect them to be of order of magnitude around 1. Not all statistical models are on unit scale. For example, in the above model, if the data y_i are on unit scale, but the data values x_i take on values in the millions, then we’d probably expect the parameters b_1 and b_2 to be roughly on the scale of 10^{-6}. In such a case, we’d want to rescale x so that the coefficients b are more interpretable. Similarly, if the values of y ranged in the millions, then the coefficients a would have to be of order 10^6, and, again, we would want to rescale the data or the model so that a would be on unit scale. By using unit scale priors, we are implicitly assuming the model has been scaled.
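As an illustration of the rescaling point (a sketch with hypothetical numbers, not from the original example): if x were recorded in units where its values run into the millions, dividing x by a suitable constant puts b back on unit scale without changing the curve at all.

```r
# Sketch: rescaling x rescales b inversely, leaving the curve unchanged.
set.seed(1)
x_raw <- runif(100, 0, 1e7)   # hypothetical predictor in the millions
b_raw <- 3e-7                 # rate on the raw scale: awkward magnitude
x_scaled <- x_raw / 1e6       # rescaled predictor, now roughly 0 to 10
b_scaled <- b_raw * 1e6       # corresponding rate, now 0.3: unit scale
all.equal(exp(-b_raw * x_raw), exp(-b_scaled * x_scaled))  # TRUE
```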

From the other direction, instead of adapting the model to the prior distribution, we could adapt the prior to the model. That would imply an understanding of a reasonable range of values for the parameters, based on the context of the problem. In any particular example this could be done by simulating parameter vectors from the prior distribution and graphing the corresponding curves of expected data, and seeing if these could plausibly cover the possible cases that might arise in the particular problem being studied.
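That procedure can be sketched directly in R. Here is a rough version of the prior predictive check: the abs() draws mimic the positivity constraints as half-normals, and sort() mimics the ordering, which is close to, though not exactly, the constrained prior that Stan uses.

```r
# Sketch of a prior predictive check: draw parameters from (roughly) the
# default priors and plot the implied curves of expected data.
set.seed(1)
x <- seq(0, 10, length = 100)
n_sims <- 100
curves <- matrix(NA, n_sims, length(x))
for (s in 1:n_sims) {
  a <- abs(rnorm(2, 0, 1))         # half-normal(0, 1), positive
  b <- sort(abs(rnorm(2, 0, 1)))   # positive and ordered, b[1] < b[2]
  curves[s, ] <- a[1] * exp(-b[1] * x) + a[2] * exp(-b[2] * x)
}
matplot(x, t(curves), type = "l", lty = 1, col = "gray",
        xlab = "x", ylab = "y", main = "Curves implied by the priors")
```

If the gray curves cover the range of data we might plausibly see, the priors are doing their job; if they all sit far from plausible data, the priors need rethinking.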

No matter how it’s done, inference has to come from somewhere, and if the data are weak, you need to put in prior information if your goal is to make some statement about possible parameter values, and from there to make probabilistic predictions and decisions.


Licenses

  • Code © 2022–2025, Andrew Gelman, licensed under BSD-3.
  • Text © 2022–2025, Andrew Gelman, licensed under CC-BY-NC 4.0.