library(rprojroot)
root <- has_file(".Bayesian-Workflow-root")$make_fix_file()
library(cmdstanr)
options(mc.cores = 4)
library(bayesplot)
library(loo)
library(posterior)
options(pillar.neg = FALSE,
pillar.subtle = FALSE,
pillar.sigfig = 2)
options(width = 90)
# utility functions
logit <- qlogis
invlogit <- plogis
fround <- function (x, digits) format(round(x, digits), nsmall = digits)
print_stan_file <- function(file) {
code <- readLines(file)
  if (isTRUE(getOption("knitr.in.progress")) &&
identical(knitr::opts_current$get("results"), "asis")) {
# In render: emit as-is so Pandoc/Quarto does syntax highlighting
block <- paste0("```stan", "\n", paste(code, collapse = "\n"), "\n", "```")
knitr::asis_output(block)
} else {
writeLines(code)
}
}

Model building and expansion: Golf putting
This notebook includes the code for Bayesian Workflow book chapter 25 Model building and expansion: Golf putting.
1 Introduction
We demonstrate the basic workflow of Bayesian modeling using an example of a set of models fit to data on golf putting.
2 Data
The following graph shows data from professional golfers on the proportion of successful putts as a function of distance from the hole (from Berry 1996). Unsurprisingly, the probability of making the shot declines as a function of distance:
golf <- read.table(root("golf", "data", "golf_data.txt"),
header = TRUE, skip = 2)
x <- golf$x
y <- golf$y
n <- golf$n
J <- length(y)
r <- (1.68 / 2) / 12
R <- (4.25 / 2) / 12
se <- sqrt((y / n) * (1 - y / n) / n)

par(mar = c(3, 3, 2, 1), mgp = c(1.7, .5, 0), tck = -.02)
plot(0, 0, xlim = c(0, 1.1 * max(x)), ylim = c(0, 1.02),
xaxs = "i", yaxs = "i", bty = "l",
xlab = "Distance from hole (feet)",
ylab = "Probability of success",
main = "Data on putts in pro golf",
type = "n")
points(x, y / n, pch = 20, col = "blue")
segments(x, y / n + se, x, y / n - se, lwd = .5, col = "blue")
text(x + .4,
y / n + se + .02,
paste(y, "/", n, sep = ""),
cex = .6,
     col = "gray40")

The error bars associated with each point j in the above graph are simple classical standard deviations, \sqrt{\hat{p}_j(1-\hat{p}_j)/n_j}, where \hat{p}_j=y_j/n_j is the success rate for putts taken at distance x_j.
3 Logistic regression
Can we model the probability of success in golf putting as a function of distance from the hole? Given usual statistical practice, the natural starting point would be logistic regression:
y_j\sim\mathrm{binomial}(n_j, \mathrm{logit}^{-1}(a + bx_j)), \text{ for } j=1,\dots, J. In Stan, this is:
print_stan_file(root("golf", "golf_logistic.stan"))

data {
int J;
vector[J] x;
array[J] int n;
array[J] int y;
}
parameters {
real a;
real b;
}
model {
y ~ binomial_logit(n, a + b*x);
}

The code in the above model block is (implicitly) vectorized, so that it is mathematically equivalent to modeling each data point individually: y[j] ~ binomial_logit(n[j], a + b*x[j]). The vectorized code is more compact (no need to write a loop or to include the subscripts) and faster (because of more efficient gradient evaluations).
We fit the model to the data:
golf_data <- list(x = x, y = y, n = n, J = J)
model_1 <- cmdstan_model(root("golf", "golf_logistic.stan"))
fit_1 <- model_1$sample(data = golf_data, refresh = 0)

And here is the result:
draws_1 <- fit_1$draws(format = "df")
a_sim <- draws_1$a
b_sim <- draws_1$b
a_hat <- mean(a_sim)
b_hat <- mean(b_sim)
n_sims <- nrow(draws_1)
print(fit_1)

 variable     mean   median   sd  mad       q5      q95 rhat ess_bulk ess_tail
lp__ -3021.15 -3020.86 0.98 0.73 -3023.12 -3020.21 1.00 1301 2073
a 2.23 2.23 0.06 0.06 2.14 2.33 1.00 1155 1271
b -0.26 -0.26 0.01 0.01 -0.27 -0.24 1.00 1133 1142
Going through the columns of the above table: Stan has computed the posterior means \pm standard deviations of a and b to be 2.23 \pm 0.06 and -0.26 \pm 0.01, respectively. The Monte Carlo standard error of the mean of each of these parameters is 0 (to two decimal places), indicating that the simulations have run long enough to estimate the posterior means precisely. The posterior quantiles give a sense of the uncertainty in the parameters, with 90% posterior intervals of [2.14, 2.33] and [-0.27, -0.24] for a and b, respectively. Finally, the values of \widehat{R} near 1 tell us that the simulations from Stan’s four chains have mixed well.
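As a quick sanity check on these estimates, we can plug the posterior means into the inverse-logit curve to see the implied success probabilities at a couple of distances (a minimal sketch using the rounded values from the table above):

```r
# Implied Pr(success) under the logistic fit, using the (rounded) posterior
# means a = 2.23, b = -0.26 reported above; results are therefore approximate.
invlogit <- plogis
a_hat <- 2.23
b_hat <- -0.26
p_2ft <- invlogit(a_hat + b_hat * 2)    # implied success rate at 2 feet
p_10ft <- invlogit(a_hat + b_hat * 10)  # implied success rate at 10 feet
```

These implied rates (roughly 85% at 2 feet, 41% at 10 feet) can be compared by eye with the raw proportions in the data plot.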
The following graph shows the fit plotted along with the data:
par(mar = c(3, 3, 2, 1), mgp = c(1.7, .5, 0), tck = -.02)
plot(0, 0, xlim = c(0, 1.1 * max(x)), ylim = c(0, 1.02),
xaxs = "i", yaxs = "i", bty = "l",
xlab = "Distance from hole (feet)",
ylab = "Probability of success",
main = "Fitted logistic regression",
type = "n")
for (i in sample(n_sims, 10)) {
curve(invlogit(a_sim[i] + b_sim[i]*x),
from = 0, to = 1.1 * max(x),
        lwd = 0.5, add = TRUE, col = "green")
}
curve(invlogit(a_hat + b_hat * x), from = 0, to = 1.1 * max(x), add = TRUE)
points(x, y/n, pch = 20, col = "blue")
segments(x, y/n + se, x, y/n-se, lwd = .5, col = "blue")
text(11, .57, paste("Logistic regression,\n a = ",
     fround(a_hat, 2), ", b = ", fround(b_hat, 2), sep=""))

The black line shows the fit corresponding to the posterior mean estimates of the parameters a and b; the green lines show 10 draws from the posterior distribution.
In this example, posterior uncertainties in the fits are small, and for simplicity we will just plot point estimates based on posterior mean parameter estimates for the remaining models. Our focus here is on the sequence of models that we fit, not so much on uncertainty in particular model fits.
4 Modeling from first principles
As an alternative to logistic regression, we shall build a model from first principles and fit it to the data.
The graph below shows a simplified sketch of a golf shot. The dotted line represents the angle within which the ball of radius r must be hit so that it falls within the hole of radius R. This threshold angle is \sin^{-1}((R-r)/x). The graph, which is not to scale, is intended to illustrate the geometry of the ball needing to go into the hole.
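To get a feel for this geometry, we can evaluate the threshold angle in degrees at a few distances (a quick numerical sketch using the ball and hole radii in feet defined earlier):

```r
# Threshold angle asin((R - r)/x), converted to degrees, at a few distances.
# It shrinks rapidly as the putt gets longer.
r <- (1.68 / 2) / 12   # ball radius (feet)
R <- (4.25 / 2) / 12   # hole radius (feet)
x_feet <- c(2, 5, 10, 20)
threshold_degrees <- asin((R - r) / x_feet) * 180 / pi
```

A two-foot putt allows an error of about three degrees; a twenty-foot putt allows only about a third of a degree.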
par(mar = c(0, 0, 0, 0))
dist <- 2
r_plot <- r
R_plot <- R
plot(0, 0, xlim = c(-R_plot, dist + 3 * R_plot), ylim = c(-2 * R_plot, 2 * R_plot),
xaxs = "i", yaxs = "i", xaxt = "n", yaxt = "n", bty = "n",
xlab = "", ylab = "", type = "n", asp = 1)
symbols(0, 0, circles=r_plot, inches=FALSE, add=TRUE)
symbols(dist, 0, circles = R_plot - r_plot, inches = FALSE, lty = 2, add = TRUE)
symbols(dist, 0, circles = R_plot, inches = FALSE, add = TRUE)
curve(0 * x, from = 0, to = dist, add = TRUE)
curve(((R_plot - r_plot) / dist) * x, from = 0, to = dist, lty = 2, add = TRUE)
curve(-((R_plot - r_plot) / dist) * x, from = 0, to = dist, lty = 2, add = TRUE)
text(0.5 * dist, -1.5 * R_plot, "x")
arrows(0.5 * dist + 0.05, -1.5 * R_plot, dist, -1.5 * R_plot, 2, length = .1)
arrows(0.5 * dist - 0.05, -1.5 * R_plot, 0, -1.5 * R_plot, 2, length = .1)
text(dist + 1.2 * R_plot, .5 * R_plot, "R")
arrows(dist + 1.2 * R_plot, .7 * R_plot, dist + 1.2 * R_plot, R_plot, length = .05)
arrows(dist + 1.2 * R_plot, .3 * R_plot, dist + 1.2 * R_plot, 0, length = .05)
text(0, r_plot / 2, "r")

The next step is to model human error. We assume that the golfer is attempting to hit the ball completely straight but that many small factors interfere with this goal, so that the actual angle follows a normal distribution centered at 0 with some standard deviation \sigma.
par(mar = c(3, 3, 0, 0), mgp = c(1.7, .5, 0), tck = -.02)
plot(0, 0, xlim = c(-4, 4), ylim = c(0, 1.02),
xaxs = "i", yaxs = "i", xaxt = "n", yaxt = "n", bty = "n",
xlab = "Angle of shot", ylab = "", type = "n")
axis(1, seq(-4,4),
c("", "", expression(-2*sigma), "",
0, "", expression(2*sigma),"", ""))
curve(dnorm(x) / dnorm(0), add = TRUE)

The probability the ball goes in the hole is then the probability that the angle is less than the threshold; that is, \mathrm{Pr}\left(|\mathrm{angle}| < \sin^{-1}((R-r)/x)\right) = 2\Phi\left(\frac{\sin^{-1}((R-r)/x)}{\sigma}\right) - 1, where \Phi is the cumulative normal distribution function. The only unknown parameter in this model is \sigma, the standard deviation of the distribution of shot angles. Stan (and, for that matter, R) computes trigonometry using angles in radians, so at the end of our calculations we will need to multiply by 180/\pi to convert to degrees, which are more interpretable by humans.
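Before fitting, we can evaluate this success probability directly in R for an assumed value of \sigma (the 2-degree value here is purely illustrative, not a fitted estimate):

```r
# Pr(success) = 2 * Phi(asin((R - r)/x) / sigma) - 1, evaluated at one distance
# for an assumed (not fitted) angular sd of 2 degrees.
r <- (1.68 / 2) / 12           # ball radius (feet)
R <- (4.25 / 2) / 12           # hole radius (feet)
sigma_assumed <- 2 * pi / 180  # 2 degrees, converted to radians
x0 <- 10                       # a ten-foot putt
p0 <- 2 * pnorm(asin((R - r) / x0) / sigma_assumed) - 1
```

Under this assumed \sigma, a ten-foot putt succeeds roughly a quarter of the time.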
Our model then has two parts: \begin{align} y_j &\sim \mathrm{binomial}(n_j, p_j)\\ p_j &= 2\Phi\left(\frac{\sin^{-1}((R-r)/x_j)}{\sigma}\right) - 1 , \text{ for } j=1,\dots, J. \end{align} Here is a graph showing the curve for some potential values of the parameter \sigma.
par(mar = c(3, 3, 2, 1), mgp = c(1.7, .5, 0), tck = -.02)
plot(0, 0, xlim = c(0, 1.1 * max(x)), ylim = c(0, 1.02),
xaxs = "i", yaxs = "i", bty = "l",
xlab = "Distance from hole (feet)", ylab = "Probability of success",
main = expression(paste("Modeled Pr(success) for different values of ", sigma)),
type = "n")
sigma_degrees_plot <- c(0.5, 1, 2, 5, 20)
x_text <- c(15, 10, 6, 4, 2)
for (i in 1:length(sigma_degrees_plot)){
sigma <- (pi / 180) * sigma_degrees_plot[i]
x_grid <- seq(R - r, 1.1 * max(x), .01)
p_grid <- 2 * pnorm(asin((R - r) / x_grid) / sigma) - 1
lines(c(0, R - r, x_grid), c(1, 1, p_grid))
text(x_text[i] + 0.7,
2 * pnorm(asin((R - r) / x_text[i]) / sigma) - 1,
bquote(sigma == .(sigma_degrees_plot[i]) * degree),
adj = 0)
}

The highest curve on the graph corresponds to \sigma=0.5^\circ: if golfers could control the angles of their putts to an accuracy of approximately half a degree, they would have a very high probability of success, making over 80% of their ten-foot putts, over 50% of their fifteen-foot putts, and so on. At the other extreme, the lowest plotted curve corresponds to \sigma=20^\circ: if your putts could be off by as much as 20 degrees, you would be highly inaccurate, missing more than half of your two-foot putts. When fitting the model in Stan, the program moves around the space of \sigma, sampling from the posterior distribution.
We now write the Stan model in preparation for estimating \sigma:
print_stan_file(root("golf", "golf_angle_binomial.stan"))

data {
int J;
vector[J] x;
array[J] int n;
array[J] int y;
real r, R;
}
transformed data {
vector[J] threshold_angle = asin((R-r) ./ x);
}
parameters {
real<lower=0> sigma;
}
model {
vector[J] p;
p = 2*Phi(threshold_angle / sigma) - 1;
y ~ binomial(n, p);
}
generated quantities {
real sigma_degrees = sigma * 180 / pi();
}

In the transformed data block above, the ./ in the calculation of threshold_angle performs componentwise division in this vectorized computation.
The data J,n,x,y have already been set up; we just need to define r and R (the golf ball and hole have diameters 1.68 and 4.25 inches, respectively), and run the Stan model:
r <- (1.68/2)/12
R <- (4.25/2)/12
golf_data <- c(golf_data, r = r, R = R)
model_2 <- cmdstan_model(root("golf", "golf_angle_binomial.stan"))
fit_2 <- model_2$sample(data = golf_data, refresh = 0)

Here is the result:
draws_2 <- fit_2$draws(format = "df")
sigma_sim <- draws_2$sigma
sigma_degrees_sim <- draws_2$sigma_degrees
sigma_hat <- mean(sigma_sim)
print(fit_2)

 variable          mean   median   sd  mad       q5      q95 rhat ess_bulk ess_tail
lp__ -2926.75 -2926.49 0.68 0.31 -2928.13 -2926.27 1.00 1941 2546
sigma 0.03 0.03 0.00 0.00 0.03 0.03 1.00 1563 2242
sigma_degrees 1.53 1.53 0.02 0.02 1.49 1.56 1.00 1563 2242
The model has a single parameter, \sigma. From the output, we find that Stan has computed the posterior mean of \sigma to be 0.03. Multiplying this by 180/\pi, this comes to 1.53 degrees. The Monte Carlo standard error of the mean is 0 (to two decimal places), indicating that the simulations have run long enough to estimate the posterior mean precisely. The posterior standard deviation is calculated at 0.02 degrees, indicating that \sigma itself has been estimated with high precision, which makes sense given the large number of data points and the simplicity of the model. The precise posterior distribution of \sigma can also be seen from the narrow range of the posterior quantiles. Finally, \widehat{R} is near 1, telling us that the simulations from Stan’s four simulated chains have mixed well.
We next plot the data and the fitted model (here using the posterior mean of \sigma, but in this case the uncertainty is so narrow that any reasonable posterior summary would give essentially the same result), along with the logistic regression fitted earlier:
par(mar = c(3, 3, 2, 1), mgp = c(1.7, .5, 0), tck = -.02)
plot(0, 0, xlim = c(0, 1.1 * max(x)), ylim = c(0, 1.02),
xaxs = "i", yaxs = "i", bty = "l",
xlab = "Distance from hole (feet)",
ylab = "Probability of success",
main = "Two models fit to the golf putting data", type = "n")
segments(x, y / n + se, x, y / n - se, lwd = .5)
curve(invlogit(a_hat + b_hat*x), from = 0, to = 1.1 * max(x), add = TRUE)
x_grid <- seq(R - r, 1.1 * max(x), .01)
p_grid <- 2 * pnorm(asin((R - r) / x_grid) / sigma_hat) - 1
lines(c(0, R - r, x_grid), c(1, 1, p_grid), col = "blue")
points(x, y / n, pch = 20, col = "blue")
text(10.3, .58, "Logistic regression")
text(18.5, .24, "Geometry-based model", col="blue")

The custom nonlinear model fits the data much better. This is not to say that the model is perfect—any experience of golf will reveal that the angle is not the only factor determining whether the ball goes in the hole—but it seems like a useful start.
5 Testing the fitted model on new data
Several years after fitting the above model, we were presented with a newer and more comprehensive dataset on professional golf putting (Broadie 2018). For simplicity we’ll just look here at the summary data, probabilities of the ball going into the hole for shots up to 75 feet from the hole. The graph below shows these new data (in red), along with our earlier dataset (in blue) and the already-fit geometry-based model from before, extending to the range of the new data.
golf_new <- read.table(root("golf", "data", "golf_data_new.txt"),
                       header = TRUE, skip = 2)

par(mar = c(3, 3, 2, 1), mgp = c(1.7, .5, 0), tck = -.02)
plot(0, 0, xlim = c(0, 1.1 * max(golf_new$x)), ylim = c(0, 1.02),
xaxs = "i", yaxs = "i", bty = "l",
xlab = "Distance from hole (feet)",
ylab = "Probability of success",
     main = "Checking already-fit model to new data", type = "n")
x_grid <- seq(R - r, 1.1 * max(golf_new$x), .01)
p_grid <- 2 * pnorm(asin((R - r) / x_grid) / sigma_hat) - 1
lines(c(0, R - r, x_grid), c(1, 1, p_grid), col = "blue")
points(golf$x, golf$y / golf$n, pch = 20, col = "blue")
points(golf_new$x, golf_new$y/golf_new$n, pch = 20, col = "red")
legend(60, 0.4, legend = c("Old data", "New data"), col = c("blue", "red"), pch = 20)

Comparing the two datasets in the range 0-20 feet, the success rate is similar for longer putts but is much higher than before for the short putts. This could be a measurement issue, if the distances to the hole are only approximate for the old data, and it could also be that golfers are better than they used to be.
Beyond 20 feet, the empirical success rates become lower than would be predicted by the old model. These are much more difficult attempts, even after accounting for the increased angular precision required as distance goes up. In addition, the new data look smoother, which perhaps is a reflection of more comprehensive data collection.
6 A new model accounting for how hard the ball is hit
To get the ball in the hole, the angle isn’t the only thing you need to control; you also need to hit the ball just hard enough.
Broadie (2018) added this feature to the geometric model by introducing another parameter corresponding to the golfer’s control over distance. Supposing u is the distance that the golfer’s shot would travel if there were no hole, the assumption is that the putt will go in if (a) the angle allows the ball to go over the hole, and (b) u is in the range (x, x+3). That is, the ball must be hit hard enough to reach the hole but not so hard that it goes too far. Factor (a) is what we have considered earlier; we must now add factor (b).
The following sketch, which is not to scale, illustrates the need for both the distance and the angle of the shot to be in some range, in this case the gray zone, which represents the trajectories for which the ball would reach the hole and stay in it.
par(mar = c(0, 0, 0, 0))
dist <- 2
r_plot <- r
R_plot <- R
distance_tolerance <- 0.6
plot(0, 0,
xlim = c(-R_plot, dist + 3 * R_plot + 1.5 * distance_tolerance),
ylim = c(-2 * R_plot, 2 * R_plot),
xaxs = "i", yaxs = "i", xaxt = "n", yaxt = "n", bty = "n",
xlab = "", ylab = "", type = "n", asp = 1)
polygon(
c(dist, dist, dist + distance_tolerance, dist + distance_tolerance),
c(
R_plot - r_plot,
-(R_plot - r_plot),
-(R_plot - r_plot) * (dist + distance_tolerance) / dist,
(R_plot - r_plot) * (dist + distance_tolerance) / dist
),
border = NA,
col = "gray"
)
symbols(0, 0, circles = r_plot, inches = FALSE, add = TRUE)
symbols(dist, 0, circles = R_plot, inches = FALSE, add = TRUE)
symbols(dist, 0, circles = R_plot - r_plot, inches = FALSE,
lty = 2, bg = "gray", add = TRUE)
curve(((R_plot - r_plot) / dist) * x, from = 0, to = dist + 1.5 * distance_tolerance,
lty = 2, add = TRUE)
curve(-((R_plot - r_plot) / dist) * x, from = 0, to = dist + 1.5 * distance_tolerance,
lty = 2, add = TRUE)
text(0.5 * dist, -1.5 * R_plot, "x")
arrows(0.5 * dist + 0.05, -1.5 * R_plot, dist, -1.5 * R_plot, 2, length = .1)
arrows(0.5 * dist - 0.05, -1.5 * R_plot, 0, -1.5 * R_plot, 2, length = .1)

Suppose that a golfer will aim to hit the ball one foot past the hole but with a multiplicative error in the shot’s potential distance, so that u=(x+1)(1+\epsilon), where the error \epsilon has a normal distribution with mean 0 and standard deviation \sigma_{\rm distance}. In statistics notation, this model is u\sim\mathrm{normal}(x+1,(x+1)\sigma_{\rm distance}), and the distance is acceptable if u\in [x,x+3], an event that has probability \Phi\left(\frac{2}{(x+1)\sigma_{\rm distance}}\right)-\Phi\left(\frac{-1}{(x+1)\sigma_{\rm distance}}\right).
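This acceptance probability is easy to evaluate numerically; here is a small sketch with an illustrative (not fitted) value of \sigma_{\rm distance}:

```r
# Pr(distance acceptable) = Phi(2/((x+1)*s)) - Phi(-1/((x+1)*s)),
# for an illustrative relative distance error of 10% and a ten-foot putt.
sigma_d <- 0.1   # illustrative, not a fitted estimate
x0 <- 10
p_dist <- pnorm(2 / ((x0 + 1) * sigma_d)) - pnorm(-1 / ((x0 + 1) * sigma_d))
```

With these values the distance condition is satisfied a bit under 80% of the time, and the probability drops further as x grows, since the error scale (x+1)\sigma_{\rm distance} increases with distance.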
Putting these together, the probability a shot goes in becomes, \left(2\Phi\left(\frac{\sin^{-1}((R-r)/x)}{\sigma_{\rm angle}}\right) - 1\right)\left(\Phi\left(\frac{2}{(x+1)\,\sigma_{\rm distance}}\right) - \Phi\left(\frac{-1}{(x+1)\,\sigma_{\rm distance}}\right)\right), where we have renamed the parameter \sigma from our earlier model to \sigma_{\rm angle} to distinguish it from the new \sigma_{\rm distance} parameter. We write the new model in Stan, giving it the name golf_angle_distance_binomial.stan to convey that it accounts for both angle and distance:
print_stan_file(root("golf", "golf_angle_distance_binomial.stan"))

data {
int J;
vector[J] x;
array[J] int n;
array[J] int y;
real r, R;
real overshot, distance_tolerance;
}
transformed data {
vector[J] threshold_angle = asin((R-r) ./ x);
}
parameters {
real<lower=0> sigma_angle;
real<lower=0> sigma_distance;
}
model {
[sigma_angle, sigma_distance] ~ normal(0, 1);
vector[J] p_angle = 2*Phi(threshold_angle / sigma_angle) - 1;
vector[J] p_distance =
Phi((distance_tolerance - overshot) ./ ((x + overshot)*sigma_distance)) -
Phi(-overshot ./ ((x + overshot)*sigma_distance));
vector[J] p = p_angle .* p_distance;
y ~ binomial(n, p);
}
generated quantities {
real sigma_degrees = sigma_angle * 180 / pi();
}

The result is a model with two parameters, \sigma_{\rm angle} and \sigma_{\rm distance}. Even this improved geometry-based model is a gross oversimplification of putting, and the average distances in the binned data are not the exact distances for each shot. But it should be an advance on the earlier one-parameter model; the next step is to see how it fits the data.
Here we have defined overshot and distance_tolerance as data, which Broadie (2018) specified as 1 and 3 feet, respectively. We might wonder why, if the distance range is 3 feet, the overshot is not 1.5 feet. One reason could be that it is riskier to hit the ball too hard than too soft. In addition, we assigned weakly informative half-normal(0, 1) priors to the scale parameters, \sigma_{\rm angle} and \sigma_{\rm distance}, which are needed in this case to keep the computations stable.
We fit the model to the new dataset.
overshot <- 1
distance_tolerance <- 3
golf_new_data <- list(
x = golf_new$x,
y = golf_new$y,
n = golf_new$n,
J = nrow(golf_new),
r = r,
R = R,
overshot = overshot,
distance_tolerance = distance_tolerance
)
model_3 <- cmdstan_model(root("golf", "golf_angle_distance_binomial.stan"))
fit_3 <- model_3$sample(data = golf_new_data, refresh = 0)

Here is the result:
print(fit_3)

 variable             mean     median       sd  mad         q5        q95 rhat ess_bulk ess_tail
 lp__           -396352.23 -364910.64 54261.45 2.31 -495798.95 -364909.02 1.93        5       11
 sigma_angle          0.02       0.01     0.01 0.00       0.01       0.04 1.93        5       11
 sigma_distance       0.12       0.14     0.02 0.00       0.09       0.14 1.81        5       37
 sigma_degrees        1.12       0.76     0.57 0.01       0.76       2.13 1.93        5       11
There is poor convergence: the very high \widehat{R} and very low ESS values indicate multimodality. We try initializing MCMC with Pathfinder:
pth_3 <- model_3$pathfinder(
data = golf_new_data,
refresh = 0,
num_paths = 20,
max_lbfgs_iters = 100
)
fit_3 <- model_3$sample(
data = golf_new_data,
refresh = 0,
init = pth_3
)

Here is the result:
print(fit_3)

 variable             mean     median   sd  mad         q5        q95 rhat ess_bulk ess_tail
 lp__           -364909.93 -364909.65 0.97 0.73 -364911.86 -364908.98 1.00     1661     2125
 sigma_angle          0.01       0.01 0.00 0.00       0.01       0.01 1.00     1031     1560
 sigma_distance       0.14       0.14 0.00 0.00       0.14       0.14 1.00     1060     1401
 sigma_degrees        0.76       0.76 0.00 0.00       0.76       0.77 1.00     1031     1560
Now the convergence looks fine. We graph the new data and the fitted model:
draws_3 <- fit_3$draws(format = "df")
sigma_angle_hat <- mean(draws_3$sigma_angle)
sigma_distance_hat <- mean(draws_3$sigma_distance)
par(mar = c(3, 3, 2, 1), mgp = c(1.7, .5, 0), tck = -.02)
plot(0, 0, xlim = c(0, 1.1 * max(golf_new$x)), ylim = c(0, 1.02),
xaxs = "i", yaxs = "i", bty = "l",
xlab = "Distance from hole (feet)",
ylab = "Probability of success",
main = "Checking model fit", type = "n")
x_grid <- seq(R - r, 1.1 * max(golf_new$x), .01)
p_angle_grid <- 2 * pnorm(asin((R - r) / x_grid) / sigma_angle_hat) - 1
p_distance_grid <- pnorm((distance_tolerance - overshot) / ((x_grid + overshot) * sigma_distance_hat)) -
pnorm(-overshot / ((x_grid + overshot) * sigma_distance_hat))
lines(c(0, R-r, x_grid), c(1, 1, p_angle_grid * p_distance_grid), col = "red")
points(golf_new$x, golf_new$y / golf_new$n, pch = 20, col = "red")

There are problems with the fit in the middle of the range of x. We suspect this is a problem with the binomial error model, as it tries harder to fit points where the counts are higher. Look at how closely the fitted curve hugs the data at the very lowest values of x.
Here are the first few rows of the data:
print(golf_new[1:5,])

     x      n      y
1 0.28  45198  45183
2 0.97 183020 182899
3 1.93 169503 168594
4 2.92 113094 108953
5 3.93  73855  64740
With such large values of n_j, the likelihood enforces an extremely close fit at these first few points, and that drives the entire fit of the model.
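To see how strong this constraint is, consider the binomial standard error implied by one of those sample sizes (a quick check using the second row of the data above):

```r
# Binomial standard error of the raw proportion for the second data row
# (n = 183020, y = 182899): the s.e. is on the order of 0.00006, so even a
# tiny systematic misfit at this point incurs an enormous likelihood penalty.
n_row2 <- 183020
p_hat_row2 <- 182899 / n_row2
se_row2 <- sqrt(p_hat_row2 * (1 - p_hat_row2) / n_row2)
```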
7 Expanding the model by including a fudge factor
To fix this problem we took the data model, y_j \sim \mathrm{binomial}(n_j, p_j), and added an independent error term to each observation. There is no easy way to add error directly to the binomial distribution—we could replace it with its overdispersed generalization, the beta-binomial, but this would not be appropriate here because the variance for each data point j would still be roughly proportional to the sample size n_j, and our whole point here is to get away from this assumption and allow for model misspecification—so instead we first approximate the binomial data distribution by a normal and then add independent variance; thus: y_j/n_j \sim \mathrm{normal}\left(p_j, \sqrt{p_j(1-p_j)/n_j + \sigma_y^2}\right). To write this in Stan there are some complications:
y and n are integer variables, which we convert to vectors so that we can multiply and divide them.
To perform componentwise multiplication or division on vectors, you need to use .* or ./ so that Stan knows not to try to perform vector/matrix multiplication and division. Stan is the opposite of R in this way: Stan defaults to vector/matrix operations and has to be told otherwise, whereas R defaults to componentwise operations, and vector/matrix multiplication in R is indicated using the %*% operator.
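The contrast is easy to demonstrate in R, where arithmetic is componentwise by default and matrix multiplication must be requested explicitly:

```r
# R defaults to componentwise arithmetic; matrix multiplication must be
# requested explicitly with %*% (the reverse of Stan's defaults).
u <- c(1, 2, 3)
v <- c(10, 20, 30)
componentwise <- u * v   # like Stan's u .* v
inner <- u %*% v         # 1x1 matrix holding the inner product
```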
We implement these via the following new code in the transformed data block:
vector[J] raw_proportions = to_vector(y) ./ to_vector(n);

And then in the model block:
raw_proportions ~ normal(p, sqrt(p .* (1-p) ./ to_vector(n) + sigma_y^2));

To complete the model we add \sigma_y to the parameters block and assign it a weakly informative half-normal(0,1) prior distribution. Here’s the new Stan program:
print_stan_file(root("golf", "golf_angle_distance_normal.stan"))

data {
int J;
array[J] int n;
vector[J] x;
array[J] int y;
real r;
real R;
real overshot;
real distance_tolerance;
}
transformed data {
vector[J] threshold_angle = asin((R-r) ./ x);
vector[J] raw_proportions = to_vector(y) ./ to_vector(n);
}
parameters {
real<lower=0> sigma_angle;
real<lower=0> sigma_distance;
real<lower=0> sigma_y;
}
model {
[sigma_angle, sigma_distance, sigma_y] ~ normal(0, 1);
vector[J] p_angle = 2*Phi(threshold_angle / sigma_angle) - 1;
vector[J] p_distance =
Phi((distance_tolerance - overshot) ./ ((x + overshot)*sigma_distance)) -
Phi(-overshot ./ ((x + overshot)*sigma_distance));
vector[J] p = p_angle .* p_distance;
raw_proportions ~ normal(p, sqrt(p .* (1-p) ./ to_vector(n) + sigma_y^2));
}
generated quantities {
real sigma_degrees = sigma_angle * 180 / pi();
}

We now fit this model to the data:
model_4 <- cmdstan_model(root("golf", "golf_angle_distance_normal.stan"))
fit_4 <- model_4$sample(data = golf_new_data, refresh = 0)

Here is the result:
draws_4 <- fit_4$draws(format = "df")
print(fit_4)

 variable          mean median   sd  mad     q5    q95 rhat ess_bulk ess_tail
lp__ 146.56 146.93 1.30 1.05 144.12 147.96 1.00 1434 2170
sigma_angle 0.02 0.02 0.00 0.00 0.02 0.02 1.00 1220 1613
sigma_distance 0.08 0.08 0.00 0.00 0.08 0.08 1.00 1228 1424
sigma_y 0.00 0.00 0.00 0.00 0.00 0.00 1.00 1945 1784
sigma_degrees 1.02 1.02 0.01 0.01 1.01 1.03 1.00 1220 1613
The new parameter estimates are:
\sigma_{\rm angle} is estimated at 0.02, which corresponds to \sigma_{\rm degrees}= 1.0. According to the fitted model, there is a standard deviation of 1.0 degree in the angles of putts taken by pro golfers. The estimate of \sigma_{\rm angle} has decreased compared to the earlier model that only had angular errors. This makes sense: now that distance errors have been included in the model, there is no need to explain so many of the missed shots using errors in angle.
\sigma_{\rm distance} is estimated at 0.08. According to the fitted model, there is a standard deviation of 8% in the errors of distance.
\sigma_y is estimated at 0.003. The fitted model fits the aggregate data (success rate as a function of distance) to an accuracy of 0.3 percentage points.
And now we graph:
draws_4 <- fit_4$draws(format = "df")
sigma_angle_hat <- mean(draws_4$sigma_angle)
sigma_distance_hat <- mean(draws_4$sigma_distance)
par(mar = c(3, 3, 2, 1), mgp = c(1.7, .5, 0), tck = -.02)
plot(0, 0, xlim = c(0, 1.1 * max(golf_new$x)), ylim = c(0, 1.02),
xaxs = "i", yaxs = "i", pch = 20, bty = "l",
xlab = "Distance from hole (feet)",
ylab = "Probability of success",
main = "Checking model fit", type = "n")
x_grid <- seq(R - r, 1.1 * max(golf_new$x), .01)
p_angle_grid <- 2 * pnorm(asin((R - r) / x_grid) / sigma_angle_hat) - 1
p_distance_grid <- pnorm((distance_tolerance - overshot) / ((x_grid + overshot) * sigma_distance_hat)) -
pnorm(-overshot / ((x_grid + overshot) * sigma_distance_hat))
lines(c(0, R - r, x_grid), c(1, 1, p_angle_grid * p_distance_grid), col = "red")
points(golf_new$x, golf_new$y / golf_new$n, pch = 20, col = "red")

We can go further and plot the residuals from this fit. First we augment the Stan model to compute residuals in the generated quantities block.
model_4_with_resids <- cmdstan_model(root("golf", "golf_angle_distance_normal_with_resids.stan"))
fit_4_with_resids <- model_4_with_resids$sample(data = golf_new_data, refresh = 0)

Then we compute the posterior means of the residuals, y_j/n_j - p_j, then plot these vs. distance:
posterior_mean_residual <- mean(as_draws_rvars(fit_4_with_resids$draws())$residual)
par(mar = c(3, 3, 2, 1), mgp = c(1.7, .5, 0), tck = -.02)
plot(golf_new$x, posterior_mean_residual, xlim = c(0, 1.1 * max(golf_new$x)),
xaxs = "i", pch = 20, bty = "l",
xlab = "Distance from hole (feet)",
ylab = "y/n - fitted E(y/n)",
main = "Residuals from fitted model", type = "n")
abline(0, 0, col = "gray", lty = 2)
lines(golf_new$x, posterior_mean_residual)

The fit is good: the residuals show no strong pattern, and they are low in absolute value—the model predicts the success rate to within half a percentage point at most distances, suggesting not that the model is perfect but that there are no clear problems given the current data.
The above model fit the data, but we were bothered by the normal approximation, not so much for these particular data but rather because it was a sort of admission of failure that we could not use the binomial model directly.
8 Binomial with errors in logit scale
What we wanted was to keep the binomial model and add the error on the logistic scale, or something like that: something to keep the probabilities bounded between 0 and 1. We added an error term on the logistic scale with a scale parameter, sigma_eta, estimated from the data.
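The point of the logit transformation is that the perturbed probabilities stay strictly inside (0, 1) no matter how large the errors; a quick sketch with purely illustrative numbers:

```r
# Perturbing a probability on the logit scale keeps it strictly inside (0, 1),
# however large the error draws; all values here are illustrative.
logit <- qlogis
invlogit <- plogis
p_model <- 0.97              # a model probability near the boundary
sigma_eta <- 0.7             # illustrative error scale
eta_draws <- c(-3, 0, 3)     # illustrative draws of the error term
p_perturbed <- invlogit(logit(p_model) + sigma_eta * eta_draws)
```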
print_stan_file(root("golf", "golf_angle_distance_binomial_with_logit_errors.stan"))

data {
int J;
array[J] int n;
vector[J] x;
array[J] int y;
real r;
real R;
real overshot;
real distance_tolerance;
}
transformed data {
vector[J] threshold_angle = asin((R-r) ./ x);
}
parameters {
real<lower=0> sigma_angle;
real<lower=0> sigma_distance;
real<lower=0> sigma_eta;
vector[J] eta;
}
model {
eta ~ normal(0, 1);
[sigma_angle, sigma_distance, sigma_eta] ~ normal(0, 1);
vector[J] p_angle = 2*Phi(threshold_angle / sigma_angle) - 1;
vector[J] p_distance =
Phi((distance_tolerance - overshot) ./ ((x + overshot)*sigma_distance)) -
Phi(-overshot ./ ((x + overshot)*sigma_distance));
vector[J] p = inv_logit(logit(p_angle .* p_distance) + sigma_eta*eta);
y ~ binomial(n, p);
}
generated quantities {
real sigma_degrees = sigma_angle * 180 / pi();
}

We fit the model to the data:
model_5 <- cmdstan_model(root("golf", "golf_angle_distance_binomial_with_logit_errors.stan"))
fit_5 <- model_5$sample(data = golf_new_data, refresh = 0)

Here is the result:
print(fit_5)

 variable             mean     median       sd  mad         q5        q95 rhat ess_bulk ess_tail
 lp__           -415798.83 -363617.97 90395.16 9.31 -572350.40 -363607.42 1.57        6       14
 sigma_angle          0.02       0.01     0.02 0.00       0.01       0.05 1.54        7       43
 sigma_distance       0.17       0.20     0.05 0.01       0.08       0.21 1.60        6       11
 sigma_eta            0.57       0.66     0.25 0.15       0.15       0.86 1.59        6       11
 eta[1]              -1.33      -1.96     1.95 1.46      -3.75       1.75 1.53        7       23
 eta[2]               2.21       2.80     1.50 0.85      -0.25       3.85 1.53        7       25
 eta[3]               2.29       2.89     1.45 0.71      -0.14       3.77 1.60        6       15
 eta[4]               0.99       1.54     1.16 0.40      -0.98       2.04 1.53        7       26
 eta[5]               0.94       0.68     0.57 0.23       0.40       1.92 1.59        6       11
 eta[6]               0.61       0.20     0.80 0.18      -0.03       1.99 1.55        7       27
 # showing 10 of 36 rows (change via 'max_rows' argument or 'cmdstanr_max_rows' option)
Unfortunately, when we try to fit this new model to our data, the console fills up with warnings and the chains don’t mix. We try initializing with Pathfinder.
pth_5 <- model_5$pathfinder(
data = golf_new_data,
refresh = 0,
num_paths = 20,
max_lbfgs_iters = 100
)
fit_5 <- model_5$sample(
data = golf_new_data,
refresh = 0,
init = pth_5
)

Here is the result:
print(fit_5)

 variable             mean     median   sd  mad         q5        q95 rhat ess_bulk ess_tail
 lp__           -363615.58 -363615.19 5.89 5.88 -363625.91 -363606.57 1.00      648     1077
 sigma_angle          0.01       0.01 0.00 0.00       0.01       0.01 1.01      572      844
 sigma_distance       0.20       0.20 0.01 0.01       0.19       0.22 1.00     1042     1408
 sigma_eta            0.71       0.70 0.10 0.09       0.56       0.89 1.00      749      965
 eta[1]              -2.37      -2.36 0.95 0.96      -3.93      -0.84 1.00     1109     1580
 eta[2]               3.02       3.04 0.55 0.54       2.09       3.90 1.00      916     1056
 eta[3]               3.08       3.09 0.45 0.44       2.35       3.82 1.00      809      916
 eta[4]               1.64       1.65 0.26 0.26       1.21       2.06 1.00      840      965
 eta[5]               0.62       0.62 0.15 0.15       0.38       0.85 1.00      971     1215
 eta[6]               0.14       0.14 0.12 0.12      -0.05       0.34 1.01      886     1091
 # showing 10 of 36 rows (change via 'max_rows' argument or 'cmdstanr_max_rows' option)
The sampling is slower than for earlier models, but the convergence diagnostics look fine. We graph the new data and the fitted model:
draws_5 <- fit_5$draws(format = "df")
sigma_angle_hat <- mean(draws_5$sigma_angle)
sigma_distance_hat <- mean(draws_5$sigma_distance)
par(mar = c(3, 3, 2, 1), mgp = c(1.7, .5, 0), tck = -.02)
plot(0, 0, xlim = c(0, 1.1 * max(golf_new$x)), ylim = c(0, 1.02),
xaxs = "i", yaxs = "i", pch = 20, bty = "l",
xlab = "Distance from hole (feet)",
ylab = "Probability of success",
main = "Checking model fit", type = "n")
x_grid <- seq(R-r, 1.1*max(golf_new$x), .01)
p_angle_grid <- 2*pnorm(asin((R-r)/x_grid) / sigma_angle_hat) - 1
p_distance_grid <- pnorm((distance_tolerance - overshot) / ((x_grid + overshot)*sigma_distance_hat)) -
pnorm(-overshot / ((x_grid + overshot)*sigma_distance_hat))
lines(c(0, R-r, x_grid), c(1, 1, p_angle_grid * p_distance_grid), col = "red")
points(golf_new$x, golf_new$y/golf_new$n, pch = 20, col = "red")
There are now problems with the fit at short distances. Clearly the additive error on the logit scale is not appropriate.
9 Binomial with proportional errors
Instead of additive errors on the logit scale, we next fit a three-parameter model that scales all the probabilities down from 1. Each observation has its own proportional error term. The key is that each element of the multiplier vector (1 - epsilon) is positive and less than 1, which eliminates the boundary problem and the need for the logit transform. The prior distribution on epsilon keeps the errors under control.
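A quick numeric check (a standalone Python sketch with made-up values, not taken from the fitted model) confirms that the multiplicative form keeps every probability strictly inside (0, 1), whatever value the exponential prior produces for epsilon:

```python
import random

random.seed(1)

def success_prob(p_angle, p_distance, epsilon):
    """Proportional-error model: scale the product of the two
    geometric probabilities down by the factor (1 - epsilon)."""
    assert 0 < epsilon < 1
    return p_angle * p_distance * (1 - epsilon)

# Draw epsilon from an exponential prior (rate 1/sigma_epsilon),
# rejecting the rare draws >= 1 to respect the upper bound.
sigma_epsilon = 0.05  # made-up value for illustration
draws = []
while len(draws) < 10_000:
    e = random.expovariate(1 / sigma_epsilon)
    if e < 1:
        draws.append(e)

ps = [success_prob(0.9, 0.95, e) for e in draws]
print(min(ps), max(ps))  # all probabilities stay strictly inside (0, 1)
```

Because the multiplier is bounded in (0, 1), no transformation back and forth between probability and logit scales is needed.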
print_stan_file(root("golf", "golf_angle_distance_binomial_with_proportional_errors.stan"))
data {
int J;
array[J] int n;
vector[J] x;
array[J] int y;
real r;
real R;
real overshot;
real distance_tolerance;
}
transformed data {
vector[J] threshold_angle = asin((R-r) ./ x);
vector[J] raw_proportion = to_vector(y) ./ to_vector(n);
}
parameters {
real<lower=0> sigma_angle;
real<lower=0> sigma_distance;
real<lower=0> sigma_epsilon;
vector<lower=0, upper=1>[J] epsilon;
}
transformed parameters {
vector[J] p_angle = 2*Phi(threshold_angle / sigma_angle) - 1;
vector[J] p_distance =
Phi((distance_tolerance - overshot) ./ ((x + overshot)*sigma_distance)) -
Phi(-overshot ./ ((x + overshot)*sigma_distance));
vector[J] p = p_angle .* p_distance .* (1 - epsilon);
}
model {
[sigma_angle, sigma_distance] ~ normal(0, 1);
epsilon ~ exponential(1/sigma_epsilon);
y ~ binomial(n, p);
}
generated quantities {
real sigma_degrees = sigma_angle * 180 / pi();
vector[J] residual = raw_proportion - p_angle .* p_distance;
}
We fit the model to the data:
model_6 <- cmdstan_model(root("golf", "golf_angle_distance_binomial_with_proportional_errors.stan"))
fit_6 <- model_6$sample(data = golf_new_data, refresh = 0)
Here is the result:
print(fit_6)
 variable mean median sd mad q5 q95 rhat ess_bulk
lp__ -363658.70 -363658.34 5.37 5.39 -363668.13 -363650.49 1.01 768
sigma_angle 0.02 0.02 0.00 0.00 0.02 0.02 1.00 1442
sigma_distance 0.08 0.08 0.00 0.00 0.08 0.08 1.00 1376
sigma_epsilon 0.01 0.01 0.00 0.00 0.01 0.02 1.00 968
epsilon[1] 0.00 0.00 0.00 0.00 0.00 0.00 1.00 3592
epsilon[2] 0.00 0.00 0.00 0.00 0.00 0.00 1.00 3710
epsilon[3] 0.00 0.00 0.00 0.00 0.00 0.00 1.00 3284
epsilon[4] 0.00 0.00 0.00 0.00 0.00 0.00 1.00 1386
epsilon[5] 0.00 0.00 0.00 0.00 0.00 0.00 1.00 1536
epsilon[6] 0.01 0.01 0.00 0.00 0.00 0.01 1.00 1510
ess_tail
1089
1510
1629
1707
2183
2787
2566
1540
1683
1601
# showing 10 of 160 rows (change via 'max_rows' argument or 'cmdstanr_max_rows' option)
We graph the new data and the fitted model:
draws_6 <- fit_6$draws(format = "df")
sigma_angle_hat <- mean(draws_6$sigma_angle)
sigma_distance_hat <- mean(draws_6$sigma_distance)
par(mar = c(3, 3, 2, 1), mgp = c(1.7, .5, 0), tck = -.02)
plot(0, 0, xlim = c(0, 1.1 * max(golf_new$x)), ylim = c(0, 1.02),
xaxs = "i", yaxs = "i", pch = 20, bty = "l",
xlab = "Distance from hole (feet)",
ylab = "Probability of success",
main = "Checking model fit", type = "n")
x_grid <- seq(R-r, 1.1*max(golf_new$x), .01)
p_angle_grid <- 2*pnorm(asin((R-r)/x_grid) / sigma_angle_hat) - 1
p_distance_grid <- pnorm((distance_tolerance - overshot) / ((x_grid + overshot)*sigma_distance_hat)) -
pnorm(-overshot / ((x_grid + overshot)*sigma_distance_hat))
lines(c(0, R-r, x_grid), c(1, 1, p_angle_grid*p_distance_grid), col = "red")
points(golf_new$x, golf_new$y / golf_new$n, pch = 20, col = "red")
posterior_mean_residual <- mean(as_draws_rvars(draws_6)$residual)
par(mar = c(3, 3, 2, 1), mgp = c(1.7, .5, 0), tck = -.02)
plot(golf_new$x, posterior_mean_residual, xlim = c(0, 1.1 * max(golf_new$x)),
xaxs = "i", pch = 20, bty = "l",
xlab = "Distance from hole (feet)",
ylab = "y/n - fitted E(y/n)",
main = "Residuals from fitted model", type = "n")
abline(0, 0, col = "gray", lty = 2)
lines(golf_new$x, posterior_mean_residual)
The model fit looks good, but the residual plot shows that for short distances the model overestimates the probabilities. This pattern could be explained by sensitivity to the distance_tolerance and overshot parameters, which were fixed.
We change distance_tolerance to be a parameter. We assume Broadie (2018) used his expertise to choose the value of 3 feet. We use a log-normal prior with mean log(3) and standard deviation 0.2 to incorporate that expert information while keeping the prior relatively weak. We initialize MCMC using draws from the previous model's posterior, except for the new distance_tolerance parameter.
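To see how weak this prior actually is, we can compute its central 90% interval; this standalone Python sketch uses only the stated prior, nothing from the Stan model:

```python
import math
from statistics import NormalDist

# Log-normal prior: log(distance_tolerance) ~ Normal(log(3), 0.2)
mu, sigma = math.log(3), 0.2
z = NormalDist().inv_cdf(0.95)  # ~1.645

low = math.exp(mu - z * sigma)
high = math.exp(mu + z * sigma)
print(f"central 90% prior interval: ({low:.2f}, {high:.2f}) feet")
```

So a priori distance_tolerance plausibly lies between roughly 2.2 and 4.2 feet, comfortably covering Broadie's value of 3 while allowing the likelihood to pull it elsewhere.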
model_7 <- cmdstan_model(root("golf", "golf_angle_distance_binomial_with_proportional_errors_2.stan"))
fit_7 <- model_7$sample(data = golf_new_data, refresh = 0, init = fit_6)
Here is the result:
print(fit_7)
 variable mean median sd mad q5 q95 rhat ess_bulk
lp__ -363659.73 -363659.47 4.82 4.83 -363668.14 -363652.32 1.00 942
sigma_angle 0.02 0.02 0.00 0.00 0.02 0.02 1.01 575
sigma_distance 0.11 0.11 0.01 0.01 0.09 0.12 1.01 548
sigma_epsilon 0.00 0.00 0.00 0.00 0.00 0.01 1.00 662
epsilon[1] 0.00 0.00 0.00 0.00 0.00 0.00 1.00 5983
epsilon[2] 0.00 0.00 0.00 0.00 0.00 0.00 1.00 5741
epsilon[3] 0.00 0.00 0.00 0.00 0.00 0.00 1.01 600
epsilon[4] 0.00 0.00 0.00 0.00 0.00 0.00 1.00 811
epsilon[5] 0.00 0.00 0.00 0.00 0.00 0.00 1.00 1412
epsilon[6] 0.00 0.00 0.00 0.00 0.00 0.01 1.00 1286
ess_tail
1667
1583
1562
1344
2685
3302
1771
1020
1619
1503
# showing 10 of 161 rows (change via 'max_rows' argument or 'cmdstanr_max_rows' option)
We graph the new data and the fitted model:
draws_7 <- fit_7$draws(format = "df")
sigma_angle_hat <- mean(draws_7$sigma_angle)
sigma_distance_hat <- mean(draws_7$sigma_distance)
distance_tolerance <- mean(draws_7$distance_tolerance)
overshot <- 1
par(mar = c(3, 3, 2, 1), mgp = c(1.7, .5, 0), tck = -.02)
plot(0, 0, xlim = c(0, 1.1 * max(golf_new$x)), ylim = c(0, 1.02),
xaxs = "i", yaxs = "i", pch = 20, bty = "l",
xlab = "Distance from hole (feet)",
ylab = "Probability of success",
main = "Checking model fit", type = "n")
x_grid <- seq(R-r, 1.1*max(golf_new$x), .01)
p_angle_grid <- 2*pnorm(asin((R-r)/x_grid) / sigma_angle_hat) - 1
p_distance_grid <- pnorm((distance_tolerance - overshot) / ((x_grid + overshot)*sigma_distance_hat)) -
pnorm(-overshot / ((x_grid + overshot)*sigma_distance_hat))
lines(c(0, R-r, x_grid), c(1, 1, p_angle_grid * p_distance_grid), col = "red")
points(golf_new$x, golf_new$y/golf_new$n, pch = 20, col = "red")
posterior_mean_residual <- mean(as_draws_rvars(draws_7)$residual)
par(mar = c(3, 3, 2, 1), mgp = c(1.7, .5, 0), tck = -.02)
plot(golf_new$x, posterior_mean_residual, xlim = c(0, 1.1 * max(golf_new$x)),
xaxs = "i", pch = 20, bty = "l",
xlab = "Distance from hole (feet)",
ylab = "y/n - fitted E(y/n)",
main = "Residuals from fitted model", type = "n")
abline(0, 0, col = "gray", lty = 2)
lines(golf_new$x, posterior_mean_residual)
The residuals are now smaller, with standard deviation roughly halved, and there is no obvious pattern. Looking at the posterior of distance_tolerance, most of the mass is above 3, and the posterior is much narrower than the prior, so the likelihood is informative about it.
fit_7$summary(variables = "distance_tolerance")
# A tibble: 1 × 10
variable mean median sd mad q5 q95 rhat ess_bulk ess_tail
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 distance_tolerance 3.9 4.0 0.22 0.21 3.5 4.1 1.0 563. 1833.
For the final model we change overshot to be a parameter, too. We assume Broadie (2018) used his expertise to choose the value of 1 foot. We use a log-normal prior with mean log(1) and standard deviation 0.2 to incorporate that expert information while keeping the prior relatively weak. We initialize Pathfinder using draws from the previous model's posterior (except for the new overshot parameter), and then initialize MCMC with the Pathfinder draws.
model_8 <- cmdstan_model(root("golf", "golf_angle_distance_binomial_with_proportional_errors_3.stan"))
pth_8 <- model_8$pathfinder(
data = golf_new_data,
refresh = 0,
num_paths = 40,
max_lbfgs_iters = 100,
init = fit_7
)
fit_8 <- model_8$sample(
data = golf_new_data,
refresh = 0,
init = pth_8
)
Here is the result:
print(fit_8)
 variable mean median sd mad q5 q95 rhat ess_bulk
lp__ -363660.61 -363660.28 4.86 4.77 -363668.92 -363653.17 1.00 774
sigma_angle 0.02 0.02 0.00 0.00 0.02 0.02 1.00 651
sigma_distance 0.10 0.10 0.01 0.01 0.08 0.12 1.00 1254
sigma_epsilon 0.00 0.00 0.00 0.00 0.00 0.01 1.00 708
epsilon[1] 0.00 0.00 0.00 0.00 0.00 0.00 1.00 4521
epsilon[2] 0.00 0.00 0.00 0.00 0.00 0.00 1.00 5313
epsilon[3] 0.00 0.00 0.00 0.00 0.00 0.00 1.00 687
epsilon[4] 0.00 0.00 0.00 0.00 0.00 0.00 1.00 949
epsilon[5] 0.00 0.00 0.00 0.00 0.00 0.00 1.00 1311
epsilon[6] 0.00 0.00 0.00 0.00 0.00 0.01 1.00 1507
ess_tail
1495
1714
1567
1301
2798
3127
1857
1601
1357
1847
# showing 10 of 162 rows (change via 'max_rows' argument or 'cmdstanr_max_rows' option)
We graph the new data and the fitted model:
draws_8 <- fit_8$draws(format = "df")
sigma_angle_hat <- mean(draws_8$sigma_angle)
sigma_distance_hat <- mean(draws_8$sigma_distance)
distance_tolerance <- mean(draws_8$distance_tolerance)
overshot <- mean(draws_8$overshot)
par(mar = c(3, 3, 2, 1), mgp = c(1.7, .5, 0), tck = -.02)
plot(0, 0, xlim = c(0, 1.1 * max(golf_new$x)), ylim = c(0, 1.02),
xaxs = "i", yaxs = "i", pch = 20, bty = "l",
xlab = "Distance from hole (feet)",
ylab = "Probability of success",
main = "Checking model fit", type = "n")
x_grid <- seq(R-r, 1.1*max(golf_new$x), .01)
p_angle_grid <- 2*pnorm(asin((R-r)/x_grid) / sigma_angle_hat) - 1
p_distance_grid <- pnorm((distance_tolerance - overshot) / ((x_grid + overshot)*sigma_distance_hat)) -
pnorm(-overshot / ((x_grid + overshot)*sigma_distance_hat))
lines(c(0, R-r, x_grid), c(1, 1, p_angle_grid * p_distance_grid), col = "red")
points(golf_new$x, golf_new$y/golf_new$n, pch = 20, col = "red")
posterior_mean_residual <- mean(as_draws_rvars(draws_8)$residual)
par(mar = c(3, 3, 2, 1), mgp = c(1.7, .5, 0), tck = -.02)
plot(golf_new$x, posterior_mean_residual, xlim = c(0, 1.1 * max(golf_new$x)),
xaxs = "i", pch = 20, bty = "l",
xlab = "Distance from hole (feet)",
ylab = "y/n - fitted E(y/n)",
main = "Residuals from fitted model", type = "n")
abline(0, 0, col = "gray", lty = 2)
lines(golf_new$x, posterior_mean_residual)
The residuals are very similar to the previous model's, with similar standard deviation. Examining the posterior marginals of distance_tolerance and overshot, the standard deviations are only about half the prior standard deviations, hinting that the likelihood is only weakly informative about each of them separately.
fit_8$summary(variables = c("distance_tolerance","overshot"))
# A tibble: 2 × 10
variable mean median sd mad q5 q95 rhat ess_bulk ess_tail
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 distance_tolerance 3.5 3.5 0.42 0.42 2.8 4.2 1.0 1690. 1885.
2 overshot 0.87 0.87 0.13 0.13 0.67 1.1 1.0 1715. 2414.
However, when we examine the bivariate posterior we see strong dependency, and the ratio of the two parameters is well informed by the likelihood: the posterior standard deviation of the ratio is about one fifth of the prior standard deviation. This also explains why adding overshot did not reduce the residual standard deviation much.
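The prior scale for this comparison follows directly from the two log-normal priors: the log-ratio is Normal with standard deviation sqrt(0.2^2 + 0.2^2) ≈ 0.28, and the standard deviation of the ratio itself has a closed form. A standalone Python check (prior only, nothing from the fitted model):

```python
import math

# Independent log-normal priors as stated in the text:
#   log(distance_tolerance) ~ Normal(log(3), 0.2)
#   log(overshot)           ~ Normal(log(1), 0.2)
# Their ratio is log-normal with log-scale parameters:
mu = math.log(3) - math.log(1)
s = math.sqrt(0.2**2 + 0.2**2)  # sd of the log-ratio, ~0.28

# Closed-form standard deviation of a log-normal(mu, s) variable
prior_sd_ratio = math.exp(mu + s**2 / 2) * math.sqrt(math.exp(s**2) - 1)
print(f"prior sd of distance_tolerance/overshot: {prior_sd_ratio:.2f}")
```

A posterior standard deviation of the ratio around one fifth of this prior value shows the likelihood constrains the ratio tightly, even though it is weakly informative about each parameter alone.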
fit_8$draws(variables = c("distance_tolerance","overshot")) |>
mcmc_scatter() +
theme_default(base_family = "sans", base_size = 16)
As we can expect the residual standard deviation to decrease when we add more parameters, we also use cross-validation to compare the models. Because we have one epsilon parameter for each observation, we need to use integrated PSIS-LOO. We compute the integrated log_lik with the following standalone generated quantities code.
gq_ll <- cmdstan_model(root("golf", "golf_log_lik.stan"))
When calling generated quantities, we pass only the required variables, which lets us write more compact Stan code for the log_lik computation.
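The integration performed in the generated quantities code can be sketched numerically: for one observation, epsilon is integrated out of the binomial likelihood against its truncated exponential prior. This standalone Python sketch uses simple trapezoidal quadrature and made-up values for y, n, p0, and sigma_epsilon; it illustrates the idea only and does not reproduce what golf_log_lik.stan computes:

```python
import math

def binom_pmf(y, n, p):
    return math.comb(n, y) * p**y * (1.0 - p) ** (n - y)

def integrated_log_lik(y, n, p0, sigma_epsilon, grid=2000):
    """log p(y | n, p0, sigma_epsilon) with the observation-level error
    epsilon ~ Exponential(1/sigma_epsilon), truncated to [0, 1),
    integrated out by trapezoidal quadrature."""
    rate = 1.0 / sigma_epsilon
    z = 1.0 - math.exp(-rate)          # truncation constant on [0, 1)
    h = 1.0 / grid
    vals = []
    for i in range(grid + 1):
        eps = i * h
        density = rate * math.exp(-rate * eps) / z
        vals.append(binom_pmf(y, n, p0 * (1.0 - eps)) * density)
    integral = h * (sum(vals) - 0.5 * (vals[0] + vals[-1]))
    return math.log(integral)

# Illustrative values only (not taken from the fitted golf model)
ll = integrated_log_lik(y=50, n=100, p0=0.52, sigma_epsilon=0.01)
print(ll)
```

Integrating out the observation-level epsilon in this way is what makes PSIS-LOO reliable here; with one parameter per observation, the non-integrated pointwise log-likelihoods would give poor importance-sampling behavior.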
loo_6 <- gq_ll$generate_quantities(
fit_6$draws(variables = c("sigma_epsilon",
"p_angle",
"p_distance")),
data = golf_new_data)$draws(variables = "log_lik") |> loo()
loo_7 <- gq_ll$generate_quantities(
fit_7$draws(variables = c("sigma_epsilon",
"p_angle",
"p_distance")),
data = golf_new_data)$draws(variables = "log_lik") |> loo()
loo_8 <- gq_ll$generate_quantities(
fit_8$draws(variables = c("sigma_epsilon",
"p_angle",
"p_distance")),
data = golf_new_data)$draws(variables = "log_lik") |> loo()
As we have integrated out the epsilon parameters, the effective number of parameters matches the number of remaining parameters, except for the last model, where the distance_tolerance and overshot parameters are not well informed by the likelihood separately.
print(loo_6)
Computed from 4000 by 31 log-likelihood matrix.
Estimate SE
elpd_loo -185.6 8.5
p_loo 2.9 0.7
looic 371.2 17.0
------
MCSE of elpd_loo is 0.0.
MCSE and ESS estimates assume independent draws (r_eff=1).
All Pareto k estimates are good (k < 0.7).
See help('pareto-k-diagnostic') for details.
print(loo_7)
Computed from 4000 by 31 log-likelihood matrix.
Estimate SE
elpd_loo -173.4 7.4
p_loo 4.1 1.0
looic 346.9 14.8
------
MCSE of elpd_loo is 0.1.
MCSE and ESS estimates assume independent draws (r_eff=1).
All Pareto k estimates are good (k < 0.7).
See help('pareto-k-diagnostic') for details.
print(loo_8)
Computed from 4000 by 31 log-likelihood matrix.
Estimate SE
elpd_loo -173.0 7.3
p_loo 4.2 0.9
looic 346.0 14.6
------
MCSE of elpd_loo is 0.1.
MCSE and ESS estimates assume independent draws (r_eff=1).
All Pareto k estimates are good (k < 0.7).
See help('pareto-k-diagnostic') for details.
Finally we compare the expected predictive performances.
loo_compare(
list(
`Distance tolerance and overshot fixed` = loo_6,
`Distance tolerance parameter and overshot fixed` = loo_7,
`Distance tolerance and overshot parameters` = loo_8
)
)
 elpd_diff se_diff
Distance tolerance and overshot parameters 0.0 0.0
Distance tolerance parameter and overshot fixed -0.4 0.3
Distance tolerance and overshot fixed -12.6 4.5
Adding the distance_tolerance parameter clearly improves the predictive performance, but adding the overshot parameter does not. However, the model with both distance_tolerance and overshot parameters has a posterior that is more informative about what can be learned about this aspect of the model.
10 Binomial with simplified error term
We now fit a new set of models where the error term epsilon is a scalar constant rather than a vector: epsilon is simply the probability of completely blowing your shot.
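A minimal sketch of this model structure (the exact parameterization in the constant-errors Stan files may differ; the assumed form here is that with probability epsilon the shot is blown regardless of geometry):

```python
def success_prob(p_angle, p_distance, epsilon):
    """Assumed constant-error form: the shot succeeds only if it is
    not blown (prob. 1 - epsilon) AND both the angle and the distance
    are good."""
    return (1 - epsilon) * p_angle * p_distance

# With a scalar epsilon, even a tap-in (p_angle = p_distance = 1)
# is capped at 1 - epsilon:
epsilon = 0.0013  # illustrative value of the order seen in model 9 below
print(success_prob(1.0, 1.0, epsilon))
```

Under this form a single epsilon caps all probabilities by the same factor, in contrast to the per-observation epsilon vector of the previous section.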
model_9 <- cmdstan_model(root("golf", "golf_angle_distance_binomial_with_constant_errors.stan"))
fit_9 <- model_9$sample(
data = golf_new_data,
refresh = 0
)
fit_9$summary(variables = c("sigma_angle", "sigma_distance", "epsilon"))
# A tibble: 3 × 10
variable mean median sd mad q5 q95 rhat ess_bulk ess_tail
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 sigma_angle 0.018 0.018 0.000041 0.000041 0.018 0.018 1.0 1932. 2338.
2 sigma_distance 0.080 0.080 0.00056 0.00055 0.079 0.081 1.0 2059. 2133.
3 epsilon 0.0013 0.0013 0.000080 0.000081 0.0011 0.0014 1.0 1186. 1341.
model_10 <- cmdstan_model(root("golf", "golf_angle_distance_binomial_with_constant_errors_2.stan"))
pth_10 <- model_10$pathfinder(
data = golf_new_data,
refresh = 0,
num_paths = 40,
max_lbfgs_iters = 100,
init = fit_9
)
fit_10 <- model_10$sample(
data = golf_new_data,
refresh = 0,
init = pth_10
)
fit_10$summary(variables = c("sigma_angle", "sigma_distance", "epsilon"))
# A tibble: 3 × 10
variable mean median sd mad q5 q95 rhat ess_bulk ess_tail
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 sigma_angle 0.015 0.015 0.000055 0.000055 0.015 0.015 1.0 2732. 2446.
2 sigma_distance 0.13 0.13 0.00067 0.00068 0.13 0.13 1.0 2505. 2566.
3 epsilon 0.00060 0.00060 0.000057 0.000055 0.00052 0.00070 1.0 2630. 2319.
model_11 <- cmdstan_model(root("golf", "golf_angle_distance_binomial_with_constant_errors_3.stan"))
pth_11 <- model_11$pathfinder(
data = golf_new_data,
refresh = 0,
num_paths = 40,
max_lbfgs_iters = 100,
init = fit_10
)
fit_11 <- model_11$sample(
data = golf_new_data,
refresh = 0,
init = pth_11
)
fit_11$summary(variables = c("sigma_angle", "sigma_distance", "distance_tolerance", "overshot", "epsilon"))
# A tibble: 5 × 10
variable mean median sd mad q5 q95 rhat ess_bulk ess_tail
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 sigma_angle 0.015 0.015 0.00013 1.4e-4 1.5e-2 1.5e-2 1.0 666. 1172.
2 sigma_distance 0.13 0.13 0.0086 8.5e-3 1.2e-1 1.5e-1 1.0 689. 1166.
3 distance_tolerance 4.4 4.4 0.33 3.3e-1 3.8e+0 4.9e+0 1.0 677. 1185.
4 overshot 1.1 1.1 0.10 1.0e-1 8.9e-1 1.2e+0 1.0 680. 1121.
5 epsilon 0.00060 0.00060 0.000056 5.7e-5 5.1e-4 6.9e-4 1.0 1398. 1644.
We graph the new data and the fitted model 11:
draws_11 <- fit_11$draws(format = "df")
sigma_angle_hat <- mean(draws_11$sigma_angle)
sigma_distance_hat <- mean(draws_11$sigma_distance)
distance_tolerance <- mean(draws_11$distance_tolerance)
overshot <- mean(draws_11$overshot)
par(mar = c(3, 3, 2, 1), mgp = c(1.7, .5, 0), tck = -.02)
plot(0, 0, xlim = c(0, 1.1 * max(golf_new$x)), ylim = c(0, 1.02),
xaxs = "i", yaxs = "i", pch = 20, bty = "l",
xlab = "Distance from hole (feet)",
ylab = "Probability of success",
main = "Checking model fit", type = "n")
x_grid <- seq(R-r, 1.1*max(golf_new$x), .01)
p_angle_grid <- 2*pnorm(asin((R-r)/x_grid) / sigma_angle_hat) - 1
p_distance_grid <- pnorm((distance_tolerance - overshot) / ((x_grid + overshot)*sigma_distance_hat)) -
pnorm(-overshot / ((x_grid + overshot)*sigma_distance_hat))
lines(c(0, R-r, x_grid), c(1, 1, p_angle_grid * p_distance_grid), col = "red")
points(golf_new$x, golf_new$y/golf_new$n, pch = 20, col = "red")
posterior_mean_residual <- mean(as_draws_rvars(draws_11)$residual)
par(mar = c(3, 3, 2, 1), mgp = c(1.7, .5, 0), tck = -.02)
plot(golf_new$x, posterior_mean_residual, xlim = c(0, 1.1 * max(golf_new$x)),
xaxs = "i", pch = 20, bty = "l",
xlab = "Distance from hole (feet)",
ylab = "y/n - fitted E(y/n)",
main = "Residuals from fitted model", type = "n")
abline(0, 0, col = "gray", lty = 2)
lines(golf_new$x, posterior_mean_residual)
Finally we compare the expected predictive performances.
gq_ll_c <- cmdstan_model(root("golf", "golf_log_lik_constant_errors.stan"))
loo_11 <- gq_ll_c$generate_quantities(
fit_11$draws(variables = "p"),
data = golf_new_data)$draws(variables = "log_lik") |> loo()
We compare the predictive performance of the varying-errors and constant-error models.
loo_compare(
list(
`Model 8 with varying error terms` = loo_8,
`Model 11 with constant error term` = loo_11
)
)
 elpd_diff se_diff
Model 8 with varying error terms 0.0 0.0
Model 11 with constant error term -30.5 17.4
The constant error term model is clearly worse in expected log predictive density, but the difference is small enough that it is difficult to see when plotting the predictions on top of the data.
If we examine the predictive performance difference at different distances, we see that the constant error model tends to be worse specifically at short distances.
pointwise_elpd_diff <- pointwise(loo_8, "elpd_loo") - pointwise(loo_11, "elpd_loo")
par(mar = c(3, 3, 2, 1), mgp = c(1.7, .5, 0), tck = -.02)
plot(golf_new$x, pointwise_elpd_diff, xlim = c(0, 1.1 * max(golf_new$x)),
xaxs = "i", pch = 20, bty = "l",
xlab = "Distance from hole (feet)",
ylab = "pointwise elpd_loo difference",
main = "Predictive performance difference Model 8 vs Model 11")
abline(0, 0, col = "gray", lty = 2)
References
Licenses
- Code © 2019–2026, Andrew Gelman and Aki Vehtari, licensed under BSD-3.
- Text © 2019–2026, Andrew Gelman and Aki Vehtari, licensed under CC-BY-NC 4.0.