library(rprojroot)
root <- has_file(".Bayesian-Workflow-root")$make_fix_file()
library(cmdstanr)
options(mc.cores = 4)
library(bayesplot)
library(loo)
library(posterior)
options(pillar.neg = FALSE,
pillar.subtle = FALSE,
pillar.sigfig = 2)
options(width = 90)
# utility functions
logit <- qlogis
invlogit <- plogis
fround <- function (x, digits) format(round(x, digits), nsmall = digits)
print_stan_file <- function(file) {
code <- readLines(file)
  if (isTRUE(getOption("knitr.in.progress")) &&
identical(knitr::opts_current$get("results"), "asis")) {
# In render: emit as-is so Pandoc/Quarto does syntax highlighting
block <- paste0("```stan", "\n", paste(code, collapse = "\n"), "\n", "```")
knitr::asis_output(block)
} else {
writeLines(code)
}
}

Model building and expansion: Golf putting
This notebook includes the code for Bayesian Workflow book chapter 25 Model building and expansion: Golf putting.
1 Introduction
We demonstrate the basic workflow of Bayesian modeling using an example of a set of models fit to data on golf putting.
2 Data
The following graph shows data from professional golfers on the proportion of successful putts as a function of distance from the hole (from Berry 1996). Unsurprisingly, the probability of making the shot declines as a function of distance:
golf <- read.table(root("golf", "data", "golf_data.txt"),
header = TRUE, skip = 2)
x <- golf$x
y <- golf$y
n <- golf$n
J <- length(y)
r <- (1.68 / 2) / 12
R <- (4.25 / 2) / 12
se <- sqrt((y / n) * (1 - y / n) / n)

par(mar = c(3, 3, 2, 1), mgp = c(1.7, .5, 0), tck = -.02)
plot(0, 0, xlim = c(0, 1.1 * max(x)), ylim = c(0, 1.02),
xaxs = "i", yaxs = "i", bty = "l",
xlab = "Distance from hole (feet)",
ylab = "Probability of success",
main = "Data on putts in pro golf",
type = "n")
points(x, y / n, pch = 20, col = "blue")
segments(x, y / n + se, x, y / n - se, lwd = .5, col = "blue")
text(x + .4,
y / n + se + .02,
paste(y, "/", n, sep = ""),
cex = .6,
     col = "gray40")

The error bars associated with each point j in the above graph are simple classical standard deviations, \sqrt{\hat{p}_j(1-\hat{p}_j)/n_j}, where \hat{p}_j=y_j/n_j is the success rate for putts taken at distance x_j.
3 Logistic regression
Can we model the probability of success in golf putting as a function of distance from the hole? Given usual statistical practice, the natural starting point would be logistic regression:
y_j\sim\mathrm{binomial}(n_j, \mathrm{logit}^{-1}(a + bx_j)), \text{ for } j=1,\dots, J. In Stan, this is:
print_stan_file(root("golf", "golf_logistic.stan"))

data {
int J;
vector[J] x;
array[J] int n;
array[J] int y;
}
parameters {
real a;
real b;
}
model {
y ~ binomial_logit(n, a + b*x);
}

The code in the above model block is (implicitly) vectorized, so that it is mathematically equivalent to modeling each data point individually: y[j] ~ binomial_logit(n[j], a + b*x[j]). The vectorized code is more compact (no need to write a loop or to include the subscripts) and faster (because of more efficient gradient evaluations).
We fit the model to the data:
golf_data <- list(x = x, y = y, n = n, J = J)
model_1 <- cmdstan_model(root("golf", "golf_logistic.stan"))
fit_1 <- model_1$sample(data = golf_data, refresh = 0)

And here is the result:
draws_1 <- fit_1$draws(format = "df")
a_sim <- draws_1$a
b_sim <- draws_1$b
a_hat <- mean(a_sim)
b_hat <- mean(b_sim)
n_sims <- nrow(draws_1)
print(fit_1)

 variable     mean   median   sd  mad       q5      q95 rhat ess_bulk ess_tail
lp__ -3021.15 -3020.86 0.98 0.73 -3023.12 -3020.21 1.00 1301 2073
a 2.23 2.23 0.06 0.06 2.14 2.33 1.00 1155 1271
b -0.26 -0.26 0.01 0.01 -0.27 -0.24 1.00 1133 1142
Going through the columns of the above table: Stan has computed the posterior means \pm standard deviations of a and b to be 2.23 \pm 0.06 and -0.26 \pm 0.01, respectively. The Monte Carlo standard error of the mean of each of these parameters is 0 (to two decimal places), indicating that the simulations have run long enough to estimate the posterior means precisely. The posterior quantiles give a sense of the uncertainty in the parameters, with 90% posterior intervals of [2.14, 2.33] and [-0.27, -0.24] for a and b, respectively. Finally, the values of \widehat{R} near 1 tell us that the simulations from Stan’s four chains have mixed well.
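As a quick sanity check on these estimates, we can plug the posterior means into the inverse-logit curve to see the implied success probabilities at a couple of distances (a minimal sketch using the rounded values from the table above):

```r
# Implied Pr(success) under the logistic fit, using the (rounded) posterior
# means a = 2.23, b = -0.26 reported above; results are therefore approximate.
invlogit <- plogis
a_hat <- 2.23
b_hat <- -0.26
p_2ft <- invlogit(a_hat + b_hat * 2)    # implied success rate at 2 feet
p_10ft <- invlogit(a_hat + b_hat * 10)  # implied success rate at 10 feet
```

These implied rates (roughly 85% at 2 feet, 41% at 10 feet) can be compared by eye with the raw proportions in the data plot.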
The following graph shows the fit plotted along with the data:
par(mar = c(3, 3, 2, 1), mgp = c(1.7, .5, 0), tck = -.02)
plot(0, 0, xlim = c(0, 1.1 * max(x)), ylim = c(0, 1.02),
xaxs = "i", yaxs = "i", bty = "l",
xlab = "Distance from hole (feet)",
ylab = "Probability of success",
main = "Fitted logistic regression",
type = "n")
for (i in sample(n_sims, 10)) {
curve(invlogit(a_sim[i] + b_sim[i]*x),
from = 0, to = 1.1 * max(x),
        lwd = 0.5, add = TRUE, col = "green")
}
curve(invlogit(a_hat + b_hat * x), from = 0, to = 1.1 * max(x), add = TRUE)
points(x, y/n, pch = 20, col = "blue")
segments(x, y/n + se, x, y/n-se, lwd = .5, col = "blue")
text(11, .57, paste("Logistic regression,\n a = ",
     fround(a_hat, 2), ", b = ", fround(b_hat, 2), sep=""))

The black line shows the fit corresponding to the posterior mean estimates of the parameters a and b; the green lines show 10 draws from the posterior distribution.
In this example, posterior uncertainties in the fits are small, and for simplicity we will just plot point estimates based on posterior mean parameter estimates for the remaining models. Our focus here is on the sequence of models that we fit, not so much on uncertainty in particular model fits.
4 Modeling from first principles
As an alternative to logistic regression, we shall build a model from first principles and fit it to the data.
The graph below shows a simplified sketch of a golf shot. The dotted line represents the angle within which the ball of radius r must be hit so that it falls within the hole of radius R. This threshold angle is \sin^{-1}((R-r)/x). The graph, which is not to scale, is intended to illustrate the geometry of the ball needing to go into the hole.
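To get a feel for this geometry, we can evaluate the threshold angle in degrees at a few distances (a quick numerical sketch using the ball and hole radii in feet defined earlier):

```r
# Threshold angle asin((R - r)/x), converted to degrees, at a few distances.
# It shrinks rapidly as the putt gets longer.
r <- (1.68 / 2) / 12   # ball radius (feet)
R <- (4.25 / 2) / 12   # hole radius (feet)
x_feet <- c(2, 5, 10, 20)
threshold_degrees <- asin((R - r) / x_feet) * 180 / pi
```

A two-foot putt allows an error of about three degrees; a twenty-foot putt allows only about a third of a degree.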
par(mar = c(0, 0, 0, 0))
dist <- 2
r_plot <- r
R_plot <- R
plot(0, 0, xlim = c(-R_plot, dist + 3 * R_plot), ylim = c(-2 * R_plot, 2 * R_plot),
xaxs = "i", yaxs = "i", xaxt = "n", yaxt = "n", bty = "n",
xlab = "", ylab = "", type = "n", asp = 1)
symbols(0, 0, circles=r_plot, inches=FALSE, add=TRUE)
symbols(dist, 0, circles = R_plot - r_plot, inches = FALSE, lty = 2, add = TRUE)
symbols(dist, 0, circles = R_plot, inches = FALSE, add = TRUE)
curve(0 * x, from = 0, to = dist, add = TRUE)
curve(((R_plot - r_plot) / dist) * x, from = 0, to = dist, lty = 2, add = TRUE)
curve(-((R_plot - r_plot) / dist) * x, from = 0, to = dist, lty = 2, add = TRUE)
text(0.5 * dist, -1.5 * R_plot, "x")
arrows(0.5 * dist + 0.05, -1.5 * R_plot, dist, -1.5 * R_plot, 2, length = .1)
arrows(0.5 * dist - 0.05, -1.5 * R_plot, 0, -1.5 * R_plot, 2, length = .1)
text(dist + 1.2 * R_plot, .5 * R_plot, "R")
arrows(dist + 1.2 * R_plot, .7 * R_plot, dist + 1.2 * R_plot, R_plot, length = .05)
arrows(dist + 1.2 * R_plot, .3 * R_plot, dist + 1.2 * R_plot, 0, length = .05)
text(0, r_plot / 2, "r")

The next step is to model human error. We assume that the golfer is attempting to hit the ball completely straight but that many small factors interfere with this goal, so that the actual angle follows a normal distribution centered at 0 with some standard deviation \sigma.
par(mar = c(3, 3, 0, 0), mgp = c(1.7, .5, 0), tck = -.02)
plot(0, 0, xlim = c(-4, 4), ylim = c(0, 1.02),
xaxs = "i", yaxs = "i", xaxt = "n", yaxt = "n", bty = "n",
xlab = "Angle of shot", ylab = "", type = "n")
axis(1, seq(-4,4),
c("", "", expression(-2*sigma), "",
0, "", expression(2*sigma),"", ""))
curve(dnorm(x) / dnorm(0), add = TRUE)

The probability the ball goes in the hole is then the probability that the angle is less than the threshold; that is, \mathrm{Pr}\left(|\mathrm{angle}| < \sin^{-1}((R-r)/x)\right) = 2\Phi\left(\frac{\sin^{-1}((R-r)/x)}{\sigma}\right) - 1, where \Phi is the cumulative normal distribution function. The only unknown parameter in this model is \sigma, the standard deviation of the distribution of shot angles. Stan (and, for that matter, R) computes trigonometry using angles in radians, so at the end of our calculations we will need to multiply by 180/\pi to convert to degrees, which are more interpretable by humans.
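Before fitting, we can evaluate this success probability directly in R for an assumed value of \sigma (the 2-degree value here is purely illustrative, not a fitted estimate):

```r
# Pr(success) = 2 * Phi(asin((R - r)/x) / sigma) - 1, evaluated at one distance
# for an assumed (not fitted) angular sd of 2 degrees.
r <- (1.68 / 2) / 12           # ball radius (feet)
R <- (4.25 / 2) / 12           # hole radius (feet)
sigma_assumed <- 2 * pi / 180  # 2 degrees, converted to radians
x0 <- 10                       # a ten-foot putt
p0 <- 2 * pnorm(asin((R - r) / x0) / sigma_assumed) - 1
```

Under this assumed \sigma, a ten-foot putt succeeds roughly a quarter of the time.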
Our model then has two parts: \begin{align} y_j &\sim \mathrm{binomial}(n_j, p_j)\\ p_j &= 2\Phi\left(\frac{\sin^{-1}((R-r)/x_j)}{\sigma}\right) - 1 , \text{ for } j=1,\dots, J. \end{align} Here is a graph showing the curve for some potential values of the parameter \sigma.
par(mar = c(3, 3, 2, 1), mgp = c(1.7, .5, 0), tck = -.02)
plot(0, 0, xlim = c(0, 1.1 * max(x)), ylim = c(0, 1.02),
xaxs = "i", yaxs = "i", bty = "l",
xlab = "Distance from hole (feet)", ylab = "Probability of success",
main = expression(paste("Modeled Pr(success) for different values of ", sigma)),
type = "n")
sigma_degrees_plot <- c(0.5, 1, 2, 5, 20)
x_text <- c(15, 10, 6, 4, 2)
for (i in 1:length(sigma_degrees_plot)){
sigma <- (pi / 180) * sigma_degrees_plot[i]
x_grid <- seq(R - r, 1.1 * max(x), .01)
p_grid <- 2 * pnorm(asin((R - r) / x_grid) / sigma) - 1
lines(c(0, R - r, x_grid), c(1, 1, p_grid))
text(x_text[i] + 0.7,
2 * pnorm(asin((R - r) / x_text[i]) / sigma) - 1,
bquote(sigma == .(sigma_degrees_plot[i]) * degree),
adj = 0)
}

The highest curve on the graph corresponds to \sigma=0.5^\circ: if golfers could control the angles of their putts to an accuracy of approximately half a degree, they would have a very high probability of success, making over 80% of their ten-foot putts, over 50% of their fifteen-foot putts, and so on. At the other extreme, the lowest plotted curve corresponds to \sigma=20^\circ: if your putts could be off by as much as 20 degrees, you would be highly inaccurate, missing more than half of your two-foot putts. When fitting the model in Stan, the program moves around the space of \sigma, sampling from the posterior distribution.
We now write the Stan model in preparation for estimating \sigma:
print_stan_file(root("golf", "golf_angle_binomial.stan"))

data {
int J;
vector[J] x;
array[J] int n;
array[J] int y;
real r, R;
}
transformed data {
vector[J] threshold_angle = asin((R-r) ./ x);
}
parameters {
real<lower=0> sigma;
}
model {
vector[J] p;
p = 2*Phi(threshold_angle / sigma) - 1;
y ~ binomial(n, p);
}
generated quantities {
real sigma_degrees = sigma * 180 / pi();
}

In the transformed data block above, the ./ in the calculation of threshold_angle performs componentwise division in this vectorized computation.
The data J,n,x,y have already been set up; we just need to define r and R (the golf ball and hole have diameters 1.68 and 4.25 inches, respectively), and run the Stan model:
r <- (1.68/2)/12
R <- (4.25/2)/12
golf_data <- c(golf_data, r = r, R = R)
model_2 <- cmdstan_model(root("golf", "golf_angle_binomial.stan"))
fit_2 <- model_2$sample(data = golf_data, refresh = 0)

Here is the result:
draws_2 <- fit_2$draws(format = "df")
sigma_sim <- draws_2$sigma
sigma_degrees_sim <- draws_2$sigma_degrees
sigma_hat <- mean(sigma_sim)
print(fit_2)

 variable          mean   median   sd  mad       q5      q95 rhat ess_bulk ess_tail
lp__ -2926.75 -2926.49 0.68 0.31 -2928.13 -2926.27 1.00 1941 2546
sigma 0.03 0.03 0.00 0.00 0.03 0.03 1.00 1563 2242
sigma_degrees 1.53 1.53 0.02 0.02 1.49 1.56 1.00 1563 2242
The model has a single parameter, \sigma. From the output, we find that Stan has computed the posterior mean of \sigma to be 0.03. Multiplying this by 180/\pi, this comes to 1.53 degrees. The Monte Carlo standard error of the mean is 0 (to two decimal places), indicating that the simulations have run long enough to estimate the posterior mean precisely. The posterior standard deviation is calculated at 0.02 degrees, indicating that \sigma itself has been estimated with high precision, which makes sense given the large number of data points and the simplicity of the model. The precise posterior distribution of \sigma can also be seen from the narrow range of the posterior quantiles. Finally, \widehat{R} is near 1, telling us that the simulations from Stan’s four simulated chains have mixed well.
We next plot the data and the fitted model (here using the posterior mean of \sigma, but in this case the uncertainty is so narrow that any reasonable posterior summary would give essentially the same result), along with the logistic regression fitted earlier:
par(mar = c(3, 3, 2, 1), mgp = c(1.7, .5, 0), tck = -.02)
plot(0, 0, xlim = c(0, 1.1 * max(x)), ylim = c(0, 1.02),
xaxs = "i", yaxs = "i", bty = "l",
xlab = "Distance from hole (feet)",
ylab = "Probability of success",
main = "Two models fit to the golf putting data", type = "n")
segments(x, y / n + se, x, y / n - se, lwd = .5)
curve(invlogit(a_hat + b_hat*x), from = 0, to = 1.1 * max(x), add = TRUE)
x_grid <- seq(R - r, 1.1 * max(x), .01)
p_grid <- 2 * pnorm(asin((R - r) / x_grid) / sigma_hat) - 1
lines(c(0, R - r, x_grid), c(1, 1, p_grid), col = "blue")
points(x, y / n, pch = 20, col = "blue")
text(10.3, .58, "Logistic regression")
text(18.5, .24, "Geometry-based model", col="blue")

The custom nonlinear model fits the data much better. This is not to say that the model is perfect—any experience of golf will reveal that the angle is not the only factor determining whether the ball goes in the hole—but it seems like a useful start.
5 Testing the fitted model on new data
Several years after fitting the above model, we were presented with a newer and more comprehensive dataset on professional golf putting (Broadie 2018). For simplicity we’ll just look here at the summary data, probabilities of the ball going into the hole for shots up to 75 feet from the hole. The graph below shows these new data (in red), along with our earlier dataset (in blue) and the already-fit geometry-based model from before, extending to the range of the new data.
golf_new <- read.table(root("golf", "data", "golf_data_new.txt"),
                       header = TRUE, skip = 2)

par(mar = c(3, 3, 2, 1), mgp = c(1.7, .5, 0), tck = -.02)
plot(0, 0, xlim = c(0, 1.1 * max(golf_new$x)), ylim = c(0, 1.02),
xaxs = "i", yaxs = "i", bty = "l",
xlab = "Distance from hole (feet)",
ylab = "Probability of success",
     main = "Checking already-fit model to new data", type = "n")
x_grid <- seq(R - r, 1.1 * max(golf_new$x), .01)
p_grid <- 2 * pnorm(asin((R - r) / x_grid) / sigma_hat) - 1
lines(c(0, R - r, x_grid), c(1, 1, p_grid), col = "blue")
points(golf$x, golf$y / golf$n, pch = 20, col = "blue")
points(golf_new$x, golf_new$y/golf_new$n, pch = 20, col = "red")
legend(60, 0.4, legend = c("Old data", "New data"), col = c("blue", "red"), pch = 20)

Comparing the two datasets in the range 0-20 feet, the success rate is similar for longer putts but is much higher than before for the short putts. This could be a measurement issue, if the distances to the hole are only approximate for the old data, and it could also be that golfers are better than they used to be.
Beyond 20 feet, the empirical success rates become lower than would be predicted by the old model. These are much more difficult attempts, even after accounting for the increased angular precision required as distance goes up. In addition, the new data look smoother, which perhaps is a reflection of more comprehensive data collection.
6 A new model accounting for how hard the ball is hit
To get the ball in the hole, the angle isn’t the only thing you need to control; you also need to hit the ball just hard enough.
Broadie (2018) added this feature to the geometric model by introducing another parameter corresponding to the golfer’s control over distance. Supposing u is the distance that the golfer’s shot would travel if there were no hole, the assumption is that the putt will go in if (a) the angle allows the ball to go over the hole, and (b) u is in the range (x, x+3). That is, the ball must be hit hard enough to reach the hole but not so hard that it goes too far. Factor (a) is what we have considered earlier; we must now add factor (b).
The following sketch, which is not to scale, illustrates the need for both the distance and the angle of the shot to be in some range, in this case the gray zone, which represents the trajectories for which the ball would reach the hole and stay in it.
par(mar = c(0, 0, 0, 0))
dist <- 2
r_plot <- r
R_plot <- R
distance_tolerance <- 0.6
plot(0, 0,
xlim = c(-R_plot, dist + 3 * R_plot + 1.5 * distance_tolerance),
ylim = c(-2 * R_plot, 2 * R_plot),
xaxs = "i", yaxs = "i", xaxt = "n", yaxt = "n", bty = "n",
xlab = "", ylab = "", type = "n", asp = 1)
polygon(
c(dist, dist, dist + distance_tolerance, dist + distance_tolerance),
c(
R_plot - r_plot,
-(R_plot - r_plot),
-(R_plot - r_plot) * (dist + distance_tolerance) / dist,
(R_plot - r_plot) * (dist + distance_tolerance) / dist
),
border = NA,
col = "gray"
)
symbols(0, 0, circles = r_plot, inches = FALSE, add = TRUE)
symbols(dist, 0, circles = R_plot, inches = FALSE, add = TRUE)
symbols(dist, 0, circles = R_plot - r_plot, inches = FALSE,
lty = 2, bg = "gray", add = TRUE)
curve(((R_plot - r_plot) / dist) * x, from = 0, to = dist + 1.5 * distance_tolerance,
lty = 2, add = TRUE)
curve(-((R_plot - r_plot) / dist) * x, from = 0, to = dist + 1.5 * distance_tolerance,
lty = 2, add = TRUE)
text(0.5 * dist, -1.5 * R_plot, "x")
arrows(0.5 * dist + 0.05, -1.5 * R_plot, dist, -1.5 * R_plot, 2, length = .1)
arrows(0.5 * dist - 0.05, -1.5 * R_plot, 0, -1.5 * R_plot, 2, length = .1)

Suppose that a golfer will aim to hit the ball one foot past the hole but with a multiplicative error in the shot’s potential distance, so that u=(x+1)(1+\epsilon), where the error \epsilon has a normal distribution with mean 0 and standard deviation \sigma_{\rm distance}. In statistics notation, this model is u\sim\mathrm{normal}(x+1,(x+1)\sigma_{\rm distance}), and the distance is acceptable if u\in [x,x+3], an event that has probability \Phi\left(\frac{2}{(x+1)\sigma_{\rm distance}}\right)-\Phi\left(\frac{-1}{(x+1)\sigma_{\rm distance}}\right).
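This acceptance probability is easy to evaluate numerically; here is a small sketch with an illustrative (not fitted) value of \sigma_{\rm distance}:

```r
# Pr(distance acceptable) = Phi(2/((x+1)*s)) - Phi(-1/((x+1)*s)),
# for an illustrative relative distance error of 10% and a ten-foot putt.
sigma_d <- 0.1   # illustrative, not a fitted estimate
x0 <- 10
p_dist <- pnorm(2 / ((x0 + 1) * sigma_d)) - pnorm(-1 / ((x0 + 1) * sigma_d))
```

With these values the distance condition is satisfied a bit under 80% of the time, and the probability drops further as x grows, since the error scale (x+1)\sigma_{\rm distance} increases with distance.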
Putting these together, the probability a shot goes in becomes, \left(2\Phi\left(\frac{\sin^{-1}((R-r)/x)}{\sigma_{\rm angle}}\right) - 1\right)\left(\Phi\left(\frac{2}{(x+1)\,\sigma_{\rm distance}}\right) - \Phi\left(\frac{-1}{(x+1)\,\sigma_{\rm distance}}\right)\right), where we have renamed the parameter \sigma from our earlier model to \sigma_{\rm angle} to distinguish it from the new \sigma_{\rm distance} parameter. We write the new model in Stan, giving it the name golf_angle_distance_binomial.stan to convey that it accounts for both angle and distance:
print_stan_file(root("golf", "golf_angle_distance_binomial.stan"))

data {
int J;
vector[J] x;
array[J] int n;
array[J] int y;
real r, R;
real overshot, distance_tolerance;
}
transformed data {
vector[J] threshold_angle = asin((R-r) ./ x);
}
parameters {
real<lower=0> sigma_angle;
real<lower=0> sigma_distance;
}
model {
[sigma_angle, sigma_distance] ~ normal(0, 1);
vector[J] p_angle = 2*Phi(threshold_angle / sigma_angle) - 1;
vector[J] p_distance =
Phi((distance_tolerance - overshot) ./ ((x + overshot)*sigma_distance)) -
Phi(-overshot ./ ((x + overshot)*sigma_distance));
vector[J] p = p_angle .* p_distance;
y ~ binomial(n, p);
}
generated quantities {
real sigma_degrees = sigma_angle * 180 / pi();
}

The result is a model with two parameters, \sigma_{\rm angle} and \sigma_{\rm distance}. Even this improved geometry-based model is a gross oversimplification of putting, and the average distances in the binned data are not the exact distances for each shot. But it should be an advance on the earlier one-parameter model; the next step is to see how it fits the data.
Here we have defined overshot and distance_tolerance as data, which Broadie (2018) specified as 1 and 3 feet, respectively. We might wonder why, if the distance range is 3 feet, the overshot is not 1.5 feet. One reason could be that it is riskier to hit the ball too hard than too soft. In addition, we assigned weakly informative half-normal(0, 1) priors to the scale parameters, \sigma_{\rm angle} and \sigma_{\rm distance}, which are needed in this case to keep the computations stable.
We fit the model to the new dataset.
overshot <- 1
distance_tolerance <- 3
golf_new_data <- list(
x = golf_new$x,
y = golf_new$y,
n = golf_new$n,
J = nrow(golf_new),
r = r,
R = R,
overshot = overshot,
distance_tolerance = distance_tolerance
)
model_3 <- cmdstan_model(root("golf", "golf_angle_distance_binomial.stan"))
fit_3 <- model_3$sample(data = golf_new_data, refresh = 0)

Here is the result:
print(fit_3)

 variable             mean     median       sd  mad         q5        q95 rhat ess_bulk ess_tail
 lp__           -396352.23 -364910.64 54261.45 2.31 -495798.95 -364909.02 1.93        5       11
 sigma_angle          0.02       0.01     0.01 0.00       0.01       0.04 1.93        5       11
 sigma_distance       0.12       0.14     0.02 0.00       0.09       0.14 1.81        5       37
 sigma_degrees        1.12       0.76     0.57 0.01       0.76       2.13 1.93        5       11
There is poor convergence: the very high \widehat{R} and very low ESS values indicate multimodality. We try initializing MCMC with Pathfinder:
pth_3 <- model_3$pathfinder(
data = golf_new_data,
refresh = 0,
num_paths = 20,
max_lbfgs_iters = 100
)
fit_3 <- model_3$sample(
data = golf_new_data,
refresh = 0,
init = pth_3
)

Here is the result:
print(fit_3)

 variable             mean     median   sd  mad         q5        q95 rhat ess_bulk ess_tail
 lp__           -364909.93 -364909.65 0.97 0.73 -364911.86 -364908.98 1.00     1661     2125
 sigma_angle          0.01       0.01 0.00 0.00       0.01       0.01 1.00     1031     1560
 sigma_distance       0.14       0.14 0.00 0.00       0.14       0.14 1.00     1060     1401
 sigma_degrees        0.76       0.76 0.00 0.00       0.76       0.77 1.00     1031     1560
Now the convergence looks fine. We graph the new data and the fitted model:
draws_3 <- fit_3$draws(format = "df")
sigma_angle_hat <- mean(draws_3$sigma_angle)
sigma_distance_hat <- mean(draws_3$sigma_distance)
par(mar = c(3, 3, 2, 1), mgp = c(1.7, .5, 0), tck = -.02)
plot(0, 0, xlim = c(0, 1.1 * max(golf_new$x)), ylim = c(0, 1.02),
xaxs = "i", yaxs = "i", bty = "l",
xlab = "Distance from hole (feet)",
ylab = "Probability of success",
main = "Checking model fit", type = "n")
x_grid <- seq(R - r, 1.1 * max(golf_new$x), .01)
p_angle_grid <- 2 * pnorm(asin((R - r) / x_grid) / sigma_angle_hat) - 1
p_distance_grid <- pnorm((distance_tolerance - overshot) / ((x_grid + overshot) * sigma_distance_hat)) -
pnorm(-overshot / ((x_grid + overshot) * sigma_distance_hat))
lines(c(0, R-r, x_grid), c(1, 1, p_angle_grid * p_distance_grid), col = "red")
points(golf_new$x, golf_new$y / golf_new$n, pch = 20, col = "red")

There are problems with the fit in the middle of the range of x. We suspect this is a problem with the binomial error model, as it tries harder to fit points where the counts are higher. Look at how closely the fitted curve hugs the data at the very lowest values of x.
Here are the first few rows of the data:
print(golf_new[1:5,])

     x      n      y
1 0.28  45198  45183
2 0.97 183020 182899
3 1.93 169503 168594
4 2.92 113094 108953
5 3.93  73855  64740
With such large values of n_j, the likelihood enforces an extremely close fit at these first few points, and that drives the entire fit of the model.
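To see how strong this constraint is, consider the binomial standard error implied by one of those sample sizes (a quick check using the second row of the data above):

```r
# Binomial standard error of the raw proportion for the second data row
# (n = 183020, y = 182899): the s.e. is on the order of 0.00006, so even a
# tiny systematic misfit at this point incurs an enormous likelihood penalty.
n_row2 <- 183020
p_hat_row2 <- 182899 / n_row2
se_row2 <- sqrt(p_hat_row2 * (1 - p_hat_row2) / n_row2)
```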
7 Expanding the model by including a fudge factor
To fix this problem we took the data model, y_j \sim \mathrm{binomial}(n_j, p_j), and added an independent error term to each observation. There is no easy way to add error directly to the binomial distribution—we could replace it with its overdispersed generalization, the beta-binomial, but this would not be appropriate here because the variance for each data point j would still be roughly proportional to the sample size n_j, and our whole point here is to get away from this assumption and allow for model misspecification—so instead we first approximate the binomial data distribution by a normal and then add independent variance; thus: y_j/n_j \sim \mathrm{normal}\left(p_j, \sqrt{p_j(1-p_j)/n_j + \sigma_y^2}\right). To write this in Stan there are some complications:
y and n are integer variables, which we convert to vectors so that we can multiply and divide them.
To perform componentwise multiplication or division on vectors, you need to use .* or ./ so that Stan knows not to try to perform vector/matrix multiplication and division. Stan is the opposite of R in this way: Stan defaults to vector/matrix operations and has to be told otherwise, whereas R defaults to componentwise operations, and vector/matrix multiplication in R is indicated using the %*% operator.
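The contrast is easy to demonstrate in R, where arithmetic is componentwise by default and matrix multiplication must be requested explicitly:

```r
# R defaults to componentwise arithmetic; matrix multiplication must be
# requested explicitly with %*% (the reverse of Stan's defaults).
u <- c(1, 2, 3)
v <- c(10, 20, 30)
componentwise <- u * v   # like Stan's u .* v
inner <- u %*% v         # 1x1 matrix holding the inner product
```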
We implement these via the following new code in the transformed data block:
vector[J] raw_proportions = to_vector(y) ./ to_vector(n);

And then in the model block:
raw_proportions ~ normal(p, sqrt(p .* (1-p) ./ to_vector(n) + sigma_y^2));

To complete the model we add \sigma_y to the parameters block and assign it a weakly informative half-normal(0,1) prior distribution. Here’s the new Stan program:
print_stan_file(root("golf", "golf_angle_distance_normal.stan"))

data {
int J;
array[J] int n;
vector[J] x;
array[J] int y;
real r;
real R;
real overshot;
real distance_tolerance;
}
transformed data {
vector[J] threshold_angle = asin((R-r) ./ x);
vector[J] raw_proportions = to_vector(y) ./ to_vector(n);
}
parameters {
real<lower=0> sigma_angle;
real<lower=0> sigma_distance;
real<lower=0> sigma_y;
}
model {
[sigma_angle, sigma_distance, sigma_y] ~ normal(0, 1);
vector[J] p_angle = 2*Phi(threshold_angle / sigma_angle) - 1;
vector[J] p_distance =
Phi((distance_tolerance - overshot) ./ ((x + overshot)*sigma_distance)) -
Phi(-overshot ./ ((x + overshot)*sigma_distance));
vector[J] p = p_angle .* p_distance;
raw_proportions ~ normal(p, sqrt(p .* (1-p) ./ to_vector(n) + sigma_y^2));
}
generated quantities {
real sigma_degrees = sigma_angle * 180 / pi();
}

We now fit this model to the data:
model_4 <- cmdstan_model(root("golf", "golf_angle_distance_normal.stan"))
fit_4 <- model_4$sample(data = golf_new_data, refresh = 0)

Here is the result:
draws_4 <- fit_4$draws(format = "df")
print(fit_4)

 variable          mean median   sd  mad     q5    q95 rhat ess_bulk ess_tail
lp__ 146.56 146.93 1.30 1.05 144.12 147.96 1.00 1434 2170
sigma_angle 0.02 0.02 0.00 0.00 0.02 0.02 1.00 1220 1613
sigma_distance 0.08 0.08 0.00 0.00 0.08 0.08 1.00 1228 1424
sigma_y 0.00 0.00 0.00 0.00 0.00 0.00 1.00 1945 1784
sigma_degrees 1.02 1.02 0.01 0.01 1.01 1.03 1.00 1220 1613
The new parameter estimates are:
\sigma_{\rm angle} is estimated at 0.02, which corresponds to \sigma_{\rm degrees}= 1.0. According to the fitted model, there is a standard deviation of 1.0 degree in the angles of putts taken by pro golfers. The estimate of \sigma_{\rm angle} has decreased compared to the earlier model that only had angular errors. This makes sense: now that distance errors have been included in the model, there is no need to explain so many of the missed shots using errors in angle.
\sigma_{\rm distance} is estimated at 0.08. According to the fitted model, there is a standard deviation of 8% in the errors of distance.
\sigma_y is estimated at 0.003. The fitted model fits the aggregate data (success rate as a function of distance) to an accuracy of 0.3 percentage points.
And now we graph:
draws_4 <- fit_4$draws(format = "df")
sigma_angle_hat <- mean(draws_4$sigma_angle)
sigma_distance_hat <- mean(draws_4$sigma_distance)
par(mar = c(3, 3, 2, 1), mgp = c(1.7, .5, 0), tck = -.02)
plot(0, 0, xlim = c(0, 1.1 * max(golf_new$x)), ylim = c(0, 1.02),
xaxs = "i", yaxs = "i", pch = 20, bty = "l",
xlab = "Distance from hole (feet)",
ylab = "Probability of success",
main = "Checking model fit", type = "n")
x_grid <- seq(R - r, 1.1 * max(golf_new$x), .01)
p_angle_grid <- 2 * pnorm(asin((R - r) / x_grid) / sigma_angle_hat) - 1
p_distance_grid <- pnorm((distance_tolerance - overshot) / ((x_grid + overshot) * sigma_distance_hat)) -
pnorm(-overshot / ((x_grid + overshot) * sigma_distance_hat))
lines(c(0, R - r, x_grid), c(1, 1, p_angle_grid * p_distance_grid), col = "red")
points(golf_new$x, golf_new$y / golf_new$n, pch = 20, col = "red")

We can go further and plot the residuals from this fit. First we augment the Stan model to compute residuals in the generated quantities block.
model_4_with_resids <- cmdstan_model(root("golf", "golf_angle_distance_normal_with_resids.stan"))
fit_4_with_resids <- model_4_with_resids$sample(data = golf_new_data, refresh = 0)

Then we compute the posterior means of the residuals, y_j/n_j - p_j, then plot these vs. distance:
posterior_mean_residual <- mean(as_draws_rvars(fit_4_with_resids$draws())$residual)
par(mar = c(3, 3, 2, 1), mgp = c(1.7, .5, 0), tck = -.02)
plot(golf_new$x, posterior_mean_residual, xlim = c(0, 1.1 * max(golf_new$x)),
xaxs = "i", pch = 20, bty = "l",
xlab = "Distance from hole (feet)",
ylab = "y/n - fitted E(y/n)",
main = "Residuals from fitted model", type = "n")
abline(0, 0, col = "gray", lty = 2)
lines(golf_new$x, posterior_mean_residual)

The fit is good: the residuals show no strong pattern, and they are low in absolute value—the model predicts the success rate to within half a percentage point at most distances, suggesting not that the model is perfect but that there are no clear problems given the current data.
The above model fit the data, but we were bothered by the normal approximation, not so much for these particular data but rather because it was a sort of admission of failure that we could not use the binomial model directly.
8 Binomial with errors in logit scale
What we wanted was to keep the binomial model and add the error on the logistic scale, or something like that: something to keep the probabilities bounded between 0 and 1. We added an error term on the logistic scale with a scale parameter, sigma_eta, estimated from the data.
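The point of the logit transformation is that the perturbed probabilities stay strictly inside (0, 1) no matter how large the errors; a quick sketch with purely illustrative numbers:

```r
# Perturbing a probability on the logit scale keeps it strictly inside (0, 1),
# however large the error draws; all values here are illustrative.
logit <- qlogis
invlogit <- plogis
p_model <- 0.97              # a model probability near the boundary
sigma_eta <- 0.7             # illustrative error scale
eta_draws <- c(-3, 0, 3)     # illustrative draws of the error term
p_perturbed <- invlogit(logit(p_model) + sigma_eta * eta_draws)
```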
print_stan_file(root("golf", "golf_angle_distance_binomial_with_logit_errors.stan"))

data {
int J;
array[J] int n;
vector[J] x;
array[J] int y;
real r;
real R;
real overshot;
real distance_tolerance;
}
transformed data {
vector[J] threshold_angle = asin((R-r) ./ x);
}
parameters {
real<lower=0> sigma_angle;
real<lower=0> sigma_distance;
real<lower=0> sigma_eta;
vector[J] eta;
}
model {
eta ~ normal(0, 1);
[sigma_angle, sigma_distance, sigma_eta] ~ normal(0, 1);
vector[J] p_angle = 2*Phi(threshold_angle / sigma_angle) - 1;
vector[J] p_distance =
Phi((distance_tolerance - overshot) ./ ((x + overshot)*sigma_distance)) -
Phi(-overshot ./ ((x + overshot)*sigma_distance));
vector[J] p = inv_logit(logit(p_angle .* p_distance) + sigma_eta*eta);
y ~ binomial(n, p);
}
generated quantities {
real sigma_degrees = sigma_angle * 180 / pi();
}

We fit the model to the data:
model_5 <- cmdstan_model(root("golf", "golf_angle_distance_binomial_with_logit_errors.stan"))
fit_5 <- model_5$sample(data = golf_new_data, refresh = 0)

Here is the result:
print(fit_5)

 variable             mean     median       sd  mad         q5        q95 rhat ess_bulk ess_tail
 lp__           -415798.83 -363617.97 90395.16 9.31 -572350.40 -363607.42 1.57        6       14
 sigma_angle          0.02       0.01     0.02 0.00       0.01       0.05 1.54        7       43
 sigma_distance       0.17       0.20     0.05 0.01       0.08       0.21 1.60        6       11
 sigma_eta            0.57       0.66     0.25 0.15       0.15       0.86 1.59        6       11
 eta[1]              -1.33      -1.96     1.95 1.46      -3.75       1.75 1.53        7       23
 eta[2]               2.21       2.80     1.50 0.85      -0.25       3.85 1.53        7       25
 eta[3]               2.29       2.89     1.45 0.71      -0.14       3.77 1.60        6       15
 eta[4]               0.99       1.54     1.16 0.40      -0.98       2.04 1.53        7       26
 eta[5]               0.94       0.68     0.57 0.23       0.40       1.92 1.59        6       11
 eta[6]               0.61       0.20     0.80 0.18      -0.03       1.99 1.55        7       27
 # showing 10 of 36 rows (change via 'max_rows' argument or 'cmdstanr_max_rows' option)
Unfortunately, when we try to fit this new model to our data, the console fills up with warnings and the chains don’t mix. We try initializing with Pathfinder.
pth_5 <- model_5$pathfinder(
data = golf_new_data,
refresh = 0,
num_paths = 20,
max_lbfgs_iters = 100
)
fit_5 <- model_5$sample(
data = golf_new_data,
refresh = 0,
init = pth_5
)

Here is the result:
print(fit_5)

 variable             mean     median   sd  mad         q5        q95 rhat ess_bulk ess_tail
 lp__           -363615.58 -363615.19 5.89 5.88 -363625.91 -363606.57 1.00      648     1077
 sigma_angle          0.01       0.01 0.00 0.00       0.01       0.01 1.01      572      844
 sigma_distance       0.20       0.20 0.01 0.01       0.19       0.22 1.00     1042     1408
 sigma_eta            0.71       0.70 0.10 0.09       0.56       0.89 1.00      749      965
 eta[1]              -2.37      -2.36 0.95 0.96      -3.93      -0.84 1.00     1109     1580
 eta[2]               3.02       3.04 0.55 0.54       2.09       3.90 1.00      916     1056
 eta[3]               3.08       3.09 0.45 0.44       2.35       3.82 1.00      809      916
 eta[4]               1.64       1.65 0.26 0.26       1.21       2.06 1.00      840      965
 eta[5]               0.62       0.62 0.15 0.15       0.38       0.85 1.00      971     1215
 eta[6]               0.14       0.14 0.12 0.12      -0.05       0.34 1.01      886     1091
 # showing 10 of 36 rows (change via 'max_rows' argument or 'cmdstanr_max_rows' option)
The sampling is slower than for earlier models, but the convergence diagnostics look fine. We graph the new data and the fitted model:
draws_5 <- fit_5$draws(format = "df")
sigma_angle_hat <- mean(draws_5$sigma_angle)
sigma_distance_hat <- mean(draws_5$sigma_distance)
par(mar = c(3, 3, 2, 1), mgp = c(1.7, .5, 0), tck = -.02)
plot(0, 0, xlim = c(0, 1.1 * max(golf_new$x)), ylim = c(0, 1.02),
xaxs = "i", yaxs = "i", pch = 20, bty = "l",
xlab = "Distance from hole (feet)",
ylab = "Probability of success",
main = "Checking model fit", type = "n")
x_grid <- seq(R-r, 1.1*max(golf_new$x), .01)
p_angle_grid <- 2*pnorm(asin((R-r)/x_grid) / sigma_angle_hat) - 1
p_distance_grid <- pnorm((distance_tolerance - overshot) / ((x_grid + overshot)*sigma_distance_hat)) -
pnorm(-overshot / ((x_grid + overshot)*sigma_distance_hat))
lines(c(0, R-r, x_grid), c(1, 1, p_angle_grid * p_distance_grid), col = "red")
points(golf_new$x, golf_new$y/golf_new$n, pch = 20, col = "red")
There are now problems with the fit at short distances. Clearly the additive error on the logit scale is not appropriate.
9 Binomial with proportional errors
Instead of additive errors on the logit scale, we next fit a three-parameter model that scales all the probabilities down from 1. Each observation has its own proportional error term. The key is that each element of the multiplier vector (1 - epsilon) is positive and less than 1, which eliminates the boundary problem and the need for the logit transform. The prior distribution on epsilon keeps the errors under control.
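A quick numeric check (a standalone Python sketch with made-up values, not taken from the fitted model) confirms that the multiplicative form keeps every probability strictly inside (0, 1), whatever value the exponential prior produces for epsilon:

```python
import random

random.seed(1)

def success_prob(p_angle, p_distance, epsilon):
    """Proportional-error model: scale the product of the two
    geometric probabilities down by the factor (1 - epsilon)."""
    assert 0 < epsilon < 1
    return p_angle * p_distance * (1 - epsilon)

# Draw epsilon from an exponential prior (rate 1/sigma_epsilon),
# rejecting the rare draws >= 1 to respect the upper bound.
sigma_epsilon = 0.05  # made-up value for illustration
draws = []
while len(draws) < 10_000:
    e = random.expovariate(1 / sigma_epsilon)
    if e < 1:
        draws.append(e)

ps = [success_prob(0.9, 0.95, e) for e in draws]
print(min(ps), max(ps))  # all probabilities stay strictly inside (0, 1)
```

Because the multiplier is bounded in (0, 1), no transformation back and forth between probability and logit scales is needed.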
print_stan_file(root("golf", "golf_angle_distance_binomial_with_proportional_errors.stan"))
data {
int J;
array[J] int n;
vector[J] x;
array[J] int y;
real r;
real R;
real overshot;
real distance_tolerance;
}
transformed data {
vector[J] threshold_angle = asin((R-r) ./ x);
vector[J] raw_proportion = to_vector(y) ./ to_vector(n);
}
parameters {
real<lower=0> sigma_angle;
real<lower=0> sigma_distance;
real<lower=0> sigma_epsilon;
vector<lower=0, upper=1>[J] epsilon;
}
transformed parameters {
vector[J] p_angle = 2*Phi(threshold_angle / sigma_angle) - 1;
vector[J] p_distance =
Phi((distance_tolerance - overshot) ./ ((x + overshot)*sigma_distance)) -
Phi(-overshot ./ ((x + overshot)*sigma_distance));
vector[J] p = p_angle .* p_distance .* (1 - epsilon);
}
model {
[sigma_angle, sigma_distance] ~ normal(0, 1);
epsilon ~ exponential(1/sigma_epsilon);
y ~ binomial(n, p);
}
generated quantities {
real sigma_degrees = sigma_angle * 180 / pi();
vector[J] residual = raw_proportion - p_angle .* p_distance;
}
We fit the model to the data:
model_6 <- cmdstan_model(root("golf", "golf_angle_distance_binomial_with_proportional_errors.stan"))
fit_6 <- model_6$sample(data = golf_new_data, refresh = 0)
Here is the result:
print(fit_6)
 variable mean median sd mad q5 q95 rhat ess_bulk
lp__ -363658.70 -363658.34 5.37 5.39 -363668.13 -363650.49 1.01 768
sigma_angle 0.02 0.02 0.00 0.00 0.02 0.02 1.00 1442
sigma_distance 0.08 0.08 0.00 0.00 0.08 0.08 1.00 1376
sigma_epsilon 0.01 0.01 0.00 0.00 0.01 0.02 1.00 968
epsilon[1] 0.00 0.00 0.00 0.00 0.00 0.00 1.00 3592
epsilon[2] 0.00 0.00 0.00 0.00 0.00 0.00 1.00 3710
epsilon[3] 0.00 0.00 0.00 0.00 0.00 0.00 1.00 3284
epsilon[4] 0.00 0.00 0.00 0.00 0.00 0.00 1.00 1386
epsilon[5] 0.00 0.00 0.00 0.00 0.00 0.00 1.00 1536
epsilon[6] 0.01 0.01 0.00 0.00 0.00 0.01 1.00 1510
ess_tail
1089
1510
1629
1707
2183
2787
2566
1540
1683
1601
# showing 10 of 160 rows (change via 'max_rows' argument or 'cmdstanr_max_rows' option)
We graph the new data and the fitted model:
draws_6 <- fit_6$draws(format = "df")
sigma_angle_hat <- mean(draws_6$sigma_angle)
sigma_distance_hat <- mean(draws_6$sigma_distance)
par(mar = c(3, 3, 2, 1), mgp = c(1.7, .5, 0), tck = -.02)
plot(0, 0, xlim = c(0, 1.1 * max(golf_new$x)), ylim = c(0, 1.02),
xaxs = "i", yaxs = "i", pch = 20, bty = "l",
xlab = "Distance from hole (feet)",
ylab = "Probability of success",
main = "Checking model fit", type = "n")
x_grid <- seq(R-r, 1.1*max(golf_new$x), .01)
p_angle_grid <- 2*pnorm(asin((R-r)/x_grid) / sigma_angle_hat) - 1
p_distance_grid <- pnorm((distance_tolerance - overshot) / ((x_grid + overshot)*sigma_distance_hat)) -
pnorm(-overshot / ((x_grid + overshot)*sigma_distance_hat))
lines(c(0, R-r, x_grid), c(1, 1, p_angle_grid*p_distance_grid), col = "red")
points(golf_new$x, golf_new$y / golf_new$n, pch = 20, col = "red")
posterior_mean_residual <- mean(as_draws_rvars(draws_6)$residual)
par(mar = c(3, 3, 2, 1), mgp = c(1.7, .5, 0), tck = -.02)
plot(golf_new$x, posterior_mean_residual, xlim = c(0, 1.1 * max(golf_new$x)),
xaxs = "i", pch = 20, bty = "l",
xlab = "Distance from hole (feet)",
ylab = "y/n - fitted E(y/n)",
main = "Residuals from fitted model", type = "n")
abline(0, 0, col = "gray", lty = 2)
lines(golf_new$x, posterior_mean_residual)
The model fit looks good, but the residual plot shows that for short distances the model overestimates the probabilities. This pattern could be explained by sensitivity to the distance_tolerance and overshot parameters, which were fixed.
We change distance_tolerance to be a parameter. We assume Broadie (2018) used his expertise to choose the value of 3 feet. We use a log-normal prior with mean log(3) and standard deviation 0.2 to incorporate that expert information while keeping the prior relatively weak. We initialize MCMC using draws from the previous model's posterior, except for the new distance_tolerance parameter.
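To see how weak this prior actually is, we can compute its central 90% interval; this standalone Python sketch uses only the stated prior, nothing from the Stan model:

```python
import math
from statistics import NormalDist

# Log-normal prior: log(distance_tolerance) ~ Normal(log(3), 0.2)
mu, sigma = math.log(3), 0.2
z = NormalDist().inv_cdf(0.95)  # ~1.645

low = math.exp(mu - z * sigma)
high = math.exp(mu + z * sigma)
print(f"central 90% prior interval: ({low:.2f}, {high:.2f}) feet")
```

So a priori distance_tolerance plausibly lies between roughly 2.2 and 4.2 feet, comfortably covering Broadie's value of 3 while allowing the likelihood to pull it elsewhere.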
model_7 <- cmdstan_model(root("golf", "golf_angle_distance_binomial_with_proportional_errors_2.stan"))
fit_7 <- model_7$sample(data = golf_new_data, refresh = 0, init = fit_6)
Here is the result:
print(fit_7)
 variable mean median sd mad q5 q95 rhat ess_bulk
lp__ -363659.73 -363659.47 4.82 4.83 -363668.14 -363652.32 1.00 942
sigma_angle 0.02 0.02 0.00 0.00 0.02 0.02 1.01 575
sigma_distance 0.11 0.11 0.01 0.01 0.09 0.12 1.01 548
sigma_epsilon 0.00 0.00 0.00 0.00 0.00 0.01 1.00 662
epsilon[1] 0.00 0.00 0.00 0.00 0.00 0.00 1.00 5983
epsilon[2] 0.00 0.00 0.00 0.00 0.00 0.00 1.00 5741
epsilon[3] 0.00 0.00 0.00 0.00 0.00 0.00 1.01 600
epsilon[4] 0.00 0.00 0.00 0.00 0.00 0.00 1.00 811
epsilon[5] 0.00 0.00 0.00 0.00 0.00 0.00 1.00 1412
epsilon[6] 0.00 0.00 0.00 0.00 0.00 0.01 1.00 1286
ess_tail
1667
1583
1562
1344
2685
3302
1771
1020
1619
1503
# showing 10 of 161 rows (change via 'max_rows' argument or 'cmdstanr_max_rows' option)
We graph the new data and the fitted model:
draws_7 <- fit_7$draws(format = "df")
sigma_angle_hat <- mean(draws_7$sigma_angle)
sigma_distance_hat <- mean(draws_7$sigma_distance)
distance_tolerance <- mean(draws_7$distance_tolerance)
overshot <- 1
par(mar = c(3, 3, 2, 1), mgp = c(1.7, .5, 0), tck = -.02)
plot(0, 0, xlim = c(0, 1.1 * max(golf_new$x)), ylim = c(0, 1.02),
xaxs = "i", yaxs = "i", pch = 20, bty = "l",
xlab = "Distance from hole (feet)",
ylab = "Probability of success",
main = "Checking model fit", type = "n")
x_grid <- seq(R-r, 1.1*max(golf_new$x), .01)
p_angle_grid <- 2*pnorm(asin((R-r)/x_grid) / sigma_angle_hat) - 1
p_distance_grid <- pnorm((distance_tolerance - overshot) / ((x_grid + overshot)*sigma_distance_hat)) -
pnorm(-overshot / ((x_grid + overshot)*sigma_distance_hat))
lines(c(0, R-r, x_grid), c(1, 1, p_angle_grid * p_distance_grid), col = "red")
points(golf_new$x, golf_new$y/golf_new$n, pch = 20, col = "red")
posterior_mean_residual <- mean(as_draws_rvars(draws_7)$residual)
par(mar = c(3, 3, 2, 1), mgp = c(1.7, .5, 0), tck = -.02)
plot(golf_new$x, posterior_mean_residual, xlim = c(0, 1.1 * max(golf_new$x)),
xaxs = "i", pch = 20, bty = "l",
xlab = "Distance from hole (feet)",
ylab = "y/n - fitted E(y/n)",
main = "Residuals from fitted model", type = "n")
abline(0, 0, col = "gray", lty = 2)
lines(golf_new$x, posterior_mean_residual)
The residuals are now smaller, with standard deviation roughly halved, and there is no obvious pattern. Looking at the posterior of distance_tolerance, most of the mass is above 3, and the posterior is much narrower than the prior, so the likelihood is informative about it.
fit_7$summary(variables = "distance_tolerance")
# A tibble: 1 × 10
variable mean median sd mad q5 q95 rhat ess_bulk ess_tail
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 distance_tolerance 3.9 4.0 0.22 0.21 3.5 4.1 1.0 563. 1833.
For the final model we change overshot to be a parameter, too. We assume Broadie (2018) used his expertise to choose the value of 1 foot. We use a log-normal prior with mean log(1) and standard deviation 0.2 to incorporate that expert information while keeping the prior relatively weak. We initialize Pathfinder using draws from the previous model's posterior (except for the new overshot parameter), and then initialize MCMC with the Pathfinder draws.
model_8 <- cmdstan_model(root("golf", "golf_angle_distance_binomial_with_proportional_errors_3.stan"))
pth_8 <- model_8$pathfinder(
data = golf_new_data,
refresh = 0,
num_paths = 40,
max_lbfgs_iters = 100,
init = fit_7
)
fit_8 <- model_8$sample(
data = golf_new_data,
refresh = 0,
init = pth_8
)
Here is the result:
print(fit_8)
 variable mean median sd mad q5 q95 rhat ess_bulk
lp__ -363660.61 -363660.28 4.86 4.77 -363668.92 -363653.17 1.00 774
sigma_angle 0.02 0.02 0.00 0.00 0.02 0.02 1.00 651
sigma_distance 0.10 0.10 0.01 0.01 0.08 0.12 1.00 1254
sigma_epsilon 0.00 0.00 0.00 0.00 0.00 0.01 1.00 708
epsilon[1] 0.00 0.00 0.00 0.00 0.00 0.00 1.00 4521
epsilon[2] 0.00 0.00 0.00 0.00 0.00 0.00 1.00 5313
epsilon[3] 0.00 0.00 0.00 0.00 0.00 0.00 1.00 687
epsilon[4] 0.00 0.00 0.00 0.00 0.00 0.00 1.00 949
epsilon[5] 0.00 0.00 0.00 0.00 0.00 0.00 1.00 1311
epsilon[6] 0.00 0.00 0.00 0.00 0.00 0.01 1.00 1507
ess_tail
1495
1714
1567
1301
2798
3127
1857
1601
1357
1847
# showing 10 of 162 rows (change via 'max_rows' argument or 'cmdstanr_max_rows' option)
We graph the new data and the fitted model:
draws_8 <- fit_8$draws(format = "df")
sigma_angle_hat <- mean(draws_8$sigma_angle)
sigma_distance_hat <- mean(draws_8$sigma_distance)
distance_tolerance <- mean(draws_8$distance_tolerance)
overshot <- mean(draws_8$overshot)
par(mar = c(3, 3, 2, 1), mgp = c(1.7, .5, 0), tck = -.02)
plot(0, 0, xlim = c(0, 1.1 * max(golf_new$x)), ylim = c(0, 1.02),
xaxs = "i", yaxs = "i", pch = 20, bty = "l",
xlab = "Distance from hole (feet)",
ylab = "Probability of success",
main = "Checking model fit", type = "n")
x_grid <- seq(R-r, 1.1*max(golf_new$x), .01)
p_angle_grid <- 2*pnorm(asin((R-r)/x_grid) / sigma_angle_hat) - 1
p_distance_grid <- pnorm((distance_tolerance - overshot) / ((x_grid + overshot)*sigma_distance_hat)) -
pnorm(-overshot / ((x_grid + overshot)*sigma_distance_hat))
lines(c(0, R-r, x_grid), c(1, 1, p_angle_grid * p_distance_grid), col = "red")
points(golf_new$x, golf_new$y/golf_new$n, pch = 20, col = "red")
posterior_mean_residual <- mean(as_draws_rvars(draws_8)$residual)
par(mar = c(3, 3, 2, 1), mgp = c(1.7, .5, 0), tck = -.02)
plot(golf_new$x, posterior_mean_residual, xlim = c(0, 1.1 * max(golf_new$x)),
xaxs = "i", pch = 20, bty = "l",
xlab = "Distance from hole (feet)",
ylab = "y/n - fitted E(y/n)",
main = "Residuals from fitted model", type = "n")
abline(0, 0, col = "gray", lty = 2)
lines(golf_new$x, posterior_mean_residual)
The residuals are very similar to the previous model's, with similar standard deviation. Examining the posterior marginals of distance_tolerance and overshot, the standard deviations are only about half the prior standard deviations, hinting that the likelihood is only weakly informative about each of them separately.
fit_8$summary(variables = c("distance_tolerance","overshot"))
# A tibble: 2 × 10
variable mean median sd mad q5 q95 rhat ess_bulk ess_tail
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 distance_tolerance 3.5 3.5 0.42 0.42 2.8 4.2 1.0 1690. 1885.
2 overshot 0.87 0.87 0.13 0.13 0.67 1.1 1.0 1715. 2414.
However, when we examine the bivariate posterior we see strong dependency, and the ratio of the two parameters is well informed by the likelihood: the posterior standard deviation of the ratio is about one fifth of the prior standard deviation. This also explains why adding overshot did not reduce the residual standard deviation much.
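The prior scale for this comparison follows directly from the two log-normal priors: the log-ratio is Normal with standard deviation sqrt(0.2^2 + 0.2^2) ≈ 0.28, and the standard deviation of the ratio itself has a closed form. A standalone Python check (prior only, nothing from the fitted model):

```python
import math

# Independent log-normal priors as stated in the text:
#   log(distance_tolerance) ~ Normal(log(3), 0.2)
#   log(overshot)           ~ Normal(log(1), 0.2)
# Their ratio is log-normal with log-scale parameters:
mu = math.log(3) - math.log(1)
s = math.sqrt(0.2**2 + 0.2**2)  # sd of the log-ratio, ~0.28

# Closed-form standard deviation of a log-normal(mu, s) variable
prior_sd_ratio = math.exp(mu + s**2 / 2) * math.sqrt(math.exp(s**2) - 1)
print(f"prior sd of distance_tolerance/overshot: {prior_sd_ratio:.2f}")
```

A posterior standard deviation of the ratio around one fifth of this prior value shows the likelihood constrains the ratio tightly, even though it is weakly informative about each parameter alone.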
fit_8$draws(variables = c("distance_tolerance","overshot")) |>
mcmc_scatter() +
theme_default(base_family = "sans", base_size = 16)
As we can expect the residual standard deviation to decrease when we add more parameters, we also use cross-validation to compare the models. Because we have one epsilon parameter for each observation, we need to use integrated PSIS-LOO. We compute the integrated log_lik with the following standalone generated quantities code.
gq_ll <- cmdstan_model(root("golf", "golf_log_lik.stan"))
When calling generated quantities, we pass only the required variables, which lets us write more compact Stan code for the log_lik computation.
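The integration performed in the generated quantities code can be sketched numerically: for one observation, epsilon is integrated out of the binomial likelihood against its truncated exponential prior. This standalone Python sketch uses simple trapezoidal quadrature and made-up values for y, n, p0, and sigma_epsilon; it illustrates the idea only and does not reproduce what golf_log_lik.stan computes:

```python
import math

def binom_pmf(y, n, p):
    return math.comb(n, y) * p**y * (1.0 - p) ** (n - y)

def integrated_log_lik(y, n, p0, sigma_epsilon, grid=2000):
    """log p(y | n, p0, sigma_epsilon) with the observation-level error
    epsilon ~ Exponential(1/sigma_epsilon), truncated to [0, 1),
    integrated out by trapezoidal quadrature."""
    rate = 1.0 / sigma_epsilon
    z = 1.0 - math.exp(-rate)          # truncation constant on [0, 1)
    h = 1.0 / grid
    vals = []
    for i in range(grid + 1):
        eps = i * h
        density = rate * math.exp(-rate * eps) / z
        vals.append(binom_pmf(y, n, p0 * (1.0 - eps)) * density)
    integral = h * (sum(vals) - 0.5 * (vals[0] + vals[-1]))
    return math.log(integral)

# Illustrative values only (not taken from the fitted golf model)
ll = integrated_log_lik(y=50, n=100, p0=0.52, sigma_epsilon=0.01)
print(ll)
```

Integrating out the observation-level epsilon in this way is what makes PSIS-LOO reliable here; with one parameter per observation, the non-integrated pointwise log-likelihoods would give poor importance-sampling behavior.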
loo_6 <- gq_ll$generate_quantities(
fit_6$draws(variables = c("sigma_epsilon",
"p_angle",
"p_distance")),
data = golf_new_data)$draws(variables = "log_lik") |> loo()
loo_7 <- gq_ll$generate_quantities(
fit_7$draws(variables = c("sigma_epsilon",
"p_angle",
"p_distance")),
data = golf_new_data)$draws(variables = "log_lik") |> loo()
loo_8 <- gq_ll$generate_quantities(
fit_8$draws(variables = c("sigma_epsilon",
"p_angle",
"p_distance")),
data = golf_new_data)$draws(variables = "log_lik") |> loo()
As we have integrated out the epsilon parameters, the effective number of parameters matches the number of remaining parameters, except for the last model, where the distance_tolerance and overshot parameters are not well informed by the likelihood separately.
print(loo_6)
Computed from 4000 by 31 log-likelihood matrix.
Estimate SE
elpd_loo -185.6 8.5
p_loo 2.9 0.7
looic 371.2 17.0
------
MCSE of elpd_loo is 0.0.
MCSE and ESS estimates assume independent draws (r_eff=1).
All Pareto k estimates are good (k < 0.7).
See help('pareto-k-diagnostic') for details.
print(loo_7)
Computed from 4000 by 31 log-likelihood matrix.
Estimate SE
elpd_loo -173.4 7.4
p_loo 4.1 1.0
looic 346.9 14.8
------
MCSE of elpd_loo is 0.1.
MCSE and ESS estimates assume independent draws (r_eff=1).
All Pareto k estimates are good (k < 0.7).
See help('pareto-k-diagnostic') for details.
print(loo_8)
Computed from 4000 by 31 log-likelihood matrix.
Estimate SE
elpd_loo -173.0 7.3
p_loo 4.2 0.9
looic 346.0 14.6
------
MCSE of elpd_loo is 0.1.
MCSE and ESS estimates assume independent draws (r_eff=1).
All Pareto k estimates are good (k < 0.7).
See help('pareto-k-diagnostic') for details.
Finally we compare the expected predictive performances.
loo_compare(
list(
`Distance tolerance and overshot fixed` = loo_6,
`Distance tolerance parameter and overshot fixed` = loo_7,
`Distance tolerance and overshot parameters` = loo_8
)
)
 elpd_diff se_diff
Distance tolerance and overshot parameters 0.0 0.0
Distance tolerance parameter and overshot fixed -0.4 0.3
Distance tolerance and overshot fixed -12.6 4.5
Adding the distance_tolerance parameter clearly improves the predictive performance, but adding the overshot parameter does not. However, the model with both distance_tolerance and overshot parameters has a posterior that is more informative about what can be learned about this aspect of the model.
10 Binomial with simplified error term
We now fit a new set of models where the error term epsilon is a scalar constant rather than a vector: epsilon is simply the probability of completely blowing your shot.
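A minimal sketch of this model structure (the exact parameterization in the constant-errors Stan files may differ; the assumed form here is that with probability epsilon the shot is blown regardless of geometry):

```python
def success_prob(p_angle, p_distance, epsilon):
    """Assumed constant-error form: the shot succeeds only if it is
    not blown (prob. 1 - epsilon) AND both the angle and the distance
    are good."""
    return (1 - epsilon) * p_angle * p_distance

# With a scalar epsilon, even a tap-in (p_angle = p_distance = 1)
# is capped at 1 - epsilon:
epsilon = 0.0013  # illustrative value of the order seen in model 9 below
print(success_prob(1.0, 1.0, epsilon))
```

Under this form a single epsilon caps all probabilities by the same factor, in contrast to the per-observation epsilon vector of the previous section.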
model_9 <- cmdstan_model(root("golf", "golf_angle_distance_binomial_with_constant_errors.stan"))
fit_9 <- model_9$sample(
data = golf_new_data,
refresh = 0
)
fit_9$summary(variables = c("sigma_angle", "sigma_distance", "epsilon"))
# A tibble: 3 × 10
variable mean median sd mad q5 q95 rhat ess_bulk ess_tail
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 sigma_angle 0.018 0.018 0.000041 0.000041 0.018 0.018 1.0 1932. 2338.
2 sigma_distance 0.080 0.080 0.00056 0.00055 0.079 0.081 1.0 2059. 2133.
3 epsilon 0.0013 0.0013 0.000080 0.000081 0.0011 0.0014 1.0 1186. 1341.
model_10 <- cmdstan_model(root("golf", "golf_angle_distance_binomial_with_constant_errors_2.stan"))
pth_10 <- model_10$pathfinder(
data = golf_new_data,
refresh = 0,
num_paths = 40,
max_lbfgs_iters = 100,
init = fit_9
)
fit_10 <- model_10$sample(
data = golf_new_data,
refresh = 0,
init = pth_10
)
fit_10$summary(variables = c("sigma_angle", "sigma_distance", "epsilon"))
# A tibble: 3 × 10
variable mean median sd mad q5 q95 rhat ess_bulk ess_tail
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 sigma_angle 0.015 0.015 0.000055 0.000055 0.015 0.015 1.0 2732. 2446.
2 sigma_distance 0.13 0.13 0.00067 0.00068 0.13 0.13 1.0 2505. 2566.
3 epsilon 0.00060 0.00060 0.000057 0.000055 0.00052 0.00070 1.0 2630. 2319.
model_11 <- cmdstan_model(root("golf", "golf_angle_distance_binomial_with_constant_errors_3.stan"))
pth_11 <- model_11$pathfinder(
data = golf_new_data,
refresh = 0,
num_paths = 40,
max_lbfgs_iters = 100,
init = fit_10
)
fit_11 <- model_11$sample(
data = golf_new_data,
refresh = 0,
init = pth_11
)
fit_11$summary(variables = c("sigma_angle", "sigma_distance", "distance_tolerance", "overshot", "epsilon"))
# A tibble: 5 × 10
variable mean median sd mad q5 q95 rhat ess_bulk ess_tail
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 sigma_angle 0.015 0.015 0.00013 1.4e-4 1.5e-2 1.5e-2 1.0 666. 1172.
2 sigma_distance 0.13 0.13 0.0086 8.5e-3 1.2e-1 1.5e-1 1.0 689. 1166.
3 distance_tolerance 4.4 4.4 0.33 3.3e-1 3.8e+0 4.9e+0 1.0 677. 1185.
4 overshot 1.1 1.1 0.10 1.0e-1 8.9e-1 1.2e+0 1.0 680. 1121.
5 epsilon 0.00060 0.00060 0.000056 5.7e-5 5.1e-4 6.9e-4 1.0 1398. 1644.
We graph the new data and the fitted model 11:
draws_11 <- fit_11$draws(format = "df")
sigma_angle_hat <- mean(draws_11$sigma_angle)
sigma_distance_hat <- mean(draws_11$sigma_distance)
distance_tolerance <- mean(draws_11$distance_tolerance)
overshot <- mean(draws_11$overshot)
par(mar = c(3, 3, 2, 1), mgp = c(1.7, .5, 0), tck = -.02)
plot(0, 0, xlim = c(0, 1.1 * max(golf_new$x)), ylim = c(0, 1.02),
xaxs = "i", yaxs = "i", pch = 20, bty = "l",
xlab = "Distance from hole (feet)",
ylab = "Probability of success",
main = "Checking model fit", type = "n")
x_grid <- seq(R-r, 1.1*max(golf_new$x), .01)
p_angle_grid <- 2*pnorm(asin((R-r)/x_grid) / sigma_angle_hat) - 1
p_distance_grid <- pnorm((distance_tolerance - overshot) / ((x_grid + overshot)*sigma_distance_hat)) -
pnorm(-overshot / ((x_grid + overshot)*sigma_distance_hat))
lines(c(0, R-r, x_grid), c(1, 1, p_angle_grid * p_distance_grid), col = "red")
points(golf_new$x, golf_new$y/golf_new$n, pch = 20, col = "red")
posterior_mean_residual <- mean(as_draws_rvars(draws_11)$residual)
par(mar = c(3, 3, 2, 1), mgp = c(1.7, .5, 0), tck = -.02)
plot(golf_new$x, posterior_mean_residual, xlim = c(0, 1.1 * max(golf_new$x)),
xaxs = "i", pch = 20, bty = "l",
xlab = "Distance from hole (feet)",
ylab = "y/n - fitted E(y/n)",
main = "Residuals from fitted model", type = "n")
abline(0, 0, col = "gray", lty = 2)
lines(golf_new$x, posterior_mean_residual)
Finally we compare the expected predictive performances.
gq_ll_c <- cmdstan_model(root("golf", "golf_log_lik_constant_errors.stan"))
loo_11 <- gq_ll_c$generate_quantities(
fit_11$draws(variables = "p"),
data = golf_new_data)$draws(variables = "log_lik") |> loo()
We compare the predictive performance of the varying-errors and constant-error models.
loo_compare(
list(
`Model 8 with varying error terms` = loo_8,
`Model 11 with constant error term` = loo_11
)
)
 elpd_diff se_diff
Model 8 with varying error terms 0.0 0.0
Model 11 with constant error term -30.5 17.4
The constant error term model is clearly worse in expected log predictive density, but the difference is small enough that it is difficult to see when plotting the predictions on top of the data.
If we examine the predictive performance difference at different distances, we see that the constant error model tends to be worse specifically at short distances.
pointwise_elpd_diff <- pointwise(loo_8, "elpd_loo") - pointwise(loo_11, "elpd_loo")
par(mar = c(3, 3, 2, 1), mgp = c(1.7, .5, 0), tck = -.02)
plot(golf_new$x, pointwise_elpd_diff, xlim = c(0, 1.1 * max(golf_new$x)),
xaxs = "i", pch = 20, bty = "l",
xlab = "Distance from hole (feet)",
ylab = "pointwise elpd_loo difference",
main = "Predictive performance difference Model 8 vs Model 11")
abline(0, 0, col = "gray", lty = 2)
References
Licenses
- Code © 2019–2026, Andrew Gelman and Aki Vehtari, licensed under BSD-3.
- Text © 2019–2026, Andrew Gelman and Aki Vehtari, licensed under CC-BY-NC 4.0.