
subsampling LOO estimates with diff-est-srs-wor start #496

Open · wants to merge 96 commits into base: master
Conversation

@avehtari (Collaborator) commented Apr 10, 2024

Work in progress.

Tested with

  • family="normal" and stats %in% c("elpd", "mlpd", "gmpd", "mse", "rmse")
  • family="bernoulli" and stats %in% c("acc", "pctcorr")
  • family="poisson" using trad and latent approaches
  • tests pass, except some plot tests fail with svglite: ... Graphics API version mismatch

Notes

  • n_loo matters only if validate_search = TRUE (with FALSE there is no speed advantage)
  • in cv_varsel, if nloo < n and the fast PSIS-LOO result is not yet available, the fast PSIS-LOO result is computed
  • in cv_varsel, if nloo < n, the fast PSIS-LOO result is stored in slot $summaries_fast
  • the subsampling indices are stored in slot $loo_inds
  • the actual subsampling estimation happens in summary_funs.R get_stat()
  • removed some NA checking; need to recheck whether it has to be put back
  • improved quantiles for "mse" when the value is close to 0 (can't get a negative lq)
  • "auc" is not supported (complicated)
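For orientation, the `diff-est-srs-wor` estimator named in the PR title can be sketched as follows. This is a hypothetical stand-alone version, not projpred's actual code; the names `elpd_approx`, `elpd_exact`, and `inds` are illustrative:

```r
# Difference estimator under simple random sampling without replacement
# (SRS-WOR), in the spirit of Magnusson et al. (2020). Hypothetical sketch:
# `elpd_approx` holds fast (PSIS-LOO) approximations for all n observations,
# `elpd_exact` holds exact values computed only for the subsample `inds`.
diff_est_srs_wor <- function(elpd_approx, elpd_exact, inds) {
  n <- length(elpd_approx)
  m <- length(inds)
  d <- elpd_exact - elpd_approx[inds]    # pointwise differences on the subsample
  est <- sum(elpd_approx) + n * mean(d)  # difference estimator of the total
  # variance with finite-population correction (zero when m == n)
  v <- n^2 * (1 - m / n) * var(d) / m
  list(est = est, se = sqrt(v))
}
```

When the exact values agree with the approximations, the estimate reduces to the plain sum of approximations and the subsampling variance vanishes.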

Next

  • add support for incrementally increasing nloo?

tagging @n-kall

@avehtari avehtari requested a review from fweber144 April 12, 2024 13:03
@avehtari (Collaborator, Author) commented Apr 15, 2024

I wanted to use R2, and as I had rewritten the summary stats anyway, I added R2 and made mse, rmse, and R2 all use only a normal approximation, with as much shared computation as possible.

With the added R2 support, this PR will also close #483.

@n-kall (Collaborator) commented Apr 16, 2024

I can take a look

@n-kall n-kall self-assigned this Apr 16, 2024
@avehtari avehtari requested a review from n-kall April 16, 2024 13:25
@avehtari avehtari assigned avehtari and unassigned n-kall Apr 16, 2024
Review threads (resolved): R/cv_varsel.R ×4 (outdated), R/glmfun.R, R/misc.R
@fweber144 (Collaborator) commented Apr 21, 2024

I have added some comments, but I'm not done with the review yet.

Besides, I think the documentation needs to be updated (at least re-roxygenized, but progressr should also be mentioned), perhaps the vignettes as well, and I haven't run R CMD check (including the unit tests) yet. The NEWS file would also need to be updated.

@fweber144 fweber144 mentioned this pull request Apr 21, 2024
@fweber144 (Collaborator)
Commit a7458b9 reverts the changes with respect to the new "mixed deltas" variant of plot.vsel(), the new column behavior of summary.vsel(), and the omission of option "best" of argument baseline (the latter is now re-introduced, but disallowed for subsampled LOO-CV).

Now I still need to do points 1 and 3 from #496 (comment) and, as mentioned in #496 (comment), some other things are still missing (documentation, vignettes, R CMD check (including unit tests), NEWS file), apart from the open discussions here.

@fweber144 (Collaborator)

Branch mixed_deltas_plot holds the state before reverting those things. I'll try to merge any future commits that are unrelated to those features into mixed_deltas_plot.

@avehtari (Collaborator, Author)
> A question related to your comment here: Is it mentioned in Magnusson et al. (2020) that the variances from eq. (8) and eq. (9) need to be summed? I couldn't find such a statement at first glance, although it would make sense.

It is not mentioned there, but it is required to get the appropriate variance for the plots. I first followed the paper, but the variance could then be really small; after thinking it through, I realized that, at least in our use case, we need to add the two together.
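As a minimal sketch of what "adding them together" amounts to (the helper name is hypothetical; the actual computation lives in summary_funs.R):

```r
# The reported standard error combines the subsampling variance (eq. 8 of
# Magnusson et al., 2020) and the sampling variance over observations
# (eq. 9) additively before taking the square root.
combined_se <- function(var_subsampling, var_sampling) {
  sqrt(var_subsampling + var_sampling)
}
```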

@avehtari (Collaborator, Author)
Btw, I had kept loo_inds for future use. Eventually it would be useful to be able to run subsampling LOO first with e.g. nloo = 50, and then add another 50 observations if the variances are too large.
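The incremental idea could look roughly like this (purely a sketch; `extend_loo_inds` is a hypothetical helper, not part of this PR):

```r
# Extend an existing SRS-WOR subsample by n_add further indices drawn
# uniformly from the observations not yet in the subsample, so that the
# union is again a simple random sample without replacement.
extend_loo_inds <- function(loo_inds, n_full, n_add) {
  remaining <- setdiff(seq_len(n_full), loo_inds)
  c(loo_inds, sample(remaining, min(n_add, length(remaining))))
}
```

The exact values already computed for the old indices could then be reused, and only the new indices would require refits.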

@fweber144 (Collaborator)
> Btw, I had kept loo_inds for future use. Eventually it would be useful to be able to run subsampling LOO first with e.g. nloo = 50, and then add another 50 observations if the variances are too large.

Makes sense. Did I break something with respect to loo_inds? If yes, that was not intentional.

@avehtari (Collaborator, Author)
> Makes sense. Did I break something with respect to loo_inds? If yes, that was not intentional.

It's not used, but I thought it's better to keep it there until incremental subsampling is supported, and then decide whether to keep it, depending on how the incremental subsampling is implemented.

@fweber144 (Collaborator)
> It's not used

Yes, it wasn't, but in commit 19948e3, I changed that (note that there is also a fixup commit: 519dac2). I think using loo_inds in get_stat() is safer than relying on NAs (remember what a hard time we had finding out under which circumstances NAs can occur).

subsampled LOO (`n_loo < n_full`) and everything else (`n_loo == n_full`)
does not change from definition of `var_mse_e` to use of `var_mse_e`, the extra
definition of `var_mse_e` can be avoided
`is.null(summaries_baseline)`); one reason is that I'm not sure whether it was
supposed to read `mu_baseline <- y` in that case (instead of `mu_baseline <- 0`)
```r
# simple transformation of mse
value <- sqrt(mse_e) - ifelse(is.null(summaries_baseline), 0, sqrt(mse_b))
# the first-order Taylor approximation of the variance
value_se <- sqrt(value_se^2 / mse_e / 4)
```

If is.null(summaries_baseline) (i.e., deltas = FALSE), this line should be correct. But if !is.null(summaries_baseline) (i.e., deltas = TRUE), isn't mse_b also a random variable that needs to be taken into account? That would require a bivariate (instead of scalar) delta method (which is not a problem, because the delta method has a straightforward multivariate extension). What I mean is that the function for which the Taylor series approximation is made would then be $f(x_1, x_2) = \sqrt{x_1} - \sqrt{x_2}$ with gradient $\nabla f(x_1, x_2) = (\frac{1}{2 \sqrt{x_1}}, -\frac{1}{2 \sqrt{x_2}})^T$.
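A sketch of the suggested bivariate delta method (hypothetical helper; `var_e`, `var_b`, and `cov_eb` stand for the variances of `mse_e` and `mse_b` and their covariance):

```r
# Delta method for f(x1, x2) = sqrt(x1) - sqrt(x2):
# Var(f) ~= g' Sigma g with gradient g = (1/(2*sqrt(x1)), -1/(2*sqrt(x2)))'.
se_rmse_diff <- function(mse_e, mse_b, var_e, var_b, cov_eb) {
  g <- c(1 / (2 * sqrt(mse_e)), -1 / (2 * sqrt(mse_b)))
  Sigma <- matrix(c(var_e, cov_eb, cov_eb, var_b), nrow = 2)
  sqrt(drop(t(g) %*% Sigma %*% g))
}
```

With `var_b = 0` and `cov_eb = 0`, this reduces to the scalar formula `sqrt(var_e / mse_e / 4)` used in the quoted line.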

```r
value.se <- weighted.sd((mu - y)^2 - (mu.bs - y)^2, wcv,
                        na.rm = TRUE) /
  sqrt(n_notna)
```

```r
# Use normal approximation for mse and delta method for rmse and R2
```

Is this supposed to refer to the standard error estimation method or the CI method? The first part ("normal approximation") refers to a CI method, but the second part ("delta method") to a standard error estimation method.

Comment on lines +502 to +503
```r
# Compute mean and variance in log scale by matching the variance of a
# log-normal approximation
```

Is it common to assume a log-normal distribution as the sampling distribution of the MSE and RMSE estimators? I haven't seen that yet (I think), but it might be perfectly fine (motivated by the central limit theorem, I guess).

```r
# store for later calculations
mse_e <- value
if (!is.null(summaries_baseline)) {
  # delta=TRUE, variance of difference of two normally distributed
```

There is something missing at the end; perhaps "random variables"?

```r
                  ((mu_baseline - y)^2 - mse_b))[loo_inds],
                  y_idx = loo_inds,
                  w = wobs)
cov_mse_e_b <- srs_diffe$y_hat / n_full^2
```

Just for my understanding: this procedure for estimating the covariance between mse_e and mse_b assumes that the summands within mse_e and mse_b coming from different observations are uncorrelated, right? At first, I thought this was violated here because mu, mu_baseline, and summaries_fast$mu are model-based, so there is potential for cross-observation dependencies. But then I realized that mu, mu_baseline, and summaries_fast$mu are all based on the leave-one-out principle; is this the reason why we can assume a cross-observation correlation of zero here?

@avehtari (Collaborator, Author)
When I changed several bootstraps to analytic approximations and improved other approximations, I thought the math I was writing in the code was so trivial that I didn't write down all the derivations and assumptions separately. Now I see I should have done that, as it also takes me a bit of time to re-check any of these when you ask a question, so they are not as obvious as I thought. If you like, I can some day write up the equations and assumptions for easier checking. Until then: based on the tests, every approximation I wrote is at least as accurate as the earlier bootstrap, but much faster.

```r
mse_y <- mean(wobs * (mean(y) - y)^2)
value <- 1 - mse_e / mse_y - ifelse(is.null(summaries_baseline), 0, 1 - mse_b / mse_y)
# the first-order Taylor approximation of the variance
var_mse_y <- .weighted_sd((mean(y) - y)^2, wobs)^2 / n_full
```

For `mean(y)`, don't we need to take `wobs` into account? I think this is similar to the line `var_mse_b <- .weighted_sd((mu_baseline - y)^2, wobs)^2 / n_full`, where the parameter estimates from which `mu_baseline` is computed also take `wobs` into account.
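The suggested change could be sketched like this (hypothetical helper; whether the outer mean should also be weighted is a separate question):

```r
# Version of mse_y whose center takes the observation weights into account,
# replacing the unweighted mean(y).
weighted_mse_y <- function(y, wobs) {
  mu_y <- weighted.mean(y, w = wobs)
  mean(wobs * (mu_y - y)^2)
}
```

With all weights equal to 1, this coincides with the current `mean(wobs * (mean(y) - y)^2)`.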


This also concerns several other occurrences of mean(y) here in get_stat().

@fweber144 fweber144 mentioned this pull request Sep 2, 2024
Comment on lines +387 to +393
```r
if (!is.null(summaries_baseline)) {
  # delta=TRUE
  mse_e <- mse_e - mse_b
}
value_se <- sqrt((value_se^2 -
                  2 * mse_e / mse_y * cov_mse_e_y +
                  (mse_e / mse_y)^2 * var_mse_y) / mse_y^2)
```

If my understanding from stan-dev/loo#205 (comment) is correct, then I think we would need a trivariate delta method in the !is.null(summaries_baseline) case (because mse_b comes in, too). I haven't checked whether such a trivariate delta method would give the same formula as used here. Have you checked this?

@fweber144 (Collaborator)
> The tests run if I change `nobsv <- 41L` in setup.R to `nobsv <- 43L`, but then of course all the results change. But this shows it's a problem with the small nobsv and random data.

For me, `nobsv <- 43L` does not work (it runs into some error, similarly to how `nobsv <- 41L` did for you). However, `nobsv <- 39L` works for me. Does it work for you as well (on master)? Then I would pick that for the time being. Ultimately, it would be desirable to completely revise the tests: currently, we mainly test the "big" user-level functions, with the tests for the "small" internal functions being quite short or not existing at all. The principle should rather be to test the underlying functions extensively, because then it is easier to keep the tests for the "big" (and hence slow) user-level functions short.

@avehtari (Collaborator, Author) commented Sep 6, 2024

With `nobsv <- 39L` I get `[ FAIL 0 | WARN 902 | SKIP 2 | PASS 60545 ]`

@fweber144 (Collaborator)
> With `nobsv <- 39L` I get `[ FAIL 0 | WARN 902 | SKIP 2 | PASS 60545 ]`

That sounds good. The warnings probably arise from the first creation of the snapshots. If you are running the tests via R CMD check, then from the second run on, you can avoid these warnings by (locally) removing the entry ^tests/testthat/bfits$ from the .Rbuildignore file. For the two skips, I don't know where they are coming from, but they are probably due to a suggested package that is not installed.

Since this solution seems to be working for you, I will push a commit to master (and merge it here) that changes nobsv to 39. As mentioned above, this is only a quick workaround.

This fixes commit 0d73c8e. However, before commit 0d73c8e, `is.null(mu_baseline)` should never have occurred because if `summaries_baseline` was `NULL`, then `mu_baseline` was set to `0` (and if `summaries_baseline` was not `NULL`, then `mu_baseline` was set to `summaries_baseline$mu`, which should not be `NULL` either). Hence, this fixup does not only fix commit 0d73c8e, but also the incorrect behavior which existed before it.
```r
# log-normal approximation
# https://en.wikipedia.org/wiki/Log-normal_distribution#Arithmetic_moments
mul <- log(value^2 / sqrt(value_se^2 + value^2))
varl <- log(1 + value_se^2 / value^2)
```
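Putting the moment matching together with the quantile step, a self-contained sketch could look as follows (assuming the interval is read off from `qlnorm()`; `lognormal_ci` is a hypothetical helper, and it already uses `log1p()` as suggested in the review):

```r
# Match the first two moments of a log-normal to (value, value_se), then
# read off central interval bounds. The lower bound cannot go below zero,
# which is the motivation given in the PR notes for "mse".
lognormal_ci <- function(value, value_se, alpha = 0.05) {
  mul  <- log(value^2 / sqrt(value_se^2 + value^2))
  varl <- log1p(value_se^2 / value^2)  # log1p for numerical stability
  qlnorm(c(alpha / 2, 1 - alpha / 2), meanlog = mul, sdlog = sqrt(varl))
}
```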

Would it make sense to use `log1p()` here (for numerical stability)?

3 participants