Package 'oosse'

Title: Out-of-Sample R² with Standard Error Estimation
Description: Estimates out-of-sample R² through bootstrap or cross-validation as a measure of predictive performance. In addition, a standard error for this point estimate is provided, and confidence intervals are constructed.
Authors: Stijn Hawinkel [cre, aut]
Maintainer: Stijn Hawinkel <[email protected]>
License: GPL-2
Version: 1.0.11
Built: 2024-09-04 05:10:18 UTC
Source: https://github.com/sthawinke/oosse


The .632 bootstrap estimation of the MSE

Description

The .632 bootstrap estimation of the MSE

Usage

boot632(y, x, id, fitFun, predFun)

Arguments

y

The vector of outcome values

x

The matrix of predictors

id

the sample indices resampled with replacement

fitFun

The function for fitting the prediction model

predFun

The function for evaluating the prediction model

Details

The implementation follows Efron and Tibshirani (1997).
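As a rough illustration of the estimator described by Efron and Tibshirani (1997), the sketch below combines the apparent (resubstitution) error with the out-of-bag error using weights 0.368 and 0.632. It uses base R only and is not the package's code: the helper name sketch632, the linear-model fit and prediction functions, and the simulated data are assumptions made for this example.

```r
# Hypothetical helpers (not part of oosse): linear model fit and prediction
fitFunLM <- function(y, x) lm.fit(y = y, x = cbind(1, x))
predFunLM <- function(mod, x) drop(cbind(1, x) %*% mod$coefficients)

# Sketch of a .632 bootstrap MSE estimate
sketch632 <- function(y, x, fitFun, predFun, nBootstraps = 100) {
  n <- length(y)
  # Apparent error: fit and evaluate on the same, full data
  appErr <- mean((y - predFun(fitFun(y, x), x))^2)
  errSum <- numeric(n)    # summed out-of-bag squared errors per observation
  oobCount <- numeric(n)  # number of times each observation was out-of-bag
  for (b in seq_len(nBootstraps)) {
    id <- sample(n, replace = TRUE)   # indices resampled with replacement
    oob <- setdiff(seq_len(n), id)    # observations not drawn
    if (length(oob) == 0) next
    mod <- fitFun(y[id], x[id, , drop = FALSE])
    errSum[oob] <- errSum[oob] +
      (y[oob] - predFun(mod, x[oob, , drop = FALSE]))^2
    oobCount[oob] <- oobCount[oob] + 1
  }
  keep <- oobCount > 0
  oobErr <- mean(errSum[keep] / oobCount[keep])
  c(apparent = appErr, oob = oobErr,
    mse632 = 0.368 * appErr + 0.632 * oobErr)
}

set.seed(1)
x <- matrix(rnorm(200), 50, 4)
y <- x[, 1] + rnorm(50)
est <- sketch632(y, x, fitFunLM, predFunLM)
est
```

Because the weights sum to one, the .632 estimate always lies between the apparent error and the out-of-bag error.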

Value

The MSE estimate

References

Efron B, Tibshirani R (1997). “Improvements on cross-validation: The .632+ bootstrap method.” J. Am. Stat. Assoc., 92(438), 548 - 560.

See Also

estMSE bootOob


Repeated .632 bootstrap

Description

Repeated .632 bootstrap

Usage

boot632multiple(nBootstraps, y, ...)

Arguments

nBootstraps

The number of .632 bootstraps

y

The vector of outcome values

...

Passed on to boot632

Value

The estimated MSE


The out-of-bag (OOB) bootstrap (a smoothed version of leave-one-out CV)

Description

The out-of-bag (OOB) bootstrap (a smoothed version of leave-one-out CV)

Usage

bootOob(y, x, id, fitFun, predFun)

Arguments

y

The vector of outcome values

x

The matrix of predictors

id

sample indices sampled with replacement

fitFun

The function for fitting the prediction model

predFun

The function for evaluating the prediction model

Details

The implementation follows Efron and Tibshirani (1997).

Value

A matrix of errors and inclusion times
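To make the idea of "errors and inclusion times" concrete, the following base-R sketch processes a single bootstrap replicate: the model is fitted on the resampled indices and evaluated on the observations that were not drawn. The function name sketchOob and the two-column layout of the result are assumptions for illustration, not the package's documented return format.

```r
# Hypothetical helpers (not part of oosse): linear model fit and prediction
fitFunLM <- function(y, x) lm.fit(y = y, x = cbind(1, x))
predFunLM <- function(mod, x) drop(cbind(1, x) %*% mod$coefficients)

# Sketch of one out-of-bag replicate
sketchOob <- function(y, x, id, fitFun, predFun) {
  n <- length(y)
  inclTimes <- tabulate(id, nbins = n)  # how often each observation was drawn
  oob <- which(inclTimes == 0)          # observations left out of this sample
  mod <- fitFun(y[id], x[id, , drop = FALSE])
  sqErr <- rep(NA_real_, n)             # NA for in-bag observations
  sqErr[oob] <- (y[oob] - predFun(mod, x[oob, , drop = FALSE]))^2
  cbind(sqErr = sqErr, inclTimes = inclTimes)
}

set.seed(2)
n <- 40
x <- matrix(rnorm(n * 3), n, 3)
y <- x[, 1] - x[, 2] + rnorm(n)
id <- sample(n, replace = TRUE)
res <- sketchOob(y, x, id, fitFunLM, predFunLM)
head(res)
```

Averaging the non-missing squared errors per observation over many such replicates gives an out-of-bag error estimate.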

References

Efron B, Tibshirani R (1997). “Improvements on cross-validation: The .632+ bootstrap method.” J. Am. Stat. Assoc., 92(438), 548 - 560.

See Also

estMSE boot632


Gene expression and phenotypes of Brassica napus (rapeseed) plants

Description

RNA-sequencing data of genetically identical Brassica napus plants in autumn, with 5 phenotypes next spring, as published by De Meyer S, Cruz DF, De Swaef T, Lootens P, Block JD, Bird K, Sprenger H, Van de Voorde M, Hawinkel S, Van Hautegem T, Inzé D, Nelissen H, Roldán-Ruiz I, Maere S (2022). “Predicting yield traits of individual field-grown Brassica napus plants from rosette-stage leaf gene expression.” bioRxiv. doi:10.1101/2022.10.21.513275, https://www.biorxiv.org/content/early/2022/10/23/2022.10.21.513275.full.pdf.

Usage

Brassica

Format

A list with two components Expr and Pheno

Expr

Matrix with Rlog values of the 1000 most highly expressed genes

Pheno

Data frame with 5 phenotypes and x and y coordinates of the plants in the field

Source

doi:10.1101/2022.10.21.513275

References

(De Meyer et al. 2022)


Calculate a confidence interval for R², MSE and MST

Description

Calculate a confidence interval for R², MSE and MST

Usage

buildConfInt(oosseObj, what = c("R2", "MSE", "MST"), conf = 0.95)

Arguments

oosseObj

The result of the R2oosse call

what

The quantity for which the confidence interval is constructed: R² (default), MSE or MST

conf

the confidence level required

Details

The upper bound of the interval is truncated at 1 for the R² and the lower bound at 0 for the MSE

The confidence intervals for R² and the MSE are based on standard errors and normal approximations. The confidence interval for the MST is based on the chi-squared distribution as in equation (16) of (Harding et al. 2014), but with inflation by a factor (n+1)/n. All quantities are out-of-sample.
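The two kinds of interval described above can be sketched in base R as follows. This is a sketch under stated assumptions rather than the package's implementation: the helper names normalCI and mstCI are hypothetical, the numeric inputs are made up, and the MST interval assumes the textbook chi-squared interval for a variance with n - 1 degrees of freedom, multiplied by (n+1)/n.

```r
# Hypothetical helper: normal-approximation interval with optional truncation
normalCI <- function(est, se, conf = 0.95, lower = -Inf, upper = Inf) {
  z <- qnorm(1 - (1 - conf) / 2)
  ci <- est + c(-1, 1) * z * se
  c(lower = max(ci[1], lower), upper = min(ci[2], upper))
}

# Hypothetical helper: chi-squared interval for a variance,
# inflated by (n + 1) / n to the out-of-sample scale
mstCI <- function(margVar, n, conf = 0.95) {
  alpha <- 1 - conf
  ci <- (n - 1) * margVar / qchisq(c(1 - alpha / 2, alpha / 2), df = n - 1)
  setNames(ci * (n + 1) / n, c("lower", "upper"))
}

ciR2 <- normalCI(est = 0.9, se = 0.08, upper = 1)   # upper bound truncated at 1
ciMSE <- normalCI(est = 0.5, se = 0.4, lower = 0)   # lower bound truncated at 0
ciMST <- mstCI(margVar = 4, n = 50)
```

Unlike the normal-approximation intervals, the chi-squared interval is not symmetric around the point estimate.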

Value

A vector of length 2 with lower and upper bound of the confidence interval

References

Harding B, Tremblay C, Cousineau D (2014). “Standard errors: A review and evaluation of standard error estimators using Monte Carlo simulations.” The Quantitative Methods for Psychology, 10(2), 107 - 123.

See Also

R2oosse

Examples

data(Brassica)
# Fit a linear model with intercept
fitFunLM = function(y, x) {lm.fit(y = y, x = cbind(1, x))}
# Predict from the fitted coefficients
predFunLM = function(mod, x) {cbind(1, x) %*% mod$coefficients}
R2lm = R2oosse(y = Brassica$Pheno$Leaf_8_width, x = Brassica$Expr[, 1:10],
    fitFun = fitFunLM, predFun = predFunLM, nFolds = 10)
buildConfInt(R2lm)
buildConfInt(R2lm, what = "MSE")
buildConfInt(R2lm, what = "MST")

Check whether supplied prediction function meets the requirements

Description

Check whether supplied prediction function meets the requirements

Usage

checkFitFun(fitFun, reqArgs = c("y", "x"))

Arguments

fitFun

The prediction function, or its name as character string

reqArgs

The vector of required arguments

Value

Throws an error when the requirements are not met, otherwise returns the function
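A check of this kind can be sketched in base R by comparing the required argument names with the formal arguments of the supplied function. The helper sketchCheckFun below is hypothetical and only illustrates the idea; the actual checks performed by checkFitFun may differ.

```r
# Hypothetical helper: verify that a function has the required arguments
sketchCheckFun <- function(fun, reqArgs = c("y", "x")) {
  fun <- match.fun(fun)  # accepts a function or its name as a character string
  missingArgs <- setdiff(reqArgs, names(formals(fun)))
  if (length(missingArgs) > 0) {
    stop("Function lacks required argument(s): ",
         paste(missingArgs, collapse = ", "))
  }
  fun
}

goodFit <- function(y, x) lm.fit(y = y, x = cbind(1, x))
badFit <- function(response, x) NULL  # lacks the argument "y"

checked <- sketchCheckFun(goodFit)    # returns goodFit unchanged
failed <- inherits(try(sketchCheckFun(badFit), silent = TRUE), "try-error")
```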


Estimate correlation between MSE and MST estimators

Description

Estimate correlation between MSE and MST estimators

Usage

estCorMSEMST(
  y,
  x,
  fitFun,
  predFun,
  methodMSE,
  methodCor,
  nBootstrapsCor,
  nFolds,
  nBootstraps
)

Arguments

y

The vector of outcome values

x

The matrix of predictors

fitFun

The function for fitting the prediction model

predFun

The function for evaluating the prediction model

methodMSE

The method to estimate the MSE, either "CV" for cross-validation or "bootstrap" for .632 bootstrap

methodCor

The method to estimate the correlation between MSE and MST estimators, either "nonparametric" or "jackknife"

nBootstrapsCor

The number of bootstraps to estimate the correlation

nFolds

The number of outer folds for cross-validation

nBootstraps

The number of .632 bootstraps

Value

the estimated correlation


Estimate MSE and its standard error

Description

Estimate MSE and its standard error

Usage

estMSE(
  y,
  x,
  fitFun,
  predFun,
  methodMSE,
  nFolds,
  nInnerFolds,
  cvReps,
  nBootstraps
)

Arguments

y

The vector of outcome values

x

The matrix of predictors

fitFun

The function for fitting the prediction model

predFun

The function for evaluating the prediction model

methodMSE

The method to estimate the MSE, either "CV" for cross-validation or "bootstrap" for .632 bootstrap

nFolds

The number of outer folds for cross-validation

nInnerFolds

The number of inner cross-validation folds

cvReps

The number of repeats for the cross-validation

nBootstraps

The number of .632 bootstraps

Details

The nested cross-validation scheme follows Bates et al. (2023); the .632 bootstrap is implemented as in Efron and Tibshirani (1997).

Value

A vector with MSE estimate and its standard error

References

Bates S, Hastie T, Tibshirani R (2023). “Cross-validation: What does it estimate and how well does it do it?” J. Am. Stat. Assoc., 118(ja), 1 - 22. doi:10.1080/01621459.2023.2197686, https://doi.org/10.1080/01621459.2023.2197686.

Efron B, Tibshirani R (1997). “Improvements on cross-validation: The .632+ bootstrap method.” J. Am. Stat. Assoc., 92(438), 548 - 560.


Format seconds into human readable format

Description

Format seconds into human readable format

Usage

formatSeconds(seconds, digits = 2)

Arguments

seconds

The number of seconds to be formatted

digits

the number of digits for rounding

Value

A character vector expressing time in human readable format


Calculate standard error on MSE from nested CV results

Description

Calculate standard error on MSE from nested CV results

Usage

getSEsNested(cvSplitReps, nOuterFolds, n)

Arguments

cvSplitReps

The list of outer and inner CV results

nOuterFolds

The number of outer folds

n

The sample size

Details

The standard error of the MSE is calculated as proposed by Bates et al. (2023).

Value

The estimate of the MSE and its standard error

References

Bates S, Hastie T, Tibshirani R (2023). “Cross-validation: What does it estimate and how well does it do it?” J. Am. Stat. Assoc., 118(ja), 1 - 22. doi:10.1080/01621459.2023.2197686, https://doi.org/10.1080/01621459.2023.2197686.

See Also

estMSE


Helper function to check if matrix is positive definite

Description

Helper function to check if matrix is positive definite

Usage

isPD(mat, tol = 1e-06)

Arguments

mat

The matrix

tol

The tolerance

Value

A boolean indicating positive definiteness
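One common way to test positive definiteness is to require symmetry and check that all eigenvalues exceed the tolerance. The sketch below follows that approach; the helper name sketchIsPD and the way tol is applied are assumptions, and isPD itself may use a different criterion.

```r
# Hypothetical helper: eigenvalue-based positive definiteness check
sketchIsPD <- function(mat, tol = 1e-06) {
  if (!isSymmetric(mat)) return(FALSE)
  eig <- eigen(mat, symmetric = TRUE, only.values = TRUE)$values
  all(eig > tol)
}

pdIdentity <- sketchIsPD(diag(2))                         # eigenvalues 1 and 1
pdIndefinite <- sketchIsPD(matrix(c(1, 2, 2, 1), 2, 2))   # eigenvalues 3 and -1
pdSingular <- sketchIsPD(matrix(1, 2, 2))                 # eigenvalues 2 and 0
```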


Process the out-of-bag bootstraps to obtain standard errors following Efron and Tibshirani (1997)

Description

Process the out-of-bag bootstraps to obtain standard errors following Efron and Tibshirani (1997)

Usage

processOob(x)

Arguments

x

The list with out-of-bag bootstrap results

Value

The out-of-bag MSE estimate and its standard error


Estimate out-of-sample R² and its standard error

Description

Estimate out-of-sample R² and its standard error

Usage

R2oosse(
  y,
  x,
  fitFun,
  predFun,
  methodMSE = c("CV", "bootstrap"),
  methodCor = c("nonparametric", "jackknife"),
  printTimeEstimate = TRUE,
  nFolds = 10L,
  nInnerFolds = nFolds - 1L,
  cvReps = 200L,
  nBootstraps = 200L,
  nBootstrapsCor = 50L,
  ...
)

Arguments

y

The vector of outcome values

x

The matrix of predictors

fitFun

The function for fitting the prediction model

predFun

The function for evaluating the prediction model

methodMSE

The method to estimate the MSE, either "CV" for cross-validation or "bootstrap" for .632 bootstrap

methodCor

The method to estimate the correlation between MSE and MST estimators, either "nonparametric" or "jackknife"

printTimeEstimate

A boolean, should an estimate of the running time be printed?

nFolds

The number of outer folds for cross-validation

nInnerFolds

The number of inner cross-validation folds

cvReps

The number of repeats for the cross-validation

nBootstraps

The number of .632 bootstraps

nBootstrapsCor

The number of bootstraps to estimate the correlation

...

Passed on to fitFun and predFun

Details

Implements the calculation of the R² and its standard error by Hawinkel et al. (2023). Multithreading is used as provided by the BiocParallel or doParallel packages. A rough estimate of the expected computation time is printed when printTimeEstimate is true, but this is purely indicative. The options to estimate the mean squared error (MSE) are cross-validation (Bates et al. 2023) or the .632 bootstrap (Efron and Tibshirani 1997).

Value

A list with components

R2

Estimate of the R² with standard error

MSE

Estimate of the MSE with standard error

MST

Estimate of the MST with standard error

corMSEMST

Estimated correlation between MSE and MST estimators

params

List of parameters used

fullModel

The model trained on the entire dataset using fitFun

n

The sample size of the training data

References

Bates S, Hastie T, Tibshirani R (2023). “Cross-validation: What does it estimate and how well does it do it?” J. Am. Stat. Assoc., 118(ja), 1 - 22. doi:10.1080/01621459.2023.2197686, https://doi.org/10.1080/01621459.2023.2197686.

Efron B, Tibshirani R (1997). “Improvements on cross-validation: The .632+ bootstrap method.” J. Am. Stat. Assoc., 92(438), 548 - 560.

Hawinkel S, Waegeman W, Maere S (2023). “Out-of-sample R2: Estimation and inference.” Am. Stat., 1 - 16. doi:10.1080/00031305.2023.2216252, https://doi.org/10.1080/00031305.2023.2216252.

See Also

buildConfInt

Examples

data(Brassica)
# Linear model
fitFunLM = function(y, x) {lm.fit(y = y, x = cbind(1, x))}
predFunLM = function(mod, x) {cbind(1, x) %*% mod$coefficients}
y = Brassica$Pheno$Leaf_8_width
R2lm = R2oosse(y = y, x = Brassica$Expr[, 1:10],
    fitFun = fitFunLM, predFun = predFunLM, nFolds = 10)

Calculate out-of-sample R² and its standard error based on MSE estimates

Description

Calculate out-of-sample R² and its standard error based on MSE estimates

Usage

RsquaredSE(MSE, margVar, SEMSE, n, corMSEMST)

Arguments

MSE

An estimate of the mean squared error (MSE)

margVar

The marginal variance of the outcome, not scaled by (n+1)/n

SEMSE

The standard error on the MSE estimate

n

the sample size of the training data

corMSEMST

The correlation between MSE and marginal variance estimates

Details

This function is exported so that users can supply their own estimates of the MSE, its standard error, and the correlation between the MSE and MST estimators. The marginal variance is scaled internally by (n+1)/n to obtain the out-of-sample MST, so the user should supply it unscaled.
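The calculation can be sketched as follows: the marginal variance is scaled to the out-of-sample MST, R² is one minus the ratio MSE/MST, and a standard error follows from a first-order (delta method) approximation for a ratio of correlated estimators. The helper sketchR2SE is hypothetical, and its default standard error of the MST assumes normally distributed outcomes; the package may use a different expression, so the result need not match RsquaredSE exactly.

```r
# Hypothetical helper: R² and a delta-method standard error
sketchR2SE <- function(MSE, margVar, SEMSE, n, corMSEMST,
                       SEMST = sqrt(2 / (n - 1)) * margVar * (n + 1) / n) {
  MST <- margVar * (n + 1) / n  # scale to the out-of-sample MST
  R2 <- 1 - MSE / MST
  # Delta method for Var(MSE / MST), with Cov = cor * SEMSE * SEMST
  ratioVar <- (MSE / MST)^2 * (SEMSE^2 / MSE^2 + SEMST^2 / MST^2 -
    2 * corMSEMST * SEMSE * SEMST / (MSE * MST))
  c(R2 = R2, R2SE = sqrt(ratioVar))
}

res <- sketchR2SE(MSE = 3, margVar = 4, SEMSE = 0.4, n = 50, corMSEMST = 0.75)
res
```

With these inputs the MST equals 4 * 51/50 = 4.08, so the point estimate is 1 - 3/4.08, roughly 0.265. A positive correlation between the MSE and MST estimators reduces the standard error of R², because errors in the numerator and denominator partly cancel.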

Value

A vector with the R² and standard error estimates

References

Hawinkel S, Waegeman W, Maere S (2023). “Out-of-sample R2: Estimation and inference.” Am. Stat., 1 - 16. doi:10.1080/00031305.2023.2216252, https://doi.org/10.1080/00031305.2023.2216252.

See Also

R2oosse

Examples

#The out-of-sample R² calculated using externally provided estimates
RsquaredSE(MSE = 3, margVar = 4, SEMSE = 0.4, n = 50, corMSEMST = 0.75)

Perform simple CV, and return the MSE estimate

Description

Perform simple CV, and return the MSE estimate

Usage

simpleCV(y, x, fitFun, predFun, nFolds)

Arguments

y

The vector of outcome values

x

The matrix of predictors

fitFun

The function for fitting the prediction model

predFun

The function for evaluating the prediction model

nFolds

The number of outer folds for cross-validation

Value

The MSE estimate
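A plain K-fold cross-validation estimate of the MSE can be sketched in base R as below. The function sketchSimpleCV, the linear-model helpers and the simulated data are assumptions for illustration; the random fold assignment in particular is one possible choice and need not match what simpleCV does.

```r
# Hypothetical helpers (not part of oosse): linear model fit and prediction
fitFunLM <- function(y, x) lm.fit(y = y, x = cbind(1, x))
predFunLM <- function(mod, x) drop(cbind(1, x) %*% mod$coefficients)

# Sketch of K-fold cross-validation for the MSE
sketchSimpleCV <- function(y, x, fitFun, predFun, nFolds) {
  n <- length(y)
  # Randomly assign each observation to one of nFolds folds
  folds <- sample(rep(seq_len(nFolds), length.out = n))
  sqErr <- numeric(n)
  for (k in seq_len(nFolds)) {
    test <- folds == k
    # Fit on all other folds, evaluate on the held-out fold
    mod <- fitFun(y[!test], x[!test, , drop = FALSE])
    sqErr[test] <- (y[test] - predFun(mod, x[test, , drop = FALSE]))^2
  }
  mean(sqErr)
}

set.seed(3)
x <- matrix(rnorm(300), 100, 3)
y <- x[, 1] + rnorm(100)
mseCV <- sketchSimpleCV(y, x, fitFunLM, predFunLM, nFolds = 10)
mseCV
```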