Title: Ensemble Partial Least Squares Regression
Description: An algorithmic framework for measuring feature importance, outlier detection, model applicability domain evaluation, and ensemble predictive modeling with (sparse) partial least squares regressions.
Authors: Nan Xiao [aut, cre], Dong-Sheng Cao [aut], Miao-Zhu Li [aut], Qing-Song Xu [aut]
Maintainer: Nan Xiao <[email protected]>
License: GPL-3 | file LICENSE
Version: 6.1
Built: 2024-11-15 03:22:17 UTC
Source: https://github.com/nanxstats/enpls
Methylalkanes retention index dataset from Liang et al.
data("alkanes")
data("alkanes")
A list with 2 components:
x - data frame with 207 rows (samples) and 21 columns (predictors)
y - numeric vector of length 207 (response)
This dataset contains the chromatographic retention index (y) of 207 methylalkanes, modeled by 21 molecular descriptors (x).
Molecular descriptor types:
Chi path, cluster and path/cluster indices
Kappa shape indices
E-state indices
Molecular electricity distance vector index
Yi-Zeng Liang, Da-Lin Yuan, Qing-Song Xu, and Olav Martin Kvalheim. "Modeling based on subspace orthogonal projections for QSAR and QSPR research." Journal of Chemometrics 22, no. 1 (2008): 23–35.
data("alkanes") str(alkanes)
data("alkanes") str(alkanes)
K-fold cross validation for ensemble partial least squares regression.
cv.enpls(x, y, nfolds = 5L, verbose = TRUE, ...)
x | Predictor matrix.
y | Response vector.
nfolds | Number of cross-validation folds. Default is 5L.
verbose | Whether to print the progress of cross-validation. Default is TRUE.
... | Arguments to be passed to enpls.fit.
A list containing:
ypred - a matrix with two columns: real y and predicted y
residual - cross-validation residuals (y.pred - y.real)
RMSE - root mean squared error
MAE - mean absolute error
Rsquare - R-squared
To maximize the probability that each observation is selected into the test set at least once (so that its prediction uncertainty can be measured), try setting a larger reptimes.
Nan Xiao <https://nanx.me>
See enpls.fit for ensemble partial least squares regressions.
data("alkanes") x <- alkanes$x y <- alkanes$y set.seed(42) cvfit <- cv.enpls(x, y, reptimes = 10) print(cvfit) plot(cvfit)
data("alkanes") x <- alkanes$x y <- alkanes$y set.seed(42) cvfit <- cv.enpls(x, y, reptimes = 10) print(cvfit) plot(cvfit)
K-fold cross validation for ensemble sparse partial least squares regression.
cv.enspls(x, y, nfolds = 5L, verbose = TRUE, ...)
x | Predictor matrix.
y | Response vector.
nfolds | Number of cross-validation folds. Default is 5L.
verbose | Whether to print the progress of cross-validation. Default is TRUE.
... | Arguments to be passed to enspls.fit.
A list containing:
ypred - a matrix with two columns: real y and predicted y
residual - cross-validation residuals (y.pred - y.real)
RMSE - root mean squared error
MAE - mean absolute error
Rsquare - R-squared
To maximize the probability that each observation is selected into the test set at least once (so that its prediction uncertainty can be measured), try setting a larger reptimes.
Nan Xiao <https://nanx.me>
See enspls.fit for ensemble sparse partial least squares regressions.
# This example takes one minute to run
## Not run:
data("logd1k")
x <- logd1k$x
y <- logd1k$y

set.seed(42)
cvfit <- cv.enspls(x, y, reptimes = 10)
print(cvfit)
plot(cvfit)
## End(Not run)
Model applicability domain evaluation with ensemble partial least squares.
enpls.ad(
  x,
  y,
  xtest,
  ytest,
  maxcomp = NULL,
  cvfolds = 5L,
  space = c("sample", "variable"),
  method = c("mc", "boot"),
  reptimes = 500L,
  ratio = 0.8,
  parallel = 1L
)
x | Predictor matrix of the training set.
y | Response vector of the training set.
xtest | List, with the i-th component being the i-th test set's predictor matrix (see example code below).
ytest | List, with the i-th component being the i-th test set's response vector (see example code below).
maxcomp | Maximum number of components included within each model. If not specified, will use the maximum number possible (considering cross-validation and special cases where n is smaller than p).
cvfolds | Number of cross-validation folds used in each model for automatic parameter selection. Default is 5L.
space | Space in which to apply the resampling method. Can be the sample space ("sample") or the variable space ("variable"). Default is "sample".
method | Resampling method. "mc" (Monte-Carlo resampling) or "boot" (bootstrapping). Default is "mc".
reptimes | Number of models to build with Monte-Carlo resampling or bootstrapping. Default is 500L.
ratio | Sampling ratio used when method = "mc". Default is 0.8.
parallel | Integer. Number of CPU cores to use. Default is 1L.
A list containing:
tr.error.mean - absolute mean prediction error for the training set
tr.error.median - absolute median prediction error for the training set
tr.error.sd - prediction error sd for the training set
tr.error.matrix - raw prediction error matrix for the training set
te.error.mean - list of absolute mean prediction errors for the test set(s)
te.error.median - list of absolute median prediction errors for the test set(s)
te.error.sd - list of prediction error sds for the test set(s)
te.error.matrix - list of raw prediction error matrices for the test set(s)
Note that when space = "variable", method can only be "mc", since bootstrapping in the variable space would create duplicated variables, which causes problems.
Nan Xiao <https://nanx.me>
data("alkanes") x <- alkanes$x y <- alkanes$y # training set x.tr <- x[1:100, ] y.tr <- y[1:100] # two test sets x.te <- list( "test.1" = x[101:150, ], "test.2" = x[151:207, ] ) y.te <- list( "test.1" = y[101:150], "test.2" = y[151:207] ) set.seed(42) ad <- enpls.ad( x.tr, y.tr, x.te, y.te, space = "variable", method = "mc", ratio = 0.9, reptimes = 50 ) print(ad) plot(ad) # the interactive plot requires a HTML viewer ## Not run: plot(ad, type = "interactive") ## End(Not run)
data("alkanes") x <- alkanes$x y <- alkanes$y # training set x.tr <- x[1:100, ] y.tr <- y[1:100] # two test sets x.te <- list( "test.1" = x[101:150, ], "test.2" = x[151:207, ] ) y.te <- list( "test.1" = y[101:150], "test.2" = y[151:207] ) set.seed(42) ad <- enpls.ad( x.tr, y.tr, x.te, y.te, space = "variable", method = "mc", ratio = 0.9, reptimes = 50 ) print(ad) plot(ad) # the interactive plot requires a HTML viewer ## Not run: plot(ad, type = "interactive") ## End(Not run)
Ensemble partial least squares regression.
enpls.fit(
  x,
  y,
  maxcomp = NULL,
  cvfolds = 5L,
  reptimes = 500L,
  method = c("mc", "boot"),
  ratio = 0.8,
  parallel = 1L
)
x | Predictor matrix.
y | Response vector.
maxcomp | Maximum number of components included within each model. If not specified, will use the maximum number possible (considering cross-validation and special cases where n is smaller than p).
cvfolds | Number of cross-validation folds used in each model for automatic parameter selection. Default is 5L.
reptimes | Number of models to build with Monte-Carlo resampling or bootstrapping. Default is 500L.
method | Resampling method. "mc" (Monte-Carlo resampling) or "boot" (bootstrapping). Default is "mc".
ratio | Sampling ratio used when method = "mc". Default is 0.8.
parallel | Integer. Number of CPU cores to use. Default is 1L.
A list containing all partial least squares model objects.
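To illustrate the general idea behind this list of models, here is a hypothetical sketch of the Monte-Carlo ensemble scheme (not the package's internal code; it assumes the pls package and fixes ncomp = 2 for brevity):

library(pls)

data("alkanes")
x <- alkanes$x
y <- alkanes$y

# Repeatedly draw a Monte-Carlo subsample and fit one PLS model on it;
# the collection of fitted models forms the ensemble.
reptimes <- 10
ratio <- 0.8
models <- vector("list", reptimes)
for (i in seq_len(reptimes)) {
  idx <- sample(nrow(x), size = floor(nrow(x) * ratio))
  dat <- data.frame(y = y[idx], x[idx, ])
  models[[i]] <- plsr(y ~ ., data = dat, ncomp = 2)
}
# An ensemble prediction is an aggregate (mean or median)
# of the per-model predictions.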
Nan Xiao <https://nanx.me>
See enpls.fs for measuring feature importance with ensemble partial least squares regressions. See enpls.od for outlier detection with ensemble partial least squares regressions.
data("alkanes") x <- alkanes$x y <- alkanes$y set.seed(42) fit <- enpls.fit(x, y, reptimes = 50) print(fit) predict(fit, newx = x)
data("alkanes") x <- alkanes$x y <- alkanes$y set.seed(42) fit <- enpls.fit(x, y, reptimes = 50) print(fit) predict(fit, newx = x)
Measuring feature importance with ensemble partial least squares.
enpls.fs(
  x,
  y,
  maxcomp = NULL,
  cvfolds = 5L,
  reptimes = 500L,
  method = c("mc", "boot"),
  ratio = 0.8,
  parallel = 1L
)
x | Predictor matrix.
y | Response vector.
maxcomp | Maximum number of components included within each model. If not specified, will use the maximum number possible (considering cross-validation and special cases where n is smaller than p).
cvfolds | Number of cross-validation folds used in each model for automatic parameter selection. Default is 5L.
reptimes | Number of models to build with Monte-Carlo resampling or bootstrapping. Default is 500L.
method | Resampling method. "mc" (Monte-Carlo resampling) or "boot" (bootstrapping). Default is "mc".
ratio | Sampling ratio used when method = "mc". Default is 0.8.
parallel | Integer. Number of CPU cores to use. Default is 1L.
A list containing two components:
variable.importance - a vector of variable importance
coefficient.matrix - original coefficient matrix
Nan Xiao <https://nanx.me>
See enpls.od for outlier detection with ensemble partial least squares regressions. See enpls.fit for fitting ensemble partial least squares regression models.
data("alkanes") x <- alkanes$x y <- alkanes$y set.seed(42) fs <- enpls.fs(x, y, reptimes = 50) print(fs) plot(fs)
data("alkanes") x <- alkanes$x y <- alkanes$y set.seed(42) fs <- enpls.fs(x, y, reptimes = 50) print(fs) plot(fs)
Mean Absolute Error (MAE)
enpls.mae(yreal, ypred)
yreal | True response vector.
ypred | Predicted response vector.
MAE
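MAE is the standard mean absolute error; a one-line equivalent of the formula:

# MAE = mean(|yreal - ypred|)
mean(abs(yreal - ypred))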
Nan Xiao <https://nanx.me>
Outlier detection with ensemble partial least squares.
enpls.od(
  x,
  y,
  maxcomp = NULL,
  cvfolds = 5L,
  reptimes = 500L,
  method = c("mc", "boot"),
  ratio = 0.8,
  parallel = 1L
)
x | Predictor matrix.
y | Response vector.
maxcomp | Maximum number of components included within each model. If not specified, will use the maximum number possible (considering cross-validation and special cases where n is smaller than p).
cvfolds | Number of cross-validation folds used in each model for automatic parameter selection. Default is 5L.
reptimes | Number of models to build with Monte-Carlo resampling or bootstrapping. Default is 500L.
method | Resampling method. "mc" (Monte-Carlo resampling) or "boot" (bootstrapping). Default is "mc".
ratio | Sampling ratio used when method = "mc". Default is 0.8.
parallel | Integer. Number of CPU cores to use. Default is 1L.
A list containing four components:
error.mean - error mean for all samples (absolute value)
error.median - error median for all samples
error.sd - error sd for all samples
predict.error.matrix - the original prediction error matrix
To maximize the probability that each observation is selected into the test set at least once (so that its prediction uncertainty can be measured), try setting a larger reptimes.
Nan Xiao <https://nanx.me>
See enpls.fs for measuring feature importance with ensemble partial least squares regressions. See enpls.fit for fitting ensemble partial least squares regression models.
data("alkanes") x <- alkanes$x y <- alkanes$y set.seed(42) od <- enpls.od(x, y, reptimes = 50) print(od) plot(od) plot(od, criterion = "sd")
data("alkanes") x <- alkanes$x y <- alkanes$y set.seed(42) od <- enpls.od(x, y, reptimes = 50) print(od) plot(od) plot(od, criterion = "sd")
Compute Root Mean Squared Error (RMSE).
enpls.rmse(yreal, ypred)
yreal | True response vector.
ypred | Predicted response vector.
RMSE
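RMSE is the standard root mean squared error; a one-line equivalent of the formula:

# RMSE = sqrt(mean((yreal - ypred)^2))
sqrt(mean((yreal - ypred)^2))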
Nan Xiao <https://nanx.me>
Root Mean Squared Logarithmic Error (RMSLE)
enpls.rmsle(yreal, ypred)
yreal | True response vector.
ypred | Predicted response vector.
RMSLE
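RMSLE applies the log(1 + x) transform before taking the RMSE; a one-line sketch of the usual definition:

# RMSLE = sqrt(mean((log(yreal + 1) - log(ypred + 1))^2))
sqrt(mean((log1p(yreal) - log1p(ypred))^2))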
Nan Xiao <https://nanx.me>
Model applicability domain evaluation with ensemble sparse partial least squares.
enspls.ad(
  x,
  y,
  xtest,
  ytest,
  maxcomp = 5L,
  cvfolds = 5L,
  alpha = seq(0.2, 0.8, 0.2),
  space = c("sample", "variable"),
  method = c("mc", "boot"),
  reptimes = 500L,
  ratio = 0.8,
  parallel = 1L
)
x | Predictor matrix of the training set.
y | Response vector of the training set.
xtest | List, with the i-th component being the i-th test set's predictor matrix (see example code below).
ytest | List, with the i-th component being the i-th test set's response vector (see example code below).
maxcomp | Maximum number of components included within each model. If not specified, will use 5L.
cvfolds | Number of cross-validation folds used in each model for automatic parameter selection. Default is 5L.
alpha | Parameter (grid) controlling sparsity of the model. If not specified, default is seq(0.2, 0.8, 0.2).
space | Space in which to apply the resampling method. Can be the sample space ("sample") or the variable space ("variable"). Default is "sample".
method | Resampling method. "mc" (Monte-Carlo resampling) or "boot" (bootstrapping). Default is "mc".
reptimes | Number of models to build with Monte-Carlo resampling or bootstrapping. Default is 500L.
ratio | Sampling ratio used when method = "mc". Default is 0.8.
parallel | Integer. Number of CPU cores to use. Default is 1L.
A list containing:
tr.error.mean - absolute mean prediction error for the training set
tr.error.median - absolute median prediction error for the training set
tr.error.sd - prediction error sd for the training set
tr.error.matrix - raw prediction error matrix for the training set
te.error.mean - list of absolute mean prediction errors for the test set(s)
te.error.median - list of absolute median prediction errors for the test set(s)
te.error.sd - list of prediction error sds for the test set(s)
te.error.matrix - list of raw prediction error matrices for the test set(s)
Note that when space = "variable", method can only be "mc", since bootstrapping in the variable space would create duplicated variables, which causes problems.
Nan Xiao <https://nanx.me>
data("logd1k") # remove low variance variables x <- logd1k$x[, -c(17, 52, 59)] y <- logd1k$y # training set x.tr <- x[1:300, ] y.tr <- y[1:300] # two test sets x.te <- list( "test.1" = x[301:400, ], "test.2" = x[401:500, ] ) y.te <- list( "test.1" = y[301:400], "test.2" = y[401:500] ) set.seed(42) ad <- enspls.ad( x.tr, y.tr, x.te, y.te, maxcomp = 3, alpha = c(0.3, 0.6, 0.9), space = "variable", method = "mc", ratio = 0.8, reptimes = 10 ) print(ad) plot(ad) # the interactive plot requires a HTML viewer ## Not run: plot(ad, type = "interactive") ## End(Not run)
data("logd1k") # remove low variance variables x <- logd1k$x[, -c(17, 52, 59)] y <- logd1k$y # training set x.tr <- x[1:300, ] y.tr <- y[1:300] # two test sets x.te <- list( "test.1" = x[301:400, ], "test.2" = x[401:500, ] ) y.te <- list( "test.1" = y[301:400], "test.2" = y[401:500] ) set.seed(42) ad <- enspls.ad( x.tr, y.tr, x.te, y.te, maxcomp = 3, alpha = c(0.3, 0.6, 0.9), space = "variable", method = "mc", ratio = 0.8, reptimes = 10 ) print(ad) plot(ad) # the interactive plot requires a HTML viewer ## Not run: plot(ad, type = "interactive") ## End(Not run)
Ensemble sparse partial least squares regression.
enspls.fit(
  x,
  y,
  maxcomp = 5L,
  cvfolds = 5L,
  alpha = seq(0.2, 0.8, 0.2),
  reptimes = 500L,
  method = c("mc", "boot"),
  ratio = 0.8,
  parallel = 1L
)
x | Predictor matrix.
y | Response vector.
maxcomp | Maximum number of components included within each model. If not specified, will use 5L.
cvfolds | Number of cross-validation folds used in each model for automatic parameter selection. Default is 5L.
alpha | Parameter (grid) controlling sparsity of the model. If not specified, default is seq(0.2, 0.8, 0.2).
reptimes | Number of models to build with Monte-Carlo resampling or bootstrapping. Default is 500L.
method | Resampling method. "mc" (Monte-Carlo resampling) or "boot" (bootstrapping). Default is "mc".
ratio | Sampling ratio used when method = "mc". Default is 0.8.
parallel | Integer. Number of CPU cores to use. Default is 1L.
A list containing all sparse partial least squares model objects.
Nan Xiao <https://nanx.me>
See enspls.fs for measuring feature importance with ensemble sparse partial least squares regressions. See enspls.od for outlier detection with ensemble sparse partial least squares regressions.
data("logd1k") x <- logd1k$x y <- logd1k$y set.seed(42) fit <- enspls.fit( x, y, reptimes = 5, maxcomp = 3, alpha = c(0.3, 0.6, 0.9) ) print(fit) predict(fit, newx = x)
data("logd1k") x <- logd1k$x y <- logd1k$y set.seed(42) fit <- enspls.fit( x, y, reptimes = 5, maxcomp = 3, alpha = c(0.3, 0.6, 0.9) ) print(fit) predict(fit, newx = x)
Measuring feature importance with ensemble sparse partial least squares.
enspls.fs(
  x,
  y,
  maxcomp = 5L,
  cvfolds = 5L,
  alpha = seq(0.2, 0.8, 0.2),
  reptimes = 500L,
  method = c("mc", "boot"),
  ratio = 0.8,
  parallel = 1L
)
x | Predictor matrix.
y | Response vector.
maxcomp | Maximum number of components included within each model. If not specified, will use 5L.
cvfolds | Number of cross-validation folds used in each model for automatic parameter selection. Default is 5L.
alpha | Parameter (grid) controlling sparsity of the model. If not specified, default is seq(0.2, 0.8, 0.2).
reptimes | Number of models to build with Monte-Carlo resampling or bootstrapping. Default is 500L.
method | Resampling method. "mc" (Monte-Carlo resampling) or "boot" (bootstrapping). Default is "mc".
ratio | Sampling ratio used when method = "mc". Default is 0.8.
parallel | Integer. Number of CPU cores to use. Default is 1L.
A list containing two components:
variable.importance - a vector of variable importance
coefficient.matrix - original coefficient matrix
Nan Xiao <https://nanx.me>
See enspls.od for outlier detection with ensemble sparse partial least squares regressions. See enspls.fit for fitting ensemble sparse partial least squares regression models.
data("logd1k") x <- logd1k$x y <- logd1k$y set.seed(42) fs <- enspls.fs(x, y, reptimes = 5, maxcomp = 2) print(fs, nvar = 10) plot(fs, nvar = 10) plot(fs, type = "boxplot", limits = c(0.05, 0.95), nvar = 10)
data("logd1k") x <- logd1k$x y <- logd1k$y set.seed(42) fs <- enspls.fs(x, y, reptimes = 5, maxcomp = 2) print(fs, nvar = 10) plot(fs, nvar = 10) plot(fs, type = "boxplot", limits = c(0.05, 0.95), nvar = 10)
Outlier detection with ensemble sparse partial least squares.
enspls.od(
  x,
  y,
  maxcomp = 5L,
  cvfolds = 5L,
  alpha = seq(0.2, 0.8, 0.2),
  reptimes = 500L,
  method = c("mc", "boot"),
  ratio = 0.8,
  parallel = 1L
)
x | Predictor matrix.
y | Response vector.
maxcomp | Maximum number of components included within each model. If not specified, will use 5L.
cvfolds | Number of cross-validation folds used in each model for automatic parameter selection. Default is 5L.
alpha | Parameter (grid) controlling sparsity of the model. If not specified, default is seq(0.2, 0.8, 0.2).
reptimes | Number of models to build with Monte-Carlo resampling or bootstrapping. Default is 500L.
method | Resampling method. "mc" (Monte-Carlo resampling) or "boot" (bootstrapping). Default is "mc".
ratio | Sampling ratio used when method = "mc". Default is 0.8.
parallel | Integer. Number of CPU cores to use. Default is 1L.
A list containing four components:
error.mean - error mean for all samples (absolute value)
error.median - error median for all samples
error.sd - error sd for all samples
predict.error.matrix - the original prediction error matrix
To maximize the probability that each observation is selected into the test set at least once (so that its prediction uncertainty can be measured), try setting a larger reptimes.
Nan Xiao <https://nanx.me>
See enspls.fs for measuring feature importance with ensemble sparse partial least squares regressions. See enspls.fit for fitting ensemble sparse partial least squares regression models.
data("logd1k") x <- logd1k$x y <- logd1k$y set.seed(42) od <- enspls.od( x, y, reptimes = 5, maxcomp = 3, alpha = c(0.3, 0.6, 0.9) ) plot(od, prob = 0.1) plot(od, criterion = "sd", sdtimes = 1)
data("logd1k") x <- logd1k$x y <- logd1k$y set.seed(42) od <- enspls.od( x, y, reptimes = 5, maxcomp = 3, alpha = c(0.3, 0.6, 0.9) ) plot(od, prob = 0.1) plot(od, criterion = "sd", sdtimes = 1)
Distribution coefficients at pH 7.4 (logD7.4) dataset from Wang et al.
data(logd1k)
A list with 2 components:
x - data frame with 1,000 rows (samples) and 80 columns (predictors)
y - numeric vector of length 1,000 (response)
The first 1,000 compounds in the original dataset were selected.
This dataset contains distribution coefficients at pH 7.4 (logD7.4) for 1,000 compounds, and 80 molecular descriptors computed with RDKit.
Jian-Bing Wang, Dong-Sheng Cao, Min-Feng Zhu, Yong-Huan Yun, Nan Xiao, and Yi-Zeng Liang. "In silico evaluation of logD7.4 and comparison with other prediction methods." Journal of Chemometrics 29, no. 7 (2015): 389–398.
data(logd1k)
str(logd1k)
Plot cv.enpls object
## S3 method for class 'cv.enpls'
plot(x, xlim = NULL, ylim = NULL, alpha = 0.8, main = NULL, ...)
x | An object of class cv.enpls.
xlim | Vector of length 2. x-axis limits of the plot.
ylim | Vector of length 2. y-axis limits of the plot.
alpha | An alpha transparency value for points, a real number in (0, 1].
main | Plot title, not used currently.
... | Additional graphical parameters, not used currently.
Nan Xiao <https://nanx.me>
See cv.enpls for cross-validation of ensemble partial least squares regression models.
data("alkanes") x <- alkanes$x y <- alkanes$y set.seed(42) cvfit <- cv.enpls(x, y, reptimes = 10) plot(cvfit)
data("alkanes") x <- alkanes$x y <- alkanes$y set.seed(42) cvfit <- cv.enpls(x, y, reptimes = 10) plot(cvfit)
Plot cv.enspls object
## S3 method for class 'cv.enspls'
plot(x, xlim = NULL, ylim = NULL, alpha = 0.8, main = NULL, ...)
x | An object of class cv.enspls.
xlim | Vector of length 2. x-axis limits of the plot.
ylim | Vector of length 2. y-axis limits of the plot.
alpha | An alpha transparency value for points, a real number in (0, 1].
main | Plot title, not used currently.
... | Additional graphical parameters, not used currently.
Nan Xiao <https://nanx.me>
See cv.enspls for cross-validation of ensemble sparse partial least squares regression models.
# This example takes one minute to run
## Not run:
data("logd1k")
x <- logd1k$x
y <- logd1k$y

set.seed(42)
cvfit <- cv.enspls(x, y, reptimes = 10)
plot(cvfit)
## End(Not run)
Plot enpls.ad object
## S3 method for class 'enpls.ad'
plot(x, type = c("static", "interactive"), main = NULL, ...)
x | An object of class enpls.ad.
type | Plot type. Can be "static" or "interactive". Default is "static".
main | Plot title, not used currently.
... | Additional graphical parameters, not used currently.
Nan Xiao <https://nanx.me>
See enpls.ad for model applicability domain evaluation with ensemble partial least squares regressions.
data("alkanes") x <- alkanes$x y <- alkanes$y # training set x.tr <- x[1:100, ] y.tr <- y[1:100] # two test sets x.te <- list( "test.1" = x[101:150, ], "test.2" = x[151:207, ] ) y.te <- list( "test.1" = y[101:150], "test.2" = y[151:207] ) set.seed(42) ad <- enpls.ad( x.tr, y.tr, x.te, y.te, space = "variable", method = "mc", ratio = 0.9, reptimes = 50 ) plot(ad) # the interactive plot requires a HTML viewer ## Not run: plot(ad, type = "interactive") ## End(Not run)
data("alkanes") x <- alkanes$x y <- alkanes$y # training set x.tr <- x[1:100, ] y.tr <- y[1:100] # two test sets x.te <- list( "test.1" = x[101:150, ], "test.2" = x[151:207, ] ) y.te <- list( "test.1" = y[101:150], "test.2" = y[151:207] ) set.seed(42) ad <- enpls.ad( x.tr, y.tr, x.te, y.te, space = "variable", method = "mc", ratio = 0.9, reptimes = 50 ) plot(ad) # the interactive plot requires a HTML viewer ## Not run: plot(ad, type = "interactive") ## End(Not run)
Plot enpls.fs object
## S3 method for class 'enpls.fs'
plot(
  x,
  nvar = NULL,
  type = c("dotplot", "boxplot"),
  limits = c(0, 1),
  main = NULL,
  ...
)
x | An object of class enpls.fs.
nvar | Number of top variables to show. If not specified, all variables are shown.
type | Plot type. Can be "dotplot" or "boxplot". Default is "dotplot".
limits | Vector of length 2. Set boxplot limits (in quantile) to remove the extreme outlier coefficients.
main | Plot title, not used currently.
... | Additional graphical parameters, not used currently.
Nan Xiao <https://nanx.me>
See enpls.fs for measuring feature importance with ensemble partial least squares regressions.
data("alkanes") x <- alkanes$x y <- alkanes$y set.seed(42) fs <- enpls.fs(x, y, reptimes = 50) plot(fs) plot(fs, nvar = 10) plot(fs, type = "boxplot") plot(fs, type = "boxplot", limits = c(0.05, 0.95))
data("alkanes") x <- alkanes$x y <- alkanes$y set.seed(42) fs <- enpls.fs(x, y, reptimes = 50) plot(fs) plot(fs, nvar = 10) plot(fs, type = "boxplot") plot(fs, type = "boxplot", limits = c(0.05, 0.95))
Plot enpls.od object
## S3 method for class 'enpls.od'
plot(
  x,
  criterion = c("quantile", "sd"),
  prob = 0.05,
  sdtimes = 3L,
  alpha = 1,
  main = NULL,
  ...
)
x | An object of class enpls.od.
criterion | Criterion of being classified as an outlier. Can be "quantile" or "sd". Default is "quantile".
prob | Quantile probability as the cut-off value. Default is 0.05.
sdtimes | Times of standard deviation as the cut-off value. Default is 3L.
alpha | An alpha transparency value for points, a real number in (0, 1].
main | Plot title.
... | Additional graphical parameters for plot().
Nan Xiao <https://nanx.me>
See enpls.od for outlier detection with ensemble partial least squares regressions.
data("alkanes") x <- alkanes$x y <- alkanes$y set.seed(42) od <- enpls.od(x, y, reptimes = 50) plot(od, criterion = "quantile") plot(od, criterion = "sd")
data("alkanes") x <- alkanes$x y <- alkanes$y set.seed(42) od <- enpls.od(x, y, reptimes = 50) plot(od, criterion = "quantile") plot(od, criterion = "sd")
Plot enspls.ad object
## S3 method for class 'enspls.ad'
plot(x, type = c("static", "interactive"), main = NULL, ...)
x | An object of class enspls.ad.
type | Plot type. Can be "static" or "interactive". Default is "static".
main | Plot title.
... | Additional graphical parameters for plot().
Nan Xiao <https://nanx.me>
See enspls.ad for model applicability domain evaluation with ensemble sparse partial least squares regressions.
data("logd1k") # remove low variance variables x <- logd1k$x[, -c(17, 52, 59)] y <- logd1k$y # training set x.tr <- x[1:300, ] y.tr <- y[1:300] # two test sets x.te <- list( "test.1" = x[301:400, ], "test.2" = x[401:500, ] ) y.te <- list( "test.1" = y[301:400], "test.2" = y[401:500] ) set.seed(42) ad <- enspls.ad( x.tr, y.tr, x.te, y.te, maxcomp = 3, alpha = c(0.3, 0.6, 0.9), space = "variable", method = "mc", ratio = 0.8, reptimes = 10 ) plot(ad) # the interactive plot requires a HTML viewer ## Not run: plot(ad, type = "interactive") ## End(Not run)
data("logd1k") # remove low variance variables x <- logd1k$x[, -c(17, 52, 59)] y <- logd1k$y # training set x.tr <- x[1:300, ] y.tr <- y[1:300] # two test sets x.te <- list( "test.1" = x[301:400, ], "test.2" = x[401:500, ] ) y.te <- list( "test.1" = y[301:400], "test.2" = y[401:500] ) set.seed(42) ad <- enspls.ad( x.tr, y.tr, x.te, y.te, maxcomp = 3, alpha = c(0.3, 0.6, 0.9), space = "variable", method = "mc", ratio = 0.8, reptimes = 10 ) plot(ad) # the interactive plot requires a HTML viewer ## Not run: plot(ad, type = "interactive") ## End(Not run)
Plot enspls.fs object
## S3 method for class 'enspls.fs'
plot(
  x,
  nvar = NULL,
  type = c("dotplot", "boxplot"),
  limits = c(0, 1),
  main = NULL,
  ...
)
x | An object of class enspls.fs.
nvar | Number of top variables to show. If not specified, all variables are shown.
type | Plot type. Can be "dotplot" or "boxplot". Default is "dotplot".
limits | Vector of length 2. Set boxplot limits (in quantile) to remove the extreme outlier coefficients.
main | Plot title, not used currently.
... | Additional graphical parameters, not used currently.
Nan Xiao <https://nanx.me>
See enspls.fs for measuring feature importance with ensemble sparse partial least squares regressions.
data("logd1k") x <- logd1k$x y <- logd1k$y set.seed(42) fs <- enspls.fs(x, y, reptimes = 5, maxcomp = 2) plot(fs, nvar = 10) plot(fs, type = "boxplot", limits = c(0.05, 0.95), nvar = 10)
data("logd1k") x <- logd1k$x y <- logd1k$y set.seed(42) fs <- enspls.fs(x, y, reptimes = 5, maxcomp = 2) plot(fs, nvar = 10) plot(fs, type = "boxplot", limits = c(0.05, 0.95), nvar = 10)
Plot enspls.od object
## S3 method for class 'enspls.od'
plot(
  x,
  criterion = c("quantile", "sd"),
  prob = 0.05,
  sdtimes = 3L,
  alpha = 1,
  main = NULL,
  ...
)
x | An object of class enspls.od.
criterion | Criterion of being classified as an outlier. Can be "quantile" or "sd". Default is "quantile".
prob | Quantile probability as the cut-off value. Default is 0.05.
sdtimes | Times of standard deviation as the cut-off value. Default is 3L.
alpha | An alpha transparency value for points, a real number in (0, 1].
main | Plot title.
... | Additional graphical parameters for plot().
Nan Xiao <https://nanx.me>
See enspls.od for outlier detection with ensemble sparse partial least squares regressions.
data("logd1k") x <- logd1k$x y <- logd1k$y set.seed(42) od <- enspls.od(x, y, reptimes = 4, maxcomp = 2) plot(od, criterion = "quantile", prob = 0.1) plot(od, criterion = "sd", sdtimes = 1)
data("logd1k") x <- logd1k$x y <- logd1k$y set.seed(42) od <- enspls.od(x, y, reptimes = 4, maxcomp = 2) plot(od, criterion = "quantile", prob = 0.1) plot(od, criterion = "sd", sdtimes = 1)
Make predictions on new data with a fitted enpls.fit object.
## S3 method for class 'enpls.fit'
predict(object, newx, method = c("mean", "median"), ...)
object | An object of class enpls.fit.
newx | New data to predict with.
method | Use "mean" or "median" to aggregate the predictions from all models in the ensemble. Default is "mean".
... | Additional parameters for predict().
A numeric vector containing the predicted values.
Nan Xiao <https://nanx.me>
See enpls.fit for fitting ensemble partial least squares regression models.
data("alkanes") x <- alkanes$x y <- alkanes$y set.seed(42) fit <- enpls.fit(x, y, reptimes = 50) y.pred <- predict(fit, newx = x) plot(y, y.pred, xlim = range(y), ylim = range(y)) abline(a = 0L, b = 1L) y.pred.med <- predict(fit, newx = x, method = "median") plot(y, y.pred.med, xlim = range(y), ylim = range(y)) abline(a = 0L, b = 1L)
data("alkanes") x <- alkanes$x y <- alkanes$y set.seed(42) fit <- enpls.fit(x, y, reptimes = 50) y.pred <- predict(fit, newx = x) plot(y, y.pred, xlim = range(y), ylim = range(y)) abline(a = 0L, b = 1L) y.pred.med <- predict(fit, newx = x, method = "median") plot(y, y.pred.med, xlim = range(y), ylim = range(y)) abline(a = 0L, b = 1L)
Make predictions on new data with a fitted enspls.fit object.
## S3 method for class 'enspls.fit'
predict(object, newx, method = c("mean", "median"), ...)
object | An object of class enspls.fit.
newx | New data to predict with.
method | Use "mean" or "median" to aggregate the predictions from all models in the ensemble. Default is "mean".
... | Additional parameters for predict().
A numeric vector containing the predicted values.
Nan Xiao <https://nanx.me>
See enspls.fit for fitting ensemble sparse partial least squares regression models.
data("logd1k") x <- logd1k$x y <- logd1k$y set.seed(42) fit <- enspls.fit(x, y, reptimes = 5, maxcomp = 2) y.pred <- predict(fit, newx = x) plot(y, y.pred, xlim = range(y), ylim = range(y)) abline(a = 0L, b = 1L) y.pred.med <- predict(fit, newx = x, method = "median") plot(y, y.pred.med, xlim = range(y), ylim = range(y)) abline(a = 0L, b = 1L)
data("logd1k") x <- logd1k$x y <- logd1k$y set.seed(42) fit <- enspls.fit(x, y, reptimes = 5, maxcomp = 2) y.pred <- predict(fit, newx = x) plot(y, y.pred, xlim = range(y), ylim = range(y)) abline(a = 0L, b = 1L) y.pred.med <- predict(fit, newx = x, method = "median") plot(y, y.pred.med, xlim = range(y), ylim = range(y)) abline(a = 0L, b = 1L)
Print cv.enpls object.
## S3 method for class 'cv.enpls'
print(x, ...)
x | An object of class cv.enpls.
... | Additional parameters for print().
Nan Xiao <https://nanx.me>
See cv.enpls for cross-validation of ensemble partial least squares regression models.
data("alkanes") x <- alkanes$x y <- alkanes$y set.seed(42) cvfit <- cv.enpls(x, y, reptimes = 10) cvfit
data("alkanes") x <- alkanes$x y <- alkanes$y set.seed(42) cvfit <- cv.enpls(x, y, reptimes = 10) cvfit
Print cv.enspls object.
## S3 method for class 'cv.enspls'
print(x, ...)
x | An object of class cv.enspls.
... | Additional parameters for print().
Nan Xiao <https://nanx.me>
See cv.enspls for cross-validation of ensemble sparse partial least squares regression models.
# This example takes one minute to run
## Not run:
data("logd1k")
x <- logd1k$x
y <- logd1k$y

set.seed(42)
cvfit <- cv.enspls(x, y, reptimes = 10)
print(cvfit)
## End(Not run)
Print enpls.ad object.
## S3 method for class 'enpls.ad'
print(x, ...)
x | An object of class enpls.ad.
... | Additional parameters for print().
Nan Xiao <https://nanx.me>
See enpls.ad for model applicability domain evaluation with ensemble partial least squares regressions.
data("alkanes") x <- alkanes$x y <- alkanes$y # training set x.tr <- x[1:100, ] y.tr <- y[1:100] # two test sets x.te <- list( "test.1" = x[101:150, ], "test.2" = x[151:207, ] ) y.te <- list( "test.1" = y[101:150], "test.2" = y[151:207] ) set.seed(42) ad <- enpls.ad( x.tr, y.tr, x.te, y.te, space = "variable", method = "mc", ratio = 0.9, reptimes = 50 ) ad
data("alkanes") x <- alkanes$x y <- alkanes$y # training set x.tr <- x[1:100, ] y.tr <- y[1:100] # two test sets x.te <- list( "test.1" = x[101:150, ], "test.2" = x[151:207, ] ) y.te <- list( "test.1" = y[101:150], "test.2" = y[151:207] ) set.seed(42) ad <- enpls.ad( x.tr, y.tr, x.te, y.te, space = "variable", method = "mc", ratio = 0.9, reptimes = 50 ) ad
Print coefficients of each model in the enpls.fit object.
## S3 method for class 'enpls.fit'
print(x, ...)
x | An object of class enpls.fit.
... | Additional parameters for print().
Nan Xiao <https://nanx.me>
See enpls.fit for fitting ensemble partial least squares regression models.
data("alkanes") x <- alkanes$x y <- alkanes$y set.seed(42) fit <- enpls.fit(x, y, reptimes = 50) fit
data("alkanes") x <- alkanes$x y <- alkanes$y set.seed(42) fit <- enpls.fit(x, y, reptimes = 50) fit
Print enpls.fs object.
## S3 method for class 'enpls.fs'
print(x, sort = TRUE, nvar = NULL, ...)
x | An object of class enpls.fs.
sort | Should the variables be sorted in decreasing order of importance? Default is TRUE.
nvar | Number of top variables to show. Ignored if sort = FALSE.
... | Additional parameters for print().
Nan Xiao <https://nanx.me>
See enpls.fs for measuring feature importance with ensemble partial least squares regressions.
data("alkanes") x <- alkanes$x y <- alkanes$y set.seed(42) fs <- enpls.fs(x, y, reptimes = 100) print(fs) print(fs, nvar = 10L)
data("alkanes") x <- alkanes$x y <- alkanes$y set.seed(42) fs <- enpls.fs(x, y, reptimes = 100) print(fs) print(fs, nvar = 10L)
Print enpls.od object.
## S3 method for class 'enpls.od'
print(x, ...)
x | An object of class enpls.od.
... | Additional parameters for print().
Nan Xiao <https://nanx.me>
See enpls.od for outlier detection with ensemble partial least squares regressions.
data("alkanes") x <- alkanes$x y <- alkanes$y set.seed(42) od <- enpls.od(x, y, reptimes = 40) od
data("alkanes") x <- alkanes$x y <- alkanes$y set.seed(42) od <- enpls.od(x, y, reptimes = 40) od
Print enspls.ad object.
## S3 method for class 'enspls.ad'
print(x, ...)
x | An object of class enspls.ad.
... | Additional parameters for print().
Nan Xiao <https://nanx.me>
See enspls.ad for model applicability domain evaluation with ensemble sparse partial least squares regressions.
data("logd1k") # remove low variance variables x <- logd1k$x[, -c(17, 52, 59)] y <- logd1k$y # training set x.tr <- x[1:300, ] y.tr <- y[1:300] # two test sets x.te <- list( "test.1" = x[301:400, ], "test.2" = x[401:500, ] ) y.te <- list( "test.1" = y[301:400], "test.2" = y[401:500] ) set.seed(42) ad <- enspls.ad( x.tr, y.tr, x.te, y.te, maxcomp = 3, alpha = c(0.3, 0.6, 0.9), space = "variable", method = "mc", ratio = 0.8, reptimes = 10 ) print(ad)
data("logd1k") # remove low variance variables x <- logd1k$x[, -c(17, 52, 59)] y <- logd1k$y # training set x.tr <- x[1:300, ] y.tr <- y[1:300] # two test sets x.te <- list( "test.1" = x[301:400, ], "test.2" = x[401:500, ] ) y.te <- list( "test.1" = y[301:400], "test.2" = y[401:500] ) set.seed(42) ad <- enspls.ad( x.tr, y.tr, x.te, y.te, maxcomp = 3, alpha = c(0.3, 0.6, 0.9), space = "variable", method = "mc", ratio = 0.8, reptimes = 10 ) print(ad)
Print coefficients of each model in the enspls.fit object.
## S3 method for class 'enspls.fit'
print(x, ...)
x | An object of class enspls.fit.
... | Additional parameters for print().
Nan Xiao <https://nanx.me>
See enspls.fit for fitting ensemble sparse partial least squares regression models.
data("logd1k") x <- logd1k$x y <- logd1k$y set.seed(42) fit <- enspls.fit( x, y, reptimes = 5, maxcomp = 3, alpha = c(0.3, 0.6, 0.9) ) print(fit)
data("logd1k") x <- logd1k$x y <- logd1k$y set.seed(42) fit <- enspls.fit( x, y, reptimes = 5, maxcomp = 3, alpha = c(0.3, 0.6, 0.9) ) print(fit)
Print enspls.fs object.
## S3 method for class 'enspls.fs'
print(x, sort = TRUE, nvar = NULL, ...)
x | An object of class enspls.fs.
sort | Should the variables be sorted in decreasing order of importance? Default is TRUE.
nvar | Number of top variables to show. Ignored if sort = FALSE.
... | Additional parameters for print().
Nan Xiao <https://nanx.me>
See enspls.fs for measuring feature importance with ensemble sparse partial least squares regressions.
data("logd1k") x <- logd1k$x y <- logd1k$y set.seed(42) fs <- enspls.fs( x, y, reptimes = 5, maxcomp = 3, alpha = c(0.3, 0.6, 0.9) ) print(fs, nvar = 10L)
data("logd1k") x <- logd1k$x y <- logd1k$y set.seed(42) fs <- enspls.fs( x, y, reptimes = 5, maxcomp = 3, alpha = c(0.3, 0.6, 0.9) ) print(fs, nvar = 10L)
Print enspls.od object.
## S3 method for class 'enspls.od'
print(x, ...)
x | An object of class enspls.od.
... | Additional parameters for print().
Nan Xiao <https://nanx.me>
See enspls.od for outlier detection with ensemble sparse partial least squares regressions.
data("logd1k") x <- logd1k$x y <- logd1k$y set.seed(42) od <- enspls.od( x, y, reptimes = 5, maxcomp = 3, alpha = c(0.3, 0.6, 0.9) ) print(od)
data("logd1k") x <- logd1k$x y <- logd1k$y set.seed(42) od <- enspls.od( x, y, reptimes = 5, maxcomp = 3, alpha = c(0.3, 0.6, 0.9) ) print(od)