Title: Multi-Step Adaptive Estimation Methods for Sparse Regressions
Description: Multi-step adaptive elastic-net (MSAENet) algorithm for feature selection in high-dimensional regressions proposed in Xiao and Xu (2015) <DOI:10.1080/00949655.2015.1016944>, with support for multi-step adaptive MCP-net (MSAMNet) and multi-step adaptive SCAD-net (MSASNet) methods.
Authors: Nan Xiao [aut, cre], Qing-Song Xu [aut]
Maintainer: Nan Xiao <[email protected]>
License: GPL (>= 3)
Version: 3.1.2.9000
Built: 2024-10-31 18:37:56 UTC
Source: https://github.com/nanxstats/msaenet
Adaptive Elastic-Net
aenet(
  x, y,
  family = c("gaussian", "binomial", "poisson", "cox"),
  init = c("enet", "ridge"),
  alphas = seq(0.05, 0.95, 0.05),
  tune = c("cv", "ebic", "bic", "aic"),
  nfolds = 5L,
  rule = c("lambda.min", "lambda.1se"),
  ebic.gamma = 1,
  scale = 1,
  lower.limits = -Inf,
  upper.limits = Inf,
  penalty.factor.init = rep(1, ncol(x)),
  seed = 1001,
  parallel = FALSE,
  verbose = FALSE
)
x: Data matrix.
y: Response vector if family is "gaussian", "binomial", or "poisson". If family is "cox", a survival response as produced by Surv in the survival package.
family: Model family, can be "gaussian", "binomial", "poisson", or "cox".
init: Type of the penalty used in the initial estimation step. Can be "enet" or "ridge".
alphas: Vector of candidate alpha values to tune. Default is seq(0.05, 0.95, 0.05).
tune: Parameter tuning method for each estimation step. Possible options are "cv", "ebic", "bic", and "aic". Default is "cv".
nfolds: Number of cross-validation folds when tune = "cv". Default is 5.
rule: Lambda selection criterion when tune = "cv". Can be "lambda.min" or "lambda.1se".
ebic.gamma: Parameter for Extended BIC penalizing the size of the model space when tune = "ebic". Default is 1.
scale: Scaling factor for adaptive weights: weights = coefficients^(-scale). Default is 1.
lower.limits: Lower limits for coefficients. Default is -Inf.
upper.limits: Upper limits for coefficients. Default is Inf.
penalty.factor.init: The multiplicative factor for the penalty applied to each coefficient in the initial estimation step. This is useful for incorporating prior information about variable weights, for example, emphasizing specific clinical variables. To make certain variables more likely to be selected, assign a smaller value. Default is rep(1, ncol(x)).
seed: Random seed for cross-validation fold division.
parallel: Logical. Enable parallel parameter tuning or not; default is FALSE.
verbose: Should we print out the estimation progress?

List of model coefficients, the glmnet model object, and the optimal parameter set.
Nan Xiao <https://nanx.me>
Zou, Hui, and Hao Helen Zhang. (2009). On the adaptive elastic-net with a diverging number of parameters. The Annals of Statistics 37(4), 1733–1751.
dat <- msaenet.sim.gaussian(
  n = 150, p = 500, rho = 0.6,
  coef = rep(1, 5), snr = 2,
  p.train = 0.7, seed = 1001
)

aenet.fit <- aenet(
  dat$x.tr, dat$y.tr,
  alphas = seq(0.2, 0.8, 0.2), seed = 1002
)

print(aenet.fit)
msaenet.nzv(aenet.fit)
msaenet.fp(aenet.fit, 1:5)
msaenet.tp(aenet.fit, 1:5)
aenet.pred <- predict(aenet.fit, dat$x.te)
msaenet.rmse(dat$y.te, aenet.pred)
plot(aenet.fit)
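The parallel argument only switches parallel tuning on; a parallel backend needs to be registered before the call. A minimal sketch, assuming the doParallel package is installed and provides the backend:

# Sketch: register a parallel backend, then enable parallel tuning.
# Assumes doParallel is installed; adjust the core count to your machine.
library(doParallel)
registerDoParallel(2L)

aenet.fit.par <- aenet(
  dat$x.tr, dat$y.tr,
  alphas = seq(0.2, 0.8, 0.2),
  parallel = TRUE, seed = 1002
)

stopImplicitCluster()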
Adaptive MCP-Net
amnet(
  x, y,
  family = c("gaussian", "binomial", "poisson", "cox"),
  init = c("mnet", "ridge"),
  gammas = 3,
  alphas = seq(0.05, 0.95, 0.05),
  tune = c("cv", "ebic", "bic", "aic"),
  nfolds = 5L,
  ebic.gamma = 1,
  scale = 1,
  eps = 1e-04,
  max.iter = 10000L,
  penalty.factor.init = rep(1, ncol(x)),
  seed = 1001,
  parallel = FALSE,
  verbose = FALSE
)
x: Data matrix.
y: Response vector if family is "gaussian", "binomial", or "poisson". If family is "cox", a survival response as produced by Surv in the survival package.
family: Model family, can be "gaussian", "binomial", "poisson", or "cox".
init: Type of the penalty used in the initial estimation step. Can be "mnet" or "ridge".
gammas: Vector of candidate gamma values (the MCP concavity parameter) to tune. Default is 3.
alphas: Vector of candidate alpha values to tune. Default is seq(0.05, 0.95, 0.05).
tune: Parameter tuning method for each estimation step. Possible options are "cv", "ebic", "bic", and "aic". Default is "cv".
nfolds: Number of cross-validation folds when tune = "cv". Default is 5.
ebic.gamma: Parameter for Extended BIC penalizing the size of the model space when tune = "ebic". Default is 1.
scale: Scaling factor for adaptive weights: weights = coefficients^(-scale). Default is 1.
eps: Convergence threshold to use in MCP-net.
max.iter: Maximum number of iterations to use in MCP-net.
penalty.factor.init: The multiplicative factor for the penalty applied to each coefficient in the initial estimation step. This is useful for incorporating prior information about variable weights, for example, emphasizing specific clinical variables. To make certain variables more likely to be selected, assign a smaller value. Default is rep(1, ncol(x)).
seed: Random seed for cross-validation fold division.
parallel: Logical. Enable parallel parameter tuning or not; default is FALSE.
verbose: Should we print out the estimation progress?

List of model coefficients, the ncvreg model object, and the optimal parameter set.
Nan Xiao <https://nanx.me>
dat <- msaenet.sim.gaussian(
  n = 150, p = 500, rho = 0.6,
  coef = rep(1, 5), snr = 2,
  p.train = 0.7, seed = 1001
)

amnet.fit <- amnet(
  dat$x.tr, dat$y.tr,
  alphas = seq(0.2, 0.8, 0.2), seed = 1002
)

print(amnet.fit)
msaenet.nzv(amnet.fit)
msaenet.fp(amnet.fit, 1:5)
msaenet.tp(amnet.fit, 1:5)
amnet.pred <- predict(amnet.fit, dat$x.te)
msaenet.rmse(dat$y.te, amnet.pred)
plot(amnet.fit)
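Because penalty.factor.init multiplies the initial-step penalty per coefficient, prior knowledge can be encoded by lowering the factors of variables that should be easier to select. A hedged sketch reusing dat from the example above (the factor value 0.5 is purely illustrative):

# Sketch: make the first five predictors more likely to be selected
# in the initial estimation step via smaller penalty factors.
pf <- rep(1, ncol(dat$x.tr))
pf[1:5] <- 0.5
amnet.fit.pf <- amnet(
  dat$x.tr, dat$y.tr,
  alphas = seq(0.2, 0.8, 0.2),
  penalty.factor.init = pf, seed = 1002
)
msaenet.nzv(amnet.fit.pf)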
Adaptive SCAD-Net
asnet(
  x, y,
  family = c("gaussian", "binomial", "poisson", "cox"),
  init = c("snet", "ridge"),
  gammas = 3.7,
  alphas = seq(0.05, 0.95, 0.05),
  tune = c("cv", "ebic", "bic", "aic"),
  nfolds = 5L,
  ebic.gamma = 1,
  scale = 1,
  eps = 1e-04,
  max.iter = 10000L,
  penalty.factor.init = rep(1, ncol(x)),
  seed = 1001,
  parallel = FALSE,
  verbose = FALSE
)
x: Data matrix.
y: Response vector if family is "gaussian", "binomial", or "poisson". If family is "cox", a survival response as produced by Surv in the survival package.
family: Model family, can be "gaussian", "binomial", "poisson", or "cox".
init: Type of the penalty used in the initial estimation step. Can be "snet" or "ridge".
gammas: Vector of candidate gamma values (the SCAD concavity parameter) to tune. Default is 3.7.
alphas: Vector of candidate alpha values to tune. Default is seq(0.05, 0.95, 0.05).
tune: Parameter tuning method for each estimation step. Possible options are "cv", "ebic", "bic", and "aic". Default is "cv".
nfolds: Number of cross-validation folds when tune = "cv". Default is 5.
ebic.gamma: Parameter for Extended BIC penalizing the size of the model space when tune = "ebic". Default is 1.
scale: Scaling factor for adaptive weights: weights = coefficients^(-scale). Default is 1.
eps: Convergence threshold to use in SCAD-net.
max.iter: Maximum number of iterations to use in SCAD-net.
penalty.factor.init: The multiplicative factor for the penalty applied to each coefficient in the initial estimation step. This is useful for incorporating prior information about variable weights, for example, emphasizing specific clinical variables. To make certain variables more likely to be selected, assign a smaller value. Default is rep(1, ncol(x)).
seed: Random seed for cross-validation fold division.
parallel: Logical. Enable parallel parameter tuning or not; default is FALSE.
verbose: Should we print out the estimation progress?

List of model coefficients, the ncvreg model object, and the optimal parameter set.
Nan Xiao <https://nanx.me>
dat <- msaenet.sim.gaussian(
  n = 150, p = 500, rho = 0.6,
  coef = rep(1, 5), snr = 2,
  p.train = 0.7, seed = 1001
)

asnet.fit <- asnet(
  dat$x.tr, dat$y.tr,
  alphas = seq(0.2, 0.8, 0.2), seed = 1002
)

print(asnet.fit)
msaenet.nzv(asnet.fit)
msaenet.fp(asnet.fit, 1:5)
msaenet.tp(asnet.fit, 1:5)
asnet.pred <- predict(asnet.fit, dat$x.te)
msaenet.rmse(dat$y.te, asnet.pred)
plot(asnet.fit)
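Cross-validation is the default tuning method, but the information criteria listed under tune skip fold splitting entirely. A sketch using the documented "ebic" option, reusing dat from the example above:

# Sketch: tune each estimation step with EBIC instead of cross-validation.
asnet.fit.ebic <- asnet(
  dat$x.tr, dat$y.tr,
  alphas = seq(0.2, 0.8, 0.2),
  tune = "ebic", ebic.gamma = 1, seed = 1002
)
msaenet.nzv(asnet.fit.ebic)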
Extract model coefficients from the final model in msaenet model objects.
## S3 method for class 'msaenet'
coef(object, ...)
object: An object of class msaenet produced by aenet(), amnet(), asnet(), msaenet(), msamnet(), or msasnet().
...: Additional parameters for coef (not used).
A numerical vector of model coefficients.
Nan Xiao <https://nanx.me>
dat <- msaenet.sim.gaussian(
  n = 150, p = 500, rho = 0.6,
  coef = rep(1, 5), snr = 2,
  p.train = 0.7, seed = 1001
)

msaenet.fit <- msaenet(
  dat$x.tr, dat$y.tr,
  alphas = seq(0.2, 0.8, 0.2),
  nsteps = 3L, seed = 1003
)

coef(msaenet.fit)
Multi-Step Adaptive Elastic-Net
msaenet(
  x, y,
  family = c("gaussian", "binomial", "poisson", "cox"),
  init = c("enet", "ridge"),
  alphas = seq(0.05, 0.95, 0.05),
  tune = c("cv", "ebic", "bic", "aic"),
  nfolds = 5L,
  rule = c("lambda.min", "lambda.1se"),
  ebic.gamma = 1,
  nsteps = 2L,
  tune.nsteps = c("max", "ebic", "bic", "aic"),
  ebic.gamma.nsteps = 1,
  scale = 1,
  lower.limits = -Inf,
  upper.limits = Inf,
  penalty.factor.init = rep(1, ncol(x)),
  seed = 1001,
  parallel = FALSE,
  verbose = FALSE
)
x: Data matrix.
y: Response vector if family is "gaussian", "binomial", or "poisson". If family is "cox", a survival response as produced by Surv in the survival package.
family: Model family, can be "gaussian", "binomial", "poisson", or "cox".
init: Type of the penalty used in the initial estimation step. Can be "enet" or "ridge".
alphas: Vector of candidate alpha values to tune. Default is seq(0.05, 0.95, 0.05).
tune: Parameter tuning method for each estimation step. Possible options are "cv", "ebic", "bic", and "aic". Default is "cv".
nfolds: Number of cross-validation folds when tune = "cv". Default is 5.
rule: Lambda selection criterion when tune = "cv". Can be "lambda.min" or "lambda.1se".
ebic.gamma: Parameter for Extended BIC penalizing the size of the model space when tune = "ebic". Default is 1.
nsteps: Maximum number of adaptive estimation steps. Must be at least 2, that is, at least two adaptive estimation steps are performed. Default is 2.
tune.nsteps: Optimal step number selection method (aggregate the optimal model from each step and compare). Options include "max" (select the final-step model directly), "ebic", "bic", and "aic". Default is "max".
ebic.gamma.nsteps: Parameter for Extended BIC penalizing the size of the model space when tune.nsteps = "ebic". Default is 1.
scale: Scaling factor for adaptive weights: weights = coefficients^(-scale). Default is 1.
lower.limits: Lower limits for coefficients. Default is -Inf.
upper.limits: Upper limits for coefficients. Default is Inf.
penalty.factor.init: The multiplicative factor for the penalty applied to each coefficient in the initial estimation step. This is useful for incorporating prior information about variable weights, for example, emphasizing specific clinical variables. To make certain variables more likely to be selected, assign a smaller value. Default is rep(1, ncol(x)).
seed: Random seed for cross-validation fold division.
parallel: Logical. Enable parallel parameter tuning or not; default is FALSE.
verbose: Should we print out the estimation progress?

List of model coefficients, the glmnet model object, and the optimal parameter set.
Nan Xiao <https://nanx.me>
Nan Xiao and Qing-Song Xu. (2015). Multi-step adaptive elastic-net: reducing false positives in high-dimensional variable selection. Journal of Statistical Computation and Simulation 85(18), 3755–3765.
dat <- msaenet.sim.gaussian(
  n = 150, p = 500, rho = 0.6,
  coef = rep(1, 5), snr = 2,
  p.train = 0.7, seed = 1001
)

msaenet.fit <- msaenet(
  dat$x.tr, dat$y.tr,
  alphas = seq(0.2, 0.8, 0.2),
  nsteps = 3L, seed = 1003
)

print(msaenet.fit)
msaenet.nzv(msaenet.fit)
msaenet.fp(msaenet.fit, 1:5)
msaenet.tp(msaenet.fit, 1:5)
msaenet.pred <- predict(msaenet.fit, dat$x.te)
msaenet.rmse(dat$y.te, msaenet.pred)
plot(msaenet.fit)
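All four options of the family argument follow the same calling pattern; only the response changes. A sketch for family = "cox", assuming the survival response produced by msaenet.sim.cox (documented below):

# Sketch: multi-step adaptive elastic-net for a Cox model.
dat.cox <- msaenet.sim.cox(
  n = 300, p = 500, rho = 0.6,
  coef = rep(1, 10), snr = 3,
  p.train = 0.7, seed = 1001
)

fit.cox <- msaenet(
  dat.cox$x.tr, dat.cox$y.tr,
  family = "cox",
  alphas = seq(0.2, 0.8, 0.2),
  nsteps = 3L, seed = 1003
)

msaenet.nzv(fit.cox)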
Get the number of false negative selections from msaenet model objects, given the indices of true variables (if known).
msaenet.fn(object, true.idx)
object: An object of class msaenet produced by aenet(), amnet(), asnet(), msaenet(), msamnet(), or msasnet().
true.idx: Vector. Indices of true variables.
Number of false negative variables in the model.
Nan Xiao <https://nanx.me>
dat <- msaenet.sim.gaussian(
  n = 150, p = 500, rho = 0.6,
  coef = rep(1, 5), snr = 2,
  p.train = 0.7, seed = 1001
)

msaenet.fit <- msaenet(
  dat$x.tr, dat$y.tr,
  alphas = seq(0.2, 0.8, 0.2),
  nsteps = 3L, seed = 1003
)

msaenet.fn(msaenet.fit, 1:5)
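Together with msaenet.tp and msaenet.fp, this yields a complete selection accuracy summary when the true support is known. A small sketch continuing the example above:

# Sketch: selection accuracy summary for the known true support (variables 1-5).
c(
  tp = msaenet.tp(msaenet.fit, 1:5),
  fp = msaenet.fp(msaenet.fit, 1:5),
  fn = msaenet.fn(msaenet.fit, 1:5)
)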
Get the number of false positive selections from msaenet model objects, given the indices of true variables (if known).
msaenet.fp(object, true.idx)
object: An object of class msaenet produced by aenet(), amnet(), asnet(), msaenet(), msamnet(), or msasnet().
true.idx: Vector. Indices of true variables.
Number of false positive variables in the model.
Nan Xiao <https://nanx.me>
dat <- msaenet.sim.gaussian(
  n = 150, p = 500, rho = 0.6,
  coef = rep(1, 5), snr = 2,
  p.train = 0.7, seed = 1001
)

msaenet.fit <- msaenet(
  dat$x.tr, dat$y.tr,
  alphas = seq(0.2, 0.8, 0.2),
  nsteps = 3L, seed = 1003
)

msaenet.fp(msaenet.fit, 1:5)
Compute mean absolute error (MAE).
msaenet.mae(yreal, ypred)
yreal: Vector. True response.
ypred: Vector. Predicted response.
MAE
Nan Xiao <https://nanx.me>
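All of the error metric helpers share the signature (yreal, ypred), so they work on any pair of observed and predicted vectors. A minimal usage sketch on synthetic data:

# Sketch: MAE between a true response and a noisy prediction.
set.seed(42)
y <- rnorm(100)
yhat <- y + rnorm(100, sd = 0.5)
msaenet.mae(y, yhat)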
Compute mean squared error (MSE).
msaenet.mse(yreal, ypred)
yreal: Vector. True response.
ypred: Vector. Predicted response.
MSE
Nan Xiao <https://nanx.me>
Get the indices of non-zero variables from msaenet model objects.
msaenet.nzv(object)
object: An object of class msaenet produced by aenet(), amnet(), asnet(), msaenet(), msamnet(), or msasnet().
Indices vector of non-zero variables in the model.
Nan Xiao <https://nanx.me>
dat <- msaenet.sim.gaussian(
  n = 150, p = 500, rho = 0.6,
  coef = rep(1, 5), snr = 2,
  p.train = 0.7, seed = 1001
)

msaenet.fit <- msaenet(
  dat$x.tr, dat$y.tr,
  alphas = seq(0.2, 0.8, 0.2),
  nsteps = 3L, seed = 1003
)

msaenet.nzv(msaenet.fit)

# Coefficients of non-zero variables
coef(msaenet.fit)[msaenet.nzv(msaenet.fit)]
Get the indices of non-zero variables in all steps from msaenet model objects.
msaenet.nzv.all(object)
object: An object of class msaenet produced by aenet(), amnet(), asnet(), msaenet(), msamnet(), or msasnet().
List containing indices vectors of non-zero variables in all steps.
Nan Xiao <https://nanx.me>
dat <- msaenet.sim.gaussian(
  n = 150, p = 500, rho = 0.6,
  coef = rep(1, 5), snr = 2,
  p.train = 0.7, seed = 1001
)

msaenet.fit <- msaenet(
  dat$x.tr, dat$y.tr,
  alphas = seq(0.2, 0.8, 0.2),
  nsteps = 3L, seed = 1003
)

msaenet.nzv.all(msaenet.fit)
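One practical use is tracking how aggressively each adaptive step prunes the model. A sketch continuing the example above:

# Sketch: number of non-zero variables retained at each adaptive step.
sapply(msaenet.nzv.all(msaenet.fit), length)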
Compute root mean squared error (RMSE).
msaenet.rmse(yreal, ypred)
yreal: Vector. True response.
ypred: Vector. Predicted response.
RMSE
Nan Xiao <https://nanx.me>
Compute root mean squared logarithmic error (RMSLE).
msaenet.rmsle(yreal, ypred)
yreal: Vector. True response.
ypred: Vector. Predicted response.
RMSLE
Nan Xiao <https://nanx.me>
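Since RMSLE works on a logarithmic scale, it is meant for non-negative responses such as counts. A minimal sketch, assuming both vectors are non-negative:

# Sketch: RMSLE on non-negative (count-like) responses.
set.seed(42)
y <- rpois(100, lambda = 10)
yhat <- pmax(y + sample(-2:2, 100, replace = TRUE), 0)
msaenet.rmsle(y, yhat)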
Generate simulation data for benchmarking sparse logistic regression models.
msaenet.sim.binomial( n = 300, p = 500, rho = 0.5, coef = rep(0.2, 50), snr = 1, p.train = 0.7, seed = 1001 )
n: Number of observations.
p: Number of variables.
rho: Correlation base for generating correlated variables.
coef: Vector of non-zero coefficients.
snr: Signal-to-noise ratio (SNR).
p.train: Percentage of training set.
seed: Random seed for reproducibility.
List of x.tr, x.te, y.tr, and y.te.
Nan Xiao <https://nanx.me>
dat <- msaenet.sim.binomial(
  n = 300, p = 500, rho = 0.6,
  coef = rep(1, 10), snr = 3,
  p.train = 0.7, seed = 1001
)

dim(dat$x.tr)
dim(dat$x.te)
table(dat$y.tr)
table(dat$y.te)
Generate simulation data for benchmarking sparse Cox regression models.
msaenet.sim.cox( n = 300, p = 500, rho = 0.5, coef = rep(0.2, 50), snr = 1, p.train = 0.7, seed = 1001 )
n: Number of observations.
p: Number of variables.
rho: Correlation base for generating correlated variables.
coef: Vector of non-zero coefficients.
snr: Signal-to-noise ratio (SNR).
p.train: Percentage of training set.
seed: Random seed for reproducibility.
List of x.tr, x.te, y.tr, and y.te.
Nan Xiao <https://nanx.me>
Simon, N., Friedman, J., Hastie, T., & Tibshirani, R. (2011). Regularization Paths for Cox's Proportional Hazards Model via Coordinate Descent. Journal of Statistical Software, 39(5), 1–13.
dat <- msaenet.sim.cox(
  n = 300, p = 500, rho = 0.6,
  coef = rep(1, 10), snr = 3,
  p.train = 0.7, seed = 1001
)

dim(dat$x.tr)
dim(dat$x.te)
dim(dat$y.tr)
dim(dat$y.te)
Generate simulation data (Gaussian case) following the settings in Xiao and Xu (2015).
msaenet.sim.gaussian( n = 300, p = 500, rho = 0.5, coef = rep(0.2, 50), snr = 1, p.train = 0.7, seed = 1001 )
n: Number of observations.
p: Number of variables.
rho: Correlation base for generating correlated variables.
coef: Vector of non-zero coefficients.
snr: Signal-to-noise ratio (SNR). SNR is defined as Var(E(y | X)) / Var(y - E(y | X)), that is, the variance of the signal over the variance of the noise.
p.train: Percentage of training set.
seed: Random seed for reproducibility.
List of x.tr, x.te, y.tr, and y.te.
Nan Xiao <https://nanx.me>
Nan Xiao and Qing-Song Xu. (2015). Multi-step adaptive elastic-net: reducing false positives in high-dimensional variable selection. Journal of Statistical Computation and Simulation 85(18), 3755–3765.
dat <- msaenet.sim.gaussian(
  n = 300, p = 500, rho = 0.6,
  coef = rep(1, 10), snr = 3,
  p.train = 0.7, seed = 1001
)

dim(dat$x.tr)
dim(dat$x.te)
Generate simulation data for benchmarking sparse Poisson regression models.
msaenet.sim.poisson( n = 300, p = 500, rho = 0.5, coef = rep(0.2, 50), snr = 1, p.train = 0.7, seed = 1001 )
n: Number of observations.
p: Number of variables.
rho: Correlation base for generating correlated variables.
coef: Vector of non-zero coefficients.
snr: Signal-to-noise ratio (SNR).
p.train: Percentage of training set.
seed: Random seed for reproducibility.
List of x.tr, x.te, y.tr, and y.te.
Nan Xiao <https://nanx.me>
dat <- msaenet.sim.poisson(
  n = 300, p = 500, rho = 0.6,
  coef = rep(1, 10), snr = 3,
  p.train = 0.7, seed = 1001
)

dim(dat$x.tr)
dim(dat$x.te)
Get the number of true positive selections from msaenet model objects, given the indices of true variables (if known).
msaenet.tp(object, true.idx)
object: An object of class msaenet produced by aenet(), amnet(), asnet(), msaenet(), msamnet(), or msasnet().
true.idx: Vector. Indices of true variables.
Number of true positive variables in the model.
Nan Xiao <https://nanx.me>
dat <- msaenet.sim.gaussian(
  n = 150, p = 500, rho = 0.6,
  coef = rep(1, 5), snr = 2,
  p.train = 0.7, seed = 1001
)

msaenet.fit <- msaenet(
  dat$x.tr, dat$y.tr,
  alphas = seq(0.2, 0.8, 0.2),
  nsteps = 3L, seed = 1003
)

msaenet.tp(msaenet.fit, 1:5)
Multi-Step Adaptive MCP-Net
msamnet(
  x, y,
  family = c("gaussian", "binomial", "poisson", "cox"),
  init = c("mnet", "ridge"),
  gammas = 3,
  alphas = seq(0.05, 0.95, 0.05),
  tune = c("cv", "ebic", "bic", "aic"),
  nfolds = 5L,
  ebic.gamma = 1,
  nsteps = 2L,
  tune.nsteps = c("max", "ebic", "bic", "aic"),
  ebic.gamma.nsteps = 1,
  scale = 1,
  eps = 1e-04,
  max.iter = 10000L,
  penalty.factor.init = rep(1, ncol(x)),
  seed = 1001,
  parallel = FALSE,
  verbose = FALSE
)
x: Data matrix.
y: Response vector if family is "gaussian", "binomial", or "poisson". If family is "cox", a survival response as produced by Surv in the survival package.
family: Model family, can be "gaussian", "binomial", "poisson", or "cox".
init: Type of the penalty used in the initial estimation step. Can be "mnet" or "ridge".
gammas: Vector of candidate gamma values (the MCP concavity parameter) to tune. Default is 3.
alphas: Vector of candidate alpha values to tune. Default is seq(0.05, 0.95, 0.05).
tune: Parameter tuning method for each estimation step. Possible options are "cv", "ebic", "bic", and "aic". Default is "cv".
nfolds: Number of cross-validation folds when tune = "cv". Default is 5.
ebic.gamma: Parameter for Extended BIC penalizing the size of the model space when tune = "ebic". Default is 1.
nsteps: Maximum number of adaptive estimation steps. Must be at least 2, that is, at least two adaptive estimation steps are performed. Default is 2.
tune.nsteps: Optimal step number selection method (aggregate the optimal model from each step and compare). Options include "max" (select the final-step model directly), "ebic", "bic", and "aic". Default is "max".
ebic.gamma.nsteps: Parameter for Extended BIC penalizing the size of the model space when tune.nsteps = "ebic". Default is 1.
scale: Scaling factor for adaptive weights: weights = coefficients^(-scale). Default is 1.
eps: Convergence threshold to use in MCP-net.
max.iter: Maximum number of iterations to use in MCP-net.
penalty.factor.init: The multiplicative factor for the penalty applied to each coefficient in the initial estimation step. This is useful for incorporating prior information about variable weights, for example, emphasizing specific clinical variables. To make certain variables more likely to be selected, assign a smaller value. Default is rep(1, ncol(x)).
seed: Random seed for cross-validation fold division.
parallel: Logical. Enable parallel parameter tuning or not; default is FALSE.
verbose: Should we print out the estimation progress?

List of model coefficients, the ncvreg model object, and the optimal parameter set.
Nan Xiao <https://nanx.me>
dat <- msaenet.sim.gaussian(
  n = 150, p = 500, rho = 0.6,
  coef = rep(1, 5), snr = 2,
  p.train = 0.7, seed = 1001
)

msamnet.fit <- msamnet(
  dat$x.tr, dat$y.tr,
  alphas = seq(0.3, 0.9, 0.3),
  nsteps = 3L, seed = 1003
)

print(msamnet.fit)
msaenet.nzv(msamnet.fit)
msaenet.fp(msamnet.fit, 1:5)
msaenet.tp(msamnet.fit, 1:5)
msamnet.pred <- predict(msamnet.fit, dat$x.te)
msaenet.rmse(dat$y.te, msamnet.pred)
plot(msamnet.fit)
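Because gammas accepts a vector of candidates, the MCP concavity parameter can be tuned jointly with alpha instead of being fixed at the default 3. A sketch reusing dat from the example above (candidate values are illustrative):

# Sketch: tune the MCP gamma jointly with alpha.
msamnet.fit.g <- msamnet(
  dat$x.tr, dat$y.tr,
  gammas = c(2, 3, 4),
  alphas = seq(0.3, 0.9, 0.3),
  nsteps = 3L, seed = 1003
)
print(msamnet.fit.g)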
Multi-Step Adaptive SCAD-Net
msasnet(
  x, y,
  family = c("gaussian", "binomial", "poisson", "cox"),
  init = c("snet", "ridge"),
  gammas = 3.7,
  alphas = seq(0.05, 0.95, 0.05),
  tune = c("cv", "ebic", "bic", "aic"),
  nfolds = 5L,
  ebic.gamma = 1,
  nsteps = 2L,
  tune.nsteps = c("max", "ebic", "bic", "aic"),
  ebic.gamma.nsteps = 1,
  scale = 1,
  eps = 1e-04,
  max.iter = 10000L,
  penalty.factor.init = rep(1, ncol(x)),
  seed = 1001,
  parallel = FALSE,
  verbose = FALSE
)
x: Data matrix.
y: Response vector if family is "gaussian", "binomial", or "poisson". If family is "cox", a survival response as produced by Surv in the survival package.
family: Model family, can be "gaussian", "binomial", "poisson", or "cox".
init: Type of the penalty used in the initial estimation step. Can be "snet" or "ridge".
gammas: Vector of candidate gamma values (the SCAD concavity parameter) to tune. Default is 3.7.
alphas: Vector of candidate alpha values to tune. Default is seq(0.05, 0.95, 0.05).
tune: Parameter tuning method for each estimation step. Possible options are "cv", "ebic", "bic", and "aic". Default is "cv".
nfolds: Number of cross-validation folds when tune = "cv". Default is 5.
ebic.gamma: Parameter for Extended BIC penalizing the size of the model space when tune = "ebic". Default is 1.
nsteps: Maximum number of adaptive estimation steps. Must be at least 2, that is, at least two adaptive estimation steps are performed. Default is 2.
tune.nsteps: Optimal step number selection method (aggregate the optimal model from each step and compare). Options include "max" (select the final-step model directly), "ebic", "bic", and "aic". Default is "max".
ebic.gamma.nsteps: Parameter for Extended BIC penalizing the size of the model space when tune.nsteps = "ebic". Default is 1.
scale: Scaling factor for adaptive weights: weights = coefficients^(-scale). Default is 1.
eps: Convergence threshold to use in SCAD-net.
max.iter: Maximum number of iterations to use in SCAD-net.
penalty.factor.init: The multiplicative factor for the penalty applied to each coefficient in the initial estimation step. This is useful for incorporating prior information about variable weights, for example, emphasizing specific clinical variables. To make certain variables more likely to be selected, assign a smaller value. Default is rep(1, ncol(x)).
seed: Random seed for cross-validation fold division.
parallel: Logical. Enable parallel parameter tuning or not; default is FALSE.
verbose: Should we print out the estimation progress?

List of model coefficients, the ncvreg model object, and the optimal parameter set.
Nan Xiao <https://nanx.me>
dat <- msaenet.sim.gaussian(
  n = 150, p = 500, rho = 0.6,
  coef = rep(1, 5), snr = 2,
  p.train = 0.7, seed = 1001
)

msasnet.fit <- msasnet(
  dat$x.tr, dat$y.tr,
  alphas = seq(0.3, 0.9, 0.3),
  nsteps = 3L, seed = 1003
)

print(msasnet.fit)
msaenet.nzv(msasnet.fit)
msaenet.fp(msasnet.fit, 1:5)
msaenet.tp(msasnet.fit, 1:5)
msasnet.pred <- predict(msasnet.fit, dat$x.te)
msaenet.rmse(dat$y.te, msasnet.pred)
plot(msasnet.fit)
Plot msaenet model objects.
## S3 method for class 'msaenet'
plot(
  x,
  type = c("coef", "criterion", "dotplot"),
  nsteps = NULL,
  highlight = TRUE,
  col = NULL,
  label = FALSE,
  label.vars = NULL,
  label.pos = 2,
  label.offset = 0.3,
  label.cex = 0.7,
  label.srt = 90,
  xlab = NULL,
  ylab = NULL,
  abs = FALSE,
  ...
)
x: An object of class msaenet produced by aenet(), amnet(), asnet(), msaenet(), msamnet(), or msasnet().
type: Plot type, can be "coef", "criterion", or "dotplot". Default is "coef".
nsteps: Maximum number of estimation steps to plot. Default is to plot all steps.
highlight: Should we highlight the "optimal" step according to the criterion? Default is TRUE.
col: Color palette to use for the coefficient paths. If it is NULL, a default palette is used.
label: Should we label all the non-zero variables of the optimal step in the coefficient plot or the dot plot? Default is FALSE.
label.vars: Labels to use for all the variables if label = TRUE. Default is NULL.
label.pos: Position of the labels. See argument pos in text(). Default is 2.
label.offset: Offset of the labels. See argument offset in text(). Default is 0.3.
label.cex: Character expansion factor of the labels. See argument cex in text(). Default is 0.7.
label.srt: Label rotation in degrees for the Cleveland dot plot. Default is 90.
xlab: Title for the x axis. If NULL, a default title is used.
ylab: Title for the y axis. If NULL, a default title is used.
abs: Should we plot the absolute values of the coefficients instead of the raw coefficients in the Cleveland dot plot? Default is FALSE.
...: Other parameters (not used).
Nan Xiao <https://nanx.me>
dat <- msaenet.sim.gaussian(
  n = 150, p = 500, rho = 0.6,
  coef = rep(1, 5), snr = 2,
  p.train = 0.7, seed = 1001
)

fit <- msaenet(
  dat$x.tr, dat$y.tr,
  alphas = seq(0.2, 0.8, 0.2),
  nsteps = 5L, tune.nsteps = "bic",
  seed = 1002
)

plot(fit)
plot(fit, label = TRUE)
plot(fit, label = TRUE, nsteps = 5)
plot(fit, type = "criterion")
plot(fit, type = "criterion", nsteps = 5)
plot(fit, type = "dotplot", label = TRUE)
plot(fit, type = "dotplot", label = TRUE, abs = TRUE)
Make predictions on new data from a msaenet model object.
## S3 method for class 'msaenet'
predict(object, newx, ...)
object: An object of class msaenet produced by aenet(), amnet(), asnet(), msaenet(), msamnet(), or msasnet().
newx: New data to predict with.
...: Additional parameters, particularly the prediction type, passed to the predict method of the underlying glmnet or ncvreg model object.
Numeric matrix of the predicted values.
Nan Xiao <https://nanx.me>
dat <- msaenet.sim.gaussian(
  n = 150, p = 500, rho = 0.6,
  coef = rep(1, 5), snr = 2,
  p.train = 0.7, seed = 1001
)

msaenet.fit <- msaenet(
  dat$x.tr, dat$y.tr,
  alphas = seq(0.2, 0.8, 0.2),
  nsteps = 3L, seed = 1003
)

msaenet.pred <- predict(msaenet.fit, dat$x.te)
msaenet.rmse(dat$y.te, msaenet.pred)
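For non-Gaussian families, the ... argument forwards prediction options, notably the prediction type, to the predict method of the underlying model. A hedged sketch for a binomial fit, assuming type = "response" is passed through to the underlying glmnet model to obtain class probabilities:

# Sketch: class probabilities from a binomial msaenet fit.
dat.bin <- msaenet.sim.binomial(
  n = 300, p = 500, rho = 0.6,
  coef = rep(1, 10), snr = 3,
  p.train = 0.7, seed = 1001
)

fit.bin <- msaenet(
  dat.bin$x.tr, dat.bin$y.tr,
  family = "binomial",
  alphas = seq(0.2, 0.8, 0.2),
  nsteps = 3L, seed = 1003
)

prob <- predict(fit.bin, dat.bin$x.te, type = "response")
head(prob)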
Print msaenet model objects (currently, only printing the model information of the final step).
## S3 method for class 'msaenet'
print(x, ...)
x: An object of class msaenet produced by aenet(), amnet(), asnet(), msaenet(), msamnet(), or msasnet().
...: Additional parameters for print (not used).
Nan Xiao <https://nanx.me>
dat <- msaenet.sim.gaussian(
  n = 150, p = 500, rho = 0.6,
  coef = rep(1, 5), snr = 2,
  p.train = 0.7, seed = 1001
)

msaenet.fit <- msaenet(
  dat$x.tr, dat$y.tr,
  alphas = seq(0.2, 0.8, 0.2),
  nsteps = 3L, seed = 1003
)

print(msaenet.fit)