
ggsci - Scientific Journal and Sci-Fi Themed Color Palettes for 'ggplot2'
A collection of 'ggplot2' color palettes inspired by plots in scientific journals, data visualization libraries, science fiction movies, and TV shows.
color-palettes, data-visualization, ggplot2, ggsci, sci-fi, scientific-journals, visualization
18.15 score 733 stars 506 dependents 19k scripts 236k downloads
protr - Generating Various Numerical Representation Schemes for Protein Sequences
Comprehensive toolkit for generating various numerical features of protein sequences described in Xiao et al. (2015) <DOI:10.1093/bioinformatics/btv042>. For full functionality, the software 'ncbi-blast+' is needed, see <https://blast.ncbi.nlm.nih.gov/doc/blast-help/downloadblastdata.html> for more information.
bioinformatics, feature-engineering, feature-extraction, machine-learning, peptides, protein-sequences, sequence-analysis
9.15 score 53 stars 2 dependents 221 scripts 529 downloads
hdnom - Benchmarking and Visualization Toolkit for Penalized Cox Models
Creates nomogram visualizations for penalized Cox regression models, with support for reproducible survival model building, validation, calibration, and comparison for high-dimensional data.
benchmark, high-dimensional-data, linear-regression, nomogram-visualization, penalized-cox-models, survival-analysis, openblas
8.83 score 47 stars 3 dependents 71 scripts 374 downloads
msaenet - Multi-Step Adaptive Estimation Methods for Sparse Regressions
Multi-step adaptive elastic-net (MSAENet) algorithm for feature selection in high-dimensional regressions proposed in Xiao and Xu (2015) <DOI:10.1080/00949655.2015.1016944>, with support for multi-step adaptive MCP-net (MSAMNet) and multi-step adaptive SCAD-net (MSASNet) methods.
false-positive-control, high-dimensional-data, linear-regression, machine-learning, variable-selection
7.44 score 13 stars 6 dependents 65 scripts 3.6k downloads
Rcpi - Molecular Informatics Toolkit for Compound-Protein Interaction in Drug Discovery
A molecular informatics toolkit with an integration of bioinformatics and chemoinformatics tools for drug discovery.
software, dataimport, datarepresentation, featureextraction, cheminformatics, biomedicalinformatics, proteomics, go, systemsbiology, bioconductor, bioinformatics, drug-discovery, feature-extraction, fingerprint, molecular-descriptors, protein-sequences
7.37 score 39 stars 30 scripts 476 downloads
liftr - Containerize R Markdown Documents for Continuous Reproducibility
Persistent reproducible reporting by containerization of R Markdown documents.
containerization, docker, dynamic-documents, knitr, liftr, reproducible-research, reproducible-science, rmarkdown, statistical-computing
6.86 score 173 stars 14 scripts 279 downloads
enpls - Ensemble Partial Least Squares Regression
An algorithmic framework for measuring feature importance, outlier detection, model applicability domain evaluation, and ensemble predictive modeling with (sparse) partial least squares regressions.
chemometrics, dimensionality-reduction, ensemble-learning, machine-learning, outlier-detection, partial-least-squares-regression
5.56 score 18 stars 40 scripts 288 downloads
pkgdown.offline - Build 'pkgdown' Websites Offline
Provides support for building 'pkgdown' websites without an internet connection. Works by bundling cached dependencies and implementing drop-in replacements for key 'pkgdown' functions. Enables package documentation websites to be built in environments where internet access is unavailable or restricted. For more details on generating 'pkgdown' websites, see Wickham et al. (2025) <doi:10.32614/CRAN.package.pkgdown>.
documentation-tool, offline, offline-build, pkgdown
5.48 score 4 stars 1 script 217 downloads
stackgbm - Stacked Gradient Boosting Machines
A minimalist implementation of model stacking by Wolpert (1992) <doi:10.1016/S0893-6080(05)80023-1> for boosted tree models. A classic two-layer stacking model is implemented, where the first layer generates features using gradient boosting trees, and the second layer employs a logistic regression model that uses these features as inputs. Utilities for training the base models and parameter tuning are provided, allowing users to experiment with different ensemble configurations easily. It aims to provide a simple and efficient way to combine multiple gradient boosting models to improve predictive model performance and robustness.
automl, catboost, decision-trees, ensemble-learning, gbdt, gbm, gradient-boosting, lightgbm, machine-learning, model-stacking, xgboost
5.10 score 25 stars 3 scripts 598 downloads
grex - Gene ID Mapping for Genotype-Tissue Expression (GTEx) Data
Convert 'Ensembl' gene identifiers from Genotype-Tissue Expression (GTEx) data to identifiers in other annotation systems, including 'Entrez', 'HGNC', and 'UniProt'.
bioinformatics, gene-expression, genotype-tissue-expression, gtex
4.94 score 8 stars 22 scripts 168 downloads
ssw - Striped Smith-Waterman Algorithm for Sequence Alignment using SIMD
Provides an R interface for 'SSW' (Striped Smith-Waterman) via its 'Python' binding 'ssw-py'. 'SSW' is a fast 'C' and 'C++' implementation of the Smith-Waterman algorithm for pairwise sequence alignment using Single-Instruction-Multiple-Data (SIMD) instructions. 'SSW' enhances the standard algorithm by efficiently returning alignment information and suboptimal alignment scores. The core 'SSW' library offers performance improvements for various bioinformatics tasks, including protein database searches, short-read alignments, primary and split-read mapping, structural variant detection, and read-overlap graph generation. These features make 'SSW' particularly useful for genomic applications. Zhao et al. (2013) <doi:10.1371/journal.pone.0082138> developed the original 'C' and 'C++' implementation.
bioinformatics, reticulate, sequence-alignment, simd, smith-waterman
4.48 score 6 stars 552 downloads
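
For readers unfamiliar with the Smith-Waterman algorithm named in the ssw entry, the recurrence can be sketched in plain Python. This is a scalar textbook version with a linear gap penalty, not the striped SIMD implementation in 'SSW' and not the API of the ssw package or 'ssw-py'. The function name `smith_waterman` and the scoring defaults are assumptions chosen for illustration.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Return (best_score, aligned_a, aligned_b) for one best local alignment.

    Illustrative scalar version with a linear gap penalty (assumed scoring).
    """
    rows, cols = len(a) + 1, len(b) + 1
    # H[i][j] holds the best score of a local alignment ending at a[i-1], b[j-1].
    H = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)

    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Clamping at 0 is what makes the alignment local rather than global.
            H[i][j] = max(0, diag, H[i - 1][j] + gap, H[i][j - 1] + gap)
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    # Trace back from the highest-scoring cell until a zero cell is reached.
    i, j = best_pos
    out_a, out_b = [], []
    while i > 0 and j > 0 and H[i][j] > 0:
        diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
        if H[i][j] == diag:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1

    return best, "".join(reversed(out_a)), "".join(reversed(out_b))


score, top, bottom = smith_waterman("GGACGTCC", "TTACGTAA")
print(score, top, bottom)
```

The sketch fills the full score matrix one cell at a time, which costs time proportional to the product of the two sequence lengths. The entry above describes 'SSW' as using SIMD instructions to speed up this computation.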
oneclust - Maximum Homogeneity Clustering for Univariate Data
Maximum homogeneity clustering algorithm for one-dimensional data described in W. D. Fisher (1958) <doi:10.1080/01621459.1958.10501479> via dynamic programming.
clustering-algorithm, feature-engineering, homogeneity, peak-calling, univariate-data, cpp
4.40 score 5 stars 2 scripts 184 downloads
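
The dynamic-programming idea behind maximum homogeneity clustering of one-dimensional data can be sketched as follows. This is a minimal illustration, not the oneclust implementation or its API. It assumes the homogeneity criterion is the unweighted within-group sum of squared deviations, and the name `fisher_cluster` is hypothetical.

```python
def fisher_cluster(values, k):
    """Split sorted values into k contiguous groups minimizing the total
    within-group sum of squared deviations. Returns (groups, total_cost)."""
    xs = sorted(values)
    n = len(xs)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and the number of values")

    # Prefix sums give the cost of any contiguous group in O(1).
    s1, s2 = [0.0], [0.0]
    for x in xs:
        s1.append(s1[-1] + x)
        s2.append(s2[-1] + x * x)

    def cost(i, j):
        """Sum of squared deviations from the mean for xs[i:j], with j > i."""
        total = s1[j] - s1[i]
        return (s2[j] - s2[i]) - total * total / (j - i)

    inf = float("inf")
    # best[g][j]: minimal cost of splitting xs[:j] into g groups.
    best = [[inf] * (n + 1) for _ in range(k + 1)]
    # cut[g][j]: start index of the last group in that optimal split.
    cut = [[0] * (n + 1) for _ in range(k + 1)]
    best[0][0] = 0.0

    for g in range(1, k + 1):
        for j in range(g, n + 1):
            for i in range(g - 1, j):
                candidate = best[g - 1][i] + cost(i, j)
                if candidate < best[g][j]:
                    best[g][j] = candidate
                    cut[g][j] = i

    # Recover the groups by walking the stored cut points backwards.
    groups, j = [], n
    for g in range(k, 0, -1):
        i = cut[g][j]
        groups.append(xs[i:j])
        j = i
    groups.reverse()
    return groups, best[k][n]


groups, total = fisher_cluster([1, 2, 3, 10, 11, 12, 50, 51], 3)
print(groups, total)
```

Because the data are one-dimensional, an optimal partition always consists of contiguous runs of the sorted values, which is what lets the search reduce to choosing cut points.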
OHPL - Ordered Homogeneity Pursuit Lasso for Group Variable Selection
Ordered homogeneity pursuit lasso (OHPL) algorithm for group variable selection proposed in Lin et al. (2017) <DOI:10.1016/j.chemolab.2017.07.004>. The OHPL method exploits the homogeneity structure in high-dimensional data and enjoys the grouping effect to select groups of important variables automatically. This feature makes it particularly useful for high-dimensional datasets with strongly correlated variables, such as spectroscopic data.
chemometrics, high-dimensional-data, homogeneity-pursuit, lasso, partial-least-squares-regression, spectroscopy, variable-selection
4.15 score 7 stars 9 scripts 501 downloads
RECA - Relevant Component Analysis for Supervised Distance Metric Learning
Relevant Component Analysis (RCA) tries to find a linear transformation of the feature space such that the effect of irrelevant variability is reduced in the transformed space.
machine-learning, metric-learning
4.02 score 7 stars 5 scripts 168 downloads