R-universe - nanxstats (Nan Xiao)

ggsci - Scientific Journal and Sci-Fi Themed Color Palettes for 'ggplot2'

A collection of 'ggplot2' color palettes inspired by plots in scientific journals, data visualization libraries, science fiction movies, and TV shows.

color-palettes, data-visualization, ggplot2, ggsci, sci-fi, scientific-journals, visualization

18.21 score 737 stars 518 dependents 24k scripts 186k downloads

protr - Generating Various Numerical Representation Schemes for Protein Sequences

Comprehensive toolkit for generating various numerical features of protein sequences described in Xiao et al. (2015) <DOI:10.1093/bioinformatics/btv042>. Full functionality requires the software 'ncbi-blast+'; see <https://blast.ncbi.nlm.nih.gov/doc/blast-help/downloadblastdata.html> for more information.

bioinformatics, feature-engineering, feature-extraction, machine-learning, peptides, protein-sequences, sequence-analysis

9.18 score 53 stars 2 dependents 237 scripts 595 downloads
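
One of the simplest numerical representations of a protein sequence is its amino acid composition: the fraction of each of the 20 standard residues. The sketch below illustrates that idea in plain Python. It is not protr's code or API; the function name `amino_acid_composition` and the error handling are assumptions made for this example.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def amino_acid_composition(sequence):
    """Return the fraction of each standard amino acid in `sequence`.

    Hypothetical helper for illustration only; not part of protr.
    """
    seq = sequence.upper()
    unknown = set(seq) - set(AMINO_ACIDS)
    if not seq or unknown:
        raise ValueError(
            f"empty sequence or non-standard residues: {sorted(unknown)}"
        )
    counts = Counter(seq)
    # Counter returns 0 for residues that never occur, so every
    # standard residue gets an entry in the 20-dimensional result.
    return {aa: counts[aa] / len(seq) for aa in AMINO_ACIDS}


if __name__ == "__main__":
    composition = amino_acid_composition("MKVLAAGIVGLLLA")
    print({aa: round(frac, 3) for aa, frac in composition.items() if frac > 0})
```

The result is a fixed-length vector regardless of sequence length, which is what makes this kind of descriptor convenient as input to machine-learning models.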

hdnom - Benchmarking and Visualization Toolkit for Penalized Cox Models

Creates nomogram visualizations for penalized Cox regression models, with support for reproducible survival model building, validation, calibration, and comparison for high-dimensional data.

benchmark, high-dimensional-data, linear-regression, nomogram-visualization, penalized-cox-models, survival-analysis, openblas

8.50 score 47 stars 2 dependents 84 scripts 788 downloads

msaenet - Multi-Step Adaptive Estimation Methods for Sparse Regressions

Multi-step adaptive elastic-net (MSAENet) algorithm for feature selection in high-dimensional regressions proposed in Xiao and Xu (2015) <DOI:10.1080/00949655.2015.1016944>, with support for multi-step adaptive MCP-net (MSAMNet) and multi-step adaptive SCAD-net (MSASNet) methods.

false-positive-control, high-dimensional-data, linear-regression, machine-learning, variable-selection

7.37 score 13 stars 6 dependents 65 scripts 3.1k downloads

Rcpi - Molecular Informatics Toolkit for Compound-Protein Interaction in Drug Discovery

A molecular informatics toolkit that integrates bioinformatics and chemoinformatics tools for drug discovery.

software, dataimport, datarepresentation, featureextraction, cheminformatics, biomedicalinformatics, proteomics, go, systemsbiology, bioconductor, bioinformatics, drug-discovery, feature-extraction, fingerprint, molecular-descriptors, protein-sequences

7.35 score 39 stars 29 scripts

liftr - Containerize R Markdown Documents for Continuous Reproducibility

Persistent reproducible reporting by containerization of R Markdown documents.

containerization, docker, dynamic-documents, knitr, liftr, reproducible-research, reproducible-science, rmarkdown, statistical-computing

6.83 score 173 stars 13 scripts 339 downloads

enpls - Ensemble Partial Least Squares Regression

An algorithmic framework for measuring feature importance, outlier detection, model applicability domain evaluation, and ensemble predictive modeling with (sparse) partial least squares regressions.

chemometrics, dimensionality-reduction, ensemble-learning, machine-learning, outlier-detection, partial-least-squares-regression

5.57 score 18 stars 41 scripts 462 downloads

pkgdown.offline - Build 'pkgdown' Websites Offline

Provides support for building 'pkgdown' websites without an internet connection. Works by bundling cached dependencies and implementing drop-in replacements for key 'pkgdown' functions. Enables package documentation websites to be built in environments where internet access is unavailable or restricted. For more details on generating 'pkgdown' websites, see Wickham et al. (2025) <doi:10.32614/CRAN.package.pkgdown>.

documentation-tool, offline, offline-build, pkgdown

5.40 score 5 stars 1 script 227 downloads

stackgbm - Stacked Gradient Boosting Machines

A minimalist implementation of model stacking by Wolpert (1992) <doi:10.1016/S0893-6080(05)80023-1> for boosted tree models. A classic two-layer stacking model is implemented, where the first layer generates features using gradient boosting trees, and the second layer employs a logistic regression model that uses these features as inputs. Utilities for training the base models and for parameter tuning are provided, allowing users to experiment with different ensemble configurations easily. It aims to provide a simple and efficient way to combine multiple gradient boosting models to improve predictive model performance and robustness.

automl, catboost, decision-trees, ensemble-learning, gbdt, gbm, gradient-boosting, lightgbm, machine-learning, model-stacking, xgboost

5.10 score 25 stars 4 scripts 362 downloads

grex - Gene ID Mapping for Genotype-Tissue Expression (GTEx) Data

Convert 'Ensembl' gene identifiers from Genotype-Tissue Expression (GTEx) data to identifiers in other annotation systems, including 'Entrez', 'HGNC', and 'UniProt'.

bioinformatics, gene-expression, genotype-tissue-expression, gtex

4.94 score 8 stars 22 scripts 278 downloads

gsDesignTune - Dependency-Aware Scenario Exploration for Group Sequential Designs

Provides systematic, dependency-aware exploration of group sequential designs created with 'gsDesign'. Supports reproducible grid and random search over user-defined candidate sets, parallel evaluation via the 'future' framework, standardized metric extraction, and auditable reporting for design space evaluation and trade-off analysis. Methods for group sequential design are described in Anderson (2025) <doi:10.32614/CRAN.package.gsDesign>. The 'future' framework for parallel processing is described in Bengtsson (2021) <doi:10.32614/RJ-2021-048>.

4.78 score 6 scripts 264 downloads

ssw - Striped Smith-Waterman Algorithm for Sequence Alignment using SIMD

Provides an R interface for 'SSW' (Striped Smith-Waterman) via its 'Python' binding 'ssw-py'. 'SSW' is a fast 'C' and 'C++' implementation of the Smith-Waterman algorithm for pairwise sequence alignment using Single-Instruction-Multiple-Data (SIMD) instructions. 'SSW' enhances the standard algorithm by efficiently returning alignment information and suboptimal alignment scores. The core 'SSW' library offers performance improvements for various bioinformatics tasks, including protein database searches, short-read alignments, primary and split-read mapping, structural variant detection, and read-overlap graph generation. These features make 'SSW' particularly useful for genomic applications. Zhao et al. (2013) <doi:10.1371/journal.pone.0082138> developed the original 'C' and 'C++' implementation.

bioinformatics, reticulate, sequence-alignment, simd, smith-waterman

4.48 score 6 stars 385 downloads
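
For readers unfamiliar with the Smith-Waterman algorithm named in the ssw description, here is a minimal scalar sketch of its scoring recurrence in plain Python. It is not the striped SIMD implementation and does not use the ssw or 'ssw-py' API. The function name, the scoring values, and the linear gap penalty are all assumptions chosen to keep the example short.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score of `a` and `b` with a linear gap penalty.

    Illustrative scalar version; parameter defaults are assumptions.
    """
    prev = [0] * (len(b) + 1)  # previous row of the scoring matrix
    best = 0
    for ca in a:
        curr = [0]  # column 0 is always 0 for local alignment
        for j, cb in enumerate(b, start=1):
            diag = prev[j - 1] + (match if ca == cb else mismatch)
            # Clamping at 0 lets an alignment restart anywhere, which is
            # what distinguishes local from global alignment.
            score = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            curr.append(score)
            best = max(best, score)
        prev = curr
    return best


if __name__ == "__main__":
    print(smith_waterman_score("GGACGTGG", "TTACGTTT"))
```

Only two rows of the matrix are kept, so memory use grows with the shorter dimension you iterate over rather than with the full matrix. Recovering the alignment itself, rather than just the score, would require storing traceback information.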

oneclust - Maximum Homogeneity Clustering for Univariate Data

Maximum homogeneity clustering algorithm for one-dimensional data, described in W. D. Fisher (1958) <doi:10.1080/01621459.1958.10501479> and implemented via dynamic programming.

clustering-algorithm, feature-engineering, homogeneity, peak-calling, univariate-data, cpp

4.40 score 5 stars 2 scripts 257 downloads
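
The dynamic programming idea mentioned in the oneclust description can be sketched as follows: sort the values, then choose cut points so that the total within-group sum of squared deviations is minimal. This is a plain Python illustration under that assumption about the objective, not oneclust's code or API; the function name `max_homogeneity_clusters` is hypothetical.

```python
def max_homogeneity_clusters(values, k):
    """Split sorted `values` into k contiguous groups with minimal total
    within-group sum of squared deviations.

    Returns (total_cost, groups). Illustrative sketch only.
    """
    xs = sorted(values)
    n = len(xs)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and len(values)")

    # Prefix sums give the squared-deviation cost of any run xs[i:j] in O(1).
    s1 = [0.0] * (n + 1)
    s2 = [0.0] * (n + 1)
    for i, x in enumerate(xs):
        s1[i + 1] = s1[i] + x
        s2[i + 1] = s2[i] + x * x

    def cost(i, j):
        total = s1[j] - s1[i]
        return (s2[j] - s2[i]) - total * total / (j - i)

    inf = float("inf")
    # best[g][j]: minimal cost of splitting xs[:j] into g groups.
    best = [[inf] * (n + 1) for _ in range(k + 1)]
    cut = [[0] * (n + 1) for _ in range(k + 1)]
    best[0][0] = 0.0
    for g in range(1, k + 1):
        for j in range(g, n + 1):
            for i in range(g - 1, j):
                cand = best[g - 1][i] + cost(i, j)
                if cand < best[g][j]:
                    best[g][j] = cand
                    cut[g][j] = i

    # Walk the stored cut points backwards to recover the groups.
    groups, j = [], n
    for g in range(k, 0, -1):
        i = cut[g][j]
        groups.append(xs[i:j])
        j = i
    return best[k][n], groups[::-1]


if __name__ == "__main__":
    print(max_homogeneity_clusters([12, 1, 21, 3, 10, 22, 2, 11, 20], 3))
```

This version runs in O(k n^2) time, which is fine for small inputs; faster variants exist but are beyond the scope of this sketch.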

OHPL - Ordered Homogeneity Pursuit Lasso for Group Variable Selection

Ordered homogeneity pursuit lasso (OHPL) algorithm for group variable selection proposed in Lin et al. (2017) <DOI:10.1016/j.chemolab.2017.07.004>. The OHPL method exploits the homogeneity structure in high-dimensional data and enjoys the grouping effect to select groups of important variables automatically. This feature makes it particularly useful for high-dimensional datasets with strongly correlated variables, such as spectroscopic data.

chemometrics, high-dimensional-data, homogeneity-pursuit, lasso, partial-least-squares-regression, spectroscopy, variable-selection

4.02 score 7 stars 9 scripts 445 downloads

RECA - Relevant Component Analysis for Supervised Distance Metric Learning

Relevant Component Analysis (RCA) seeks a linear transformation of the feature space that reduces the effect of irrelevant variability in the transformed space.

machine-learning, metric-learning

4.02 score 7 stars 5 scripts 269 downloads