ggsci - Scientific Journal and Sci-Fi Themed Color Palettes for 'ggplot2'
A collection of 'ggplot2' color palettes inspired by plots in scientific journals, data visualization libraries, science fiction movies, and TV shows.
Last updated 5 months ago
color-palettes, data-visualization, ggplot2, ggsci, sci-fi, scientific-journals, visualization
18.02 score 664 stars 410 packages 24k scripts 165k downloads

protr - Generating Various Numerical Representation Schemes for Protein Sequences
Comprehensive toolkit for generating various numerical features of protein sequences described in Xiao et al. (2015) <DOI:10.1093/bioinformatics/btv042>. For full functionality, the software 'ncbi-blast+' is needed; see <https://blast.ncbi.nlm.nih.gov/doc/blast-help/downloadblastdata.html> for more information.
Last updated 2 months ago
bioinformatics, feature-engineering, feature-extraction, machine-learning, peptides, protein-sequences, sequence-analysis
9.98 score 52 stars 3 packages 162 scripts 1.1k downloads

hdnom - Benchmarking and Visualization Toolkit for Penalized Cox Models
Creates nomogram visualizations for penalized Cox regression models, with the support of reproducible survival model building, validation, calibration, and comparison for high-dimensional data.
Last updated 3 months ago
benchmark, high-dimensional-data, linear-regression, nomogram-visualization, penalized-cox-models, survival-analysis
8.17 score 41 stars 1 package 67 scripts 497 downloads

Rcpi - Molecular Informatics Toolkit for Compound-Protein Interaction in Drug Discovery
A molecular informatics toolkit with an integration of bioinformatics and chemoinformatics tools for drug discovery.
Last updated 23 days ago
software, dataimport, datarepresentation, featureextraction, cheminformatics, biomedicalinformatics, proteomics, go, systemsbiology, bioconductor, bioinformatics, drug-discovery, feature-extraction, fingerprint, molecular-descriptors, protein-sequences
7.78 score 36 stars 28 scripts 290 downloads

liftr - Containerize R Markdown Documents for Continuous Reproducibility
Persistent reproducible reporting by containerization of R Markdown documents.
Last updated 9 months ago
containerization, docker, dynamic-documents, knitr, liftr, reproducible-research, reproducible-science, rmarkdown, statistical-computing
7.03 score 171 stars 21 scripts 265 downloads

pkglite - Compact Package Representations
A tool, grammar, and standard to represent and exchange R package source code as text files. Converts one or more source packages to a text file and restores the package structures from the file.
Last updated 12 days ago
clinical-trials, ectd, packaging-tool, pharmaverse
6.94 score 30 stars 12 scripts 808 downloads

msaenet - Multi-Step Adaptive Estimation Methods for Sparse Regressions
Multi-step adaptive elastic-net (MSAENet) algorithm for feature selection in high-dimensional regressions proposed in Xiao and Xu (2015) <DOI:10.1080/00949655.2015.1016944>, with support for multi-step adaptive MCP-net (MSAMNet) and multi-step adaptive SCAD-net (MSASNet) methods.
Last updated 4 months ago
false-positive-control, high-dimensional-data, linear-regression, machine-learning, variable-selection
6.23 score 13 stars 52 scripts 459 downloads

gMCPLite - Lightweight Graph Based Multiple Comparison Procedures
A lightweight fork of 'gMCP' with functions for graphically described multiple test procedures introduced in Bretz et al. (2009) <doi:10.1002/sim.3495> and Bretz et al. (2011) <doi:10.1002/bimj.201000239>. Implements a flexible function using 'ggplot2' to create multiplicity graph visualizations. Contains instructions for multiplicity graphs and graphical testing in group sequential designs, described in Maurer and Bretz (2013) <doi:10.1080/19466315.2013.807748>, with the necessary unit tests written using 'testthat'.
Last updated 9 months ago
5.97 score 9 stars 13 scripts 246 downloads

enpls - Ensemble Partial Least Squares Regression
An algorithmic framework for measuring feature importance, outlier detection, model applicability domain evaluation, and ensemble predictive modeling with (sparse) partial least squares regressions.
Last updated 3 years ago
chemometrics, dimensionality-reduction, ensemble-learning, machine-learning, outlier-detection, partial-least-squares-regression
5.58 score 18 stars 42 scripts 292 downloads

stackgbm - Stacked Gradient Boosting Machines
A minimalist implementation of model stacking by Wolpert (1992) <doi:10.1016/S0893-6080(05)80023-1> for boosted tree models. A classic, two-layer stacking model is implemented, where the first layer generates features using gradient boosting trees, and the second layer employs a logistic regression model that uses these features as inputs. Utilities for training the base models and tuning their parameters are provided, allowing users to experiment with different ensemble configurations easily. It aims to provide a simple and efficient way to combine multiple gradient boosting models to improve predictive model performance and robustness.
Last updated 7 months ago
automl, catboost, decision-trees, ensemble-learning, gbdt, gbm, gradient-boosting, lightgbm, machine-learning, model-stacking, xgboost
5.40 score 25 stars 3 scripts 456 downloads

ssw - Striped Smith-Waterman Algorithm for Sequence Alignment using SIMD
Provides an R interface for 'SSW' (Striped Smith-Waterman) via its 'Python' binding 'ssw-py'. 'SSW' is a fast 'C' and 'C++' implementation of the Smith-Waterman algorithm for pairwise sequence alignment using Single-Instruction-Multiple-Data (SIMD) instructions. 'SSW' enhances the standard algorithm by efficiently returning alignment information and suboptimal alignment scores. The core 'SSW' library offers performance improvements for various bioinformatics tasks, including protein database searches, short-read alignments, primary and split-read mapping, structural variant detection, and read-overlap graph generation. These features make 'SSW' particularly useful for genomic applications. Zhao et al. (2013) <doi:10.1371/journal.pone.0082138> developed the original 'C' and 'C++' implementation.
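As a rough illustration of the underlying Smith-Waterman recurrence, here is a minimal scalar sketch in Python. It is not the 'ssw' package API and not the striped SIMD variant: the function name `smith_waterman_score`, the scoring values, and the linear gap penalty are all assumptions chosen to keep the example short.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Return the best local alignment score of strings a and b.

    Plain dynamic programming with a linear gap penalty (hypothetical
    scoring values); no SIMD striping and no traceback.
    """
    prev = [0] * (len(b) + 1)  # scores for the previous row of the DP table
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            curr[j] = max(
                0,                   # start a new local alignment here
                prev[j - 1] + sub,   # extend with a match or mismatch
                prev[j] + gap,       # gap in b
                curr[j - 1] + gap,   # gap in a
            )
            best = max(best, curr[j])
        prev = curr
    return best


if __name__ == "__main__":
    print(smith_waterman_score("TTACGTTT", "GGACGGG"))
```

Clamping each cell at zero is what makes the alignment local rather than global: a poorly matching prefix never drags down a later high-scoring region.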
Last updated 2 months ago
bioinformatics, reticulate, sequence-alignment, simd, smith-waterman
5.10 score 5 stars 454 downloads

grex - Gene ID Mapping for Genotype-Tissue Expression (GTEx) Data
Convert 'Ensembl' gene identifiers from Genotype-Tissue Expression (GTEx) data to identifiers in other annotation systems, including 'Entrez', 'HGNC', and 'UniProt'.
Last updated 3 years ago
bioinformatics, gene-expression, genotype-tissue-expression, gtex
4.96 score 8 stars 23 scripts 225 downloads

oneclust - Maximum Homogeneity Clustering for Univariate Data
Maximum homogeneity clustering algorithm for one-dimensional data described in W. D. Fisher (1958) <doi:10.1080/01621459.1958.10501479> via dynamic programming.
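The dynamic programming idea can be sketched as follows. This is a minimal Python illustration, not the 'oneclust' implementation or its API: the function name `fisher_cluster` is hypothetical, and the sketch assumes the homogeneity criterion is the within-group sum of squared deviations, with groups required to be contiguous in the sorted data.

```python
def fisher_cluster(values, k):
    """Split values into k contiguous groups (in sorted order) that
    minimize the total within-group sum of squared deviations."""
    xs = sorted(values)
    n = len(xs)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and len(values)")

    # Prefix sums let us compute the cost of any segment in O(1).
    s1, s2 = [0.0], [0.0]
    for x in xs:
        s1.append(s1[-1] + x)
        s2.append(s2[-1] + x * x)

    def cost(i, j):
        """Sum of squared deviations from the mean for xs[i:j], j > i."""
        total = s1[j] - s1[i]
        return (s2[j] - s2[i]) - total * total / (j - i)

    inf = float("inf")
    # dp[g][j]: best cost of splitting the first j points into g groups.
    dp = [[inf] * (n + 1) for _ in range(k + 1)]
    cut = [[0] * (n + 1) for _ in range(k + 1)]
    dp[0][0] = 0.0
    for g in range(1, k + 1):
        for j in range(g, n + 1):
            for i in range(g - 1, j):  # the last group is xs[i:j]
                c = dp[g - 1][i] + cost(i, j)
                if c < dp[g][j]:
                    dp[g][j] = c
                    cut[g][j] = i

    # Walk the stored cut points backwards to recover the groups.
    groups, j = [], n
    for g in range(k, 0, -1):
        i = cut[g][j]
        groups.append(xs[i:j])
        j = i
    groups.reverse()
    return groups


if __name__ == "__main__":
    print(fisher_cluster([1, 2, 3, 10, 11, 12, 50], 3))
```

Because the data are one-dimensional, restricting groups to contiguous runs of the sorted values keeps the search space small enough for an exact solution in O(k n^2) time.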
Last updated 9 months ago
clustering-algorithm, feature-engineering, homogeneity, peak-calling, univariate-data
4.40 score 5 stars 162 downloads

RECA - Relevant Component Analysis for Supervised Distance Metric Learning
Relevant Component Analysis (RCA) tries to find a linear transformation of the feature space such that the effect of irrelevant variability is reduced in the transformed space.
Last updated 7 months ago
machine-learning, metric-learning
4.02 score 7 stars 4 scripts 161 downloads

OHPL - Ordered Homogeneity Pursuit Lasso for Group Variable Selection
Ordered homogeneity pursuit lasso (OHPL) algorithm for group variable selection proposed in Lin et al. (2017) <DOI:10.1016/j.chemolab.2017.07.004>. The OHPL method exploits the homogeneity structure in high-dimensional data and enjoys the grouping effect to select groups of important variables automatically. This feature makes it particularly useful for high-dimensional datasets with strongly correlated variables, such as spectroscopic data.
Last updated 4 months ago
chemometrics, high-dimensional-data, homogeneity-pursuit, lasso, partial-least-squares-regression, spectroscopy, variable-selection
3.85 score 7 stars 9 scripts 196 downloads