Package: protr 1.7-4

protr: Generating Various Numerical Representation Schemes for Protein Sequences

Comprehensive toolkit for generating various numerical features of protein sequences described in Xiao et al. (2015) <doi:10.1093/bioinformatics/btv042>. For full functionality, the software 'ncbi-blast+' is needed; see <https://blast.ncbi.nlm.nih.gov/doc/blast-help/downloadblastdata.html> for more information.

Authors: Nan Xiao [aut, cre], Qing-Song Xu [aut], Dong-Sheng Cao [aut], Sebastian Mueller [ctb]

protr_1.7-4.tar.gz
protr_1.7-4.zip (r-4.5) | protr_1.7-4.zip (r-4.4) | protr_1.7-4.zip (r-4.3)
protr_1.7-4.tgz (r-4.4-any) | protr_1.7-4.tgz (r-4.3-any)
protr_1.7-4.tar.gz (r-4.5-noble) | protr_1.7-4.tar.gz (r-4.4-noble)
protr_1.7-4.tgz (r-4.4-emscripten) | protr_1.7-4.tgz (r-4.3-emscripten)
protr.pdf | protr.html
protr/json (API)
NEWS

# Install 'protr' in R:
install.packages('protr', repos = c('https://nanxstats.r-universe.dev', 'https://cloud.r-project.org'))

Bug tracker: https://github.com/nanxstats/protr/issues

Datasets:
  • AA2DACOR - 2D Autocorrelations Descriptors for 20 Amino Acids calculated by Dragon
  • AA3DMoRSE - 3D-MoRSE Descriptors for 20 Amino Acids calculated by Dragon
  • AAACF - Atom-Centred Fragments Descriptors for 20 Amino Acids calculated by Dragon
  • AABLOSUM100 - BLOSUM100 Matrix for 20 Amino Acids
  • AABLOSUM45 - BLOSUM45 Matrix for 20 Amino Acids
  • AABLOSUM50 - BLOSUM50 Matrix for 20 Amino Acids
  • AABLOSUM62 - BLOSUM62 Matrix for 20 Amino Acids
  • AABLOSUM80 - BLOSUM80 Matrix for 20 Amino Acids
  • AABurden - Burden Eigenvalues Descriptors for 20 Amino Acids calculated by Dragon
  • AACPSA - CPSA Descriptors for 20 Amino Acids calculated by Discovery Studio
  • AAConn - Connectivity Indices Descriptors for 20 Amino Acids calculated by Dragon
  • AAConst - Constitutional Descriptors for 20 Amino Acids calculated by Dragon
  • AADescAll - All 2D Descriptors for 20 Amino Acids calculated by Dragon
  • AAEdgeAdj - Edge Adjacency Indices Descriptors for 20 Amino Acids calculated by Dragon
  • AAEigIdx - Eigenvalue-Based Indices Descriptors for 20 Amino Acids calculated by Dragon
  • AAFGC - Functional Group Counts Descriptors for 20 Amino Acids calculated by Dragon
  • AAGETAWAY - GETAWAY Descriptors for 20 Amino Acids calculated by Dragon
  • AAGeom - Geometrical Descriptors for 20 Amino Acids calculated by Dragon
  • AAInfo - Information Indices Descriptors for 20 Amino Acids calculated by Dragon
  • AAMOE2D - 2D Descriptors for 20 Amino Acids calculated by MOE 2011.10
  • AAMOE3D - 3D Descriptors for 20 Amino Acids calculated by MOE 2011.10
  • AAMetaInfo - Meta Information for the 20 Amino Acids
  • AAMolProp - Molecular Properties Descriptors for 20 Amino Acids calculated by Dragon
  • AAPAM120 - PAM120 Matrix for 20 Amino Acids
  • AAPAM250 - PAM250 Matrix for 20 Amino Acids
  • AAPAM30 - PAM30 Matrix for 20 Amino Acids
  • AAPAM40 - PAM40 Matrix for 20 Amino Acids
  • AAPAM70 - PAM70 Matrix for 20 Amino Acids
  • AARDF - RDF Descriptors for 20 Amino Acids calculated by Dragon
  • AARandic - Randic Molecular Profiles Descriptors for 20 Amino Acids calculated by Dragon
  • AATopo - Topological Descriptors for 20 Amino Acids calculated by Dragon
  • AATopoChg - Topological Charge Indices Descriptors for 20 Amino Acids calculated by Dragon
  • AAWHIM - WHIM Descriptors for 20 Amino Acids calculated by Dragon
  • AAWalk - Walk and Path Counts Descriptors for 20 Amino Acids calculated by Dragon
  • AAindex - AAindex Data of 544 Physicochemical and Biological Properties for 20 Amino Acids
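The bundled datasets can be inspected directly from an R session. The following is a minimal sketch, assuming 'protr' is installed; it loads one of the substitution matrices listed above with the base R function data() and prints its shape. No particular output is asserted here, since the exact object layout is best checked in the package's help pages.

```r
# Minimal sketch: load one bundled dataset and look at its structure.
# Assumes the 'protr' package is installed.
library(protr)

# Load the BLOSUM62 matrix dataset shipped with the package.
data("AABLOSUM62", package = "protr")

# Inspect the dimensions and the first few row labels.
print(dim(AABLOSUM62))
print(head(rownames(AABLOSUM62)))
```

The same pattern should apply to the other dataset names in the list, for example data("AAindex", package = "protr").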

On CRAN:

bioinformatics | feature-engineering | feature-extraction | machine-learning | peptides | protein-sequences | sequence-analysis

9.98 score | 52 stars | 3 packages | 162 scripts | 1.1k downloads | 11 mentions | 43 exports | 0 dependencies

Last updated 2 months ago from: 8cafbdb092. Checks: OK: 7. Indexed: yes.

Target | Result | Date
Doc / Vignettes | OK | Nov 09 2024
R-4.5-win | OK | Nov 09 2024
R-4.5-linux | OK | Nov 09 2024
R-4.4-win | OK | Nov 09 2024
R-4.4-mac | OK | Nov 09 2024
R-4.3-win | OK | Nov 09 2024
R-4.3-mac | OK | Nov 09 2024

Exports: acc, crossSetSim, crossSetSimDisk, extractAAC, extractAPAAC, extractBLOSUM, extractCTDC, extractCTDCClass, extractCTDD, extractCTDDClass, extractCTDT, extractCTDTClass, extractCTriad, extractCTriadClass, extractDC, extractDescScales, extractFAScales, extractGeary, extractMDSScales, extractMoran, extractMoreauBroto, extractPAAC, extractProtFP, extractProtFPGap, extractPSSM, extractPSSMAcc, extractPSSMFeature, extractQSO, extractScales, extractScalesGap, extractSOCN, extractTC, getUniProt, parGOSim, parSeqSim, parSeqSimDisk, protcheck, protseg, readFASTA, readPDB, removeGaps, twoGOSim, twoSeqSim
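As an illustration of how a few of these exports fit together, here is a minimal sketch, assuming 'protr' is installed. The sequence string is an arbitrary made-up example (not taken from the package), and the variable names are illustrative only. The sketch checks the sequence with protcheck() and then computes the amino acid composition and dipeptide composition descriptors.

```r
# Minimal sketch: validate a sequence and compute two composition descriptors.
# Assumes the 'protr' package is installed.
library(protr)

# Arbitrary example sequence using only the 20 standard amino acid letters.
seq_example <- "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQAPILSRVGDGTQDNLSGAEKAVQVKVKALPDAQFEVV"

# Sanity check: TRUE when the sequence contains only standard amino acid types.
is_valid <- protcheck(seq_example)

if (is_valid) {
  # Amino acid composition: one frequency per amino acid type.
  aac <- extractAAC(seq_example)
  # Dipeptide composition: one frequency per ordered amino acid pair.
  dc <- extractDC(seq_example)
  print(length(aac))
  print(length(dc))
}
```

For sequences stored in files, readFASTA() can supply the input instead of a literal string; descriptors such as extractPSSM additionally rely on the external 'ncbi-blast+' software mentioned in the package description.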

Dependencies: none

protr: R package for generating various numerical representation schemes of protein sequences

Rendered from protr.Rmd using knitr::rmarkdown on Nov 09 2024.

Last update: 2024-08-30
Started: 2017-06-06

Readme and manuals

Help Manual

Help page | Topics
2D Autocorrelations Descriptors for 20 Amino Acids calculated by Dragon | AA2DACOR
3D-MoRSE Descriptors for 20 Amino Acids calculated by Dragon | AA3DMoRSE
Atom-Centred Fragments Descriptors for 20 Amino Acids calculated by Dragon | AAACF
BLOSUM100 Matrix for 20 Amino Acids | AABLOSUM100
BLOSUM45 Matrix for 20 Amino Acids | AABLOSUM45
BLOSUM50 Matrix for 20 Amino Acids | AABLOSUM50
BLOSUM62 Matrix for 20 Amino Acids | AABLOSUM62
BLOSUM80 Matrix for 20 Amino Acids | AABLOSUM80
Burden Eigenvalues Descriptors for 20 Amino Acids calculated by Dragon | AABurden
Connectivity Indices Descriptors for 20 Amino Acids calculated by Dragon | AAConn
Constitutional Descriptors for 20 Amino Acids calculated by Dragon | AAConst
CPSA Descriptors for 20 Amino Acids calculated by Discovery Studio | AACPSA
All 2D Descriptors for 20 Amino Acids calculated by Dragon | AADescAll
Edge Adjacency Indices Descriptors for 20 Amino Acids calculated by Dragon | AAEdgeAdj
Eigenvalue-Based Indices Descriptors for 20 Amino Acids calculated by Dragon | AAEigIdx
Functional Group Counts Descriptors for 20 Amino Acids calculated by Dragon | AAFGC
Geometrical Descriptors for 20 Amino Acids calculated by Dragon | AAGeom
GETAWAY Descriptors for 20 Amino Acids calculated by Dragon | AAGETAWAY
AAindex Data of 544 Physicochemical and Biological Properties for 20 Amino Acids | AAindex
Information Indices Descriptors for 20 Amino Acids calculated by Dragon | AAInfo
Meta Information for the 20 Amino Acids | AAMetaInfo
2D Descriptors for 20 Amino Acids calculated by MOE 2011.10 | AAMOE2D
3D Descriptors for 20 Amino Acids calculated by MOE 2011.10 | AAMOE3D
Molecular Properties Descriptors for 20 Amino Acids calculated by Dragon | AAMolProp
PAM120 Matrix for 20 Amino Acids | AAPAM120
PAM250 Matrix for 20 Amino Acids | AAPAM250
PAM30 Matrix for 20 Amino Acids | AAPAM30
PAM40 Matrix for 20 Amino Acids | AAPAM40
PAM70 Matrix for 20 Amino Acids | AAPAM70
Randic Molecular Profiles Descriptors for 20 Amino Acids calculated by Dragon | AARandic
RDF Descriptors for 20 Amino Acids calculated by Dragon | AARDF
Topological Descriptors for 20 Amino Acids calculated by Dragon | AATopo
Topological Charge Indices Descriptors for 20 Amino Acids calculated by Dragon | AATopoChg
Walk and Path Counts Descriptors for 20 Amino Acids calculated by Dragon | AAWalk
WHIM Descriptors for 20 Amino Acids calculated by Dragon | AAWHIM
Auto Cross Covariance (ACC) for Generating Scales-Based Descriptors of the Same Length | acc
Parallel Protein Sequence Similarity Calculation Between Two Sets Based on Sequence Alignment (In-Memory Version) | crossSetSim
Parallel Protein Sequence Similarity Calculation Between Two Sets Based on Sequence Alignment (Disk-Based Version) | crossSetSimDisk
Amino Acid Composition Descriptor | extractAAC
Amphiphilic Pseudo Amino Acid Composition (APseAAC) Descriptor | extractAPAAC
BLOSUM and PAM Matrix-Derived Descriptors | extractBLOSUM
CTD Descriptors - Composition | extractCTDC
CTD Descriptors - Composition (with customized amino acid classification support) | extractCTDCClass
CTD Descriptors - Distribution | extractCTDD
CTD Descriptors - Distribution (with customized amino acid classification support) | extractCTDDClass
CTD Descriptors - Transition | extractCTDT
CTD Descriptors - Transition (with customized amino acid classification support) | extractCTDTClass
Conjoint Triad Descriptor | extractCTriad
Conjoint Triad Descriptor (with customized amino acid classification support) | extractCTriadClass
Dipeptide Composition Descriptor | extractDC
Scales-Based Descriptors with 20+ classes of Molecular Descriptors | extractDescScales
Scales-Based Descriptors derived by Factor Analysis | extractFAScales
Geary Autocorrelation Descriptor | extractGeary
Scales-Based Descriptors derived by Multidimensional Scaling | extractMDSScales
Moran Autocorrelation Descriptor | extractMoran
Normalized Moreau-Broto Autocorrelation Descriptor | extractMoreauBroto
Pseudo Amino Acid Composition (PseAAC) Descriptor | extractPAAC
Amino Acid Properties Based Scales Descriptors (Protein Fingerprint) | extractProtFP
Amino Acid Properties Based Scales Descriptors (Protein Fingerprint) with Gap Support | extractProtFPGap
Compute PSSM (Position-Specific Scoring Matrix) for given protein sequence | extractPSSM
Profile-based protein representation derived by PSSM (Position-Specific Scoring Matrix) and auto cross covariance | extractPSSMAcc
Profile-based protein representation derived by PSSM (Position-Specific Scoring Matrix) | extractPSSMFeature
Quasi-Sequence-Order (QSO) Descriptor | extractQSO
Scales-Based Descriptors derived by Principal Components Analysis | extractScales
Scales-Based Descriptors derived by Principal Components Analysis (with Gap Support) | extractScalesGap
Sequence-Order-Coupling Numbers | extractSOCN
Tripeptide Composition Descriptor | extractTC
Retrieve Protein Sequences from UniProt by Protein ID | getUniProt
OptAA3d.sdf - 20 Amino Acids Optimized with MOE 2011.10 (Semiempirical AM1) | OptAA3d
Protein Similarity Calculation based on Gene Ontology (GO) Similarity | parGOSim
Parallel Protein Sequence Similarity Calculation Based on Sequence Alignment (In-Memory Version) | parSeqSim
Parallel Protein Sequence Similarity Calculation Based on Sequence Alignment (Disk-Based Version) | parSeqSimDisk
Protein sequence amino acid type sanity check | protcheck
Protein Sequence Segmentation/Partition | protseg
Read Protein Sequences in FASTA Format | readFASTA
Read Protein Sequences in PDB Format | readPDB
Remove or replace gaps from protein sequences. | removeGaps
Protein Similarity Calculation based on Gene Ontology (GO) Similarity | twoGOSim
Protein Sequence Alignment for Two Protein Sequences | twoSeqSim