A Python Toolkit and Web Server for Calculating a Wide Range of Structural and Physicochemical Feature Descriptors from Protein and Peptide Sequences
Structural and physicochemical descriptors extracted from protein sequences have been widely used to represent those sequences and to predict the structural, functional, expression and interaction profiles of proteins and peptides as well as other macromolecules. Here, we present iFeature, a versatile Python-based toolkit for generating various numerical feature representation schemes for both protein and peptide sequences. iFeature can calculate and extract a comprehensive spectrum of 18 major sequence encoding schemes that encompass 53 different types of feature descriptors. In addition to the default parameters, it allows users to extract specific amino acid properties from the AAindex database. Furthermore, iFeature integrates 12 different types of commonly used feature clustering, selection, and dimensionality reduction algorithms, greatly facilitating feature generation, analysis, training and benchmarking of machine-learning models and predictions. In addition to the Python toolkit, we have also implemented an online web server of iFeature.
The feature extraction algorithms (or encoding schemes) include:
Group 1: Amino acid composition (6 descriptors)
Group 2: Grouped amino acid composition (5 descriptors)
Group 3: Binary (1 descriptor)
Group 4: Autocorrelation (3 descriptors)
Group 5: C/T/D (3 descriptors)
Group 6: Conjoint Triad (2 descriptors)
Group 7: Quasi-sequence-order (2 descriptors)
Group 8: Pseudo-amino acid composition (2 descriptors)
Group 9: K-nearest neighbor (2 descriptors)
Group 10: PSSM (1 descriptor)
Group 11: AAindex (1 descriptor)
Group 12: BLOSUM62 (1 descriptor)
Group 13: Z-scale (1 descriptor)
Group 14: Predicted secondary structure (2 descriptors)
Group 15: Predicted protein disorder (3 descriptors)
Group 16: Predicted accessible surface area (1 descriptor)
Group 17: Predicted main-chain torsional angles (1 descriptor)
Group 18: Pseudo K-tuple reduced amino acids composition (16 descriptors)
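To make the simplest of these encodings concrete, the sketch below computes an amino acid composition vector (Group 1) and a binary, one-hot encoding (Group 3) in plain Python. It is an illustration only, not iFeature's own code: the function names, the alphabetical residue ordering, and the choice to skip non-standard characters are all assumptions made for this example.

```python
from collections import Counter

# Assumption for this sketch: the 20 standard residues, ordered alphabetically
# by one-letter code. iFeature's own ordering may differ.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(sequence):
    """Return the fraction of each standard amino acid in `sequence` (20 values)."""
    sequence = sequence.upper()
    # Count only standard residues, so gaps or ambiguous codes ('-', 'X') are ignored.
    counts = Counter(res for res in sequence if res in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return [counts[aa] / total for aa in AMINO_ACIDS]


def binary_encoding(sequence):
    """One-hot encode every residue as 20 zeros/ones and concatenate the results.

    Characters outside AMINO_ACIDS become an all-zero block in this sketch.
    """
    vector = []
    for res in sequence.upper():
        vector.extend(1 if res == aa else 0 for aa in AMINO_ACIDS)
    return vector


# Hypothetical toy peptide, used only to show the output shapes.
peptide = "ACDA"
print(len(amino_acid_composition(peptide)))  # one value per standard residue
print(len(binary_encoding(peptide)))         # 20 values per sequence position
```

Composition yields a fixed-length vector regardless of sequence length, whereas the binary encoding grows with the sequence, so the latter is usually applied to peptides or windows of equal length.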
The feature analysis algorithms include:
K-Means clustering (kmeans)
Hierarchical clustering (hcluster)
Mean Shift clustering (meanshift)
DBSCAN (dbscan)
Affinity Propagation (apc)
Chi-Square based feature selection (CHI2)
Information Gain based feature selection (IG)
Mutual Information based feature selection (MIC)
Pearson Correlation based feature selection (pearsonr)
Principal component analysis (PCA)
Latent Dirichlet allocation (LDA)
t-Distributed Stochastic Neighbor Embedding (t-SNE)
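As an illustration of what a correlation-based feature selection step does, the following minimal sketch ranks feature columns by the absolute Pearson correlation between each column and the class labels. It is not iFeature's implementation: the function names and the toy data are hypothetical, and treating a constant column as uncorrelated is a choice made for this example.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length numeric lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least two values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        # Assumption: a constant column carries no linear signal, so score it 0.
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def rank_features(feature_matrix, labels):
    """Return (column_index, r) pairs sorted by |r|, strongest first."""
    columns = list(zip(*feature_matrix))  # transpose rows into columns
    scored = [(i, pearson_r(list(col), labels)) for i, col in enumerate(columns)]
    return sorted(scored, key=lambda item: abs(item[1]), reverse=True)


# Made-up toy data: 4 samples x 3 features with binary labels.
X = [
    [0.1, 5.0, 1.0],
    [0.2, 3.0, 1.0],
    [0.8, 4.0, 1.0],
    [0.9, 4.0, 1.0],
]
y = [0, 0, 1, 1]

for index, r in rank_features(X, y):
    print(f"feature {index}: r = {r:.3f}")
```

In practice one would keep only the top-ranked columns before training a model. Pearson correlation captures linear relationships only, which is one reason a toolkit might offer several selection criteria side by side.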
Who is using iFeature?
If you find iFeature useful, please cite the following paper:
Chen Z, Zhao P, Li F, Leier A, Marquez-Lago TT, Wang Y, Webb GI, Smith AI, Daly RJ*, Chou KC*, Song J*. iFeature: a Python package and web server for features extraction and selection from protein and peptide sequences. Bioinformatics, Volume 34, Issue 14, 15 July 2018, Pages 2499–2502, doi: 10.1093/bioinformatics/bty140.
Backend computation is powered by our Python package iFeature.