PPVED: Plant Protein Variation Effect Detector

Function: prediction of the effect of single amino acid substitutions on protein function in plants

1. Abstract

Single amino acid substitution (SAAS) is the most common type of variation that changes protein function under physiological conditions. As the number of SAAS events in plants has increased exponentially, an effective prediction tool is required to help identify functional SAASs in the whole genome and distinguish them as either potentially causal for traits or as variants. Here, we constructed a plant SAAS database that stores 12,865 SAASs in 6,172 proteins and developed a tool called Plant Protein Variation Effect Detector (PPVED) that predicts the effect of SAASs on protein function in plants. PPVED achieved 87% predictive accuracy when applied to plant SAASs, much higher than that of six software tools built on human databases: SIFT, PROVEAN, PANTHER-PSEP, PhD-SNP, PolyPhen-2, and MutPred2. The predicted effects of six SAASs from three proteins in Arabidopsis and maize were tested in wet-lab experiments, and five of the six substitution sites were predicted accurately. PPVED could facilitate the identification and characterization of genetic variants that explain observed phenotype variations in plants, contributing to solutions for challenges in functional genomics and systems biology. PPVED can be accessed under a CC-BY (4.0) license via http://www.ppved.org.cn.

2. Results

We compared the performance of PPVED with that of the six most widely used software tools (SIFT, PROVEAN, PANTHER-PSEP, PhD-SNP, PolyPhen-2, and MutPred2), all of which are linked to the human SAAS dataset. Six indicators of global performance for each tool are listed in Table 1 for the benchmark dataset and in Table 2 for the independent dataset.

Table 1. Performance comparison of six existing software tools and PPVED on the benchmark dataset.

Software             MCC    ACC    SEN    SPE    PRE    AUC
SIFT                 0.475  0.726  0.873  0.581  0.671  0.833
PROVEAN              0.547  0.773  0.774  0.772  0.769  0.826
PANTHER-PSEP         0.356  0.681  0.756  0.594  0.681  0.704
PhD-SNP              0.442  0.720  0.679  0.761  0.736  0.720
PolyPhen-2 (HumDiv)  0.527  0.762  0.868  0.642  0.733  0.835
PolyPhen-2 (HumVar)  0.525  0.763  0.824  0.695  0.754  0.832
MutPred2             0.459  0.717  0.544  0.886  0.824  0.825
PPVED                0.744  0.872  0.886  0.857  0.859  0.940

Table 2. Performance comparison of six existing software tools and PPVED on the independent dataset.

Software             MCC    ACC    SEN    SPE    PRE    AUC
SIFT                 0.462  0.718  0.876  0.564  0.663  0.816
PROVEAN              0.512  0.756  0.761  0.751  0.749  0.817
PANTHER-PSEP         0.346  0.676  0.765  0.574  0.675  0.718
PhD-SNP              0.433  0.716  0.686  0.746  0.726  0.716
PolyPhen-2 (HumDiv)  0.534  0.766  0.866  0.653  0.738  0.836
PolyPhen-2 (HumVar)  0.532  0.767  0.820  0.708  0.760  0.833
MutPred2             0.432  0.704  0.529  0.876  0.807  0.808
PPVED                0.712  0.856  0.874  0.838  0.841  0.931

Note: MCC = Matthews correlation coefficient, ACC = Accuracy, SEN = Sensitivity, SPE = Specificity, PRE = Precision, AUC = Area under the curve.
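
For readers who want to reproduce indicators of this kind on their own predictions, the following is a minimal sketch of the standard textbook definitions of MCC, ACC, SEN, SPE, and PRE, computed from a binary confusion matrix. It is not taken from the PPVED code. The function name binary_metrics and the example counts are hypothetical, and treating one class as "positive" (for example, SAASs predicted to affect function) is an assumption made here for illustration. AUC is left out because it requires the continuous prediction scores rather than the four counts alone.

```python
import math


def _safe_div(numerator, denominator):
    """Return numerator / denominator, or 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def binary_metrics(tp, fp, tn, fn):
    """Compute confusion-matrix metrics using their standard definitions.

    tp, fp, tn, fn are the counts of true positives, false positives,
    true negatives, and false negatives. The function name and the choice
    of positive class are illustrative assumptions, not part of PPVED.
    """
    mcc_denominator = math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )
    return {
        "MCC": _safe_div(tp * tn - fp * fn, mcc_denominator),
        "ACC": _safe_div(tp + tn, tp + fp + tn + fn),
        "SEN": _safe_div(tp, tp + fn),  # sensitivity (recall)
        "SPE": _safe_div(tn, tn + fp),  # specificity
        "PRE": _safe_div(tp, tp + fp),  # precision
    }


# Hypothetical counts for illustration only; not from the PPVED datasets.
metrics = binary_metrics(tp=90, fp=15, tn=85, fn=10)
for name, value in metrics.items():
    print(f"{name} = {value:.3f}")
```

MCC is a useful complement to accuracy because it draws on all four counts, so a tool that is sensitive but not specific (or the reverse) scores lower on MCC than its accuracy alone might suggest.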

3. Full text links

(1) DOI: 10.1111/pbi.13823
(2) PubMed ID: 35398963
(3) Download: here