PPEPFinder

Plant Pathogen Effector Protein Finder

Why is the prediction of effector proteins important in fungi and oomycetes?

Fungi and oomycetes are among the most destructive eukaryotic plant pathogens, often responsible for devastating plant diseases worldwide. Their successful colonization of host plants largely depends on the secretion of effector proteins that function within host cells to suppress immune responses and promote pathogen infection, ultimately leading to disease symptoms or even host death. Therefore, the prediction of effector proteins plays an important role in deciphering plant-pathogen relationships.

What is PPEPFinder?

PPEPFinder is an integrated deep learning framework designed to predict fungal and oomycete effector proteins accurately. PPEPFinder leverages both protein sequence and 3D structural information to build a comprehensive ensemble model. The framework consists of three sub-models:

  1. A sequence-based model that utilizes embeddings from the pre-trained ESM language model as input to a Transformer encoder;
  2. A structure-based graph model that represents proteins as residue-level contact networks, where node features are ESM embeddings and edges reflect spatial distances, processed via a Graph Attention Network (GAT);
  3. A second structure-based graph model that uses structural embeddings from SaProt, a structure-aware pre-trained protein language model, as node features and likewise applies a GAT for representation learning.
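
As a rough illustration of the residue-level contact networks used by the two graph models, the sketch below builds edges between residues whose C-alpha atoms lie within a distance cutoff. The 8 Å cutoff, the use of C-alpha coordinates, and the function name contact_edges are assumptions made for this example; the actual graph construction in PPEPFinder may differ.

```python
import math

def contact_edges(coords, cutoff=8.0):
    """Return undirected edges (i, j, distance) for residue pairs within `cutoff`.

    `coords` is a list of (x, y, z) positions, one per residue.
    The 8.0 angstrom default is an assumed value, not taken from PPEPFinder.
    """
    edges = []
    for i in range(len(coords)):
        for j in range(i + 1, len(coords)):
            d = math.dist(coords[i], coords[j])
            if d <= cutoff:
                edges.append((i, j, d))
    return edges

# Toy coordinates in angstroms for four residues (made up for illustration).
toy_coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (20.0, 0.0, 0.0)]
edges = contact_edges(toy_coords)
for i, j, d in edges:
    print(f"residue {i} - residue {j}: {d:.1f} A")
```

In a full model, each node would additionally carry a per-residue embedding (ESM or SaProt) as its feature vector before the graph is passed to the GAT.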

To integrate the predictive capabilities of all three sub-models, PPEPFinder employs a logistic regression model that combines their outputs into a final prediction score.
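
The combination step can be pictured as a logistic function applied to a weighted sum of the three sub-model scores. The sketch below is only an illustration: the weights, the bias, and the name combine_scores are hypothetical, since the fitted coefficients of the PPEPFinder logistic regression are not given here.

```python
import math

# Hypothetical coefficients; the real fitted values are not published here.
HYPOTHETICAL_WEIGHTS = (2.0, 2.0, 2.0)
HYPOTHETICAL_BIAS = -3.0

def combine_scores(seq_score, esm_gat_score, saprot_gat_score,
                   weights=HYPOTHETICAL_WEIGHTS, bias=HYPOTHETICAL_BIAS):
    """Combine three sub-model scores into one probability-like score."""
    z = (bias
         + weights[0] * seq_score
         + weights[1] * esm_gat_score
         + weights[2] * saprot_gat_score)
    return 1.0 / (1.0 + math.exp(-z))

# Example: three sub-models that all lean towards "effector".
final_score = combine_scores(0.9, 0.8, 0.7)
print(f"final score: {final_score:.3f}")
```

Because the logistic function maps any weighted sum into the interval (0, 1), the combined output can be read as a single prediction score.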

It is important to note that PPEPFinder does not explicitly predict or report specific effector-related characteristics such as RXLR motifs, WY domains, Cys-rich regions, or the presence of a signal peptide. Although these features were represented in the training dataset and may implicitly contribute to the model’s learning process, they are not individually assessed during prediction. Users who wish to include these characteristics in their analyses may pre-screen their protein sequences using existing tools prior to submission, or examine the predicted effector sequences afterwards with specialized bioinformatics programs.
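
For users who want a quick pre-screen before turning to specialized tools, the sketch below flags sequences that contain an RxLR-like motif (Arg, any residue, Leu, Arg) near the N-terminus. The 100-residue search window and the function name has_rxlr are assumptions chosen for this example; a plain pattern match of this kind is far cruder than dedicated effector-motif or signal-peptide predictors and is not part of PPEPFinder.

```python
import re

# R, any residue, L, R: a deliberately simple stand-in for RxLR motif detection.
RXLR_PATTERN = re.compile(r"R.LR")

def has_rxlr(sequence, search_window=100):
    """Return True if an RxLR-like motif occurs in the first `search_window` residues.

    The window size is an assumed value for illustration only.
    """
    n_terminal = sequence.upper()[:search_window]
    return RXLR_PATTERN.search(n_terminal) is not None

# Toy sequences (made up), keyed by a hypothetical protein ID.
toy_sequences = {
    "protA": "MKLSAAVLLAATASARSLRTTEER",
    "protB": "MKLSAAVLLAATASAGGGTTEEK",
}
for protein_id, seq in toy_sequences.items():
    print(protein_id, has_rxlr(seq))
```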

How to use PPEPFinder?

Users first select the organism type (fungus or oomycete), then choose:

⚠️ Note: The full PPEPFinder model processes only one protein per job.

Job List: Viewing Prediction Results

Each result includes:

  • Protein ID: The identifier of the input protein
  • Prediction Score: The confidence score predicted by the model
  • Is Effector?: Indicates whether the protein is predicted to be an effector

⚠️ Note: Proteins with a prediction score greater than 0.5 are classified as effectors.
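
The classification rule can be expressed in a few lines. This is a minimal sketch, assuming results are available as (protein ID, score) pairs; the function name classify and the toy scores are illustrative, not part of the PPEPFinder output format.

```python
def classify(results, threshold=0.5):
    """Attach an 'is effector' flag to each (protein_id, score) pair.

    A protein is flagged only when its score is strictly greater than the threshold.
    """
    return [(protein_id, score, score > threshold) for protein_id, score in results]

# Toy scores (made up for illustration).
toy_results = [("protA", 0.93), ("protB", 0.50), ("protC", 0.12)]
for protein_id, score, is_effector in classify(toy_results):
    print(f"{protein_id}\t{score:.2f}\t{'Yes' if is_effector else 'No'}")
```

Note that a score of exactly 0.5 is not classified as an effector under this rule, because the comparison is strict.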

Model Performance

Query proteins with a prediction score above the default threshold (0.5) are classified as effectors.

What are Precision and Recall?

Precision measures the model's ability to correctly identify positive cases among all predicted positives.
Precision = TP / (TP + FP)

Recall (also called Sensitivity) measures the model's ability to correctly identify positive cases among all actual positives.
Recall = TP / (TP + FN)

- TP: True Positives (correctly predicted positives)
- FP: False Positives (negatives wrongly predicted as positives)
- FN: False Negatives (positives wrongly predicted as negatives)
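
The two formulas can be checked on a small example. The sketch below counts TP, FP and FN from true and predicted labels (1 = effector, 0 = non-effector); the function name and the toy labels are made up for illustration and are unrelated to the PPEPFinder benchmark data.

```python
def precision_recall(y_true, y_pred):
    """Compute (precision, recall) from binary true and predicted labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    # Guard against division by zero when there are no predicted or actual positives.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall

# Toy labels: 3 true effectors among 8 proteins; the model predicts 3 effectors.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0]
precision, recall = precision_recall(y_true, y_pred)
print(f"precision = {precision:.3f}, recall = {recall:.3f}")
```

Here TP = 2, FP = 1 and FN = 1, so both precision and recall equal 2/3.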

Fungi

Model           Precision  Recall
PPEPFinder      0.816      0.741
Sequence-based  0.771      0.685

Oomycetes

Model           Precision  Recall
PPEPFinder      0.781      0.792
Sequence-based  0.713      0.792

Download: Training & Additional Data

The Download module provides access to the datasets used to train the models, comprising positive and negative examples at a ratio of 1:3.