PPEPFinder

Plant Pathogen Effector Protein Finder

Why is the prediction of effector proteins important in fungi and oomycetes?

Fungi and oomycetes are among the most destructive eukaryotic plant pathogens, often responsible for devastating plant diseases worldwide. Their successful colonization of host plants largely depends on the secretion of effector proteins that function within host cells to suppress immune responses and promote pathogen infection, ultimately leading to disease symptoms or even host death. Therefore, the prediction of effector proteins plays an important role in deciphering plant-pathogen relationships.

What is PPEPFinder?

PPEPFinder is an integrated deep learning framework designed to predict fungal and oomycete effector proteins accurately. PPEPFinder leverages both protein sequence and 3D structural information to build a comprehensive ensemble model. The framework consists of three sub-models:

  1. A sequence-based model that utilizes embeddings from the pre-trained ESM language model as input to a Transformer encoder;
  2. A structure-based graph model that represents proteins as residue-level contact networks, where node features are ESM embeddings and edges reflect spatial distances, processed via a Graph Attention Network (GAT);
  3. A second structure-based graph model that incorporates structural embeddings generated by the structure pre-trained model SaProt as node features, also using GAT for representation learning.
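
As a rough illustration of the residue-level contact networks used by sub-models 2 and 3, the sketch below builds an edge list from per-residue 3D coordinates. The function name `contact_edges`, the choice of one coordinate per residue (for example the C-alpha atom), and the 8 Å cutoff are assumptions made for this example; the text above does not state how PPEPFinder defines a contact.

```python
import math

def contact_edges(coords, cutoff=8.0):
    """Return undirected edges (i, j, distance) for residue pairs within `cutoff` angstroms.

    `coords` holds one (x, y, z) tuple per residue. The 8.0 default is an
    illustrative assumption, not a documented PPEPFinder setting.
    """
    edges = []
    for i in range(len(coords)):
        for j in range(i + 1, len(coords)):
            d = math.dist(coords[i], coords[j])
            if d <= cutoff:
                edges.append((i, j, d))
    return edges

# Toy input: four residues spaced 3.8 angstroms apart along a straight line
toy = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
edges = contact_edges(toy)
```

In a graph model, each residue index would carry its embedding vector as the node feature, and the edge list would define which residues can attend to each other.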

To integrate the predictive capabilities of all three sub-models, PPEPFinder employs a logistic regression model that combines their outputs into a final prediction score.
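
The combination step can be pictured with a minimal sketch, assuming the logistic regression takes the three sub-model scores as its only inputs. The weights and bias below are hypothetical placeholders chosen for illustration; the trained coefficients are not given in this document.

```python
import math

def ensemble_score(sub_scores, weights, bias):
    """Combine sub-model scores into one probability with a logistic function."""
    if len(sub_scores) != len(weights):
        raise ValueError("need one weight per sub-model score")
    z = bias + sum(w * s for w, s in zip(weights, sub_scores))
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical coefficients, for illustration only
WEIGHTS = (2.0, 1.5, 1.5)
BIAS = -2.5

# Scores from the sequence model, the ESM-based GAT, and the SaProt-based GAT
final = ensemble_score((0.9, 0.8, 0.7), WEIGHTS, BIAS)
```

With these placeholder values, three high sub-model scores yield a final score well above 0.5, while three scores of zero would fall below it.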

It is important to note that PPEPFinder does not explicitly predict or report specific effector-related characteristics such as RXLR motifs, WY domains, Cys-rich regions, or the presence of a signal peptide. Although these features were represented in the training dataset and may implicitly contribute to the model’s learning process, they are not individually assessed during prediction. Users who wish to include these characteristics in their analyses may pre-screen their protein sequences using existing tools prior to submission, or examine the predicted effector sequences afterwards with specialized bioinformatics programs.

How to use PPEPFinder?

Users first select the organism type (fungus or oomycete), then choose:

⚠️ Note: The full PPEPFinder model supports only one protein per run.

Job List: Viewing Prediction Results

Each result includes:

  • Protein ID: The identifier of the input protein
  • Prediction Score: The confidence score predicted by the model
  • Is Effector?: Indicates whether the protein is predicted to be an effector

⚠️ Note: Proteins with a prediction score greater than 0.5 are classified as effectors.
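
For users who post-process downloaded results, the rule above can be sketched as follows. The protein IDs and scores are made up, and the names `is_effector` and `results` are hypothetical; only the 0.5 threshold and the strict "greater than" comparison come from the note above.

```python
THRESHOLD = 0.5  # default threshold stated in the note above

def is_effector(score, threshold=THRESHOLD):
    """Call a protein an effector only if its score is strictly greater than the threshold."""
    return score > threshold

# Made-up protein IDs and prediction scores
results = [("protein_A", 0.91), ("protein_B", 0.50), ("protein_C", 0.12)]
calls = {pid: is_effector(score) for pid, score in results}
```

A score of exactly 0.5 is not classified as an effector under this reading, because the comparison is strict.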

Memory Limitation Notice

PPEPFinder relies on deep learning models that require substantial system memory during prediction. If the input size is too large or if multiple tasks are submitted simultaneously, the prediction process may be terminated by the system, resulting in a failure message.

To avoid this issue, please:

  • Limit each submission to no more than 50 FASTA sequences
  • Submit a new task only after the previous task has completed
  • Avoid extremely long protein sequences
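
One way to stay within these limits is to split a large FASTA file into batches before submission. The sketch below is a minimal example using only the Python standard library. The helper names and the optional `max_length` filter are assumptions; this document does not define what counts as an extremely long sequence, so no default length is set.

```python
def read_fasta(text):
    """Parse FASTA text into a list of (header, sequence) tuples."""
    records, header, chunks = [], None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records

def batches(records, max_per_batch=50, max_length=None):
    """Yield lists of at most `max_per_batch` records.

    Sequences longer than `max_length` are skipped when a limit is given.
    """
    kept = [r for r in records if max_length is None or len(r[1]) <= max_length]
    for start in range(0, len(kept), max_per_batch):
        yield kept[start:start + max_per_batch]

# 120 made-up records split into submissions of 50, 50 and 20 sequences
fasta_text = "\n".join(f">seq{i}\nMKTAYIAK" for i in range(120))
sizes = [len(b) for b in batches(read_fasta(fasta_text))]
```

Each batch would then be submitted as its own task, waiting for the previous task to finish before sending the next.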

Model Performance

Query proteins with a prediction score greater than the default prediction threshold value (i.e., 0.5) are classified as effectors.

What are Precision and Recall?

Precision measures the model's ability to correctly identify positive cases among all predicted positives.
Precision = TP / (TP + FP)

Recall (also called Sensitivity) measures the model's ability to correctly identify positive cases among all actual positives.
Recall = TP / (TP + FN)

  • TP: True Positives (correctly predicted positives)
  • FP: False Positives (negatives wrongly predicted as positives)
  • FN: False Negatives (positives wrongly predicted as negatives)
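
The two formulas can be checked on a small made-up example. In the sketch below, labels are coded as 1 (effector) and 0 (non-effector). Returning 0.0 when a denominator is zero is a convention assumed here, not something this document specifies.

```python
def precision_recall(y_true, y_pred):
    """Compute precision and recall from binary labels (1 = positive, 0 = negative)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    # Assumed convention: report 0.0 when there is nothing to divide by
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall

# Made-up labels: 4 true effectors and 4 non-effectors
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0]
precision, recall = precision_recall(y_true, y_pred)
```

Here TP = 3, FP = 1 and FN = 1, so both precision and recall equal 3 / 4 = 0.75.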

Fungi

Model             Precision   Recall
PPEPFinder        0.816       0.741
Sequence-based    0.771       0.685

Oomycetes

Model             Precision   Recall
PPEPFinder        0.781       0.792
Sequence-based    0.713       0.792

Bacterial T3SE

Model             Precision   Recall
PPEPFinder        0.964       0.853
Sequence-based    0.915       0.790

Download: Training & Additional Data

The Download module provides access to the datasets used to train the models, comprising positive and negative examples in a 1:3 ratio.