PPEPFinder

Why is the prediction of effector proteins important in fungi and oomycetes?

Fungi and oomycetes are among the most destructive eukaryotic plant pathogens, often responsible for devastating plant diseases worldwide. Their successful colonization of host plants largely depends on the secretion of effector proteins that function within host cells to suppress immune responses and promote pathogen infection, ultimately leading to disease symptoms or even host death. Therefore, the prediction of effector proteins plays an important role in deciphering plant-pathogen relationships.

What is PPEPFinder?

PPEPFinder is an integrated deep learning framework designed to predict fungal and oomycete effector proteins accurately. PPEPFinder leverages both protein sequence and 3D structural information to build a comprehensive ensemble model. The framework consists of three sub-models:

A sequence-based model that utilizes embeddings from the pre-trained ESM language model as input to a Transformer encoder;
A structure-based graph model that represents proteins as residue-level contact networks, where node features are ESM embeddings and edges reflect spatial distances, processed via a Graph Attention Network (GAT);
A second structure-based graph model that incorporates structural embeddings generated by the structure pre-trained model SaProt as node features, also using GAT for representation learning.
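As a rough illustration of the residue-level contact network used by the two graph models, the sketch below builds edges between residues whose C-alpha atoms lie within a distance cutoff. The function name `contact_edges`, the use of C-alpha coordinates, and the 8.0 angstrom cutoff are assumptions made for this example; the text above does not state how PPEPFinder defines contacts.

```python
import math

def contact_edges(ca_coords, cutoff=8.0):
    """Return undirected edges (i, j, distance) between residues whose
    C-alpha atoms are closer than `cutoff` angstroms.

    NOTE: the 8.0 cutoff and the use of C-alpha atoms are assumptions for
    illustration, not PPEPFinder's documented settings.
    """
    edges = []
    for i in range(len(ca_coords)):
        for j in range(i + 1, len(ca_coords)):
            d = math.dist(ca_coords[i], ca_coords[j])
            if d < cutoff:
                edges.append((i, j, d))
    return edges

# Toy coordinates: three residues along a line, plus one far away.
coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (20.0, 0.0, 0.0)]
edges = contact_edges(coords)
```

In a graph model of this kind, each node would additionally carry a per-residue embedding (ESM or SaProt, as described above) and the edge list would be passed to the GAT layers.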

To integrate the predictive capabilities of all three sub-models, PPEPFinder employs a logistic regression model that combines their outputs into a final prediction score.
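The combination step can be sketched as a logistic function applied to a weighted sum of the three sub-model scores. The function name `combine_scores`, the weights, and the bias below are hypothetical placeholders chosen for illustration; the fitted coefficients of PPEPFinder are not given here.

```python
import math

def combine_scores(seq_score, esm_gat_score, saprot_gat_score,
                   weights=(1.0, 1.0, 1.0), bias=-1.5):
    """Combine three sub-model scores into one probability-like score.

    NOTE: `weights` and `bias` are hypothetical values, not the
    coefficients learned by PPEPFinder.
    """
    scores = (seq_score, esm_gat_score, saprot_gat_score)
    z = bias + sum(w * s for w, s in zip(weights, scores))
    return 1.0 / (1.0 + math.exp(-z))  # logistic (sigmoid) function

final_score = combine_scores(0.9, 0.8, 0.7)
```

With positive weights, a higher score from any sub-model raises the final score, and the output always lies strictly between 0 and 1.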

How to use PPEPFinder?

First select the organism type (fungi or oomycetes), then choose one of two prediction modes:

Sequence_Based_Model: Upload protein sequences in FASTA format to run predictions with the sequence-based Transformer model. This mode accepts multiple sequences in a single submission.
Full PPEPFinder Model: Upload a PDB file containing the protein's 3D structure in addition to its sequence, and PPEPFinder will integrate the predictions of all three sub-models.

⚠️ Note: The Full PPEPFinder model supports only a single protein per job.
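Before submitting, it can help to check how many sequences a FASTA file contains, since that determines which mode is usable. The sketch below is a minimal local check and is not part of PPEPFinder itself; `read_fasta` and `can_use_full_model` are hypothetical helper names.

```python
def read_fasta(text):
    """Parse FASTA text into a dict mapping sequence ID to sequence."""
    records = {}
    current_id = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            parts = line[1:].split()
            # Fall back to a generated ID if the header line is empty.
            current_id = parts[0] if parts else f"seq{len(records) + 1}"
            records[current_id] = ""
        elif current_id is not None:
            records[current_id] += line
    return records

def can_use_full_model(records):
    """The full model accepts exactly one protein per job."""
    return len(records) == 1

example = ">protA some description\nMKTLLV\nAAGG\n>protB\nMSSPQ\n"
records = read_fasta(example)
```

Here `records` holds two sequences, so this input would suit the Sequence_Based_Model mode but not the full model.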

Job List: Viewing Prediction Results

Each result includes:

Protein ID: The identifier of the input protein
Prediction Score: The confidence score predicted by the model
Is Effector?: Indicates whether the protein is predicted to be an effector

⚠️ Note: Proteins with a prediction score greater than 0.5 are classified as effectors.
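The thresholding rule can be written out as follows. This is only a sketch of the rule as stated in the note; `make_result` is a hypothetical helper, and the dictionary keys simply mirror the column names listed above.

```python
def make_result(protein_id, score, threshold=0.5):
    """Build one result row; a protein is called an effector only when
    its score is strictly greater than the threshold."""
    return {
        "Protein ID": protein_id,
        "Prediction Score": score,
        "Is Effector?": score > threshold,
    }

rows = [make_result("protA", 0.83), make_result("protB", 0.50)]
```

Note that a score of exactly 0.5 is not classified as an effector under the "greater than 0.5" rule.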

Model Specificity

The table below reports the specificity of each model at a prediction-score threshold of 0.5.

What is Specificity?
Specificity measures the model's ability to correctly identify negative cases (e.g., non-effectors). It is calculated as:

Specificity = TN / (TN + FP) = 1 - FPR

- TN: True Negatives (correctly predicted negatives)
- FP: False Positives (negatives wrongly predicted as positives)
- FPR: False Positive Rate = FP / (FP + TN)
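As a worked example of the formula, the sketch below computes specificity from true labels and prediction scores at a threshold of 0.5. The labels and scores are made-up toy values, not PPEPFinder results, and the function name `specificity` is chosen for this example.

```python
def specificity(labels, scores, threshold=0.5):
    """Specificity = TN / (TN + FP), with label 0 = non-effector and
    label 1 = effector. A score above `threshold` is a positive call."""
    tn = sum(1 for y, s in zip(labels, scores) if y == 0 and s <= threshold)
    fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s > threshold)
    if tn + fp == 0:
        raise ValueError("no negative examples; specificity is undefined")
    return tn / (tn + fp)

# Toy data: four non-effectors (one wrongly scored above 0.5), two effectors.
labels = [0, 0, 0, 0, 1, 1]
scores = [0.1, 0.2, 0.7, 0.4, 0.9, 0.3]
spec = specificity(labels, scores)  # TN = 3, FP = 1, so 3 / 4 = 0.75
```

Only the negative examples enter the calculation, so the two effectors in the toy data do not affect the result.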

Model	Fungi	Oomycetes
Sequence-Based	0.979	0.973
Full PPEPFinder	0.940	0.946

Download: Training & Additional Data

The Download module provides access to the datasets used to train the models, which contain positive (effector) and negative (non-effector) examples at a 1:3 ratio.