Separating plant and fungal sequences in expressed sequence tag (EST) pools by bioinformatic methods is difficult because of sequence similarities between plants and fungi, insufficient sequence information, and the short length of the isolated fragments. An algorithm and software that use differences in codon usage bias to discriminate between plant and fungal sequences are described. The software (PF-IND) includes five pairs of fungi and their host plants, which can be used to analyze a large number of related species. Analysis of a sequence yields a value on an arbitrary scale that indicates how likely the sequence is to be a fungal or a plant gene. The software can distinguish between homologous fungal and plant genes, and it helps identify the correct reading frame of unknown ESTs for which BLAST analyses do not provide clear information. Short sequences of 100-150 bp can be analyzed with high confidence. PF-IND analysis of 100 sequences derived from fungus-infected plants identified the origin of 94 sequences, whereas a BLASTX analysis of the same 100 ESTs identified only 66. Overall, PF-IND is a novel bioinformatic tool intended to assist research on fungus-plant interactions.
- Codon usage bias
- Fungus-plant interaction
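
To make the idea of codon-usage-based discrimination concrete, the following is a minimal sketch of one generic way such a score could be computed: a log-odds ratio of codon frequencies, evaluated in each forward reading frame. It is not the PF-IND algorithm, whose scoring details are not given here. The function names, the pseudocount, and the toy training sequences are all assumptions made for illustration.

```python
import math
from collections import Counter
from itertools import product

# All 64 codons over the unambiguous DNA alphabet.
CODONS = ["".join(c) for c in product("ACGT", repeat=3)]


def codons_in(seq, frame=0):
    """Yield complete, unambiguous codons of seq read in the given frame (0, 1 or 2)."""
    seq = seq.upper()
    for i in range(frame, len(seq) - 2, 3):
        codon = seq[i:i + 3]
        if set(codon) <= set("ACGT"):
            yield codon


def codon_frequencies(coding_seqs, pseudocount=1.0):
    """Estimate codon frequencies from in-frame coding sequences.

    The pseudocount (an assumption of this sketch) keeps unseen codons
    from producing zero frequencies.
    """
    counts = Counter()
    for s in coding_seqs:
        counts.update(codons_in(s))
    total = sum(counts.values()) + pseudocount * len(CODONS)
    return {c: (counts[c] + pseudocount) / total for c in CODONS}


def log_odds_score(seq, fungal_freq, plant_freq, frame=0):
    """Sum of log(fungal/plant) codon frequency ratios.

    Positive values lean fungal, negative values lean plant; the scale is arbitrary.
    """
    return sum(
        math.log(fungal_freq[c] / plant_freq[c]) for c in codons_in(seq, frame)
    )


def score_frames(seq, fungal_freq, plant_freq):
    """Score the three forward reading frames of seq."""
    return {f: log_odds_score(seq, fungal_freq, plant_freq, f) for f in range(3)}


# Toy, made-up training data (hypothetical, not real gene sequences).
fungal_training = ["GCCGCCCTCCTC"]
plant_training = ["GCTGCTCTTCTT"]

fungal_freq = codon_frequencies(fungal_training)
plant_freq = codon_frequencies(plant_training)

fungal_like = log_odds_score("GCCCTC", fungal_freq, plant_freq)
plant_like = log_odds_score("GCTCTT", fungal_freq, plant_freq)
frame_scores = score_frames("AGCCCTCA", fungal_freq, plant_freq)
```

In this sketch the frame with the most extreme score would be a candidate for the true reading frame, and a real analysis would also need to score the reverse complement and train on codon tables from an actual fungus and host plant pair.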