In vivo transposon mutagenesis, coupled with deep sequencing, enables large-scale genome-wide mutant screens for genes essential in different growth conditions. We analyzed six large-scale studies performed on haploid strains of three yeast species (Saccharomyces cerevisiae, Schizosaccaromyces pombe, and Candida albicans), each mutagenized with two of three different heterologous transposons (AcDs, Hermes, and PiggyBac). Using a machine-learning approach, we evaluated the ability of the data to predict gene essentiality. Important data features included sufficient numbers and distribution of independent insertion events. All transposons showed some bias in insertion site preference because of jackpot events, and preferences for specific insertion sequences and short-distance vs long-distance insertions. For PiggyBac, a stringent target sequence limited the ability to predict essentiality in genes with few or no target sequences. The machine learning approach also robustly predicted gene function in less well-studied species by leveraging cross-species orthologs. Finally, comparisons of isogenic diploid versus haploid S. cerevisiae isolates identified several genes that are haplo-insufficient, while most essential genes, as expected, were recessive. We provide recommendations for the choice of transposons and the inference of gene essentiality in genome-wide studies of eukaryotic haploid microbes such as yeasts, including species that have been less amenable to classical genetic studies.
- Machine learning