TY - JOUR
T1 - Codon usage and expression-based features significantly improve prediction of CRISPR efficiency
AU - Bergman, Shaked
AU - Tuller, Tamir
N1 - Publisher Copyright: © The Author(s) 2024.
PY - 2024/12
Y1 - 2024/12
AB - CRISPR is a precise and effective genome editing technology; but despite several advancements during the last decade, our ability to computationally design gRNAs remains limited. Most predictive models have relatively low predictive power and utilize only the sequence of the target site as input. Here we suggest a new category of features, which incorporate the target site genomic position and the presence of genes close to it. We calculate four features based on gene expression and codon usage bias indices. We show, on CRISPR datasets taken from 3 different cell types, that such features perform comparably with 425 state-of-the-art predictive features, ranking in the top 2–12% of features. We trained new predictive models, showing that adding expression features to them significantly improves their r² by up to 0.04 (relative increase of 39%), achieving average correlations of up to 0.38 on their validation sets; and that these features are deemed important by different feature importance metrics. We believe that incorporating the target site’s position, in addition to its sequence, in features such as we have generated here will improve our ability to predict, design and understand CRISPR experiments going forward.
UR - http://www.scopus.com/inward/record.url?scp=85203009563&partnerID=8YFLogxK
U2 - 10.1038/s41540-024-00431-8
DO - 10.1038/s41540-024-00431-8
M3 - Article
C2 - 39227603
AN - SCOPUS:85203009563
SN - 2056-7189
VL - 10
JO - npj Systems Biology and Applications
JF - npj Systems Biology and Applications
IS - 1
M1 - 100
ER -