Interpolation of microbiome composition in longitudinal data sets

Omri Peleg, Elhanan Borenstein*

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

Abstract

The human gut microbiome significantly impacts health, prompting a rise in longitudinal studies that capture microbiome samples at multiple time points. Such studies allow researchers to characterize microbiome changes over time, but they also present major analytical challenges due to incomplete or irregular sampling. To address this challenge, longitudinal microbiome studies often employ various interpolation methods, aiming to infer missing microbiome data. However, to date, a comprehensive assessment of such microbiome interpolation techniques, as well as best practice guidelines for interpolating microbiome data, is still lacking. This work aims to fill this gap by rigorously implementing and systematically evaluating a large array of interpolation methods, spanning several different categories, for longitudinal microbiome interpolation. To assess each method's ability to accurately infer microbiome composition at missing time points, we used a leave-one-out approach on three longitudinal microbiome data sets that follow individuals over a long period of time. Overall, our analysis demonstrated that the K-nearest neighbors algorithm consistently outperforms other methods in interpolation accuracy, yet accuracy varied widely across data sets, individuals, and time. Factors such as microbiome stability, sample size, and the time gap between interpolated and adjacent samples significantly influenced accuracy, allowing us to develop a model for predicting the expected interpolation accuracy at a missing time point. Together, our findings suggest that accurate interpolation in longitudinal microbiome data is feasible, especially in dense cohorts. Furthermore, using our predictive model, future studies can interpolate data only at time points where the expected interpolation accuracy is high.
IMPORTANCE Since missing samples are common in longitudinal microbiome data sets due to inconsistent collection practices, it is important to evaluate and benchmark different interpolation methods for predicting microbiome composition in such samples and facilitating downstream analysis. Our study rigorously evaluated several such methods and identified the K-nearest neighbors approach as particularly effective for this task. The study also notes significant variability in interpolation accuracy among individuals, influenced by factors such as age, sample size, and sampling frequency. Furthermore, we developed a predictive model for estimating interpolation accuracy at a specific time point, enhancing the reliability of such analyses in future studies. Together, these results provide critical insights and tools that enhance the accuracy and reliability of data interpolation methods in the growing field of longitudinal microbiome research.
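
The leave-one-out evaluation described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a simple time-based nearest-neighbor interpolator (averaging the k samples closest in time), uses Bray-Curtis dissimilarity as the error measure, and runs on synthetic relative-abundance data. The function names `knn_interpolate`, `bray_curtis`, and `leave_one_out_errors` are hypothetical and chosen only for this example.

```python
import numpy as np


def knn_interpolate(times, samples, t_query, k=2):
    """Estimate composition at t_query by averaging the k samples
    nearest in time (an assumed, simplified KNN variant)."""
    times = np.asarray(times, dtype=float)
    samples = np.asarray(samples, dtype=float)
    # Stable sort so that ties in distance are broken by sample order.
    nearest = np.argsort(np.abs(times - t_query), kind="stable")[:k]
    estimate = samples[nearest].mean(axis=0)
    # Renormalize so the estimate is again a relative-abundance vector.
    return estimate / estimate.sum()


def bray_curtis(p, q):
    """Bray-Curtis dissimilarity between two abundance vectors."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    return np.abs(p - q).sum() / (p + q).sum()


def leave_one_out_errors(times, samples, k=2):
    """Hold out each time point in turn, interpolate it from the
    remaining samples, and record the interpolation error."""
    times = np.asarray(times, dtype=float)
    samples = np.asarray(samples, dtype=float)
    errors = []
    for i in range(len(times)):
        keep = np.arange(len(times)) != i
        estimate = knn_interpolate(times[keep], samples[keep], times[i], k)
        errors.append(bray_curtis(samples[i], estimate))
    return np.array(errors)


if __name__ == "__main__":
    # Synthetic data (assumption): 10 time points, 5 taxa, one individual.
    rng = np.random.default_rng(0)
    times = np.arange(10)
    samples = rng.dirichlet(np.ones(5), size=10)
    errors = leave_one_out_errors(times, samples, k=2)
    print("mean leave-one-out Bray-Curtis error:", errors.mean())
```

Lower values indicate more accurate interpolation. In a sketch like this, per-time-point errors could then be related to features such as the time gap to adjacent samples, which is the kind of relationship the abstract's predictive model is built on.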

Original language: English
Journal: mBio
Volume: 15
Issue number: 9
State: Published - Sep 2024

Keywords

  • interpolation
  • longitudinal data
  • microbiome
