|An improved rhythmicity analysis method using Gaussian Processes detects cell-density dependent circadian oscillations in stem cells.
|Year of Publication
|2023
|Authors
|Sahay S, Adhikari S, Hormoz S, Chakrabarti S
|Date Published
|2023 Apr 18
Detecting oscillations in time series remains a challenging problem even after decades of research. In chronobiology, rhythms in time-series datasets (for instance gene expression, eclosion, egg-laying and feeding) tend to be low amplitude, display large variations amongst replicates, and often exhibit varying peak-to-peak distances (non-stationarity). Most currently available rhythm detection methods are not specifically designed to handle such datasets. Here we introduce a new method, ODeGP (Oscillation Detection using Gaussian Processes), which combines Gaussian Process (GP) regression with Bayesian inference to provide a flexible approach to the problem. Besides naturally incorporating measurement errors and non-uniformly sampled data, ODeGP uses a recently developed kernel to improve detection of non-stationary waveforms. An additional advantage is that by using Bayes factors instead of p-values, ODeGP models both the null (non-rhythmic) and the alternative (rhythmic) hypotheses. Using a variety of synthetic datasets, we first demonstrate that ODeGP almost always outperforms eight commonly used methods in detecting stationary as well as non-stationary oscillations. Next, on analyzing existing qPCR datasets that exhibit low-amplitude and noisy oscillations, we demonstrate that our method is more sensitive than existing methods at detecting weak oscillations. Finally, we generate new qPCR time-series datasets on pluripotent mouse embryonic stem cells, which are expected to exhibit no oscillations of the core circadian clock genes. Surprisingly, we discover using ODeGP that increasing cell density can result in the rapid generation of oscillations in the gene, thus highlighting our method's ability to discover unexpected patterns. In its current implementation, ODeGP (available as an R package) is meant only for analyzing single or a few time-trajectories, not genome-wide datasets.
|PubMed Central ID