Progression Reconstruction from Unsynchronized Biological Data using Cluster Spanning Trees
Identifying the progression order of an unsynchronized set of biological samples is crucial for understanding the dynamics of the underlying molecular interactions. It is also valuable in many applied problems, such as data denoising and synchronization, tumor classification, and cell lineage identification. Current methods for this problem are ultimately based either on polynomial and piecewise approximation of the unknown generating function or on its reconstruction using spanning trees. Such approaches struggle when complex relationships within the data, such as partial ordering or bifurcating or multifurcating progressions, must be taken into account. We propose the notion of Cluster Spanning Trees (CST), which can model both linear progressions and these more complex progression relationships. Through experiments on synthetic data sets as well as data sets from the cell cycle, cellular differentiation, and phenotypic screening, we show that the proposed CST approach outperforms previous approaches in reconstructing the temporal progression of the data.
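To make the spanning-tree idea concrete, the following is a minimal Python sketch, not the CST algorithm itself, whose details are not given here. It assumes the samples have already been assigned to clusters (the `labels` argument is hypothetical input), builds a minimum spanning tree over the cluster centroids with Prim's algorithm under Euclidean distance, and reads off the longest path in that tree as a candidate linear progression. All function names are illustrative.

```python
import math
from collections import defaultdict

def centroids(samples, labels):
    """Mean vector per cluster; labels[i] is an assumed cluster id for samples[i]."""
    groups = defaultdict(list)
    for x, lab in zip(samples, labels):
        groups[lab].append(x)
    return {lab: tuple(sum(c) / len(pts) for c in zip(*pts))
            for lab, pts in groups.items()}

def minimum_spanning_tree(nodes):
    """Prim's algorithm on the complete Euclidean graph over nodes (id -> point)."""
    ids = list(nodes)
    in_tree = {ids[0]}
    edges = []
    while len(in_tree) < len(ids):
        # Cheapest edge leaving the current tree.
        w, u, v = min((math.dist(nodes[a], nodes[b]), a, b)
                      for a in in_tree for b in ids if b not in in_tree)
        edges.append((u, v, w))
        in_tree.add(v)
    return edges

def farthest(adj, start):
    """Return (node, path) for the tree node farthest from start by summed edge weight."""
    best = (0.0, start, [start])
    stack = [(0.0, start, None, [start])]
    while stack:
        d, u, parent, path = stack.pop()
        if d > best[0]:
            best = (d, u, path)
        for v, w in adj[u]:
            if v != parent:
                stack.append((d + w, v, u, path + [v]))
    return best[1], best[2]

def progression_order(samples, labels):
    """Cluster ids along the longest path of the centroid spanning tree."""
    nodes = centroids(samples, labels)
    adj = defaultdict(list)
    for u, v, w in minimum_spanning_tree(nodes):
        adj[u].append((v, w))
        adj[v].append((u, w))
    end, _ = farthest(adj, next(iter(nodes)))   # one end of the longest path
    _, path = farthest(adj, end)                # walk to the other end
    return path

if __name__ == "__main__":
    # Toy data: four clusters laid out roughly along a line, given out of order.
    samples = [(2.0, 0.1), (0.0, 0.0), (3.0, 0.0), (1.0, 0.0),
               (2.2, 0.0), (0.2, 0.1), (3.1, 0.1), (1.1, 0.2)]
    labels = [2, 0, 3, 1, 2, 0, 3, 1]
    print(progression_order(samples, labels))
```

The path has no inherent direction, so the order may come out reversed relative to true time. A bifurcating or multifurcating progression would appear as branch points in the tree, which a single longest path does not capture; in that case the tree itself, rather than one path, is the object to inspect.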
Keywords: Span Tree; Minimum Span Tree; Reconstruction Error; Natural Cluster; Phenotypic Screening
This research was funded in part by the National Science Foundation grant IIS-0644418 and the National Institutes of Health grant 1R01A1089896.