Abstract
An abundance of biological data sources contain data on classes of scientific entities, such as genes and sequences. Logical relationships between scientific objects are implemented as URLs and foreign IDs. Query processing typically involves traversing links and paths (concatenations of links) through these sources. We model the data objects in these sources and the links between objects as an object graph. Analogous to database cost models, we use samples and statistics from the object graph to develop a framework to estimate the result size for a query on the object graph.
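As a rough illustration of the idea, the sketch below stores objects and links in a small in-memory object graph and estimates how many link paths follow a given sequence of sources by sampling start objects and scaling up. It is not the framework developed in the paper: the class `ObjectGraph`, the function `estimate_path_count`, the source names (`Gene`, `Protein`, `Publication`) and the toy links are all assumptions made for this example, and the estimator is a plain sample mean.

```python
import random
from collections import defaultdict


class ObjectGraph:
    """Hypothetical object graph: nodes are (source, id) pairs, edges are links."""

    def __init__(self):
        self.objects = defaultdict(set)  # source name -> set of object ids
        self.links = defaultdict(set)    # (source, id) -> set of (source, id)

    def add_link(self, src, src_id, dst, dst_id):
        self.objects[src].add(src_id)
        self.objects[dst].add(dst_id)
        self.links[(src, src_id)].add((dst, dst_id))

    def count_paths(self, start_id, path):
        """Count link paths that start at one object and visit the sources in `path` in order."""
        frontier = {(path[0], start_id): 1}
        for next_source in path[1:]:
            reached = defaultdict(int)
            for node, n_paths in frontier.items():
                for target in self.links.get(node, ()):
                    if target[0] == next_source:
                        reached[target] += n_paths
            frontier = reached
        return sum(frontier.values())


def estimate_path_count(graph, path, sample_size, rng):
    """Estimate the total number of paths from a uniform sample of start objects."""
    population = sorted(graph.objects[path[0]])
    if not population:
        return 0.0
    sample = rng.sample(population, min(sample_size, len(population)))
    mean_paths = sum(graph.count_paths(s, path) for s in sample) / len(sample)
    return mean_paths * len(population)


# Toy data, invented for this example.
graph = ObjectGraph()
for link in [
    ("Gene", "g1", "Protein", "p1"),
    ("Gene", "g1", "Protein", "p2"),
    ("Gene", "g2", "Protein", "p2"),
    ("Protein", "p1", "Publication", "pub1"),
    ("Protein", "p2", "Publication", "pub1"),
    ("Protein", "p2", "Publication", "pub2"),
]:
    graph.add_link(*link)

path = ["Gene", "Protein", "Publication"]
exact = sum(graph.count_paths(g, path) for g in graph.objects["Gene"])
estimate = estimate_path_count(graph, path, sample_size=2, rng=random.Random(0))
```

Here the sample covers every start object, so the estimate coincides with the exact count; with a smaller sample the estimate would vary with the objects drawn. The sketch counts paths rather than distinct result objects because a per-start-object path count scales linearly with the number of start objects, which keeps the sample mean simple to reason about.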
© 2004 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Lacroix, Z., Murthy, H., Naumann, F., Raschid, L. (2004). Links and Paths through Life Sciences Data Sources. In: Rahm, E. (ed.) Data Integration in the Life Sciences. DILS 2004. Lecture Notes in Computer Science, vol 2994. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-24745-6_14
DOI: https://doi.org/10.1007/978-3-540-24745-6_14
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-21300-0
Online ISBN: 978-3-540-24745-6
eBook Packages: Springer Book Archive