Abstract
In practice, much data is hybrid: it contains not only unstructured data (text) but also structured data. However, most text search techniques for hybrid data focus only on the unstructured text and ignore the structured data, which can lead to poor ranking of search results. In this paper, we describe a new method for improving text search using structured data. Our contributions are as follows: (i) we build a uniform problem model; (ii) ours is the first approach to adopt the mutual information of feature words to quantify the relevance (similarity) between two texts; and (iii) we use several rules that take the structured data into account to improve text search, and build our approach on them. Finally, experimental results show that the relevance function and our approach achieve high recall, top-k precision, and Mean Average Precision (MAP), as well as good search performance.
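The abstract does not give the paper's relevance formula, so the following is only a minimal sketch of one way word-level mutual information can be turned into a text-to-text relevance score. It swaps in normalized pointwise mutual information (NPMI) estimated from document-level co-occurrence; this is an assumed stand-in, not the authors' function. The whitespace tokenizer and the names `tokenize`, `npmi`, and `relevance` are hypothetical.

```python
import math


def tokenize(text):
    """Hypothetical feature-word extractor: the set of lowercase whitespace tokens."""
    return set(text.lower().split())


def npmi(w1, w2, docs):
    """Normalized PMI of two words from document-level co-occurrence, in [-1, 1].

    docs is a list of token sets; probabilities are document frequencies.
    """
    n = len(docs)
    p1 = sum(w1 in d for d in docs) / n
    p2 = sum(w2 in d for d in docs) / n
    p12 = sum(w1 in d and w2 in d for d in docs) / n
    if p12 == 0:
        return -1.0  # the words never co-occur (limit of NPMI as p12 -> 0)
    if p12 == 1:
        return 0.0  # both words occur in every document, so they carry no information
    return math.log(p12 / (p1 * p2)) / -math.log(p12)


def relevance(text_a, text_b, docs):
    """Symmetric best-match average of NPMI (clipped at 0) between two texts' feature words."""
    a, b = tokenize(text_a), tokenize(text_b)
    if not a or not b:
        return 0.0

    def directed(src, dst):
        # For each word in src, keep its strongest positive association with any word in dst.
        return sum(max(max(npmi(u, v, docs), 0.0) for v in dst) for u in src) / len(src)

    return (directed(a, b) + directed(b, a)) / 2
```

Clipping negative NPMI to zero treats words that avoid each other as unrelated rather than as evidence against relevance. The best-match average also lets a short query score highly against a longer text when each query word has one strongly associated counterpart. The structured-data rules in contribution (iii) are not described in the abstract and are not sketched here.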
This work was partially supported by the National Natural Science Foundation of China (Nos. 60973018 and 60973020), the Doctoral Fund of the Ministry of Education of China (No. 20110042110028), and the Fundamental Research Funds for the Central Universities (Nos. N110804002 and N110404015).
Copyright information
© 2012 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Zhu, H., Yang, X., Wang, B., Wang, Y. (2012). Improving Text Search on Hybrid Data. In: Bao, Z., et al. Web-Age Information Management. WAIM 2012. Lecture Notes in Computer Science, vol 7419. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-33050-6_20
DOI: https://doi.org/10.1007/978-3-642-33050-6_20
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-33049-0
Online ISBN: 978-3-642-33050-6
eBook Packages: Computer Science; Computer Science (R0)