Abstract
The analysis and processing of environmental information is considered of utmost importance for humanity. This article addresses the problem of discovering web resources that provide environmental measurements. To solve this domain-specific search problem, we combine state-of-the-art search techniques with advanced textual processing and supervised machine learning. Specifically, we generate domain-specific queries using empirical information, and we apply machine-learning-driven query expansion to enhance the initial queries with domain-specific terms. Multiple variations of these queries are submitted to a general-purpose web search engine to achieve high recall, and a post-processing module based on supervised machine learning improves the precision of the final results. In this work, we focus on the discovery of weather forecast websites, and we evaluate our technique by discovering weather nodes for southern Finland.
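The recall-oriented part of the pipeline described above (query variations submitted to a general-purpose search engine, with results pooled before filtering) can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `build_queries` and `merge_results`, the `max_extra` parameter, and the example terms are hypothetical, and the supervised post-processing classifier is left out.

```python
from itertools import combinations


def build_queries(base_terms, expansion_terms, location, max_extra=2):
    """Build query variations (hypothetical helper, not from the paper).

    Each base term is combined with every subset of up to `max_extra`
    expansion terms, followed by the target location.
    """
    queries = []
    for base in base_terms:
        for k in range(max_extra + 1):
            for extra in combinations(expansion_terms, k):
                queries.append(" ".join([base, *extra, location]))
    return queries


def merge_results(result_lists):
    """Pool the URL lists returned for each query, dropping duplicates
    while keeping first-seen order, so recall grows with each variation."""
    seen = set()
    merged = []
    for urls in result_lists:
        for url in urls:
            if url not in seen:
                seen.add(url)
                merged.append(url)
    return merged


if __name__ == "__main__":
    # Illustrative terms only; the paper's actual query terms are not given here.
    for query in build_queries(["weather forecast"], ["temperature", "humidity"], "Helsinki"):
        print(query)
```

In a full system, the pooled URL list would then go to a trained classifier that keeps only pages judged to be relevant weather nodes, trading some of the recall gained here for precision.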
Copyright information
© 2012 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Moumtzidou, A., Vrochidis, S., Tonelli, S., Kompatsiaris, I., Pianta, E. (2012). Discovery of Environmental Nodes in the Web. In: Salampasis, M., Larsen, B. (eds) Multidisciplinary Information Retrieval. IRFC 2012. Lecture Notes in Computer Science, vol 7356. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-31274-8_5
DOI: https://doi.org/10.1007/978-3-642-31274-8_5
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-31273-1
Online ISBN: 978-3-642-31274-8
eBook Packages: Computer Science (R0)