Discovery of Environmental Nodes in the Web

  • Conference paper
Multidisciplinary Information Retrieval (IRFC 2012)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 7356)

Abstract

Analysis and processing of environmental information are considered of utmost importance for humanity. This article addresses the problem of discovering web resources that provide environmental measurements. To solve this domain-specific search problem, we combine state-of-the-art search techniques with advanced textual processing and supervised machine learning. Specifically, we generate domain-specific queries from empirical information and use machine-learning-driven query expansion to enhance the initial queries with domain-specific terms. Multiple variations of these queries are submitted to a general-purpose web search engine to achieve high recall, and a post-processing module based on supervised machine learning improves the precision of the final results. In this work, we focus on the discovery of weather forecast websites, and we evaluate our technique by discovering weather nodes for south Finland.
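As a rough illustration of the query-variation step described in the abstract, the sketch below combines base domain terms, expansion terms, and place names into multiple queries, then merges the per-query result lists into one deduplicated candidate set. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the example terms, and the way terms are combined are all hypothetical, and the search engine call and the supervised post-processing classifier are left out.

```python
from itertools import product


def build_queries(base_terms, expansion_terms, locations):
    """Return query strings: every base term paired with every location,
    once on its own and once with each expansion term added.
    (Hypothetical combination scheme, chosen only for illustration.)"""
    queries = []
    for base, place in product(base_terms, locations):
        queries.append(f"{base} {place}")
        for extra in expansion_terms:
            queries.append(f"{base} {extra} {place}")
    return queries


def merge_results(result_lists):
    """Union the URL lists returned for all queries, keeping first-seen order.
    Submitting many variations and merging their results favours recall;
    a separate classifier would then filter the merged list for precision."""
    seen = set()
    merged = []
    for urls in result_lists:
        for url in urls:
            if url not in seen:
                seen.add(url)
                merged.append(url)
    return merged


if __name__ == "__main__":
    # Example terms are assumptions, not taken from the paper.
    for query in build_queries(["weather forecast"], ["temperature", "wind"], ["Helsinki"]):
        print(query)
```

Each query would be sent to a general-purpose search engine and the returned URL lists passed to `merge_results`; the merged list is the input a post-processing classifier would filter.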

Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Moumtzidou, A., Vrochidis, S., Tonelli, S., Kompatsiaris, I., Pianta, E. (2012). Discovery of Environmental Nodes in the Web. In: Salampasis, M., Larsen, B. (eds) Multidisciplinary Information Retrieval. IRFC 2012. Lecture Notes in Computer Science, vol 7356. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-31274-8_5

  • DOI: https://doi.org/10.1007/978-3-642-31274-8_5

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-31273-1

  • Online ISBN: 978-3-642-31274-8

  • eBook Packages: Computer Science (R0)
