Knowledge and Information Systems, Volume 38, Issue 1, pp 109–140

Context-based information analysis for the Web environment

Regular Paper

Abstract

Finding the set of information that satisfies a Web user's information request is becoming increasingly challenging given today's vast amount of digital data. Currently available Information Retrieval (IR) systems are designed to return long lists of results, only a few of which are relevant to a specific user. This paper introduces an IR method called Context-Based Information Analysis (CONIA), which examines the context of the user and of the user's information request to provide relevant results for users of a given domain. Relevance is measured by the semantics of the information provided in the documents. The information extracted from lexical and domain ontologies is combined with the user's interest information to expand the terms entered in the request. The resulting set of terms is categorized by a novel approach, and the relations between the categories are obtained from the ontologies. This categorization is used to improve the quality of document selection: rather than only checking whether the words occur in a document, the method analyzes the semantic composition of the mapped terms.
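The abstract gives no algorithmic detail, so the following is only a minimal Python sketch of the general idea of ontology-based query expansion followed by category-aware document checking. It is not the CONIA method. The dictionaries `LEXICAL` and `DOMAIN`, the functions `expand` and `category_coverage`, and all example terms are hypothetical, and the coverage score is a simple stand-in for the semantic-composition analysis the abstract mentions.

```python
# Toy data: every name and entry here is a hypothetical stand-in,
# not taken from the paper.
LEXICAL = {  # term -> synonyms (stand-in for a lexical ontology)
    "attack": {"assault", "strike"},
    "bomb": {"explosive"},
}
DOMAIN = {  # term -> (category, related domain terms)
    "attack": ("event", {"bombing", "hijacking"}),
    "bomb": ("weapon", {"grenade"}),
}


def expand(query_terms, user_interests):
    """Return {category: set of terms} for the expanded query."""
    categories = {}
    for term in query_terms:
        category, related = DOMAIN.get(term, ("uncategorized", set()))
        # Keep only the domain terms that match the user's interests,
        # when any interests are given.
        if user_interests:
            related = related & user_interests
        expanded = {term} | LEXICAL.get(term, set()) | related
        categories.setdefault(category, set()).update(expanded)
    return categories


def category_coverage(document, categories):
    """Fraction of categories with at least one term in the document."""
    words = set(document.lower().split())
    if not categories:
        return 0.0
    hits = sum(1 for terms in categories.values() if terms & words)
    return hits / len(categories)
```

In this sketch a document that mentions only synonyms of one query term covers a single category and scores lower than a document that touches every category, which illustrates going beyond a plain check for word presence.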

Keywords

Information retrieval; Ontology; Context-based search; Relevance; Query



Copyright information

© Springer-Verlag London 2013

Authors and Affiliations

  1. Information System Engineering, Cyprus International University, Haspolat-Lefkoşa, Turkey
  2. Semantic Information Research Laboratory, Department of Computer Science, University of Southern California, Los Angeles, USA
