Uncertainty Representations for Information Retrieval with Missing Data

Abstract

Retrieving items such as similar past events, or vessels with a specific characteristic of interest, is a critical task in crisis management support. This paper addresses the problem of information retrieval from incomplete databases. In particular, we assess how the representation of uncertainty about missing data affects the retrieval of the corresponding items. After a brief survey of the missing data problem, with an emphasis on information retrieval applications, we propose a novel approach for retrieving records with missing data. The general idea of the proposed data-driven approach is to model the uncertainty pertaining to the missing data. We chose the general model of belief functions because it encompasses both classical set and probability models as special cases. Several uncertainty models are then compared based on (1) an expressiveness criterion (non-specificity or randomness) and (2) objective performance measures typical of the Information Retrieval domain. The results are illustrated on a real dataset with a simulated, controlled missing-data mechanism.
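
To make the claim about special cases concrete, the sketch below encodes a missing attribute value as a mass function and shows how a set model and a probability model arise as two particular mass assignments. It is only an illustration under stated assumptions: the vessel types, the function names, and the choice of the generalized Hartley measure as the non-specificity criterion are hypothetical and are not taken from the chapter.

```python
from math import log2

# A mass function maps focal sets (frozensets of candidate values) to masses summing to 1.

def nonspecificity(m):
    """Generalized Hartley measure: sum of m(A) * log2(|A|) over focal sets A."""
    return sum(mass * log2(len(focal)) for focal, mass in m.items())

def belief(m, query):
    """Total mass of focal sets entirely contained in the query set."""
    return sum(mass for focal, mass in m.items() if focal <= query)

def plausibility(m, query):
    """Total mass of focal sets that intersect the query set."""
    return sum(mass for focal, mass in m.items() if focal & query)

# Hypothetical domain of a missing "vessel type" attribute (assumption).
frame = frozenset({"cargo", "tanker", "fishing", "passenger"})

# Set model: all mass on the whole frame (any value is possible).
set_model = {frame: 1.0}

# Probability model: mass only on singletons (here uniform, an assumption).
prob_model = {frozenset({v}): 0.25 for v in frame}

# Query: retrieve records whose vessel type is "tanker".
query = frozenset({"tanker"})

for name, m in [("set", set_model), ("probability", prob_model)]:
    print(name, nonspecificity(m), belief(m, query), plausibility(m, query))
```

Under these assumptions the set model is maximally non-specific and yields the widest belief/plausibility interval for the query, while the probability model has zero non-specificity and collapses the interval to a single value.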

Copyright information

© Springer International Publishing Switzerland 2016

Authors and Affiliations

  1. NATO Science and Technology Organization, Centre for Maritime Research and Experimentation (CMRE), La Spezia, Italy
  2. Command, Control and Intelligence (C2I) Section, Defence R & D Canada - Valcartier, Quebec, Canada
