Information Retrieval, Volume 15, Issue 1, pp 54–92

Graph-based term weighting for information retrieval

Abstract

A standard approach to Information Retrieval (IR) is to model text as a bag of words. Alternatively, text can be modelled as a graph whose vertices represent words and whose edges represent relations between the words, defined on the basis of any meaningful statistical or linguistic relation. Given such a text graph, graph-theoretic computations can be applied to measure various properties of the graph, and hence of the text. This work explores the usefulness of such graph-based text representations for IR. Specifically, we propose a principled graph-theoretic approach to (1) computing term weights and (2) integrating discourse aspects into retrieval. Given a text graph whose vertices denote terms linked by co-occurrence and grammatical modification, we use graph ranking computations (e.g. PageRank; Page et al. 1998) to derive a weight for each vertex, i.e. a term weight, which we use to rank documents against queries. We reason that our graph-based term weights do not necessarily need to be normalised by document length (unlike existing term weights) because they are already scaled by their graph-ranking computation. This is a departure from existing IR ranking functions, and we experimentally show that it performs comparably to a tuned ranking baseline such as BM25 (Robertson et al. 1995). In addition, we integrate into ranking graph properties such as the average path length or the clustering coefficient, which represent different aspects of the topology of the graph, and by extension of the document represented as a graph. Integrating such properties into ranking allows us to consider issues such as discourse coherence, flow and density during retrieval. We experimentally show that this type of ranking performs comparably to BM25, and can even outperform it, across different TREC (Voorhees and Harman 2005) datasets and evaluation measures.
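As a rough illustration of the idea summarised in the abstract, the sketch below builds an undirected co-occurrence graph from a token list, runs a plain PageRank iteration over it to obtain one weight per term, and computes the average clustering coefficient as one example of a graph property. It is a minimal sketch under stated assumptions, not the article's implementation: the function names, the window size, the damping factor and the iteration count are all hypothetical choices, and grammatical-modification edges are left out.

```python
from collections import defaultdict


def build_cooccurrence_graph(tokens, window=2):
    """Undirected text graph: vertices are terms, and an edge links two
    distinct terms that occur within `window` positions of each other.
    (The window size is an assumption made for this sketch.)"""
    graph = defaultdict(set)
    for i, term in enumerate(tokens):
        graph[term]  # touch the key so isolated terms still become vertices
        for other in tokens[i + 1 : i + 1 + window]:
            if other != term:
                graph[term].add(other)
                graph[other].add(term)
    return dict(graph)


def pagerank(graph, damping=0.85, iterations=50):
    """Plain PageRank by fixed-count power iteration on an undirected graph.
    Isolated vertices only receive the teleport share, so the scores sum
    to 1 only when every vertex has at least one neighbour."""
    n = len(graph)
    scores = {v: 1.0 / n for v in graph}
    for _ in range(iterations):
        new_scores = {}
        for v in graph:
            # Each neighbour u passes on an equal share of its own score.
            incoming = sum(scores[u] / len(graph[u]) for u in graph[v])
            new_scores[v] = (1.0 - damping) / n + damping * incoming
        scores = new_scores
    return scores


def average_clustering(graph):
    """Mean local clustering coefficient; vertices with fewer than two
    neighbours contribute 0."""
    if not graph:
        return 0.0
    total = 0.0
    for neighbours in graph.values():
        nbrs = list(neighbours)
        k = len(nbrs)
        if k < 2:
            continue
        # Count edges that exist between pairs of neighbours.
        links = sum(
            1
            for i in range(k)
            for j in range(i + 1, k)
            if nbrs[j] in graph[nbrs[i]]
        )
        total += 2.0 * links / (k * (k - 1))
    return total / len(graph)


if __name__ == "__main__":
    text = "graph based term weighting uses graph ranking to weight each term"
    g = build_cooccurrence_graph(text.split(), window=2)
    weights = pagerank(g)
    for term, weight in sorted(weights.items(), key=lambda kv: -kv[1]):
        print(f"{term}\t{weight:.4f}")
    print("average clustering coefficient:", round(average_clustering(g), 4))
```

The resulting per-term scores could then stand in for a term-frequency component when scoring a document against a query. How the article actually combines them with other evidence is not specified in the abstract, so that step is omitted here.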

Keywords

Information retrieval · Graph theory · Natural language processing

References

  1. Agirre, E., & Soroa, A. (2009). Personalizing pagerank for word sense disambiguation. In EACL (pp. 33–41). The Association for Computer Linguistics.
  2. Albert, R. (2005). Scale-free networks in cell biology. Journal of Cell Science, 118, 4947–4957.
  3. Albert, R., & Barabási, A. L. (2001). Statistical mechanics of complex networks. CoRR cond-mat/0106096.
  4. Albert, R., & Barabási, A. L. (2002). Statistical mechanics of complex networks. Reviews of Modern Physics, 74, 47–97.
  5. Albert, R., Jeong, H., & Barabási, A. L. (1999). The diameter of the world wide web. CoRR cond-mat/9907038.
  6. Allan, J., Aslam, J. A., Sanderson, M., Zhai, C., & Zobel, J. (Eds.). (2009). Proceedings of the 32nd annual international ACM SIGIR conference on research and development in information retrieval, SIGIR 2009. Boston, MA, USA: ACM. July 19–23.
  7. Anh, V. N., & Moffat, A. (2005). Simplified similarity scoring using term ranks. In Proceedings of the 28th annual international ACM SIGIR conference on Research and development in information retrieval, SIGIR ’05 (pp. 226–233). New York, NY, USA: ACM. doi:http://doi.acm.org/10.1145/1076034.1076075.
  8. Antiqueira, L., Oliveira Jr., O. N., Costa, L. F., & Nunes, M. G. V. (2009). A complex network approach to text summarization. Information Sciences, 179(5), 584–599. doi:http://dx.doi.org/10.1016/j.ins.2008.10.03.
  9. Antiqueira, L. L., Pardo, T. A. S., Nunes, M., & Oliveira, J. O. N. (2007). Some issues on complex networks for author characterization. Inteligencia Artificial, Revista Iberoamericana de IA, 11(36), 51–58. URL: http://iajournal.aepia.org/aepia/Uploads/36/420.pdf
  10. Baeza-Yates, R. A., & Ribeiro-Neto, B. A. (1999). Modern information retrieval. New York: ACM Press/Addison-Wesley.
  11. Baeza-Yates, R. A., Ziviani, N., Marchionini, G., Moffat, A., & Tait, J. (Eds.) (2005). SIGIR 2005: Proceedings of the 28th annual international ACM SIGIR conference on research and development in information retrieval. Salvador, Brazil: ACM. August 15–19.
  12. Barabási, A. L., Jeong, H., Néda, Z., Ravasz, E., Schubert, A., & Vicsek, T. (2002). Evolution of the social network of scientific collaborations. Physica A: Statistical Mechanics and its Applications, 311(3-4), 590–614. doi:10.1016/S0378-4371(02)00736-7. http://www.sciencedirect.com/science/article/B6TVG-45S9HG2-1/2/dff30ba73ddd8820aca3e7f072aa788.
  13. Barrat, A., Barthélemy, M., Pastor-Satorras, R., & Vespignani, A. (2004). The architecture of complex weighted networks. Proceedings of the National Academy of Sciences, 101(11), 3747–3752.
  14. Bekkerman, R., Zilberstein, S., & Allan, J. (2007). Web page clustering using heuristic search in the web graph. In IJCAI (pp. 2280–2285).
  15. Belew, R. K. (1989). Adaptive information retrieval: Using a connectionist representation to retrieve and learn about documents. In Belkin and van Rijsbergen (1989), pp. 11–20.
  16. Belew, R. K. (2005). Scientific impact quantity and quality: Analysis of two sources of bibliographic data. CoRR abs/cs/0504036.
  17. Belkin, N. J., & van Rijsbergen, C. J. (Eds.). (1989). SIGIR’89, 12th international conference on research and development in information retrieval. Cambridge, Massachusetts, USA: ACM. June 25–28 (Proceedings).
  18. Berlow, E. L. (1999). Strong effects of weak interactions in ecological communities. Nature, 398, 330–334.
  19. Blanco, R., & Lioma, C. (2007). Random walk term weighting for information retrieval. In SIGIR (pp. 829–830).
  20. Blondel, V. D., Gajardo, A., Heymans, M., Senellart, P., & Dooren, P. V. (2004). A measure of similarity between graph vertices: Applications to synonym extraction and web searching. SIAM Review, 46(4), 647–666. doi:http://dx.doi.org/10.1137/S003614450241596.
  21. Boccaletti, S., Latora, V., Moreno, Y., Chavez, M., & Hwang, D. U. (2006). Complex networks: structure and dynamics. Physics Reports, 424, 175–308.
  22. Bollobás, B. (1979). Graph theory: An introductory course. New York: Springer.
  23. Bollobás, B. (1985). Random graphs. London: Academic Press.
  24. Bookstein, A., Chiaramella, Y., Salton, G., & Raghavan, V. V. (Eds.). (1991). Proceedings of the 14th annual international ACM SIGIR conference on research and development in information retrieval. Chicago, Illinois, USA: ACM. October 13–16 (Special Issue of the SIGIR Forum).
  25. Bordag, S., Heyer, G., & Quasthoff, U. (2003). Small worlds of concepts and other principles of semantic search. In T. Böhme, G. Heyer, & H. Unger (Eds.), IICS, lecture notes in computer science (Vol. 2877, pp. 10–19). Springer. URL: http://dblp.uni-trier.de/db/conf/iics/iics2003.html#BordagHQ0.
  26. Caldeira, S. M. G., Lobao, T. C. P., Andrade, R. F. S., Neme, A., & Miranda, J. G. V. (2005). The network of concepts in written texts.
  27. i Cancho, R. F., Capocci, A., & Caldarelli, G. (2007). Spectral methods cluster words of the same class in a syntactic dependency network. International Journal of Bifurcation and Chaos, 17(7), 2453–2463.
  28. i Cancho, R. F., Capocci, A., & Caldarelli, G. (2007). Spectral methods cluster words of the same class in a syntactic dependency network. International Journal of Bifurcation and Chaos, 17(7), 2453–2463.
  29. Cao, G., Nie, J. Y., & Bai, J. (2005). Integrating word relationships into language models. In R. A. Baeza-Yates, N. Ziviani, G. Marchionini, A. Moffat, & J. Tait (Eds.), SIGIR (pp. 298–305).
  30. Chakrabarti, S. (2007). Dynamic personalized pagerank in entity-relation graphs. In Proceedings of the 16th international conference on World Wide Web, WWW ’07 (pp. 571–580). New York, NY, USA: ACM. URL: http://doi.acm.org/10.1145/1242572.1242650
  31. Chakrabarti, S., Dom, B., Raghavan, P., Rajagopalan, S., Gibson, D., & Kleinberg, J. M. (1998). Automatic resource compilation by analyzing hyperlink structure and associated text. Computer Networks, 30(1–7), 65–74.
  32. Choudhury, M., Thomas, M., Mukherjee, A., Basu, A., & Ganguly, N. (2007). How difficult is it to develop a perfect spell-checker? A cross-linguistic analysis through complex network approach. In Proceedings of the second workshop on TextGraphs: Graph-based algorithms for natural language processing (pp. 81–88). Rochester, NY, USA: Association for Computational Linguistics. URL: http://www.aclweb.org/anthology/W/W07/W07-021.
  33. Christensen, C., & Albert, R. (2007). Using graph concepts to understand the organization of complex systems. International Journal of Bifurcation and Chaos, 17(7), 2201–2214.
  34. Cramer, P. (1968). Word association. New York, USA: Academic Press.
  35. Craswell, N., Robertson, S. E., Zaragoza, H., & Taylor, M. J. (2005). Relevance weighting for query independent evidence. In SIGIR (pp. 416–423).
  36. Craswell, N., & Szummer, M. (2007). Random walks on the click graph. In Kraaij et al. (2007), pp. 239–246.
  37. Crestani, F., & van Rijsbergen, C. J. (1998). A study of probability kinematics in information retrieval. ACM Transactions on Information Systems, 16(3), 225–255.
  38. Deese, J. (1965). The structure of associations in language and thought. Baltimore, USA: The Johns Hopkins Press.
  39. Dorogovtsev, S. N., & Mendes, J. F. F. (2001). Language as an evolving word web. Proceedings of The Royal Society of London. Series B, Biological Sciences, 268(1485), 2603–2606. doi:10.1098/rspb.2001.1824. URL: http://www.isrl.uiuc.edu/amag/langev/paper/dorogovtsev01languageAs.htm.
  40. Dorogovtsev, S. N., & Mendes, J. F. F. (2002). Evolution of networks. Advances in Physics, 51, 1079–1187.
  41. Doszkocs, T. E., Reggia, J., & Lin, X. (1990). Connectionist models and information retrieval. Annual Review of Information Science and Technology (ARIST), 25, 209–260.
  42. Eisner, J., & Smith, N. A. (2005). Parsing with soft and hard constraints on dependency length. In Proceedings of the international workshop on parsing technologies (IWPT) (pp. 30–41). Vancouver. http://cs.jhu.edu/jason/papers/#iwpt05.
  43. Erdős, P., & Rényi, A. (1959). On random graphs I. Publicationes Mathematicae (Debrecen), 6, 290–297.
  44. Erdős, P., & Rényi, A. (1960). On the evolution of random graphs. Publications of the Mathematical Institute of the Hungarian Academy of Sciences, 5, 17–61.
  45. Erdős, P., & Rényi, A. (1961). On the evolution of random graphs. Bulletin of the International Statistical Institute, 38, 343–347.
  46. Erkan, G., & Radev, D. R. (2004). Lexrank: Graph-based lexical centrality as salience in text summarization. Journal of Artificial Intelligence Research (JAIR), 22, 457–479.
  47. Esuli, A., & Sebastiani, F. (2007). Pageranking wordnet synsets: An application to opinion mining. In The Association for Computer Linguistics (ACL).
  48. Faloutsos, M., Faloutsos, P., & Faloutsos, C. (1999). On power-law relationships of the internet topology. In SIGCOMM (pp. 251–262).
  49. Feinberg, M. (1980). Chemical oscillations, multiple equilibria, and reaction network structure. In W. Stewart, W. Rey, & C. Conley (Eds.), Dynamics of reactive systems (pp. 59–130). New York: Academic Press.
  50. Ferrer i Cancho, R. (2005). The structure of syntactic dependency networks: Insights from recent advances in network theory. In G. Altmann, V. Levickij, & V. Perebyinis (Eds.), The problems of quantitative linguistics (pp. 60–75). Chernivtsi: Ruta.
  51. Ferrer i Cancho, R., & Solé, R. V. (2001). Two regimes in the frequency of words and the origins of complex lexicons: Zipf’s law revisited. Journal of Quantitative Linguistics, 8(3), 165–173.
  52. Ferrer i Cancho, R., Solé, R. V., & Köhler, R. (2004). Patterns in syntactic dependency networks. Physical Review E, 69(5), 051915. doi:10.1103/PhysRevE.69.051915.
  53. Firth, J. R. (1968b). A synopsis of linguistic theory. In F. R. Palmer (Ed.), Selected papers of J.R. Firth 1952–1959 (pp. 168–205). London: Longmans.
  54. Gamon, M. (2006). Graph-based text representation for novelty detection. In Proceedings of TextGraphs: The first workshop on graph based methods for natural language processing (pp. 17–24). New York City: Association for Computational Linguistics. URL: http://www.aclweb.org/anthology/W/W06/W06-380.
  55. Gaume, B. (2008). Mapping the forms of meaning in small worlds. International Journal of Intelligent Systems, 23(7), 848–862.
  56. Girvan, M., & Newman, M. E. J. (2002). Community structure in social and biological networks. Proceedings of the National Academy of Sciences USA, 99(12), 7821–7826.
  57. Goldberg, A., & Zhu, X. (2006). Seeing stars when there aren’t many stars: Graph-based semi-supervised learning for sentiment categorization. In Proceedings of TextGraphs: The first workshop on graph based methods for natural language processing (pp. 45–52). New York City: Association for Computational Linguistics. URL: http://www.aclweb.org/anthology/W/W06/W06-380.
  58. Guimera, R., Mossa, S., Turtschi, A., & Amaral, L. A. N. (2005). The worldwide air transportation network: Anomalous centrality, community structure, and cities’ global roles. Proceedings of the National Academy of Sciences USA, 102, 7794–7799.
  59. Halliday, M., & Hasan, R. (1976). Cohesion in English. London: Longman.
  60. Harman, D. (1991). How effective is suffixing? JASIS, 42(1), 7–15.
  61. Hassan, S., & Banea, C. (2006). Random-walk term weighting for improved text classification. In Proceedings of TextGraphs: The first workshop on graph based methods for natural language processing (pp. 53–60). New York City: Association for Computational Linguistics. URL: http://www.aclweb.org/anthology/W/W06/W06-380.
  62. Ho, N. D., & Fairon, C. (2004). Lexical similarity based on quantity of information exchanged—synonym extraction. In RIVF (pp. 193–198).
  63. Hoey, M. (1991). Patterns of lexis in text. Oxford, UK: Oxford University Press.
  64. Hopfield, J. J. (1982). Neural networks and physical systems with emergent collective computational abilities. Proceedings of the National Academy of Sciences, 79(8), 2554–2558.
  65. Hopfield, J. J., & Tank, D. W. (1986). Computing with neural circuits: A model. Science, 233, 625–633.
  66. Huang, W. Y., & Lippmann, R. (1987). Neural net and traditional classifiers. In D. Z. Anderson (Ed.), NIPS (pp. 387–396). American Institute of Physics.
  67. Hughes, T., & Ramage, D. (2007). Lexical semantic relatedness with random graph walks. In Proceedings of the 2007 joint conference on empirical methods in natural language processing and computational natural language learning (EMNLP-CoNLL) (pp. 581–589). Prague, Czech Republic: Association for Computational Linguistics. URL: http://www.aclweb.org/anthology/D/D07/D07-106.
  68. Jeong, H., Tombor, B., Albert, R., Oltvai, Z. N., & Barabási, A. L. (2000). The large-scale organization of metabolic networks. Nature, 407(6804), 651–654. doi:10.1038/35036627.
  69. Jespersen, O. (1929). The philosophy of grammar. London: Allen and Unwin.
  70. Joyce, T., & Miyake, M. (2008). Capturing the structures in association knowledge: Application of network analyses to large-scale databases of Japanese word associations. In T. Tokunaga & A. Ortega (Eds.), Lecture notes in computer science (LKR) (Vol. 4938, pp. 116–131). Springer.
  71. Jung, J., Makoshi, N., & Akama, H. (2008). Associative language learning support applying graph clustering for vocabulary learning and improving associative ability. In ICALT (pp. 228–232). IEEE.
  72. Kleinberg, J. M. (1999). Authoritative sources in a hyperlinked environment. Journal of the ACM, 46(5), 604–632.
  73. Kleinberg, J. M. (2006). Social networks, incentives, and search. In E. N. Efthimiadis, S. T. Dumais, D. Hawking, & K. Järvelin (Eds.), SIGIR (pp. 210–211). ACM.
  74. Knospe, W., Santen, L., Schadschneider, A., & Schreckenberg, M. (2002). Single vehicle data of highway traffic: Microscopic description of traffic phases. Physical Review E, 65, 056133.
  75. Konstas, I., Stathopoulos, V., & Jose, J. M. (2009). On social networks and collaborative recommendation. In Allan et al. (2009), pp. 195–202.
  76. Kozareva, Z., Riloff, E., & Hovy, E. (2008). Semantic class learning from the web with hyponym pattern linkage graphs. In Proceedings of ACL-08: HLT (pp. 1048–1056). Columbus, Ohio: Association for Computational Linguistics. URL: http://www.aclweb.org/anthology/P/P08/P08-111.
  77. Kozima, H. (1993). Similarity between words computed by spreading activation on an English dictionary. In EACL (pp. 232–239).
  78. Kraaij, W., de Vries, A. P., Clarke, C. L. A., Fuhr, N., & Kando, N. (Eds.). (2007). SIGIR 2007: Proceedings of the 30th annual international ACM SIGIR conference on research and development in information retrieval. Amsterdam, The Netherlands: ACM. July 23–27.
  79. Krapivsky, P. L., Redner, S., & Leyvraz, F. (2000). Connectivity of growing random networks. Physical Review Letters, 85, 4629–4632.
  80. Krovetz, R. (2000). Viewing morphology as an inference process. Artificial Intelligence, 118(1–2), 277–294.
  81. Kurland, O., & Lee, L. (2005). Pagerank without hyperlinks: Structural re-ranking using links induced by language models. In Baeza-Yates et al. (2005), pp. 306–313.
  82. Kwok, K. L. (1989). A neural network for probabilistic information retrieval. In Belkin and van Rijsbergen (1989), pp. 21–30.
  83. Latora, V., & Marchiori, M. (2001). Efficient behavior of small-world networks. Physical Review Letters, 87, 198701–198704.
  84. Latora, V., & Marchiori, M. (2003). Economic small-world behaviour in weighted networks. European Physical Journal B, 32, 249–263.
  85. Leicht, E. A., Holme, P., & Newman, M. E. J. (2006). Vertex similarity in networks. Physical Review E, 73.
  86. Lemke, N., Herédia, F., Barcellos, C. K., dos Reis, A. N., & Mombach, J. C. M. (2004). Essentiality and damage in metabolic networks. Bioinformatics, 20(1), 115–119.
  87. Lempel, R., & Moran, S. (2001). SALSA: The stochastic approach for link-structure analysis. ACM Transactions on Information Systems, 19(2), 131–160.
  88. Li, W., & Cai, X. (2004). Statistical analysis of airport network of China. Physical Review E, 69, 046106.
  89. Lin, X., Soergel, D., & Marchionini, G. (1991). A self-organizing semantic map for information retrieval. In Bookstein et al. (1991), pp. 262–269.
  90. Lioma, C., & Blanco, R. (2009). Part of speech based term weighting for information retrieval. In M. Boughanem, C. Berrut, J. Mothe, & C. Soulé-Dupuy (Eds.), ECIR, lecture notes in computer science (Vol. 5478, pp. 412–423). Springer.
  91. Lioma, C., & Van Rijsbergen, C. J. K. (2008). Part of speech n-grams and information retrieval. RFLA, 8, 9–22.
  92. Ma’ayan, A., Blitzer, R. D., & Iyengar, R. (2004). Toward predictive models of mammalian cells. Annual Review of Biophysics and Biomolecular Structure, 319–349.
  93. Ma’ayan, A., Jenkins, S. L., Neves, S., Hasseldine, A., Grace, E., Dubin-Thaler, et al. (2005). Formation of regulatory patterns during signal propagation in a mammalian cellular network. Science, 309(5737), 1078–1083.
  94. Macleod, K. J., & Robertson, W. (1991). A neural algorithm for document clustering. Information Processing & Management, 27(4), 337–346.
  95. Manning, C. D., & Schütze, H. (1999). Foundations of statistical natural language processing. London: The MIT Press.
  96. Masucci, A. P., & Rodgers, G. J. (2006). Network properties of written human language. Physical Review E, 74(2), 026102. doi:10.1103/PhysRevE.74.026102.
  97. McCann, K., Hastings, A., & Huxel, G. R. (1998). Weak trophic interactions and the balance of nature. Nature, 395, 794–798.
  98. Mehler, A. (2007). Large text networks as an object of corpus linguistic studies. In Corpus linguistics. An international handbook of the science of language and society.
  99. Mihalcea, R., & Tarau, P. (2004). TextRank: Bringing order into texts. In EMNLP (pp. 404–411).
  100. Milo, R., Itzkovitz, S., Kashtan, N., Levitt, R., Shen-Orr, S., Ayzenshtat, I., et al. (2004). Superfamilies of evolved and designed networks. Science, 303(5663), 1538–1542. URL: http://dx.doi.org/10.1126/science.108916.
  101. Minkov, E., & Cohen, W. W. (2008). Learning graph walk based similarity measures for parsed text. In EMNLP (pp. 907–916). ACL.
  102. Minsky, M. L. (1969). Semantic information processing. Cambridge: The MIT Press.
  103. Mizzaro, S., & Robertson, S. (2007). HITS hits TREC: Exploring IR evaluation results with network analysis. In Kraaij et al. (2007), pp. 479–486.
  104. Moore, C., & Newman, M. E. J. (2000). Epidemics and percolation in small-world networks. Physical Review E, 61, 5678–5682.
  105. Motter, A. E., de Moura, A. P. S., Lai, Y. C., & Dasgupta, P. (2002). Topology of the conceptual network of language. Physical Review E, 65(6).
  106. Muller, P., Hathout, N., & Gaume, B. (2006). Synonym extraction using a semantic distance on a dictionary. In Proceedings of TextGraphs: The first workshop on graph based methods for natural language processing (pp. 65–72). New York City: Association for Computational Linguistics. URL: http://www.aclweb.org/anthology/W/W06/W06-3811
  107. Nastase, V., Sayyad-Shirabad, J., Sokolova, M., & Szpakowicz, S. (2006). Learning noun-modifier semantic relations with corpus-based and wordnet-based features. In AAAI. AAAI Press.
  108. Newman, M. E. J. (2001). The structure of scientific collaboration networks. Proceedings of the National Academy of Sciences of the USA, 98(2), 404–409. doi:10.1073/pnas.021544898.
  109. Newman, M. E. J. (2003). The structure and function of complex networks. SIAM Review, 45, 167–256.
  110. Noh, T. G., Park, S. B., Yoon, H. G., Lee, S. J., & Park, S. Y. (2009). An automatic translation of tags for multimedia contents using folksonomy networks. In Allan et al. (2009), pp. 492–499.
  111. Ounis, I., Lioma, C., Macdonald, C., & Plachouras, V. (2007). Research directions in Terrier: A search engine for advanced retrieval on the Web. Novatica/UPGRADE Special Issue on Web Information Access.
  112. Ozmutlu, S., Spink, A., & Ozmutlu, H. C. (2004). A day in the life of Web searching: An exploratory study. Information Processing & Management, 40(2), 319–345.
  113. Pado, S., & Lapata, M. (2007). Dependency-based construction of semantic space models. Computational Linguistics, 33(2), 161–199.
  114. Page, L., Brin, S., Motwani, R., & Winograd, T. (1998). The PageRank citation ranking: Bringing order to the Web. Technical report, Stanford Digital Library Technologies Project. URL: citeseer.ist.psu.edu/page98pagerank.html.
  115. Pastor-Satorras, R., & Vespignani, A. (2001). Epidemic spreading in scale-free networks. Physical Review Letters, 86(14), 3200–3203. doi:10.1103/PhysRevLett.86.3200.
  116. Pearl, J. (1988). Probabilistic reasoning in intelligent systems: networks of plausible inference. San Francisco, CA, USA: Morgan Kaufmann Publishers Inc.
  117. Pedersen, T., Patwardhan, S., & Michelizzi, J. (2004). Wordnet: Similarity—Measuring the relatedness of concepts. In D. L. McGuinness, & G. Ferguson (Eds.), AAAI (pp. 1024–1025). AAAI Press/The MIT Press.
  118. Plaza, L., Díaz, A., & Gervás, P. (2008). Concept-graph based biomedical automatic summarization using ontologies. In Coling 2008: Proceedings of the 3rd textgraphs workshop on graph-based algorithms for natural language processing (pp. 53–56). Manchester, UK: Coling 2008 Organizing Committee. URL: http://www.aclweb.org/anthology/W08-200.
  119. Polis, G. A. (1998). Ecology: Stability is woven by complex webs. Nature, 395, 744–745.
  120. Ponte, J. M., & Croft, W. B. (1998). A language modeling approach to information retrieval. In SIGIR (pp. 275–281). ACM.
  121. Popescu, A. M., & Etzioni, O. (2005). Extracting product features and opinions from reviews. In HLT/EMNLP. The Association for Computational Linguistics.
  122. Ramage, D., Rafferty, A. N., & Manning, C. D. (2009). Random walks for text semantic similarity. In Proceedings of the 2009 workshop on graph-based methods for natural language processing (TextGraphs-4) (pp. 23–31). Suntec, Singapore: Association for Computational Linguistics. URL: http://www.aclweb.org/anthology/W/W09/W09-3204
  123. Reynal, V. F., & Brainerd, C. J. (2005). Fuzzy processing in transitivity development. Annals of Operations Research, 23(1), 37–63.
  124. Robertson, S., & Sparck Jones, K. (1976). Relevance weighting of search terms. Journal of the American Society for Information Science, 27, 129–146.
  125. Robertson, S., Walker, S., Beaulieu, M., Gatford, M., & Payne, A. (1995). Okapi at TREC-4. In NIST Special Publication 500-236: TREC-4.
  126. Ruge, G. (1995). Human memory models and term association. In E. A. Fox, P. Ingwersen, & R. Fidel (Eds.), SIGIR (pp. 219–227). ACM Press.
  127. Scellato, S., Cardillo, A., Latora, V., & Porta, S. (2005). The backbone of a city. European Physical Journal B, 50(1–2), 221–225. arXiv:physics/0511063.
  128. Schenkel, R., Crecelius, T., Kacimi, M., Michel, S., Neumann, T., Parreira, J. X., et al. (2008). Efficient top-k querying over social-tagging networks. In S. H. Myaeng, D. W. Oard, F. Sebastiani, T. S. Chua, & M. K. Leong (Eds.), SIGIR (pp. 523–530). ACM.
  129. Schmid, H. (1994). Probabilistic part-of-speech tagging using decision trees. In International conference on new methods in language processing (pp. 44–49).
  130. Schütze, H., & Pedersen, J. O. (1995). Information retrieval based on word senses. In Symposium on document analysis and information retrieval (pp. 161–175).
  131. Sigman, M., & Cecchi, G. A. (2002). Global organization of the WordNet lexicon. Proceedings of the National Academy of Sciences, 99(3), 1742–1747.
  132. Sigurd, B., Eeg-Olofsson, M., & van de Weijer, J. (2004). Word length, sentence length and frequency: Zipf’s law revisited. Studia Linguistica, 58(1), 37–52.
  133. Sinclair, J. (1991). Corpus, concordance, collocation. Oxford: Oxford University Press.
  134. Singhal, A. (2001). Modern information retrieval: A brief overview. IEEE Data Engineering Bulletin, 24(4), 35–43.
  135. Singhal, A., Buckley, C., & Mitra, M. (1996). Pivoted document length normalization. In H. P. Frei, D. Harman, P. Schäuble, & R. Wilkinson (Eds.), SIGIR (pp. 21–29). ACM.
  136. Singhal, A., Buckley, C., & Mitra, M. (1996). Pivoted document length normalization. In SIGIR (pp. 21–29).
  137. Sinha, S., Pan, R. K., Yadav, N., Vahia, M., & Mahadevan, I. (2009). Network analysis reveals structure indicative of syntax in the corpus of undeciphered Indus civilization inscriptions. In Proceedings of the 2009 workshop on graph-based methods for natural language processing (TextGraphs-4) (pp. 5–13). Suntec, Singapore: Association for Computational Linguistics. URL: http://www.aclweb.org/anthology/W/W09/W09-3202
  138. Soares, M. M., Corso, C., & Lucena, L. S. (2005). Network of syllables in Portuguese. Physica A: Statistical Mechanics and its Applications, 355(2–4), 678–684. doi:10.1016/j.physa.2005.03.017. URL: http://www.isrl.uiuc.edu/amag/langev/paper/soares05networkOfSyllables.htm.
  139. Somasundaran, S., Namata, G., Getoor, L., & Wiebe, J. (2009). Opinion graphs for polarity and discourse classification. In Proceedings of the 2009 workshop on graph-based methods for natural language processing (TextGraphs-4) (pp. 66–74). Suntec, Singapore: Association for Computational Linguistics. URL: http://www.aclweb.org/anthology/W/W09/W09-321.
  140. Sparck Jones, K. (1972). A statistical interpretation of term specificity and its application in retrieval. Journal of Documentation, 28, 11–21.
  141. Sporns, O. (2002). Network analysis, complexity, and brain function. Complexity, 8(1), 56–60.
  142. Sporns, O., Tononi, G., & Edelman, G. M. (2002). Theoretical neuroanatomy and the connectivity of the cerebral cortex. Behavioural Brain Research, 135, 69–74.
  143. Steyvers, M., & Tenenbaum, J. (2005). The large-scale structure of semantic networks: Statistical analyses and a model of semantic growth. Cognitive Science, 29(1), 41–78.
  144. Takamura, H., Inui, T., & Okumura, M. (2007). Extracting semantic orientations of phrases from dictionary. In C. L. Sidner, T. Schultz, M. Stone, & C. Zhai (Eds.), HLT-NAACL (pp. 292–299). The Association for Computational Linguistics.
  145. Turtle, H. R., & Croft, W. B. (1991). Evaluation of an inference network-based retrieval model. ACM Transactions on Information Systems, 9(3), 187–222.
  146. Véronis, J., & Ide, N. (1990). Word sense disambiguation with very large neural networks extracted from machine readable dictionaries. In COLING (pp. 389–394).
  147. Vitevitch, M. S., & Rodríguez, E. (2005). Neighborhood density effects in spoken word recognition in Spanish. Journal of Multilingual Communication Disorders, 3, 64–73.
  148. Voorhees, E. M., & Harman, D. K. (2005). TREC: Experiment and evaluation in information retrieval. MIT Press. URL: http://trec.nist.gov/.
  149. Wagner, A., & Fell, D. A. (2001). The small world inside large metabolic networks. Proceedings of the Royal Society of London Series B Biological Sciences, 268, 1803–1810.
  150. Wasserman, S., & Faust, K. (1994). Social network analysis: Methods and applications (structural analysis in the social sciences). New York: Cambridge University Press.
  151. Watts, D., & Strogatz, S. H. (1998). Collective dynamics of ’small-world’ networks. Nature, 393, 440–442.
  152. Widdows, D., & Dorow, B. (2002). A graph model for unsupervised lexical acquisition. In COLING.
  153. Wilkinson, R., & Hingston, P. (1991). Using the cosine measure in a neural network for document. In Bookstein et al. (1991), pp. 202–210.
  154. Zhang, B., Li, H., Liu, Y., Ji, L., Xi, W., Fan, W., et al. (2005). Improving web search results using affinity graph. In Baeza-Yates et al. (2005), pp. 504–511.
  155. Zhou, D., Schölkopf, B., & Hofmann, T. (2004). Semi supervised learning on directed graphs. In NIPS.

Copyright information

© Springer Science+Business Media, LLC 2011

Authors and Affiliations

  1. Computer Science Department, University of A Coruña, A Coruña, Spain
  2. Computer Science Department, Stuttgart University, Stuttgart, Germany
