Analysis of Implicit Relations on Wikipedia: Measuring Strength through Mining Elucidatory Objects

  • Xinpeng Zhang
  • Yasuhito Asano
  • Masatoshi Yoshikawa
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5981)

Abstract

We focus on measuring relations between pairs of objects in Wikipedia, where each page can be regarded as an individual object. Two kinds of relations exist between two objects: an explicit relation is represented by a single link between the two objects' pages, and an implicit relation is represented by a link structure containing the two pages. Previously proposed methods are inadequate for measuring implicit relations because they use only one or two of three important factors: distance, connectivity, and co-citation. We propose a new method that reflects all three factors by using a generalized maximum flow. We confirm that our method measures the strength of a relation more appropriately than the previously proposed methods do. Another notable aspect of our method is mining elucidatory objects, that is, objects constituting a relation. We explain that mining elucidatory objects opens a novel way to understand a relation deeply.
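To give a feel for why a generalized flow can reflect distance and connectivity, here is a toy sketch in Python. It is not the authors' algorithm, which the abstract does not spell out. It assumes the source and destination are joined only by vertex-disjoint paths, that every edge multiplies the flow passing through it by the same gain, and that a capacity limits the flow entering an edge. The names `path_flow` and `relation_strength` and the gain value are hypothetical, and co-citation is not modeled at all.

```python
def path_flow(capacities, gain):
    """Largest flow that can arrive at the end of a single path.

    Assumption: each edge scales the flow passing through it by `gain`
    (0 < gain <= 1), and `capacities[i]` bounds the flow entering edge i.
    """
    limit = float("inf")  # largest amount we may push into the first edge
    scale = 1.0           # factor applied to the initial flow before edge i
    for cap in capacities:
        # Flow entering this edge is initial_flow * scale, which must be <= cap.
        limit = min(limit, cap / scale)
        scale *= gain
    return limit * scale  # flow that actually reaches the destination


def relation_strength(paths, gain=0.8):
    """Toy strength: total flow arriving over vertex-disjoint paths."""
    return sum(path_flow(caps, gain) for caps in paths)


if __name__ == "__main__":
    # A longer path delivers less flow (distance).
    print(path_flow([1.0], 0.5), path_flow([1.0, 1.0], 0.5))
    # Two parallel paths deliver more than one (connectivity).
    print(relation_strength([[1.0, 1.0], [1.0, 1.0]], gain=0.5))
```

Under these assumptions, a gain below 1 penalizes long paths, while summing over parallel paths rewards connectivity. A general graph would need a proper generalized maximum flow solver rather than this per-path shortcut.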

Keywords

link analysis, generalized flow, Wikipedia mining, relation

Copyright information

© Springer-Verlag Berlin Heidelberg 2010

Authors and Affiliations

  • Xinpeng Zhang (1)
  • Yasuhito Asano (1)
  • Masatoshi Yoshikawa (1)
  1. Kyoto University, Kyoto, Japan
