Detecting Synonymous Predicates from Online Encyclopedia with Rich Features

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 9994)

Abstract

The integration of Linked Open Data faces great challenges at the semantic level, despite unified data models. Inappropriate use of ontology concepts, namely predicates, impedes knowledge discovery. Although predicate unification is one of the most crucial steps in building a structured knowledge base, little effort has been devoted to it. In this paper, we propose a supervised approach to detecting synonymous predicates. Our work focuses on feature selection and on analyzing the effectiveness of each feature. We not only leverage different resources such as Wikipedia and Freebase, but also use different word embeddings to represent predicates. The experimental results indicate that the wikitext defined by Wikipedia and the predicate surface form are the most useful features.
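To make the idea of pairwise features concrete, here is a minimal sketch of how a predicate pair might be turned into a feature vector for a supervised classifier. This preview does not show the paper's actual feature definitions or classifier, so everything below is an assumption for illustration: the function names (`surface_similarity`, `cosine`, `pair_features`), the choice of character-bigram Jaccard overlap as a surface-form feature, and the toy two-dimensional embeddings are all hypothetical.

```python
import math


def char_bigrams(text):
    """Return the set of character bigrams of a predicate's surface form."""
    return {text[i:i + 2] for i in range(len(text) - 1)}


def surface_similarity(a, b):
    """Jaccard overlap of character bigrams (a hypothetical surface-form feature)."""
    ga, gb = char_bigrams(a), char_bigrams(b)
    if not ga and not gb:
        # Strings shorter than two characters have no bigrams; fall back to equality.
        return 1.0 if a == b else 0.0
    return len(ga & gb) / len(ga | gb)


def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def pair_features(p, q, embeddings):
    """Feature vector for the predicate pair (p, q): [surface overlap, embedding cosine]."""
    return [surface_similarity(p, q), cosine(embeddings[p], embeddings[q])]


# Toy embeddings, invented for illustration only (not taken from the paper).
toy_embeddings = {
    "birthplace": [1.0, 0.0],
    "place of birth": [1.0, 0.0],
    "height": [0.0, 1.0],
}

features = pair_features("birthplace", "place of birth", toy_embeddings)
```

Vectors like `features`, labeled as synonymous or not, could then be passed to any standard binary classifier. Keeping each feature as a separate function makes it straightforward to drop one and measure how much it contributes.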

Notes

  1. Based on Chinese Wikipedia web pages in August 2014.

  2. Available at https://dumps.wikimedia.org/zhwiki/.

  3. The linking property in Freebase rdf dump is Wikipedia.zh-cn_id while the Freebase category predicate is rdf:type.

  4. https://zh.wikipedia.org/wiki?curid=472824.

  5. http://www.freebase.com/m/03cp9fl.

  6. The version of Freebase used in the experiment is 2013-06-02 (1.37 billion triples). We collected categories of 337042 entities in Freebase.

Acknowledgement

This work was supported by the National High Technology R&D Program of China (Grant Nos. 2015AA015403, 2014AA015102), the Natural Science Foundation of China (Grant Nos. 61202233, 61272344, 61370055), and the joint project with IBM Research. Please direct any correspondence to Yansong Feng.

Author information

Corresponding author

Correspondence to Zhe Han.

Copyright information

© 2016 Springer International Publishing AG

About this paper

Cite this paper

Han, Z., Feng, Y., Zhao, D. (2016). Detecting Synonymous Predicates from Online Encyclopedia with Rich Features. In: Ma, S., et al. Information Retrieval Technology. AIRS 2016. Lecture Notes in Computer Science, vol 9994. Springer, Cham. https://doi.org/10.1007/978-3-319-48051-0_9

  • DOI: https://doi.org/10.1007/978-3-319-48051-0_9

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-48050-3

  • Online ISBN: 978-3-319-48051-0

  • eBook Packages: Computer Science, Computer Science (R0)
