Chinese Standard Comparative Sentence Recognition and Extraction Research

  • Liqiang Xing
  • Lu Liu
Conference paper
Part of the Lecture Notes in Electrical Engineering book series (LNEE, volume 218)


Information extraction is the first and most important task in standard knowledge mining. This paper focuses on recognizing and extracting comparative sentences from standard documents. The approach has three steps: comparative sentence recognition, technical index parameter recognition, and technical index extraction. First, we search each standard using keywords from a feature set in order to assign the document to its class. Second, we build regular expressions to spot technical index parameters. Finally, we treat technical index extraction as a sequence labelling problem, using keywords, noun phrases, and their positions as features for training a CRF model. Experiments show that the method performs well on comparative sentence recognition and extraction in standard documents.
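As a rough illustration of the three steps, the following minimal sketch shows keyword-based sentence recognition, a regular expression for parameters, and per-token features of the kind a CRF tagger could consume. The keyword list, the unit pattern, and all function names are assumptions made for this example; the paper's actual feature set and expressions are not given in the abstract. Noun-phrase features would need a part-of-speech tagger or chunker and are omitted here.

```python
import re

# Hypothetical comparative keywords (not the paper's feature set).
COMPARATIVE_KEYWORDS = ("不低于", "不高于", "不大于", "不小于",
                        "大于", "小于", "高于", "低于")

# Hypothetical parameter pattern: a number followed by a unit.
PARAM_PATTERN = re.compile(r"\d+(?:\.\d+)?\s*(?:mm|cm|kg|MPa|%)")


def is_comparative(sentence):
    """Step 1: flag a sentence that contains any comparative keyword."""
    return any(keyword in sentence for keyword in COMPARATIVE_KEYWORDS)


def find_parameters(sentence):
    """Step 2: return all number-plus-unit spans found in the sentence."""
    return PARAM_PATTERN.findall(sentence)


def token_features(tokens, i):
    """Step 3: build a feature dict for token i, for use by a CRF tagger."""
    token = tokens[i]
    return {
        "token": token,
        "is_keyword": token in COMPARATIVE_KEYWORDS,
        "is_number": token.replace(".", "", 1).isdigit(),
        "position": i,
    }


sentence = "抗拉强度不低于 235 MPa"
tokens = ["抗拉强度", "不低于", "235", "MPa"]  # assumed pre-segmented
if is_comparative(sentence):
    print(find_parameters(sentence))
    print([token_features(tokens, i) for i in range(len(tokens))])
```

In practice the feature dictionaries would be passed, one list per sentence, to a CRF implementation together with gold labels for training; that step is left out because it depends on the chosen toolkit.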


Standard knowledge mining; Comparative sentence recognition; Technical index parameter recognition; Technical index extraction; CRF model



Copyright information

© Springer-Verlag London 2013

Authors and Affiliations

  1. School of Economics and Management, Beihang University, Beijing, China
  2. China National Institute of Standardization, Beijing, China
