An Efficient Tool for Syntactic Processing of English Query Text

  • Sanjay Chatterji
  • G. S. Sreedhara
  • Maunendra Sankar Desarkar
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8891)

Abstract

A large amount of work has been done on the syntactic analysis of English text. However, these techniques have not yet proved appropriate for analyzing short phrases that lack structured context such as capitalization, subject-object-verb order, etc. In this paper, we attempt the syntactic analysis of phrases for which such contextual information is not available. We have developed a stemmer, a POS tagger, a chunker, and a Named Entity tagger for short English texts such as chats, messages, and queries, using a root dictionary and language-specific rules. We evaluated the technique on English queries and observed that our system outperforms some commonly used NLP tools.
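As a rough illustration of what stemming with a root dictionary and language-specific rules could look like, the following minimal Python sketch stores roots in a trie and strips a few common English suffixes. It is not the authors' implementation: the `Trie` class, the `stem` function, the `SUFFIXES` list, and the sample roots are all hypothetical and chosen only for this example.

```python
from typing import Dict, Iterable

# Hypothetical suffix rules for illustration; not taken from the paper.
SUFFIXES = ("ing", "ed", "es", "s")


class Trie:
    """Character trie holding known root words."""

    def __init__(self, words: Iterable[str] = ()) -> None:
        self.children: Dict[str, "Trie"] = {}
        self.is_word = False
        for word in words:
            self.insert(word)

    def insert(self, word: str) -> None:
        node = self
        for ch in word:
            node = node.children.setdefault(ch, Trie())
        node.is_word = True

    def contains(self, word: str) -> bool:
        node = self
        for ch in word:
            node = node.children.get(ch)
            if node is None:
                return False
        return node.is_word


def stem(token: str, roots: Trie) -> str:
    """Return a dictionary root for token, or the lowercased token if none is found."""
    word = token.lower()  # short queries often carry no reliable capitalization
    if roots.contains(word):
        return word
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) > len(suffix):
            base = word[: -len(suffix)]
            # Try the bare base, then the base with a restored final "e" ("mak" -> "make").
            for candidate in (base, base + "e"):
                if roots.contains(candidate):
                    return candidate
    return word


# Hypothetical root dictionary used only to exercise the sketch.
roots = Trie(["book", "make", "flight", "watch"])
stems = [stem(t, roots) for t in "Booking flights paris".split()]
```

Because a candidate is accepted only when it appears in the root dictionary, unknown words such as "paris" are returned unchanged instead of being over-stemmed by the suffix rules.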

Keywords

Stemming, Parts-of-Speech, Chunk, Named Entity, Trie, Short text analysis

Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  • Sanjay Chatterji (1)
  • G. S. Sreedhara (1)
  • Maunendra Sankar Desarkar (1)
  1. Samsung R&D Institute India, Bangalore, India
