Using Suffix Arrays for Efficiently Recognition of Named Entities in Large Scale

  • Benjamin Adrian
  • Sven Schwarz
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6882)


In this paper, we present an efficient method for comparing text with RDF data in order to recognize named entities. Here, a named entity is a text sequence that refers to a URI reference within an RDF graph. We use suffix arrays as the representation format for text and a relational database schema to represent Semantic Web data. With these representations, named entity recognition runs in linear time and does not require holding the names of existing entities in memory. Both properties are needed to implement named entity recognition at the scale of, for instance, the DBpedia database.
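The idea of matching entity names against a suffix array of the text can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the whitespace tokenization, and the example URIs are assumptions made for illustration. The suffix array is built by naive sorting rather than by a linear-time construction, and the iterable of `(uri, name)` pairs merely stands in for rows streamed from a relational database, so the full set of names never has to be held in memory.

```python
from typing import Iterable, List, Tuple


def build_suffix_array(tokens: List[str]) -> List[int]:
    """Return the start positions of all token suffixes in sorted order.

    Naive construction by sorting (an assumption for brevity); it is
    not a linear-time suffix array construction.
    """
    return sorted(range(len(tokens)), key=lambda i: tokens[i:])


def find_occurrences(tokens: List[str], suffix_array: List[int],
                     name_tokens: List[str]) -> List[int]:
    """Binary-search the suffix array for suffixes that start with name_tokens."""
    n = len(name_tokens)
    if n == 0:
        return []
    # Lower bound: first suffix whose n-token prefix is >= name_tokens.
    lo, hi = 0, len(suffix_array)
    while lo < hi:
        mid = (lo + hi) // 2
        start = suffix_array[mid]
        if tokens[start:start + n] < name_tokens:
            lo = mid + 1
        else:
            hi = mid
    first = lo
    # Upper bound: first suffix whose n-token prefix is > name_tokens.
    hi = len(suffix_array)
    while lo < hi:
        mid = (lo + hi) // 2
        start = suffix_array[mid]
        if tokens[start:start + n] <= name_tokens:
            lo = mid + 1
        else:
            hi = mid
    return sorted(suffix_array[first:lo])


def recognize(text: str,
              entities: Iterable[Tuple[str, str]]) -> List[Tuple[int, int, str]]:
    """Return (token position, token length, URI) for every name found in text.

    `entities` yields (uri, name) pairs one at a time, e.g. from a
    database cursor (hypothetical; any iterable works here).
    """
    tokens = text.split()
    suffix_array = build_suffix_array(tokens)
    matches = []
    for uri, name in entities:
        name_tokens = name.split()
        for position in find_occurrences(tokens, suffix_array, name_tokens):
            matches.append((position, len(name_tokens), uri))
    return sorted(matches)
```

Calling `recognize` with a short text and a handful of `(uri, name)` pairs returns each match as a token offset, a length in tokens, and the URI it refers to. Only the text is indexed; each name is consumed once and then discarded.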


Noun Phrase, Resource Description Framework, Entity Recognition, Suffix Array, String Comparison





Copyright information

© Springer-Verlag Berlin Heidelberg 2011

Authors and Affiliations

  • Benjamin Adrian (1)
  • Sven Schwarz (1)
  1. Knowledge Management Department, DFKI GmbH, Kaiserslautern, Germany
