Classification of Text Processing Components: The Tesla Role System

  • Jürgen Hermes
  • Stephan Schwiebert
Conference paper
Part of the Studies in Classification, Data Analysis, and Knowledge Organization book series (STUDIES CLASS)

Abstract

Modeling component interactions is a major challenge in designing component systems. In most cases, the components in such systems interact via the results they produce. This approach gives rise to two conflicting requirements. On the one hand, the interfaces between components must be specified exactly. On the other hand, the interfaces should not be overly restrictive, since this may force the data produced by a component to be converted into the system’s data format. Such a conversion is difficult when complex data types (e.g., graphs or matrices) have to be stored, because they often require non-trivial access methods that a general data format does not support.

The approach introduced in this paper tries to overcome this dilemma by meeting both demands: a role system is a generic mechanism that nevertheless enables text processing components to produce highly specific results. The role concept described in this paper has been adopted by the Tesla (Text Engineering Software Laboratory) framework.
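
The idea of a role as a contract between components can be illustrated with a minimal sketch. It is not Tesla's actual API: every name below (Role, TokenRole, MatrixRole, WhitespaceTokens, CooccurrenceMatrix, connect) is hypothetical and was chosen only for illustration. The sketch assumes that a role names a kind of result together with its access methods, so that the framework checks roles generically while each result keeps its own specific data structure.

```python
from abc import ABC, abstractmethod
from typing import List


class Role(ABC):
    """Hypothetical base class: a role names a kind of result."""


class TokenRole(Role):
    """Role for results that can be read as a token sequence."""

    @abstractmethod
    def tokens(self) -> List[str]: ...


class MatrixRole(Role):
    """Role for results that need matrix-style access."""

    @abstractmethod
    def cell(self, row: int, col: int) -> float: ...


class WhitespaceTokens(TokenRole):
    """Result of a toy tokenizer component."""

    def __init__(self, text: str) -> None:
        self._tokens = text.split()

    def tokens(self) -> List[str]:
        return list(self._tokens)


class CooccurrenceMatrix(MatrixRole):
    """Result of a toy component counting adjacent token pairs."""

    def __init__(self, tokens: List[str]) -> None:
        # Map each distinct token to a row/column index (sorted order).
        self.index = {t: i for i, t in enumerate(sorted(set(tokens)))}
        self._counts = {}
        for left, right in zip(tokens, tokens[1:]):
            key = (self.index[left], self.index[right])
            self._counts[key] = self._counts.get(key, 0) + 1

    def cell(self, row: int, col: int) -> float:
        # Sparse storage stays hidden behind the role's access method.
        return float(self._counts.get((row, col), 0))


def connect(result: Role, required_role: type) -> Role:
    """Generic check: hand a result to a consumer only if it plays the role."""
    if not isinstance(result, required_role):
        raise TypeError(
            f"{type(result).__name__} does not play {required_role.__name__}"
        )
    return result


tokens = connect(WhitespaceTokens("a b a b"), TokenRole).tokens()
matrix = connect(CooccurrenceMatrix(tokens), MatrixRole)
```

In this sketch the consumer never converts the matrix into a common exchange format. It only relies on the access method that the role guarantees, which is the trade-off between exact interfaces and specific results that the abstract describes.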

Keywords

Component framework, Text engineering, Text mining

Notes

Acknowledgements

We would like to thank Maryia Fedzechkina and Sonja Subicin for their help.

Copyright information

© Springer-Verlag Berlin Heidelberg 2009

Authors and Affiliations

  1. Linguistic Data Processing, Department of Linguistics, University of Cologne, Cologne, Germany
