Classification of Text Processing Components: The Tesla Role System
Modeling component interactions is a major challenge in designing component systems. In most cases, the components in such systems interact via the results they produce. This gives rise to two conflicting requirements. On the one hand, the interfaces between the components must be specified exactly. On the other hand, the component interfaces should not be excessively restricted, as this might require the data produced by the components to be converted into the system’s data format. Such a conversion is difficult when complex data types (e.g., graphs or matrices) have to be stored, because they often require non-trivial access methods that a general data format does not support.
The approach introduced in this paper tries to overcome this dilemma by meeting both demands: a role system provides a generic mechanism through which text processing components can produce highly specific results. The role concept described in this paper has been adopted by the Tesla (Text Engineering Software Laboratory) framework.
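As a rough illustration of the idea, and not a reproduction of Tesla's actual API, the following Python sketch treats a role as an abstract interface that fixes the access methods a result must offer while leaving its internal representation to the component. All names here (`TokenizerRole`, `GraphRole`, `WhitespaceTokens`, `CooccurrenceGraph`, `connect`) are hypothetical and chosen only for this example.

```python
from abc import ABC, abstractmethod


class TokenizerRole(ABC):
    """Hypothetical role: a result that can be read as a token sequence."""

    @abstractmethod
    def tokens(self) -> list[str]: ...


class GraphRole(ABC):
    """Hypothetical role: a result that offers graph-style access."""

    @abstractmethod
    def neighbors(self, node: str) -> set[str]: ...


class WhitespaceTokens(TokenizerRole):
    """Component result filling the tokenizer role."""

    def __init__(self, text: str) -> None:
        self._tokens = text.split()

    def tokens(self) -> list[str]:
        return list(self._tokens)


class CooccurrenceGraph(GraphRole):
    """Links adjacent tokens; kept natively as an adjacency dict,
    with no conversion into a general-purpose data format."""

    def __init__(self, source: TokenizerRole) -> None:
        self._adj: dict[str, set[str]] = {}
        toks = source.tokens()
        for a, b in zip(toks, toks[1:]):
            self._adj.setdefault(a, set()).add(b)
            self._adj.setdefault(b, set()).add(a)

    def neighbors(self, node: str) -> set[str]:
        return set(self._adj.get(node, set()))


def connect(producer: object, required_role: type) -> None:
    """Reject a connection whose producer does not fill the required role."""
    if not isinstance(producer, required_role):
        raise TypeError(
            f"{type(producer).__name__} does not fill {required_role.__name__}"
        )


tokens = WhitespaceTokens("a b a c")
connect(tokens, TokenizerRole)       # accepted: the role matches
graph = CooccurrenceGraph(tokens)
print(sorted(graph.neighbors("a")))  # graph-specific access method
```

In this sketch the role check keeps the interface between components exact, while the graph result retains its own access method instead of being flattened into a shared format.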
Keywords: Component framework, Text engineering, Text mining
We would like to thank Maryia Fedzechkina and Sonja Subicin for their help.