Linguistic Computing with UNIX Tools

  • Chapter
Natural Language Processing and Text Mining

Abstract

This chapter outlines applications to language analysis that become possible through the combined use of two simple yet powerful programming languages with particularly short descriptions: sed and awk. We demonstrate how these two UNIX tools can be used to implement small, useful, customized applications ranging from text formatting and text transformation to sophisticated linguistic computing. The user thus becomes independent of sometimes bulky software packages that may be difficult to customize for particular purposes.

Copyright information

© 2007 Springer-Verlag London Limited

About this chapter

Cite this chapter

Schmitt, L.M., Christianson, K., Gupta, R. (2007). Linguistic Computing with UNIX Tools. In: Kao, A., Poteet, S.R. (eds) Natural Language Processing and Text Mining. Springer, London. https://doi.org/10.1007/978-1-84628-754-1_12

  • DOI: https://doi.org/10.1007/978-1-84628-754-1_12

  • Publisher Name: Springer, London

  • Print ISBN: 978-1-84628-175-4

  • Online ISBN: 978-1-84628-754-1

  • eBook Packages: Computer Science, Computer Science (R0)