Text Data and Where to Find Them?

Chapter in Text Mining for Information Professionals

Abstract

This chapter first describes the standard data file types, their usage, and their advantages and disadvantages. In a digital library, data without a metadata record may be considered incomplete or even unusable. The chapter therefore covers the functions, uses, components, and importance of metadata in detail. It then outlines the steps to create high-quality metadata, common metadata standards, metadata repositories, and common concerns and their solutions. The second part of the chapter explains why optical character recognition (OCR) matters for digitized data. It then describes four ways of obtaining data to start a text mining project: (i) online repositories, (ii) relational databases, (iii) web APIs, and (iv) web/screen scraping. The chapter also lists several online repositories, language corpora, and repositories with APIs that are available for text mining. Finally, it covers essential applications of APIs for librarians and the purposes for which librarians can use them in their day-to-day work.

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this chapter

Cite this chapter

Lamba, M., Madhusudhan, M. (2022). Text Data and Where to Find Them? In: Text Mining for Information Professionals. Springer, Cham. https://doi.org/10.1007/978-3-030-85085-2_2

  • DOI: https://doi.org/10.1007/978-3-030-85085-2_2

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-85084-5

  • Online ISBN: 978-3-030-85085-2

  • eBook Packages: Computer Science, Computer Science (R0)