Text Data and Where to Find Them?

Abstract

This chapter first describes the standard data file types along with their uses, advantages, and disadvantages. In a digital library, data without a metadata record may be considered incomplete and of little use. The chapter therefore covers the functions, uses, components, and importance of metadata, followed by the steps for creating quality metadata, the common metadata standards, metadata repositories, and common concerns and their solutions. The second part of the chapter discusses the importance of optical character recognition (OCR) for digitized data and then the different ways of obtaining data to start a text mining project: (i) online repositories, (ii) relational databases, (iii) web APIs, and (iv) web/screen scraping. It also enumerates online repositories, language corpora, and repositories with APIs that are available for text mining. Finally, the chapter covers some essential applications of APIs for librarians and the purposes for which librarians can use them in their day-to-day work.
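As a rough illustration of three of the routes named above (relational databases, web APIs, and web/screen scraping), the following minimal Python sketch gathers text from each into one list. It is not taken from the chapter: the `articles` table, the JSON layout with a `records` key, the sample HTML, and all function names are hypothetical stand-ins, and the sketch works on in-memory sample data rather than calling a real service.

```python
import json
import sqlite3
from html.parser import HTMLParser


def texts_from_database(conn):
    """Route (ii): pull text fields out of a relational database with SQL."""
    rows = conn.execute("SELECT abstract FROM articles ORDER BY id")
    return [row[0] for row in rows]


def texts_from_api_payload(payload):
    """Route (iii): parse a JSON body such as a web API might return."""
    return [record["abstract"] for record in json.loads(payload)["records"]]


class _TextExtractor(HTMLParser):
    """Collects the non-empty text nodes of an HTML document."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        # Keep only text nodes that contain something besides whitespace.
        if data.strip():
            self.chunks.append(data.strip())


def text_from_html(html):
    """Route (iv): scrape the text content out of raw HTML markup."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


# Hypothetical sample data standing in for a real database, API, and web page.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE articles (id INTEGER PRIMARY KEY, abstract TEXT)")
conn.executemany(
    "INSERT INTO articles (abstract) VALUES (?)",
    [("Metadata standards in digital libraries.",), ("OCR for digitized newspapers.",)],
)
api_payload = '{"records": [{"abstract": "Text mining for librarians."}]}'
html_page = "<html><body><h1>Corpus</h1><p>Scraped <b>paragraph</b> text.</p></body></html>"

# Combine the text obtained through all three routes into one small corpus.
corpus = (
    texts_from_database(conn)
    + texts_from_api_payload(api_payload)
    + [text_from_html(html_page)]
)
```

In a real project the JSON payload would typically come from an HTTP request to a repository's API and the HTML from a downloaded page, subject to the provider's terms of use; the sketch uses only the Python standard library so that it runs without network access.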

  • DOI: 10.1007/978-3-030-85085-2_2
  • Chapter length: 45 pages

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this chapter

Cite this chapter

Lamba, M., Madhusudhan, M. (2022). Text Data and Where to Find Them?. In: Text Mining for Information Professionals. Springer, Cham. https://doi.org/10.1007/978-3-030-85085-2_2

  • DOI: https://doi.org/10.1007/978-3-030-85085-2_2

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-85084-5

  • Online ISBN: 978-3-030-85085-2

  • eBook Packages: Computer Science; Computer Science (R0)