Mapping the Early Modern News Flow: An Enquiry by Robust Text Reuse Detection

  • Giovanni Colavizza
  • Mario Infelise
  • Frédéric Kaplan
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8852)


Early modern printed gazettes relied on a system of news exchange and text reuse largely based on handwritten sources. This information exchange system can be reconstructed by detecting reused texts. We present a method, based on string kernels and local text alignment, to identify text borrowings within noisy OCRed texts from printed gazettes. We apply our method to a corpus of Italian gazettes for the year 1648. Besides unveiling substantial overlaps in news sources, we are able to assess the editorial policies of different gazettes and account for a multi-faceted system of text reuse.
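As a rough illustration of the two techniques named in the abstract, the sketch below pairs a character k-gram (spectrum) string kernel with a Smith-Waterman local alignment score. It is a minimal sketch under stated assumptions, not the authors' implementation: the choice of spectrum kernel, the k-gram length, the scoring values, and the function names are all hypothetical, since this page does not specify them.

```python
from collections import Counter
from math import sqrt


def spectrum(text: str, k: int = 3) -> Counter:
    """Count every character k-gram in the text (k=3 is an assumed default)."""
    return Counter(text[i:i + k] for i in range(len(text) - k + 1))


def spectrum_kernel(a: str, b: str, k: int = 3) -> float:
    """Normalised k-spectrum kernel: cosine similarity of k-gram counts."""
    sa, sb = spectrum(a, k), spectrum(b, k)
    dot = sum(count * sb[gram] for gram, count in sa.items())
    norm = sqrt(sum(c * c for c in sa.values())) * sqrt(sum(c * c for c in sb.values()))
    return dot / norm if norm else 0.0


def local_alignment_score(a: str, b: str, match: int = 2,
                          mismatch: int = -1, gap: int = -1) -> int:
    """Best Smith-Waterman local alignment score (scoring values are assumed)."""
    best = 0
    prev = [0] * (len(b) + 1)  # previous row of the dynamic-programming table
    for ch_a in a:
        curr = [0]
        for j, ch_b in enumerate(b, start=1):
            diag = prev[j - 1] + (match if ch_a == ch_b else mismatch)
            score = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            curr.append(score)
            best = max(best, score)
        prev = curr
    return best
```

One plausible way to combine them is to use the kernel as a cheap filter that tolerates OCR character errors, and to run the more expensive alignment only on passage pairs whose kernel value exceeds a chosen threshold.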


Keywords: Early modern newssheets; Gazettes; News flows; Information exchange; Media history; Text reuse; OCR





Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  • Giovanni Colavizza (1)
  • Mario Infelise (2)
  • Frédéric Kaplan (1)

  1. EPFL, CDH, DH Laboratory, Lausanne, Switzerland
  2. Humanities Department, Ca’ Foscari University of Venice, Venice, Italy
