From Videos to URLs: A Multi-Browser Guide to Extract User’s Behavior with Optical Character Recognition

  • Mojtaba Heidarysafa
  • James Reed
  • Kamran Kowsari
  • April Celeste R. Leviton
  • Janet I. Warren
  • Donald E. Brown
Conference paper
Part of the Advances in Intelligent Systems and Computing book series (AISC, volume 943)


Tracking users’ activities on the World Wide Web (WWW) allows researchers to analyze each user’s internet behavior over time, including the amount of time spent on a particular domain. This analysis can be used in research design, as researchers may gain access to their participants’ behavior while they browse the web. Web search behavior has been a subject of interest because of its real-world applications in marketing, digital advertising, and identifying potential threats online. In this paper, we present an image-processing-based method to extract the domains visited by a participant across multiple browsers during a lab session. This method could provide another way to collect users’ activities during an online session, provided that a session recorder captured the data. The method can also be used to collect the textual content of the web pages an individual visits for later analysis.
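As an illustration of the kind of post-processing such a method implies, the sketch below turns address-bar strings into domains and sums the time spent on each. It is not the authors' implementation: it assumes the OCR step has already produced one address-bar string per sampled video frame, and the names `extract_domain`, `time_per_domain`, and `frame_interval` are hypothetical. Stripping every space from the OCR text is likewise an assumption about typical OCR noise.

```python
import re
from collections import defaultdict
from urllib.parse import urlsplit

# Assumed shape of a host name: dot-separated labels, e.g. "example.com".
HOST_RE = re.compile(r"^[a-z0-9-]+(\.[a-z0-9-]+)+$")


def extract_domain(ocr_text):
    """Return a lower-cased host name from one OCR'd address-bar string, or None."""
    # Assumption: spaces inside the address bar are OCR noise, so drop them.
    text = ocr_text.strip().lower().replace(" ", "")
    if not text:
        return None
    # urlsplit only fills in the host when a scheme separator is present.
    if "://" not in text:
        text = "http://" + text
    try:
        host = urlsplit(text).hostname or ""
    except ValueError:
        # Malformed input (for example stray brackets) is treated as unreadable.
        return None
    if host.startswith("www."):
        host = host[4:]
    return host if HOST_RE.match(host) else None


def time_per_domain(samples, frame_interval):
    """Sum seconds per domain, crediting each readable frame with frame_interval seconds."""
    totals = defaultdict(float)
    for ocr_text in samples:
        domain = extract_domain(ocr_text)
        if domain is not None:
            totals[domain] += frame_interval
    return dict(totals)


if __name__ == "__main__":
    # Hypothetical OCR output for frames sampled every 2 seconds.
    frames = ["https://www.example.com/a", "www.example.com/b", "example.org", "", "new tab"]
    print(time_per_domain(frames, 2.0))
```

Frames whose text does not look like a host name (an empty address bar, a search phrase) are skipped rather than guessed at, so the totals only cover frames where a domain could be read.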


Web search · User behavior · Image processing · Optical character recognition



Copyright information

© Springer Nature Switzerland AG 2020

Authors and Affiliations

  • Mojtaba Heidarysafa (1)
  • James Reed (2)
  • Kamran Kowsari (1)
  • April Celeste R. Leviton (2, 5)
  • Janet I. Warren (2, 4)
  • Donald E. Brown (1, 3)

  1. Department of Systems and Information Engineering, University of Virginia, Charlottesville, USA
  2. Institute of Law, Psychiatry, and Public Policy, University of Virginia, Charlottesville, USA
  3. Data Science Institute, University of Virginia, Charlottesville, USA
  4. Department of Psychiatry and Neurobehavioral Sciences, University of Virginia, Charlottesville, USA
  5. Department of Sociology, University of California, Riverside, USA
