System for Analysing Big Weblog Data

  • Conference paper

Part of the book series: Lecture Notes in Electrical Engineering ((LNEE,volume 621))

Abstract

Understanding the behavior and purposes of Internet users within an organization requires analysing their Web usage history. These data are stored as huge log files, often kept separately in various places, which makes them difficult to manage or utilize. This research examines and develops an analysis tool for log files using Hadoop and Hive. The development was divided into two parts. First, Web history data were gathered using PHP via SQLite and classified into website categories, especially Google, YouTube and Facebook. The obtained data were then used to analyze the categories of accessed websites, and the findings were recorded in Hive by an enhanced algorithm that supports this category analysis. The algorithm was also designed to analyze words and phrases used in Google searches. Second, the behavior and purposes of accessing websites during class were analyzed. The results can be displayed in real time as percentages and as frequencies of website accesses.
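
The classification step described in the abstract can be illustrated with a minimal sketch. It is written in Python as a stand-in and is not the authors' PHP, SQLite, and Hive implementation. The domain-to-category mapping, the function names (`categorize`, `search_terms`, `summarize`), and the assumption that Google search phrases appear in a `q` query parameter on the `/search` path are all hypothetical choices made for illustration; the abstract does not specify the actual rules.

```python
from collections import Counter
from typing import Dict, Iterable, Optional, Tuple
from urllib.parse import parse_qs, urlparse

# Hypothetical mapping from registered domain to category.
# The paper names these three categories but not its matching rules.
CATEGORY_DOMAINS = {
    "google.com": "Google",
    "youtube.com": "YouTube",
    "facebook.com": "Facebook",
}


def categorize(url: str) -> str:
    """Return the category of a visited URL, or "Other" if none matches."""
    host = (urlparse(url).hostname or "").lower()
    for domain, category in CATEGORY_DOMAINS.items():
        # Match the domain itself or any of its subdomains.
        if host == domain or host.endswith("." + domain):
            return category
    return "Other"


def search_terms(url: str) -> Optional[str]:
    """Return the search phrase of a Google search URL, else None.

    Assumes the phrase is carried in the `q` query parameter.
    """
    parsed = urlparse(url)
    if categorize(url) != "Google" or parsed.path != "/search":
        return None
    values = parse_qs(parsed.query).get("q")
    return values[0] if values else None


def summarize(urls: Iterable[str]) -> Dict[str, Tuple[int, float]]:
    """Map each category to (access frequency, percentage of all accesses)."""
    counts = Counter(categorize(u) for u in urls)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {c: (n, 100.0 * n / total) for c, n in counts.items()}
```

Calling `summarize` on a list of visited URLs yields, per category, the two quantities the abstract mentions: the frequency of accesses and the share in percent. In a Hadoop and Hive setting the same counting would more likely be expressed as a grouped aggregation over the stored log table.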

Author information

Corresponding author

Correspondence to Chakkrit Snae Namahoot.

Copyright information

© 2020 Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Namahoot, C.S., Brückner, M., Lekkam, W. (2020). System for Analysing Big Weblog Data. In: Kim, K., Kim, HY. (eds) Information Science and Applications. Lecture Notes in Electrical Engineering, vol 621. Springer, Singapore. https://doi.org/10.1007/978-981-15-1465-4_53

  • DOI: https://doi.org/10.1007/978-981-15-1465-4_53

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-15-1464-7

  • Online ISBN: 978-981-15-1465-4

  • eBook Packages: Engineering, Engineering (R0)
