Improving Online Search Process in the Big Data Environment Using Apache Spark

  • Karim Aoulad Abdelouarit
  • Boubker Sbihi
  • Noura Aknin
Conference paper
Part of the Lecture Notes in Networks and Systems book series (LNNS, volume 37)


In this article, we study the use of Apache Spark to improve the online search process over Big Data flows, as part of refining Big-Learn, our solution for online search by learners in an e-Learning environment. The purpose of this study is to evaluate Spark in terms of fast data processing and large-scale complex analysis, and also in terms of ease of use, execution, and integration with the other layers of the data search process, especially the Solr framework. Apache Spark is considered better than Hadoop at processing large data quickly and at real-time analysis. In this context, we propose to study the integration of Spark in order to offer a technique that processes massive data more effectively and thereby improves and organizes online search results. Our solution combines Spark for massive data processing with the Solr search platform, and uses the Lucene engine for data indexing.


Big Data, Online search, Spark, Hadoop, Solr



Copyright information

© Springer International Publishing AG 2018

Authors and Affiliations

  1. Information Technology and Modeling Systems Research Unit, Computer Science, Operational Research and Applied Statistics Laboratory, Abdelmalek Essaadi University, Tetuan, Morocco
