Introduction

Malware is a term used for all kinds of malicious software designed to attack and damage a computer system. Common malware that harms networks and operating systems includes Trojan horses, worms, viruses, spyware, ransomware and adware (Razak et al. 2016). These types of malware attack systems using different methods (Qamar et al. 2019) and are capable of causing serious damage to operating systems and networks. In Quarter 2 (Q2) of 2019, malware targeting mobile devices had increased by 50% compared to the previous year (Palmer 2019). In January 2019, a total of two billion hacked records were uncovered (Sanders 2019). McAfee Labs noted that ransomware attacks had grown by 118% in 2019 (Mcafee 2019a, b), while banking Trojans had doubled from June to September 2018, showing the most vigorous growth among the malware families (Mcafee 2019a, b). Chebyshev et al. (2019) reported a new Trojan, Android.MobOk, which took money from mobile accounts by using subscriptions. A total of 905,174 malicious installation packages had been detected by Quarter 1 of 2019. The number decreased by 151,624 in Quarter 2 of 2019. Statistics showed that risk tools had increased by 41.24% in 2019, compared with only 30.20% in 2018. Furthermore, adware and Trojans were noted to have increased by 18.71% and 11.83%, respectively, in 2019. This substantial growth in mobile attacks over the years shows that attackers increasingly regard Android mobile devices as attractive targets (Verkijika 2019).

The Android platform offers attractive functions such as entertainment, data storage, and social communication (Chen and Li 2017). The wide acceptance of the Android operating system has made it one of the platforms most targeted by malware, attracting the attention of unscrupulous authors (Shrivastava and Kumar 2019a). These authors are motivated by their own unscrupulous goals and by lucrative benefits. The lack of priority given to security by mobile device developers has also allowed malware to exploit mobile devices (Thompson et al. 2017). To combat these security issues, Android provides sandboxing as a security mechanism; however, malware authors have creatively manipulated other vulnerabilities of Android to spread malware (Qamar et al. 2019). In addition, the lack of user awareness (Lopes et al. 2019) and the vulnerabilities of the operating system increase the opportunity for malware to exploit data through malicious code (Goel and Jain 2018). These malicious programmes serve different purposes, such as encrypting or stealing crucial information kept in phone storage, removing important data, modifying or controlling main computing functions, and capturing activities without the users' knowledge (Basu et al. 2019). The reliance of most users on mobile devices for personal work through Wi-Fi access (Sharma and Gupta 2018a) makes it feasible for attackers to target a user’s credentials (Sharma and Gupta 2016). All mobile users should be made aware of emerging malware as a way to prevent their devices from being damaged.

To prevent the dissemination of malware, devices are protected by existing methods such as anti-malware software and intrusion detection systems (IDS) (Talal et al. 2019). Nevertheless, novel approaches are still needed to keep up with the rapid increase in malware attacks from year to year. With the advent of more advanced technologies, malware authors are able to hide malware from detection. They apply diverse and sophisticated obfuscation techniques, including encryption, packing, polymorphism and metamorphism (Huda et al. 2018). Obfuscation is generally used to prevent signature extraction from the malware’s binary code (Or-Meir et al. 2019). This phenomenon has prompted many researchers to investigate and analyse the features of malware. Most of these studies aimed to introduce better approaches for preventing and detecting Android malware. A study by De Lorenzo et al. (2020) used dynamic analysis with Vizmal to spot and avoid malware. Vizmal is a visualisation tool used to trace the execution of applications in Android and to overcome the obfuscation created by malware authors. It acts as an assistant during malware inspection and helps to localise malicious behaviour. Other studies, such as Yerima et al. (2014) and Yu et al. (2013), applied Bayesian techniques to detect malware. Another study, Magdum (2015), used permission-based features in machine learning to identify malware. All of these studies, which describe the research activity in this field, are crucial. Alongside the many research studies that have been published, bibliometric studies of malware have also become popular in current research trends as a way of assessing research impact.

Bibliometrics is the quantitative analysis of articles published in a specific field (Blanco-Mesa et al. 2017; Baker et al. 2019). A bibliometric study analyses the data and features of articles, such as productivity, research area, Web of Science (WoS) categories, authors, highly cited articles, institutions, and impact journals. The bibliometric method is used to evaluate the impact of published articles and to assist researchers in understanding the structure of a research field (Reuters 2008). It reveals the areas being studied, thereby increasing the interest and attention of researchers and funding institutions. Analysis derived from the bibliometric method can also compare the countries that contributed publications in their respective fields. Bibliometric studies have been applied in a wide range of fields, including the COVID-19 pandemic (Gautam et al. 2020), the environment (Zhang et al. 2020), agriculture (Luo et al. 2020), sustainable development (Ye et al. 2020), the Chinese loess plateau (Zhang and Chen 2020), accounting (Merigó and Yang 2017), economics (Bonilla et al. 2015), linguistic decision making (Yu et al. 2016) and fuzzy research (Merigó et al. 2015). Bibliometric studies offer several advantages: they (a) reveal the importance of research in the related field, (b) reveal the development of research based on institution and performance, (c) enable researchers to use publications from related studies in future studies, and (d) improve the knowledge of new researchers.

The current study aims to evaluate studies on Android malware published in the WoS from 2010 to 2019. The study scrutinises the Android malware research topic, publication pattern, research area, authors, highly cited articles, impact journals, and the institutions of the studies. A significant aspect of this analysis is that the Web of Science offers a wide view of the contributions. In planning the review of Android malware articles in the WoS database, the following steps were followed: (1) identify and analyse Android malware studies in the Web of Science over 10 years (2010–2019); (2) present the findings on Android malware detection considering articles, productivity, research area, the Web of Science categories, authors, highly cited articles, institutions, and impact journals; (3) define and study the research gap, the highlighted questions, and the difficulties encountered in prior studies; and (4) identify the latest trends in Android malware attacks. The objective of these steps is to deliver a better understanding of Android malware. The proliferation of Android malware studies was analysed to determine trends in malware patterns and in the detection procedures used to prevent malware from spreading. Focusing on the past 10 years of publications specifically on Android malware, this bibliometric analysis also considers the scope and aims of those studies and evaluates the challenges in malware trends.

The current study used “android malware” as the main keyword to retrieve the related publications. The keyword is essential for retrieving current information on the research trend, and for revealing the direction and appeal of the research. The related publications were searched in the WoS Core Collection database, limited to the past 10 years (2010–2019). Additionally, this paper discusses the malware detection system and the challenges in malware study. In summary, we analysed research publications from seven (7) continents: Asia, Europe, North America, the Middle East, Australia, South America, and Africa. Asia had the highest share of publications at 40.5%, followed by Europe with 26.5% and North America with 20.3%. Asia thus outperformed Europe by 14 percentage points, while Europe led North America by 6.2 percentage points. The continent with the smallest contribution to Android malware publications was Africa, at only 0.7%. Table 1 shows the distribution of publications across the seven continents.

Table 1 Publication of 7 continents

The remainder of this paper is systematised as follows. Section 2 discusses the process of collecting data. Section 3 provides the findings of the studies. Section 4 explains the taxonomy for the detection system of malware. Section 5 discusses the challenges and imminent trends and Sect. 6 concludes the paper.

Methodology

Bibliometrics is defined as the statistical method used to analyse articles, books, and other publications. It is frequently used in the library and information science field (Library 2020) and is also referred to as scientometrics. Bibliometric analysis forms part of research evaluation methodology, and different kinds of literature tend to have their own methods of bibliometric analysis (Ellegaard and Wallin 2015). According to Razak et al. (2016), bibliometrics is a process used to appraise, analyse, and visualise the structure of scientific fields. The bibliometric approach focuses on quantitative analysis, such as citation counts. Such analysis can be supplemented by complementary qualitative indicators, such as funding granted, awards received, peer review, and number of patents (Library 2019). The key concepts of the bibliometric approach are output and impact, which are measured through publications and citations, respectively. Hence, bibliometric studies are valuable for revealing the important trends in a research topic.

Bibliometric analysis has been used in various areas of study. The bibliometric study by Shanker et al. (2020) analysed the academic work of neurosurgeons in the New York metropolitan area. Iwami et al. (2019) examined fields that co-evolved with information technology, while Ospina-Mateus et al. (2019) analysed studies of motorcycle accidents. A study by Baker et al. (2019) in the field of financial economics used bibliometric analysis to present the productivity and impact of the Review of Financial Economics (RFE). Additionally, Prashar and Sunder (2019) applied bibliometric analysis in the field of sustainable development, Raparelli and Bajocco (2019) in the field of vehicles in agriculture, and Galetsi and Katsaliaki (2019) in the field of Information Science. By comparison, the bibliometric study of malware is only just emerging as a research trend. In this study, the researcher illustrates how to evaluate research using the bibliometric method. The evaluation is conducted by analysing the impact of the articles. Table 2 lists past studies that applied the bibliometric approach, which the current study also applies. However, those studies differ from the current one in the keywords used and in their findings.

Table 2 The list of studies of bibliometric methods

For the development of this study, the author used the Web of Science database, formerly owned by Thomson Reuters and now maintained by Clarivate Analytics. The WoS Core Collection database was chosen, and the SciELO Citation Index, KCI-Korean Journal Database, and Russian Science Citation Index were removed. Only articles written in English were selected. To carry out the analysis, the keywords ‘malware’ and ‘Android malware’ were both searched in order to compare the number of publications returned by each. The keyword ‘Android malware’ returned publications focused on mobile malware, while the keyword ‘malware’ returned general malware publications in the WoS, covering cybercrime, IoT, phishing and many other topics. The advantage of using the keyword ‘Android malware’ is that the collected articles relate specifically to mobile malware, which yields more focused findings. Thus, ‘Android malware’ was selected as the keyword for this bibliometric study.

The data for this study were analysed twice to account for changes in the number of publications in the WoS database: first in October 2019 and again in February 2020. In February 2020, there were 1278 articles for Android malware and 5622 articles for malware. At this filtering stage, 97 articles belonging to the SciELO Citation Index, KCI-Korean Journal Database, and Russian Science Citation Index were excluded. The selected 1278 articles were then analysed by title, year of publication, research area, author/s, citation, institution/s and impact journal. These publications included journal articles and book chapters. With the selected 1278 articles, an analysis was done by relating the research area, author/s, citation, institution/s and impact journal to one another. Finally, the open-source application R was used to visualise the final results. R was chosen because it supports many bibliographic visualisations for analysis. Figure 1 shows the data collection process.
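The selection steps described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the workflow actually used in the study (which relied on the WoS interface and R): the record fields (`year`, `language`, `index`) and the sample records are hypothetical, and only the names of the excluded indexes and the 2010–2019 window come from the text.

```python
from collections import Counter

# Index names are taken from the text; everything else here is hypothetical.
EXCLUDED_INDEXES = {
    "SciELO Citation Index",
    "KCI-Korean Journal Database",
    "Russian Science Citation Index",
}


def select_records(records, first_year=2010, last_year=2019):
    """Keep English-language records from retained indexes within the year window."""
    return [
        r for r in records
        if r["index"] not in EXCLUDED_INDEXES
        and r["language"] == "English"
        and first_year <= r["year"] <= last_year
    ]


def publications_per_year(records):
    """Tally the selected records by publication year, ordered by year."""
    return dict(sorted(Counter(r["year"] for r in records).items()))


# Hypothetical sample records, for illustration only.
CORE = "Web of Science Core Collection"
sample = [
    {"title": "A", "year": 2012, "language": "English", "index": CORE},
    {"title": "B", "year": 2017, "language": "English", "index": CORE},
    {"title": "C", "year": 2017, "language": "English", "index": "SciELO Citation Index"},
    {"title": "D", "year": 2017, "language": "Korean", "index": CORE},
    {"title": "E", "year": 2009, "language": "English", "index": CORE},
    {"title": "F", "year": 2017, "language": "English", "index": CORE},
]

selected = select_records(sample)
print(publications_per_year(selected))  # {2012: 1, 2017: 2}
```

Records C, D and E are dropped for index, language and year respectively, which mirrors the three filters named in the text.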

Fig. 1

Methodology of data collection

Web of Science (WoS)

The Web of Science (WoS) is an online platform that offers multiple databases of indexed journal articles. Formerly known as the Web of Knowledge, the WoS was introduced by the Institute for Scientific Information (ISI). It is presently managed by Clarivate Analytics (Iwami et al. 2019). Its indexed coverage starts from the year 1900. The WoS covers more than 12,000 impact journals, with 148,000 journals and book-based proceedings, across 256 disciplines in science, social sciences, and humanities (Webofknowledge 2018). It provides basic search, cited reference search, author search, and advanced search across four databases: the Web of Science Core Collection, the KCI-Korean Journal Database, the Russian Science Citation Index, and the SciELO Citation Index. The WoS also provides a citation report and result analysis, which make it possible to track research activity and journal impact through an appropriate keyword search.

This study chose the WoS database because its contents have already been evaluated on the basis of publication impact, review, influence, and geographical distribution. The WoS serves as a research tool that helps users to acquire information and to analyse and disseminate knowledge. It offers extensive search and analysis capabilities, which are useful for researchers searching for indexed journals in their respective areas. The indexing was first used to search for results across disciplines. Past bibliometric studies, including Baker et al. (2019), Shukla et al. (2020), Yao et al. (2020) and Chen et al. (2019), utilised the WoS database, which covers the science, social science, arts, and humanities fields. Besides the WoS, there are other databases, such as ScienceDirect, Elsevier's Scopus, IEEE Xplore, Google Scholar, and Springer.

Findings

This section describes the findings of Android malware studies. Articles between 2010 and 2019 were analysed. Findings were divided into seven (7) sub-topics: publication year, countries, research areas, authors, institutions, highly cited article, and impact journals. The total publications were noted to be 1278 articles, as presented in Table 3.

Table 3 Publication based on the year

The statistics in Table 3 show that publications of Android malware studies doubled in number between 2011 and 2014. The highest output was in 2017, with 254 publications. The increase can be attributed to the rapid growth of malicious software on Android devices. This appears to have encouraged researchers to examine the factors behind malware infection, the vulnerabilities of the devices, and the impact of malware attacks and the methods used to prevent and reduce them. Publications on Android malware dropped slightly between 2018 and 2019. This can be attributed to the time taken by reviewers and publishers to accept such articles. Figure 2 describes the types of publication by year.

Fig. 2

Numbers of publication type based on years

Figure 2 shows that publications increased steadily year by year, then dropped slightly in 2018 and more significantly in 2019. Journal publication takes time from acceptance to appearance in print, which explains the apparent decline in the publication rate and has clearly affected the number of publications recorded for those years. In addition, the publication of book chapters was more noticeable in 2017 and 2019.

Productivity

Table 4 illustrates the output of the publications among the continents. It is essential to scrutinise the output growth of the articles in order to analyse the malware issue that is a worldwide concern. These articles were analysed based on the continent category so as to detect the awareness of the malware issue and the frequency of malware attacks in the user country. Data presented in Table 4 list the publications across continents from year 2010 to 2019.

Table 4 Productivity based on continents

Following the analysis of publications across continents, data are subsequently categorised based on countries and continents according to year. Table 5 further illustrates.

Table 5 Productivity of continent based on year

From the above, it can be noted that the most productive continents in publishing articles were Asia and Europe. The former produced 40.5% and the latter 26.5%, while North America produced 20.3%. Asia outperformed Europe by 14 percentage points, making it the most prolific continent in publications focusing on Android malware. Of all the publications, 20.1% were from China. Other countries that followed include the United States, India, Italy, and South Korea. By comparison, the Middle East, Australia, Africa, and South America contributed less.
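The gaps between continents quoted above are differences between percentage shares, so they are best read as percentage points. A minimal sketch of the arithmetic follows; the shares are the ones reported in the text, while the dictionary and function names are purely illustrative.

```python
# Publication shares (percent of all publications) as reported in the text.
shares = {
    "Asia": 40.5,
    "Europe": 26.5,
    "North America": 20.3,
    "Africa": 0.7,
}


def gap_in_points(leader, follower):
    """Difference between two shares, in percentage points."""
    return round(shares[leader] - shares[follower], 1)


print(gap_in_points("Asia", "Europe"))           # 14.0
print(gap_in_points("Europe", "North America"))  # 6.2

# Continents ordered from most to least productive among those listed above.
ranking = sorted(shares, key=shares.get, reverse=True)
print(ranking)  # ['Asia', 'Europe', 'North America', 'Africa']
```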

Research funding is essential for scientific research. The United States spent around 500 billion USD on research and development (R&D), while China spent about 400 billion USD (Enago Academy 2018). However, research in the United States remained stagnant due to economic trouble (Enago Academy 2018), whereas China managed to increase its R&D funding and simultaneously produced the most scientific research. This is because it had the support of its government, which provided substantial funding for collaborative ventures in China (International Center 2019). As a result, China overtook the United States in science publishing for the first time (Enago Academy 2018; Dockrill 2018). Thus, Asia has become the most prolific continent in the publication of Android malware articles.

Research area

The subsequent finding focused on research areas which discussed the total publications found on a particular research area. This measure is important for measuring the performance and challenges observed in the different fields of studies. The yield of the related research areas uncovered the movement of the research studies. Here, it was noted that the WoS contained 27 research fields in the publication of Android malware. Table 6 presents this outcome.

Table 6 Research area of studies

The statistics above show numerous related research areas, for instance Computer Science, Engineering, Telecommunications, Science Technology, and Automation Control Systems. Publications across these research areas were dominated by Computer Science and Engineering, with 86.1% and 38%, respectively. The volume of Computer Science publications on Android malware reflects the evolution of device technology. The Computer Science field accounted for 1100 articles in total, followed by Engineering with 386 articles.

Second to Computer Science, the Engineering field was then followed by the Telecommunications field. Based on this, it can thus be deduced that Computer Science and Engineering correlated with each other. Both contributed to developing a new technology that could be used by academia and the public. Nonetheless, there were specific terms observed to be related to Computer Science and Engineering, for instance, machine learning, security, artificial intelligence, computer architecture, and data processing. The development of new mobile devices was associated with the expertise of Computer Science and Telecommunications, hence their link with each other.

The article with the highest citation count was “Dissecting Android Malware: Characterisation and Evolution”, with 655 citations, listed under the Computer Science and Engineering areas in the WoS database. This confirms the close connection between the fields of Computer Science and Engineering. The first and second contributors to Android malware publications are therefore closely related, since both areas produce articles on the same topics. The rest of the research areas are listed in Table 6.

Web of Science categories

Table 7 lists the WoS categories, including the seven (7) sub-categories of Computer Science. The first among these was Computer Science Theory Methods, followed by Computer Science Information Systems. A further five sub-categories came under the research area of Engineering: Electrical Electronics Engineering, Multidisciplinary Engineering, Mechanical Engineering, Industrial Engineering, and Aerospace Engineering. Of these, Electrical Electronics Engineering had the most publications related to Android malware, while Aerospace Engineering had the fewest.

Table 7 Web of Science categories

Author

The finding in terms of the author is significant in this bibliometric study. It facilitates other researchers in their studies by highlighting the most prolific or most active contributor in terms of publications in the Android malware research. Table 8 presents the top 20 most influential and productive authors. The table classified under Author is organised in terms of the number of publications, institutions, and countries.

Table 8 Authors

The data above cover publications generated from all seven continents. Continents such as Europe and Asia were the most notable, producing the most publications on Android malware, with countries like Italy, Luxembourg, Malaysia, China, and India holding the best records. The top three authors were from Europe, specifically from Italy. The most prolific author was Francesco Mercaldo, who published 33 articles, followed by Fabio Martinelli with 20 articles and Mauro Conti with 19 publications. Both Francesco Mercaldo and Fabio Martinelli were from the University of Sannio, whereas Mauro Conti was from the University of Padua. From Asia, Yang Liu, Nor Badrul Anuar, and Vijay Laxmi were the most active contributors. Yang Liu, from China, contributed a total of 16 publications, while Nor Badrul Anuar from the University of Malaya, Malaysia, contributed a total of 15 articles. The top 20 authors, who were involved in various research areas, came from 16 different institutions.

High cited articles

This section describes the number of citations, as shown in Table 9. The table lists the 25 most cited articles, with their citation counts, publishing journal, year, and research areas. The top three most cited publications were published between five and seven years ago. This is consistent with the observation that citations accumulate for articles that have been in the database longer (Razak et al. 2016). The research areas contributing to the publications on Android malware include Engineering, Telecommunications, Science Technology other topics, Automation Control Systems, Robotics, Mathematics, and finally Computer Science, which has become the dominant field for highly cited articles.

Table 9 Highly cited articles

As noted in Table 9, the most cited article was “Dissecting Android Malware: Characterisation And Evolution”, which received 655 citations (Zhou and Jiang 2012). The author of this article was from China, on the continent of Asia, and the article was published in the proceedings of the IEEE Symposium on Security and Privacy in 2012. The article described the characteristics and evolution of malware by presenting a total of 1260 samples of Android malware from 49 different families. The characteristics of these malware samples were examined based on their behaviours, including installation, activation, and payloads. On this dataset, the article reported a best-case detection rate of 79.6% and a worst-case detection rate of 20.2%. This outcome showed the need for better solutions for the next generation of mobile malware detection.

The second most cited article was “Flowdroid: Precise Context, Flow, Field, Object-Sensitive And Lifecycle-Aware Taint Analysis For Android Apps”, with 385 citations, published in 2014 in ACM SIGPLAN Notices (Arzt et al. 2014). This article presented FlowDroid, a static taint analysis for Android applications. The experiment was run on 500 benign applications from Google Play and 1000 malware samples from the VirusShare project. A closer look at these two articles suggests that researchers studying malware detection could draw on them for further knowledge. These articles were the most highly cited and were acknowledged by newer researchers for their findings, methods, and ideas.

Institutions

This section discusses the publications linked to each institution. The aim was to rank the institutions by comparing their publication counts. Institutions from Asia were found to hold the highest number of Android malware publications. Table 10 lists the top 30 institutions, which span four continents: Asia, Europe, the Middle East, and North America.

Table 10 Institutions

Table 10 presents the most distinguished institutions in publishing Android malware articles. The Chinese Academy of Sciences had the most publications, followed by Beijing University. Institutions from Asia had the greatest number of publications, followed by institutions from Europe and then North America.

Other distinguished institutions from Asia include the University of Chinese Academy of Sciences, Tsinghua University, University of Malaya, Korea University, and the University of Jinan. This study further found that most of the leading institutions in Asia were located in China. China's publication rate also surpassed that of other countries in Asia, with seven (7) institutions in particular contributing to these publications. Moreover, the analysis showed that publication counts among institutions were separated by only small gaps. These small differences suggest that researchers had excellent facilities and faced strong competition.

Impact journal

This section discusses the impact of journals in the Computer Science field. A journal is a publication comprising articles written by researchers and experts in a specific field of study, solely for academic or technical purposes. Impact journals are a critical part of this study because they represent the most prominent journals, those whose publications received the most citations. The most influential journals are shown in Table 11 with their quartile, number of citations, impact factor, and average citations per year.

Table 11 Impact journal of Android malware articles

Of the top 20 highest-impact journal articles on Android malware, eight (8) were in Quartile 1 (Q1) journals. Q1 to Q4 refer to a journal’s ranking quartile within a subdiscipline, with Q1 indicating the greatest impact. The most influential journal in this study was IEEE Communications Surveys and Tutorials, whose article has been in the WoS for five (5) years and has an average of 22.2 citations per year. The title of this highest-impact journal article in the WoS was “Android Security: A Survey of Issues, Malware Penetration, and Defenses”, with 111 citations. The oldest entry in the WoS is from the Journal of Systems and Software and has been in the WoS for ten years. It has 44 citations and an average of 4.4 citations per year. In addition, two (2) journals fall in Quartile 2 (Q2) and seven (7) in Quartile 3 (Q3).
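The average-citations-per-year figures quoted above follow from dividing the total citations by the number of years the article has been in the WoS. A small sketch of that calculation is given below; the function name is illustrative, and the inputs are the figures reported in the text.

```python
def average_citations_per_year(total_citations, years_in_wos):
    """Average citations per year for an article indexed for a given number of years."""
    if years_in_wos <= 0:
        raise ValueError("years_in_wos must be positive")
    return round(total_citations / years_in_wos, 1)


# Figures reported in the text.
print(average_citations_per_year(111, 5))   # 22.2 (IEEE Communications Surveys and Tutorials)
print(average_citations_per_year(44, 10))   # 4.4  (Journal of Systems and Software)
```

Because the measure divides by time in the database, a recent article with fewer total citations can still rank above an older one.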

Figure 3 shows the top 20 authors, 17 countries, and the 28 most used keywords. As seen in the figure, China is the highest contributor of articles, with 12 authors. Next in line are Italy, the United States, India, and Luxembourg. There is a significant difference between the first contributor, China, and the second contributor, Italy. The keywords most commonly used by the authors were: malware, Android, malware detection, and machine learning. Malaysia also contributed publications, with Android as its most used keyword. The figure shows that Asia is the most prolific contributor to Android malware research, with studies conducted in China, Malaysia, India, and Singapore.

Fig. 3

Relationship between country, author, and keywords

Figure 4 shows the relationship between titles, authors and their affiliations. The title terms most frequently used by the authors are Android, malware, and detection, and this applies to all the institutions. The title terms used less often were framework, dynamic classification, approach, and techniques. Yang Liu from China was the top author, as seen in the figure. He also used the keyword Android in the titles of his articles. The top university in Fig. 4 is the University of Chinese Academy of Sciences, from China. The University of Malaya and the University of Malaysia Pahang, from Malaysia, also contributed publications on Android malware.

Fig. 4

Relationship between title with author and affiliation

Malware intrusion detection system (IDS)

This section describes the malware IDS used as a methodology in malware detection. Malware is purposely created to disrupt computers or mobile devices, to gain information, and to spread to and infect other devices. Android has about 3.5 million applications, and 99% have been targeted by malware (Amin et al. 2020). Most of the antivirus apps available for Android do nothing to check malware behaviour (Whitwam 2020). On top of that, 21.1 million Android mobile devices have been affected by malware applications that users downloaded from the Google Play Store (Counterpoint 2019). Such malware indirectly leads users to subscribe to unwanted premium services, thereby causing severe damage to the mobile device (Computer Hope 2019). Malware applications quietly hijack users’ account details, subscribe users to premium messages via SMS, and then compromise the hardware (The App Store Celebrates 10 Years and 2 Million Apps 2018). Mobile devices usually contain a lot of personal data and crucial information, and they are often used for online transactions and bill payments (Wazid et al. 2019). Because the malware conducts all these activities silently, without the users’ knowledge, it causes users financial losses. Several methods have been introduced to help researchers detect and overcome malware. Amin et al. (2020) proposed using Android Intent (implicit and explicit) for malware detection by combining Android permissions and Android Intents. The use of intents continued in the study by Shrivastava and Kumar (2019a), which focused on permission and intent modelling. Taheri et al. (2020) developed four detection methods that use Hamming distance to measure the similarity between benign and malware samples. These studies used static analysis, which is considered an effective technique for reducing the power and time consumed in detecting malware. In contrast, Garg et al. (2020) proposed a multi-stage model using anomaly (dynamic) detection to address the security of IoT-enabled applications. Both techniques have different roles and advantages.
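The idea of comparing samples by Hamming distance can be sketched as follows. This is a minimal illustration and not the method of Taheri et al. (2020): the binary feature vectors, the labels, and the function names `hamming` and `nearest_label` are hypothetical.

```python
from typing import List, Sequence, Tuple


def hamming(a: Sequence[int], b: Sequence[int]) -> int:
    """Count the positions at which two equal-length vectors differ."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum(x != y for x, y in zip(a, b))


def nearest_label(sample: Sequence[int],
                  labelled: List[Tuple[Sequence[int], str]]) -> str:
    """Return the label of the labelled vector closest to `sample`."""
    closest = min(labelled, key=lambda item: hamming(sample, item[0]))
    return closest[1]


# Hypothetical feature vectors: 1 means the app uses a given feature
# (for example a permission or an API call), 0 means it does not.
labelled = [
    ([1, 1, 0, 1, 0], "malware"),
    ([0, 0, 1, 0, 0], "benign"),
    ([0, 1, 1, 0, 0], "benign"),
]
unknown = [1, 1, 0, 0, 0]

print(nearest_label(unknown, labelled))
```

The unknown vector differs from the first labelled vector in one position and from the others in two or three, so it receives the label of the first vector.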

The malware detection system is divided into three (3) components, as illustrated in Fig. 5: the analysis techniques, the detection approaches, and the deployment approaches.

Fig. 5 Taxonomy of the malware detection system

Analysis technique

An analysis technique is a method for identifying malicious code. Such techniques fall into two types: dynamic analysis and static analysis (Belaoued et al. 2019). Both types are used to detect malware presence. Unfortunately, malware authors can use obfuscation, a technique that deliberately makes code difficult to understand, to avoid being detected (Or-Meir et al. 2019).

Static analysis is a technique for investigating code in offline mode (Amin et al. 2020). The examination is carried out without running the application (Amin et al. 2020; Statista 2019; Tam et al. 2017; Akour et al. 2017). For this purpose, it uses reverse engineering to extract certain features for analysis, such as API and data permissions (Singhal et al. 2019). Static analysis detects malware by comparing the code under examination with the code stored in a database; the process reads the code and flags unfamiliar code as malware. Studies by Singhal et al. (2019) and Magdum (2015) detected malware by using static analysis. The advantage of static analysis is its fast detection, since detection can be performed without executing the applications (Shrivastava and Kumar 2017). Although static analysis is unable to cope with obfuscation techniques, it is able to reveal and address suspicious files much faster (Shrivastava and Kumar 2019a).
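A permission-based static check can be sketched as follows. This is only an illustration under stated assumptions: the manifest is assumed to have already been decoded to plain-text XML by a reverse-engineering tool, and the watchlist `SUSPICIOUS`, the threshold, and the function names are hypothetical rather than taken from any of the cited studies.

```python
import xml.etree.ElementTree as ET

# Namespace prefix that ElementTree attaches to `android:` attributes.
ANDROID_NS = "{http://schemas.android.com/apk/res/android}"

# Hypothetical watchlist; a real detector would learn such features from data.
SUSPICIOUS = {
    "android.permission.SEND_SMS",
    "android.permission.READ_CONTACTS",
    "android.permission.READ_PHONE_STATE",
}


def declared_permissions(manifest_xml: str) -> set:
    """Collect the permissions requested in a decoded AndroidManifest.xml."""
    root = ET.fromstring(manifest_xml)
    return {el.get(ANDROID_NS + "name") for el in root.iter("uses-permission")}


def flag(manifest_xml: str, threshold: int = 2):
    """Flag the app when it requests at least `threshold` watchlisted permissions."""
    hits = declared_permissions(manifest_xml) & SUSPICIOUS
    return len(hits) >= threshold, sorted(hits)


# Hypothetical manifest; the app is never executed.
MANIFEST = """<manifest xmlns:android="http://schemas.android.com/apk/res/android"
    package="com.example.app">
  <uses-permission android:name="android.permission.INTERNET"/>
  <uses-permission android:name="android.permission.SEND_SMS"/>
  <uses-permission android:name="android.permission.READ_CONTACTS"/>
</manifest>"""

flagged, hits = flag(MANIFEST)
print(flagged, hits)
```

The check only reads the declared permissions, which is why it is fast; it would also miss behaviour that an obfuscated app hides from its manifest and code.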

Dynamic analysis observes the behaviour of malicious files during the execution of an application (Akour et al. 2017). Unlike static analysis, dynamic analysis is able to detect unknown malware, new malware, and even obfuscation techniques (Kuntz et al. 2017; Kim et al. 2019). An application that static analysis detects as malicious will then be re-analysed by dynamic analysis. This technique is more accurate and it reduces costs. Studies such as Lanet et al. (2018) and Feizollah et al. (2017) used dynamic analysis. One limitation of dynamic analysis is that it is unable to identify malicious applications such as IMEI stealers (Singhal et al. 2019). Table 12 compares static and dynamic analyses.

Table 12 The comparison between static and dynamic analyses

Detection approach

Malware detection approaches can be divided into three types: signature, anomaly, and hybrid (Razak et al. 2016). The signature approach detects malware by matching observed patterns against the signatures of normal and abnormal patterns stored in a database (Seo et al. 2014a, b). In comparison, the anomaly approach recognises malicious behaviour by monitoring events in the network traffic and the system (Suárez-Tangil et al. 2018). It has the advantage of detecting new and unfamiliar malware by observing behaviour. The signature approach, by contrast, is unable to detect unfamiliar and new malware that does not match a signature in the database; thus, its database needs to be updated frequently in order to enable the detection of various malware. The comparison between the signature and anomaly approaches is presented in Table 13.
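The contrast between the two approaches can be sketched in a few lines. This is a simplified illustration under stated assumptions: signatures are modelled as SHA-256 hashes of whole files, anomalies as z-scores of a single behavioural metric, and all names, sample data, and thresholds are hypothetical.

```python
import hashlib
import statistics


def sha256_hex(data: bytes) -> str:
    """Hash a file's bytes; the digest plays the role of the signature."""
    return hashlib.sha256(data).hexdigest()


def signature_match(data: bytes, signature_db: set) -> bool:
    """Signature approach: detect only what is already in the database."""
    return sha256_hex(data) in signature_db


def is_anomalous(value: float, baseline: list, z_threshold: float = 3.0) -> bool:
    """Anomaly approach: flag behaviour that deviates strongly from a baseline."""
    mean = statistics.mean(baseline)
    stdev = statistics.stdev(baseline)
    if stdev == 0:
        return value != mean
    return abs(value - mean) / stdev > z_threshold


# Hypothetical data: one known sample and a baseline of SMS messages per hour.
known_bad = b"known malicious payload"
signature_db = {sha256_hex(known_bad)}
baseline_sms_per_hour = [0, 1, 0, 2, 1, 0, 1]

print(signature_match(known_bad, signature_db))       # known sample
print(signature_match(b"new variant", signature_db))  # unseen sample
print(is_anomalous(40, baseline_sms_per_hour))        # unusual behaviour
```

An unseen variant is missed by the hash lookup until the database is updated, whereas the behavioural check can still react to it, at the cost of depending on how well the baseline reflects normal use.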

Table 13 The comparison between signature and anomaly approaches

Another approach is the hybrid approach which is the combination of the anomaly and signature approaches. The combination helps to enable the detection of new malware whenever the signature is unable to perform the detection. This approach overcomes the deficiency of both the anomaly and signature approaches. The studies by Seo et al. (2014a, b) and Yu et al. (2013) had used the anomaly approach to detect malware. Table 14 demonstrates the studies of the signature approach, Table 15 highlights studies of the anomaly approach and Table 16 presents the studies of the hybrid approach.
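The way a hybrid detector chains the two stages can be sketched as below. The two stage functions are hypothetical stand-ins supplied by the caller; the sample fields, the hash value, and the threshold are invented for illustration and do not come from the studies listed in the tables.

```python
from typing import Callable, Dict


def hybrid_verdict(sample: Dict,
                   signature_check: Callable[[Dict], bool],
                   anomaly_check: Callable[[Dict], bool]) -> str:
    """Try the signature stage first and fall back to the anomaly stage."""
    if signature_check(sample):
        return "malware (known signature)"
    if anomaly_check(sample):
        return "suspicious (anomalous behaviour)"
    return "no detection"


# Hypothetical stand-ins for the two stages.
KNOWN_HASHES = {"abc123"}


def by_signature(sample: Dict) -> bool:
    return sample["hash"] in KNOWN_HASHES


def by_anomaly(sample: Dict) -> bool:
    return sample["sms_per_hour"] > 20  # arbitrary illustrative threshold


print(hybrid_verdict({"hash": "abc123", "sms_per_hour": 0}, by_signature, by_anomaly))
print(hybrid_verdict({"hash": "zzz999", "sms_per_hour": 50}, by_signature, by_anomaly))
print(hybrid_verdict({"hash": "zzz999", "sms_per_hour": 1}, by_signature, by_anomaly))
```

Running the cheaper signature lookup first is one possible ordering; a design could equally run both stages and combine their scores.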

Table 14 Signature approach
Table 15 Anomaly approach
Table 16 Hybrid approach

Deployment approach

The deployment approach determines where malware detection runs in the intrusion detection system (IDS). An IDS is a security tool used for recognising intrusions, just like the firewall (Feizollah et al. 2013). The IDS, whether hardware, software, or a combination of both, is used for monitoring activities and for detecting malware signals in the network or system. Anomaly detection and signature-based detection are two types of IDSs (Daimi 2017). The malware intrusion detection system is deployed in a host-based, network-based, or hybrid-based system. In the host-based system (HIDS), an activity is monitored, analysed, and processed on the host itself, whilst in the network-based system (NIDS) the detection is run by a remote server (Mas’ud et al. 2014a, b). Meanwhile, the hybrid-based detection system combines the HIDS and the NIDS. The aim of the combination is to increase the capabilities of the existing IDS (Potteti and Parati 2015). The deployment approach used by previous studies is presented in Table 17.

Table 17 Deployment and detection approach studies

Mobile malware

The popularity of the mobile device has spurred the emergence of malware. Most malware targets Android for spreading malicious code because it is the most commonly used operating system on mobile devices. As mentioned before, malware targets mobile activities by stealing user-sensitive data, such as by encrypting users’ banking data, eliminating crucial data, and altering and monitoring users’ activities without the users’ knowledge (Qamar et al. 2019; Arabo and Pranggono 2013). Malware is able to interrupt the operation of the devices by consuming their resources, such as the storage, processor, and network (Shrivastava and Kumar 2019b; Cyber Secur. Parallel Distrib. Comput. 2019). Malware authors are creative, spreading the malware by infecting devices and networks insidiously. To better understand malware threats, this section reviews studies of mobile malware extracted from the WoS database, published from 2010 to 2019. Table 18 lists the various types of malware and their characteristics, which are highly harmful to mobile devices. These diverse types of malware threaten the devices in different ways in order to damage the system in the mobile devices.

Table 18 Malware and the characteristics

The table indicates how each malware type can attack mobile devices through varying methods. The infected and damaged mobile devices would then be infiltrated with fake emails, unnecessary software updates, fake websites, and counterfeit applications. The malware goes unnoticed because it operates silently; thus, devices would be infected without the user’s knowledge. Users would only detect its presence when the devices are fully damaged or in critical condition. Future studies should attempt to describe detection using multiple methods so as to reduce such incidents on mobile devices. Figure 6 presents the mapping of malware types and their behaviours.

Fig. 6 The mapping of malicious malware types and their behaviours

Risk analysis

Risk analysis is a process used to identify the loss, the threat, and the level of risk occurring (Alali et al. 2018). The level of risk is measured based on the impact of the mobile attack. As mobile device functions grow drastically to compete with the new designs emerging among developers in the market place, mobile users face higher risks (Naga Malleswari et al. 2017). Risk analysis is thus carried out through procedures such as categorising the risk, its triggers, and its effects, re-evaluating the possibility of the risk, and finding the factors that mitigate the risk (Sharma and Gupta 2018b; B 2018). There are three levels of risk, namely low, medium, and high (Shrivastava and Kumar 2019c). Likewise, there are three main elements of security, namely confidentiality, availability, and integrity (B 2018). Table 19 illustrates the risk levels of risk analysis.

Table 19 Risk level

The risk level reflects the unacceptable effect, or impact, of uncertain events. The risk levels are evaluated based on the factors of impact and likelihood. Nevertheless, a weakness of this method is that it does not incorporate the capabilities of the threat when determining the risk level. Moreover, the threat depends on the vulnerabilities of the system. Even so, risk analysis helps the user to manage the risk factor for a specific event.
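Evaluating a risk level from impact and likelihood can be sketched as a small risk matrix. The numeric scale, the multiplication rule, and the cut-off scores below are assumptions chosen for illustration; they are not taken from Table 19 or from the cited studies.

```python
# Ordinal scale for both factors (assumed for illustration).
LEVELS = {"low": 1, "medium": 2, "high": 3}


def risk_level(impact: str, likelihood: str) -> str:
    """Map an (impact, likelihood) pair to low, medium, or high risk."""
    score = LEVELS[impact] * LEVELS[likelihood]  # possible scores: 1, 2, 3, 4, 6, 9
    if score >= 6:
        return "high"
    if score >= 3:
        return "medium"
    return "low"


for impact in LEVELS:
    for likelihood in LEVELS:
        print(f"impact={impact:6} likelihood={likelihood:6} -> {risk_level(impact, likelihood)}")
```

Because the score uses only impact and likelihood, the sketch shares the weakness noted above: the capability of the threat itself does not enter the calculation.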

Threats

Risk analysis deals with the consequences of threats to mobile devices. Mobile threats are divided into four (4) classes, namely application threats, web threats, network threats, and physical threats (Lookout 2019). Table 20 presents the details of each threat.

Table 20 The threat and descriptions

Evaluation measure

The evaluation measures commonly used by researchers in malware IDS concern the effectiveness of the system. The evaluation focuses on accuracy, true positive rate (TPR), false positive rate (FPR), true negative rate (TNR), false negative rate (FNR), f-measure, and recall. A true positive (TP) is malware that is correctly identified as malware; the higher the number of true positives, the better the outcome. A false negative (FN) is malware that is erroneously classified as benign. A true negative (TN) is a benign application correctly classified as benign, while a false positive (FP) is a benign application erroneously classified as malware (Kamesh and Sakthi Priya 2012).
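These measures follow directly from the four counts. The sketch below uses the standard textbook definitions; precision is not named in the text but is needed for the f-measure, recall is the same quantity as TPR, and the function name and example counts are hypothetical.

```python
def safe_div(numerator: float, denominator: float) -> float:
    """Return 0.0 instead of failing when a class has no samples."""
    return numerator / denominator if denominator else 0.0


def evaluation_measures(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Compute the usual detection measures from confusion-matrix counts."""
    tpr = safe_div(tp, tp + fn)        # true positive rate, identical to recall
    fpr = safe_div(fp, fp + tn)        # false positive rate
    tnr = safe_div(tn, tn + fp)        # true negative rate
    fnr = safe_div(fn, fn + tp)        # false negative rate
    precision = safe_div(tp, tp + fp)
    accuracy = safe_div(tp + tn, tp + fp + tn + fn)
    f_measure = safe_div(2 * precision * tpr, precision + tpr)
    return {
        "accuracy": accuracy, "TPR": tpr, "FPR": fpr, "TNR": tnr,
        "FNR": fnr, "precision": precision, "recall": tpr,
        "f_measure": f_measure,
    }


# Hypothetical test set: 100 malware samples and 100 benign samples.
for name, value in evaluation_measures(tp=90, fp=5, tn=95, fn=10).items():
    print(f"{name}: {value:.3f}")
```

Note that TPR and FNR always sum to one, as do TNR and FPR, so reporting one of each pair is sufficient.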

Challenges and future direction

This section discusses the challenges and the directions for future research related to mobile malware. A number of studies have emphasised the malware issue, which poses a threat to mobile devices and remains a challenge for many researchers working on malware detection. Although numerous methods have been noted in advanced studies, and various systems have been proposed for detecting malware automatically, the number of malicious files, websites, and malware continues to grow (Akour et al. 2017). Thus, more needs to be done in this research field.

Accuracy

The accuracy of malware detection is measured using TP, FP, TN, and FN. A result is called true if the detection is accurate and matches reality. Perfect detection is achieved when TPR = 100%, TNR = 100%, FPR = 0%, and FNR = 0%. In practice, it is impossible to achieve 100% accuracy on these measures (Akour et al. 2017). However, with a larger amount of data, the analysis may come close to accurate positive or negative measurements. A false positive or false negative is likewise known as a false alarm: it incorrectly identifies a legitimate programme as malicious, or a malicious programme as legitimate. This is a big challenge in the IDS. It frustrates users and developers when a programme they have created is blocked, and it can affect the reputation of their business, since no one will run a programme once it is flagged as malicious. Another effect of a false alarm is that the device is put at risk when a malicious programme that was judged legitimate runs on the user’s device. This scenario is a significant problem in current technology. A study by Wang et al. (2018) used a hybrid approach to analyse malware data, and the results showed a lower rate of false alarms.

Features

Features are the first element to be selected prior to analysing and detecting malware. Selecting the best features allows the detector to become more efficient (Aung and Zaw 2013), whereas inappropriate features may cause a high false alarm rate (Razak et al. 2016). The number of features can be reduced so as to reach a higher level of accuracy. Feature selection is the first and crucial step in the machine learning method (Feizollah et al. 2015). Selecting appropriate features can thus lead to higher accuracy, thereby reducing false alarms. Conversely, accumulating a massive number of inappropriate features for machine learning classification may cause drawbacks such as misleading the learning algorithm, increasing the model’s running time, and lowering its generality (Mas’ud et al. 2014a, b). An enormous feature set also increases space usage and management complexity, and is therefore unsuitable for mobile devices with limited storage and restricted power consumption. Selecting appropriate features during the pre-processing of data enables machine learning classifiers to make more efficient detections. Thus, reducing the features is necessary in order to preserve the accuracy.
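One common way to rank features before classification is by information gain. The sketch below is an assumption-laden illustration rather than the procedure of any cited study: the binary feature table, the feature names, and the function names are hypothetical.

```python
import math
from collections import Counter


def entropy(labels: list) -> float:
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())


def information_gain(feature_column: list, labels: list) -> float:
    """Reduction in label entropy obtained by splitting on one feature."""
    total = len(labels)
    remainder = 0.0
    for value in set(feature_column):
        subset = [lab for f, lab in zip(feature_column, labels) if f == value]
        remainder += len(subset) / total * entropy(subset)
    return entropy(labels) - remainder


def top_k_features(rows: list, labels: list, names: list, k: int) -> list:
    """Return the k feature names with the highest information gain."""
    gains = {
        name: information_gain([row[i] for row in rows], labels)
        for i, name in enumerate(names)
    }
    return sorted(names, key=lambda n: gains[n], reverse=True)[:k]


# Hypothetical data: each row marks which permissions an app requests.
names = ["SEND_SMS", "INTERNET", "READ_CONTACTS"]
rows = [[1, 1, 1], [1, 1, 0], [0, 1, 0], [0, 1, 1]]
labels = ["malware", "malware", "benign", "benign"]

print(top_k_features(rows, labels, names, k=1))
```

In this toy table the first feature separates the classes perfectly, while a feature that every app shares carries no information and could be dropped to save storage and running time.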

Dataset

The occurrence of Android malware attacking users has increased rapidly in recent years. Android malware applies sophisticated techniques such as metamorphism, polymorphism, oligomorphism, obfuscation, and modification to avoid detection. The detection mechanisms provided by mobile devices are unable to operate efficiently due to restricted datasets and a lack of understanding of malicious activities. To evaluate a proposed detection system, a dataset is required, and a limited set of malware samples can make the detection system unreliable. The study by Razak et al. (2016) discovered that more than 100,000 malware modifications belonged to 777 families. Studies by Zhou and Jiang (2012) and Arzt et al. (2014) used malware samples from the Virus Share project and benign samples from Google Play. They noted that the dataset of malware has been proliferating. Based on this, a restriction mechanism is needed. Moreover, outdated datasets have become inappropriate for analysis; research therefore requires the latest datasets so as to improve detection performance in terms of accuracy and to lower the rate of false alarms.

Risk assessment

Risk assessment is a fundamental method used for explaining the possibility of risk levels. It is a crucial part that shields the user against dangerous applications; it grants mobile users a possibility of reducing the threat impact to a tolerable level. The process for risk assessment is carried out so as to measure the impact of the threat based on the value of the assets, threats, vulnerabilities, and the effects resulting from the attack. The acknowledged risk from threats and weaknesses must be ranked depending on the criticality of the issue.

Conducting a risk assessment is challenging owing to limited awareness of its effect on risk decision making. A study by Naga Malleswari et al. (2017) helped to improve users’ awareness by presenting the privacy risk to users before they grant permission. Similarly, a study by Alali et al. (2018) proposed a Fuzzy Inference System (FIS) that considers four (4) factors of risk: threat, vulnerability, impact, and likelihood. These were used for classifying the risk impact and for providing the response to mitigate the risk. Razak et al. (2019) also presented the risk factor based on zoning approaches.

Android malware on the Internet of Things (IoT)

The IoT is the modernised technology of communication among things and objects (Wu et al. 2019). The IoT integrates widely with mobile devices, providing various services around the world. Mobile devices supervise and control the provided services over long distances with keyless mechanisms. For example, Macmanus (2012) offered a location in Audi’s new business car. The volume of data produced every day by different IoT devices has grown from terabytes to petabytes (Garg et al. 2020). IoT services provide more convenient experiences, such as remote monitoring to lessen the energy wasted by home equipment like air conditioners, televisions, and refrigerators. With the sharp growth of technology, more and more IoT services are controlled by Android mobile applications.

Behind the sophisticated technology of the IoT, the security of IoT services has worsened, especially with regard to malware attacks, and IoT threats have increased over the past few years. Attackers are able to slip into users’ mobile devices and gain control of the IoT. This benefits the attacker, who can acquire and integrate information such as personal data, contact numbers, location, and Internet banking payment data from mobile devices. The open-source nature of Android applications has become one of the factors behind the increase of malware in IoT services. Studies showed that eight new malware families that emerged in 2015 had mostly originated from China and the United States (Johnson 2016).

To overcome malware attacks, some protection has been introduced. However, protecting the IoT system remains a hard problem due to the difficulty of developing an effective detection system and of avoiding the leakage of information. The study by Park et al. (2019) proposed three levels of awareness to be introduced into the IoT system: define the threat, measure the risk, and optimise the risk. A study by Wu et al. (2019) was able to detect malware by using a Bayesian network grounded on traffic feature analysis; the result showed a higher accuracy with fewer substantial features. Another study (Ham et al. 2014) used the linear support vector machine (SVM) to detect malware so as to secure a reliable IoT service, while Garg et al. (2020) used Density-Based Spatial Clustering of Applications with Noise (DBSCAN) for the same purpose.

Mobile banking

The widespread use of mobile phones and the global growth of networks have encouraged people to move their business to online systems (Sharma and Gupta 2016). This exposure has expanded the user base of mobile banking. Despite its advantages, mobile banking also invites the proliferation of malware. The emergence of banking malware necessitates more attention, as it is one of the most dangerous threats to the mobile user; for example, it generates malicious code with the intention of stealing personal financial information from banking activities and transferring funds to the hacker’s accounts. Mobile banking malware previously spread through third-party apps and has recently infiltrated Google Play widely (Mobliciti 2020). A new version of mobile banking malware impersonates a legitimate cryptocurrency wallet found on Google Play in order to steal money from the secure wallet (Seals 2020; Whittaker 2019). The malware will grow in sophistication as cryptocurrency trading becomes widespread.

Fake applications for COVID-19 pandemic

COVID-19 has threatened the world since 2019, and in this period malware has been growing fast. Ransomware grew by 72% and mobile vulnerabilities by 50% (Security 2020). This increase in malware arose because most of the world’s population was in lockdown, a situation that kept malware developers busy seeking profit. The situation has given attackers an opportunity to apply their skills to creating applications aimed at users. Starting with fake applications for controlling the dissemination of the coronavirus, malicious apps were also created to give recommendations on how to avoid infection from the biological virus. Users would show unlimited interest in any application related to COVID-19 in order to stay healthy (Moran 2020). Banking is a susceptible sector in this pandemic, since users tend to shop online during lockdown. Banking trojans and information stealers were found to be rampant as unemployment increased (Ljubas 2020). Thus, this sector has accounted for the greatest amount of malware activity during the COVID-19 pandemic.

Conclusion

The popularity of computers and mobile devices has led to the emergence of new malware. According to TMS (2011), malware had increased by 54% in 2017 as compared to previous years. A total of 24,000 malicious files are detected each day. An estimate by Spring (2019) noted that one out of five computers would be attacked by at least one malware in 2019. The Internet is one of the factors that frequently spreads malware to users’ devices. To alleviate malware problems and to improve safety in mobile devices, several approaches have been introduced by various studies.

The current study used the bibliometric technique to analyse the Android malware trends from 2010 until 2019. Some findings were noted and highlighted, for instance, productivity, research area, authors, highly cited articles, institutions, and impact journals. These criteria were able to highlight the research trends related to Android malware production. The number of Android malware production had increased at an average rate of 2.1% on a yearly basis. The report by Dobran (2019) stated that ransomware attacked new organisations every 14 s for the year 2019, and for the year 2021, it would be every 11 s.

The bibliometric analysis of Android malware from 2010 until 2019 showed that Asia was the highest contributor of research publications among the continents, followed by Europe and North America. The Middle East, Australia, Africa, and South America contributed less. The highest number of Android malware articles came from China, with 25% of the publications, followed by the United States, India, Italy, and South Korea. Asia outperformed Europe by a difference of 17.6% of publications.

In addition, this study has also highlighted the top 20 authors who were most active in the area of research. The top author was Francesco Mercaldo, followed by Fabio Martinelli, Mauro Conti, and Corrado Aaron Visaggio. These top four authors were from Italy, in Europe. The top two authors and the fourth top author were from the University of Sannio, while the third top author was from the University of Padua. The subsequent authors were from Luxembourg, Malaysia, China, and India.

This study has presented a bibliometric analysis of the publications in the field of Android malware. The analysis provides an objective and quantitative measure of the influence that a publication has on its respective specialty. The information presented in this study is important for researchers to build the network of research in their field of study. It is hoped that the information will encourage more future research as a measure to overcome the rapid proliferation of malware. Finally, this study delivers a general depiction of the subject matter and aims to show the importance of expanding the field of Android malware investigation.