1 Introduction

1.1 Background

The growing use of mobile applications has significantly changed how people communicate and access information such as personal data, emails, bills, and bank account details. The increasing number of smartphone users raises security concerns, including the growth of mobile malware on both iOS and Android platforms [1]. Malware can harm users and damage data and devices, especially through online banking and social media exploitation. A report by Kaspersky recorded 676,190 malicious installation packages in the third quarter (Q3) of 2021, a decrease of 209,915 from the previous quarter and of 445,128 from Q3 2020 [2]. For the first quarter (Q1) of 2021, a McAfee report stated that there were 2.34 million mobile malware cases, of which 389 were on iOS [3]. The McAfee report also stated that the number of new mobile malware incidents rose by 71% (1.35 million) in Q1 2020, while new iOS malware grew by over 50% (3,249), which is one of the motivations for this paper [4]. Furthermore, in 2019, high-risk vulnerabilities were discovered in 38% of iOS mobile applications, compared with 43% of Android mobile applications [5]. In 2017, 40% of iOS malware attacks targeted financial services [6]. This raises the question of how iOS malware classification can be developed to mitigate security exploitation by attackers.

In this paper, social media exploitation refers to an act that causes victims to lose productivity and in which devices can be monitored and remotely controlled by attackers. Online banking exploitation, in contrast, refers to an act that causes victims financial loss, wherein their devices are exploited for banking information such as user IDs and passwords.

Work by [7] reported different iOS zero-day malware in 2019, such as watering hole spyware. This type of attack is able to access confidential information such as iMessage, photos, and global positioning system (GPS) locations. FinSpy was able to steal detailed personal information including SMS/MMS messages, telephone call records, emails, contacts, videos, files, and GPS locations [8], whereas Exodus could exfiltrate information including contacts, audio recordings, images, videos, GPS location, and device information [9]. In March 2021, Facebook shut down a hacking operation by Evil Eye that used the social media platform to spread Insomnia malware, a malicious program, to track Uyghur Muslims in China’s Xinjiang region [10].

In June 2020, the FBI released an official statement claiming that cybercriminals were taking advantage of the high usage of mobile banking applications. Since 2020, there has been a 50% rise in mobile banking. The FBI also warned people to be vigilant when installing applications for smartphones and tablets since some of them may be harmful. Banking trojans, which are harmful applications that masquerade as other applications such as games or utilities, are being used by cybercriminals to steal banking information [11]. This malware can steal confidential information, yet currently, there have been few studies examining the detection of iOS malware. The studies of [12,13,14,15,16] discussed iOS exploitation implications in general. Based on the cases above, the question arises as to what features are involved in developing an iOS malware detection model for social media and online banking exploitation.

This paper bridges this research gap by developing a new iOS mobile malware detection model. It consists of 30 new correlation patterns between malware behaviour and iOS architecture. Malware behaviour involves infection, activation, payload, operating algorithm, and propagation, whereas iOS architecture focuses on Cocoa Touch, media layer, core services, and core operating system (OS). The primary concern for iOS architecture is possible exploitation in its layers.

Many bio-inspired algorithms have been applied in different areas of cybersecurity especially in malware detection, such as Genetic Algorithm (GA), Fuzzy Logic (FL), Negative Selection Algorithm (NSA), and Danger Theory (DT) by [17,18,19,20]. In comparison with a bio-inspired algorithm, the phylogenetic method will be able to detect the root origins of malware evolution. It concerns the history of its evolution and is connected to a tree diagram with various organisms and taxonomic groups [21]. The phylogenetic concept underpins the development of this paper’s mobile malware classification and was used as the basis for producing a new iOS mobile malware detection model. The strength of this concept is its capability to identify the root of its ancestry, and a prediction can be made for any evolution of iOS malware. As a result, possible social media and online banking exploitation can be detected via this proposed model.

The main challenges for this paper are the rise of malware exploitation on social media and online banking, and a lack of analysis and mitigation solutions for iOS platforms, including jailbreak and iOS architecture exploitation. Few studies have been conducted in relation to iOS in comparison to Android, whose studies include [22,23,24,25,26,27,28]. The techniques used in these previous studies were based on audio signal processing, artificial neural networks, feature selection, app similarity graphs, feature weighting, a combination of API and permission, and formal methods. However, these techniques were only applied to the Android platform, and few studies were conducted on iOS exploitation. In addition, the most common method of iOS exploitation is jailbreaking, as discussed in [29,30,31,32]. A jailbreak is used to eliminate App Store restrictions, achieve full authority, add features, view the iOS device log, and bypass GSM providers and network consumption restrictions. Therefore, this paper has developed a model to detect malware on the iOS platform.

Previous studies, such as [12,13,14,15,16], only focused on static or dynamic analysis. Work by [12] used the static approach, which entails a grayscale histogram and reverse engineering of the application code to obtain the function from the source code, while [13] introduced five conditions the application must meet to determine a potentially exploited channel. By combining dynamic (low-level debugger and debug server) and static (reverse analysis tool IDA) analysis, the authors traced the data object’s origins back to the first triggered method for intercepting communication data. Work by [14] proposed a system that provides DNS and URL filtering using several blacklists on mobile browsers on iOS and Android, but it relied on manual evaluation to decide on the blacklist. Work by [15] introduced a threat model to test and analyse applications on Android and iOS using a man-in-the-middle (MitM) attack, but it focused only on mobile banking applications. The authors used static and dynamic analysis for the testing process.

Work by [16] focused on static analysis only, using feature vector extraction to count the number of occurrences of a specific group of opcodes and a machine-learning algorithm to detect iOS mobile malware. Work by [33] suggested using Image Similarity-based Statistical Parameters (ISSP) and a visualization technique in which the disassembled malware code is turned into grayscale images. A Faster Region Proposals Convolution Neural Network (F-RCNN) classifier is trained on a vector made up of grayscale images with statistical characteristics. The proposed technique has an overall average classification accuracy of 98.12%. To identify and categorize Android malware, work by [34] provided a hybrid technique that extracts various features using static and dynamic malware analysis. These researchers also produced two datasets, one for detecting Android malware (dataset 1) and one for classifying its families (dataset 2). According to the experimental results, hybrid classification outperforms classification on static or dynamic data alone in terms of detection and classification. Static and dynamic analyses each focus on only certain features and can therefore miss a significant part of the detection. Therefore, this paper has implemented hybrid analysis with phylogenetics as the underlying concept for iOS malware detection to increase the accuracy rate.
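The opcode-count feature vector described for [16] can be illustrated with a minimal sketch. The opcode groups, the function name `opcode_feature_vector`, and the sample disassembly lines below are assumptions made for illustration; they are not the groups or data used in [16].

```python
from collections import Counter

# Hypothetical opcode groups (ARM64-style mnemonics), chosen for illustration only.
OPCODE_GROUPS = {
    "branch": {"b", "bl", "br", "blr", "ret"},
    "load_store": {"ldr", "str", "ldp", "stp"},
    "arithmetic": {"add", "sub", "mul"},
}


def opcode_feature_vector(disassembly_lines):
    """Count how many instructions fall into each opcode group.

    Returns the counts ordered by group name, so the vector layout is stable.
    """
    counts = Counter()
    for line in disassembly_lines:
        tokens = line.split()
        if not tokens:
            continue  # skip blank lines
        mnemonic = tokens[0].lower()
        for group, members in OPCODE_GROUPS.items():
            if mnemonic in members:
                counts[group] += 1
    return [counts[group] for group in sorted(OPCODE_GROUPS)]


# Made-up disassembly fragment, not taken from a real sample.
sample = ["ldr x0, [sp]", "add x0, x0, #1", "str x0, [sp]", "bl _objc_msgSend", "ret"]
vector = opcode_feature_vector(sample)  # order: arithmetic, branch, load_store
```

A vector of this kind would then be passed to a machine-learning classifier; the choice of classifier is outside this sketch.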

In comparison, other works, such as [35,36,37,38,39,40,41], have used process mining, a fuzzy clustering algorithm, a persistent phylogeny tree model, a discrete-time Markov chain (DTMC), a Bayesian network algorithm, and an extension of the graphical lasso. These techniques were applied only on the Android and Windows platforms. Thus, in this paper, the phylogenetic concept has been implemented to predict the future evolution of iOS malware by developing a classification used as input for the iOS mobile malware detection model. Based on the previous studies mentioned above, this paper has overcome the existing challenges by focusing on the iOS platform, using hybrid analysis and the phylogenetic concept.

1.2 Contributions

The contributions of this paper can be summarized as follows.

  • It presents insight into works related to social media exploitation, online banking exploitation, iOS, and phylogenetics.

  • It provides an iOS malware classification, which is developed using a combination of functions extracted from malware. The classification is then mapped to the phylogenetic concept. It consists of 30 new correlation patterns between malware behaviour and iOS architecture. Of the 30 malware classifications, 22 can be used against social media and online banking exploitation.

  • It also provides an iOS malware model detecting social media and online banking exploitation. During the evaluation, this model successfully detected seven out of 150 mobile applications with possible exploitation vulnerabilities related to social media and online banking.

1.3 Organization

The remainder of this paper is organized as follows. Section 2 discusses the relevant works used in this paper. This consists of social media exploitation, online banking exploitation, iOS architecture, iOS mobile malware exploitation, iOS mobile malware detection, phylogenetics, malware behaviour, iOS version, surveillance features, and accuracy. Section 3 presents the methodology, while Section 4 consists of the findings and malware detection modelling. Section 5 presents the evaluation of the proposed model, while Section 6 presents the discussion, limitations, and improvements. Section 7 presents the conclusion.

2 Related works

This section discusses the literature on social media exploitation, online banking exploitation, and iOS, focusing on the platform’s architecture and mobile malware exploitation and detection. Then, phylogenetic theory and its approach towards iOS are discussed. The section also discusses malware behaviour, the iOS version, surveillance features, and accuracy.

2.1 Social media exploitation

In March 2021, Facebook closed down a hacking operation that used the social media site to spread Insomnia malware, a malicious program used to monitor Uyghur Muslims in China’s Xinjiang region. A group of hackers, collectively known as Evil Eye and linked to Chinese government entities, distributed iOS and Android malware on various websites to monitor activists, journalists, and protestors. The spyware known as Insomnia works on any web browser running any version of iOS 10 and 11, as well as iOS 12.0, 12.1, 12.3, 12.3.1, and iOS 12.3.2, according to research published by the security firm Volexity. Once installed, this spyware gains access to users’ contacts, location, message data, and information from third-party applications [10].

Work by [42] discussed the common cybercrimes, mitigation techniques, and protection related to social media. They provided recommendations and techniques to prevent cybercrime regarding research studies. They declared that common types of cybercrime are spamming, hacking, malware, DoS attacks, phishing and social engineering, online identity theft, and cyberstalking. Work by [43] presented a detailed review of the most recent and relevant research papers on social media security and privacy, as well as the types of threats and attacks that affect users, released between 2018 and 2020. This work contributed to a firmer grasp of important factors in social media influencing security and privacy.

Work by [44] detailed several situations involving online social network threats and their remedies, including using various models, frameworks, and encryption approaches to protect social network users from multiple attacks. They presented various solutions and conducted a comparative analysis of different studies to understand the survey better. Work by [45] describes how cyberattacks are criminal attempts to infiltrate a user’s or an organization’s network to steal confidential or personal data for personal benefit and how social media users and organizations are exposed to cybercrime. The result suggests preventative actions that can slow or minimize the number of cybercrimes perpetrated against individuals and businesses. Work by [27] developed mobile malware classifications based on API and permission-based call logs, audio, and GPS that can be used for social media malware detection. This classification is beneficial to application development. The test results show that 16% of the mobile applications were classified as vulnerable to call log exploitation, 13% to audio exploitation, and 9% to GPS exploitation.

In conclusion, the works above discussed social media in terms of security and privacy but did not detail the OS of the social media application, either iOS or Android, except for [27], which focused on the Android environment.

2.2 Online banking exploitation

In June 2020, the FBI released an official statement declaring that cybercriminals are taking advantage of the high usage of mobile banking applications. Since the beginning of 2020, there has been a 50% rise in mobile banking. The FBI also warned people to be vigilant when installing applications for smartphones and tablets since some of them may be harmful. Banking trojans, which are harmful applications masquerading as other applications, such as games or utilities, are being used by cybercriminals to steal banking information. They also build fraudulent applications that imitate major financial institutions’ official applications to trick users into entering their login information [11].

Work by [46] proposed an Intrusion Response System (IRS) based on a network’s graphical network security model and a game-theoretic model of cyberattacks. The infection techniques, displayed behaviour, and network communication patterns of two popular banking trojans from the previous decade, Zeus (along with its companion ZitMo) and Emotet, were presented to provide additional insight into the future of banking trojans and malware in general. The study focused on Windows OS and Android. Work by [47] provided a summary of several studies published between 2009 and 2020 in 26 developed and developing countries from more than 24 different sources, with an average sample size of 460 users, to comprehend the adoption of mobile banking from the view of consumers. In addition, there have been a few other research publications on mobile banking, but there has been no quantitative survey or qualitative study, including in-depth interviews with professionals or users. This study did not include details on the OS of mobile banking, either iOS or Android.

Furthermore, work by [48] presented a model for mobile banking application selection that was proposed based on a combined fuzzy best-worst method (fuzzy-BWM) and fuzzy technique for order of preference by similarity to the ideal solution. The model can help financial institutions and customers overcome the difficulties of selecting an effective mobile banking application. Their approach has limitations in that expert decisions during pair-wise comparisons may be prejudiced; the existing BWM might be expanded to a new context to overcome the suggested model, and it does not entail details of the OS of mobile banking, either iOS or Android.

Work by [49] discussed several significant aspects of mobile banking in terms of threats, security requirements, and security solutions regarding limitations and improvements. This work did not include details of the OS of mobile banking, as being either iOS or Android. Work by [28] proposed applying a formal methods-based approach to detect banking malware in the Android environment. It reached precision and recall equal to 1 when evaluated with real-world Android applications. This work focused on the Android environment only.

In conclusion, several works are related to online banking but lack focus on the iOS environment.

2.3 iOS architecture

On 6 March 2008, Apple launched the first beta and a new operating system name: iPhone OS. Apple rebranded iPhone OS as ‘iOS’ in June 2010. The basic architecture for iOS is divided into four layers, which are Cocoa Touch, media, core services, and core OS [50]. Every layer has its own functions. The lower layers, such as the core services and core OS, manage basic services in iOS, while the upper layers, such as the media and Cocoa Touch, handle the user interface and advanced graphics [51]. Figure 1 presents the general components of the iOS architecture.

Fig. 1 General Components Architecture of iOS

As shown in Fig. 1, the hardware layer is composed of the physical chips fused with the iOS circuit. The core OS layer is the lowest software layer and interacts with the hardware directly. It provides the operating system on which the other layers sit. This layer handles memory management (allocation and de-allocation, once the application has finished), file management, and network management. The core services layer serves as the foundation on which the upper layers are built. It contains several features, including data protection, SQLite database, file-sharing support, iCloud storage, XML support, and in-app purchases. The media layer manages graphics, audio, and video capabilities. It is formed by three frameworks: the graphics framework, the audio framework, and the video framework. These frameworks help access photos and videos stored on the device and manipulate images using filters and 2D sketching. The Cocoa Touch layer defines the look and feel of iOS applications by providing key frameworks. This layer oversees multi-tasking, touch-based input, push notifications, and many high-level system services [1]. The main concern related to iOS architecture is possible exploitation in the layers. This exploitation is also related to social media and online banking. Hence, these exploitations are the main focus of this paper and the reason for introducing the malware classification and malware detection for the iOS platform.

2.4 iOS mobile malware exploitation

There are many previous studies related to mobile malware exploitation for Android platforms. More researchers have focused on Android due to its open source policy, and there is a lack of work on a security solution for iOS platforms. Currently, most research is focused on attacks on jailbroken devices. Table 1 shows previous studies associated with iOS mobile malware exploitation.

Table 1 Previous Studies Associated with iOS Mobile Malware Exploitation

Based on Table 1, it can be concluded that most attacks focused on jailbroken devices. Jailbreaking is a technique for penetrating the iOS operating system of Apple products. Users may use this approach to achieve full authority, view the iOS device log, and bypass restrictions of GSM providers and network consumption [53]. However, some attacks exploit non-jailbroken devices, where the attackers utilize iOS and private API vulnerabilities to attain control functions. When a jailbreak was used, the iPhone’s surveillance features were indirectly affected. This paper has focused on the surveillance features, which are SMS, call log, GPS, audio, and camera, and related them to iOS mobile malware exploitation. From this, a mobile malware classification has been developed.

2.5 iOS mobile malware detection

There has been a demand for mitigation solutions for iOS malware detection for the past few years. Table 2 shows previous studies on iOS mobile malware detection.

Table 2 Previous Studies Associated with iOS Mobile Malware Detection

Based on Table 2, this paper has overcome the challenges of existing iOS mobile malware detection by using hybrid analysis and phylogenetic concepts to predict the future evolution of iOS malware.

2.6 Phylogenetic

Phylogenetics is beneficial in identifying the sources of evolving malware genes. It deals with the history of evolution, depicted in a tree diagram of various organisms and taxonomic groups [21]. Several types of phylogenetic tree models exist, including a minimum spanning tree, a persistent phylogenetic tree, and a dendrogram [38]. Figure 2 is an example of a general phylogenetic diagram.

Fig. 2 Example of General Phylogenetic Diagram

In Fig. 2, the root represents the ancestral population from which all other species descend. A node represents a branching point of an ancestral population. The taxa (for example, taxa A, B, C, D, or E) are the terminal populations located at the tip of each branch.
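The roles of the root, nodes, and taxa can be made concrete with a minimal tree sketch. The class `Node`, the helper functions, and the tree shape below are assumptions for illustration; the topology is not necessarily that of Fig. 2.

```python
class Node:
    """A population in a phylogenetic tree; a node without children is a taxon."""

    def __init__(self, name, children=None):
        self.name = name
        self.children = children or []
        self.parent = None
        for child in self.children:
            child.parent = self

    def is_taxon(self):
        return not self.children


def taxa(node):
    """Collect the names of all terminal populations below (or at) this node."""
    if node.is_taxon():
        return [node.name]
    names = []
    for child in node.children:
        names.extend(taxa(child))
    return names


def ancestry(node):
    """Walk from a node up to the root, returning the names along the path."""
    path = []
    while node is not None:
        path.append(node.name)
        node = node.parent
    return path


# Hypothetical topology with five taxa (A to E) and three branching points.
a, b, c, d, e = (Node(name) for name in "ABCDE")
node1 = Node("node1", [a, b])
node2 = Node("node2", [d, e])
node3 = Node("node3", [c, node2])
root = Node("root", [node1, node3])
```

Tracing `ancestry(d)` walks from taxon D through its branching points back to the root, which is the sense in which a phylogenetic tree exposes the origin of a lineage.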

Several works [17,18,19,20] have used the bio-inspired concept to produce cybersecurity solutions. Table 3 shows a summary of the comparison of the different bio-inspired algorithms.

Table 3 Comparison Summarization of The Different Bio-Inspired Algorithms

Work by [35] proposed adopting a process mining method to identify the family to which a malicious Android app belongs. The authors used the process mining technique to automate the process and obtained accuracy ranging between 0.882 and 0.987 for the family recognition task on a dataset of 12,604 Android applications. Another study that used process mining is that of [36]. The authors suggested a complex malware detection and monitoring approach based on process mining techniques. This extracted a declarative model called SEF from malware traces representing a fingerprint of their dynamic behaviour. The method was tested for malware identification on a dataset of over 1,200 compromised applications in ten malware families.

Work by [37] used the process mining technique to detect a malware phylogeny model by analysing the system call traces recorded during application execution. It was combined with a fuzzy clustering algorithm to determine the malware samples’ derivation degree, where 4,000 infected applications across 39 malware families were analysed. The results showed that the method could cluster by comparing families with similar behaviour during malicious work. Work by [39] described a formalism inspired by phylogenetic methods to determine the similarity between trace data provided by malware families. In comparing the discrete-time Markov chain (DTMC) representation from the traces, the Kullback-Leibler Divergence (KLD) rate measures the proximity of an unknown malware trace to several malware families. The author showed the links between traces in the form of a network. Weights on the network edges measured the correlation between the nodes.
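The idea of comparing DTMC representations of traces with a KLD rate can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure of [39]: it uses one common formulation of the KLD rate between two Markov chains, $\sum_i \pi_i \sum_j P_{ij} \log (P_{ij}/Q_{ij})$, where $\pi$ is the stationary distribution of $P$. The event names, the traces, the additive smoothing constant, and all function names are hypothetical.

```python
import math


def transition_matrix(trace, states, smoothing=0.1):
    """Estimate a DTMC transition matrix from one trace.

    Additive smoothing (an assumption of this sketch) keeps every
    probability positive so that the logarithm below is always defined.
    """
    index = {state: i for i, state in enumerate(states)}
    n = len(states)
    counts = [[smoothing] * n for _ in range(n)]
    for current, following in zip(trace, trace[1:]):
        counts[index[current]][index[following]] += 1
    return [[c / sum(row) for c in row] for row in counts]


def stationary(P, iterations=1000):
    """Approximate the stationary distribution by repeated multiplication."""
    n = len(P)
    pi = [1.0 / n] * n
    for _ in range(iterations):
        pi = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
    return pi


def kld_rate(P, Q):
    """KLD rate of chain P relative to chain Q (lower means more similar)."""
    pi = stationary(P)
    n = len(P)
    return sum(
        pi[i] * P[i][j] * math.log(P[i][j] / Q[i][j])
        for i in range(n)
        for j in range(n)
    )


# Made-up event traces; real traces would come from dynamic analysis.
STATES = ["open", "read", "send"]
unknown = ["open", "read", "send", "open", "read", "send"]
family_a = ["open", "read", "send", "open", "read", "send", "open"]
family_b = ["open", "send", "send", "read", "open", "open"]

p_unknown = transition_matrix(unknown, STATES)
rate_a = kld_rate(p_unknown, transition_matrix(family_a, STATES))
rate_b = kld_rate(p_unknown, transition_matrix(family_b, STATES))
closest = "family_a" if rate_a < rate_b else "family_b"
```

In this toy data the unknown trace follows the same open, read, send cycle as the first family, so its KLD rate to that family is the smaller of the two.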

Work by [40] stated that malware phylogenetics can help researchers evaluate the nature of a new piece of malware code more quickly. The authors examined the use of expert knowledge of how the malware was developed within a family. As a novel step, the authors supplied this information as prior knowledge to a Bayesian network learning algorithm. The benefit of Bayesian network learning is the combination of expert knowledge and statistical data. Work by [41] introduced an extension of a graphical lasso, which finds a weighted combination of static and dynamic views to produce a phylogenetic graph for a family of programs. The findings showed that the authors could locate phylogenetic charts effectively and that combining several views to optimize the equation increases efficiency dramatically relative to any single view and many baselines, such as minimum spans.

Fig. 2 earlier presented the general phylogenetic diagram, whereas Fig. 3 shows the mapping of that diagram to the iOS malware classification. In Fig. 3, the taxa of a malware classification, consisting of malware behaviour, iOS architecture, and surveillance features, are illustrated. This malware classification is then used to identify possible malware exploitation. A further explanation of how the phylogenetic concept is mapped to the iOS malware classification is given in Table 4.

Fig. 3 Mapping phylogenetic diagram to iOS malware classification

Table 4 Mapping phylogenetic to mobile malware classifications in iOS

Then, based on the developed malware classification for iOS, further investigation is carried out to identify the related previous studies with phylogenetic for malware detection. As a result, Table 5 shows a summary of existing phylogenetic works.

Table 5 Existing works on phylogenetic

Based on Table 5, it can be concluded that phylogenetic can be used as a detection solution in predicting future iOS mobile malware. Therefore, this paper has developed a new iOS mobile malware detection classification inspired by phylogenetic. Three (3) features are mapped to phylogenetic: malware behaviour, iOS version, and surveillance features. The details of these elements are explained in the following section.

2.7 Malware behaviour

Malware behaviour comprises five (5) main elements: infection, activation, payload, operating algorithm, and propagation. These are essential for dynamic analysis to identify malware based on its behaviour [55]. For the enhancement of this paper, this worm classification was integrated with the phylogenetic concept as depicted as input α, as mentioned above.

Infection is the act of malware infiltrating a computer, system, or files [56]. Infection can be accomplished in several ways via the host or network. In the host method, the malware stays in its initial position, typically replicates itself, and waits for the user to transfer it without permission. Activation is the process by which malicious activity begins. There are four (4) basic types of malware activation: no activation, human trigger, scheduled process, and self-activation. In the no activation scenario, the malware remains dormant and does nothing other than take up storage space on the hard drive. The slowest means of activation is the human trigger, which must wait for the user to perform some activity that triggers the malicious action. For a scheduled process, the malware creator sets up the malware to be triggered on a particular date and time; as mentioned by [57], a malware creator could also program malicious behaviour to activate only after a particular amount of time has passed since the application was executed. The quickest method of activation is self-activation, where the malware executes immediately.

As defined by [58], payload is the action whereby the vulnerability is conducted separately from the main behaviour. The payload can vary from stealing credential information to removing hard disk contents. Several types of payloads can harm devices. Some payloads allow the malware to access the embedded backdoor for the attacker’s purpose. A backdoor is a process wherein an attacker bypasses the normal authorization and is given unauthorized access [59].

The operating algorithm is considered to be a malware evasion method. There are several types of malware mechanisms, such as polymorphic, stealth, terminate and stay resident, and anti-antivirus. Polymorphic malware is a type of malware that modifies its identifiable features regularly to avoid detection [60]. The most prominent mechanism is stealth, wherein the malware slowly spreads, not producing any abnormal contact pattern, making identification difficult. As malware variants grow exponentially year-on-year, there is an immediate need to counter stealth malware techniques [61]. Terminate and stay resident is a software program that remains in memory until needed and executes a dedicated function [62]. Anti-antivirus is a technique used to aggressively attack software, making antivirus analysis difficult for researchers, or preventing malware detection by antivirus software.

Malware may attempt to replicate and distribute itself to a specific host or network. As mentioned by [63], malware propagation is the act of malware spreading across a network from one host to another. Propagation can be performed in several ways, for example, random scanning, sequential scanning, or passive scanning. Random scanning is the most popular method among this type of malware. The malware is linked to a particular IP address, and it attempts to scan the IP and commence a connection to propagate the malware.

It can be concluded that all malware exhibits certain behaviour during the exploitation of users’ devices, such as infection, activation, payload, operating algorithm, or propagation. The behaviour acts according to the malware’s functionality to steal information or be linked to particular IP addresses. In this paper, some malware exhibits similar behaviour: phishing for payload, host for infection, stealth for the operating algorithm, self-activation for activation, and passive monitoring for propagation.

2.8 iOS version

Apple supports its customers by providing them with the latest iOS edition. The version must be updated to ensure the user is adequately protected against existing security problems, such as bug fixes, patch vulnerabilities, and issues with the latest features [64]. Beginning with iOS 7.1.2, Apple upgraded its 32-bit processor to a 64-bit processor to improve its applications’ performance and graphics. Table 6 shows the iOS version and processor used.

Table 6 iOS Version and Processor Used

Malware writers use vulnerabilities in iOS versions to exploit devices, affecting victims’ productivity, personal information, and finances. This includes the NSO Group’s Pegasus spyware, which has successfully exploited “zero-day” vulnerabilities affecting the media player; zero-day vulnerabilities are unidentified bugs or flaws in the mobile phone’s operating system that the manufacturer has not fixed. This spyware hacks and monitors iPhones used by potential targets. It is delivered as a zero-click attack and impacts a range of iPhone and iPad models, such as iPhone 5s, iPhone 6, iPhone 6 Plus, iPad Air, iPad mini version, and iPod touch [65]. To overcome the exploitation of these zero-day vulnerabilities, Apple released iOS 12.5.5, which includes a patch for a CoreGraphics vulnerability that allows maliciously created PDFs to execute arbitrary code on a target device. From 2007 to 2021, there were 2,443 reported vulnerabilities, consisting of denial of service (DoS), code execution, overflow, and memory corruption, as well as others. In 2021 there were 232 vulnerabilities, including 14 DoS, 111 code executions, 22 overflows, 23 memory corruptions, five cross-site scripting, three directory traversals, seven bypasses, six information gains, and seven privilege gains [66]. From the information above, it can be concluded that vulnerabilities in the iOS version can be exploited by attackers and cause damage to users.

2.9 Surveillance features

The smartphone has five (5) main surveillance features: SMS, call log, GPS, audio, and camera. Some applications that exploit SMS covertly charge users by sending premium text messages. They can also block SMS messages from device operators so that users do not receive fee notifications [67]. Work by [68] tested 50 anonymous mobile applications from the Google Play store, of which 36% matched the proposed classification, composed of 16 new SMS application programming interface (API) classifications that detect SMS exploitation. The call log is one of a phone’s main functions and is a common target of malware attacks. The attackers typically use the call log to gain useful contact list information and call history. According to study results by [60], 7% of the mobile applications evaluated were linked to possible call log exploitation, with mobile applications from the communication, entertainment, and game categories obtaining the highest scores.

The global positioning system (GPS) is a satellite navigation system used to determine the ground location of an object or unit. As stated in the work of [69], GPS is one of the surveillance features of a mobile phone that attackers frequently target. In that work, 10% of the mobile applications evaluated were considered to be high-risk and vulnerable to attack. Most attackers use smartphone malware to exploit GPS, both to obtain information such as satellite data and to track a victim’s movements [70]. Work by [71] indicated that audio and camera could also be exploited: new malware was found that allows attackers to steal data, record audio and video, and even infect the device with ransomware. In conclusion, malware writers commonly use surveillance features to exploit devices and collect users’ credentials. These features are mapped to the phylogenetic concept, together with malware behaviour and the iOS version.

2.10 Accuracy

Accuracy is one parameter for classification evaluation. Informally, accuracy is the percentage of the classification’s predictions that are correct. The following definition of accuracy is used to evaluate the patterns developed for detecting social media and online banking exploitation:

$$\text{Accuracy}=\frac{\text{Number of correct predictions}}{\text{Total number of predictions}}$$
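
The definition above can be sketched in a few lines of Python. This is a minimal illustration only: the function name `accuracy` and the example labels are assumptions made here and are not taken from the paper's evaluation data.

```python
from typing import Sequence


def accuracy(predicted: Sequence[str], actual: Sequence[str]) -> float:
    """Return the fraction of predictions that equal the true labels."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not actual:
        raise ValueError("at least one prediction is required")
    # Count positions where the predicted label equals the true label.
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)


# Hypothetical labels for five applications (illustration only).
predicted_labels = ["malicious", "benign", "benign", "malicious", "benign"]
actual_labels = ["malicious", "benign", "malicious", "malicious", "benign"]
example_accuracy = accuracy(predicted_labels, actual_labels)  # 4 of 5 match
```

Multiplying the returned fraction by 100 gives the percentage form used informally in the text.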

3 Methodology

Figure 4 shows the detailed process from the beginning to the end of this paper.

Fig. 4 Detail Research Process

This work started with data collection, which consisted of downloading the malware dataset from the Contagio dump website. The laboratory environment was set up by installing macOS Catalina as the operating system. The Hopper disassembler and the frida dump tool were run in the virtual machine. Then, the malware dataset was cleaned, and a hybrid analysis was conducted. For this analysis, several steps were necessary to decrypt and decipher the binary code using frida dump. This step could be skipped if the malware sample was already in .deb or .dylib format. Next, the parent process was identified and retrieved from each iOS application. Then, the malware files were monitored and documented, and the functions for each feature were identified. From the malware samples, classifications were created based on their malicious payload. The classifications were integrated into the phylogenetic concept for the classification detection process. Lastly, for evaluation, the classifications developed were compared with 150 applications from the AppStore and third-party platforms to assess whether the patterns apply to current iOS applications that contain malicious scripts related to social media and online banking exploitation.

Figure 5 shows the lab architecture used, while Table 7 shows the software and hardware used.

Fig. 5 Lab architecture

Table 7 Software and Hardware Used

The iPhone 5S shares a 64-bit architecture and processor with the latest iPhone models. This paper used the iPhone 5S because it was easier to jailbreak the device to acquire full access to the operating system’s root and all its features. For this paper, 50 malware samples from 12 malware families were collected from the Contagio website for training, while 150 applications were taken from the AppStore and third-party platforms for testing. Several works, such as those by [16, 72, 73], have used the Contagio dataset as training and evaluation samples. Based on previous studies and their implications for iOS exploitation, the 12 malware families were selected for their capacity to perform iOS exploitation. Choosing this malware captures the fundamental concept of iOS architecture exploitation, and integrating phylogenetics allows future malware evolution to be predicted.

Several variables were identified to formulate malware detection. They comprise three (3) main variables for malware phylogenetics: malware behaviour, iOS architecture, and surveillance features. The formula is as follows:

  • Let \(\alpha_i\) be malware behaviour i, where \(\alpha =\bigcap_{i=1}^{p}\alpha_i\); let \(\beta_j\) be iOS architecture j, where \(\beta =\bigcup_{i=1}^{m}\beta_i\); and let \(\gamma_k\) be surveillance feature k, where \(\gamma =\bigcap_{i=1}^{p}\gamma_i\).

  • Let M be the malware classification and T be the iOS devices. S is the detection model, and it can be defined as the following function:

$$f(\text{M},\text{T}) = \text{S},$$
(1)

where

$$M\left(\alpha ,\beta ,\gamma \right)=\alpha +\beta +\gamma$$
(2)
$$f\left(M_i,T_j\right)=S_{ij}$$
(1.1)
  • where M represents the malware classification, T represents the iOS devices, and S is the detection model.

$$M\left(\alpha ,\beta ,\gamma \right)=\alpha +\beta +\gamma$$
$$\alpha =\alpha_1\cap \alpha_2\cap \alpha_3\cap \alpha_4\cap \alpha_5$$
(2.1)
$$\beta =\beta_1\cup \beta_2\cup \beta_3\cup \beta_4\cup \beta_5$$
(2.2)
$$\gamma =\gamma_1\cup \gamma_2\cup \gamma_3\cup \gamma_4\cup \gamma_5$$
(2.3)
$$\begin{array}{ccc} M_i & \beta & \gamma \\ \vdots & \ddots & \vdots \\ M_n & \ldots & \gamma_n \end{array}$$
  • where \(\alpha_1-\alpha_5\): payload, infection, operating algorithm, activation, and propagation.

\(\beta_1-\beta_5\): iOS 10.x, iOS 11.x, iOS 12.x, iOS 13.x, iOS 14.x, iOS 15.x.

\(\gamma_1-\gamma_5\): SMS, call log, GPS, audio, and camera.
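
One possible reading of the formulation above is sketched below in Python. It is an interpretation rather than the authors' implementation. The sketch assumes that the intersection in (2.1) means all five behaviour components must agree, that the unions in (2.2) and (2.3) mean any listed iOS version and at least one shared surveillance feature are enough, and that the "+" in (2) combines the three conditions. The class names, the `detect` function, and both example applications are hypothetical.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Tuple

# Order of the five behaviour components alpha_1..alpha_5:
# payload, infection, operating algorithm, activation, propagation.


@dataclass(frozen=True)
class Classification:
    """M_i: one malware classification (assumed structure)."""
    name: str
    behaviour: Tuple[str, str, str, str, str]  # alpha_1..alpha_5
    ios_versions: FrozenSet[str]               # beta
    features: FrozenSet[str]                   # gamma


@dataclass(frozen=True)
class Target:
    """T_j: one observed application or device (assumed structure)."""
    name: str
    behaviour: Tuple[str, str, str, str, str]
    ios_version: str
    features: FrozenSet[str]


def detect(m: Classification, t: Target) -> bool:
    """S_ij = f(M_i, T_j) under the assumptions stated in the lead-in."""
    behaviour_ok = m.behaviour == t.behaviour   # all five must agree
    version_ok = t.ios_version in m.ios_versions  # any listed version
    feature_ok = bool(m.features & t.features)    # at least one shared feature
    return behaviour_ok and version_ok and feature_ok


def detection_matrix(ms: List[Classification], ts: List[Target]) -> List[List[bool]]:
    """Rows are classifications M_i, columns are targets T_j."""
    return [[detect(m, t) for t in ts] for m in ms]


ALL_FEATURES = frozenset({"SMS", "call log", "GPS", "audio", "camera"})
UNFLOD_BEHAVIOUR = ("phishing", "host", "stealth", "self-activation", "passive monitoring")

unflod = Classification("Unflod", UNFLOD_BEHAVIOUR, frozenset({"iOS 10.x"}), ALL_FEATURES)
# Two hypothetical targets: one consistent with the classification, one on another iOS version.
app_a = Target("app_a", UNFLOD_BEHAVIOUR, "iOS 10.x", frozenset({"SMS"}))
app_b = Target("app_b", UNFLOD_BEHAVIOUR, "iOS 13.x", frozenset({"SMS"}))

matrix = detection_matrix([unflod], [app_a, app_b])
```

Under these assumptions `matrix` holds one row for the Unflod classification, with a positive entry for `app_a` only, because `app_b` runs an iOS version outside the classification's chain.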

4 Findings

This section further analyses and classifies all the functions extracted from the reverse-engineered malware dataset. The extracted functions used for exploitation by the malware were then cross-checked and mapped to the iOS framework, as summarized in Table 8. This table is significant for identifying and classifying functions, since each function can be used either for legitimate operation or for possible exploitation. The iOS framework is a hierarchical directory structure that contains common resources like a dynamic shared library, nib files, picture files, localized strings, header files, and reference material. Multiple applications can use all of these resources simultaneously. Frameworks are similar to static and dynamic shared libraries in that they provide a library of routines that an application can access to do a specific task.

Table 8 Function And Framework Related to Exploitation

The frameworks involved were as follows: two (2) Security, five (5) UIKit, six (6) Foundation, one (1) AVFoundation, one (1) System Configuration, and one (1) MessageUI. For the iOS architecture, two (2) Cocoa Touch, one (1) Media, two (2) Core Services, and one (1) Core OS layer were involved. All details are summarized in Fig. 6.

Fig. 6 Summarisation of Malware Mapped to the framework and iOS Architecture

Patterns were created using the functions and frameworks obtained from the Contagio dataset. If the features are the same, these patterns can be used to recognize possible malicious behaviour in other malware samples. At this point, data from the static analysis were used to create the patterns: all the functions were extracted from the malware dataset, and combinations of functions were used to form patterns. The most commonly used functions are _Unwind_SjLj_Unregister, Objc_msgSend, and _Unwind_SjLj_Register, which occur 14, 12, and 11 times in the malware dataset, respectively. Table 9 shows the patterns constructed for iOS malware classification.
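
The counting step described above can be illustrated with a short Python sketch. The sample names and their function sets are hypothetical placeholders, not the Contagio data, and taking the functions shared by two samples as a candidate pattern is only one plausible way to combine functions.

```python
from collections import Counter
from typing import Dict, FrozenSet

# Hypothetical extraction results: sample name -> functions found by static analysis.
sample_functions: Dict[str, FrozenSet[str]] = {
    "sample_a": frozenset({"_Unwind_SjLj_Register", "_Unwind_SjLj_Unregister", "Objc_msgSend"}),
    "sample_b": frozenset({"_Unwind_SjLj_Unregister", "Objc_msgSend", "URLWithString"}),
    "sample_c": frozenset({"_Unwind_SjLj_Unregister", "CanOpenURL"}),
}


def count_occurrences(samples: Dict[str, FrozenSet[str]]) -> Counter:
    """Count in how many samples each function appears."""
    counts: Counter = Counter()
    for functions in samples.values():
        counts.update(functions)
    return counts


occurrences = count_occurrences(sample_functions)
most_common_function, most_common_count = occurrences.most_common(1)[0]

# Candidate pattern (assumption): the functions shared by two related samples.
shared_functions = sample_functions["sample_a"] & sample_functions["sample_b"]
```

In this toy data `_Unwind_SjLj_Unregister` appears in all three samples, and the candidate pattern for `sample_a` and `sample_b` contains the two functions they have in common.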

Table 9 Patterns For iOS Malware Classification

The next section further evaluates all the patterns that were developed. Table 10 shows the similarities of functions from the malware dataset based on their malware types.

Table 10 Similarities of Functions from The Malware Dataset

The similarities or roots in the malware dataset are depicted in Table 10. The exploit classification was then mapped onto the concept of phylogenetics to complete the iOS malware detection model. Based on the mapping results, the malware might lead to the possible exploitation of social media or online banking. For example, if surveillance features such as SMS or phone calls are being used, online banking may be exploited, because banking applications usually use SMS and phone calls for verification and transaction purposes. The attack constitutes social media exploitation if the attackers use all five surveillance features. Next, Fig. 7 shows the mapping of library function similarities across the malware dataset.
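
The rule of thumb stated above can be written as a small Python function. This is a sketch under assumptions: the feature labels, the function name `possible_exploitation`, and the treatment of "phone calls" as the call log feature are choices made for illustration.

```python
from typing import Iterable, Set

SURVEILLANCE_FEATURES = frozenset({"SMS", "call log", "GPS", "audio", "camera"})
# Features the text associates with verification and transactions.
BANKING_FEATURES = frozenset({"SMS", "call log"})


def possible_exploitation(features_used: Iterable[str]) -> Set[str]:
    """Map the surveillance features an application uses to possible exploitation types."""
    used = set(features_used) & SURVEILLANCE_FEATURES  # ignore unknown labels
    result: Set[str] = set()
    if used & BANKING_FEATURES:
        # SMS or calls in use: possible online banking exploitation.
        result.add("online banking")
    if used == SURVEILLANCE_FEATURES:
        # All five features in use: possible social media exploitation.
        result.add("social media")
    return result
```

For example, `possible_exploitation(["SMS"])` yields only online banking, while passing all five features yields both types, which matches the Unflod example discussed later.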

Fig. 7 Mapping of Library Function Similarities between Malware Dataset

Table 10 and Fig. 7 show that each malware type has its own functional similarities. As a result, these newly developed iOS malware classifications address the challenges related to iOS platform exploitation. Hence, this classification fulfils the first objective of this paper. The iOS malware model for detecting social media and online banking exploitation was then developed, as illustrated in Fig. 8.

Fig. 8 iOS Malware Model Methods in Detecting Social Media and Online Banking Exploitation

Figure 8 shows how the iOS model was developed, from the malware dataset through function extraction and function combination to integration with the phylogenetic concept. The classification resulted from combining malware functions based on their types. When the model matches a tested application, the application is flagged as a possible exploitation of social media and online banking. An example of mapping a malware classification to phylogenetics is depicted in Fig. 9, while the detailed classification description is given in Table 11.

Fig. 9 Example Of Mapping Malware Classification and Phylogenetic Using Unflod Malware

Figure 9 shows Unflod malware, which can exploit social media and online banking. The malware behaviour exhibited by Unflod is phishing, host, stealth, self-activation, and passive monitoring for payload, infection, operating algorithm, activation, and propagation, respectively. For the iOS chain, Unflod uses Chain 1 (iOS 10.x and below); for surveillance features, it uses GPS, SMS, call, audio, and camera.

Table 11 Malware Classification Mapped to Phylogenetic

Table 11 displays the phylogenetic mapping of the 22 of 30 malware classifications that can be involved in exploitation against social media and online banking. These show that exploitation could have occurred if malware behaviour, iOS architecture, and surveillance features matched the classification. If the attackers exploit SMS or phone calls, this is generally online banking exploitation, because banking applications commonly use SMS and phone calls for verification and transaction purposes. Attackers using all five (5) surveillance features are exploiting social media. The malware behaviour of each classification needs to be examined closely to identify what the malware does. In summary, E1 (Unflod malware), E4 + E5 + E6 + E7 (Inception malware), E8 + E9 (XCodeghost malware), E10 + E11 + E12 + E13 (Wirelurker malware), E14 (Zerghelper malware), E16 (Xsser malware), E22 + E23 + E24 + E25 + E26 + E27 + E28 (Yispecter malware), and E29 + E30 (Keyraider malware) are the classifications that involve possible exploitation of social media and online banking.
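
As a quick consistency check, the classification identifiers in the summary above can be tallied per family. The dictionary below only restates the identifiers listed in the text; the variable names are illustrative.

```python
from typing import Dict, List

# Classification identifiers per malware family, as listed in the summary.
linked_classifications: Dict[str, List[str]] = {
    "Unflod": ["E1"],
    "Inception": ["E4", "E5", "E6", "E7"],
    "XCodeghost": ["E8", "E9"],
    "Wirelurker": ["E10", "E11", "E12", "E13"],
    "Zerghelper": ["E14"],
    "Xsser": ["E16"],
    "Yispecter": ["E22", "E23", "E24", "E25", "E26", "E27", "E28"],
    "Keyraider": ["E29", "E30"],
}

TOTAL_CLASSIFICATIONS = 30

# Sum the identifiers over all families and relate them to the 30 patterns.
total_linked = sum(len(ids) for ids in linked_classifications.values())
linked_share = total_linked / TOTAL_CLASSIFICATIONS
```

The tally gives 22 linked classifications across eight families, in agreement with the "22 of 30" stated for Table 11.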

5 Evaluation

The dataset evaluation involved three (3) categories of iOS applications: social media and online banking applications; applications unrelated to social media and online banking (games); and, lastly, a combination of social media and online banking with other applications. Table 12 details the dataset categories that match the patterns developed.

Table 12 Categories of Dataset Evaluation

Based on the analysis, 30 patterns were created. These were used as input for iOS malware model detection. For evaluation purposes, 150 applications from the AppStore and third-party platforms were compared with the patterns developed to identify applications that possess malicious scripts related to social media and online banking exploitation.
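
The comparison step can be sketched as set matching in Python. This is not the authors' tooling. It assumes a pattern matches an application when every function in the pattern appears among the application's extracted functions. The pattern identifiers `P1` and `P2`, their contents, and the three applications are hypothetical.

```python
from typing import Dict, FrozenSet, List

# Hypothetical patterns: identifiers and contents are illustrative only.
patterns: Dict[str, FrozenSet[str]] = {
    "P1": frozenset({"dataWithJSONObject", "URLWithString"}),
    "P2": frozenset({"SharedApplication", "CanOpenURL"}),
}

# Hypothetical functions extracted from three evaluated applications.
apps: Dict[str, FrozenSet[str]] = {
    "app_a": frozenset({"Init", "MainBundle", "setDelegate"}),
    "app_b": frozenset({"Init", "SharedApplication", "CanOpenURL"}),
    "app_c": frozenset({"Init", "dataWithJSONObject", "URLWithString", "CanOpenURL"}),
}


def matching_patterns(app_functions: FrozenSet[str],
                      known_patterns: Dict[str, FrozenSet[str]]) -> List[str]:
    """Return the identifiers of patterns fully contained in the application's functions."""
    return sorted(pid for pid, required in known_patterns.items()
                  if required <= app_functions)


def flag_applications(all_apps: Dict[str, FrozenSet[str]],
                      known_patterns: Dict[str, FrozenSet[str]]) -> Dict[str, List[str]]:
    """Map each application that matches at least one pattern to its matched patterns."""
    flagged: Dict[str, List[str]] = {}
    for name, functions in all_apps.items():
        matched = matching_patterns(functions, known_patterns)
        if matched:
            flagged[name] = matched
    return flagged


flagged = flag_applications(apps, patterns)
match_rate = len(flagged) / len(apps)  # share of applications with a match
```

With this toy data, `app_b` matches `P2` and `app_c` matches `P1`, while `app_a` uses only common functions and is not flagged. A partial overlap, such as `CanOpenURL` alone in `app_c`, does not count as a match for `P2`.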

Fig. 10 Occurrence of Functions From 150 Evaluated Apps

Figure 10 shows the occurrence of functions from 150 evaluated applications. The functions most often used in the applications are FC127 (Init) with 146 occurrences, FC61 (MainBundle) with 145 occurrences, and FC14 (setDelegate) with 144 occurrences. It can be concluded that most of the applications evaluated have these three essential functions. An application would become malicious if several functions were exploited, such as:

  • ObjectForKeyedSubscript

  • URLWithString

  • StringWithFormat

  • MobileInstallationUninstall

  • DatabaseWithPath

  • NSMutableDictionary

  • dataWithJSONObject

  • SharedApplication

  • CanOpenURL

To conclude, approximately 4.7% of the evaluation dataset (seven out of 150 applications) matched the patterns developed. Table 13 shows the details of possible exploitation by the applications evaluated, mapped to phylogenetics.

Table 13 Possible exploitation evaluated application mapped to phylogenetic

This table shows that all seven applications have similarities with the patterns developed and exposes possible vulnerabilities based on the phylogenetic classification. Apart from the result obtained from the matched patterns, further analysis was conducted to verify the proposed detection model. A few points to consider when categorizing a mobile application as malicious or not are as follows.

False alarms could occur for any mobile application analysed in this paper. Based on the analysis, some features, specifically surveillance features, might not be used for exploitation. In this case, theoretically, the seven mobile applications should be using the surveillance features, based on each application’s name and the description given by its developer. The exploited functions that matched the evaluated applications are shown in Table 14.

Table 14 Functions Exploited Matched with Evaluated Applications.

It can be concluded that this model is able to identify features used for exploitation. For example, the pattern produced for this application matched with App 48, App 71, App 80, App 84, App 97, App 137, and App 144. The applications were under the media and health categories. Thus, all seven (7) applications have the possibility of exploiting social media and online banking.

6 Discussion

This paper developed an iOS malware classification focusing on social media and online banking. From the malware dataset, 30 patterns were created. Based on the analysis conducted, 22 of the 30 patterns have been linked to social media and online banking exploitation: E1 (Unflod malware), E4, E5, E6, and E7 (Inception malware), E8 and E9 (XCodeghost malware), E10, E11, E12, and E13 (Wirelurker malware), E14 (Zerghelper malware), E16 (Xsser malware), E22, E23, E24, E25, E26, E27, and E28 (Yispecter malware), and E29 and E30 (Keyraider malware). These classifications matched their functions, which involved possible command and control, monitoring devices, and exploiting banking information.

This paper also developed an iOS model detecting social media and online banking exploitation. The model was devised by integrating classifications and the phylogenetic concept, including malware behaviour, iOS version, and surveillance features. During the evaluation, this model successfully detected seven out of 150 mobile applications with possible exploitation vulnerabilities related to social media and online banking. It matched with App 48, App 71, App 80, App 84, App 97, App 137, and App 144. The classifications involved were E8 (Xcodeghost malware) and E24 (Yispecter malware). The applications that matched fall under the categories of media and health. All seven applications used five surveillance features: GPS, SMS, call log, audio, and camera. These applications have revealed that several functions can be exploited, such as dataWithJSONObject, SharedApplication, and CanOpenURL.

Based on the behaviour of malware, there are many possibilities for social media and online banking exploitation. Today, malware such as Pegasus spyware can activate the phone’s camera and microphone and record messages, texts, emails, and phone calls, including those sent via encrypted messaging and phone applications. All this information is sent back to the spyware’s clients. This spyware can perform any function that iPhone users can perform on their own smartphones. The Pegasus spyware involves the surveillance features of GPS, SMS, call log, audio, and camera. The malware can connect to a command-and-control (C2) server, through which the devices are remotely controlled. Thus, social media and online banking exploitation can occur.

During its evaluation, the model created in this paper flagged approximately 4.7% of the mobile applications as having a possible exploitation vulnerability: seven of the 150 applications taken from the AppStore and third-party platforms matched the patterns developed. This indicates that the model can detect possible security exploitation related to social media and online banking in iOS mobile applications.

6.1 Limitations and points of improvement

There are some limitations to this paper. The first is the limited availability of malware datasets within this paper’s scope, which focuses on social media and online banking exploitation. Currently, the available data cover more diverse areas, which requires more effort during analysis. As iOS versions continue to be updated and released, new frameworks and functions may be introduced. Collecting recent iOS malware samples is crucial for identifying new behaviour and malware actions using the phylogenetic concept. A larger, dedicated malware dataset would result in a more comprehensive and accurate classification that represents future malware evolution.

The second limitation is the method used to analyse decrypted applications. An increasing number of applications are being encrypted with malicious intent, so automated and efficient techniques would speed up the analysis of encrypted applications. This research used manual analysis to identify the functions behind each feature. Hence, there is a need to automate the identification of functions for each feature in the applications.

7 Conclusion

Based on the findings identified in this paper, it can be concluded that the misuse of existing applications’ functions and frameworks might lead to the exploitation of online banking and social media. An analysis of the pattern of malicious applications can be used to counteract this issue. Malicious applications require combinations of frameworks and functions to exploit the intended features successfully.