
1 Introduction

Android has become the mobile operating system with the largest market share, and its rapid growth has brought new network security issues [1, 2], such as criminals exploiting vulnerabilities in mobile applications for profit and leaking user privacy. Vulnerability testing of Android applications is therefore essential before they are released to users [3].

There is little research on vulnerability mining of office software on the Android platform, and the test cases used in existing work are relatively simple. To better address the threat of memory corruption vulnerabilities on Android, this paper designs and implements a Fuzzing-based vulnerability mining system for domestic office software on the Android platform. The system constructs special test cases for office software running on Android and observes the exceptions thrown and the process crashes to find possible vulnerabilities, helping to ensure the security of mobile offices.

The main contributions of this paper are as follows:

  1. Generate test cases with mutation-based, generation-based, and Char-RNN-based methods to ensure the coverage of test cases and to test applications from multiple angles.

  2. Analyze the operating mechanism of office software applications on the Android platform, and construct a set of effective fuzzing test schemes that run successfully under various versions of Android and are widely applicable.

  3. Design and implement an office software vulnerability mining system based on Fuzzing technology to find possible vulnerabilities [4]. The system is simple and easy to use, displays the process and results intuitively, and lowers the threshold of use. The system adopts a modular design in which each module runs independently, which facilitates subsequent debugging and upgrading of the vulnerability mining system [5].

The structure of this paper is as follows: Section 1 gives a brief introduction, Section 2 designs the overall framework and the modules of the system, Section 3 implements the system, Section 4 conducts experiments and evaluation, and Section 5 summarizes the work and proposes directions for improvement.

2 System Architecture Design

The system is divided into four modules: the visualization platform module, the test case generation module, the fuzzing module, and the automatic analysis module. The visualization platform module builds the graphical pages of the entire system. The test case generation module is responsible for constructing semi-valid test cases. The fuzzing module is responsible for the entire life cycle of a test case, from sending it to the device to running it. The automatic analysis module is responsible for analyzing the crash information and logs that appear during the test to discover the security vulnerabilities that exist. The module division is shown in Fig. 1.

Fig. 1. System module division.

3 Implementation of System Module

3.1 Test Case Generation Module

Mutation-based Method. Mutation-based test case generation requires samples to be obtained in advance, and the steps for generating PDF and HTML files are similar. Taking PDF generation as an example, a set of malicious PDF samples is collected from GitHub as input for the subsequent mutation operations. In the program, this is implemented by the generate_dumb_pdf_sample() method. By controlling the number of mutations, the input files are mutated to different degrees to ensure the coverage of the generated samples. The process is shown in Fig. 2.

Fig. 2. The specific process of the mutation-based Fuzzing method.

The main steps are as follows:

  (1) Use the choice() function of the random module to randomly select one sample from the preset sample library as the given valid input;

  (2) Obtain the length of the file, and use the randrange() function of the random module to randomly select a position “start” as the starting point for subsequent operations;

  (3) Determine the text length “len” to mutate, chosen arbitrarily on the premise that it does not exceed the maximum length of the file;

  (4) Perform mutation operations based on the values of “start” and “len”, such as inserting a random character, deleting a character, or flipping a character;

  (5) Write the content obtained after mutation into a new PDF file for subsequent fuzzing.
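The steps above can be sketched in Python as follows. This is a minimal illustration, not the system's generate_dumb_pdf_sample() implementation: the function names, the max_len cap, and the choice of a bit flip for the "flip" operation are assumptions made for the example.

```python
import random


def mutate(data, n_mutations=1, max_len=16, rng=None):
    """Apply n_mutations random insert/delete/flip operations to a byte string."""
    rng = rng or random.Random()
    buf = bytearray(data)
    for _ in range(n_mutations):
        if not buf:                                   # nothing left to mutate
            buf.append(rng.randrange(256))
            continue
        start = rng.randrange(len(buf))               # step (2): start position
        # step (3): mutation length, never running past the end of the file
        length = rng.randint(1, min(max_len, len(buf) - start))
        op = rng.choice(("insert", "delete", "flip"))  # step (4)
        if op == "insert":
            buf[start:start] = bytes(rng.randrange(256) for _ in range(length))
        elif op == "delete":
            del buf[start:start + length]
        else:
            for i in range(start, start + length):
                buf[i] ^= 1 << rng.randrange(8)        # flip one random bit
    return bytes(buf)


def generate_mutated_sample(corpus, out_path, n_mutations=1, rng=None):
    """Pick a seed from the corpus (step 1), mutate it, write it out (step 5)."""
    rng = rng or random.Random()
    seed = rng.choice(corpus)
    mutated = mutate(seed, n_mutations, rng=rng)
    with open(out_path, "wb") as fh:
        fh.write(mutated)
    return mutated
```

Raising n_mutations produces samples that drift further from the seed file, which corresponds to the different degrees of mutation described above. Passing a seeded random.Random makes a run reproducible, which helps when a crashing test case has to be regenerated.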

Generation-based Method. The system modifies the grammar rules of Domato, Google's open-source fuzzing tool, to generate PDF and HTML files for testing. To generate HTML, it is enough to call the gen_new_jscript_js() function in Domato. PDF test cases are generated with mPDF (a PHP library); the generation steps are as follows:

  (1) Call the header() method in mPDF to write the file header of the PDF, where “%PDF-1.1” is used.

  (2) Call the indirectobject() method in mPDF to write the object.

  (3) Call the gen_new_jscript_js() method to randomly select and generate a JavaScript script from the modified Domato grammar rule library and write it into the object.

  (4) Call the xrefAndTrailer() method in mPDF to write the cross-reference table and trailer of the PDF.
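To make the file layout produced by these four steps concrete, the following sketch writes a minimal PDF by hand. It does not use mPDF or Domato: build_pdf() and escape_pdf_string() are hypothetical helpers, the object layout (catalog, page tree, one page, one JavaScript action) is an assumption, and the JavaScript is passed in as a plain string instead of being produced by gen_new_jscript_js().

```python
def escape_pdf_string(text):
    """Escape backslashes and parentheses for a PDF literal string."""
    raw = text.encode("latin-1", "replace")
    return (raw.replace(b"\\", b"\\\\")
               .replace(b"(", b"\\(")
               .replace(b")", b"\\)"))


def build_pdf(js_source):
    """Return the bytes of a one-page PDF that runs js_source when opened."""
    objects = [
        b"<< /Type /Catalog /Pages 2 0 R /OpenAction 4 0 R >>",
        b"<< /Type /Pages /Kids [3 0 R] /Count 1 >>",
        b"<< /Type /Page /Parent 2 0 R /MediaBox [0 0 612 792] >>",
        b"<< /S /JavaScript /JS (" + escape_pdf_string(js_source) + b") >>",
    ]
    out = bytearray(b"%PDF-1.1\n")                 # step (1): file header
    offsets = []
    for number, body in enumerate(objects, start=1):
        offsets.append(len(out))                   # byte offset of this object
        # steps (2) and (3): indirect objects, the last one holds the script
        out += b"%d 0 obj\n" % number + body + b"\nendobj\n"
    xref_pos = len(out)
    size = len(objects) + 1                        # objects plus free entry 0
    out += b"xref\n0 %d\n" % size                  # step (4): cross-reference table
    out += b"0000000000 65535 f \n"
    for offset in offsets:
        out += b"%010d 00000 n \n" % offset        # each entry is 20 bytes long
    out += b"trailer\n<< /Size %d /Root 1 0 R >>\n" % size
    out += b"startxref\n%d\n%%%%EOF\n" % xref_pos
    return bytes(out)
```

For example, build_pdf("app.alert('fuzz');") returns a document whose embedded script can be replaced by any generated JavaScript. Recording the byte offset of each object while writing keeps the cross-reference table consistent, so the reader under test reaches the script instead of rejecting the file early.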

Char-RNN-based Method. The system uses Char-RNN to generate additional test cases so that the test cases are more comprehensive, and uses TensorFlow to build the Char-RNN framework quickly. The process is as follows:

  (1) Read and decode the sample set, and convert it to UTF-8 encoding. Vectorize the samples and establish the mapping between characters and numbers.

  (2) Divide the text into blocks of length x + 1. Each input sequence contains x characters of the text, and the corresponding target sequence is the same text shifted one character to the right. Shuffle the data and pack it into batches.

  (3) Use tf.keras.Sequential to define the model.

  (4) Add an optimizer and a loss function. Use the tf.keras.Model.compile method to configure the training procedure, with tf.keras.optimizers.Adam (default parameters) and the loss function.

  (5) Use tf.keras.callbacks.ModelCheckpoint to ensure that checkpoints are saved during training.
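The data preparation in steps (1) and (2) can be illustrated without TensorFlow. The sketch below is plain Python with hypothetical helper names (build_vocab, make_pairs, make_batches); the model definition, compilation, and checkpointing of steps (3) to (5) are not shown.

```python
import random


def build_vocab(text):
    """Map every distinct character to an integer index and back."""
    idx2char = sorted(set(text))
    char2idx = {ch: i for i, ch in enumerate(idx2char)}
    return char2idx, idx2char


def make_pairs(text, char2idx, seq_len):
    """Cut the text into blocks of seq_len + 1 and split each into (input, target)."""
    ids = [char2idx[ch] for ch in text]
    pairs = []
    for start in range(0, len(ids) - seq_len, seq_len + 1):
        chunk = ids[start:start + seq_len + 1]
        # input: first seq_len characters; target: same block shifted by one
        pairs.append((chunk[:-1], chunk[1:]))
    return pairs


def make_batches(pairs, batch_size, rng=None):
    """Shuffle the pairs and group them into full batches (remainder dropped)."""
    rng = rng or random.Random()
    shuffled = list(pairs)
    rng.shuffle(shuffled)
    return [shuffled[i:i + batch_size]
            for i in range(0, len(shuffled) - batch_size + 1, batch_size)]
```

With the text "hello world!" and seq_len = 3, the first block is "hell", which yields the input "hel" and the target "ell". The network is therefore trained to predict the next character at every position of the input sequence.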

3.2 Fuzzing Module

Fuzzing is the core of the entire vulnerability mining system. Before running the system, obtain the device id of the Android device: after installing adb under Windows, connect the Android device to the PC with a data cable, set the connection mode of the Android device to “USB MIDI”, and enter the “adb devices” command to get the device id of the currently connected device. WPS is taken as the test object for fuzzing. The test process is shown in Fig. 3.

Fig. 3. Fuzzing implementation process.

  (1) Call the adb_connection_int() method to initialize the connection: restart the adb server, connect to the Android device, and clear the background process identified by the WPS package name “cn.wps.moffice_eng” to minimize interference from other factors in the subsequent testing process.

  (2) Enter “http://192.168.189.1:1337/” in any browser to open the visualization page, select the fuzzing test method on this page, and click the “Start” button to start the test.

  (3) The back end receives the information from the front end and generates the corresponding PDF test case according to the fuzzing method selected by the user. Call the pdf_fuzz() method to start the fuzzing process: after unlocking the screen of the device, run the WPS application, open the test file, and collect the feedback produced by the application while it runs. Execute “adb shell am force-stop cn.wps.moffice_eng” to stop the application. Wait for a short time before the next fuzzing operation to prevent problems caused by running the device under load for a long time.
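One round of step (3) might be driven from Python roughly as follows. This is a sketch under stated assumptions, not the system's pdf_fuzz() method: fuzz_once(), the device folder REMOTE_DIR, the waiting times, and the use of a generic VIEW intent to open the file are all illustrative choices, and screen unlocking is left out. Only the force-stop command and the package name come from the text.

```python
import os
import subprocess
import time

PACKAGE = "cn.wps.moffice_eng"      # WPS package name given in the text
REMOTE_DIR = "/sdcard/Download"     # assumption: any writable folder on the device


def adb(device_id, *args):
    """Build an adb command line addressed to one device."""
    return ["adb", "-s", device_id, *args]


def fuzz_once(device_id, local_pdf, run=subprocess.run,
              sleep=time.sleep, open_wait=5.0, cooldown=2.0):
    """Push one test case, ask Android to open it, then stop WPS."""
    remote = REMOTE_DIR + "/" + os.path.basename(local_pdf)
    stop = adb(device_id, "shell", "am", "force-stop", PACKAGE)
    run(stop)                                        # start from a clean state
    run(adb(device_id, "push", local_pdf, remote))   # copy the test case over
    run(adb(device_id, "shell", "am", "start",
            "-a", "android.intent.action.VIEW",
            "-d", "file://" + remote,
            "-t", "application/pdf"))                # handler is chosen by the device
    sleep(open_wait)                                 # give the app time to parse
    run(stop)                                        # stop WPS after the test
    sleep(cooldown)                                  # rest before the next case
    return remote
```

The run and sleep parameters default to subprocess.run and time.sleep, so the function can be exercised without a device by passing stand-ins that only record the commands. Which application handles the VIEW intent depends on the defaults configured on the device, so WPS would need to be the default PDF viewer for this sketch to target it.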

3.3 Automatic Analysis Module

The automatic analysis process filters the log information collected during fuzzing. Because of human factors and uncontrollable factors such as the device, the server, and the operating environment, Fuzzing can produce false alarms; that is, the abnormal information thrown may come from ordinary bugs that cannot be called vulnerabilities. Therefore, an automatic analysis function is added to the system. The process is shown in Fig. 4.

Fig. 4. The implementation process of the automatic analysis module.

Use the “adb logcat -d” command to dump the log information: call “subprocess.Popen()” to run the command as a subprocess and read its output, which is the log information. Use a loop to check whether the log information contains any of the vulnerability-related key signals predefined in the settings file, as shown in Table 1. If it does, save this log information and the test case that caused it in the specified folder. Finally, use the command “adb logcat -c” to clear the old logs and start the next test.

Table 1. Linux abnormal signal comparison table.
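The filtering loop described above can be sketched as follows. The signal names in CRASH_SIGNALS are an assumption standing in for the entries of the settings file, and the function names are hypothetical; the adb commands are the ones named in the text.

```python
import subprocess

# Assumed signal names; the real list is read from the system's settings file.
CRASH_SIGNALS = ("SIGSEGV", "SIGABRT", "SIGBUS", "SIGFPE", "SIGILL")


def find_crash_lines(log_text, signals=CRASH_SIGNALS):
    """Return the log lines that mention any of the predefined signals."""
    return [line for line in log_text.splitlines()
            if any(sig in line for sig in signals)]


def dump_logcat(device_id):
    """Run 'adb logcat -d' as a subprocess and return its output as text."""
    proc = subprocess.Popen(["adb", "-s", device_id, "logcat", "-d"],
                            stdout=subprocess.PIPE, stderr=subprocess.DEVNULL)
    out, _ = proc.communicate()
    return out.decode("utf-8", errors="replace")


def clear_logcat(device_id):
    """Run 'adb logcat -c' so the next test starts with an empty log."""
    subprocess.run(["adb", "-s", device_id, "logcat", "-c"], check=False)
```

A test round would call dump_logcat(), pass the result to find_crash_lines(), save the log and the test case if the returned list is not empty, and then call clear_logcat(). Keeping the matching step separate from the adb calls allows the filter to be checked against saved log files without a device attached.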

4 Experiment and Evaluation

4.1 Experimental Environment

The equipment used in this system includes a PC and an Android device. The PC runs Windows 10 and has the IP address 192.168.189.1. The Android device runs Android 4. The mobile office applications tested are WPS Office and UC Browser; in addition, Adobe Reader and the Chrome browser are selected for comparison. The applications are downloaded from legitimate channels.

4.2 Experimental Results

Use the system to test different mobile office applications, and the results are shown in Table 2:

Table 2. Mobile office application test results.

4.3 Evaluation

Among the three test case generation methods, the mutation-based method requires the least computation and generates test cases fastest, while the Char-RNN-based method requires the most computation and is the slowest. In terms of test case effectiveness, the generation-based method is the best, the Char-RNN-based method is second, and the mutation-based method is the worst. The overall test speed for applications of the same type is similar. Compared with the PDF readers, the browsers are more likely to be attacked during DOM parsing [9].

Enter the crash folder to view the recorded log file, as shown in Fig. 5.

Fig. 5. View log files.

Checking the log files of all the vulnerabilities shows that they all contain the “SIGSEGV” keyword and all record a crash of the form “Fatal signal 11 (SIGSEGV) at 0x0000413d (code = -6), thread 16718 (CrRenderer Main)”, indicating that a null pointer problem triggers the vulnerability and then causes the application to crash. The backtrace in the log records the specific information at the time the application crashes, as shown in Fig. 6. The figure shows that the problem lies in the .so file (a shared library), that is, an overflow of the static data area of the application.

Fig. 6. View backtrace.

5 Conclusion

Office software on the Android platform currently carries security risks from vulnerabilities. In response to this problem, this paper designs and implements a Fuzzing-based vulnerability mining system for domestic office software, analyzes the vulnerabilities that may cause such software to crash, generates a large number of test cases, and mines vulnerabilities through fuzzing. The experimental results show the feasibility of the designed system, which can support developers in improving their applications and making them more robust.

The system designed in this paper has certain limitations. It can only detect specific vulnerabilities in specific types of applications, that is, memory vulnerabilities in mobile office software. It is not yet possible to conduct comprehensive vulnerability detection on all Android applications. More in-depth research is needed in the future.