1 Introduction

This case study involves a monitor application system developed for public services and administrative businesses. As an information service grows in scale, the increased cloud traffic creates more load for the systems. In this case study, the current implementation provides information services via a single server that hosts multiple application systems. System maintenance personnel do not notice occasional application system instability and abnormal abortions until after users experience difficulties and report the problem. This is caused by the lack of an effective automated monitoring and management mechanism for the application system services and server resources. Locating and analyzing the causes of system abnormalities is often time-consuming, and providing timely support is unlikely when human resources are limited. Therefore, it is a major task to enhance the service quality of the information systems. Quality of service is both a major independent variable and a dependent variable [1,2,3].

Researchers studying networks have proposed various methods to improve server performance, including dynamic load balancing for cloud servers, a client-side server selection algorithm, load-balanced front-end and back-end cloud servers, solutions based on resource type for cloud services, and a pre-fetching method to reduce cloud latency [4,5,6,7]. Performance monitoring and management of the application systems enables system maintenance personnel to enhance system operations via a better understanding of usage dynamics and better employment of software and hardware system resources. Therefore, to increase the stability and availability of the application systems, the following goals can be achieved by developing and implementing a performance monitoring and management mechanism [8,9,10,11]:

  1. Effective monitoring and management of the performance of critical operations of the application systems.

  2. Increased efficiency in problem diagnosis via an integrated analysis and management platform to centrally manage and analyze possible causes for abnormalities.

  3. Establishment of an early warning mechanism that allows preventive management of the situation to ensure uninterrupted operation and maintain the service quality of critical application systems.

  4. Understanding of resource utilization and trends, which provide the basis on which proper measures for improvement can be taken.

2 Literature review

2.1 Java

Java was first released by Sun Microsystems, Inc. as a programming language with a runtime environment that allows it to be executed on multiple hardware platforms. Over time, other companies refined the Java runtime environment. Java can be used to write either Java applets, which are embedded in a cloud page and are executed via a browser, or Java applications [8], which run independently. The Java architecture is divided into language syntax, the runtime environment and the application programming interface (API). The Android Gradle plugin (v3.0.0 and higher) supports all Java 7 language features and a subset of Java 8 language features that vary by platform version. When building an app using Android Gradle plugin 4.0.0 and higher, a number of Java 8 language APIs can be used without requiring a minimum API level for the app [12,13,14, 16].

This page describes the Java 8 language features that can be used, how to properly configure the project to use them, and any known issues that may be encountered. This paper provides a link to a video for an overview. Note that when developing apps for Android, using Java 8 language features is optional. The project's source and target compatibility values can be set to Java 7, but the app must still be compiled using JDK 8. Gradle provides built-in support for using certain Java 8 language features and the third-party libraries that use them. As shown in Fig. 1, the default toolchain implements the new language features by performing bytecode transformations (e.g., “desugar”) as part of the D8/R8 compilation of class files into dex code [17,18,19,20].

Fig. 1 J2EE 3-tier architecture

2.2 J2EE architecture

Java 2 Platform Enterprise Edition (J2EE) is a Java platform jointly specified by Sun Microsystems and IBM with support from many other companies. It is a standard that provides a multi-tiered, distributed application model and features reusable components, XML-based data exchange, a unified security model and flexible transaction control. It allows developers to deliver innovative customer solutions to market faster, and its platform-independent, component-based solutions are not tied to the products and APIs of any vendor. In other words, J2EE delivers high productivity, rapid development, high quality and easier maintenance in system development [5, 9, 21,22,23].

As shown in Fig. 1, the J2EE runtime architecture is a 3-tiered model defining the information exchange between the tiers: client, middle tier and the enterprise information system (EIS) tier. Using Tomcat as an example, the client tier is a cloud-based client such as a browser or mobile device. The client sends a request for a JavaServer Page (JSP) page to the middle tier. When the JSP executes in the middle tier, it either requests database information from the EIS tier or responds directly to the client’s request. The middle tier differs from the EIS tier in that the middle tier hosts business logic while the EIS tier is the data repository [10, 24,25,26].

2.3 Performance monitoring

The technologies used for performance monitoring can be categorized into two types: hardware-based and software-based (see Table 1) [6].

Table 1 Technologies for performance monitoring
Table 2 HTTP response status codes

Hardware-based monitoring relies on the support of hardware devices to measure hardware events. An example would be a register (i.e., performance counter) on the processor designed specifically to accumulate the number of cache misses [27].

Software-based monitoring, on the other hand, records and measures events generated during software execution and can be supported by either software or hardware. An example would be monitoring the number of occurrences of context switching in the operating system. Frequently applied monitoring technologies include timing, counting/sampling and tracking. Timing and counting technology records only the duration and frequency of a particular type of event, while tracking technology records the time of the occurrence and relevant data for each event, which in turn can be used to derive the duration and frequency of the event occurrence [28, 29,30,31].

The following paragraphs give detailed explanations regarding software failure and HTTP response status codes:

  • Software failure

When software failure is detected by the system administrator through monitoring, it is expected to have a serious impact on all application systems tasks. In terms of the whole software lifecycle, software failures that are discovered late cost more to fix. From a software engineering perspective, high quality has to be maintained at each stage of the software development process: from software design, coding and testing to installation and maintenance. Software failures fall under three categories: system failure, system error and system fault [11].

  • HTTP response status codes

In a cloud-based application environment, the user relies on a browser to create a dynamic cloud page containing hyperlinks according to requested connections. Thus, the performance monitoring must consider the HTTP response status of the application systems. The HTTP response status codes are described in Table 2 [15, 32,33,34].
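As a rough illustration of how a monitor might interpret the status codes summarized in Table 2, the following Java sketch groups a status code into its standard class by its first digit. The class name `HttpStatusClass` and its methods are hypothetical, and treating every 4xx and 5xx response as abnormal is an assumption, not a rule stated in this study.

```java
/** Hypothetical helper (not the authors' code): maps an HTTP status code to its standard class. */
final class HttpStatusClass {
    private HttpStatusClass() {}

    /** Returns the standard class name for a status code, based on its first digit. */
    static String classify(int statusCode) {
        switch (statusCode / 100) {
            case 1: return "Informational";
            case 2: return "Success";
            case 3: return "Redirection";
            case 4: return "Client error";
            case 5: return "Server error";
            default: return "Unknown";
        }
    }

    /** Assumption: a monitor treats client-error and server-error responses as abnormal. */
    static boolean isAbnormal(int statusCode) {
        return statusCode >= 400 && statusCode <= 599;
    }
}
```

A monitor could log the class alongside the raw code so that, for example, a burst of server errors is distinguishable from a single missing page.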

3 Monitor application systems architecture and abnormalities detection methods

3.1 Monitor application systems architecture

The organization in our case study provides information services around the clock, so any system that supports it must be active, timely, secure, highly efficient and fully automatic. The studied organization has subordinate units that are dispersed around Taiwan and provides application systems services to a variety of users (firms, people and other business-related organizations) who may be scattered across long distances. For the purpose of easy deployment, maintenance and upgrading of the application systems, the studied organization has specified that the system to be developed must be cloud-based with a 3-tiered, component-based architecture. Figure 2 shows the proposed J2EE-based 3-tiered application systems development environment architecture [35,36,37]. The system construction shown in Fig. 2 uses a 3-tiered application systems monitor architecture. The first tier is presentation, e.g., a notebook, desktop, handset or personal digital assistant (PDA). The second tier is business logic and comprises the application server and an operating system such as Windows or Unix. The third tier is the data tier, e.g., a database that uses Oracle or MS SQL. The presentation tier uses HTTP to connect to the application server, and the application server uses JDBC to access data from the Oracle database [38,39,40].

Fig. 2 3-tiered application systems monitor architecture

The organization in this case study currently runs its information applications on UNIX and MS Windows NT Server, with major application systems and cloud servers running on top of the UNIX server. Cloud services are provided by compiled Java programs running on the Tomcat application server.

To ensure easy deployment, maintenance and upgrading of the application systems or cloud services, as well as to follow the cloud-based, 3-tiered, component-based development architecture, the case under study requires the use of the following development tools on UNIX operating systems: Oracle 11i for database management, and a Java-based integrated development environment (IDE) supporting EJB components development and supporting the J2EE development platform [41].

3.2 Abnormality checking methods and problem analysis

As noted in the introduction, the case under study is an application system developed for public service and administrative businesses. As the service grows in scale, increased cloud traffic creates more load for the information systems. In the current implementation, a single server hosts multiple application systems that provide information services. Without an effective means of monitoring application services and server resources, system maintenance personnel may not be able to detect system instability or failure before users notify them of the problem. Locating and analyzing the causes for such abnormalities costs considerable time and effort, and limited human resources make the provision of timely support unlikely. Thus, enhancing the service quality of the information systems becomes a critical task.

3.2.1 Abnormality checking methods

System maintenance personnel spend a lot of time manually checking the application systems to see if they are functioning normally. Most of the time, system maintenance personnel are notified by the users when application systems abnormalities occur, instead of proactively identifying problems in advance.

When a system abnormality does occur, system maintenance personnel use the following steps to check for the initial problem:

  • Check to see if the application systems server is up and running.

  • Observe the resource utilization or processes running on the application systems server.

  • Determine whether the J2EE application platform (Tomcat 5.X) is running properly on the application systems server.

  • Determine whether the application systems are running properly on the J2EE application platform (Tomcat 5.X).

  • Check to see if the database server connected to the application systems is up and running.

  • Check to see if the database management system is running properly on the database server.

  • Observe the resource utilization or processes running on the database server.

  • Check for proper functioning of related network devices such as DNS and NDS.

System maintenance personnel follow the above manual checking procedures to isolate the problem in each operating environment related to the application systems. Following these standard operating procedures takes considerable time, and there may simply not be enough time to complete them. Meanwhile, application system service abnormalities caused by system resource depletion are hard to prevent. Thus, given limited resources, maintaining the normal operation of the provided application services is often beyond the capacity of system maintenance personnel. Finding the right way to tune the system for the desired resource allocation and to adjust the operation architecture for better performance usually takes time, and trial and error is relatively inefficient.

4 Problem analysis

The problems we found for the organization under study after a thorough analysis are shown in Table 3. Data were collected from the organization’s information systems maintenance personnel and focused on the following four aspects: the time when an application systems abnormality occurs, the preliminary check for initial problems when an abnormality occurs, problem prevention and understanding the resource utilization.

Table 3 Problems in system performance checking and management

5 Performance monitoring and management design

5.1 System design

The performance monitoring and management system is designed as an integrated management platform allowing system maintenance personnel to quickly carry out monitoring and management tasks via automation. The architecture of the management platform shown in Fig. 3 comprises three functional components: the server resource performance monitor, the application systems monitor and the abnormality notification system.

Fig. 3 Architecture for monitoring management platform

The installed management platform will monitor the application system, collect and analyze data, give early warnings and notify personnel regarding abnormalities to ensure the system is available to provide normal service.

5.2 Design of server performance monitoring mechanism

5.2.1 Monitoring items

  1. Server aliveness monitoring

    To monitor the availability of the server IP address.

  2. Aliveness monitoring of the J2EE application platform services on the application systems server

    To monitor the normal operation of the J2EE application platform TCP port services.

  3. Server resource monitoring

    CPU loading, memory usage, network traffic.

  4. HTTP connection data monitoring on application systems server

    Collect HTTP connection data on the application systems server in order to understand the utilization of the application systems.

  5. SQL connection data monitoring on the database server

    Collect SQL connection data on the database server in order to understand the database utilization.

6 Operation design

The server aliveness monitor is designed to send a series of ping commands (which serves the purpose of timing and counting) from the monitoring system to each monitored server at five-minute intervals. Each series contains five consecutive ping commands separated by 0.1 s. Server abnormality is assumed if none of the five pings receives a response, in which case the monitoring system sends an email and messages to notify relevant system administrators and maintenance personnel.
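The ping series described above can be sketched in Java as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the class `AlivenessProbe` and its methods are hypothetical, and `InetAddress.isReachable` stands in for the ping command (it typically uses an ICMP echo when the process has the privilege, and otherwise attempts a TCP connection to the echo port). The probe is passed in as a function so the five-attempt rule can be exercised without a network.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.util.function.BooleanSupplier;

/** Hypothetical sketch of the five-ping aliveness check; not the authors' code. */
final class AlivenessProbe {
    static final int PROBES_PER_SERIES = 5;    // five pings per series (from the text)
    static final long PROBE_GAP_MILLIS = 100;  // 0.1 s between pings (from the text)

    private AlivenessProbe() {}

    /**
     * Runs up to `probes` attempts separated by `gapMillis`.
     * Returns true as soon as one attempt succeeds, and false if every attempt fails.
     */
    static boolean isAlive(BooleanSupplier probe, int probes, long gapMillis) {
        for (int i = 0; i < probes; i++) {
            if (probe.getAsBoolean()) {
                return true;
            }
            if (i < probes - 1) {
                try {
                    Thread.sleep(gapMillis);
                } catch (InterruptedException e) {
                    // Assumption: an interrupted series is reported as "not alive".
                    Thread.currentThread().interrupt();
                    return false;
                }
            }
        }
        return false;
    }

    /** Builds a probe that asks the JDK whether the host is reachable within the timeout. */
    static BooleanSupplier reachabilityProbe(String host, int timeoutMillis) {
        return () -> {
            try {
                return InetAddress.getByName(host).isReachable(timeoutMillis);
            } catch (IOException e) {
                return false; // unknown host or network error counts as a failed probe
            }
        };
    }
}
```

Stopping at the first successful probe gives the same verdict as sending all five, since the server is only declared abnormal when every probe fails. The five-minute schedule could be driven by a `ScheduledExecutorService`, and the notification step is left out of this sketch.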

The aliveness monitoring for the J2EE application platform services is designed to execute the nmap utility program built into the monitoring system to query each monitored server for service port aliveness (via the counter monitoring method). A service port in its normal LISTEN state will respond with a message, indicating that it is open. A closed message indicates that the service is not in the normal LISTEN state. In this case, the monitoring system sends email and messages to notify relevant system maintenance personnel.
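The study uses the nmap utility for this port check. As a simplified stand-in, the Java sketch below attempts a plain TCP connection to the service port and reports whether it was accepted. The class `PortProbe`, the method `isPortOpen` and the timeout value are hypothetical, and a TCP connect reports less detail than nmap does.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

/** Hypothetical stand-in for the nmap port check; not the authors' code. */
final class PortProbe {
    private PortProbe() {}

    /** Returns true if a TCP connection to host:port is accepted within the timeout. */
    static boolean isPortOpen(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;   // something is listening on the port
        } catch (IOException e) {
            return false;  // refused, timed out or unreachable
        }
    }
}
```

For a Tomcat instance on its commonly used default HTTP port, a call might look like `PortProbe.isPortOpen("app-server.example", 8080, 2000)`, where the host name is a placeholder.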

Figure 4 shows the design of the above-described aliveness monitoring of the server and the J2EE application platform services, respectively.

Fig. 4 Monitoring aliveness of server and J2EE application platform services

The monitoring mechanism for server resource efficiency, application systems server HTTP connections and database server SQL connections is designed to track via the monitoring packages installed on the monitored server. Server resources are reported to the data collection platform, which collects all data regarding server resource utilization. To view the resource utilization and trends, system maintenance personnel can chart the cloud traffic, using the collected monitoring data.

Figure 5 is a schematic diagram showing how the traffic data from the monitoring items on the monitored server are analyzed (via the tracking method) and passed to the redtop tool, which generates the cloud traffic chart presented in the browser.

Fig. 5 Design of server traffic monitoring

System maintenance personnel may select a monitored server and examine its cloud traffic chart to understand the service condition. The cloud traffic chart contains the current value, average value and maximum value that can be shown by day, week, month and year. Every five minutes, the data collection program requests monitoring data from each monitored server and sends the collected data to programs that generate the cloud traffic chart or service condition chart, which are displayed on a browser. System maintenance personnel can read the cloud traffic chart of the collected monitoring data and understand the resource utilization and trends.
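The three values shown on each chart (current, average and maximum) can be derived from the five-minute samples as in the Java sketch below. The class `TrafficSummary` is hypothetical, and taking the most recent sample as the "current" value is an assumption about how the chart is built.

```java
/** Hypothetical summary of one chart window; not the authors' code. */
final class TrafficSummary {
    /** Samples per day at one sample every five minutes: 24 * 60 / 5 = 288. */
    static final int SAMPLES_PER_DAY = 24 * 60 / 5;

    final double current;
    final double average;
    final double maximum;

    private TrafficSummary(double current, double average, double maximum) {
        this.current = current;
        this.average = average;
        this.maximum = maximum;
    }

    /** Summarizes samples ordered oldest first; the last sample is taken as the current value. */
    static TrafficSummary of(double[] samples) {
        if (samples == null || samples.length == 0) {
            throw new IllegalArgumentException("at least one sample is required");
        }
        double sum = 0.0;
        double max = samples[0];
        for (double s : samples) {
            sum += s;
            if (s > max) {
                max = s;
            }
        }
        return new TrafficSummary(samples[samples.length - 1], sum / samples.length, max);
    }
}
```

A daily view would summarize the latest `SAMPLES_PER_DAY` samples, and the weekly, monthly and yearly views would apply the same calculation to longer windows.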

6.1 Design of monitor systems mechanism

6.1.1 Monitoring items

Normal operation of the J2EE application system service platform does not guarantee normal operation of the application systems, so any application systems running on the J2EE application platform must be monitored. The method used for monitoring is to check if the pages can be displayed on the user’s browser.

6.1.2 Operation design

The cloud-based application systems monitor issues connection requests to specified cloud-based application systems (the page server), emulating a browser to retrieve the respective HTML text strings from those systems. The monitoring system then locates specific keywords to confirm that the application system has responded properly to the browser request. This design requires the identification of all the information that must be checked in the cloud pages of all application systems. The information includes:

  • Cloud page source code.

  • Cloud page URL.

  • Keywords for checking.

  • Cloud page compiled code.

Figure 6 shows the design for the checking of cloud-based application systems. The monitoring server first requests page information from the application system (the page server) being monitored, and then analyzes the pages that are returned. The detailed checking steps are as follows.

Fig. 6 Architecture of the cloud-based application systems monitor

For the items to be checked for one application system:

  (1) Connect to the specified application system’s cloud page.

  (2) Retrieve the corresponding HTML text string returned from the application system’s page.

  (3) Check the text string for specified keywords. Finding the keywords indicates that the page server is operating normally. If keywords are missing, there may be an application systems abnormality. The source codes returned by the application systems are written as a record in the check failure log.

  (4) Repeat steps (1) through (3) until all specified application systems are checked.
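Steps (1) through (3) can be sketched in Java as follows. This is an illustrative sketch rather than the authors' implementation: the class `PageKeywordCheck` and its methods are hypothetical, the page is assumed to be UTF-8 encoded, and writing to the check failure log is left to the caller.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of the page keyword check; not the authors' code. */
final class PageKeywordCheck {
    private PageKeywordCheck() {}

    /** Steps (1) and (2): connect to the page and return the HTML text it sends back. */
    static String fetchHtml(String pageUrl, int timeoutMillis) throws IOException {
        HttpURLConnection conn =
                (HttpURLConnection) URI.create(pageUrl).toURL().openConnection();
        conn.setConnectTimeout(timeoutMillis);
        conn.setReadTimeout(timeoutMillis);
        try (InputStream in = conn.getInputStream()) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buffer = new byte[4096];
            int n;
            while ((n = in.read(buffer)) != -1) {
                out.write(buffer, 0, n);
            }
            // Assumption: the page is UTF-8 encoded.
            return new String(out.toByteArray(), StandardCharsets.UTF_8);
        } finally {
            conn.disconnect();
        }
    }

    /** Step (3): return the keywords that do not appear in the HTML (empty list means normal). */
    static List<String> missingKeywords(String html, List<String> keywords) {
        List<String> missing = new ArrayList<>();
        for (String keyword : keywords) {
            if (!html.contains(keyword)) {
                missing.add(keyword);
            }
        }
        return missing;
    }
}
```

Step (4) is then a loop over the configured pages: for each one, call `fetchHtml`, pass the result to `missingKeywords`, and record the returned HTML in the failure log when the list is not empty or when `fetchHtml` throws.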

6.2 Design of abnormality notification mechanism

Alerts are one of the functions of monitoring for the aliveness of the application systems server, database server, J2EE service platform and the application system itself. Threshold values are set for each monitored resource item. In case of a system abnormality or of a value above the warning threshold, system maintenance personnel will be promptly notified via email or message. Upon notification, system maintenance personnel will follow standard operation procedures to respond to the abnormality, either to restore the system and resume operations or to take preventive measures in advance, in the shortest amount of time, to ensure normal system services and adequate system availability.

Figure 7 is a schematic diagram showing the system monitoring and abnormality notification design. When the monitoring server detects that the monitored server group has exceeded the abnormality warning threshold, it sends emails and messages via sendmail and the message dispatching program to the specified email box and mobile phone of the pre-assigned system administrator and maintenance personnel.

Fig. 7 System monitoring abnormality notification
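The threshold comparison that triggers a notification might look like the Java sketch below. The class `ThresholdAlerts`, the item names and the threshold values used in the usage note are hypothetical, since the study does not list its thresholds, and dispatching the alerts through sendmail or the message program is outside the sketch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical threshold check for monitored resource items; not the authors' code. */
final class ThresholdAlerts {
    private ThresholdAlerts() {}

    /**
     * Returns one alert message per monitored item whose reading is above its
     * warning threshold. Items without a configured threshold are skipped.
     */
    static List<String> evaluate(Map<String, Double> readings, Map<String, Double> thresholds) {
        List<String> alerts = new ArrayList<>();
        for (Map.Entry<String, Double> reading : readings.entrySet()) {
            Double limit = thresholds.get(reading.getKey());
            if (limit != null && reading.getValue() > limit) {
                alerts.add(reading.getKey() + " = " + reading.getValue()
                        + " exceeds warning threshold " + limit);
            }
        }
        return alerts;
    }
}
```

With assumed thresholds of 90 for CPU loading and 80 for memory usage, readings of 95 and 40 would yield a single alert for CPU loading, which the notification component would then forward to the pre-assigned personnel.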

7 System architecture and implementation

7.1 Management platform setup

The performance monitoring and management system relies on an integrated management platform to allow system maintenance personnel to quickly navigate and select functions and perform management tasks. After entering the account number and password, the user is shown the cloud-based performance monitoring and management platform screen.

The cloud-based management platform allows system maintenance personnel to carry out resource monitoring and management tasks from a central system instead of having to log in to a variety of disparate systems, thus quickly grasping the operational status of the information system.

7.2 Server resource performance monitoring mechanism

The server aliveness monitor is designed to send a series of ping commands (which serves the purpose of timing and counting) to each monitored server at five-minute intervals. Each series contains five consecutive ping commands separated by 0.1 s. Server abnormality is assumed if none of the five ping requests receives a response, in which case the monitoring system sends email and messages to notify relevant system administrators and maintenance personnel.

The aliveness monitoring for J2EE application platform services is designed to execute the nmap utility program built into the monitoring system to query each monitored server for service port aliveness (via the counter monitoring method). A service port in its normal LISTEN state will respond with a message indicating that it is open. A closed message indicates that the service is not in the normal LISTEN state, in which case the monitoring system sends email and messages to notify relevant system maintenance personnel.

To monitor device resource utilization and HTTP connection traffic for application systems, and SQL connections for database systems, the user clicks on the system menu and navigates to the screen shown in Fig. 8. On the left side of the screen is a list of monitored devices, with our studied application systems server and database server marked by a red frame. The system maintenance personnel may select the desired device by day, week, month or year to display the device resource utilization and cloud traffic charts for the application systems’ HTTP connections and the database SQL connections. This allows them to better understand traffic trends.

Fig. 8 Selecting the server to monitor for resources/traffic

7.3 Implement monitor application systems

The cloud-based application monitoring system issues connection requests to the specified cloud-based application system (the page server), and emulates a browser to retrieve the corresponding page’s HTML text string. Specific keywords are located to confirm that the application system is responding properly to browser requests. For example, the information for checking the call-for-service system is specified by bringing up the operation screen and entering the information for the application system’s cloud pages: the cloud page source code, the URL, the keywords for checking and the page’s compiled code. When an application systems abnormality causes improper operation, the specified cloud page will not appear. The message notification management menu allows administrators to set up the personnel to be notified. In the setup screen, the email addresses and mobile phone numbers of the personnel to be notified of system abnormalities or warnings are entered.

8 Experiments and result analysis

8.1 Data sources

From February, 2019 to April, 2019, the monitoring and management system collected data from the monitored application systems and related servers such as the cloud-based application systems server and its corresponding ORACLE DBMS server. The monitoring data on server resources, cloud-based application systems server http connections and database server SQL connections were collected every 5 min. A total of 25,800 samples were collected. These were used to generate the trend chart and statistics for application system abnormality frequency, the time required for the system maintenance personnel to be notified of the abnormality (since the occurrence) and the time required for the notified system maintenance personnel to locate the preliminary problems since the start of the incident. The collected data were compared with the data for 2020.

8.2 Experiment design

8.2.1 Ways of identifying service abortion

We compared the 2020 data with that collected between February and April, 2019. Table 4 shows the analysis results. After the performance monitoring and management system was installed, the percentage of application systems service interruption events (during office hours, alone) of which the system maintenance personnel were aware rose from 18.8% to 60%. The percentage of noticed service interruptions during non-office hours rose from 4.7% to 20%. Users notified maintenance personnel of only 20% of all events. However, the automatic notification portion of the performance monitoring mechanism did not make much difference in terms of timeliness.

Table 4 Ways of system maintenance of monitor application systems abnormalities

Based on the data in Table 5, the performance monitoring and management system allows the system maintenance personnel to be aware of application system service interruptions during office hours 10 min earlier than before, when no monitoring systems were in place. The edge gained is almost 8 h for system interruptions that occur during non-office hours, thus avoiding situations in which the system would be down for dozens of hours before being noticed.

Table 5 Comparison of time required to become aware of system service abnormalities

8.2.2 Preliminary locating of the abnormality

After the performance monitoring and management system is installed and put into operation, it checks the application systems at regular five-minute intervals to see if they are alive. If there is no response, it sends out email and messages to notify system maintenance personnel.

If the application system services respond slowly, system maintenance personnel can identify the problem and determine the cause by analyzing the utilization of system resources, with help from the cloud traffic chart. This is based on the email notification that resources are being used beyond the preset warning level or on the message notification regarding abnormalities. Table 6 shows how system maintenance personnel can recognize and make a primary diagnosis of the problem within 0.5 to 5 min of the incident, regardless of whether the issue occurs during office hours or non-office hours. The automated or approximately synchronized checking mechanism identifies system abnormalities and problems such as server resource overloading, J2EE application platform overloading or database management system overloading. During office hours, the performance monitoring and management system gives system maintenance personnel a 23-min edge (on average) in becoming aware of the application systems abnormality and the preliminary problems. The edge is 8 h during non-office hours.

Table 6 Comparison of time required for system maintenance to notify services abnormalities

8.3 Resource utilization and performance analysis

When an abnormality occurs or a threshold is exceeded, the email or message notification allows system maintenance personnel to grasp the current state of operations of the application systems and environments in order to take proper measures. In addition to preventing the overall paralysis of the application systems environment, the monitoring system helps reduce user complaints. During the period of study, the average number of user complaints was 1.67 per month after deployment of the performance monitoring and management system (2020), a decrease of 76.2% from the 7.01 per month recorded in 2019. This is an effective increase in service availability.
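The 76.2% figure follows directly from the two monthly averages reported above. A minimal Java sketch of the calculation is given here for checking the arithmetic; the class and method names are hypothetical.

```java
/** Hypothetical helper for checking the reported reduction; not the authors' code. */
final class ComplaintReduction {
    private ComplaintReduction() {}

    /** Percentage decrease from `before` to `after`, relative to `before`. */
    static double percentDecrease(double before, double after) {
        return (before - after) / before * 100.0;
    }
}
```

With the reported values, `ComplaintReduction.percentDecrease(7.01, 1.67)` evaluates (7.01 − 1.67) / 7.01 × 100, which rounds to 76.2.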

8.3.1 Resource utilization and performance analysis

Once the performance monitoring and management system is set up, it is possible to monitor traffic within the operation environment related to the application systems, such as resource usage and traffic on the server:

  • CPU loading.

  • Memory usage.

  • Network traffic.

The monitored traffic is shown in Fig. 9.

Fig. 9 Monitored traffic

The performance monitoring and management system allows system maintenance personnel to obtain continuous monitoring data. Compared to the fragmented system resource utilization data collected prior to the installation of the monitoring system, the continuous datasets allow for easier understanding of the operation of the application systems environment and enable the identification of time-consuming services to improve the performance of the application systems. Graphics-based data and trend analysis allow for easier and faster understanding of problems, and facilitate the quick implementation of improvements that can increase performance.

8.4 Discussion

During office hours, the monitoring system allows system maintenance personnel to be aware of application system service abnormality incidents 10 min earlier than before, when no monitoring systems existed. The edge gained is almost 8 h when system abnormalities occur during non-office hours. Application systems service interruptions can be minimized by the timely management of problems. Again, during office hours, the monitoring system allows system maintenance personnel to preliminarily locate system abnormality problems 23 min earlier than before, when no monitoring systems were in place. This model can help with problem diagnosis, and saves time for problem management. The average number of system abnormalities was 1.67 per month after deployment of the performance monitoring and management system, i.e., 76.2% lower than the 7.01 per month recorded in 2019. Thus, service availability has been effectively raised. The performance monitoring and management system allows system maintenance personnel to easily identify time-consuming application system services via graphics-based data and trend analysis. This helps the staff to grasp the problems and devise improvement measures. The model has limitations: the performance monitoring system does not eliminate system abnormalities, but it provides greater opportunities to minimize issues and improve performance.

A well-implemented performance monitoring and management system for application systems can improve the level of service performance in non-quantifiable aspects as well:

  1. Information operation aspect

  • Quality of application systems

The application systems can be fully grasped in terms of system operations and resource usage. Based on the evidence provided by the monitoring system, poorly performing systems can be identified and improvements can be required in order to stabilize operations.

  • Operation cost

Application system problems can be reduced, and limited computing resources can be optimally allocated and managed to lower the operational costs of information management.

  2. Customer satisfaction aspect

  • Satisfaction

The application systems are operationally under control, and the whole system works smoothly to meet customer requirements and to boost customer satisfaction.

  • Requirement for timeliness

Increased system availability can be achieved through effective prevention of system abnormalities, better meeting customer requirements for timeliness in system usage and rapid response.

  • Customer relationship

Increased system availability and better response times reduce user complaints and enhance the relationship with the customer.

9 Conclusion

This research proposed and developed a cloud-based monitor system using Java to run on the J2EE platform. We built a performance analysis and monitoring mechanism. This paper makes the following contributions:

  1. The building of a cloud-based monitor system that has an integrated performance analysis and monitoring mechanism, as well as an active diagnosis and maintenance system that notifies system maintenance personnel of abnormal operations and deviations from specified performance parameters.

  2. We showed how system availability can be increased by effectively lowering the rate of abnormal operation incidents. Compiled from our data, the company report on system resource usage enables the optimal allocation of limited computing resources.

  3. The monitor system ensures high quality and lowers the operational cost of providing information service, enhancing the relationship with the customer. The average number of system abnormalities was reduced by 76.2%, effectively increasing service availability.