1 Introduction

A string of breakthroughs in artificial intelligence has placed AI in increasingly visible positions in society, heralding its emergence as a viable, practical, and revolutionary technology. In recent years, we have witnessed IBM’s Watson win first place in the American quiz show Jeopardy! and Google’s AlphaGo beat the Go world champion, and in the very near future, self-driving cars are expected to become a common sight on every street. Such promising developments spur optimism for an exciting future produced by the integration of AI technology and human creativity.

AI technology has grown remarkably over the past decade. Countries around the world have invested heavily in AI research and development. Major corporations are also applying AI to social problem solving; notably, IBM is actively pursuing its Science for Social Good initiative. The initiative will build on the success of the company’s noted AI program, Watson, which has helped address healthcare, education, and environmental challenges since its development. One particularly successful project used machine learning models to better understand the spread of the Zika virus. Using complex data, the team developed a predictive model that identified which primate species should be targeted for Zika virus surveillance and management. The results of the project are now guiding new field testing to help prevent the spread of the disease [1].

On the other hand, technology investment is mostly directed toward industrial and service growth, while investment for positive social impact appears to be relatively small and passive. This passive attitude seems to reflect the influence of a given nation’s politics and policies rather than an absence of suitable technology.

For example, in 2017 only 4.2% of the Korean government’s ICT (Information and Communication Technology) R&D budget was used for social problem solving, but this share will be increased to 45% within the next five years because the present government has selected improving people’s livelihoods and solving social problems as important issues [2]. In addition, new areas of ICT, including AI, are needed as key means of improving quality of life and achieving population growth in Korea.

In this letter, I introduce research on an informatics platform for social problem solving based on spatio-temporal data, conducted by Hanyang University and cooperating institutions. The research ultimately aims to develop informatics and convergent scientific methodologies that can explain, predict, and address diverse social problems through a transdisciplinary convergence of social sciences, data science, and AI. It focuses on social problems that involve spatio-temporal information, and applies social-scientific approaches and data-analytic methods on a pilot basis to explore basic research issues and test the validity of the approaches. In the process, it develops (1) open-source informatics based on convergent-scientific methodologies and models, and (2) the spatio-temporal data sets acquired while exploring the social problems and their potential resolution.

To examine the applicability of the models and the informatics platform to a variety of social problems in both the public and private sectors, the following social problems are selected:

  1. Analysis of individual characteristics with suicidal impulse
  2. Study on the mobility of the disabled using GPS data
  3. Visualization of the distribution of anxiety using Social Network Services
  4. Big data-based analysis of noise environment and exploration of technical and institutional solutions for its improvement
  5. Analysis of the response governance regarding the Middle East Respiratory Syndrome (MERS)

For each of these problems, the research issues are explored and the validity of the convergent-scientific methodologies is tested. The feasibility of potential resolutions is also examined. The relevant data and information are stored in a knowledge base (KB), while the research methods used for data extraction, collection, analysis, and visualization are developed in parallel. The KB and the method database are then merged into an open informatics platform for use in various research projects, business activities, and policy debates.

2 Pilot Research and Studies on Social Problem Solving

2.1 Analysis of Individual Characteristics with Suicidal Impulse

While suicide rates in OECD countries have generally been declining, South Korea is the only member whose rate is increasing; moreover, Korea currently has the highest suicide rate among OECD countries, as shown in Fig. 1. This high suicide rate is one of Korea’s biggest social problems, and effective prevention measures must be grounded in an understanding of its causes. The goals of the research are to (1) understand suicidal impulse by analyzing the characteristics of members of society according to whether they have experienced suicidal impulses; (2) predict the likelihood of a suicide attempt and analyze spatio-temporal quality of life; and (3) establish policies to help prevent suicide.

Fig. 1. Suicide rates of OECD countries, 2013

The Korean Social Survey and the Survey of Youth Health Status data are used to analyze suicide risk groups with data mining techniques, applying a predictive model based on cell propagation to overcome the limitations of existing statistical methods such as characterization and classification. With characterization, there were too many features related to suicide, and several variables had many categorical values, which made it difficult to identify the variables that affect suicide. Classification, in turn, had difficulty identifying these variables because the number of respondents who had attempted suicide was too small.

Correlations between suicidal impulse and the individual attributes of members of society, as well as yearly trends in these correlations, are obtained. The concepts of support, confidence, and density are introduced to identify suicide-attempt risk groups, and the computational performance problems caused by an excessive number of candidate risk groups are solved by applying a convex growing method.
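The text does not spell out how support and confidence are computed. The following is a minimal sketch, assuming the usual association-rule definitions: a risk group is a conjunction of attribute conditions, support is the share of all respondents who fall into the group, and confidence is the share of group members who report a suicide attempt. The field names (`income`, `education`, `attempt`) and the toy records are hypothetical, not survey data.

```python
from typing import Dict, List

Record = Dict[str, object]

def members(records: List[Record], group: Dict[str, object]) -> List[Record]:
    """Respondents who satisfy every attribute condition that defines the group."""
    return [r for r in records if all(r.get(k) == v for k, v in group.items())]

def support(records: List[Record], group: Dict[str, object]) -> float:
    """Share of all respondents who belong to the group."""
    return len(members(records, group)) / len(records) if records else 0.0

def confidence(records: List[Record], group: Dict[str, object],
               target: str = "attempt") -> float:
    """Share of group members whose `target` flag (e.g. a reported attempt) is true."""
    group_members = members(records, group)
    if not group_members:
        return 0.0
    return sum(1 for r in group_members if r.get(target)) / len(group_members)

# Hypothetical toy records, not survey data.
records = [
    {"income": "low", "education": "low", "attempt": True},
    {"income": "low", "education": "low", "attempt": True},
    {"income": "low", "education": "low", "attempt": False},
    {"income": "high", "education": "high", "attempt": False},
]
group = {"income": "low", "education": "low"}
print(support(records, group))               # 0.75
print(round(confidence(records, group), 3))  # 0.667
```

A group is worth flagging only when both values are high: high support means the group is large enough to matter for policy, and high confidence means its members are disproportionately at risk.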

The 2014 Social Survey, which includes personal and household information on members of society, is used for the analysis. The attributes include gender, age, education, marital status, level of satisfaction, disability status, occupation status, housing, and household income.

The high-risk suicide cluster was identified using a small number of convexes. A convex C is a set of cells, where a cell is the smallest unit of the cluster used in the analysis, and the density of C is the ratio of the number of non-empty cells in C to the total number of cells in C [3].
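The density definition above can be made concrete with a small grid. This is a minimal sketch under several assumptions that are not taken from [3]: cells form a 2-D grid over two ordinal attributes (e.g. income level by education level), a cell is non-empty if it contains at least one at-risk respondent, a convex is an axis-aligned rectangle of cells, and "growing" means greedily adding one row or column at a time while density stays above a hypothetical `min_density` threshold. It illustrates the idea only and is not the convex growing method of the cited work.

```python
from typing import Dict, List, Tuple

Cell = Tuple[int, int]           # (income level index, education level index)
Box = Tuple[int, int, int, int]  # (row0, col0, row1, col1), bounds inclusive

def cells_in(box: Box) -> List[Cell]:
    r0, c0, r1, c1 = box
    return [(r, c) for r in range(r0, r1 + 1) for c in range(c0, c1 + 1)]

def density(box: Box, counts: Dict[Cell, int]) -> float:
    """Non-empty cells divided by all cells in the convex."""
    cells = cells_in(box)
    non_empty = sum(1 for cell in cells if counts.get(cell, 0) > 0)
    return non_empty / len(cells)

def grow(seed: Cell, counts: Dict[Cell, int], shape: Tuple[int, int],
         min_density: float = 0.75) -> Box:
    """Greedily extend a box from `seed` by one row or column at a time,
    keeping only extensions whose density stays at or above `min_density`."""
    rows, cols = shape
    box: Box = (seed[0], seed[1], seed[0], seed[1])
    while True:
        r0, c0, r1, c1 = box
        candidates = [(r0 - 1, c0, r1, c1), (r0, c0 - 1, r1, c1),
                      (r0, c0, r1 + 1, c1), (r0, c0, r1, c1 + 1)]
        scored = [(density(b, counts), b) for b in candidates
                  if b[0] >= 0 and b[1] >= 0 and b[2] < rows and b[3] < cols]
        scored = [s for s in scored if s[0] >= min_density]
        if not scored:
            return box
        box = max(scored)[1]  # densest extension; ties broken by box coordinates

# Hypothetical counts of at-risk respondents per (income, education) cell.
counts = {(0, 0): 5, (0, 1): 3, (1, 0): 2, (1, 1): 1, (2, 2): 4}
print(grow((0, 0), counts, shape=(3, 3)))  # (0, 0, 1, 1)
```

Because each accepted step strictly enlarges the box and the grid is finite, the loop always terminates; describing a risk group by a few dense boxes instead of enumerating every cell combination is what keeps the number of candidate groups manageable.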

Figure 2 shows that C1, the group with the highest suicide risk, is composed of members with low household income and low levels of education. Level of satisfaction with life was found to have the greatest impact on suicidal impulse, followed in order by disability, marital status, housing, household income, occupation status, gender, age, and level of education. The results also showed that women and young people tend to experience suicidal impulses more often.

Fig. 2. Suicide risk groups represented by household income and level of education

New prediction models based on other machine learning methods, as well as mitigation policies, are still under development. Future work will also analyze subjective changes in well-being, social exclusion, and their spatio-temporal characteristics.

2.2 Study on the Mobility of the Disabled Using GPS Data

Mobility rights are part of social rights and are closely related to quality of life. Social efforts are therefore needed to guarantee mobility rights for both the physically and the mentally disabled. The goal of the study is to suggest policies that extend the mobility rights of the disabled. To achieve this, the travel patterns and socio-demographic characteristics of physically impaired people with low levels of mobility are studied. The study focused on individuals with physical impairments as an initial test group, with the aim of eventually gaining insight into the mobility of the wider disabled population. Conventional studies of mobility obtained data from travel diaries, interviews, and questionnaire surveys; only a few have used GPS tracking data.

GPS data are collected via mobile devices and used both to analyze mobility patterns (distance, speed, and frequency of outings) through regression analysis and to search for ways to extend mobility. A new mobility metric based on a new indicator, travel range, was developed, and the impact of mobility on the quality of life of the disabled was verified [4].
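The mobility indicators above can be derived from raw GPS fixes. This is a minimal sketch, assuming each fix is a (timestamp in seconds, latitude, longitude) tuple and taking "travel range" to mean the maximum great-circle distance from a participant's home location; the exact definition of travel range in [4] may differ, and the function names are hypothetical.

```python
import math
from typing import Dict, List, Tuple

Fix = Tuple[float, float, float]  # (unix time in s, latitude in deg, longitude in deg)
EARTH_RADIUS_M = 6_371_000.0      # mean Earth radius; spherical approximation

def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in metres between two points."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))

def mobility_metrics(fixes: List[Fix], home: Tuple[float, float]) -> Dict[str, float]:
    """Distance travelled, mean speed and travel range (max distance from home)."""
    if not fixes:
        return {"distance_m": 0.0, "mean_speed_mps": 0.0, "travel_range_m": 0.0}
    fixes = sorted(fixes)  # chronological order (timestamp is the first field)
    distance = sum(haversine_m(a[1], a[2], b[1], b[2]) for a, b in zip(fixes, fixes[1:]))
    duration = fixes[-1][0] - fixes[0][0]
    travel_range = max(haversine_m(home[0], home[1], f[1], f[2]) for f in fixes)
    return {
        # mean speed over the whole window, so stops lower it
        "distance_m": distance,
        "mean_speed_mps": distance / duration if duration > 0 else 0.0,
        "travel_range_m": travel_range,
    }

fixes = [(0.0, 37.55, 127.04), (600.0, 37.56, 127.05), (1200.0, 37.57, 127.03)]
print(mobility_metrics(fixes, home=(37.55, 127.04)))
```

Per-participant values like these could then serve as dependent variables in the regression analysis, with age, income and other socio-demographic attributes as predictors. In practice, noisy fixes would need filtering before summing segment distances.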

About 100 people with physical disabilities participated, collecting more than 100,000 geo-location records over one month with the open-source mobile application Traccar. Their trajectories are visualized from the GPS data, as shown in Fig. 3.

Fig. 3. Visualization of the trajectories of disabled participants using geo-location data

Location data explained mobility status better than the conventional questionnaire survey, which mainly asked about the frequency of outings over a certain period and the number of complaints about those outings. GPS data enabled researchers to observe distance and range of travel empirically. It was found that the disabled preferred bus routes that visit diverse locations over the shortest route. Age and monthly income were negatively associated with a disabled individual’s mobility.

Based on the results, two measures have been suggested: (1) new bus routes for the disabled and (2) a new location for the current welfare center that would allow a greater range of travel. Further studies of travel patterns using indoor positioning technology and CCTV image data are planned.

2.3 Visualization of the Distribution of Anxiety Using Social Network Services

Anxiety underlies many social issues, including political polarization, competition in private education, rising suicide rates, youth unemployment, low birth rates, and hate crime. Rising social anxiety can intensify competition and conflict, which can undermine social solidarity and erode social trust.

Existing social science research has mainly gauged public opinion through questionnaires and has ignored the role of emotions. The Internet and social media were therefore used to access emotional traits, since they provide a platform not only for the active exchange of information but also for the sharing and diffusion of emotional responses. If such online emotional responses and their geo-locations can be captured in real time through machine learning, their spatio-temporal distribution can be visualized to observe the current status and changes by geographical region.

A visualization system was built to map the regional and temporal distribution of anxiety by combining spatio-temporal information from SNS (Twitter) with sentiment analysis. A crawler for collecting Twitter messages was also developed to build a dictionary and a tweet corpus. On this basis, the first automatic classification system for anxiety-related messages in Korea was developed by applying machine learning, and it was used to visualize the nationwide distribution of anxiety (see Fig. 4) [5].

Fig. 4. Process of Twitter message classification

An average of 5,500 tweets with a place_id are collected through the Twitter open API using the Twitter4j library; to date, about 820,000 records have been collected. A Naïve Bayes classifier was used to identify anxiety. It achieved an accuracy of 84.8%, trained on 1,750 anxiety and 70,830 non-anxiety tweets and tested on 585 anxiety and 23,610 non-anxiety tweets.
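The classification step can be illustrated with a small multinomial Naïve Bayes model. This is a minimal sketch, assuming whitespace tokenization and Laplace (add-one) smoothing; the features, tokenizer and implementation actually used in [5] are not described here (Korean text would normally need a morphological analyzer rather than whitespace splitting), and the toy training sentences are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List

class NaiveBayes:
    """Multinomial Naive Bayes with add-one smoothing over whitespace tokens."""

    def fit(self, docs: List[str], labels: List[str]) -> "NaiveBayes":
        self.classes = sorted(set(labels))
        self.doc_counts = Counter(labels)
        self.word_counts: Dict[str, Counter] = {c: Counter() for c in self.classes}
        for doc, label in zip(docs, labels):
            self.word_counts[label].update(doc.lower().split())
        self.vocab = {w for c in self.classes for w in self.word_counts[c]}
        self.totals = {c: sum(self.word_counts[c].values()) for c in self.classes}
        return self

    def log_posterior(self, doc: str, c: str) -> float:
        """log P(c) + sum of log P(word | c), up to a constant shared by all classes."""
        n_docs = sum(self.doc_counts.values())
        score = math.log(self.doc_counts[c] / n_docs)  # log prior
        denom = self.totals[c] + len(self.vocab)
        for w in doc.lower().split():
            if w in self.vocab:  # words never seen in training are ignored
                score += math.log((self.word_counts[c][w] + 1) / denom)
        return score

    def predict(self, doc: str) -> str:
        return max(self.classes, key=lambda c: self.log_posterior(doc, c))

# Hypothetical toy training data.
docs = ["so worried about my job", "anxious and scared tonight",
        "nice lunch with friends", "great game today", "sunny walk in the park"]
labels = ["anxiety", "anxiety", "other", "other", "other"]
model = NaiveBayes().fit(docs, labels)
print(model.predict("worried and anxious"))  # anxiety
print(model.predict("great sunny day"))      # other
```

Working in log space avoids numerical underflow when many word probabilities are multiplied. With class sizes as unbalanced as those reported above, the class prior weighs heavily, so per-class precision and recall are worth reporting alongside overall accuracy.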

The system revealed regional disparities in anxiety. It was also found that Twitter users residing in politicized regions disclose their area of residence less often. This may be interpreted as users avoiding situations in which they would be identified with the political position of their region.

Because anxiety is not a permanent characteristic of an individual and can change with time and circumstances, it is difficult to measure with a questionnaire survey administered at a single point in time. The Twitter-based system can compensate for this limitation: it continuously classifies accumulated tweets and can visualize the distribution of anxiety at any given time at a desired geographic scale (ward, city, province, or nationwide), as shown in Fig. 5.
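Aggregating classified tweets at a chosen geographic scale and time window might look like the following. This is a minimal sketch, assuming each classified tweet carries a timestamp, a hierarchical region path such as (province, city, ward) already resolved from its place_id, and a boolean anxiety label; the share-of-anxiety metric and the example region paths are illustrative assumptions, not the system's actual data model.

```python
from collections import defaultdict
from datetime import datetime
from typing import Dict, Iterable, Tuple

Tweet = Tuple[datetime, Tuple[str, ...], bool]  # (time, region path, is_anxiety)

def anxiety_share(tweets: Iterable[Tweet], level: int,
                  start: datetime, end: datetime) -> Dict[Tuple[str, ...], float]:
    """Share of anxiety tweets per region, with region paths truncated to `level` parts.

    level=0 gives one nationwide figure; larger levels give finer regions.
    Only tweets with start <= time < end are counted.
    """
    totals: Dict[Tuple[str, ...], int] = defaultdict(int)
    anxious: Dict[Tuple[str, ...], int] = defaultdict(int)
    for time, region, is_anxiety in tweets:
        if not (start <= time < end):
            continue
        key = region[:level]
        totals[key] += 1
        anxious[key] += int(is_anxiety)
    return {key: anxious[key] / totals[key] for key in totals}

# Illustrative classified tweets.
t = datetime(2017, 5, 1, 12)
tweets = [
    (t, ("Gyeonggi-do", "Suwon-si", "Jangan-gu"), True),
    (t, ("Gyeonggi-do", "Suwon-si", "Paldal-gu"), False),
    (t, ("Gyeonggi-do", "Seongnam-si", "Bundang-gu"), True),
    (t, ("Gyeongsangnam-do", "Changwon-si", "Uichang-gu"), False),
    (datetime(2017, 5, 3), ("Gyeongsangnam-do", "Changwon-si", "Uichang-gu"), True),
]
window = (datetime(2017, 5, 1), datetime(2017, 5, 2))
print(anxiety_share(tweets, level=0, start=window[0], end=window[1]))  # {(): 0.5}
```

Truncating the region path is what lets one pass over the data serve every map scale; in a real deployment the counts would typically be pre-aggregated per time bucket so that a map can be redrawn for any period without re-reading the raw tweets.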

Fig. 5. Regional distribution of anxiety in Korean society, visualized by geographic scale

2.4 Big Data-Based Analysis of Noise Environment and Exploration of Technical and Institutional Solutions for Its Improvement

Environmental issues are a major social concern of our age, and interest has been growing not only in the consequences of pollution but also in the effects of environmental aesthetics on quality of life. Much effort has gone into improving the visual environment, but far less attention has been paid to the auditory environment. Until now, policies on the auditory environment have remained passive countermeasures based on simply quantified acoustic measures (e.g., sound level in dB) at specific places such as construction sites, railroads, highways, and residential areas. They lack a comprehensive study of contextual correlations among the physical properties of sound, environmental factors in time and space, and human emotional responses in noise perception.

The goal of this study is to provide a cognition-based, human-friendly solution to noise problems. To achieve this, the study aimed to (1) develop a tool for collecting sound data and converting it into a sound database, and (2) build spatio-temporal features and a management platform for indoor and outdoor noise sources.

First, pilot experiments were conducted to identify indicators of emotional reactions, using a handheld device application developed for data collection.

Three separate free-walking experiments and in-depth interviews were conducted with 78 subjects in international airport lobbies and outdoor environments.

Through these experiments, the behavior patterns of the subjects in various acoustic environments were analyzed, and indicators of emotional reactions were identified. The psychological state and personal environment of the subject were found to be important indicators of how the auditory environment is perceived. To take into account both the psychological state of the subject and the physical properties of the external sound stimulus, an omnidirectional microphone is used to record the entire acoustic environment.

For data collection, 118 subjects carrying smartphones with the application installed walked for an hour in downtown Seoul. After entering the required information in the app, subjects pressed ‘Good’ or ‘Bad’ whenever they heard a sound that caught their attention. Pressing the button recorded the sound for 15 s, and subjects were also asked to answer a series of questions about the physical characteristics of the location and the characteristics of the auditory environment. During the one-hour experiment, about 600 sound environment reports were accumulated, with each subject reporting on an average of five different places.
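Each report pairs a position, a time, a subjective rating and a 15 s recording. The following is a minimal sketch of such a record and of two simple summaries, assuming the clip is stored as samples normalized to [-1.0, 1.0]; the `SoundReport` fields are hypothetical, and without microphone calibration the computed equivalent level is relative to digital full scale (dBFS), not a sound pressure level.

```python
import math
from dataclasses import dataclass, field
from typing import List

@dataclass
class SoundReport:
    """One 'Good'/'Bad' report: where and when it was made, plus the recorded clip."""
    subject_id: str
    lat: float
    lon: float
    timestamp: float  # unix time in seconds
    rating: str       # "Good" or "Bad", as pressed in the app
    samples: List[float] = field(default_factory=list)  # clip normalized to [-1.0, 1.0]

def level_dbfs(samples: List[float]) -> float:
    """Equivalent continuous level of a clip: 10*log10(mean square), in dB re full scale."""
    if not samples:
        return float("-inf")
    mean_square = sum(s * s for s in samples) / len(samples)
    return 10 * math.log10(mean_square) if mean_square > 0 else float("-inf")

def bad_share(reports: List[SoundReport]) -> float:
    """Fraction of reports rated 'Bad'."""
    return sum(r.rating == "Bad" for r in reports) / len(reports) if reports else 0.0

reports = [
    SoundReport("s01", 37.570, 126.980, 0.0, "Bad", [0.5, -0.5] * 4),
    SoundReport("s01", 37.571, 126.982, 300.0, "Good", [0.1, -0.1] * 4),
]
print([round(level_dbfs(r.samples), 1) for r in reports], bad_share(reports))
```

Keeping the measured level next to the subject's rating in one record is what allows the physical property of the sound and the emotional response to be analyzed together rather than separately.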

Unlike in previous studies, the subjects’ paths were not predetermined, and each subject’s position, the sound, and the emotional response were collected simultaneously. The paths can be displayed to analyze the relationship between soundscapes and paths (Fig. 6).

Fig. 6. Subjects’ paths and marks for sound types

The study helped to build a positive auditory environment for specific places, to provide policy data for noise regulation and positive auditory environments, to identify the contexts and areas that are alienated from the auditory environment, and to extend the social meaning of “noise” within the study of sound.

2.5 Analysis of the Response Governance Regarding the Middle East Respiratory Syndrome (MERS)

The emergence and spread of new infectious diseases are increasing with the expansion of international exchange. As the 2015 MERS outbreak in Korea showed, epidemics have profound social and economic impacts. It is imperative to establish an effective shelter and rapid response system (RRS) for infectious disease control.

The goal of the study is to compare the official response system with the actual response system in order to understand the institutional mechanism of the epidemic response system, and to find effective policy alternatives through the collaboration of policy scholars and data scientists.

Web-based newspaper articles were analyzed to compare the official crisis response system designed for outbreaks with the crisis response that actually took place. An automatic news article crawling tool was developed, and 53,415 MERS-related articles were collected, clustered, and stored in a database (Fig. 7).
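The clustering method is not specified in the text. The following is a minimal sketch of one simple option, assuming TF-IDF vectors with cosine similarity and single-pass "leader" clustering, in which each article joins the first cluster whose leader is similar enough and otherwise starts a new cluster. The 0.3 threshold and the sample headlines are hypothetical, and real Korean articles would need morphological tokenization instead of whitespace splitting.

```python
import math
from collections import Counter
from typing import Dict, List

def tfidf_vectors(docs: List[str]) -> List[Dict[str, float]]:
    """TF-IDF weights per document, with whitespace tokenization."""
    tokenized = [doc.lower().split() for doc in docs]
    n = len(tokenized)
    df = Counter(w for toks in tokenized for w in set(toks))
    # smoothed idf so that terms appearing in every document keep a non-zero weight
    idf = {w: math.log((1 + n) / (1 + d)) + 1 for w, d in df.items()}
    return [{w: c * idf[w] for w, c in Counter(toks).items()} for toks in tokenized]

def cosine(a: Dict[str, float], b: Dict[str, float]) -> float:
    dot = sum(v * b.get(w, 0.0) for w, v in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def leader_cluster(docs: List[str], threshold: float = 0.3) -> List[int]:
    """Assign each document to the first cluster whose leader is similar enough."""
    vectors = tfidf_vectors(docs)
    leaders: List[int] = []  # index of the first document in each cluster
    labels: List[int] = []
    for i, vec in enumerate(vectors):
        for cluster_id, leader in enumerate(leaders):
            if cosine(vec, vectors[leader]) >= threshold:
                labels.append(cluster_id)
                break
        else:  # no leader was similar enough: start a new cluster
            leaders.append(i)
            labels.append(len(leaders) - 1)
    return labels

headlines = ["mers patient confirmed at hospital", "hospital confirms new mers patient",
             "stock market falls on export data", "export data lifts stock market"]
print(leader_cluster(headlines))  # [0, 0, 1, 1]
```

A single pass keeps the cost roughly proportional to the number of articles times the number of clusters, which suits a crawler that adds articles continuously; the trade-off is that results depend on arrival order.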

Fig. 7. Automatic news article collection & classification system

To manage and search MERS-related news articles in the article database, a curation tool was developed. By applying natural language processing techniques, the tool extracts information from the articles as triplet graphs (subject/verb/object). A basic dictionary for analyzing the infectious disease response system was created from the extracted triplets. However, the information extracted by the curation tool is massive and complex, which makes it difficult to understand and interpret correctly.

To facilitate analysis, a tool was developed for visualizing the information at a specific time as a network graph (Fig. 8). All tools are integrated into a single platform to maximize the efficiency of the process.
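A time-specific network view can be built directly from the extracted triplets. This is a minimal sketch, assuming each triplet is stored with the publication date of its source article; subjects and objects become nodes, verbs are ignored here, and degree centrality (a node's number of distinct neighbors divided by n - 1) is used as one common indicator of how central an organization is in the response network. The actor names are hypothetical placeholders, not results; in practice a graph library such as NetworkX (`networkx.degree_centrality`) would serve the same purpose.

```python
from collections import defaultdict
from datetime import date
from typing import Dict, List, Set, Tuple

Triplet = Tuple[date, str, str, str]  # (article date, subject, verb, object)

def snapshot(triplets: List[Triplet], start: date, end: date) -> Dict[str, Set[str]]:
    """Undirected adjacency of actors linked by any triplet dated start <= d <= end."""
    adj: Dict[str, Set[str]] = defaultdict(set)
    for d, subj, _verb, obj in triplets:
        if start <= d <= end and subj != obj:
            adj[subj].add(obj)
            adj[obj].add(subj)
    return dict(adj)

def degree_centrality(adj: Dict[str, Set[str]]) -> Dict[str, float]:
    """Distinct neighbors of each node divided by n - 1."""
    n = len(adj)
    if n < 2:
        return {node: 0.0 for node in adj}
    return {node: len(nbrs) / (n - 1) for node, nbrs in adj.items()}

# Hypothetical triplets with placeholder actors.
triplets = [
    (date(2015, 6, 1), "Agency A", "instructs", "Hospital B"),
    (date(2015, 6, 1), "Agency A", "supports", "City C"),
    (date(2015, 6, 2), "City C", "reports to", "Agency A"),
    (date(2015, 7, 20), "Agency D", "announces", "Hospital B"),
]
june = snapshot(triplets, date(2015, 6, 1), date(2015, 6, 30))
print(degree_centrality(june))
```

Computing the same centrality for a network derived from the official manual and for one derived from the news makes the two directly comparable, which is the kind of contrast drawn in the following paragraph.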

Fig. 8. Visualization of the network graph at a specific time

Social network analysis of the official crisis response manual for infectious diseases indicated that the National Security Bureau (NSB) and Public Health Centers play as large a role in crisis management as the Korea Centers for Disease Control and Prevention (CDC); in the news articles, however, the NSB was rarely mentioned. It was also found that the CDC and the Central Disaster Response Headquarters, the official government organizations responsible for infectious diseases, as well as the Central MERS Management Countermeasures & Support Headquarters, a temporarily established organization, did not play an important role in the response to the MERS outbreak. Instead, the Ministry of Health and Welfare, medical institutions, and local governments played central roles. This suggests that the command-and-control and communication structure of the official response system has a decisive influence on cooperation in an actual crisis response. These results provided concrete information on the role of each responding organization and on the communication system, which previous studies based on interviews and surveys had not revealed.

Much machine learning research has been criticized for emphasizing methods from the outset rather than focusing on data reliability.

This study is based on a KB built by policy researchers who manually analyzed news articles and tagged them to prepare the basic data. This provides a basis for improving the reliability of results when text mining is performed through machine learning.

By using text mining techniques and social network analysis, it is possible to get a comprehensive view of social problems such as the occurrence of infectious diseases by examining the structure and characteristics of the response system from a holistic perspective of the entire system.

Based on the results of this study, new policies for infectious disease control are suggested in the following directions: (1) strengthen cooperation networks in early response systems for infectious diseases; (2) develop new, effective, and efficient plans for managing cooperative networks; and (3) extend the research to other diseases such as avian influenza and SARS [6].

3 Convergent Approaches and Open Informatics Platform

A persistent obstacle for the traditional social sciences in addressing social issues is the difficulty of obtaining evidence from massive data for verifying hypotheses and theories. Data science and AI can ease this difficulty and support social science by discovering hidden contexts and new patterns of social phenomena through low-cost analyses of large data sets. On the other hand, knowledge and patterns derived by machine learning from large, noisy data sets often lack validity. Although data-driven inductive methods are effective for finding patterns and correlations, they are clearly limited in discovering causal relationships.

Social science can help data science and AI by interpreting social phenomena through humanistic literacy and social-scientific thought to verify theoretical validity, and identifying causal relationships through deductive and qualitative approaches. This is why we need convergent-scientific approaches for social problem solving. Convergent approaches offer the new possibility of building an informatics platform that can interpret, predict and solve various social problems through the combination of social science and data science.

In all five pilot studies, the convergent-scientific approaches were found to be valid and sound. Most of the research agendas involved collecting data and building spatio-temporal databases in real time, and visualizing the analytic results. Such visualization opens new possibilities for data interpretation. The data sets and the tools for data collection, analysis, and visualization are integrated into an informatics platform so that they can be used in future research projects and policy debates.

The research was the first transdisciplinary attempt in Korea to converge the social sciences and data science. This approach will offer a breakthrough in predicting, preventing, and addressing future social problems. As a pioneering methodology, it will break new ground for a transdisciplinary research field converging data science, AI, and the social sciences. The data, information, knowledge, methodologies, and techniques will all be combined in an open informatics platform, which will be maintained on an open-source basis so that it can serve as a hub for academic research projects, business activities, and policy debates (see Fig. 9). The Open Informatics Platform is planned to be expanded in the future to incorporate citizen sensing, in which people share their observations and views via mobile devices and Internet services [7].

Fig. 9. Structure of the informatics platform

4 Conclusions

In social problem solving, fundamental problems have complex political, social, and economic aspects rooted in human nature. Both technical and social approaches are essential for tackling them; in fact, it is an integrated, orchestrated marriage of the two that will bring us closer to effective social problem management.

We need to first study and carefully define the indicators specific to a given social problem or domain. There are many qualitative indicators that cannot be directly and explicitly measured such as social emotions, basic human needs and rights, and life fulfillment [8].

If the outcomes that machine learning is meant to produce are difficult to measure, or are combinations of outcomes that are difficult to define, the social problem in question may not be suitable for machine learning. There is therefore a need for new social methods and algorithms that can accurately collect and identify measurable indicators from the opinions of the people who need solutions. Researchers at MIT have recently developed a device to measure social signals quantitatively: a small, lightweight wearable whose sensors record the wearer’s behavior (physical activity, gestures, the amount of variation in speech prosody, etc.) [9].

Machine learning technologies that work on existing data sets are relatively inexpensive compared with conventional million-dollar social programs, because machine learning tools scale easily. However, they can introduce bias and errors depending on the data used to train the models, and their outputs can be misinterpreted. Human experts are always needed to recognize and correct erroneous outputs and interpretations in order to prevent biased decisions [10].

In the development of AI applications, a great deal of time and resources is required to sort, identify, and refine the massive data needed for training. For instance, machine learning models may need millions of photos to learn to recognize specific animals or faces, whereas humans can recognize visual cues after seeing only a few photos. Perhaps it is time to develop new AI frameworks that can infer and recognize objects from small amounts of data (e.g., transfer learning [11]), generate missing data (e.g., generative adversarial networks, GANs), or integrate traditional AI technologies such as symbolic AI with statistical machine learning.

Machine learning excels at prediction, but many solutions to social problems do not depend on prediction. Studying how solutions to specific problems actually unfold under new policies and programs can be more practical and worthwhile than building a cure-all machine learning algorithm. Although AI is evolving at a stunning rate, challenges to solving social problems remain, and further research on integrating social science and AI is required.

A world in which artificial intelligence actually makes policy decisions is still hard to imagine. Considering the current limitations and capabilities of AI, AI should primarily be used as a decision aid.