In our current “age of data”, Artificial Intelligence (AI), Machine Learning (ML), Data Science (DS), and analytics are becoming part of problem-solving and decision-making in many areas, ranging from recommendations for movies and music to medical diagnostics, the detection of cybercrime, investment decisions, or the evaluation of military intelligence (e.g., McAfee & Brynjolfsson, 2012). These methods can be used because an abundance of information is collected and made available. The tools for analyzing such information are also becoming widely accessible, and their use has become easier with platforms such as BigML. While in the past statisticians or data scientists were in charge of the analytics process, now anybody with some basic computing skills can conduct analyses with R or Python, using open-source tools and libraries.

These developments form the basis for new insights into, and a new understanding of, social and physical settings. They also alter the decision processes used by organizations and the information that is available to individuals. As such, they affect reality, its representation in digital records and the media, and the ways people interpret this reality and act in it. The dynamic interaction between the physical, digital, and social realms shapes current societies. Understanding and modeling it is a major challenge for both data science and the social sciences.

Data analytics, and the information one can gain from them, can be used in decision-making processes, in which they help to choose among possible alternatives. Algorithmic decisions can be advantageous in legal contexts, such as bail decisions (Kleinberg, Lakkaraju, Leskovec, Ludwig, & Mullainathan, 2018). In medical settings, the development of personalized evidence-based medicine for diagnostic or treatment decisions (Kent, Steyerberg, & van Klaveren, 2018) depends on analyzing electronic medical records with data science tools. AI-based analyses in medicine can indeed improve diagnostic or therapeutic decisions (Puaschunder, Mantl, & Plank, 2020). Similarly, algorithms in financial markets, implemented as algorithmic advisors or in algorithmic trading, can provide clear benefits (Tao, Su, Xiao, Dai, & Khalid, 2021).

Together with the large potential benefits for decision-making one can derive from data science, there are also potential dangers. For instance, in medicine, clinical decision support systems can exacerbate the problem of alarm fatigue by generating numerous alarms that have limited clinical importance, or they can have a negative effect on physicians’ or nurses’ skills if the medical staff learns to rely on the support and does not practice independent decision-making (Sutton et al., 2020). In financial markets, algorithmic decision-making can also be problematic, causing possible systematic anomalies, such as flash crashes (Min & Borch, 2022).

Decision Quality and Data

The desire to improve decision-making is often the rationale for information collection and for making this information available. A major premise in research on decision-making is that the quality of a decision depends on the quality of the information on which it is based (Raghunathan, 1999). Ideally, information should provide the decision-maker with as accurate a picture as possible of the expected results from choosing one rather than another course of action, given the conditions in which the decision is made, the developments over time that will occur, and any other factors that need to be considered. This will depend on the properties of the available information and on the decision-maker’s understanding of the causal processes that determine outcomes.

While data science was mainly developed in organizational contexts, such as business administration, transportation, or medicine, the notion exists that data can also be used by individual citizens or households. Access to data can help them, for instance, decide on investments based on the analysis of relevant economic variables. Data can also help in choosing a neighborhood where one wants to live, depending on information about the education system, crime levels, scores of individual happiness, or other relevant variables. This view, together with the ease of collecting and making data available, led to the idea that citizens should have access to data to use it to make informed decisions (e.g., Marras, Manca, Boratto, Fenu, & Laniado, 2018).

If one takes the notion that the quality of the data determines the quality of the decisions to an extreme, one could argue that appropriate analyses of the data make decision-making unnecessary. The results of the analysis point clearly to the alternative that should be chosen. This is indeed implemented, to some extent, in contexts in which algorithms make most decisions, such as algorithmic trading in financial or other markets (Virgilio, 2019).

The optimistic view of the value of data is not limited to decision support. The claim has been made that with the emergence of data science, the availability of large volumes of data, and the development of very efficient algorithms to analyze the data, there will be an end of theory (Anderson, 2008). One does not need theories anymore to explain phenomena, but rather one can simply look at the data to understand a phenomenon. Some observers may consider this as a step forward from the conundrum that is caused by the multiple theories of social phenomena that often have relatively limited predictive value and the replication crisis that plagues, for instance, psychology (Jack, Crivelli, & Wheatley, 2018). So far, however, this expectation has not received any support.

The availability of data may support a better understanding of the world that can be used for policy, organizational or individual decision-making. It can also be formalized in scientific generalizations regarding social phenomena. These developments may provide major opportunities for technological, economic, social, or intellectual progress. However, some caution may be warranted when considering these possible developments, and specifically, the hope that algorithms can help people make better decisions.

In the following sections, I will first show that automating decision-making has great potential. However, human involvement in the decision processes may be difficult to implement or may, at times, be practically impossible. This does not mean that there is no need for human involvement. I will argue that human involvement is crucial for understanding the processes that create the data that are input for the analyses and generate the results.

The Human Role in Decision-Making When an Intelligent System Is Involved in the Process

Any analytics-based decision support an organization wants to implement needs to be integrated into the decision processes the organization (or an individual) uses. Specifically, the organization must decide on the appropriate use of the information from the decision support. To what extent should decision-makers (such as physicians who need to make diagnostic or treatment decisions) rely on the information an algorithm provides, and when can they override it? For the decision support to be useful, it needs to be good, that is, the quality of the recommendations should be similar to or better than decisions made by people without the support. There are indeed decision support systems that reach such a level of performance, for instance, in the AI-based detection of early-stage breast cancer (McKinney et al., 2020). However, when introducing decision support, it is unclear how humans should be involved in the decisions. Three forms of human involvement in decisions turn out to be problematic.

First, it is often suggested that the AI output should serve as support for the human decision-maker, a notion captured by the term decision support. When decisions are relatively clear, such as the decision whether a lump is a malignant tumor or not, the output of the decision support can replace the human decision-maker if the decision support is better than the human. It is problematic to assume that we can simply provide decision-makers with the output of the decision support, and they will be able to integrate it correctly into their decision. To do so, they must assign appropriate weights to the information they have and the additional information the decision support provides. Empirical research on people’s ability to use decision support shows consistently that people often assign nonoptimal weights to information from different sources. They tend to give too little weight to better information sources and may assign excessive weight to bad information sources (Meyer, Wiczorek, & Günzler, 2014). Also, when the human and the automation differ in their ability to perform the detection task, it is very difficult to improve the performance beyond that of the better of the two acting alone (Meyer & Kuchar, 2021).
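The difficulty of weighting information sources can be made concrete with a small, purely hypothetical sketch. For two independent, unbiased estimates, the statistically optimal weights are proportional to the inverse of each source's error variance; deviating from them (e.g., weighting both sources equally) can yield a combined judgment that is worse than simply using the better source alone. The variances below are invented for illustration and are not taken from the cited studies.

```python
def combined_variance(var1, var2, w1):
    """Variance of w1*x1 + (1-w1)*x2 for two independent, unbiased estimates."""
    w2 = 1.0 - w1
    return w1**2 * var1 + w2**2 * var2

# Hypothetical accuracies: the human's cue has error variance 1, the aid's cue 4.
var_human, var_aid = 1.0, 4.0

# Optimal (inverse-variance) weight on the better source
w_opt = (1 / var_human) / (1 / var_human + 1 / var_aid)   # 0.8

print(combined_variance(var_human, var_aid, w_opt))  # 0.8: better than either alone
print(combined_variance(var_human, var_aid, 0.5))    # 1.25: equal weights are worse
                                                     # than the better source alone
```

The second line of output illustrates the point in the text: with nonoptimal weights, the combined judgment (variance 1.25) is worse than relying on the better source by itself (variance 1.0).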

Second, it is also unrealistic to assume that people can adjust the parameters of the automation to make it work better. Here, too, empirical research has shown that people often set incorrect system parameters, especially if they don’t get the optimal information for setting the parameters (Botzer, Meyer, Bak, & Parmet, 2010). Furthermore, the number of observations needed to determine the correct setting of a system parameter is often so large that it is simply impossible for a person to collect sufficient information to determine the setting (Meyer & Sheridan, 2017). Thus, one can either specify rules on how parameters should be adjusted (which can then be easily automatized), or one can use fixed parameter settings. In both cases, human involvement is unnecessary.

Third, a widely held approach demands that a human remain involved in, and responsible for, the final decision. This demand appears, for instance, in the discussion of autonomous lethal weapon systems or in the protection of citizens from purely algorithmic decisions, as required by Article 22 of the EU General Data Protection Regulation (Roig, 2017). It may also be unrealistic. A system that is better than the human decision-maker in a decision task will lower the human involvement in the task and the human responsibility for outcomes (Douer & Meyer, 2020, 2021). Consequently, it may seem that humans have no actual role in the decisions once good AI-based algorithms can support them.

The development of processes that rely on algorithms without human involvement may not be bad. Meehl already showed in 1954 that statistical predictions (namely predictions based on statistical tools, such as linear models) are better than clinical predictions, the predictions made by human experts (Meehl, 1954). This conclusion has been consistently replicated (Dawes, Faust, & Meehl, 1989; Grove & Lloyd, 2006). Furthermore, there may be an inherent tendency to avoid information from algorithms, which may lead to the nonoptimal use of algorithmic decision support (Dietvorst, Simmons, & Massey, 2015). Thus, algorithmic decisions are potentially better than human decisions, even if high-quality algorithmic decision support is available to human decision-makers.

The Analytics Process as a Human Activity

A simplistic view sees data science as a way to reach insights and to make decisions that are as objective, evidence-based, and “mathematically correct” as possible. However, a closer look at the process by which results are obtained reveals that matters are more complicated. In fact, any analytics process involves a sequence of choices and decisions made by people throughout the process (see Fig. 3.1 for a schematic depiction). Some choices may simply be based on the analyst’s intuition or habit, may follow a default option, or may use a convention in the field. In contrast, other decisions may result from carefully weighing the advantages and disadvantages of different courses of action, based on systematic analyses and an understanding of the specific problem.

Fig. 3.1
A flow diagram of the data science process: world → locate records → data → data selection → selected data → data preprocessing → data for analyses → prepare analyses → algorithm → define output → analysis output → interpret results → conclusions.

The data science process. Source: Design by author

Decisions are made at all points at which there are arrows in the figure. At each point, the person performing this part of the analytics process (who may differ from the people who perform other parts) must select one of a number of possible alternatives. It is important to analyze these selections because they may strongly affect the results obtained in the analyses. So far, this issue has gained relatively little attention. However, studies have shown that different groups of data scientists may reach very different conclusions when analyzing the same data set.

Any analytics process that is related to decision-making begins with some questions the process is intended to answer. The posing of the questions results, of course, from decisions. The process itself begins with creating the data set that will be analyzed. First, relevant records need to be located. Data sources can be, for instance, patients’ electronic medical records, court records, recordings in a call center, and so forth. An important part of the creative use of data science is coming up with possible sources of data that can be analyzed. The raw data must be adapted to serve as input for the analyses. It is necessary to select the specific data that will be analyzed. This includes definitions of the variables and the temporal and geographic limits of the data, that is, from how many years back and from which locations should data be analyzed? If data is collected over a large area, should one analyze all subregions or focus on specific regions? An analyst may, for instance, choose to ignore more rural parts and focus on cities. Certain subpopulations may also be excluded from analyses. For example, in Israel, ultraorthodox Jewish neighborhoods are very different in many respects from other neighborhoods. For instance, the use of smartphones is limited, and web browsing can be socially sanctioned. Consequently, their inclusion in some analyses may create biased results.

The raw data are combined into files that can be analyzed. These data then undergo preprocessing, in which they are cleaned, duplicate records and outliers are identified and possibly deleted, and so on. The definition of outlier values is in itself a decision the analyst needs to make. Some values are clearly faulty (a parent who is less than 10 years old, according to the birthdate on record), but others are less clearly outliers. Is spending 60% of one’s income on restaurants a legitimate value or an error in the data?
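The point that the outlier definition is itself a decision can be illustrated with a short, purely hypothetical sketch: the same record is kept under one plausible rule and dropped under another. All figures below are invented for illustration.

```python
# Is spending 60% of one's income on restaurants an outlier? The answer
# depends on which rule the analyst picks.
shares = [0.05, 0.08, 0.10, 0.12, 0.15, 0.18, 0.22, 0.60]

# Rule A: a domain-based cutoff -- the analyst deems anything up to 80% plausible
kept_a = [s for s in shares if s <= 0.80]

# Rule B: the conventional 1.5-IQR rule
ordered = sorted(shares)
q1, q3 = ordered[1], ordered[5]   # rough quartiles for eight observations
upper = q3 + 1.5 * (q3 - q1)      # 0.18 + 1.5 * 0.10 = 0.33
kept_b = [s for s in shares if s <= upper]

print(len(kept_a), len(kept_b))   # 8 7 -- the rules disagree about the 60% record
```

Neither rule is “correct”; the choice between them, made early in preprocessing, silently shapes every downstream result.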

After preprocessing the data, one must prepare the analysis by choosing the specific algorithm to use. One then actually runs the algorithm. This, too, requires choices, such as the definition of parameters. Every algorithmic tool is sensitive to certain properties of the data and less sensitive to others. Each tool is more likely to reveal certain phenomena and less likely to reveal others. Hence, the choice of the tool and the parameters are likely to influence the results.

For instance, in one study, 29 teams of data scientists received the same data set, with the aim of testing the hypothesis that soccer referees give more red cards to players with darker skin color than to players with lighter skin (Silberzahn et al., 2018). The teams used 21 unique covariate combinations in the analyses. About two-thirds of the teams found a significant effect in the expected direction, while one-third did not. Thus, the choice of the analytical method is by no means determined by the data and the research question.

The next step is defining the output of the algorithm, which can be presented in numerous ways, and the analyst must decide which one to use (Eisler & Meyer, 2020). The different presentation modes will make different types of results more or less salient. This may depend on the particular aspects the analyst, or those who requested the analysis, consider important and want to emphasize. For instance, the presentation of results by executives depends to some extent on the quality of a company’s business results. When business results are not very good, there may be a tendency to use more elaborate graphics (Tractinsky & Meyer, 1999).

At the end of the process, one reaches the interpretation of the results and the drawing of conclusions from them. Different people may focus on different aspects of the results, depending on the individual’s preferences, predispositions, tendencies, interests, and so on. One should also remember that only in academic or research settings are analytics done purely for their own sake. Beyond research, analytics serve some purpose: someone wants to make a decision, such as a clinical decision in medicine, a policy decision regarding municipal, regional, or countrywide policies, or a business decision in a company.

Thus, data science and data-based AI are complex processes, with decisions at numerous points along the way. All these decisions involve stakeholders, and the choices will depend to some extent on factors such as the beliefs, preferences, or the costs and benefits of the people involved in this process. The decisions determine and affect the course of the analytics process. They will affect what can be analyzed, the questions that can be asked, the tools that are used, and the insights that can be gained. It is of great importance to understand these decisions to create awareness of their possible impact on the outcome of the analytics process.

In a large-scale study (Botvinik-Nezer et al., 2020), 70 teams of scientists analyzed the same functional magnetic resonance imaging data set. No two teams chose the same workflow for the analyses, leading to large variability in the results the different groups reached. The results of a study on microeconomic causal modeling are similar. Different teams of analysts went through different analytics processes for the same data set, resulting from many decisions each team made that were not made explicit (Huntington-Klein et al., 2021).

This is not an argument against the use of data science in decision-making. Data science can definitely provide valuable new tools and methods to support decision-making. However, data-science-based decision-making is not without problems. Very often, the people who do data science come from a computer science or mathematics background. This does not necessarily prepare them for critical analyses of the analytics process. The decision support is then often evaluated in terms of the elegance of mathematical solutions or algorithms or the quantitative evaluation of algorithm output, compared to some benchmark, in measures such as precision and recall, the area under the curve (AUC), the F1 score, and so forth (Padilla, Netto, & da Silva, 2020).
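Metrics such as precision, recall, and the F1 score mentioned above are simple functions of the counts of true and false positives and negatives. The following sketch, with invented labels, shows how they are computed; in practice one would typically use a library such as scikit-learn rather than this hand-rolled version.

```python
def precision_recall_f1(y_true, y_pred):
    """Compute precision, recall, and F1 from binary labels (1 = positive)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp)          # share of flagged cases that are real
    recall = tp / (tp + fn)             # share of real cases that were flagged
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Illustrative labels: a hypothetical ground truth vs. an algorithm's output
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0]

p, r, f = precision_recall_f1(y_true, y_pred)
print(round(p, 2), round(r, 2), round(f, 2))  # 0.75 0.75 0.75
```

The point in the text stands regardless of the formula: such scores quantify agreement with a benchmark, but they say nothing about whether the benchmark, the data, or the question were chosen well.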

The output of an algorithm needs to be compared to some accepted measure of the reality it is supposed to reflect, what is often referred to as the “ground truth.” For instance, an algorithm that is supposed to predict complications in medical treatments needs to be run on data for which the occurrence of complications is known. The extent to which the algorithm correctly predicts which patients will experience complications indicates the quality of the predictions. The evaluation of algorithm output with statistical tests of its match to some “ground truth” creates the impression that the process is objective. However, seen from a somewhat critical perspective, data science is a human activity that is concerned with human actions. It is necessary to understand this activity to make adequate use of these methods for decision-making and the understanding of phenomena.

Considering the Data Generation Process

Human behavior and decisions affect not only the analytics process, in which data serve as input and conclusions are derived from a series of analytical steps. The data generation process itself is not a simple recording of events that occurred. The process that creates the traces that will eventually become the analyzed data often itself reflects human activities.

Any individual can observe only certain, very limited parts of reality. Information about other parts can be conveyed by other people (lore, tradition, teachings, gossip, social networks, news, etc.). Computers, and our digital age, create an additional level of complexity. One could argue that it is only a quantitative change from the past to the present, in which more information about events that are not directly observable is now available. However, there is also a qualitative change. We receive much information about the world (be it the physical or the social world) from digital representations.

This information, in turn, may affect our actions in the physical or social worlds (a navigation aid that guides cars may create congestion in certain places). A recommender system that informs us about a certain venue may affect our behavior and the subsequent physical reality.

The digital representation itself is not simply a partial reflection of reality. It also reflects the decisions and behaviors of the people who were involved in the collection of information and its recording. These decisions can be direct actions that affect the occurrence of recorded events, decisions regarding the recording (e.g., what is recorded), and decisions regarding the recorded data (categories, etc.).

A data-science-based decision process aims to base decisions on data, and the data provide a glimpse at the reality that will serve as the basis for the decisions. The analysis of the data is supposed to provide insights into this reality. The approach to reality can be seen as the interplay between three realms (see Fig. 3.2). There is an individual who observes the physical world and interacts with it. Parts of this physical world are other people, so interactions are also happening in a social context. Both the physical and the social realms may leave digital traces in the form of records of activities conducted in organizational settings, social media posts, recordings from sensors that are positioned in the environment (e.g., cameras) or carried on the person, such as the person’s cellphone that allows the recording of locations and communication activities. The output from the digital realm may affect social interactions and, to some extent, can even affect the physical reality, for instance, through responses to traffic advisory systems that direct vehicles according to traffic measurements.

Fig. 3.2
A cycle diagram depicting the interaction of the individual with (1) the physical world, (2) the social context, and (3) the digital representation.

The individual interacting with the interdependent physical, social, and digital realms. Source: Design by author

Individuals interact with all three realms. They act in the physical world, for instance, by purchasing certain goods or moving to a different location or by performing some physical activity. This is often done in close interaction with other people, such as family, neighbors, friends, colleagues, service providers, or people who have some other encounter, relation, or interaction with the person. These interactions are facilitated by digital means and create digital traces.

The records of an individual’s social interactions are becoming part of a digital representation of reality. These traces, in turn, will be the basis for data sets that can serve as input for analyses we may want to conduct to gain an understanding of reality. The data sets may contain records of the individual’s behavior or properties of the physical world or properties of the social context or properties of the interactions between individuals or between individuals and the social or physical realms. So we have a complex dynamic interplay between physical entities, social relations and interactions, and digital representations. To understand these multifaceted phenomena, combining qualitative and quantitative research approaches is often necessary. This is in line with the proposed combination of methods in the study of social networks (Glückler & Panitz, 2021), in which qualitative and quantitative methods are jointly used to study processes and properties of social interactions.

Big Data of Nonexisting Data

In this digital representation, we expect to find data that can be used to guide the decision-making process for which we do the data analysis. We expect the data to contain information that can improve decisions. However, we must keep in mind that digital representations reflect only a very limited part of the reality of the physical world, individual behaviors, or social interactions because only some physical events or social interactions are recorded.

A typical example of nonrecorded data is survivorship bias, where data are collected only on events that pass some selection process. For instance, Abraham Wald conducted airplane survival analyses as part of the Statistical Research Group (SRG) at Columbia University during World War II. He concluded that protection should be added in the places where few returning planes had been hit because, apparently, planes that were hit in these places, such as the engine or the cockpit, did not make it back to the airfield (Mangel & Samaniego, 1984). A similar story is told about the introduction of steel helmets in the British army in World War I. Supposedly, there was a demand to stop using steel helmets because, after they were issued, the number of recorded head injuries increased greatly. The reason was that soldiers who wore the traditional, nonsteel headgear were highly likely to be killed when hit in the head by shrapnel, so the number of injured was smaller. With the steel helmet, previously fatal injuries were no longer fatal, so soldiers ended up in the hospital instead. Simple analyses of these data could have led to misleading conclusions, such as that steel helmets make head injuries more likely.

Often, knowledge of the physical realm that is not represented in the data is also necessary. For instance, Twitter activities can be used as an indication of the strength of a storm. Such an analysis was applied to assess the effects Hurricane Sandy had on New York City when it hit in 2012 (Shelton, Poorthuis, Graham, & Zook, 2014). This was the strongest hurricane that had hit the New York City area in recorded history. There is indeed a strong correlation between Twitter activities and the strength of a storm, but there were very few Twitter activities in the areas in which the storm was the strongest. Two causes can explain the nonmonotonic relation between Twitter activities and storm strength. Both are related to the physical realm. One is that, very often, people flee an area after they receive a hurricane warning and are told to evacuate, so they will not tweet anymore from this area. A second reason may be that storms tend to topple cellular towers. So even if people remained in the area, they may not have been able to communicate, causing a decrease in communication activity in these areas.

These are examples of nonexisting data of existing events that result from a biased or partial recording of data. They are due to the physical properties of the data collection process or of the events that generate the data in physical reality. However, the selectivity of the data does not only depend on the external statistics of the physical properties of the world. It may also result from specific human actions that may create a somewhat partial view of reality. For instance, a study of credit card data in a country in which there was social unrest showed that the effect of the localized unrest (which mainly involved large demonstrations in specific locations in a metropolitan area) diminished with distance from the demonstrations, as expressed in the number of purchases and the amounts of money spent on a purchase (Dong, Meyer, Shmueli, Bozkaya, & Pentland, 2018). This effect was not the same for all parts of this society. Some groups of the population showed a greater change than others. However, when interpreting these results, we need to keep in mind that we have only partial data on the economic activities in this country during this era of unrest because we only have credit card data. People in this country also use cash, and the decrease in credit card purchases may only reveal part of the picture.

Another factor that affects the digital records of behavior that can be analyzed is the fact that some behaviors will be more easily recorded while others are less so. For instance, on social media, socially desirable and high-prestige behavior will appear more often in posts than less desirable behavior. Viewers, consequently, may feel that others are more engaged in these positively valued behaviors than they themselves are (Chou & Edge, 2012). Also, the digital image of the world that emerges from scraping social media data will present a biased view, possibly overrepresenting the behaviors people like to post about on the web. Any decisions made based on these data, for instance, concerning public investment in facilities for leisure activities or the development of product lines for after-hours activities, may be biased and misled by people’s tendency to post about some things and not about others.

Another example of the partial representation of the physical or social reality in data is demonstrated in Omer Miran’s master’s thesis (Miran, 2018). The study dealt with the analysis of policing activity in the UK, as expressed in the data the UK police uploaded to their website. Making police data openly available allows the public to monitor police activities. It also provides the basis for assessing the risk of crime in different areas. This can, for instance, help individuals decide where to live, rent or buy an apartment, and raise their kids.

The study aimed to determine the relative frequency of different types of crimes in different parts of the UK, where each part was defined by the specific police station that oversaw an area. The analysis combined information from several databases. The most important one was the UK police “crime cases database” for the years 2010–2015, in which crime incidents are recorded with relatively rough geographical information. A second database covered police stop and search activities for the year 2014, also downloaded from the UK police site; here, the location at which a person was stopped is also recorded. Two other databases were from the UK Office for National Statistics and included the population size and the average weekly income for different locations.

The analysis focused on two different types of crime—burglary and drug-related crime. In a burglary, one or more people enter a location (a house, business, etc.) without permission, usually with the intention of committing theft. One can assume that a burglary will almost always be reported to the police and will appear in the records. Therefore, the number of burglary incidents in police records likely reflects the actual frequency of burglaries in an area.

The second type of crime was drug-related crime, such as drug deals. In this case, the people involved in the crimes will usually not report their occurrence. Consequently, a drug-related crime will usually appear in the police files only if the police make an active effort to detect it. Hence, the data on drug-related activities do not really reflect the volume of such activities in an area but rather the police activity in the area.

The analyses of the data showed that there was no correlation between the amount of police activity in an area (as measured by the number of stop and search events) and the number of burglary events (r = −0.047). However, there was a positive correlation between police activity and recorded drug-related crimes (r = 0.180). Thus, the two types of crime data indeed reflect somewhat different types of events, namely the activity of criminals (in the burglary data) and the activity of the police (in the drug-related crime data). These two types of activities can, of course, be correlated or can be related to other variables that characterize the location.
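Correlations of this kind are computed across areas, pairing each area's stop and search count with its crime count. A minimal sketch with made-up per-area numbers (not the study's data):

```python
import numpy as np

# Toy per-area counts; the values are illustrative only.
stop_search = np.array([120.0, 45.0, 300.0, 80.0, 150.0])
burglaries  = np.array([400.0, 380.0, 390.0, 410.0, 395.0])
drug_crimes = np.array([30.0, 10.0, 70.0, 20.0, 40.0])

def pearson_r(x, y):
    """Pearson correlation coefficient between two 1-D arrays."""
    return float(np.corrcoef(x, y)[0, 1])

# In this toy data, drug-crime counts track police activity closely,
# while burglary counts are essentially unrelated to it.
r_burglary = pearson_r(stop_search, burglaries)
r_drugs = pearson_r(stop_search, drug_crimes)
```

The toy numbers are constructed to mimic the qualitative pattern reported in the study: a near-zero correlation for burglaries and a positive one for drug-related crimes.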

The analysis of the police databases revealed additional clear differences between the picture of reality they provide and the actual reality. In the UK Home Office drug survey for 2013, 2.8% of adults aged 16 to 59 (280 out of 10,000) reported using illicit drugs more than once a month in the last year. Assuming, conservatively, that these people purchased drugs once a month, they were involved in approximately 12 * 280 = 3360 drug deals per year. In the UK police data set, the average number of drug-related crimes per year was about 28.7 per 10,000 people. Clearly, less than 1% of drug deals appear in the police data. This demonstrates the large potential gap between the image of reality that appears in the analysis of data and the actual reality this image is supposed to reflect.
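The back-of-the-envelope calculation above can be written out explicitly (the figures are those quoted in the text):

```python
# Gap between survey-based and police-recorded drug activity,
# per 10,000 adults, using the figures from the text.
users_per_10k = 280           # adults reporting monthly illicit drug use
deals_per_user_per_year = 12  # assumption: one purchase per month
recorded_per_10k = 28.7       # recorded drug-related crimes per year

estimated_deals = users_per_10k * deals_per_user_per_year  # 3360
recorded_share = recorded_per_10k / estimated_deals        # under 1%
```

Even under the conservative one-purchase-per-month assumption, the recorded share comes out below one percent of the estimated deals.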

Conclusions

The availability of data can have great value for decision-making. For instance, data-based decisions may lower the effects of biases due to faulty preconceptions or naïve beliefs. Also, many processes, such as controlling large-scale networks or high-frequency trading in financial markets, are only possible with algorithms and must rely on data.

The use of data science and AI in decision-making can often provide valuable information, but the process is not without potential problems. One needs to keep in mind that the data analysis process is a human activity that involves numerous decisions along the way. Each of them affects the following steps in the process and the eventual outcome. It is important to monitor these decisions and to test the sensitivity of the conclusions to specific changes in the decisions made during the process. Furthermore, the analytics process often concerns human activities. The records these activities generate depend on the decisions of those who do the recording and, to some extent, on the people whose behavior is recorded.

The development of data-based decision-making or decision-support tools requires a combined modeling effort. On the one hand, the usual analytics modeling process needs to proceed, aiming to generate models that can identify the preferable choices in different settings. A model in this context would be the output of the algorithm used for the analytics process, together with information about the quality of that output relative to some criterion. Often this quality information would come from testing a model, fitted on a training set of data, on a separate, independent data set, the test set. An additional output of the algorithmic process can be information on feature importance, identifying the relative importance of the different variables for predicting the outcome variable.
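The train/test evaluation and feature-importance outputs described above can be sketched as follows. This is a generic scikit-learn workflow on synthetic data, not the study's analysis; the features and the model choice are assumptions for illustration:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic data: the outcome depends mainly on the first feature.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))            # three hypothetical features
y = (X[:, 0] + 0.1 * rng.normal(size=500) > 0).astype(int)

# Fit on a training set, evaluate on a separate, independent test set.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X_train, y_train)

accuracy = model.score(X_test, y_test)    # quality on independent data
importances = model.feature_importances_  # relative feature importance
```

Because the synthetic outcome is driven almost entirely by the first feature, the importance output should single that feature out, illustrating how such tables point analysts toward the variables that matter for prediction.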

This should be accompanied by a modeling effort that develops more traditional social science models based on psychology, sociology, economics, or other disciplines. These models can describe the behavior involved in the analytics process itself (the choices made regarding the questions asked, the selection of the data, the preprocessing of the data, the choice of algorithms and their parameters, the presentation of results, their interpretation, and the implementation of the insights gained). The models can also describe the behaviors that generate the data being analyzed, as shown in the examples of drug-related crimes or social media posts during emergencies.

Thus, traditional modeling techniques and data science methods should be combined. Such a combination has the potential to improve decisions and the utilization of data. One can take several steps to achieve this goal. First, data scientists (who often have computer science, mathematics, or engineering backgrounds) should be trained in the social sciences. This would give them critical analytical skills that allow them to question the assumptions behind the analyses and the behaviors represented in the data. It would help data scientists detach themselves from the mechanistic process of taking input, running analyses, and interpreting the results only in terms of the input variables and the model output, with its feature importance tables and other output data. Analyzing results in view of theories from the social sciences can provide a deeper understanding of phenomena beyond what is possible with a-theoretical analyses.

In addition, interdisciplinary teams should analyze, evaluate, and implement the results of data science processes that are used in decision-making. The output of these processes needs to be critically assessed, and the value of the insights gained through the process needs to be estimated. It is important to determine how the information can actually be implemented in the operation of the organization. This requires sensitivity analyses that evaluate the procedures and their robustness.
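One simple form such a sensitivity analysis can take (an assumed approach, offered only as an illustration) is to recompute a statistic while leaving out one observation at a time, to see how strongly the conclusion depends on individual data points:

```python
import numpy as np

# Toy per-area data (illustrative values only).
x = np.array([120.0, 45.0, 300.0, 80.0, 150.0])  # e.g., police activity
y = np.array([30.0, 10.0, 70.0, 20.0, 40.0])     # e.g., recorded crimes

def corr(a, b):
    """Pearson correlation between two 1-D arrays."""
    return float(np.corrcoef(a, b)[0, 1])

# Leave-one-out robustness check: recompute the correlation with each
# observation removed in turn; a small spread suggests a robust estimate.
full_r = corr(x, y)
loo_rs = [corr(np.delete(x, i), np.delete(y, i)) for i in range(len(x))]
spread = max(loo_rs) - min(loo_rs)
```

If dropping a single area changes the correlation substantially, the conclusion rests on that area and should be reported with appropriate caution.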

A critical view of the analytics process and of the implementation of its results is particularly important because data-science-based decision support always depends on the particular data that served as input for the algorithm. Dynamic changes in the data may cause predictions to become less (or sometimes more) precise. The relevance of the data for the decisions may also change with time because options become more available or less expensive or because new alternatives arise.

We need to combine traditional social science methods, such as methods in economics, political science, geography, sociology, and psychology, with the methods used in analytics and data science. There should be a dynamic interplay between the two approaches to phenomena. The combined use of the two has the potential to create a synergy that can lead to better decision-making processes and better decisions. It can also provide insights into the dynamic shaping of reality, following the use of data science, and the effects human behavior has on the data science process.