1 Motivation for developing a new instrument for collecting information on job tasks

In recent years, the analysis of job tasks has become a field of growing scientific activity. Information on such tasks has been used to analyze various research questions, especially regarding changes in the overall structure of the economy and their implications for persons and firms. The most prominent examples include analyses of the potential importance of technological change for wage development and increasing wage inequality, the role of tasks for identifying jobs that bear a high risk of being transferred abroad or the importance of an occupation’s task composition for the job mobility of its incumbents.

Despite the strong and still growing importance of the task concept for various research questions in economics and the social sciences, the conceptual basis for measuring tasks (or more precisely, the task composition of jobs or occupations) seems to be somewhat underdeveloped. In general, two basic approaches for operationalizing job tasks may be distinguished. First, tasks may be identified with the help of expert judgments. This approach is used in the Dictionary of Occupational Titles (DOT), which is often employed to capture tasks in empirical analyses (Yamaguchi 2012; Poletaev and Robinson 2008; Autor et al. 2003). The downside of this approach is that the DOT only includes occupation level information. When merging this kind of information with micro-data, possible variations in job tasks among incumbents of the same occupation may remain undetected (cf. Autor and Handel 2013). As a consequence, it may not be possible to answer some specific types of research questions (e.g., questions requiring information on the homogeneity or heterogeneity of task profiles within occupations) when using this approach.

The second approach is to measure different job tasks directly as part of a representative population survey. This is done by Handel (2008, 2007), for example. In our opinion, this latter approach should be the more promising one, because at least in principle it ought to provide task information not only at the level of aggregate occupations but also at the individual level. However, the survey approach is not without potential weaknesses either. First and foremost, asking respondents about a highly complex concept such as their job’s task composition is far from trivial. As a result, doing it properly will be very time-consuming. Nevertheless, we would argue that the conceptual advantages of the survey approach more than compensate for the effort involved. Since the only German-language instrument currently available (featured in the BIBB/IAB and BIBB/BAuA studies),Footnote 1 did not meet all our requirements, in particular regarding its conceptual foundation and the ad-hoc procedures commonly used for assigning items to its “theoretical” dimensions,Footnote 2 we decided to develop a new instrument to measure job tasks. This instrument was included in the fourth panel wave of the National Educational Panel Study’s (NEPS)Footnote 3 adult stage (Adult Education and Lifelong Learning). Throughout this paper we will describe the instrument’s development and evaluate it using NEPS data.

In the following section, however, we begin by providing a detailed discussion of potential applications for the abovementioned task data in order to underscore the theoretical importance of the concept. In the third section, we outline the conceptual considerations guiding our development efforts. The fourth section features a detailed discussion of our instrument and the steps taken during its development. In section five, we evaluate the instrument using NEPS data to demonstrate that it may indeed be used to generate meaningful task profiles. We will show this comprehensively both on a more aggregate level (major groups of the International Standard Classification of Occupations) and for selected occupations. The final section provides a summary and a discussion of perspectives for future research using this instrument. In addition, the article has a detailed appendix, which includes all questionnaire items in their original German and translated English versions, as well as some basic information on distributions of item subscales from the NEPS survey, which—for reasons of brevity and readability—could not be included in the main part of the paper.

2 Applications for task-data

Arguably the most prominent research question that might be answered using job task information is the one connected to the discussion of wage development and the role that technological change may have played in this process. This line of research was inspired by an empirical observation: In many industrialized countries, most importantly the United States, different educational groups have been experiencing an unequal development of wages, which has led to increased wage inequality. There are, in fact, various approaches for explaining this development (for an overview of different explanatory approaches, see e.g. Kierzenkowski and Koske 2012; Lemieux 2008).

One of the early explanations, the original version of the skill biased technological change (henceforth SBTC) hypothesis, attributed these changes to technological development (e.g. Katz and Murphy 1992; Levy and Murnane 1992). It was assumed that technological change will increase the productivity of highly skilled workers more strongly than that of their low skilled counterparts, thereby increasing demand for workers with higher skills. Unless this increasing demand is offset by a larger number of highly skilled workers entering the labor market, this should result in increasing wages for this group of employees. Acemoglu and Autor (2011) call this the “canonical model” of SBTC.

However, soon it was argued that SBTC was not able to explain some more recent developments. Among these were: the diverging employment trends in different types of occupations, with a substantial increase in jobs requiring either a high level or a low level of education, accompanied by a simultaneous decline in the share of jobs with intermediate educational requirements (Acemoglu and Autor 2011: p. 1074). Moreover, the “U-shaped” development of wages, with those at the top end of the distribution gaining most strongly and especially those in the middle losing ground (cf. Autor and Dorn 2013 for recent data), did not fit well with SBTC. The same holds for the unequal development of residual inequality, which rose substantially in the upper half of the wage distribution (90–50 gap) while staying constant or even declining in the lower half (10–50 gap) (Lemieux 2006).

An alternative version of the SBTC argument was developed by Autor et al. (2003, henceforth ALM) and more recently expanded by Autor et al. (2008), Autor and Dorn (2013), and Autor (2013). It shared the central idea of the SBTC-approach—namely that technological progress should be the driving force behind changes in the demand for certain qualifications and, consequently, for the wages paid to the holders of such qualifications. The distinctively new aspect of this alternative approach was the notion that technological change, first and foremost computerization, had differing effects on different types of jobs. This is where job-tasks come into play, since the nature of these effects was assumed to depend on the specific tasks that incumbents of these jobs had to perform and on the degree to which technical means might substitute for human labor in the performance of these tasks (which is why the concept is often referred to as the “Task-Based Approach” or “Task Approach”, henceforth TBA).

Tasks were defined as routine if they could be performed more or less easily by computers or new types of (computerized) production technologies, and were considered nonroutine otherwise. Since routine tasks are disproportionally located in the middle of the occupational hierarchy, the TBA arguedFootnote 4 that this mechanism should result in reduced employment and lower wages for this middle group, whereas employment prospects and wages should improve (or remain stable, at least) in the higher and the lower qualified groups.

Even though the TBA led to a considerable amount of new research, for instance on the impact of technological change on job tasks and skill demand (Lindley 2012; Black and Spitz-Oener 2010; Ikenaga and Kambayashi 2010; Antonczyk et al. 2009; Dustmann et al. 2009; Goos and Manning 2007; Autor et al. 2006; Spitz-Oener 2006), or on the trends in task developments and their effects in different labor markets or educational systems (Fernández-Macías 2012; Goos et al. 2009), it is not undisputed and there are alternative approaches to explaining wage developments since the 1990s.Footnote 5

Some authors argue that specific developments or one-time events can explain or did at least substantially influence the observed development of wages. Thus Lee (1999), for example, suggested that in the United States the falling value of the minimum wage during the 1980s had a major influence on the development of wage inequality in the lower half of the income distribution. Although most authors seem to agree that this should have had some influence, there is disagreement regarding the consequences of this finding for the ALM version of SBTC. Some claim that this position cannot be reconciled with SBTC and thus constitutes a competing explanation (e.g. Card and DiNardo 2002). Autor et al. (2008), on the other hand, while agreeing that the reduced value of the minimum wage played some role in the development of wage inequality in the lower half of the income distribution during the 1980s, nevertheless argue that it should not be considered a proper explanation for the entire phenomenon, especially because it does not provide a rationale for the continuously rising wage inequality in the upper tail of the income distribution (90/50 wage gap).

A different perspective on the development of wage inequality is provided by Goldin and Katz (2008). They argue that the increase in inequality observed since about the late 1970s in fact coincides with another fundamental change: the slowing down of educational expansion (in contrast to earlier decades of the 20th century, which were characterized by both educational expansion and declines in wage inequality). Therefore, according to Goldin and Katz, changes on the supply side, namely a decreasing number of educated workers available, should be considered more important for the rising returns to higher education driving inequality than should be demand-side changes, as implied by SBTC.

Another explanation that refers to a specific development puts the focus of interest on the declining significance of labor unions (DiNardo et al. 1996; Card et al. 2004).Footnote 6 This argument is based on the earlier finding (e.g. Freeman 1980) that unions tend to have an inequality-reducing effect. Showing that such an effect is found mainly among male workers, Card et al. (2004) argue that it emerges because the wages of union members with lower qualifications benefit most strongly from unionization, whereas workers with higher qualifications may even experience lower wages than in non-unionized parts of the labor market. Thus unionization has—at least for male workers—an equalizing effect on wage dispersion, which obviously is reduced once unionization goes down (thus increasing inequality).

Without arguing in favor of or against any of these positions, we believe it is obvious that a proper assessment of workplace tasks is essential for testing the plausibility of the alternative explanations, particularly the alternative version of SBTC brought forward by ALM.

A related line of research that also depends heavily on the availability of job task information focuses on the offshoreability of jobs, defining offshoreability as the likelihood or the risk of a particular job being transferred abroad (Jensen and Kletzer 2010; Blinder 2009, 2006; Grossman and Rossi-Hansberg 2008). Gathering proper information on job tasks is crucial in this case as well, since at the core of this concept is the notion that a job is considered “offshoreable” if the tasks performed as part of it would, in principle, allow it to be moved abroad. This holds true regardless of whether such a move has or has not yet occurred (Blinder and Krueger 2009). Therefore, just as in the ALM version of the SBTC argument, offshoreability is not directly connected to qualification levels but should instead be considered a characteristic of a job’s task composition: It is the specific task composition of the so-called “impersonal service jobs” that is responsible for the fact that they may be carried out abroad, even though the qualification levels required to perform them may range from rather low (e.g. in the case of manufacturing workers) to rather high (e.g. scientists). In contrast, it is argued that it is completely impossible to export (rather low-skilled) “personal service jobs” such as that of a taxi driver or day care worker (Blinder 2006). More generally speaking, if a job requires face-to-face interaction with customers or suppliers, involves delivering or transporting products or materials that cannot be transferred electronically (e.g. mail, foodstuffs, or other kinds of tangible goods), or if there is a need for “cultural sensitivity” (as e.g. for writers or newscasters), it is not offshoreable. Yet up until now, there has not been any systematic investigation into which tasks will be most difficult and which will be easiest to offshore (Pflüger et al. 2010).

A third line of research for which gathering task information is essential makes the task composition of jobs itself the center of research interests. This allows for some completely new perspectives on job mobility by assuming that the potential for mobility between jobs will largely depend on how different they are (or more precisely, the extent to which they do not differ) in terms of their task profiles (Yamaguchi 2012; Fedorets and Spitz-Oener 2011; Gathmann and Schönberg 2010; Janßen and Backes-Gellner 2009). Moreover, one might compare the core task requirements of a job—that is, the main activities that workers must accomplish in their work—and then consider the set of formal and informal skills required to carry out these tasks. This line of research has shown that tasks vary substantially within and between occupations, are significantly related to workers’ characteristics, and are robustly predictive of wage differentials both between occupations and among workers in the same occupation.

3 Conceptual considerations

There have been various attempts to adapt the TBA to the German situation. Methodologically, one of the main differences between these attempts and ALM’s own approach is that while the latter identified task characteristics of occupations using expert-based job descriptions from the DOT, the former approaches are mostly based on survey information from the BIBB/IAB and BIBB/BAuA studies, which up until recently have been the only major studies in Germany to feature information on job tasks. These studies include a variable number of items (e.g. 13 in the 1998/99 and 17 in the 2005/06 study).Footnote 7 These items, for one, often do not refer to individual tasks but to a number of tasks (e.g. “accommodating, serving, or caring” or “advertising, public relations, marketing, acquisitions” in 1998/99, or “securing, protecting, guarding, monitoring, directing traffic” in 2005/06; for details as to why this might be a problem, see the discussion in the following paragraph). Moreover, they are not consistent over time, since it is not necessarily the same set of tasks that are bundled together in an item in different panel waves.

Even though the BIBB/IAB and BIBB/BAuA data are the ones most commonly employed to analyze job tasks, they are not without shortcomings.Footnote 8 First and foremost, since this dataset was developed as early as the late 1970s, it was not designed to measure job tasks in the way they are conceptualized in the theoretical framework of the TBA. Allocations of survey items to the categories defined by ALM have usually been more or less ad hoc, a fact which can be seen, for example, from the different numbers of items assigned to measure these categories (see the overview in Spitz-Oener 2006: p. 243). In addition, items used to measure job tasks are often not sufficiently general in nature but feature tasks or combinations of tasks that are typical of some (common or prevalent) occupations. As a result, they might capture task content more adequately for these occupations than for others.Footnote 9 Therefore, providing a new instrument that has a stronger theoretical foundation and is less tailored to capture occupations with only specific tasks or task-bundles would in our opinion be a project worth undertaking.

There were several requirements and restrictions we had to adhere to when developing our instrument, some of which were of the conceptual kind and some of which were more practical in nature.

The most important conceptual requirements were the following. First, we wanted the questions to be sufficiently abstract, meaning that the task content addressed should be generally valid and not too close to any particular job profile. Second, the instrument should be equally suitable for all kinds of workplaces. Therefore, we concentrated on general job tasks rather than occupation-specific tasks. And third, item formulations should not refer to competencies or skills required to perform a job, nor to a subjective estimation of one’s own competencies, but to objective aspects of job tasks (see the discussion in the following paragraph for details).

The main practical restrictions were that we wanted the questions to allow for administering interviews by telephone (CATI), which meant that item formulations should not be too complex and that response scales should feature a somewhat restricted number of response categories. In addition, since the instrument was to be implemented in a multi-topic survey, we could allow for only a limited number of items (48 items maximum). Last but not least, we wanted the items to be arranged in subgroups so that the instrument would allow for the calculation of subindices (e.g. for analytic or interactive tasks), which in turn could be combined to represent an occupation’s task profile.

3.1 Tasks vs. skills

As a first step, we need to clarify the concept of tasks, especially distinguishing it from the concept of skills. Both are closely connected but are by no means the same. Most generally speaking, a task is defined as “a unit of work activity that produces output (goods and services)” whereas a skill is “a worker’s endowment of capabilities for performing various tasks” (Acemoglu and Autor 2011: p. 1045). Tasks to be performed, therefore, are a feature of actual jobs or workplaces and might change as the latter are changing, for example (and most prominently) due to technological change (as e.g. noted by ALM), whereas skills are held by individuals performing such tasks. A job’s task profile and its incumbent’s skills may coincide, but the incumbent may also lack at least some of the skills necessary to perform the required tasks; likewise, he or she may have skills that are not necessary to perform job tasks (resulting in under- or overqualification, respectively). Throughout this paper, we are interested in the tasks respondents are required to perform in their given job, and it is these tasks that shall be captured by our survey instrument, not the skills or competencies respondents may or may not have in order to perform these tasks.Footnote 10

3.2 Theoretical baseline

From a theoretical perspective, we will mainly draw on the TBA developed by ALM, since this approach, as we have seen above, is at the heart of their alternative version of the SBTC argument. In the context of SBTC, the TBA was developed to categorize different types of workplaces in a way that allows for predicting the impact of technological change (in particular, computers and computer-controlled machines and their increasing availability at constantly falling prices) on these workplaces. In order to do so, ALM use two conceptual dualisms. First, they distinguish routine from nonroutine tasks. In this distinction, routine tasks are characterized by the fact that they follow precise, well-understood procedures, which is why they can be (and increasingly are) codified in computer software and performed by machines. It is important to note that this concept of routine does not necessarily coincide with the colloquial meaning of the word, which especially in the German literature seems to have resulted in some conceptual ambiguities. Whereas the colloquial meaning of routine implies that an activity has become habitual as a result of repeating it over and over again, the TBA basically uses “routine” as a synonym for “might potentially be replaced by (more or less complex) technological means” (for a more detailed discussion of this problem, see Section 3.3 below or, in a similar vein, the short discussion in Autor and Handel 2009, especially pp. 20ff.).

ALM’s second distinction is between cognitive tasks (meaning analytic and interactive, at some points also called “information processing” tasks) on the one hand and manual tasks on the other. For building their categories, ALM first combine these two distinctions (resulting in categories such as “routine manual”). In addition, for nonroutine cognitive tasks they distinguish between analytic and interactive ones, which finally results in five categories of workplace tasks: (1) nonroutine analytic, (2) nonroutine interactive, (3) nonroutine manual, (4) routine cognitive, and (5) routine manual. For each of these categories, they can identify quite different impacts of technological change and, as a consequence, different prospects for wage development (cf. Table 1 for details).

Table 1 Categories of workplace tasks according to Autor et al. (2003)

Nonroutine analytic tasks are defined as tasks that require highly specialized knowledge and the ability to solve problems using abstract thinking, whereas nonroutine interactive tasks involve complex interpersonal communication, such as negotiation, management, and consulting activities. Routine cognitive tasks, by contrast, include simple clerical tasks that can be accomplished following explicit rules. The main criterion for differentiating nonroutine from routine manual tasks is whether they involve service tasks that require the worker to visit customers “in person” at their homes (e.g. plumbers, painters or other craftsmen), or whether the job consists of activities that require a flexible response to particular situations (e.g. drivers).Footnote 11

3.3 Capturing routine

Routine, as defined by ALM, is a concept that in our opinion is quite difficult to capture in a survey. After all, one cannot simply ask respondents whether their job might just as well be done by a computer or some other machine. In addition, there are two further difficulties. First, the ALM concept of routine, as mentioned above, is at least in part inconsistent with the everyday definition of routine. A second difficulty is that the concept is not stable over time in terms of content. Although this instability should, in our opinion, be considered a core strength of the concept from a theoretical point of view, it constitutes a major difficulty for developing a panel instrument, which by definition aims at asking the same questions in repeated panel waves.

We will briefly elaborate on these points by referring to an example that has for quite some time been a standard reference for a nonroutine job: the driver of a motor vehicle (see Polanyi 1966; Autor et al. 2003). Usually this job is considered nonroutine because, even though it is a commonplace, everyday task, the procedures by which it is actually accomplished are not sufficiently understood to write computer routines that could replace it. Nevertheless, we would argue that most people would consider such a task “routine” simply because it is a commonplace, everyday task. This would be particularly true when thinking of somebody driving the same route every day, as the driver of a bus or a delivery van might do. Our argument, then, is that when asked about routine tasks, most respondents will classify jobs as routine according to their task repetitivenessFootnote 12 and that “routine” in that sense is not the same as “routine” in our theoretical definition, that is, potentially replaceable by technology (for a detailed discussion, see also Levy and Murnane 2004: pp. 41).

The example of the car driver also allows for illustrating the aforementioned instability of the concept. Even back in 2003, it went without saying that—given ALM’s definition of routine—the driver’s job should be considered nonroutine. Not even a decade later, however, as Brynjolfsson and McAfee (2012: pp. 12 ff.) have discussed in detail, this has changed fundamentally, as working prototypes for vehicles operating without a driver hit the road. Thus even though driving might not be considered a routine task quite yet, we will most likely see it turned into one in the foreseeable future, when these prototypes will have evolved into finished products that are available for purchase.

It is indeed a strength of the “routine” concept that it may easily account for such rather dramatic changes without requiring revision. This flexibility comes, however, at the cost of making the concept rather abstract. It is this high degree of abstraction that, in addition to the aforementioned conflict with the everyday meaning of the word, makes capturing “routine” by means of survey items a challenging enterprise.

Instead of facing this problem head-on, we decided to define routine ex negativo, that is, by asking respondents whether their jobs were in some ways nonroutine. For that purpose, we focused on two job aspects we assumed should at least complicate their substitution by computers or machines. First, it should be more difficult to replace jobs that involve rather complex tasks in the sense that they require incumbents to react to unforeseen situations, to learn new things, or to deal with problems. A second dimension that should make a job hard to replace by technological means is autonomy. As Bresnahan et al. (2002) and others have argued, SBTC should not simply be considered a process in which firms invest in information technologies but rather as one in which the growing use of IT, organizational changes (resulting in higher discretion of workers and requiring higher levels of cognitive skills, flexibility and autonomy; ibid. pp. 345ff.), and changes in the companies’ products and services are intertwined.

What needs to be considered when discussing autonomy, however, is that there are two different perspectives from which to engage in such a discussion. First, from an organizational perspective, autonomy is a property of a firm’s work organization. By this we mean that internal procedures need to be organized in ways that allow employees to actually make autonomous decisions in the first place. So even though technological developments (like e.g. databases that grant easier and more comprehensive access to information) and the potential they provide as well as their requirements regarding incumbents’ abilities might complement the development of job autonomy, organizational choices should be a core factor for its development.Footnote 13 On the other hand, though, and independent of autonomy’s origin, from the perspective of the individual employee, autonomy brings with it a whole new set of tasks, such as defining goals, organizing the work necessary to achieve these goals (and doing so effectively), and so on. Dealing with these tasks requires a specific set of skills, which are not identical to general problem-solving skills. Moreover, and actually more important in the context of our current discussion, defining goals (for oneself or probably for others, too) and developing strategies to achieve these goals are tasks that should be particularly difficult to be carried out by computers. Thus having to perform such tasks should also be an important indicator of a nonroutine job.

3.4 Summary: a survey based approach to capture job tasks

Throughout this section, we discussed in detail the conceptual background of our instrument. The core decision we made right at the start was to choose the TBA from among all the different perspectives on job content as the most relevant concept for our development effort.

In addition, we made two decisions that had a strong influence on our concept. First, we decided to capture analytic and interactive tasks as separate dimensions (instead of focusing on a single cognitive dimension). The second decision concerns the way we capture the routine dimension. We argued that it might be easier to focus on specific job aspects that make those jobs nonroutine, because the concept of routine as defined by ALM (i.e. tasks that can be replaced by computers) is substantially different from everyday concepts of routine (which mainly refer to repetitiveness) and difficult to transfer directly into proper survey questions. Moreover, we argued that it might in particular be jobs involving complex tasks and jobs that are characterized by autonomy tasks that cannot be easily replaced by computers or other means of technology.

The resulting model of job contents to be measured includes five dimensions and is illustrated in Fig. 1. We included separate item groups for interactive and analytic tasks, as well as for (non-)routine tasks (task complexity) and autonomy tasks. The fifth dimension is directed at manual aspects of the job, which we operationalized by focusing on physical requirements.

Fig. 1 Dimensions of job tasks captured by the survey module

We hope that by adequately capturing these five task dimensions of jobs, we will have sufficient information to allow for detailed analyses of various research questions concerning, for example, the changing task content of occupations in general and the impact of technological change on job tasks in particular, or the relation between job tasks performed and individual competencies or skills (and its development over time).

4 Operationalization and development of the instrument

In this section, we will provide a detailed discussion of the empirical implementation of the task dimensions identified above. In many instances, we opted for adapting existing items instead of developing all items from scratch. Thus even though we can by no means take full credit for the resulting instrument (though we surely bear full responsibility), we would think that the major contribution on our part lies in adapting and combining these items in such a way that they can be considered a concise implementation of our conceptual model as described above.

Among the major sources we drew on is the Survey of Workplace Skills, Technology, and Management Practices (STAMP, cf. Handel 2007, 2008), which we used in many instances, but most importantly as a blueprint for operationalizing analytic tasks.Footnote 14

4.1 Instruments to measure analytic tasks

Analytic tasks are tasks which involve thinking or reasoning. Examples are reading, writing, or calculating. In the theoretical conception by ALM, analytic tasks constitute one sub-category of cognitive tasks (interactive tasks being the other). In our approach, we tried to keep both aspects separate and capture them by quite different sets of items.

This part of our survey instrument is by and large a German-language adaptation of the corresponding items in the STAMP survey. What is characteristic about this instrument is that analytic tasks are measured by asking a sequence of simple yes/no questions capturing objective requirements instead of asking respondents for a subjective evaluation of task complexity. So people were asked first whether they, as part of their job, read texts that are at least one page long, followed, for those saying “yes”, by questions about reading 5 and (if answering “yes” again) 25 pages. Those answering “no” were asked whether they had to read anything at all. As a result, a five-point scale for reading requirements can be generated (none, <1 page, 1 to <5 pages, 5 to <25 pages, 25+ pages). Similarly structured item batteries were used for writing and mathematics. Because simply using the categories “more” and “less” to operationalize the latter did not seem feasible, another procedure was applied instead. Just like in the original STAMP items, we used questions asking for the use of methods representing an increasing degree of complexity (see the detailed list of items in the appendix). We can build an overall index value for analytic tasks by combining the results from these subscales and by standardizing the result to a range from 0 to 1.
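
The filtering logic described above can be illustrated with a minimal Python sketch that maps the yes/no sequence for reading onto the five-point scale and then combines several subscales into an index between 0 and 1. The function names (reading_level, analytic_index), the numeric codes 0 to 4, and the aggregation rule (rescaling each subscale by its maximum and taking the unweighted mean) are assumptions made for illustration only; the text specifies merely that the subscale results are combined and standardized to a range from 0 to 1.

```python
from typing import Optional, Sequence, Tuple


def reading_level(reads_1_page: bool,
                  reads_5_pages: Optional[bool] = None,
                  reads_25_pages: Optional[bool] = None,
                  reads_anything: Optional[bool] = None) -> int:
    """Map the filtered yes/no answers to an assumed 0..4 code.

    0 = none, 1 = <1 page, 2 = 1 to <5 pages,
    3 = 5 to <25 pages, 4 = 25+ pages.
    """
    if not reads_1_page:
        # Only respondents answering "no" get the "anything at all" question.
        return 1 if reads_anything else 0
    if not reads_5_pages:
        return 2
    if not reads_25_pages:
        return 3
    return 4


def analytic_index(subscales: Sequence[Tuple[int, int]]) -> float:
    """Combine (level, max_level) pairs into a 0..1 index.

    Assumption: each subscale is rescaled by its own maximum and the
    rescaled values are averaged with equal weights.
    """
    if not subscales:
        raise ValueError("at least one subscale is required")
    rescaled = [level / max_level for level, max_level in subscales]
    return sum(rescaled) / len(rescaled)


# Hypothetical respondent: reads 25+ pages, mid-level writing, no mathematics.
reading = reading_level(True, True, True)
index = analytic_index([(reading, 4), (2, 4), (0, 4)])
```

Passing the maximum of each subscale explicitly keeps the sketch agnostic about how many levels the writing and mathematics batteries actually have, which the passage above does not state.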

From our perspective, this strategy has the major advantage of breaking down the complex and difficult judgments required when evaluating such abstract constructs into rather simple questions about facts (i.e. the respondent is not required to judge on a subjective basis whether he or she is doing a lot of reading, for example). This makes answering the questions rather easy and avoids potential distortions by subjective judgment.
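The branching logic described above for the reading items can be sketched in a few lines of Python. This is a minimal illustration rather than the coding routine actually used for the NEPS data: the function and argument names are hypothetical, the numeric coding of the five categories as 0 to 4 is an assumption, and the overall index is assumed to be the mean of the three subscales divided by the scale maximum.

```python
from typing import Optional

# Category labels taken from the text; the numeric codes 0..4 are an assumption.
READING_LABELS = ("none", "<1 page", "1 to <5 pages", "5 to <25 pages", "25+ pages")


def reading_level(
    reads_1_page: bool,
    reads_5_pages: Optional[bool] = None,
    reads_25_pages: Optional[bool] = None,
    reads_anything: Optional[bool] = None,
) -> int:
    """Map the filtered yes/no answers to a level from 0 to 4."""
    if not reads_1_page:
        # "No" branch: only the follow-up on reading anything at all was asked.
        return 1 if reads_anything else 0
    if not reads_5_pages:
        return 2
    if not reads_25_pages:
        return 3
    return 4


def analytic_index(reading: int, writing: int, maths: int, max_level: int = 4) -> float:
    """Assumed combination rule: mean of the three subscales, rescaled to 0..1."""
    return (reading + writing + maths) / (3 * max_level)
```

A respondent who reads texts of at least 5 but fewer than 25 pages would be coded `reading_level(True, True, False)`, which corresponds to the category "5 to <25 pages".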

4.2 Instruments to measure interactive tasks

A key characteristic of interactive tasks is that they require jobholders to communicate. Such requirements might range from dealing with external persons such as customers or clients, up to complex communication tasks like supporting, teaching, or dealing with candidates or applicants (cf. Spitz-Oener 2006). Again, we adapted item formulations (with some modifications) from the STAMP survey cited above. However, since interactive activities can indeed take up widely varying degrees of a person’s working time, we decided to use 5-point Likert scales instead of the original yes/no answers, asking respondents how often they had a particular type of interaction and allowing them to answer in five categories from “always/very often” to “very rarely or never”. The overall score for the interactive dimension is generated by first recoding items in such a way that high values will correspond with a high occurrence of interactive tasks and then averaging values and standardizing them again to a scale from 0 to 1.
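The scoring steps for the interactive dimension (recode, average, standardize) might look as follows. This is a hedged sketch: the function name is hypothetical, and the raw coding of the answer categories (1 = "always/very often" up to 5 = "very rarely or never") is an assumption, since the numeric codes are not given in the text.

```python
from statistics import mean
from typing import Sequence


def interactive_score(responses: Sequence[int]) -> float:
    """Return a 0..1 score from 5-point frequency items.

    Assumed raw coding: 1 = "always/very often" ... 5 = "very rarely or never".
    """
    if not responses:
        raise ValueError("at least one item response is required")
    if any(r not in (1, 2, 3, 4, 5) for r in responses):
        raise ValueError("responses must be integers from 1 to 5")
    # Recode so that high values mean frequent interaction: 1 -> 4, ..., 5 -> 0.
    recoded = [5 - r for r in responses]
    # Average the recoded items and rescale from 0..4 to 0..1.
    return mean(recoded) / 4
```

Under these assumptions, a respondent answering "always/very often" to every item receives a score of 1, and one answering "very rarely or never" throughout receives 0.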

4.3 Instruments to measure manual tasks

In our opinion, the operationalization of manual and routine tasks by ALM is rather unclear. The major weakness of their operationalization is that, in contrast to their theoretical conception, they did not offer independent and uniform measures for manual and routine tasks, but merely used indicators for particular combinations of these two dimensions.Footnote 15

In contrast to that, we chose to operationalize all components implied by the theoretical concept separately, but in an identical manner for all respondents. That meant that in addition to introducing items to measure routine (cf. the discussion in the following section), we also needed to add a dimension to (uniformly) capture manual work. We did so mainly by operationalizing it as physical strain. This was measured by including items asking how often respondents, as part of their job, were required to stand, walk, or lift something, to work while assuming an uncomfortable body posture, or to work while being exposed to great heat or great cold. There were two items on this in the STAMP survey we could use (walking, lifting), which were supplemented by additional items adapted from the BIBB-IAB study (cf. Parmentier and Dostal 2002).

4.4 Instruments to measure routine tasks

As discussed in the third section, our approach was to measure two dimensions characterizing nonroutine jobs rather than trying to capture routine directly. To measure the first of these dimensions, task complexity, we included questions about requirements to learn new things, to solve difficult problems, to react to unanticipated situations, or to work on varying assignments. Item formulations are based on a German-language instrument for the variety of job demands (Ulich et al. 1973), using slightly changed wordings and applying a modified response scale in order to make items comparable to the ones used elsewhere in our instrument.

The second dimension is autonomy. Most instruments to measure autonomy that we are aware of have been developed in the context of work psychology. Arguably the most influential description of the concept, as well as one of the first major attempts to capture it empirically, was put forward by Hackman and Oldham (1975, 1976). In general, autonomy in one’s job is considered to be of high importance for the development of positive affective states such as job satisfaction, motivation, and affective commitment (see e.g. Breaugh 1985). It has also been stressed that a high degree of autonomy is related to positive workplace behavior such as, first and foremost, high job performance (e.g. Christen et al. 2006; Hall et al. 2006; Langfred and Moye 2004; Parker et al. 2001). Authors in this field usually distinguish between various aspects of autonomy: that the order and pacing of job tasks are at the incumbents’ discretion, that they are free to make their own decisions or have a voice in decision-making processes, and that they are free to decide on the procedures and work methods they wish to apply (e.g. Breaugh 1985; Morgeson and Humphrey 2006).

The items we used to measure autonomy were adapted from a German instrument, the so-called “Questionnaire for Capturing Aspects of Job Tasks Relevant for Learning” (Fragebogen zur Erfassung der lernrelevanten Merkmale der Arbeitsaufgabe, FLMA, cf. Richter and Wardanjan 2000). The items from the FLMA were supplemented by an additional item from the STAMP questionnaire covering the extent to which respondents participate in their firm’s decision-making processes, an aspect of autonomy not covered by the FLMA.

4.5 Development of the questionnaire

As described above, large parts of our survey instrument are adaptations of STAMP items (Handel 2008, 2007). Since STAMP is an English-language questionnaire, the first step was translating it into German. For quality reasons, we used a translation/retranslation approach, in which the German translation was retranslated into English, and this retranslation was compared to the original to detect potential translation errors.

In a second step, we developed a first draft version of the instrument. This involved, first and foremost, narrowing down the number of items, since the original STAMP instrument featured considerably more items (166) than we had available in our slot of the NEPS questionnaire (48 items). Moreover, it involved modifying some of the response scales (as described in the discussion above). Finally, it involved adding new items where conceptually necessary.

Even though the original STAMP instrument was properly tested, comprehension problems might occur that do not result from inaccurate translation but simply from transferring the instrument from the American to the German context. In order to safeguard against such problems, we conducted, in a third step, cognitive interviews with 34 persons (stratified by gender, age, education) using a specifically developed cognitive questionnaire.Footnote 16 As a result, we had to drop some problematic items altogether and change the wording of some others.

The next step was a larger development study, which featured paper-and-pencil interviews with a regular sample of 503 respondents. Of those, 348 were employed and hence were administered our instrument. This type of development study is carried out on a more or less regular basis in the NEPS context and also featured other newly developed instruments to be tested. Based on the data from this study, we ran initial analyses to examine factor structures and the distribution of items. This pre-test also resulted in a major revision of the draft instrument.

After these revisions, the draft instrument was tested in the regular pilot study of the third panel wave of the NEPS adult stage. The main goal of this study is to test the correctness of item succession and filters. As part of this study, the final draft was tested with a pre-test panel population of 172 persons, featuring 138 employed respondents who actually received the instrument. After this pilot study, only minor adjustments in the instrument had to be made.

The finalized instrument was administered as part of panel wave 4 of the NEPS adult stage. Task questions were only presented to those currently holding a job and (if respondents held more than one job) only referred to their main activity.

5 Data and analyses

As a result of our development efforts, we came up with an instrument in which each of the five task dimensions was measured using between four and seven items. The only exception was the analytic dimension, where the final scale included three subscales, each constructed using several items. Since all items (or subscales in the case of the analytic dimension) use 5-point scales, overall scores for the dimensions can be calculated simply by taking the means. Table 2 provides an overview of the items used and the internal consistency of the scales for the five dimensions (for detailed item wordings, see the description in the appendix).

Table 2 Overview of items used to measure task dimensions and scores for internal consistency of the resulting scales (Cronbach’s Alpha)

Cronbach’s Alphas for internal consistency of the scales as well as the analyses presented below were calculated using preliminary data from the fourth wave of the NEPS adult stage. Because the goal of this paper is a methodological one, mainly evaluating the instrument and describing its development, we had the chance to use the data in advance.Footnote 17
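For readers less familiar with the internal consistency measure reported in Table 2, the following sketch computes Cronbach's Alpha from the textbook formula, alpha = k/(k-1) * (1 - sum of item variances / variance of the sum score). It is an illustration only: the function name is hypothetical, the data layout (one row per respondent, one column per item, complete cases only) is an assumption, and it is not meant to reproduce the software or settings used for the analyses in this paper.

```python
from statistics import variance
from typing import Sequence


def cronbach_alpha(rows: Sequence[Sequence[float]]) -> float:
    """Cronbach's Alpha for complete cases; rows = respondents, columns = items."""
    k = len(rows[0])
    if k < 2 or len(rows) < 2:
        raise ValueError("need at least two items and two respondents")
    # Transpose so that each tuple holds all answers to one item.
    items = list(zip(*rows))
    # Sum of the sample variances of the individual items.
    item_var_sum = sum(variance(col) for col in items)
    # Sample variance of the respondents' sum scores.
    total_var = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - item_var_sum / total_var)
```

Items that move perfectly together yield a value of 1; the less the items covary, the lower the coefficient.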

The preliminary character of the data mainly affects occupational coding.Footnote 18 However, since occupation codes do not constitute a core aspect of the instrument but were merely used in the course of evaluating the resulting scale, potential differences between the coding used for the analyses presented here and the final occupation codes should only have a minor effect, if any, on the results presented. The analyses will be presented for major groups (i.e. for codes aggregated to the first digit) as well as for a few selected individual occupations (coded on the three-digit-level).
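Because ISCO 08 codes are hierarchical, aggregating a four-digit code to the major group (first digit) or minor group (first three digits) amounts to truncating the code. The sketch below illustrates this under the assumption that codes are stored as four-character strings; the function name is hypothetical, and the lookup table lists only the four major groups discussed in this paper.

```python
# Names of the four ISCO 08 major groups discussed in the text.
MAJOR_GROUPS = {
    "1": "Managers",
    "2": "Professionals",
    "4": "Clerical support workers",
    "9": "Elementary occupations",
}


def aggregate_isco(code: str, digits: int) -> str:
    """Truncate a four-digit ISCO 08 code to its first `digits` digits."""
    if len(code) != 4 or not code.isdigit():
        raise ValueError("expected a four-digit ISCO 08 code as a string")
    if digits not in (1, 2, 3, 4):
        raise ValueError("digits must be between 1 and 4")
    return code[:digits]


# Example: a four-digit code under minor group 251.
minor = aggregate_isco("2512", 3)   # three-digit minor group
major = aggregate_isco("2512", 1)   # one-digit major group
major_name = MAJOR_GROUPS[major]
```

Storing the codes as strings rather than integers keeps leading zeros intact and makes the truncation explicit.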

5.1 Task composition in ISCO major groups

In a first step, we analyzed the distribution of task constructs over the major groups of ISCO 08. For the sake of clarity, the graphs will only present results for four selected major groups.Footnote 19

Figure 2 displays the distribution of scale values as well as group means of the analytic task scale for the four occupational groups discussed above.Footnote 20 The figure quite clearly shows the expected task requirements for the selected groups of occupations. While managerial and professional occupations for the most part have high analytic requirements, requirements are at a medium level for clerical support workers and low for elementary occupations. All mean differences besides the (insignificant) difference between “Managers” and “Professionals” are significant at the 0.1 percent level.

Fig. 2 Distribution of analytic tasks for selected groups of occupations

As Fig. 3 shows, differences are not as pronounced for interactive tasks. Nevertheless, we again see that managers and professionals—while not differing significantly from one another (all further mean differences are again significant at the 0.1 percent level)—have a rather high level of interactive tasks to perform, compared to medium levels for clerical support workers and mostly low levels for elementary occupations.

Fig. 3 Distribution of interactive tasks for selected groups of occupations

Concerning manual task requirements (Fig. 4), we can see that all occupations except the elementary ones are characterized by a rather low level of such requirements, with between 20 and just over 40 percent of all employees in these occupations having no manual requirements at all.

Fig. 4 Distribution of manual task requirements for selected groups of occupations

In contrast, as one might have expected, elementary occupations are characterized by a strong emphasis on manual tasks. The proportion of employees that have to perform no or only few manual tasks is extremely small in these occupations, while the majority of jobs in these fields feature medium or high manual requirements. Just as before, all mean differences besides the insignificant one between “Managers” and “Professionals” are significant at the 0.1 percent level.

In order to analyze routine task content, we first had to invert the item scales, since, as discussed above, our instrument asked for task complexity, that is, nonroutine aspects of respondents’ jobs. Inverting the scales therefore allows us to interpret the scores as indicating routine (in the sense of non-complex) tasks. Looking at the results displayed in Fig. 5, we again find a different picture. While professionals’ and managers’ occupations, as before, do not differ significantly (the remaining differences again being significant at the 0.1 percent level) and could by and large be defined as nonroutine jobs, clerical support workers seem to face considerably stronger routine task requirements. This fits quite well with the classification by ALM, where typical clerical jobs like record-keeping or working in teller service are considered core examples of jobs characterized by non-manual routine tasks. An even stronger focus on routine tasks is found in elementary occupations, which should to a large extent feature exactly the type of routine-manual tasks with low qualification requirements in production or services that can be, and in fact often are, performed by machines.
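The inversion of the complexity items can be illustrated with a short sketch. It assumes, for the sake of the example, that each item is coded from 1 to 5 with 5 indicating the highest complexity; the function names are hypothetical and the rescaling to the 0 to 1 range follows the standardization described for the other dimensions.

```python
from statistics import mean
from typing import Sequence


def invert_item(value: int, low: int = 1, high: int = 5) -> int:
    """Reverse-code one item: low and high swap places, the midpoint is unchanged."""
    return low + high - value


def routine_score(complexity_items: Sequence[int], low: int = 1, high: int = 5) -> float:
    """Mean of the inverted complexity items, rescaled to 0..1."""
    inverted = [invert_item(v, low, high) for v in complexity_items]
    return (mean(inverted) - low) / (high - low)
```

With this coding, the routine score is simply one minus the corresponding complexity score, so a job rated maximally complex on every item receives a routine score of 0.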

Fig. 5 Distribution of routine tasks for selected groups of occupations

Finally, when looking at autonomy tasks (Fig. 6), it is hardly surprising that management jobs in particular are characterized by high autonomy levels. Nevertheless, professionals and clerical support workers also display an above-average amount of autonomy tasks. In contrast, high autonomy levels are rarely found among elementary occupations. Most people in these jobs will have a medium or low level of autonomy. Mean differences between all groups are significant at the 0.1 percent level.

Fig. 6 Distribution of autonomy tasks for selected groups of occupations

Summing up, we can conclude that our newly developed instrument for job tasks seems to be able to capture core differences in the task distributions of different occupational groupings. The differences captured are clearly consistent with what one would have expected of these occupational groupings’ task composition based on previous knowledge about these occupations as well as against the background of theoretical conceptions like the TBA.

5.2 Generating task profiles for individual occupations

Another major application we intended for the data collected using our task module is the identification of meaningful task profiles. Since generating such a profile requires a minimum of cases in each occupation, case numbers in our data were too small to use the detailed four-digit ISCO codes. Therefore, we decided to use minor codes (three-digit-level) instead. At this level, ISCO 08 distinguishes between 130 different occupational groups, which are still rather detailed, meaning that task profiles of the individual occupations aggregated should be close to each other. Instead of presenting individual task profiles for all 130 minor groups, we decided to illustrate the way these profiles are generated by using only a few selected occupations. These occupations were chosen by applying three criteria. First, we wanted examples of jobs at different hierarchy levels, that is, some with rather high requirements with respect to educational background (panels on the left) as well as some with medium (panels in the middle) and rather low educational requirements (panels on the right). Second, within these groups we aimed for two rather different examples, such as a typical office job (secretaries) and a potentially less routine personal service job (nurses/midwives). Third, case numbers in our data file should be sufficiently large in order to allow for a calculation of meaningful averages. On this basis, we selected the following six minor codes: “software and applications developers and analysts” (251) and “medical doctors” (221) as examples of high education jobs, “secretaries, general” (412) and “nursing and midwifery associate professionals” (322) as typical mid-level jobs, and “building finishers and related trade workers” (712) and “transport and storage laborers” (933) as examples of jobs usually not too demanding with regard to education.

Since all subscores for the five task domains were scaled to values ranging from 0 to 1, we generated task profiles simply by calculating occupation-specific mean scores for the task domains. Results are shown in Fig. 7. Looking at the profile for software and applications developers first (upper left panel), we find almost a prototype of a job characterized by analytic tasks, with high autonomy and low routine task requirements. Manual tasks, by contrast, are at a low level, whereas interactive tasks are at a medium level. Although the task profile of medical doctors is largely similar, what seems to be distinctive about them is the much higher level of interactive requirements they have to face in their job. This difference seems highly plausible, considering that doctors have to communicate a lot with patients and other medical personnel.
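The profile computation itself, occupation-specific means of the five 0 to 1 domain scores, can be sketched as follows. The record layout, the key name `isco_minor`, and the `min_cases` threshold are hypothetical choices made for this illustration; the paper does not state which minimum case number was applied.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, Mapping

DOMAINS = ("analytic", "interactive", "manual", "routine", "autonomy")


def task_profiles(
    records: Iterable[Mapping[str, object]], min_cases: int = 1
) -> Dict[str, Dict[str, float]]:
    """Return mean domain scores per three-digit occupation code."""
    by_occupation = defaultdict(list)
    for rec in records:
        by_occupation[rec["isco_minor"]].append(rec)

    profiles = {}
    for occupation, rows in by_occupation.items():
        # Skip occupations with too few cases for a meaningful average.
        if len(rows) < min_cases:
            continue
        profiles[occupation] = {d: mean(r[d] for r in rows) for d in DOMAINS}
    return profiles
```

Each resulting profile is a mapping from domain name to mean score, which can be plotted directly as one panel of a figure such as Fig. 7.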

Fig. 7 Task profiles for selected occupations (minor code)

At first glance, the secretaries’ profile does not seem to be very different from that of software developers, displaying the same U-shaped form that we would consider characteristic of office jobs (also drawing on other examples not displayed here). When comparing the profiles more carefully, however, it is obvious that the two jobs are situated at different hierarchical levels. Whereas the degree of autonomy and analytic task content is much higher for software developers, the work of secretaries is to a much higher degree characterized by routine tasks.

Nursing and midwifery associate professionals, by contrast, have a completely different task profile. Regarding their interactive requirements, their profile is almost comparable to that of medical doctors, but they have a much higher manual task content, which is hardly surprising, considering that it is these associate professionals who have to do much of the physically challenging work in health care, such as moving patients. On the other hand, analytic task content in particular is considerably lower than it is for medical doctors, again indicating a profession located further down in the occupational hierarchy.

Building finishers appear to be a good example of what ALM termed a job dominated by nonroutine-manual tasks. While manual task requirements are particularly high, routine is somewhat lower than it is for a (routine) non-manual occupation like secretaries.Footnote 21 Finally, looking at transport and storage laborers, we have an example of an occupation clearly characterized by manual and routine tasks, with low analytic and interactive task components.

Summing up, we think we were able to show that we can detect relevant differences between various types of jobs (with regard to their educational requirements) by using profiles based on task data collected with our survey instrument. Moreover, we are able to identify variations in task profiles within these types. Therefore, we are confident that our new instrument will be quite useful in capturing core aspects of occupational task profiles and thus will prove to be a valuable tool for performing all types of analyses requiring such information.

6 Summary and conclusion

Summing up, we first want to note that information about job tasks can, as we have argued, be useful for answering a diverse array of research questions. However, when looking into the matter more closely, we found that none of the existing instruments for capturing such task information, and especially none of those for surveys administered in German, fit our needs completely, a finding that eventually sparked the decision to develop a new instrument.

In this paper, we presented an extensive discussion of development procedures and analyses we performed. Our aim was to demonstrate that our instrument will indeed produce valid information on job tasks. Of the many arguments brought forward in the course of this discussion, we only want to highlight the one point we believe to be the core advantage of our instrument: In contrast to other available alternatives, which may have been designed with a somewhat different purpose in mind, we tried to develop an instrument that closely follows the major theoretical approaches in the field. So instead of having some items available and assigning them—more or less successfully—to theoretical dimensions developed independently, we started out with the theoretical concepts and were able to look for the best items to operationalize these empirically. If we did succeed in this enterprise, the resulting instrument should therefore be theory-driven in the best sense of the word.

Nevertheless, there is one thing a newly developed instrument cannot accomplish, regardless of its quality, and that is monitoring past developments. It will therefore be more useful for analyzing, e.g., the potential influence of SBTC on future wage developments than for providing additional evidence for or against its influence so far. Notwithstanding this restriction, we would argue that the NEPS adult survey is a very fruitful data environment for a task instrument, since it will allow for various types of analyses that were not possible with the data currently available. One reason for this is that NEPS is a panel survey, meaning that it will provide genuine longitudinal data once we have been able to replicate our instrument. This type of data is particularly rare for job tasks; in fact, we are not aware of any dataset replicating task measures in a panel survey. Therefore, our data might in the long run open up new perspectives on various research topics for which knowledge of these tasks is relevant, such as a detailed analysis of the influence of technological development on the task content of individual occupations, or the question of to what extent task bundles are altered or stay comparable when people change jobs. Moreover, NEPS includes other variables of interest such as, in particular, regular competence tests. This means that we should also be able to analyze topics such as the interrelations of competencies and tasks performed, and possibly how this relationship develops over time.

Once the scientific use file of the fourth panel wave that features the task instrument described here is available,Footnote 22 we hope that we will be able to provide examples of such analyses.