Introduction

Collecting comparable data on registered crime and on criminal justice across countries is a complex, almost impossible task (Harrendorf 2011: 126–128). Differences found between countries in levels of crime cannot be attributed only, or even mainly, to substantive factors such as actual differences in “real” crime levels or differences in the propensity to report offences. Statistical and legal factors also influence the outcome (Aebi 2008; Aebi 2010; von Hofer 2000). Data collection depends on different systems of statistical recording (cf. Aebi 2008 for the effect of counting rules on police level), and the individual specifications of criminal justice systems, as fixed in the legal provisions of the different countries, are very influential, too (von Hofer 2000). Therefore, even an approach using standard definitions can only foster comparability, but cannot establish it completely. However, using a system of elaborate and thoroughly validated definitions will not only maximize comparability within the restrictions the above-mentioned framework entails. It will also make the remaining differences transparent and visible (cf. Aebi et al. 2010: 341–375). Thus, the use and quality of offence definitions are a crucial issue for any approach to collecting crime and criminal justice data on the different levels of the criminal justice process, from police to prisons.

The European Sourcebook of Crime and Criminal Justice Statistics (ESB) is a project collecting data on crime and criminal justice statistics across Europe. It has now been running for 18 years, with the first expert group meetings under the auspices of the Council of Europe dating back to 1993 (details on the group history can be found in Aebi et al. 2010: 15–18; Killias and Rau 2000). After the publication of the first edition in 1999 (Council of Europe 1999), the Council of Europe could not support the project any longer, but the project was continued under varying financial circumstances (see Aebi et al. 2003; Aebi et al. 2006). In 2010, the 4th edition of the ESB was published (Aebi et al. 2010). It featured a revised and expanded methodology, based on experience gained during a research project funded by the European Commission under the AGIS 2006 program (Jehle and Harrendorf 2010).

From the very beginning, the ESB project has focused on the importance of providing standard definitions for offences and collecting metadata on procedural and data collection rules affecting data comparability. Figures on a certain offence, for example theft, do not mean much without rules and metadata explaining what is meant by this word. Each and every criminal justice system will have its own legal provisions and high court decisions covering theft offences, determining what is collected under the “theft” heading in national crime and criminal justice statistics and thus reducing data comparability. Therefore, simply comparing statistical categories bearing the same, yet translated, headline is like comparing apples with oranges disguised as apples.

In this article, the methodology connected with offence definitions in the ESB will be discussed and results of some quality checks of the definitions used in the ESB context will be shown. Future possible improvements of methodology and, based thereupon, data quality and comparability will be discussed.

An Overview of ESB Methodology

The ESB is created by an international group of experts from different European countries, some of them being researchers at universities, others working in ministries, statistical offices or national police services. This expert group is in charge of designing, revising and updating the questionnaires for each survey wave. An ESB questionnaire consists of various questions, not only collecting the data on police-recorded offences, suspects, police and prosecution staff, prosecution service decisions, convictions, sanctions and measures, prison population etc., but also gathering the necessary metadata on definitions, procedures and data collection methods in national statistics to further the understanding of the reported figures.

The final questionnaire for a survey wave is always sent out to the network of national correspondents the ESB group can rely on, all of them being experts on their own national system of crime and criminal justice statistics. In each responding country, the expert group has at least one, sometimes two or three, of these national correspondents (see Footnote 1). Their professional backgrounds are similar to those of the ESB group experts. These correspondents are in charge of filling in the questionnaire and sending it back to the group after completion. In some countries, all or most of the data requested in the ESB survey are published and can therefore be filled in by the correspondent personally, relying on these official publications. In other countries, the process of data gathering is more complex: sometimes different official bodies have to be contacted and asked for their assistance in filling in a part of the questionnaire, providing data from unpublished sources. Obviously, there are also still some countries that show large gaps in the data they have available and can report.

Most members of the expert group also act as regional coordinators for a group of countries. For these countries, they are in charge of contacting the respective correspondents, responding to any questions they might have, checking the completed questionnaires for odd or missing data and getting back to the correspondents with questions or remarks regarding the problems found. Together with the national correspondent, the coordinator is expected to solve or at least minimize the problems identified.

This is already one important part of the validation process. Validation continues after the data have been entered into a central database, by checks made on a chapter-by-chapter basis, i.e. for the general definitions part and for the police, prosecution, convictions and prison parts. These additional validation tasks are also distributed among the experts. In principle, by carrying out this procedure, missing data should be completed as far as possible and clearly odd data should be explained with an accompanying footnote or be deleted.

However, due to the huge differences between the criminal justice systems compared, mistakes or problems will not always be identified. These differences are also reflected in the figures, which reduces validity checks mainly to trend and internal validity checks, i.e. checks for odd “jumps” in time series, clear outliers and disproportions between variables, like a total figure being lower than the figure for a certain sub-group of that total. Other, less obvious errors might slip by unnoticed, especially in case of misunderstandings on the side of the national correspondents, because the supervision of and cooperation with the correspondents by the regional coordinators during the process of filling in and validating the questionnaire depends on one of them communicating problems to the other. The correspondent will, however, usually not be aware of his or her misunderstandings. The questionnaire is already designed to reduce the possibility of misunderstandings, but errors can occur anyway, be it due to still ambiguous rules in the questionnaire or due to language problems: usually, the correspondents are required to read and answer an English questionnaire, although English is not their mother tongue.
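The trend and internal validity checks described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the procedure actually used by the ESB group: the function names, the threshold of a factor of two for a “jump” and all figures are hypothetical.

```python
def flag_jumps(series, max_ratio=2.0):
    """Return the years in which a figure changed by more than max_ratio
    compared with the preceding year (a possible odd "jump").

    series maps year -> reported figure (None for missing data).
    The default threshold is an assumption made for this sketch.
    """
    flagged = []
    years = sorted(series)
    for prev, cur in zip(years, years[1:]):
        a, b = series[prev], series[cur]
        # Missing or zero values cannot be compared as a ratio; skip them.
        if a is None or b is None or a == 0 or b == 0:
            continue
        if max(a, b) / min(a, b) > max_ratio:
            flagged.append(cur)
    return flagged


def flag_total_below_subgroup(total, subgroups):
    """Return the names of sub-groups whose figure exceeds the total
    they are supposed to be part of (an internal inconsistency)."""
    if total is None:
        return []
    return [name for name, value in subgroups.items()
            if value is not None and value > total]


# Hypothetical figures for one country, for illustration only.
theft = {2003: 1000, 2004: 1050, 2005: 2600, 2006: 2550}
print(flag_jumps(theft))  # [2005]
print(flag_total_below_subgroup(500, {"burglary": 120,
                                      "motor vehicle theft": 640}))
# ['motor vehicle theft']
```

A flagged value is not necessarily wrong. It only marks a figure the regional coordinator would want to discuss with the national correspondent.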

A third validity check might reduce this problem (at least in theory): a check against other sources, comparing the data these sources hold with the data in the questionnaire (also cf. Harrendorf and Smit 2010, 146–147, with more details on the different types of data checks). For the 4th edition, some cross-checking with the variables of the United Nations Crime Trends Survey (CTS) was carried out for the first time, namely for homicide, theft of motor vehicles, drug offences total, drug trafficking and total prison stock. Although a few additional problems in ESB data were identified during the cross-checking process, its usefulness for ESB purposes was still limited, because the CTS data used for the comparison (10th survey data) had not been validated at all before they were provided to the group.

Therefore, where results differed, the ESB values usually turned out to be the more reliable ones. However, for motor vehicle theft, for example, the cross-checking showed that the definition used so far seems to miss two important rules: the rule to include theft of motorcycles, mopeds etc. and the rule to include theft of trucks, lorries etc. Without these rules, some correspondents tend to report only car theft, although their statistical systems also hold data on the theft of other motor vehicles.

For future ESB editions, a comparison with CTS data on certain key variables will be more useful, since UNODC has started to validate the data, especially from the 11th survey on (see Footnote 2). Such a comparison will be especially helpful if one takes into account that the definitions of some CTS variables are very close or even identical to some ESB variables.

Offence Definitions in the 4th Edition ESB

Offence definitions in the ESB, with some exceptions for drug offences (cf. Jehle and Harrendorf 2010: 85–103; Aebi et al. 2010: 371–375), follow in principle the same concept. First, a standard definition is provided. It is not a legal definition, but is meant to optimize international comparability: it has been adapted to the core elements each respective crime consists of in the legal and statistical systems of the majority of countries and can therefore most conveniently be followed. The standard definition is used to make clear what kind of criminal behaviour is meant by a certain offence type.

This standard definition is then followed by a list of specific items, some of which shall be included in, others excluded from the data reported. These rules are meant to deal with special forms of criminal behaviour for which it might be arguable whether that behaviour can still be subsumed under the standard definition. For example, the standard definition for homicide is “intentional killing of a person”. The include list features assault leading to death, euthanasia, infanticide and attempts, the exclude list assistance with suicide (see Footnote 3).

Correspondents were asked to try to meet the standard definitions and follow the include and exclude rules as closely as possible. Metadata on the conformity of and deviations from the standard definitions were collected for both police and convictions statistics (Aebi et al. 2010: 341–375).

For the 4th edition ESB (also cf. Jehle and Harrendorf 2010; Aebi et al. 2010: 18–24 for details on 4th edition methodology), a couple of definitions have been revised or newly introduced to the questionnaire. The revision or design of a definition has always been based upon the expert knowledge assembled in the ESB group on the legal provisions and statistical categories structuring data in the different countries, as well as on comparative issues. International studies regarding these topics (like Savona and Vettori 2006) have also been analyzed. Based on these sources, new and revised offence definitions have been designed. In the 4th edition, the definitions have then been tested with a trial questionnaire sent out only to the members of the expert group. For some of the new or revised definitions, the trial questionnaire has been supported by some additional, in-depth questions on these offences, their definition, the respective legal provisions and the way of statistical recording. Based on the results of this trial phase, the definitions have been partially modified.

As a result, the 4th edition ESB questionnaire featured more offence groups and subgroups than any of the earlier ESB editions. The extension was due to the above-mentioned AGIS project, which, inter alia, focused on the introduction of further crime types, especially those subject to EU-harmonized definition (Jehle and Harrendorf 2010). In total, the 4th edition covers 27 offence groups and subgroups on police level. This means that the number of offence groups and subgroups more than doubled compared to the earlier editions. Some of the new subgroups were not included in the other chapters of the 4th edition questionnaire (prosecution, conviction, corrections), which covered 21 offence groups and subgroups.

Table 1 shows the offence groups and subgroups for which data were collected in the 4th edition ESB. As can clearly be seen from the table, there were many changes and additions in definitions of offences. For only six offence groups did the definitions remain completely unchanged. In addition, for three offence groups there were only minor changes, probably not affecting the data reported. On the other hand, four offence groups or subgroups were revised significantly and 14 were newly introduced, six of these on police level only.

Table 1 Changes in offence groups and sub-groups introduced with the 4th edition ESB

However, not all of the offence subgroups mentioned were addressed by separate definitions. While all offence groups were defined separately, the same is only true for a few of the subgroups, namely for aggravated bodily injury, motor vehicle theft, burglary and domestic burglary. The drug offence subgroups were also subject to some kind of definition, though not in the standardized way used everywhere else in the ESB (cf. Jehle and Harrendorf 2010: 85–103; Aebi et al. 2010: 371–375). All other subgroups for which data were collected were only separate items on include lists of definitions. The standard definitions used for the 19 offence groups and subgroups that were defined separately and the respective include and exclude lists can be looked up in detail in Table 2.

Table 2 Offence definitions and include/exclude rules used in the 4th edition ESB

Results: Quality of the Definitions Used

Overall Conformity with Definitions on Police Level

The quality of the definitions used is crucial for data comparability and validity. One indicator of quality might be the level of conformity with a standard definition that could be reached. The conformity is measured by the degree to which include and exclude rules could be followed. Figures 1 and 2 therefore show the overall conformity rates with the standard definitions on police level for all responding countries. Offence definitions are split into “classic” (Fig. 1) and “new” (Fig. 2; see Footnote 4) in order to allow an easy comparison between these two groups, too. A breakdown of these results by separate countries can be found in Table 3.

Fig. 1

Overall conformity with definitions on police level for the “classic” ESB offences (Also see Table 3 with detailed results per country.)

Fig. 2

Overall conformity with definitions on police level for the new ESB offences (Also see Table 3 with detailed results per country.)

Table 3 Standard definitions met by country and criminal offence on police level (Source: Aebi et al (2010), p. 343.)

A definition is considered to have been followed in all respects by a correspondent, if he or she was able to include in the data all items on the include list and exclude from it all items on the exclude list. Otherwise, the definition was considered not to be followed. For a few countries, there were ambiguities in the answers given, preventing a clear assessment. These responses were marked “unclear” in the figures. Finally, a fourth category covers countries which were unable to provide quantitative data for the respective offence type.

To give an example, let me return to the standard definition for homicide: It is “intentional killing of a person”. The include list features assault leading to death, euthanasia, infanticide and attempts, the exclude list assistance with suicide. This definition is considered to have been followed in all respects, if a correspondent was able to include assault leading to death, euthanasia, infanticide and attempts in the reported data, but exclude assistance with suicide.
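The four assessment categories can be illustrated with a short Python sketch based on the homicide definition. The include and exclude items are those given in the text. The function name, the answer encoding and the decision to let a clear deviation take precedence over an ambiguous answer are assumptions made for this illustration, since the source does not say how such mixed cases were handled.

```python
# Include/exclude lists of the ESB homicide definition as given in the text.
HOMICIDE = {
    "include": ["assault leading to death", "euthanasia",
                "infanticide", "attempts"],
    "exclude": ["assistance with suicide"],
}


def classify(definition, answers, data_provided=True):
    """Assign a country to one of the four categories used in the figures.

    answers maps an item to True (counted in the reported data),
    False (not counted) or None (ambiguous or missing answer).
    The encoding and the precedence of checks are assumptions.
    """
    if not data_provided:
        return "no data"
    # Include items should be counted, exclude items should not.
    expected = {item: True for item in definition["include"]}
    expected.update({item: False for item in definition["exclude"]})
    observed = {item: answers.get(item) for item in expected}
    if any(v is not None and v != expected[i] for i, v in observed.items()):
        return "not followed"
    if any(v is None for v in observed.values()):
        return "unclear"
    return "followed"


# Hypothetical country that follows every rule of the definition.
answers = {"assault leading to death": True, "euthanasia": True,
           "infanticide": True, "attempts": True,
           "assistance with suicide": False}
print(classify(HOMICIDE, answers))  # followed

# The same country, but unable to include assault leading to death.
answers["assault leading to death"] = False
print(classify(HOMICIDE, answers))  # not followed
```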

Of course it should also be taken into account that the validation system used cannot fully eliminate errors on the side of correspondents (see above). It is known that there were also some misunderstandings regarding the include/exclude system used for the definitions, but it can be expected that these errors will not distort the global results presented here.

The main quality indicator used in the following is the rate of countries which were able to follow a given definition completely and provide data for it. Data that are not fully in accordance with the standard definition are not the only issue that might hint at a quality problem of a certain definition. The same might be true for high rates of missing data.

The highest overall conformity rates can be found for robbery (78%) and theft (64%), followed by the newly introduced offence of money laundering (53%) and by drug offences total (50%), both of which are subject to several EU-wide and even worldwide combating initiatives (an overview is given in UNODC 2009b for drug crime and Alldrige 2008 for money laundering). The lowest conformity rates can be found for sexual assault (17%), bodily injury (19%), corruption in the public sector (25%) and fraud (28%). Low conformity rates can be found more often for the newly introduced offence types than for the “classic” ones.

That a definition was not closely followed by many countries does, however, not necessarily mean that this definition is problematic. It can also be understood to mean that in a certain field of crime conformity of definitions as reflected by the national statistics is low. This means that there is not necessarily a “better” definition than the one used. But a close examination of the answers provided and the patterns of deviation can help to identify the potential for future improvements.

For example, for bodily injury the low conformity levels can be explained by the diverse concepts of this offence that can be found across Europe: While in some countries a wide concept (better reflected by the term “assault”) is in use, which sometimes even covers mere threats, but more often assault only causing pain and slapping and punching, other countries follow a more restricted concept requiring actual injury (cf. Jehle and Harrendorf 2010: 33–53). The ESB always followed a restricted concept of assault/bodily injury, asking countries to exclude mere threats, assault only causing pain and slapping and punching. Metadata, however, show that 12 countries or, if counting the constituent countries of the UK separately, 14 countries were not able to follow such a narrow definition (see Aebi et al. 2010: 352). For future editions, it should be tested whether higher conformity rates could be achieved by turning to a wider definition of assault, e.g. still excluding mere threats, but including assault only causing pain and slapping and punching.

Differences can also be explained by a mere “technical” issue. Conformity rates shown in Figs. 1 and 2 are not fully comparable between offence groups: Since overall conformity rates are shown, a standard definition with a long include/exclude list is more likely not to reach 100% conformity than a definition with only very few items on that list. This is a possible explanation for the low overall conformity rates for fraud, an offence which features twelve items on that list, compared to, for example, only four for money laundering. However, there are also offences with a comparably long include/exclude list and a relatively high overall conformity rate, namely drug offences total with 11 items on the list and a conformity rate of 50%. Legal conformity of definitions in the area of drug crime is quite high, since there are international conventions defining minimum levels of criminalization and reducing legislative discretion of individual countries mainly to the question how to deal with minor offences by drug consumers (see Malby 2010b).
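The effect of list length can be made concrete with a simple calculation. Assume, purely for illustration, that every item on a list is followed by the same share of countries and that deviations on different items occur independently of each other. Neither assumption is made or tested in the ESB, and the item rate of 90% is a hypothetical value. Under these assumptions, the expected overall conformity rate is the item rate raised to the power of the number of items.

```python
def overall_rate_if_independent(item_rate, n_items):
    """Expected share of countries following all n_items rules, assuming
    each rule is followed independently with probability item_rate.
    Both assumptions are simplifications made for this sketch."""
    return item_rate ** n_items


# List lengths taken from the text: 4 items (money laundering)
# and 12 items (fraud). The item rate of 0.9 is hypothetical.
for n in (4, 12):
    print(n, round(overall_rate_if_independent(0.9, n), 2))
# 4 0.66
# 12 0.28
```

With an identical item rate, the longer list yields less than half the overall conformity of the shorter one. This shows only the direction and possible size of the technical effect, not what actually drives the observed rates.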

When interpreting these results, it should also be kept in mind that a medium or even low overall conformity rate does not necessarily mean that the differences affect the “core” of an offence. For example, overall conformity for intentional homicide is quite low, though international comparisons show that homicide is an offence with a relatively high international comparability (see recently Malby 2010a; Geneva Declaration Secretariat 2008, 67–88; both also on methodological problems of comparison). The deviations for homicide mainly do not affect the core of the offence, but just the fringes, as can also be seen if looking at the items on the include (assault leading to death, euthanasia, infanticide, attempts) and exclude list (assistance with suicide). Only one rule, the rule to include attempts, clearly affects the “core” of homicide. This single rule, however, shows a very high conformity (94% of countries were able to include attempts on police level). To a lesser extent, the rule to include assault leading to death is of (statistical) importance, too; it shows conformity of 75% on police level. The comparatively low overall conformity rates for homicide are therefore due to a combination of specific differences.

There are some offences for which data availability even on police level is quite low, namely sexual assault with 33% of all responding countries not being able to provide data, corruption in the public sector (28%) and burglary, computer offences and money laundering (each 25%). This might hint at a problem in statistical recording, like offences with only relatively few cases not being recorded separately (which might be the case for computer offences, money laundering or corruption), but might also be directly connected with the concept that is defined. This might be the case for burglary, a concept unknown to the penal law of many countries of continental Europe, which instead know different types of aggravated theft offences. The same can be true for sexual assault, an offence which cannot always easily be separated from rape, sexual abuse of minors and sexual harassment and therefore might not be reflected in separate statistics in all countries.

While the offences of drug trafficking and aggravated drug trafficking are mainly not covered here due to a differing, non-standardized concept of definition (see Table 2), data availability can be checked for them as well. Doing so shows that data availability for aggravated drug trafficking is also quite low. Obviously, this is not a separate concept in many statistical and/or legal systems.

Overall Conformity with Definitions on Convictions Level

We now turn to look at the conformity rates on the convictions level. Figure 3 shows the overall conformity on that level for the “classic” ESB offences, Fig. 4 for the new ones (detailed results per country can be found in Table 4). For most offences, the results are quite similar to the results on police level, with only minor differences. Usually, conformity rates are a bit lower on convictions level. Since an offender can only be convicted for a certain offence as set down in the criminal code, conviction statistics are even more dependent on legal definitions of crimes than police statistics are, which often also reflect criminological concepts (Aebi et al. 2010: 342). However, there are also a few offences for which conformity rates on convictions level are higher than on police level, namely for total offences, rape, aggravated bodily injury, sexual assault and corruption.

Fig. 3

Overall conformity with definitions on convictions level for the “classic” ESB offences (Also see Table 4 with detailed results per country.)

Fig. 4

Overall conformity with definitions on convictions level for the new ESB offences (Also see Table 4 with detailed results per country.)

Table 4 Standard definitions met by country and criminal offence on convictions level (Source: Aebi et al (2010), p. 345.)

For aggravated bodily injury, sexual assault and corruption this difference can at least partially be explained by an increase of data availability, i.e. data availability for these offences is better on convictions level than on police level. The same is also true for sexual abuse of minors. For aggravated bodily injury, this will be connected with the fact that aggravation will usually be a certain legal concept that is assessed by the court and therefore might be better reflected in court statistics. Sexual assault and sexual abuse of minors will also be legal concepts, though their definition will be subject to some variation. However, this improves recording on convictions level, while on the other hand police statistics might not be detailed enough in each and every country to record all types of sexual offences in a differentiated way. Corruption finally is an offence which will not be investigated by the police in some countries, but instead by some specialized administrative unit. This might explain the remarkable difference in data availability between police and conviction statistics.

Apart from the data availability issue, the definitions of total offences, rape and aggravated bodily injury also showed lower rates of non-conformity on convictions level among those countries which provided data. This might be explained by these definitions being quite near to a legal concept, which is not fully reflected in all police statistics. This is especially plausible for the total of offences. For this category, it is (inter alia) explicitly asked to include major traffic offences, but exclude minor ones, and include all other offences defined as criminal by the law. While traffic offences, tax offences, customs offences etc. are often not investigated by the police and thus not recorded in police statistics, these offence types will appear on convictions level as far as they are defined as criminal by the law.

But the most striking difference between the police and convictions level is the fact that for the offences of motor vehicle theft, burglary and domestic burglary data availability on convictions level is much lower than on police level. For domestic burglary, only about 17% of all responding countries were able to provide data on convictions level, for motor vehicle theft the rate is 31%, for burglary 42%. This makes clear that the definitions used for these offences are not reflected in the law of the majority of responding countries and can therefore not be separately identified on convictions level. It will be one of the important tasks for future editions to improve conformity in this area (see Footnote 5).

Item Conformity

As discussed above, overall conformity is a quality indicator that is not fully comparable between definitions, because a definition followed by an extensive include/exclude list is more likely not to reach 100% conformity compared to a definition with a shorter list.

Therefore, another quality indicator that should be checked is item conformity, i.e. the rate of countries that were able to follow a single include or exclude rule. Different from the overall assessment presented before, I will now only consider countries which reported data for the respective offence on the respective level (police or convictions). The item conformity rate is therefore the percentage of countries that followed the rule and provided data within all countries that provided data and gave an unambiguous answer for that item in the definitions section of the questionnaire.

To give an example: If 75% of the countries that provided homicide data on police level and gave an unambiguous answer for this item were able to include assault leading to death, the item conformity rate for that single item is 75%. The following text also uses mean and median item conformity rates per offence and for all offences together. A mean item conformity rate of 82% for homicide on police level means that on average 82% of these countries were able to follow a single rule of the homicide definition. Table 5 gives an overview of the results per offence definition and for all offence definitions together.
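The calculation of item conformity rates and of their mean and median can be sketched as follows. The function name, the answer encoding and all figures are hypothetical and serve only to illustrate the definition given above. Countries without data for the offence are assumed to have been left out of the list already.

```python
from statistics import mean, median


def item_conformity_rate(answers):
    """Percentage of countries that followed a single include/exclude rule.

    answers holds one entry per country that provided data for the offence:
    True (rule followed), False (rule not followed) or None (ambiguous
    answer). Ambiguous answers are dropped from the denominator.
    """
    clear = [a for a in answers if a is not None]
    if not clear:
        return None  # no unambiguous answer, rate undefined
    return 100 * sum(clear) / len(clear)


# Hypothetical item: 6 countries followed the rule, 2 did not, 1 is unclear.
answers = [True] * 6 + [False] * 2 + [None]
print(item_conformity_rate(answers))  # 75.0

# Hypothetical item rates for one offence definition.
rates = [100.0, 90.0, 75.0, 63.0]
print(mean(rates), median(rates))  # 82.0 82.5
```

The median is less sensitive than the mean to a single item with very low conformity, which is why both are reported per offence.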

Table 5 Conformity with definition sub-categories (include/exclude rules) on police and convictions level

Altogether, there are 268 individual items on all include/exclude lists of all definitions: 134 for police and 134 for convictions statistics. The median item conformity rate is 91%, the mean 87% with a standard deviation of 12 percentage points. Of the 268 items, 51 (19%) reached full conformity, i.e. all countries providing data were able to follow these rules.

Looking at the results on an offence-by-offence basis, the highest mean conformity rate can be found for robbery (95%). Robbery also shows the lowest variation in results for the different items with a standard deviation of 3 percentage points. The rule to include muggings on convictions level (88% conformity) turned out to have the lowest conformity rate of all robbery items. High conformity rates with a mean of more than 90% can also be found for drug offences total (94%), theft total (93%) and computer offences (92%). For drug offences and computer offences, the median of conformity rates is even 100%, i.e. the majority of rules were followed by all countries. This is due to the fact that the extent of criminalization of drug offences is, as noted above, predefined by international conventions (see Footnote 6) and many computer offences have been subject to EU-harmonized definition (see Footnote 7). The lowest conformity rate within the definition of drug offences can be found for the rule to include consumption on convictions level (67%). This is due to the fact that the extent to which drug offences connected with personal use are criminalized varies significantly, as international conventions leave room for legislative discretion on this issue (Malby 2010b). For computer offences, the lowest conformity rate can be found for the rule to exclude illegal downloading on police level (70%).

The offences of burglary and theft of motor vehicles clearly showed the lowest item conformity rates: The mean item conformity for burglary is 77%, with the rule to exclude theft from a car from data on convictions level reaching the lowest conformity (54%). For theft of motor vehicles, mean item conformity is 81%, with the lowest conformity for the rule to exclude theft of motorboats on convictions level (46%). This is also the lowest conformity rate of all items. Such low conformity rates on convictions level show that for many countries that have a separate legal provision on burglary or motor vehicle theft, respectively, the rule used in the ESB contradicts the law.

In order to assess how problematic nonconformity with a certain rule is for the comparability of data, it must also be taken into account how important that rule is with respect to the expected total of offences of the respective category. For motor vehicle theft, such an assessment would lead to the assumption that failing the rule to exclude theft of motorboats would usually not have a high impact on data comparability, since clearly most of the stolen vehicles will be cars anyway. Failing to include consumption in the data for drug offences will, on the other hand, have a greater impact on data comparability, at least if its functional equivalent, the possession of small quantities, is also excluded.

Mean item conformity rates below 85% can also be found for fraud (81%), intentional homicide (82%) and sexual assault (84%). The most problematic rules for these offences were the rule to exclude tax and customs offences from fraud data on convictions level with only 58% conformity and to include assault leading to death in intentional homicide data on convictions level (64% conformity). For sexual assault, the lowest conformity rates were found for two separate items, both on police level: These are the rules to exclude acts committed without violence and acts committed against persons under the age of consent (both 67% conformity). These are both very important rules, the first to exclude less intense sexual acts like sexual harassment and the second to allow a clear distinction from sexual abuse of minors.

The low conformity rates for these two important rules make the comparability of results on sexual assault doubtful, since they affect the “core” of the offence: sexual assault in the ESB is defined as a violent sexual offence, close to rape, but without penetration. Sexual abuse of minors is a separate, third category of sexual offences in the ESB. Less intense sexual offences are not collected at all. The results now show that many countries are not able to follow such a restrictive definition of sexual assault, but report data on a broader concept instead.

Variation of Data Reported

A final quality check to be carried out on the ESB definitions involves a look at the data reported for the different offence categories and subcategories. This section includes all offence categories and subcategories that were collected throughout the questionnaire, not only those that have a separate standard definition. However, the six subcategories that were only collected on police level (see Table 1) were excluded.

These six sub-groups all relate to two revisions in offence definitions, one regarding the total of criminal offences and the other bodily injury (assault). The new definition for offences total required the inclusion of property and violent offences, even if subject to proceedings outside the criminal justice system. It was then felt that this rule should be backed up on police level by separate sub-categories on these types of minor offences, as far as they are subject to proceedings outside the criminal justice system (two sub-categories of offences total and one sub-category each of bodily injury and theft), in order to know where the new definition broke the trend. For similar reasons, the rules to include domestic violence and injury of a public servant in bodily injury were backed up by sub-categories.

All in all, these sub-categories turned out to be very problematic and mainly showed very low data availability rates (see Aebi et al. 2010: 38, 45). Most of them will probably not be covered in future ESB editions and shall also not be covered here. But let us take a look at the remaining 21 offence categories and subcategories:

A definition that is clear, understandable and easy to follow should also lead to a higher conformity of data reported. However, due to differences in reporting rates, police and criminal justice efficiency, legal systems etc., the data reported cannot directly be compared between countries, even if one calculates figures per 100,000 population. The “natural” variation is remarkably high. To give an example: For the total of criminal offences recorded on police level in 2006, Albania shows the lowest rate of about 300 cases per 100,000 population, while in Sweden the rate is about 13,500 (Aebi et al. 2010: 37).
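The normalization mentioned above can be sketched in a few lines. This is only an illustration: the function name `rate_per_100k` and the count and population figures are hypothetical and are not taken from the ESB data.

```python
def rate_per_100k(count, population):
    """Return the number of recorded cases per 100,000 population."""
    if population <= 0:
        raise ValueError("population must be positive")
    # Multiply first so that round figures stay exact in floating point.
    return count * 100_000 / population


# Hypothetical country: 9,000 recorded offences, 3,000,000 inhabitants.
example_rate = rate_per_100k(9_000, 3_000_000)
print(example_rate)  # 300.0
```

Such a rate removes the effect of population size only; differences in reporting behaviour, counting rules and legal definitions remain in the figures.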

But one might hypothesize that there are certain rules that in principle govern the relation of criminal offences to each other in every system. For example, grave offences leading to dire consequences will be less common than offences leading to medium damages, which will again be less common than petty offences. Therefore, homicide rates will be much lower than, for example, theft rates, and rates for rape or robbery will be in between. Apart from that, there are certain offences which are more common than others, because they can be easily committed by everyone, while others need a more specialized offender in a special situation. Therefore, theft or major traffic offences will be much more common than, for example, corruption or money laundering. Of course, these relations are not identical in every system, e.g. due to differing approaches in the reaction to petty offences, like the extent of decriminalization and diversion. But these differences will more or less be evened out if looking at a larger group of countries.

This leads to the assumption that the variation of country results for the different offence rates follows certain rules: The relation between the mean and the standard deviation of offence rates per 100,000 population for police-recorded crimes can be approximated by a proportional function:

$$ y_i = m \cdot x_i $$

with y_i being the standard deviation found for offence type i, x_i being the mean found for offence type i and m being the variation coefficient. The variation coefficient is therefore assumed to be approximately constant across offence types, but it will surely depend on the countries covered and can also be subject to change from year to year. The same assumption can be made for police-recorded suspects and for convictions.
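As a minimal sketch of these quantities, the variation coefficient of one offence type can be computed as the standard deviation of the country rates divided by their mean. The offence labels and rates below are hypothetical, and the use of the sample standard deviation (n - 1 in the denominator) is an assumption, since the text does not say which variant was used.

```python
from statistics import mean, stdev


def variation_coefficient(rates):
    """Return the standard deviation of the rates divided by their mean."""
    m = mean(rates)
    if m == 0:
        raise ValueError("mean rate is zero; variation coefficient undefined")
    # Assumption: sample standard deviation (n - 1 in the denominator).
    return stdev(rates) / m


# Hypothetical rates per 100,000 population, three countries per offence type.
hypothetical_rates = {
    "offence A": [2.0, 4.0, 6.0],
    "offence B": [100.0, 400.0, 1000.0],
}

cv_by_offence = {
    offence: variation_coefficient(rates)
    for offence, rates in hypothetical_rates.items()
}

for offence, cv in cv_by_offence.items():
    print(f"{offence}: {cv:.0%}")
```

A value above 1 (100%) means that the standard deviation exceeds the mean, so the mean says little about any individual country.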

Figure 5 shows this relation for police-recorded crimes. The points in the scatterplot lie almost on a straight line and therefore confirm the assumption. The scatterplots for police-recorded suspects and for convictions, which are not reproduced here, look the same. Correlation coefficients are 0.990 for police-recorded crimes, 0.955 for police-recorded suspects and 0.928 for convictions. All correlations are therefore very strong and, based on a t-test, also highly significant (see footnote 8). Therefore, it is safe to assume that this relation is not simply due to the trivial fact that of course there will always be a tendency towards a higher standard deviation for higher means. The correlations also remain strong if one removes the offence categories clearly interrelated with other categories, like criminal offences total and offence subgroups.
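A correlation of this kind and its t-statistic can be sketched as follows. The helper names and the input values are hypothetical. The sketch uses the standard formula t = r * sqrt((n - 2) / (1 - r^2)) with n - 2 degrees of freedom and assumes untransformed means and standard deviations as input; the text does not state whether the original calculation used raw or log-transformed values.

```python
import math


def pearson_r(xs, ys):
    """Return the Pearson correlation coefficient of two equally long lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def t_statistic(r, n):
    """Return the t-statistic for testing r against zero (n - 2 d.f.)."""
    if abs(r) >= 1:
        raise ValueError("t is undefined for a perfect correlation")
    return r * math.sqrt((n - 2) / (1 - r * r))


# Hypothetical means and standard deviations for five offence types.
means = [1.0, 2.0, 3.0, 4.0, 5.0]
sds = [1.2, 1.9, 3.2, 3.8, 5.1]

r = pearson_r(means, sds)
t = t_statistic(r, len(means))
print(f"r = {r:.3f}, t = {t:.2f}")
```

The resulting t-value would then be compared with the critical value of the t-distribution for n - 2 degrees of freedom.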

Fig. 5 Standard deviation per mean by offence categories for police-recorded crimes (log scales) (See Table 6 for the exact values.)

Table 6 Variation in recorded crime levels per 100,000 population by offence categories (Rates per 100,000 population may differ from rates presented in the printed version of the ESB, since they have been calculated by the author using updated population data retrieved on 17 March 2011. Source: crime data: raw data for Aebi et al. (2010); population data, unless stated otherwise: US Census Bureau, International Database, http://www.census.gov/ipc/www/idb/index.php (total mid-year population). Additional data used: Population figures of France metropolitaine for police data taken from Institut national de la statistique et des études économiques, www.insee.fr (data for January 1st interpolated to mid-year values); population figures of former Western Germany and Berlin for convictions data 2003 to 2006 taken from Statistisches Bundesamt, www.destatis.de (data for December 31st interpolated to mid-year values); population figures for England & Wales, Northern Ireland and Scotland taken from Office for National Statistics, http://www.statistics.gov.uk/STATBASE.)

A regression has been calculated for all three distributions, using the ordinary least squares method. The regression coefficients, which are highly significant based on an F-test (see footnote 9), are 0.746 for police-recorded crimes, 0.946 for police-recorded suspects and 1.320 for convictions.
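For a proportional function of the form y = m * x, the ordinary least squares estimate of the slope has a closed form: m = sum(x * y) / sum(x^2). The sketch below assumes a fit without intercept, in line with the proportional function stated above; whether the original regression included an intercept is not stated in the text. The function name and data are hypothetical.

```python
def proportional_ols_slope(xs, ys):
    """Return the least squares slope m of the model y = m * x (no intercept)."""
    sum_xx = sum(x * x for x in xs)
    if sum_xx == 0:
        raise ValueError("all x values are zero; slope is undefined")
    return sum(x * y for x, y in zip(xs, ys)) / sum_xx


# Hypothetical means (x) and standard deviations (y) per offence type.
means = [1.0, 2.0, 3.0]
sds = [2.0, 4.0, 6.0]

slope = proportional_ols_slope(means, sds)
print(slope)  # 2.0
```

Because the squared means enter the estimate, offence types with very high means (such as the total of criminal offences) dominate such a fit.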

All variation coefficients of the different offence types found for police-recorded offences and suspects and for convictions are very high, as can be seen in Fig. 6 and Table 6. They are clearly above 50%, and therefore the mean, as expected, does not represent the individual country results. The median variation coefficients for these relations are very similar, being 109% for police-recorded crime, 110% for police-recorded suspects and 107% for convictions.

Fig. 6 Variation coefficients by offence categories for police-recorded crime, for police-recorded suspects and for convictions (See Table 6 for the exact values.)

A variation coefficient much higher than those usually found shows that a certain offence variable varies much more strongly across Europe than expected. This might be a sign of a problematic, equivocal definition. Alternatively, the high variation might reflect larger than usual differences in the legal provisions or in the practical importance of the respective offence across Europe. Therefore, an offence with an unusually high variation coefficient is not necessarily defined in a problematic way, but it might be.
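One simple way to operationalize "much higher than those usually found" is to compare each variation coefficient with the median across offence types. The sketch below is only an illustration: the threshold of twice the median is an arbitrary assumption, not a criterion used in the ESB, and the offence labels and values are hypothetical.

```python
from statistics import median


def flag_high_variation(cv_by_offence, factor=2.0):
    """Return offence types whose variation coefficient exceeds factor * median."""
    typical = median(cv_by_offence.values())
    return sorted(
        offence
        for offence, cv in cv_by_offence.items()
        if cv > factor * typical
    )


# Hypothetical variation coefficients (1.0 corresponds to 100%).
hypothetical_cvs = {
    "offence A": 1.0,
    "offence B": 1.1,
    "offence C": 1.2,
    "offence D": 3.0,
}

flagged = flag_high_variation(hypothetical_cvs)
print(flagged)  # ['offence D']
```

A flagged offence type is only a candidate for closer inspection; as noted above, high variation can also reflect real legal or practical differences.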

As can be seen in Fig. 6 and Table 6, there are three offence types which clearly show unusually high variation coefficients: major traffic offences, computer offences and money laundering. For major traffic offences, this will be due to the fact that the extent to which traffic offences are considered criminal varies significantly between countries. However, there will be a core of major traffic offences that are considered criminal everywhere in Europe. The ESB questionnaire aimed at collecting only these core criminal traffic offences, but the high variation still found in the data shows that the attempt has not been very successful so far. Since the ESB questionnaire does not feature a standard definition for major traffic offences, the feasibility of introducing such a separate definition for the 5th edition questionnaire should be tested.

The high variation for computer offences is astonishing given that large parts of this area of crime have been subject to EU-harmonized definition (see footnote 10). There might be bigger differences in the “reality” of computer crime than for other offences, because it depends on computerization and on hardware and software equipment, which will not yet be distributed evenly across Europe. However, the high variation might also be explained by the fact that “computer offences” is a general term, which increases the possibility that different correspondents understand the extent of that category very differently. Therefore, it might be useful to change to a more focused standard definition.

The variation coefficients for money laundering are also striking. The high variation is mainly restricted to the police level, especially to the number of suspects, with a much lower, though still above-average, variation for convictions. Money laundering might therefore be used in some countries as a kind of proxy offence for investigations. On the one hand, it provides a vague and wide definition of criminalized behaviour; on the other hand, due to its theoretical connection with organized crime, it usually provides the police and prosecution service with special investigative powers, such as telephone tapping, which are not available for every offence type. The money laundering charge might later be replaced by a charge for a (usually more severe) offence connected to the money laundering activity, which would explain why variation on convictions level is much lower.

Individual Country Results

Finally, let us take a look at the individual country results. Figure 7 shows the distribution of the overall conformity rate on police level across Europe (see footnote 11). For this section, the overall conformity rate has been calculated in a slightly different way: all offences for which a country was unable to provide data were excluded before calculating the rate. The rate in this section is therefore calculated only from offences for which data were provided. Apart from this, the overall conformity rate is defined as in section 4.1 above (see there for a detailed explanation).
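The modified calculation can be sketched as follows, assuming that the overall conformity rate is the share of offence definitions a country followed in all respects. The data structure, function name and values are hypothetical and only illustrate the exclusion of offences without data.

```python
def overall_conformity_rate(followed_by_offence):
    """Return the share of offence definitions followed in all respects.

    Values are True (definition fully followed), False (at least one rule
    not followed) or None (no data provided for this offence).
    """
    # Offences without data are excluded before the rate is calculated.
    answered = [v for v in followed_by_offence.values() if v is not None]
    if not answered:
        return None  # no data at all, so no rate can be calculated
    return sum(answered) / len(answered)


# Hypothetical answers of one country for four offence definitions.
hypothetical_country = {
    "offence A": True,
    "offence B": False,
    "offence C": None,  # no data provided; excluded from the rate
    "offence D": True,
}

rate = overall_conformity_rate(hypothetical_country)
print(f"{rate:.0%}")  # 67%
```

Without the exclusion step, the same country would reach only 2 of 4 definitions (50%), so the two variants of the rate are not directly comparable.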

Fig. 7 Rate of definitions followed in all respects on police level per country

Overall conformity rates vary significantly across Europe, with rates between 14% and 82%. Countries showing high overall conformity rates can be found everywhere across Europe, with the highest rates to be found for the Republic of Ireland, Northern Ireland, Denmark, Croatia, Turkey and Georgia. But the lowest rates are concentrated in Northern and Eastern Europe and the Eastern parts of Central Europe. Countries outside this area showing low overall conformity rates are England & Wales, Scotland, the Netherlands and Switzerland.

Figure 8 shows the item conformity for all crimes together per country on police level (see also Table 7 with the exact individual country results). For an explanation of this ratio, cf. section 4.3. As could be expected, item conformity rates are higher than overall conformity rates. With the exception of an outlier (Slovakia with an item conformity rate of 57%), item conformity rates range from 72% to 98%. Countries with lower item conformity rates can once again mainly be found in Northern and Eastern Europe and the Eastern parts of Central Europe plus England & Wales and Scotland. The distribution therefore resembles the one that was found for overall conformity.

Fig. 8 Rate of definition sub-categories (include/exclude rules) followed on police level per country

Table 7 Include and exclude rules followed on police and convictions level

On convictions level, the geographical distribution of overall conformity and item conformity is once again nearly the same (cf. Tables 4 and 7). One might theorize that offence definitions in the ESB have so far been mainly influenced by continental Western, Southern and Western Central European criminal law, leading to lower conformity rates in the other countries. This is a striking result that should be further investigated and, if the suspicion can be confirmed, should lead to an attempt to find more universal definitions.

However, it has to be kept in mind that a compromise is always necessary. Therefore, regardless of how offence definitions are set, 100% conformity is unlikely to be reached. But this is not a problem in itself. The ESB definition system not only has the function of optimizing conformity and thus comparability; it also has the function of making the remaining differences clearly visible and thus clarifying the limits of comparability.

Conclusions

Altogether, the definitions in the ESB seem to be well formulated and thus show high conformity levels throughout Europe. This is true for both overall and item conformity, and equally applies to the police and convictions levels. The definitions that work best according to all checks executed here are robbery, theft total and drug offences total.

There are, however, some definitions and items that pose bigger problems. This is especially true for sexual assault, which showed problematic results in three areas: The offence turned out to have low overall and item conformity rates, with item conformity being lowest for crucial exclude rules, which makes differentiation from sexual harassment and sexual abuse of minors problematic. Apart from that, data availability on police level was also quite low.

The only other offence showing clear problems in three areas was burglary, with low item conformity rates and also low data availability both on police and, much more pronounced, on convictions level. The other subgroups of theft performed slightly better, but still showed serious problems: Motor vehicle theft showed problems with respect to item conformity (see footnote 12) and especially to data availability on convictions level. For domestic burglary, finally, the lowest data availability on convictions level was found, with less than 20% of countries being able to report any data.

A low overall and item conformity was also found for fraud, while assault mainly showed a low overall conformity, though the item conformity for the rule to exclude slapping or punching was also quite low (65%). For homicide, only the mean item conformity was quite low, mainly due to a comparatively low conformity with the rule to include assault leading to death; but the differences found do not have a large impact on the comparability of this offence. Another offence with a low overall conformity rate was corruption. This offence also had quite low data availability rates on police level.

Apart from the crimes already mentioned, data availability was also a problem for computer offences and money laundering. Both offences also showed a remarkably high variation on all levels, with most variation coefficients clearly above 200%, even reaching 320% for police-recorded suspects of money laundering. Very high variation coefficients were also found for traffic offences, an offence group which had not yet been subject to a separate definition in the ESB. Finally, low data availability was also found for aggravated drug trafficking.

All of these results show problems connected with the respective offences and their definition. This does not, however, necessarily mean that a better definition exists than the one already in use. But future editions should test whether improvements are still possible in the areas identified. Legal and statistical systems across Europe are, however, too diverse to expect full conformity and full data availability for all offence types.

Another important finding concerns the geographical distribution of conformity rates. Countries showing the lowest conformity rates were usually located in Northern, Eastern or Eastern Central Europe, plus most parts of the UK. The reasons for this striking result should be thoroughly scrutinized. The potential to revise definitions in order to reach higher conformity rates in these parts of Europe, too, should be tested for future editions.