Access to European Statistical System Microdata

This chapter presents the European microdata access system. This system allows eligible researchers to analyse detailed data transmitted to Eurostat by national statistical offices in the European Union. Eurostat is the single entry point of access to such data. Individual data collected by national statistical offices to produce official statistics are strictly confidential. The data are anonymised and further processed before they can be made available for scientific purposes. Statistical offices are legally obliged to protect information received from individual respondents. They use this information solely to produce official statistics. The entities collecting data for other purposes (e.g. administrative, commercial or health) fall within the scope of personal data protection legislation. Statistical confidentiality measures are stricter than those resulting from personal data protection legislation.


The European Statistical System and European Statistics
The European Statistical System (ESS) is a partnership between Eurostat and the national statistical institutes (NSIs) and other national authorities responsible in each Member State for the development, production and dissemination of European statistics. National statistical authority (NSA) is a generic term for NSIs and other national data providers (e.g. regional statistical offices, ministries providing administrative data, etc.); a list of NSAs is available on the Eurostat website. 1 European official statistics are important for the EU. They are produced and disseminated by Eurostat in partnership with NSAs. Usually, national official statistics are based on microdata, collected or accessed by NSAs. Microdata are then aggregated, transmitted to Eurostat and published. Where necessary for the production of European statistics, NSAs also transmit microdata to Eurostat (see Fig. 1). Whenever microdata are transmitted, Eurostat may consider granting access to them for scientific purposes. In this way, almost all microdata received by Eurostat are released for scientific purposes.

Microdata Access Terms and Concepts
Microdata are a form of data where sets of records contain information on individual persons, households or business entities. Traditionally, statistical offices use microdata only to produce aggregated information such as tables. Publication of individual information (microdata) is generally not allowed because it may easily lead to identification of the data subject (person, household or business entity) and therefore to a breach of statistical confidentiality.
Statistical confidentiality is one of the fundamental principles of official statistics. It is the obligation of the statistical offices to protect confidential data. 2 In the context of European statistics, confidential data are data that allow the identification of statistical units (individual persons, households or business entities), thereby disclosing individual information. A statistical unit may be identified in different forms of statistical output, e.g. the contribution of the largest companies may be approximated in business statistics. To prevent this, statistical offices check each output from the point of view of statistical confidentiality. This check is called statistical disclosure control (SDC).
The SDC methodology helps to identify confidential data in these various output forms and to hide such data, taking into account relationships between the data (e.g. additivity of the tables).
In general, official statistics are available in the form of tables where confidential data are not visible and the data are highly aggregated. But many statistical offices also make available their data in the form of microdata, namely as (see Fig. 2):
• Public-use files accessible to everybody (sometimes upon registration or licence signature)
• Confidential microdata files accessible to researchers satisfying specific access conditions
Confidential microdata files are invaluable for the research community as they allow deep analysis of relationships in the data, i.e. causalities, dependencies, convergences, etc. Microdata access systems were developed by statistical institutes to allow legitimate access to confidential data for scientific purposes.
[Fig. 2 labels: Tables; Microdata (Confidential microdata, Public-use files)]

Elements of the Generic Microdata Access System
Microdata access systems define under which conditions access to confidential microdata can be granted to external persons, such as researchers. These conditions are normally outlined in legal acts. In the European Statistical System, access to microdata may be granted to researchers carrying out statistical analysis for scientific purposes. 3 Microdata files may have different levels of detail. The more detailed the data, the easier it is to identify individuals. Original statistical records are easily identifiable as they contain unique direct identifiers such as names, addresses, social security numbers or identification numbers (ID numbers). These confidential records with direct identifiers are available to the statistical offices only under strict confidentiality protocols.
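The link between level of detail and identifiability can be illustrated with a minimal Python sketch. It counts the records that are unique on a chosen combination of characteristics. The function name `unique_records`, the variable names and the toy records are hypothetical and serve only as an illustration; this is not a description of any check actually applied by Eurostat or the NSAs.

```python
from collections import Counter


def unique_records(records, keys):
    """Return the records whose combination of `keys` occurs exactly once."""
    combos = Counter(tuple(r[k] for k in keys) for r in records)
    return [r for r in records if combos[tuple(r[k] for k in keys)] == 1]


# Toy data (invented for illustration only).
records = [
    {"region": "North", "age": 34, "occupation": "teacher"},
    {"region": "North", "age": 36, "occupation": "teacher"},
    {"region": "North", "age": 81, "occupation": "surgeon"},
    {"region": "South", "age": 33, "occupation": "teacher"},
    {"region": "South", "age": 31, "occupation": "teacher"},
]

# With exact age included, every record is unique on the combination.
detailed = unique_records(records, ["region", "age", "occupation"])

# Without exact age, only the rare combination remains unique.
coarse = unique_records(records, ["region", "occupation"])
```

In this toy file, dropping the exact age leaves a single unique record, the one with the rare occupation. This is the kind of record that remains at risk even after direct identifiers are removed.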
Microdata without direct identifiers are called 'de-identified' or 'pseudonymised' microdata (if direct identifiers are replaced by pseudo-identifiers: unique codes replacing all direct identifiers). De-identified microdata with pseudo-identifiers are increasingly important for the production of official statistics, as they allow linking data collected from different sources, thus fostering the use of, for example, administrative sources and derivation of further results on the basis of already collected data. Pseudo-identifiers also allow the creation of longitudinal files, following individuals over time. These microdata are still confidential, as the combination of some rare characteristics may lead to identification of unique statistical units.
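The replacement of direct identifiers by pseudo-identifiers, and the linkage this makes possible, can be sketched as follows. The sketch assumes that pseudo-identifiers are derived with a keyed hash (HMAC-SHA256) from an ID number; this is one common technique chosen for illustration, not necessarily the one used by statistical offices. The key, the field names and the toy records are hypothetical.

```python
import hashlib
import hmac

# Hypothetical secret key, assumed to be held only by the statistical office.
SECRET_KEY = b"example-key-kept-by-the-statistical-office"

# Hypothetical names of the direct identifiers in a record.
DIRECT_IDENTIFIERS = ("name", "address", "id_number")


def pseudo_id(id_number):
    """Derive a stable pseudo-identifier from a direct identifier."""
    digest = hmac.new(SECRET_KEY, id_number.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]


def de_identify(record):
    """Drop direct identifiers and attach a pseudo-identifier instead."""
    kept = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    kept["pid"] = pseudo_id(record["id_number"])
    return kept


def link(left, right):
    """Join two de-identified datasets on the shared pseudo-identifier."""
    by_pid = {r["pid"]: r for r in right}
    return [{**l, **by_pid[l["pid"]]} for l in left if l["pid"] in by_pid]


# Toy records (invented for illustration only).
survey = [{"name": "A. Example", "address": "1 Main St", "id_number": "123", "age": 34}]
register = [{"name": "A. Example", "address": "1 Main St", "id_number": "123", "income": 28000}]

linked = link([de_identify(r) for r in survey], [de_identify(r) for r in register])
```

Because the same ID number always yields the same pseudo-identifier, the survey and register records can be combined without names or addresses appearing in the linked file. The linked records are still confidential, since they contain the original variable values.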
De-identification is a subprocess of anonymisation. In general, anonymisation is the process of making the data anonymous. However, approaches to this process differ between countries. In some countries, making the data anonymous is defined as removal of names, i.e. de-identification. In European law, anonymisation is defined as the process aiming at complete protection of microdata, such that the records are no longer identifiable (the records cannot be linked to any 'real' person, household or business entity). The different stages of microdata anonymisation/protection are (see Fig. 3):
• De-identification or pseudonymisation: process of removing direct identifiers (such as name, ID number and address) from the confidential data, and replacing them with pseudo-identifiers. Pseudo-identifiers can be used to link datasets.
• Partial anonymisation: application of a set of SDC methods to microdata in order to reduce the risk of identification of the statistical unit. Scientific-use files are the result of partial anonymisation.
• Complete anonymisation: application of SDC methods that completely eliminate the risk of identification of the statistical unit (directly or indirectly). Public-use files contain completely anonymised records.
Table 1 compares all basic types of microdata files and access conditions. The terms secure-use files and scientific-use files are specific to the European microdata access system. In EU countries, similar files exist under different names, e.g. scientific-use files are often called 'microdata files for research'. The basic characteristics of these files remain the same:
• Secure-use files are files to which no further methods of statistical disclosure control have been applied. Researchers access these files in the secure environment provided by NSAs (local or remote access). The final results of the work of researchers are checked by NSAs to ensure that they do not reveal confidential data. Each output is checked separately.
• Scientific-use files are files to which methods of statistical disclosure control have been applied to reduce (not to eliminate!) the risk of identification to an appropriate level (partial anonymisation). Researchers have access to such files outside the controlled NSA environment. There are usually no ex post controls by NSAs; researchers need to follow the confidentiality instructions and are responsible for making the published results non-confidential.
Secure-use files are the richest form of microdata for research. However, the services related to provision of access are usually expensive for statistical offices. This is because of infrastructure (dedicated environment for on-site or remote access) and operational costs related to output checking.
For statistical offices, scientific-use files seem to be more efficient in terms of cost-benefit ratio. For researchers, the advantage is that they can be used without having to travel to the premises of the statistical offices (or without logging in to a remote, secure system).
Scientific-use files may be standard or tailor-made, i.e. adapted to the particular needs of the research project. The risk of a breach of confidentiality is smaller if standard files are released than if specific files are produced on request. For researchers, however, the standard files are often not sufficiently detailed (e.g. the researcher may not need regional details but is interested in the exact age of individuals, whereas the standard files usually provide a medium level of regional details and age in bands). The scientific-use files released by Eurostat are standard, i.e. they are prepared once for all access requests. Production of tailor-made files would be too burdensome, as the SDC protection measures must always be agreed with the NSAs.
Example of partial anonymisation methods for EU Labour Force Survey (LFS) scientific-use files:
• AGE: by 5-year bands
• NATIONALITY/COUNTRY OF BIRTH: up to 15 predefined groups
• NACE (economic activity): at 1-digit level
• ISCO (occupation): at 3-digit level
• INCOME: provided only as (national) deciles and from 2009
• HHNUM: household numbers are randomised per dataset, so that respondents cannot be tracked across time
The most common SDC methods to anonymise (partially or completely) the microdata files are:
• Recoding: provision of information at the more general level (e.g. age bands instead of exact age).
• Micro-aggregation: replacement of the original value of the variable (e.g. income) with the average of some (usually 3-5) similar units.
• Record swapping: swapping of, for example, persons between similar households. Swapping adds uncertainty about the identity of the unit in a microdata file.
• Rounding: replacement of original value with rounded figure.
• (Local) suppression: removal of identifying variables in the record or the entire record (e.g. a very large household).
• Sampling: provision of sampled microdata to increase uncertainty about identification, as a record referring to a particular individual may, but does not have to, be included in the sample.
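Some of the methods listed above can be sketched in a few lines of Python. The sketch below covers recoding, micro-aggregation, rounding and sampling only. All function names, default parameters and toy values are hypothetical, and the sketch is a simplified illustration rather than the procedure applied to any actual scientific-use file. In particular, micro-aggregation is done here on a single variable by grouping neighbouring values after sorting, which is only one of several possible ways to define 'similar units'.

```python
import random


def recode_age(age, width=5):
    """Recoding: report age as a band, e.g. 37 -> '35-39'."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def micro_aggregate(values, group_size=3):
    """Micro-aggregation: replace each value with the mean of its group.

    Groups are formed from neighbouring values after sorting; a trailing
    group that would be too small is merged with the previous group.
    """
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    for start in range(0, len(order), group_size):
        group = order[start:start + group_size]
        if len(group) < group_size and start > 0:
            group = order[start - group_size:]
        mean = sum(values[i] for i in group) / len(group)
        for i in group:
            result[i] = mean
    return result


def round_to_base(value, base=100):
    """Rounding: replace the value with a nearby multiple of `base`."""
    return base * round(value / base)


def sample_records(records, fraction, seed=0):
    """Sampling: release only a random subset of the records."""
    k = max(1, int(len(records) * fraction))
    return random.Random(seed).sample(records, k)


# Toy values (invented for illustration only).
ages = [37, 40, 64]
incomes = [1000, 1100, 1200, 5000, 5200, 5400]

age_bands = [recode_age(a) for a in ages]
protected_incomes = micro_aggregate(incomes)
rounded = round_to_base(1234)
subset = sample_records(list(range(10)), fraction=0.5)
```

Micro-aggregation as sketched here preserves the total of the variable while hiding individual values, which is one reason it is often preferred to simple suppression for continuous variables such as income.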
The modes of access to secure-use files and scientific-use files are presented in Table 2.
The modes of access listed in Table 2 are complementary and some NSAs provide all options. As the operational costs may be high, NSAs sometimes charge for these services.

Table 2 Modes of access to confidential data and respective protection measures

Microdata type used: Secure-use files
• Mode of access: On site (separate room, usually in the premises of the NSA where access is provided; researchers can see the data but have no internet access and cannot download or copy the data)
  Confidentiality protection: The final results of data analysis are checked for confidentiality (output checking). Each output is checked separately. In some systems, only a sample of output is checked; in others, researchers do it themselves
• Mode of access: Remote access (same functionalities as for on-site access but facilities provided online, no need to travel to the NSA)
• Mode of access: Remote execution (authorised users submit codes that are executed on the data, but users do not see the data)
  Confidentiality protection: The results are checked automatically ('on the fly') or manually
• There might also be combinations of the above modes of access

Microdata type used: Scientific-use files
• Mode of access: Files are sent to authorised researchers and used in the premises of research entities
  Confidentiality protection: The data are protected (partial anonymisation) before being sent to researchers. Researchers must ensure that the published results do not contain confidential data

Use Case: Access to European Statistical System Microdata (European Microdata)
How does the microdata access system work in practice? Eurostat applies a two-step procedure to grant access to microdata for research purposes. In the first step, organisations interested in accessing European microdata submit an application for recognition to Eurostat. In the second step, researchers from recognised research entities submit their concrete research proposals. 4

Step 1 Recognition as a Research Entity
The recognition of research entities aims at identifying those organisations (or specific departments of the organisations) that carry out research and can be entrusted with confidential data. The assessment criteria refer to the purpose of the entity, its available list of publications and scientific independence. The entities must also describe security measures in place for microdata protection. The content of the application is evaluated by Eurostat. Upon positive assessment, the head of a recognised research entity signs the commitment that the microdata will be used and protected according to the terms agreed. Eurostat publishes the list of recognised research entities on its website. 5 As of 2017, more than 700 research entities had been recognised. The majority of them are universities and research organisations (see Fig. 4).
Recognition of research entities was introduced by Eurostat to provide a contractual link with the legal entities, rather than with individual researchers. 6

Step 2 Submission of Research Proposal
In the second step, researchers from recognised entities submit their concrete research proposals to Eurostat. Eurostat then consults all national statistical authorities that provided the data. If an NSA refuses access, the data of that country are removed from the microdata file.
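The effect of the NSA consultation on the released file can be sketched as a simple filter. The function name `release_file`, the field name `country`, the country codes and the decisions are hypothetical. The sketch also assumes that a country without a recorded decision is treated as not approved, which the text does not specify.

```python
def release_file(records, nsa_decisions):
    """Keep only records from countries whose NSA approved access.

    `nsa_decisions` maps a country code to True (access approved) or
    False (access refused). Countries without a recorded decision are
    treated as not approved (an assumption made for this sketch).
    """
    approved = {country for country, ok in nsa_decisions.items() if ok}
    return [r for r in records if r["country"] in approved]


# Toy microdata file and decisions (invented for illustration only).
microdata = [
    {"country": "AA", "age_band": "35-39"},
    {"country": "BB", "age_band": "40-44"},
    {"country": "CC", "age_band": "60-64"},
]
decisions = {"AA": True, "BB": False}

released = release_file(microdata, decisions)
```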
To be eligible, the research proposal must specify the scientific purpose of the research in sufficient detail, justify the need to use microdata and present the expected outcomes of the research. The results of the research must be made public. Each researcher named in the research proposal as a potential user of the microdata signs an individual confidentiality declaration, in which he or she commits to respect the specific terms of use of confidential data.
In the research proposal, researchers choose the microdata collections they are interested in. In 2017 Eurostat granted access to microdata from 12 data collections (see Annex 1). Most of the European microdatasets are released as scientific-use files. 7 The datasets most frequently requested by researchers are EU Statistics on Income and Living Conditions (EU-SILC) and Labour Force Survey (LFS). Together they account for more than 70% of all access requests. When the research proposal is accepted, the data are made available to the researchers. Researchers may access the data for the period specified in the research proposal. If so requested, researchers receive new releases of the approved microdatasets.
4 The legal basis for access to ESS microdata is Commission Regulation (EU) No. 557/2013 on access to confidential data for scientific purposes. The Regulation defines criteria for eligible research entities and research proposals. It also describes how the microdata shall be made available to researchers (modes of access).
5 http://ec.europa.eu/eurostat/documents/203647/771732/Recognised-research-entities.pdf.
6 However, in some national systems, only individual researchers are 'recognised'.

Once the project is finalised, researchers send Eurostat the resulting publications, which are made available on the dedicated website. 8 Researchers must also destroy the confidential data received.
Eurostat receives around 350 applications for access to microdata per year.

Conclusions
The ESS microdata access system is specific as it creates a single entry point of access to European microdata owned by the NSAs. NSAs agree on the general access conditions (Regulation 557/2013) and are directly involved in decisions on the release of particular datasets in particular ways (anonymisation method and mode of access), and for particular projects (all NSAs are consulted about each access request).
For Eurostat, access to microdata has become a well-established process. Recently, Eurostat worked on modernising the microdata access system, e.g. launching online forms for microdata access applications and piloting online transmission of scientific-use files. The future plans aim to develop remote execution and to publish more public-use files. 9 Closer collaboration with organisations such as CESSDA (Consortium of European Social Science Data Archives) should contribute to the improvement of microdata access services provided by Eurostat.

Aleksandra Bujnowska is a Statistical Officer in Unit B1 'Methodology and Corporate Architecture' at Eurostat. She leads the team 'Statistical confidentiality and access to microdata'. For many years, she has been contributing to the development of the European microdata access system and has made several interventions on this subject at various events. She has also coordinated numerous European projects aiming at wider access to confidential data for scientific purposes and at efficient ways of protecting micro- and tabular data.

Annex 1: European Microdatasets Available for Scientific Purposes
[Annex table residue: 'On site (safe centre in Eurostat) and off site'. Table note a: The years covered by the MMD datasets vary from one country to another and are subject mainly to the availability of the Community Innovation Survey and Survey on ICT Usage and e-Commerce in Enterprises data.]
Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.
The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.