2.1 Introduction and Scope

2.1.1 Scope

The goals of this chapter are to:

  • discuss the fundamentals of educational data management, including issues related to data cleaning methods, metadata, and data curation and storage for preserving educational data, and

  • introduce the key Ethical Principles that govern the use of educational data, especially in terms of privacy, data security and informed consent, which should be addressed via transparent and well-defined ethical policies and codes of practice.

2.1.2 Chapter Learning Objectives

Learning Objectives

| Learning objective | Learn2Analyse Educational Data Literacy Competence Profile |
|---|---|
| Know and understand the most common quality issues of raw educational data | 1.2 |
| Understand data cleaning methods for educational datasets | 2.1 |
| Understand the advantages of enhancing educational data through data description | 2.2 |
| Understand the need for data curation in educational data management | 2.3 |
| Be able to identify storage issues for preserving educational data | 2.4 |
| Understand the importance of informed consent as a key Ethical Principle of Educational Data | 6.1 |
| Understand the significance of educational data protection policies | 6.2 |

2.1.3 Introduction

This chapter will introduce the second key competence of educational data literacy, namely, Educational Data Management.

The first step in this essential process is Data Cleaning. Since educational data comes from various sources, it can be very messy: it may come in diverse formats and contain various types of inaccuracies. It is therefore essential to know the most common quality issues of raw educational data and to understand the data cleaning methods for educational datasets.

In order to add value to the datasets, educators need to understand the advantages of enhancing educational data through data description by using Metadata, usually defined as “data about data”.

Data Curation is of great importance in educational data management, as it transforms raw data into consistent data that can then be analysed.

Moreover, to ensure continued and reliable long-term access, there are many important aspects we need to consider and manage in an effective digital preservation process for educational data.

Special focus should be given to key technical elements of digital preservation. The selected storage solution is of prime importance, since security and privacy are significant concerns.

Along with the emerging opportunities they offer, data-driven educational practice and assessment raise challenges, such as ethical issues and implications, especially in terms of privacy, data security and informed consent. These should be addressed via transparent and well-defined ethical policies and codes of practice.

Several frameworks, policies and guidelines have been developed to help institutions and educators to identify potential ethical issues and to apply clear ethical policies that govern the use of educational data.

New regulations, like the GDPR (General Data Protection Regulation), have raised awareness of the data ethics issues that can arise from data misuse.

Most international guidelines declare informed consent to be one of the pivotal principles of Data Ethics. The way individuals are informed is crucial to the informed consent process: educators should ensure that individuals fully understand the expected consequences of granting or withholding consent.

With regard to the collection of personal data about children, additional protection should be granted, since children are less aware of the risks and consequences of sharing data and of their rights.

As mentioned, in light of the rapid development of Educational Data Analytics on a global basis, new challenges to privacy and data protection have also emerged.

Do educational data analytics challenge the principles of data protection? Is privacy a show-stopper? How is privacy guaranteed, especially if minors and/or sensitive data are involved?

Education professionals need to pay extra attention to sensitive data (special categories of personal data), since an organisation can only process such data under specific conditions (explicit consent may be needed).

Moreover, protecting the rights and freedoms of natural persons with regard to the processing of personal data requires that appropriate technical and organisational measures be taken. To identify sensitive data, assess and respond to data risks, and monitor implemented security processes, a Data Protection Impact Assessment (DPIA) may be required whenever processing is likely to result in a high risk to the rights and freedoms of individuals (IT Governance UK, 2016).

2.2 Adding Value to Educational Datasets (Educational Data Management)

2.2.1 Making Data Tidy (Data Cleaning)

We are surrounded by a sea of data. As BrightBytes (2017) puts it, “The widespread availability of accurate and usable data has the potential to unlock a universe of information for educators.” We could add that, without the appropriate process of getting data ready to use (whether you call it wrangling, cleansing or simply cleaning), “data is simply a scatter of numbers”. You may also review the video “Data Wrangling for Faster, More Accurate Analysis” (in the useful video resources), which shows that “Data discovery is a critical step when working with complicated data”.

In this topic, we will continue studying the language of data with the second key area of data literacy vocabulary, Educational Data Management. The first step in this essential process is Data Cleaning. Figure 2.1 depicts the data cleaning framework defined by Maletic and Marcus (2000) in Data Cleansing: Beyond Integrity Analysis.

Fig. 2.1
A horizontal flow diagram exhibits the three components of the data cleaning framework, namely, error types, error instances, and correct.

Data cleaning framework. (based on Maletic & Marcus, 2000)

As mentioned, educational data comes from various sources: online learning environments, state tests, demographic records, management information systems, open educational resources and many more. It would be very useful to unify all these pieces, revealing the big picture and realizing their untapped potential.

All this data can be very messy. It may come in diverse formats and contain various types of inaccuracies, such as missing values, outliers and duplicate instances. To obtain an integrated and consistent database that is free from any sort of discrepancy, data clean-up is required.

As Romero et al. (2014) describe in A Survey on Pre-Processing Educational Data, the data cleaning task concerns the detection of erroneous or irrelevant data and how to discard it.

Let’s move on to the most common discrepancies in data, namely:

  • missing data,

  • outliers,

  • inconsistent data,

  • double instances,

and how to handle them (Fig. 2.2).

Fig. 2.2
A hexagonal diagram exhibits missing data, where an empty hexagon has no sequences of binary numbers in comparison to the five others.

Missing data

Missing values occur when no value is stored for the variable in the current observation (Little & Rubin, 2002).

When using an e-learning environment, it is very common for learners to study at their own pace, to follow their own learning path. They usually skip some activities and complete only a part of the tasks in the course. Sometimes they even drop out and never come back. Thus, missing data is very common when collecting educational data.

Romero et al. (2014) suggest several ways to handle missing data:

  • Use a label, like “null” (unspecified), or “?” (missing)

  • Use a substitute value like the attribute mean or the mode

  • Determine the most probable value to fill in the missing value, using regression.

  • In some extreme cases, to clean the data and ensure its completeness, learners who have all or almost all of their values missing can be removed from the data.
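The first, second and fourth options above can be sketched in Python as follows. This is a minimal illustration, assuming the LMS extract is held as a list of dictionaries with None marking a missing value; the learner IDs, attribute names and the 50% threshold are hypothetical, and regression-based imputation (the third option) is not shown.

```python
from statistics import mean, mode

# Hypothetical LMS extract: one dict per learner; None marks a missing value.
rows = [
    {"learner": "L1", "quiz_score": 8.0, "forum_posts": 3},
    {"learner": "L2", "quiz_score": None, "forum_posts": 3},
    {"learner": "L3", "quiz_score": 6.0, "forum_posts": None},
    {"learner": "L4", "quiz_score": 7.0, "forum_posts": 1},
    {"learner": "L5", "quiz_score": None, "forum_posts": None},
]
ATTRIBUTES = ["quiz_score", "forum_posts"]


def label_missing(rows, label="?"):
    """Option 1: replace every missing value with an explicit label."""
    return [{k: (label if v is None else v) for k, v in r.items()} for r in rows]


def impute(rows, attribute, statistic):
    """Option 2: fill gaps with a statistic (e.g. mean or mode) of the observed values."""
    observed = [r[attribute] for r in rows if r[attribute] is not None]
    fill = statistic(observed)
    return [{**r, attribute: fill if r[attribute] is None else r[attribute]} for r in rows]


def drop_sparse(rows, attributes, max_missing=0.5):
    """Option 4: remove learners with more than max_missing of their values missing."""
    def missing_ratio(r):
        return sum(r[a] is None for a in attributes) / len(attributes)
    return [r for r in rows if missing_ratio(r) <= max_missing]


cleaned = drop_sparse(rows, ATTRIBUTES)         # L5 has no values at all and is removed
cleaned = impute(cleaned, "quiz_score", mean)   # L2 receives the mean of 8.0, 6.0, 7.0
cleaned = impute(cleaned, "forum_posts", mode)  # L3 receives the mode of 3, 3, 1
```

The order matters in this sketch: sparse learners are dropped before the substitute values are computed, so that they do not influence the mean or mode.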

An outlier is an observation whose values deviate markedly from what is expected, being much larger or much smaller than most other observations (Fig. 2.3). Outliers may be caused by typographical errors or errors in measurement. Remember when NASA lost a spacecraft due to a metric math mistake (Harish, 2019)?

Fig. 2.3
A hexagonal diagram exhibits an outlier, where a relatively huge hexagon with binary numbers is observed among other hexagons of the same size.

Outliers

In datasets, different scales of numerical values are often used to make them easier for humans to read. For example, in budget datasets the units are often millions, so 1,500,000 becomes 1.5 m, while smaller amounts like 400,000 are still written in full. As a result, 1.5 m looks like an outlier, although it is actually an inconsistency in data types and formats.

However, Romero et al. (2010) indicate that “outliers may be phenomena of interest in a dataset, it could be correct and represent real variability for the given attribute.”

In the context of educational data, outliers can often be true observations (Romero et al., 2014). For example, there are always exceptional learners who succeed with little effort or fail against all expectations. In another example, very high values are often recorded for time spent because the learner did not sign out before leaving the digital learning environment.

Clearly, not all outliers are errors. Whether they should be eliminated depends on the aims of the analysis and requires knowledge of the context in which the data was produced and collected.
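One common rule of thumb for flagging candidate outliers, not taken from the sources cited above, is Tukey's fences: values below Q1 − 1.5·IQR or above Q3 + 1.5·IQR, where IQR is the interquartile range. The sketch below uses hypothetical time-spent data and only flags values; as discussed, deciding whether to remove them requires knowledge of the context.

```python
from statistics import quantiles

# Hypothetical minutes spent in the LMS during one week, per learner.
time_spent = {"s1": 42, "s2": 38, "s3": 55, "s4": 47, "s5": 40,
              "s6": 51, "s7": 610, "s8": 44, "s9": 3}


def flag_outliers(values, k=1.5):
    """Return the entries lying outside Tukey's fences [Q1 - k*IQR, Q3 + k*IQR].

    Entries are only flagged, never deleted: the decision is left to the analyst.
    """
    q1, _, q3 = quantiles(list(values.values()), n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return {key: v for key, v in values.items() if v < low or v > high}


flagged = flag_outliers(time_spent)
```

Here both s7 (perhaps a learner who never signed out) and s9 are flagged; only someone who knows the course can say whether either value is an error or a genuine observation.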

Fig. 2.4
A hexagonal diagram exhibits inconsistent data, where a hexagon contains sequences of letters unlike the binary numbers in the other hexagons.

Inconsistent data

Inconsistent data (Fig. 2.4) appears when a data set or group of data is dramatically different from a similar data set (conflicting data set) for no apparent reason (Romero et al., 2014).

For example, imagine negative values for a person’s age, or height data measured sometimes in metres and sometimes in centimetres. Incorrect data may also result from inconsistencies in naming conventions or data codes, or from inconsistent formats for input fields such as dates (Chakrabarti et al., 2009). The most common error is the mixed use of American (MM/DD/YYYY) and European (DD/MM/YYYY) date formats (see Date formats around the world).
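A day/month mix-up can often be resolved automatically when one of the two fields is greater than 12. The sketch below assumes dates arrive as "a/b/yyyy" strings and converts them to ISO 8601 (YYYY-MM-DD); when both fields are 12 or less it falls back to an assumed default order and marks the result, so that such records can be logged and checked by hand. The function name and status labels are hypothetical.

```python
from datetime import date


def normalise_date(text, default_order="MDY"):
    """Convert an 'a/b/yyyy' string to ISO 8601, resolving US vs European order.

    Returns (iso_string, status); status is 'ok', 'assumed' or 'invalid'.
    """
    a, b, year = (int(part) for part in text.split("/"))
    if a > 12 and b <= 12:        # first field cannot be a month: DD/MM/YYYY
        day, month, status = a, b, "ok"
    elif b > 12 and a <= 12:      # second field cannot be a month: MM/DD/YYYY
        month, day, status = a, b, "ok"
    else:                         # ambiguous: use the assumed default order
        month, day = (a, b) if default_order == "MDY" else (b, a)
        status = "assumed"
    try:
        return date(year, month, day).isoformat(), status
    except ValueError:            # e.g. 13/13/2010 or 2/30/2010
        return None, "invalid"


print(normalise_date("25/12/2019"))  # ('2019-12-25', 'ok')
print(normalise_date("12/25/2019"))  # ('2019-12-25', 'ok')
print(normalise_date("3/4/2019"))    # ('2019-03-04', 'assumed')
```

Writing every date in one unambiguous format (ISO 8601) after cleaning avoids the problem reappearing when datasets are combined later.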

People often try to save time when entering data by abbreviating terms. If these abbreviations are not consistent, they can cause errors in the dataset. Differences in capitalisation, spacing, and the gender of adjectives can all cause errors as well. Since there can be numerous inconsistencies, we have to deal with them deliberately, and it is always good practice to log the details of our procedure carefully for future reference.
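Inconsistent spellings, abbreviations, spacing and capitalisation can be reduced by mapping every variant onto one canonical label while logging each change, as recommended above. This is a minimal sketch, assuming a hand-built alias table (the entries are hypothetical); values that are not in the table are left unchanged and reported rather than guessed.

```python
import logging
import unicodedata

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("cleaning")

# Hypothetical alias table: normalised variant -> canonical label.
ALIASES = {
    "uk": "United Kingdom",
    "u.k.": "United Kingdom",
    "united kingdom": "United Kingdom",
    "gr": "Greece",
    "greece": "Greece",
}


def canonical(value):
    """Trim, collapse internal spaces and ignore case, then look up the alias table."""
    key = " ".join(unicodedata.normalize("NFC", value).split()).casefold()
    result = ALIASES.get(key)
    if result is None:
        log.warning("No canonical form for %r; left unchanged for manual review", value)
        return value
    if result != value:
        log.info("Recoded %r -> %r", value, result)  # audit trail of every change
    return result
```

For example, canonical("  u.K. ") and canonical("United  Kingdom") both return "United Kingdom", while a typo such as "Frnace" is returned unchanged and logged for review.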

Fig. 2.5
A hexagonal diagram exhibits double instances in data, where two hexagons with identical sequences of binary numbers are observed.

Double instances

Data deduplication is a process that reduces storage overhead by eliminating redundant copies of data, ensuring that storage media retain only unique instances of data. A duplicate record is one where the same piece of data has been entered more than once (Fig. 2.5). Duplicate records often occur when datasets have been combined, or when it was not known that an entry already existed.

In educational organisations, data integration and correlation are essential activities related to data collection. Combining information from multiple sources often leads to duplicated observations and inaccurate data, so eliminating duplicates is one of the most important steps in the data cleaning process. The procedure of detecting and eliminating duplicates from a particular data set is called Deduplication.
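A simple form of deduplication keeps the first record for each key and sets the others aside for review. The sketch below assumes that records from two hypothetical sources share a student_id field; the key is trimmed and case-folded before comparison, so that "s01 " and "S01" count as the same entry.

```python
def deduplicate(records, key_fields):
    """Keep the first record for each key; return (unique, duplicates)."""
    seen = set()
    unique, duplicates = [], []
    for record in records:
        # Normalise the key so trivial spacing/case differences do not hide duplicates.
        key = tuple(str(record[f]).strip().casefold() for f in key_fields)
        if key in seen:
            duplicates.append(record)
        else:
            seen.add(key)
            unique.append(record)
    return unique, duplicates


# Hypothetical extracts from an LMS and a student information system (SIS).
lms = [{"student_id": "S01", "name": "Maria Papa", "grade": 17},
       {"student_id": "S02", "name": "Nikos Ioannou", "grade": 14}]
sis = [{"student_id": "s01 ", "name": "MARIA PAPA", "grade": 17},
       {"student_id": "S03", "name": "Eleni Georgiou", "grade": 19}]

unique, dups = deduplicate(lms + sis, key_fields=["student_id"])
```

Returning the duplicates instead of silently discarding them makes it easier to log and check what was removed.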

Fig. 2.6
A diagram presents 6 activities. Data scientists spend most of their time cleaning and organizing data at 60 percent while building training sets is the least at 3.

What data scientists spend the most time doing

According to the Crowdflower Data Science Report 2016, data scientists spend most of their time cleaning, organizing and collecting data (Fig. 2.6). Messy data is by far the most time-consuming aspect of the typical data scientist’s workflow.

As Ronald van Loon (2018) notes, the point with data is that it needs to be regularly maintained to ensure that it remains clean and crystal clear.

As Big Data expert Bernard Marr (2017) observes, much of the data may be unstructured, noisy and in need of thorough cleansing and preparation before it is ready to yield working insights.

Questions and Teaching Materials

  1.

    Finally, after Alice collected the necessary parental consent for her intervention, the flipped classroom course is up and running.

    After running the online course for three weeks, Alice tracks her students’ activity in the online learning environment. She collects data on students’ engagement, behaviour and performance in the LMS, e.g. time spent on the platform, the videos her students watched, their progress in the online course, downloaded files, their online quiz scores, and their participation in the forum and interaction with each other.

    Before proceeding further, Alice confirms that the collected data meets basic quality characteristics. She watches the video “Data Wrangling for Faster, More Accurate Analysis”, and then examines and verifies the educational data against different quality measures. Inconsistencies in data, such as missing pieces, errors, or even differences in how the same value is expressed, produce inaccurate results.

    • True

    • False

Correct answer: True

  2.

    Alice has collected educational data from various sources (data from online learning environments, data from state tests, demographic data, data from management information systems, from open educational resources and much more) and she wants to unify the datasets in order to reveal the big picture.

    Alice soon realizes that the data, coming from various sources in diverse formats, is quite messy, containing missing values, outliers, and duplicate instances. To obtain a consistent database, free from any sort of discrepancy, data cleaning is required to detect erroneous or irrelevant data and discard it.

    In the data cleaning framework defined by Maletic and Marcus (2000) and presented in Fig. 2.1, the following three phases define a data cleansing process.

    Help Alice arrange the phases in the right order:

    A. Correct the uncovered errors

    B. Define and determine error types

    C. Search and identify error instances

Correct answer: B – C – A

  3.

    Alice has collected data from the Learning Management System and she realizes that some users accessed her course just once (in error or in order to see one specific resource or to do an activity) but never returned to the course later.

    What would you suggest Alice do to handle the missing values?

    A. Use a label, like “null” (unspecified), or “?” (missing)

    B. Use a substitute value like the attribute mean or the mode

    C. Determine the most probable value to fill in the missing value, using regression

    D. Remove these learners from the dataset

Correct answer: D

  4.

    Alice has extracted the following dataset containing file downloads data from the school’s Learning Management System.

 

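| Student | File1.pdf | File2.pdf | File3.pdf | File4.pdf | File5.pdf | File6.pdf | File7.pdf | File8.pdf | File9.pdf |
|---|---|---|---|---|---|---|---|---|---|
| Student1 | 2 | 1 | 0 | 2 | 1 | 1 | 0 | 1 | 2 |
| Student2 | 1 | 3 | 2 | 1 | 1 | 1 | 2 | 1 | 1 |
| Student3 | 1 | 1 | 2 | 1 | 1 | 0 | 1 | 2 | 3 |
| Student4 | 12 | 14 | 18 | 20 | 16 | 15 | 14 | 12 | 9 |
| Student5 | 1 | 0 | 1 | 2 | 1 | 2 | 1 | 0 | 2 |
| Student6 | 1 | 2 | 1 | 1 | 1 | 1 | 3 | 2 | 1 |
| Student7 | 0 | 1 | 2 | 3 | 1 | 1 | 1 | 2 | 1 |
| Student8 | 1 | 1 | 0 | 1 | 2 | 2 | 1 | 0 | 2 |
| Student9 | 1 | 1 | 2 | 1 | 1 | 1 | 3 | 2 | 1 |
| Student10 | 1 | 0 | 1 | 2 | 3 | 1 | 1 | 2 | 1 |
| Student11 | 16 | 15 | 14 | 12 | 9 | 12 | 11 | 10 | 8 |
| Student12 | 1 | 2 | 1 | 0 | 2 | 1 | 0 | 1 | 2 |
| Student13 | 1 | 1 | 3 | 2 | 1 | 1 | 2 | 1 | 1 |
| Student14 | 1 | 1 | 1 | 2 | 1 | 0 | 1 | 2 | 3 |
| Student15 | 1 | 0 | 1 | 2 | 1 | 2 | 1 | 0 | 2 |
| Student16 | 1 | 2 | 1 | 1 | 1 | 1 | 3 | 2 | 1 |
| Student17 | 0 | 1 | 2 | 3 | 1 | 1 | 1 | 2 | 1 |
| Student18 | 1 | 0 | 1 | 2 | 1 | 2 | 1 | 0 | 2 |
| Student19 | 1 | 2 | 1 | 1 | 1 | 1 | 3 | 2 | 1 |
| Student20 | 0 | 1 | 2 | 3 | 1 | 1 | 1 | 2 | 1 |
| Student21 | 2 | 1 | 0 | 2 | 1 | 0 | 1 | 2 | 1 |
| Student22 | 1 | 3 | 2 | 1 | 1 | 2 | 1 | 1 | 1 |
| Student23 | 1 | 1 | 2 | 1 | 0 | 1 | 2 | 3 | 1 |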

She can easily identify two outliers (Student4 and Student11). Help Alice decide what to do with these outliers in order to proceed with the data analysis. These outliers:

  A. are errors and should be eliminated in order to proceed.

  B. are true observations and should not be eliminated.

Correct answer: B

  5.

    Alice participates in an International Conference on Teaching and Learning. Therefore, she must prepare a review of students’ performance from 6 different countries in three main subjects, namely Maths, English, and Science.

    Students’ performance data from 6 different countries are collected in the following table.

 

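| Row | Date of Birth | Student | Maths | English | Science | Country |
|---|---|---|---|---|---|---|
| 1 | 4/9/2008 | Richard | 95 | 68 | 96 | USA |
| 2 | 9/10/2007 | David | 65 | 78 | 70 | UK |
| 3 | 12/12/2009 | Mary | 59 | 55 | 53 | USA |
| 4 | 6/12/2010 | Ann | 97 | 99 | 98 | France |
| 5 | 8/13/2011 | Elen | 100 | 97 | 98 | Greece |
| 6 | 11/14/2010 | Catherine | 67 | 59 | 70 | UK |
| 7 | 9/14/2005 | James | 54 | 67 | 63 | USA |
| 8 | 5/17/2006 | Martha | 79 | 83 | 88 | Italy |
| 9 | 4/17/2007 | Bill | 84 | 78 | 90 | UK |
| 10 | 8/18/2007 | Phil | 45 | 78 | 55 | USA |
| 11 | 9/18/2008 | James | 75 | 83 | 88 | Itally |
| 12 | 10/19/2009 | Tom | 85 | 89 | 92 | Greece |
| 13 | 6/19/2010 | Joe | 9,4 | 9,7 | 9,1 | UK |
| 14 | 9/20/2029 | Jill | 49 | 60 | 53 | Canada |
| 15 | 5/17/2006 | Martha | 79 | 83 | 88 | Italy |
| 16 | 12/12/2009 | Mary | 59 | 55 | 53 | USA |
| 17 | 24/10/2010 | Tony | 96 | 79 | 100 | Italy |
| 18 | 8/24/2006 | Lisa | 79 | −75 | 69 | UK |
| 19 | 5/25/2004 | Robert | 97 | 83 | 90 | USA |
| 20 | 4/25/2029 | Michael | 100 | 89 | 55 | Italy |
| 21 | 25/6/2007 | Rose | 67 | 97 | 88 | Greace |
| 22 | 8/26/2008 | Sofia | 54 | 60 | 92 | UK |
| 23 | 9/26/2009 | Jim | 97 | 88 | 67 | Greece |
| 24 | 4/26/2006 | Betty | 60 | 92 | 54 | France |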

Alice soon realises that the key to finding the inconsistencies is to create a filter. The filter will allow her to see all of the unique values in the column, making it easier to isolate the incorrect values. (Source: https://edu.gcfglobal.org/en/excel-tips/a-trick-for-finding-inconsistent-data/1/).

After carefully examining this table, help Alice select the inconsistencies you have identified:

  A. negative values for students’ grades

  B. different data formats

  C. typos in dates

  D. differences in spaces

  E. different grading scales

  F. typos in country data

  G. differences in capitalisation

Correct answers: A, B, C, E, F. In our example, we can identify the following inconsistencies: in row 21, Greece is misspelled, and in row 11, Italy has a double l; in row 18, there is a negative value for the English grade; in row 13, grades are on a different scale; in rows 14 and 20, dates are out of range; and in rows 17 and 21, dates are in a different format (DD/MM instead of MM/DD).

  6.

    Alice participates in an International Conference on Teaching and Learning. Therefore, she must prepare a review of students’ performance from 6 different countries in three main subjects, namely Maths, English, and Science.

    Students’ performance data from 6 different countries are collected in the following table.

 

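| Row | Date of Birth | Student | Maths | English | Science | Country |
|---|---|---|---|---|---|---|
| 1 | 4/9/2008 | Richard | 95 | 68 | 96 | USA |
| 2 | 9/10/2007 | David | 65 | 78 | 70 | UK |
| 3 | 12/12/2009 | Mary | 59 | 55 | 53 | USA |
| 4 | 6/12/2010 | Ann | 97 | 99 | 98 | France |
| 5 | 8/13/2011 | Elen | 100 | 97 | 98 | Greece |
| 6 | 11/14/2010 | Catherine | 67 | 59 | 70 | UK |
| 7 | 9/14/2005 | James | 54 | 67 | 63 | USA |
| 8 | 5/17/2006 | Martha | 79 | 83 | 88 | Italy |
| 9 | 4/17/2007 | Bill | 84 | 78 | 90 | UK |
| 10 | 8/18/2007 | Phil | 45 | 78 | 55 | USA |
| 11 | 9/18/2008 | James | 75 | 83 | 88 | Italy |
| 12 | 10/19/2009 | Tom | 85 | 89 | 92 | Greece |
| 13 | 6/19/2010 | Joe | 94 | 97 | 91 | UK |
| 14 | 9/20/2009 | Jill | 49 | 60 | 53 | Canada |
| 15 | 5/17/2006 | Martha | 79 | 83 | 88 | Italy |
| 16 | 12/12/2009 | Mary | 59 | 55 | 53 | USA |
| 17 | 10/24/2010 | Tony | 96 | 79 | 100 | Italy |
| 18 | 8/24/2006 | Lisa | 79 | 75 | 69 | UK |
| 19 | 5/25/2004 | Robert | 97 | 83 | 90 | USA |
| 20 | 4/25/2009 | Michael | 100 | 89 | 55 | Italy |
| 21 | 6/25/2007 | Rose | 67 | 97 | 88 | Greece |
| 22 | 8/26/2008 | Sofia | 54 | 60 | 92 | UK |
| 23 | 9/26/2009 | Jim | 97 | 88 | 67 | Greece |
| 24 | 4/26/2006 | Betty | 60 | 92 | 54 | France |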

After searching the web for answers, Alice finds out that she can highlight duplicates in MS Excel by selecting Home > Conditional Formatting > Highlight Cells Rules > Duplicate Values (to flag whole rows, this can be applied to a helper column that joins each row’s values).

Help Alice identify the duplicates. How many duplicates can you identify?

  A. None

  B. One pair of rows

  C. One triplet of rows

  D. Two pairs of rows

Correct answer: D

  7.

    After reading the Crowdflower Data Science Report, Alice realises that mining data for patterns and refining algorithms are the two most time-consuming tasks of a data scientist’s workflow.

    • True

    • False

Correct answer: False.

  8.

    ACTIVITY/PRACTICE QUESTION (Reflect on)

    We encourage you to elaborate on your response about data cleaning in the following reflective task. You may reflect on:

    1. the factors that contribute to inconsistencies in educational datasets generated from online courses;

    2. how we can explain the existence of outliers in educational data.

2.2.2 Data to Describe Data (Metadata)

Metadata is usually defined as “data about data”. Johnson et al. (2018) provide the following definition of metadata: “It is information about a data set that is structured (often in machine-readable format) for purposes of search and retrieval. Metadata elements may include basic information (e.g., title, author, date created) and/or specific elements inherent to data sets (e.g., spatial coverage, time periods).”

However, in the context of education, metadata can more aptly be defined as tags used to describe educational assets.

Metadata helps to:

  • organize,

  • find, and

  • understand data.

Metadata answers the following questions about data:

  • Who created it?

  • What is it?

  • When was it created?

  • How was it generated?

  • Where was it created?

  • How may it be used?

  • Are there restrictions on it?
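As an illustration, the questions above can be turned into the fields of a small metadata record, together with a check for the questions a record leaves unanswered. The field names and example values are hypothetical and do not follow any particular metadata standard.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class DatasetMetadata:
    creator: Optional[str] = None       # Who created it?
    title: Optional[str] = None         # What is it?
    created: Optional[str] = None       # When was it created? (ISO 8601 date)
    method: Optional[str] = None        # How was it generated?
    location: Optional[str] = None      # Where was it created?
    usage: Optional[str] = None         # How may it be used?
    restrictions: Optional[str] = None  # Are there restrictions on it?


def unanswered(meta):
    """List the fields (questions) this record leaves empty."""
    return [f.name for f in fields(meta) if getattr(meta, f.name) in (None, "")]


meta = DatasetMetadata(creator="Class 3B teacher",
                       title="Weekly LMS quiz scores, Term 1",
                       created="2020-11-30",
                       method="Export from the school LMS")
print(unanswered(meta))  # ['location', 'usage', 'restrictions']
```

A check like this can be run before a dataset is shared, so that questions about permitted use and restrictions are not left open.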

Practical examples of metadata are given by Kononow (2018) at https://dataedo.com/kb/data-glossary/what-is-metadata (see also Fig. 2.7).

Fig. 2.7
A diagram exhibits the transfer of text, image, video, and audio files from raw data to annotated data.

Examples of metadata

In Understanding Metadata (2017), published by the National Information Standards Organization, Riley (2017) distinguishes three types of metadata (see Fig. 2.8):

  • Descriptive metadata

  • Administrative metadata

  • Structural metadata

Descriptive metadata can describe a learning asset or resource related to education — including learning standards, lessons, assessment items, books, etc. — for purposes such as identification, search and discovery. Descriptive metadata can be thought of as a keyword or tag on an asset that makes it easier to find. Examples include subject, grade level, and related skills and concepts.

Fig. 2.8
A diagram exhibits the types of metadata, namely, descriptive, administrative, and structural. Their corresponding components are listed on the right.

Types of metadata

Administrative metadata is used to manage a learning asset. Examples of this type of metadata include status, disposition, rights and licensing.

Structural metadata describes how data is organized or formatted and is often governed by a widely adopted standard that ensures the data is accurately represented when exchanged and presented. Structural metadata enables content to be machine readable.
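The three types can be seen side by side in a hypothetical learning-asset record. In the sketch below, a search matches on descriptive metadata (keywords) and filters on administrative metadata (status), while the structural metadata records the format and parts of each asset. The record layout, titles and licences are assumptions for illustration, not a standard schema.

```python
# Hypothetical learning-asset records, with metadata grouped by type.
assets = [
    {
        "descriptive": {"title": "Fractions on a number line", "subject": "Maths",
                        "grade_level": 5, "keywords": ["fractions", "number line"]},
        "administrative": {"status": "published", "licence": "CC BY 4.0"},
        "structural": {"format": "text/html", "parts": ["intro", "activity", "quiz"]},
    },
    {
        "descriptive": {"title": "Photosynthesis basics", "subject": "Science",
                        "grade_level": 6, "keywords": ["plants", "energy"]},
        "administrative": {"status": "draft", "licence": "CC BY-SA 4.0"},
        "structural": {"format": "video/mp4", "parts": ["video"]},
    },
]


def find(assets, keyword, only_published=True):
    """Match on descriptive metadata; filter on administrative metadata."""
    hits = []
    for asset in assets:
        d, a = asset["descriptive"], asset["administrative"]
        if keyword.casefold() in (k.casefold() for k in d["keywords"]):
            if not only_published or a["status"] == "published":
                hits.append(d["title"])
    return hits


print(find(assets, "Fractions"))                    # ['Fractions on a number line']
print(find(assets, "plants"))                       # [] (the asset is still a draft)
print(find(assets, "plants", only_published=False)) # ['Photosynthesis basics']
```

The structural part is not used by the search; a player or LMS would read it to know how to present the asset and navigate its parts.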

Metadata is used for the purposes of:

  • Discovery of information

  • Identification of a resource

  • Interoperability, i.e., exchange of content between systems

  • Digital-object management, i.e., delivering the appropriate version

  • Preservation, i.e., signalling when preservation actions should be undertaken

  • Navigation within parts of items

Primary uses of various metadata types are presented in Table 2.1 below (adapted from Understanding Metadata, 2017).

Table 2.1 Primary uses of various metadata types

The video from the National Archives of Australia “Meta… What? Metadata” (in the useful video resources) helps us understand the importance of metadata in order to describe, use, find and manage content and data.

The National Information Standards Organization describes data interoperability as “the effective exchange of content between systems. Interoperability relies on metadata describing that content so that the systems involved can effectively profile incoming material and match it to their internal structures.” You may also review the video “Learn More About Data Interoperability” (in the useful video resources).
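The profile-and-match step can be sketched as follows: one system exports a metadata record as JSON, and the receiving system maps each incoming field onto its own internal field names, reporting anything it cannot place. The field names (loosely inspired by Dublin Core elements such as title, creator, date and rights) and the mapping are hypothetical.

```python
import json

# Hypothetical: system A exports metadata using its own field names...
exported = json.dumps({"dc_title": "Fractions quiz", "dc_creator": "A. Teacher",
                       "dc_date": "2020-11-30", "dc_rights": "CC BY 4.0"})

# ...and system B declares how each incoming field maps onto its internal structure.
FIELD_MAP = {"dc_title": "name", "dc_creator": "author",
             "dc_date": "published_on", "dc_rights": "licence"}


def ingest(payload, field_map):
    """Translate an incoming record; return it with the fields that could not be mapped."""
    incoming = json.loads(payload)
    record = {field_map[k]: v for k, v in incoming.items() if k in field_map}
    unmapped = sorted(set(incoming) - set(field_map))
    return record, unmapped


record, unmapped = ingest(exported, FIELD_MAP)
```

When both systems agree on a shared metadata standard, the mapping table becomes trivial; without one, every pair of systems needs its own table, which is one reason standards matter for interoperability.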

Questions and Teaching Materials

  1.

    Alice has heard of “metadata”, but she is not quite sure what it means or why she might need it. She downloaded this photo from pxhere.com, an online community that shares copyright-free images.

A photograph of a greater flamingo in a flamboyance standing in a body of water surrounded by vegetation. The 2 windows below exhibit the properties.

Photo’s properties

What information can Alice gather from the photo’s metadata? Match the questions in the first list with the values in the second list.

Questions:

A. Who created the photo?

B. What is it?

C. When was it created?

D. How was it generated?

E. What are the photo’s copyright terms?

Values:

1. Greater Flamingo

2. CC0 Public Domain

3. 12/1/2020 7:38 PM

4. 7/11/2020 5:27 PM

5. Alice

6. Canon EOS 6D Mark II

7. 219 mm

8. MARTIN TRNKA

9. sRGB

10. ISO-200

11. Digital Photo Professional

Correct answer: A8 – B1 – C4 – D6 – E2

  2.

    Open educational resources (OER) are freely accessible, openly licensed text, media, and other digital assets that are useful for teaching, learning, and assessing as well as for research purposes. The term OER describes publicly accessible materials and resources for any user to use, re-mix, improve and redistribute under some licenses.

    OER Repositories are repositories of open educational resources covering most educational disciplines. Open Repositories are websites which house open books, textbooks, lectures, tutorials, quizzes/tests, case studies, assessment tools, images, syllabi, simulations, online courses and other resources of educational value.

    Photodentro OER repositories is the Greek National Learning Object Repository (LOR) for primary and secondary education. It hosts reusable learning objects (small, self-contained reusable units of learning). It is open to everyone: pupils, teachers, parents, and anyone else who is interested. The URL for accessing Photodentro LOR is http://photodentro.edu.gr/lor.

    For the purpose of collecting learning material for the flipped classroom initiative, Alice has found the following Learning Object (LO) in Photodentro OER repositories:

A monochromatic photograph of the British Museum with the text Lost... in the Museum in bright color. Information and volume buttons are observed.

Alice is studying the Learning Object’s metadata page (http://photodentro.edu.gr/lor/r/8521/2705?locale=en) to find answers to the following questions:

  1. What is the Subject Area of the LO?

    A. English Language > Literature – Art – Culture > Reading

    B. FOREIGN LANGUAGE

    C. B1-medium knowledge

    D. Lost in the Museum (mystery game)

Correct answer: A.

  2. What are the Licence Terms of the LO?

    A. Creative Commons Attribution-NoDerivatives Greece 3.0

    B. Creative Commons Attribution-ShareAlike 3.0 International License.

    C. Creative Commons Attribution-NonCommercial-ShareAlike Greece 3.0

    D. Creative Commons Attribution-NonCommercial-NoDerivatives Greece 3.0

Correct answer: C.

  3. What is the Date of Publication?

    A. 02/09/2019

    B. 03/09/2019

    C. 7/12/2020

    D. 19/05/2013

Correct answer: D.

  4. What is the File Size?

    A. 4.91 MB

    B. 12–15 MB

    C. 25 MB

    D. 8125 MB

Correct answer: A.

  5.

    After watching the video “Meta… What? Metadata!”, Alice realises that one of the most common uses of metadata is to group content, making it more efficient to retrieve during a search.

    • True

    • False

Correct answer: True.

  6.

    Alice watches the video from the League of Innovative Schools, “Learn More About Data Interoperability”, promoting the movement to advance data interoperability in public education.

    In this video, data interoperability is defined as the seamless, safe and controlled exchange between applications, with clear standards for how to send and receive student information, privately and securely.

    • True

    • False

Correct answer: True

  7.

    ACTIVITY/PRACTICE QUESTION (Reflect on)

    We encourage you to elaborate on your response about metadata in the following reflective task. You may reflect on:

    The advantages of enhancing educational data through data description.

2.2.3 The Significance of Data Curation

According to ICPSR (2018), “Through the curation process, data are organized, described, cleaned, enhanced, and preserved for public use, much like the work done on paintings or rare books to make the works accessible to the public now and in the future. Without curation, however, data can be difficult to find, use, and interpret” (Fig. 2.9).

Fig. 2.9
A diagram presents organize, enhance, and reuse as the 3 processes involved in the data curation cycle. Their corresponding subprocesses are depicted.

Data curation

Michael Stonebraker (2014) defines data curation as the process of turning independently created data sources (structured and semi-structured data) into unified data sets ready for analytics, using domain experts to guide the process. It involves:

  • Identifying data sources of interest (whether from inside or outside the enterprise)

  • Verifying the data (to ascertain its composition)

  • Cleaning the incoming data (for example, 99999 is not a legal zip code)

  • Transforming the data (for example, from European date format to US date format)

  • Integrating it with other data sources of interest (into a composite whole)

  • Deduplicating the resulting composite data set.
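Several of these steps can be chained in a short Python sketch: verify, clean, transform (European to US date format), then integrate and deduplicate. The two sources, their field names and the cleaning rule (drop rows whose date cannot be parsed in the source's declared format) are hypothetical; identifying the sources of interest remains a human decision and is not shown.

```python
from datetime import datetime

# Hypothetical sources: a European-format SIS and a US-format LMS export.
sis = [{"id": "S1", "dob": "24/10/2010"}, {"id": "S2", "dob": "05/03/2009"}]
lms = [{"id": "S2", "dob": "03/05/2009"}, {"id": "S3", "dob": "not recorded"}]

REQUIRED = {"id", "dob"}


def verify(rows):
    """Check composition: every row carries the expected fields."""
    return all(REQUIRED.issubset(row) for row in rows)


def clean(rows, fmt):
    """Drop rows whose date cannot be parsed in the source's declared format."""
    ok = []
    for row in rows:
        try:
            datetime.strptime(row["dob"], fmt)
            ok.append(row)
        except ValueError:
            pass
    return ok


def transform(rows, src_fmt, dst_fmt="%m/%d/%Y"):
    """Re-express dates in one target format (here European -> US)."""
    return [{**r, "dob": datetime.strptime(r["dob"], src_fmt).strftime(dst_fmt)}
            for r in rows]


def integrate_and_deduplicate(*sources):
    """Combine sources into one composite set, keeping the first record per (id, dob)."""
    merged, seen = [], set()
    for rows in sources:
        for r in rows:
            key = (r["id"], r["dob"])
            if key not in seen:
                seen.add(key)
                merged.append(r)
    return merged


assert verify(sis) and verify(lms)
sis_us = transform(clean(sis, "%d/%m/%Y"), "%d/%m/%Y")
lms_us = clean(lms, "%m/%d/%Y")
curated = integrate_and_deduplicate(sis_us, lms_us)
```

Note that the two records for S2 are only recognised as duplicates after their dates have been transformed into the same format, which is why transformation comes before deduplication in this sketch.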

Castanedo (2015), on the other hand, describes data curation as the process that involves data cleaning, schema definition/mapping, and entity matching to transform raw data into consistent data that can then be analysed. Schema definition/mapping means making associations among data attributes and features. Entity matching means finding data in different data sources that refer to the same entity; it is essential for removing duplicate records.
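Schema mapping and entity matching can be illustrated together. In the hypothetical sketch below, each source's column names are first mapped onto shared attribute names, and records are then matched on a normalised name (accents, spacing and case removed) plus date of birth. Exact matching on such a key is a simplification; practical entity matching often uses similarity measures instead.

```python
import unicodedata

# Hypothetical schema mapping: each source's column names -> shared attribute names.
SCHEMA_MAP = {
    "lms": {"full_name": "name", "birth_date": "dob"},
    "exam_board": {"candidate": "name", "dateOfBirth": "dob"},
}


def to_shared_schema(row, source):
    """Rename columns according to the source's mapping; unmapped columns are kept."""
    return {SCHEMA_MAP[source].get(col, col): val for col, val in row.items()}


def match_key(row):
    """Normalised (name, dob) key: accents stripped, spaces collapsed, case ignored."""
    name = unicodedata.normalize("NFKD", row["name"])
    name = "".join(ch for ch in name if not unicodedata.combining(ch))
    return (" ".join(name.split()).casefold(), row["dob"])


def match_entities(left, right):
    """Pair up rows from two sources that appear to describe the same student."""
    index = {match_key(r): r for r in right}
    return [(l, index[match_key(l)]) for l in left if match_key(l) in index]


lms_rows = [to_shared_schema({"full_name": "Zoë  Adams", "birth_date": "2009-05-03",
                              "quiz": 8}, "lms")]
exam_rows = [
    to_shared_schema({"candidate": "ZOE ADAMS", "dateOfBirth": "2009-05-03",
                      "score": 71}, "exam_board"),
    to_shared_schema({"candidate": "Zoe Adams", "dateOfBirth": "2010-01-15",
                      "score": 64}, "exam_board"),
]
pairs = match_entities(lms_rows, exam_rows)
```

Only the first exam record is paired: the second has the same name but a different date of birth, so it is treated as a different student.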

In this video, “ICPSR 101: What is Data Curation?” (in the useful video resources), ICPSR explains the intricacies of the work data processors do every day to find and fix issues in the data, ensuring their long-term availability and value to the research community.

Fig. 2.10 presents the Digital Curation Centre (DCC) Curation Lifecycle Model, which provides a graphical, high-level overview of the stages required for successful curation and preservation of data, from initial conceptualisation or receipt through the iterative curation cycle.

Fig. 2.10
A model exhibits conceptualization and disposal as the origin and endpoint of the sequential actions of the four full life cycle actions. Migration and reappraisal are depicted.

The DCC curation lifecycle model. (Source: diagram from Higgins, 2008)

We can identify four full life cycle actions:

  • Description and Representation

  • Preservation Planning

  • Community Watch and Participation

  • Curate and Preserve

The outer cycle represents the sequential actions of the data curation process:

  • Conceptualise

  • Create or Receive

  • Appraise and Select

  • Ingest

  • Preservation Action

  • Store

  • Access, Use and Reuse

  • Transform

Digital curation is all about maintaining and adding value to a trusted body of digital information for future and current use; specifically, the active management and appraisal of data over the entire life cycle (Jisc, 2006).

You may also watch the video “Data Curation @UCSB” (in the useful video resources) to see how the UCSB Library is developing a digital curation service to help preserve research data created across campus.

Now that we have completed the hard work to make our data tidy and meaningful, we will put in a little extra effort to preserve our valuable results.

Next, we will discuss digital educational data preservation, a key task in the data curation process that safeguards our unique educational data from being stolen, destroyed or simply lost.

Questions and Teaching Materials

  1.

    Alice is studying the Data Curation Process to ensure that data is reliably retrievable for future reuse, and to determine what data is worth saving and for how long.

    Help Alice match the following Data Curation processes to the appropriate Data Curation Phase.

Data curation processes:

A. Cleaning

B. Presenting

C. Annotating

D. Preserving

E. Collecting

F. Tagging

G. Deduplicating

H. Publishing

Data curation phases:

Phase 1: Organize

Phase 2: Enhance

Phase 3: Reuse

Correct answer: A1-B3-C2-D3-E1-F2-G1-H3.

  2.

    Data curation is not quite clear to Alice, so she watches the ICPSR video “ICPSR 101: What is Data Curation?”, which explains what data curation is all about. According to this video, the purpose of data curation is to ensure that people can find data now and in the future. This can be achieved by following the five steps of data curation.

    Please help Alice to arrange the following steps in the right order:

    A. Find and fix issues with data

    B. Identify data in the scope of the archive

    C. Ensure that data will last forever (or at least for a very long time)

    D. Make data findable and usable

    E. Get data (convince the data owners to share it)

Correct answer: B-E-A-D-C

  3.

    Alice studies the Digital Curation Centre’s (DCC) Curation Lifecycle Model. According to this complex diagram, there are four full lifecycle actions and eight sequential actions in the data curation process.

    Please help Alice to select only the full lifecycle data curation actions from the following list.

    A. Create or Receive

    B. Description and Representation

    C. Access, Use and Reuse

    D. Appraise and Select

    E. Preservation Planning

    F. Curate and Preserve

    G. Transform

    H. Community Watch and Participation

Correct answers: B, E, F, H

  4.

    The last step of the data curation cycle is to ensure that data will last forever (or at least for a very long time). Alice is anxious: how can digital records last “forever”? What if the technology becomes obsolete?

    Thankfully, Alice understands that in the “Data Curation @UCSB” video she just watched, Greg Janee, a Digital Library Research Specialist, claims that digital information is far more robust than paper.

    Is Alice’s understanding correct?

    • Yes

    • No

Correct answer: No.

  5.

    ACTIVITY/PRACTICE QUESTION (Short answer)

    Name some of the data curation actions described in this session.

  6.

    ACTIVITY/PRACTICE QUESTION (Reflect on)

    We encourage you to elaborate on your response in the following reflective task. You may reflect on:

    The significance of data curation in educational data management.

2.2.4 Storage Issues for Preserving Educational Data

As explained in the short Library of Congress video “Why Digital Preservation is Important for Everyone” (in the useful video resources), traditional information sources such as books, photos and sculptures can easily survive for years, decades or even centuries, but digital items are fragile and require special care to keep them usable. Rapid technological change also affects digital preservation: as new technologies appear, older ones become obsolete, making it difficult to access older content.

The video explores the complex nature of the problem: digital content, unlike content on traditional media, depends on technology to make it available and requires active management to ensure its ongoing accessibility.

Preservation is no longer simply a long-term concern for memory institutions but a concern for everyone interested in using and accessing digital materials. The greater the importance of digital materials, the greater the need for their preservation: digital preservation protects investment, captures potential and transmits opportunities to future generations and our own. Digital materials, and the opportunities they create, are fragile (Digital Preservation Handbook, Digital Preservation Coalition, 2015).

Jisc (2006) defines digital preservation as “the series of actions and interventions required to ensure continued and reliable access to authentic digital objects for as long as they are deemed to be of value. This encompasses not just technical activities, but also all of the strategic and organisational considerations that relate to the survival and management of digital material”.

According to Principles and Good Practice for Preserving Data, “A sustainable preservation programme addresses organisational issues, technological concerns and funding questions” (Interuniversity Consortium for Political and Social Research (ICPSR), 2009). The key questions to be answered are:

  • Organisational Issues: “What are the requirements and parameters for the organisation’s digital preservation programme?”

  • Technological Issues: “How will the organisation meet defined digital preservation requirements?”

  • Resources Issues: “What resources will be needed to develop and maintain the digital preservation programme?”

Figure 2.11, based on the Digital Preservation Handbook (Digital Preservation Coalition, 2015), presents the most important aspects we need to consider and manage to ensure an effective digital preservation process for our educational data.

Fig. 2.11
A diagram lists organizational, technological, and resourcing issues that should be tackled in the pursuit of digitally preserving educational data.

The most important aspects we need to consider and manage, so as to ensure an effective digital preservation process for our educational data

Although our main focus is not to drill down into the technical details of digital preservation, which are not part of educators’ main role, it is essential to get an overview and understanding of them, so as to be able to collaborate effectively with the responsible technical team using a common language. Thus, we next briefly discuss the issues involved in effective digital preservation of educational data.

The first steps to undertake in order to build or enhance the needed digital preservation activities are summarized in Fig. 2.12. You may find more detailed information in the Digital Preservation Handbook (Digital Preservation Coalition, 2015).

Fig. 2.12
A diagram of 7 digital preservation activities starting with understanding the nature and extent of your digital collections and ending with documenting the process.

Digital preservation activities

Special focus should be given to the following key technical elements of digital preservation, as specified in the USGS Guidelines (2014):

  • Storage & Geographic Location – Storage systems, locations, and multiple copies to prevent loss of data.

  • Data Integrity – Procedures to prevent, detect, and recover from unexpected or deliberate changes to data.

  • Information Security – Procedures to prevent human-caused corruption of data, deletion and unauthorized access.

  • Metadata – Documentation of the data to enable contextual understanding and long-term usability.

  • File Formats – File types, data structures, and naming conventions to aid long-term preservation and reuse.

  • Physical Media – Reduce obsolescence risks that can threaten the readability of physical media.

To assess an organization’s readiness, it is recommended that these components are checked against the National Digital Stewardship Alliance (NDSA) ‘Levels of Digital Preservation’ (Phillips et al., 2013):

  • Level 1 – protect your data

  • Level 2 – know your data

  • Level 3 – monitor your data

  • Level 4 – repair your data

Storage technology has changed dramatically over the last twenty years. Initially, the norm was to store data on discrete media items, such as CDs/DVDs and hard-disk drives. Today, it has become common practice to use IT storage systems for the increasingly large volumes of digital material that need to be preserved and to be easily and quickly retrievable (Digital Preservation Coalition, 2015).

At this point, it is important to clarify the difference between backup and digital preservation. Backup refers to “short-term data recovery solutions following loss or corruption” (Jisc, 2006). By contrast, “preservation storage systems require a higher level of geographic redundancy, stronger disaster recovery, longer-term planning, and most importantly active monitoring of data integrity in order to detect unwanted changes such as file corruption or loss” (Digital Preservation Handbook).
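The active monitoring of data integrity mentioned above is commonly done with fixity checks: a checksum is recorded for each file when it is stored and recomputed periodically to detect changes or losses. The Python sketch below is one minimal way to do this, assuming SHA-256 as the checksum algorithm and a plain folder of files; the function names are hypothetical, and a real preservation system would also keep the manifest apart from the data and run the audit on a schedule.

```python
import hashlib
from pathlib import Path

def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Compute a SHA-256 checksum, reading in chunks so large files need little memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def build_manifest(folder: Path) -> dict:
    """Record a checksum for every file under folder (relative path -> checksum)."""
    return {str(p.relative_to(folder)): sha256_of(p)
            for p in sorted(folder.rglob("*")) if p.is_file()}

def audit(folder: Path, manifest: dict) -> dict:
    """Compare the folder's current state with a stored manifest and report problems."""
    current = build_manifest(folder)
    return {
        "missing": sorted(set(manifest) - set(current)),   # lost files
        "changed": sorted(k for k in manifest.keys() & current.keys()
                          if manifest[k] != current[k]),   # corrupted or altered files
        "new": sorted(set(current) - set(manifest)),       # files added since the manifest
    }
```

A backup alone would simply restore whatever it last copied; the audit is what tells us that a restore is needed in the first place.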

The selected storage solution is of prime importance for digital preservation. When selecting a storage strategy, there are several criteria we need to consider, such as Cost and Scalability, required Capacity, Security, Remote Access, Collaboration and Disaster Recovery. Legal provisions due to privacy or confidentiality may also influence our decision. Figure 2.13 summarizes the pros and cons of each of the two basic storage methods, on-premises servers (local infrastructure/data centres) and cloud-based storage, as well as recommended actions to comply with the latest regulations (COMPARE THE CLOUD, 2018). You may also watch the video “Public Cloud vs Private Cloud vs Hybrid Cloud” (in the useful video resources), which compares and contrasts public, private and hybrid clouds: the basic elements of each, the features and benefits that each delivers, and how each type meets specific business needs.

Fig. 2.13
A diagram presents the respective benefits and drawbacks, along with accompanying guidance, of on premises servers and cloud based storage.

Two storage methods

In their 2018 report, Data Management Life Cycle Final report, Miller and his colleagues recognise the demand for cost-effective storage technologies. “More and more organizations are considering outsourcing storage services or cloud storage options because the availability of cloud computing resources opens up possibilities for users to purchasing access to computing power and storage space as a service instead of maintaining it themselves. This way, providers are responsible for the performance, reliability, and scalability of the computing environment, while users can concentrate on data analysis and production”.

Nevertheless, security and privacy are significant concerns holding back use of the cloud, particularly for confidential, sensitive, or personally identifiable information. Let’s not forget what happened at Code Spaces, where an attack on the company’s cloud account led to data deletion and the eventual shutdown of the company.

The most common risks we need to consider include:

  • downtime and service outages, since cloud computing systems are internet based;

  • vulnerability to external cyber-security attacks;

  • compliance and legal issues, depending on the applicable regulation;

  • lifetime costs that could end up being higher than expected; and

  • limited control and flexibility, since the cloud infrastructure is owned, managed and monitored by the service provider.

Despite these concerns, the potential of cloud storage seems to outweigh the associated risks, which are expected to diminish over time. According to Gartner, “Through 2025, 99% of cloud security failures will be the customer’s fault” (Panetta, 2019), and “Organizations that do not have a high-level cloud computing strategy driven by their business strategy will significantly increase their risk of failure and wasted investment” (Cearley, 2017).

Whatever our choice, including a hybrid storage solution, we need to realize that storage technologies present several risks to the long-term preservation of data. Moreover, “Many cases of content loss are not necessarily due to technical faults but can come from human error, lack of budget, or a failure to regularly monitor the integrity of the stored data” (Digital Preservation Coalition, 2015) (Fig. 2.14).

Fig. 2.14
A diagram exhibits five good practice characteristics of a storage strategy. Keeping multiple independent copies of digital materials is one of them.

Characteristics of good practice for storage strategy
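To see why keeping multiple independent copies helps, the Python sketch below compares checksums of copies held in different locations and treats the value shared by a strict majority of copies as authentic. The location names and the majority rule are assumptions made for illustration; note that with only two copies that disagree, the sketch cannot tell which one is damaged, which is one reason to keep more than two.

```python
import hashlib
from collections import Counter

def check_copies(copies: dict) -> dict:
    """Compare independent copies of one file (location -> bytes) and flag outliers."""
    checksums = {loc: hashlib.sha256(data).hexdigest() for loc, data in copies.items()}
    majority, votes = Counter(checksums.values()).most_common(1)[0]
    if votes <= len(copies) // 2:
        # No strict majority: a person has to decide which copy is authentic.
        return {"status": "undecided", "suspect": sorted(copies)}
    suspect = sorted(loc for loc, c in checksums.items() if c != majority)
    return {"status": "ok" if not suspect else "repair needed", "suspect": suspect}

# Hypothetical locations: one on-premises, one in the cloud, one offline.
report = check_copies({
    "on_premises": b"course archive v1",
    "cloud": b"course archive v1",
    "offline_disk": b"course archive v!",  # a silently corrupted copy
})
```

Here the offline copy is flagged for repair, and it can be replaced from one of the two copies that still agree.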

Let’s now take a closer look at security issues and particularly cybersecurity.

According to the Digital Preservation Handbook, security issues relate to:

  • system security (e.g., protecting digital preservation and networked systems / services from exposure to external / internal threats),

  • collection security (e.g., protecting content from loss or change, the authorisation and audit of repository processes), and

  • the legal and regulatory aspects (e.g. personal or confidential information in the digital material, secure access, redaction).
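Collection security includes the authorisation and audit of repository processes. A minimal role-based sketch follows: each request is checked against a role's permissions, anything not explicitly granted is refused, and every attempt is logged. The roles, permissions and in-memory log are hypothetical; a school would derive them from its own policy and keep the audit trail in tamper-resistant storage.

```python
from datetime import datetime, timezone

# Hypothetical role-to-permission table.
PERMISSIONS = {
    "teacher": {"read"},
    "data_officer": {"read", "write"},
    "preservation_admin": {"read", "write", "delete"},
}

AUDIT_LOG = []  # in-memory audit trail, for illustration only

def authorise(user: str, role: str, action: str, resource: str) -> bool:
    """Allow the action only if the role grants it (unknown roles get nothing) and log the attempt."""
    allowed = action in PERMISSIONS.get(role, set())
    AUDIT_LOG.append({
        "time": datetime.now(timezone.utc).isoformat(),
        "user": user,
        "role": role,
        "action": action,
        "resource": resource,
        "allowed": allowed,
    })
    return allowed
```

For example, `authorise("alice", "teacher", "delete", "grades_2020.csv")` returns `False`, and the refused attempt still appears in the audit log.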

When it comes to cybersecurity, protecting educational data requires both administrative and technological security measures to prevent unauthorized parties from accessing it. Figure 2.15 presents some of these countermeasures, which together create an effective defence against cyber-attacks.

Fig. 2.15
A diagram presents the technology, breach, and people preparedness, incident detection and response, and asset management as cyberattack countermeasures.

Countermeasures against cyber-attacks

To help schools protect themselves against cyberthreats and develop effective security programs, there is also a useful report on a K-12 security risk methodology (Woody, 2004). It notes that technology “is broadly used in the K-12 environment by many participants including administrators, teachers, parents, students, school board members, etc.” and that “while this enables a wide range of useful activities, the risk for inappropriate and illegal behaviour that violates privacy, regulations, and common courtesy is increasing exponentially”.

“The thing that kept me awake at night (as NATO military commander) was cybersecurity. Cybersecurity proceeds from the highest levels of our national interest ... through our medical, our educational, to our personal finance (systems).” (Admiral James Stavridis, Ret., Former NATO Commander, in Cybersecurity and Digital Business Risk Management, 2020).

So far, we have provided an overview of the key issues of digital preservation and seen how important it is for keeping our educational data usable over time. You may also watch the video “How Toy Story 2 Almost Got Deleted: Stories From Pixar Animation: ENTV” (in the useful video resources), which tells the (mostly) true story of how ‘Toy Story 2’ was almost deleted from Pixar Animation’s computers during the making of the film, and how the film was saved by one mom’s home computer!

Let us now move on to identify good practices and appropriate actions for collecting the needed data, as well as for protecting it and safeguarding its privacy, especially when it comes to sensitive educational data.

After all, “Data protection is all about protecting people – not just files and computer systems” (Moore Barlow, 2018).

Questions and Teaching Materials

  1.

    Following the discussion with the DPO about the school’s preservation strategy and policies, Alice starts wondering: is digital content so fragile, after all? Should I find out more about preservation issues to protect my course’s digital content?

    Alice accesses the video “Why Digital Preservation is Important for Everyone”.

    She now understands that although traditional information sources can easily survive for years, decades and even centuries, digital items require special care to preserve them. More specifically, digital items are fragile, as they require special care to keep them usable, and dependent, as they depend on technology to make them available and require active management to ensure their ongoing accessibility.

    Is this assumption True or False? Please select the right answer.

    • True

    • False

Correct answer: True

  2.

    Alice soon realises that she needs to seek “guidance on key issues and actions to consider when creating digital materials to ensure their longevity of active use and potential for long-term preservation” (Digital Preservation Handbook).

    Please mark the correct key elements corresponding to each category of issues that Alice needs to address for digital preservation.

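Key elements (each belongs to Organisational issues, Technological issues or Resources issues):

  • Integrity of Data over time

  • Legal Compliance

  • Budgets and Costs

  • Balancing Security and Access

  • Staffing and needed Skills

  • Information Security

  • Collaboration

  • Facilities Required

  • Metadata Standards

  • Selection of Data to be Preserved

  • Sustainable File Formats

Correct answers:

  • Organisational issues: Legal Compliance; Balancing Security and Access; Collaboration; Selection of Data to be Preserved

  • Technological issues: Integrity of Data over time; Information Security; Metadata Standards; Sustainable File Formats

  • Resources issues: Budgets and Costs; Staffing and needed Skills; Facilities Required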

  3.

    Alice is now investigating the key technical elements of digital preservation.

    It’s a bit hard for her to deal with such technical issues. Are you ready to help her?

    You may review the definitions of the key technical elements of digital preservation, presented on page 2 of the USGS Guidelines (2014).

    Please match the appropriate definition (from the right column), to the respective technical element (in the left column).

Technical elements:

1. Metadata

2. Physical Media

3. Information Security

4. File Formats

5. Storage & Geographic Location

Definitions:

A. Basic recommendations to reduce obsolescence risks that can threaten the readability of physical media

B. Storage systems, locations, and multiple copies to prevent loss of data

C. File types, data structures, and naming conventions to aid long-term preservation and reuse

D. Procedures to prevent human-caused corruption of data, deletion, and unauthorized access

E. Documentation of the data to enable contextual understanding and long-term usability

Correct answers: 1-E, 2-A, 3-D, 4-C, 5-B

  4.

    Let’s go back to Alice. The colleague responsible informs her about the hybrid storage solution used by the school, a combination of local infrastructure/data centre and cloud-based storage. Moreover, according to her school’s good-practice guidelines for data storage, she needs to create multiple independent copies to safeguard her files. The copies are geographically separated in different locations, use different storage technologies and are actively monitored to ensure that any problems are detected and corrected.

    She wonders about the criteria that influenced the school’s decision making for the selected storage solution for digital preservation. Can you help her specify these selection criteria?

    Please select the right answers.

    A. Collision

    B. Security

    C. Disaster Recovery

    D. Redundancy

    E. Cost

Correct answers: B, C, and E.

  5.

    Alice is now interested in learning more about cost-effective storage technologies, and more specifically about storing data in the cloud. What is a cloud, and why are there different types of clouds? She decides to watch the video “Public Cloud vs Private Cloud vs Hybrid Cloud” again.

    Can you assist Alice in getting a deeper understanding of cloud-based storage?

    Please select the right answer(s). You may select more than one answer.

    A. Clouds are smart, automated and adaptive.

    B. Clouds are less efficient and cost effective than traditional Data Centers.

    C. Public clouds are hosted by a cloud service provider and tenants pay for services they actually use.

    D. Private Clouds provide higher scalability and lower control.

    E. Hybrid clouds are a combination of both private and public clouds enabling the creation of new innovative apps with uncertain demand.

Correct answers: A, C, E

  6.

    After reading the article “Murder in the Amazon cloud” (Vadali, 2017), which tells the story of how data deletion at Code Spaces led to the eventual shutdown of the company, Alice is more concerned about storage security.

    What tasks are needed, for the school and for her personally, to keep the students’ data safe?

    You may review Fig. 2.15 again, as well as the techniques for protecting information in the Digital Preservation Handbook.

    Please select the right answer(s). You may select more than one answer.

    A. Strengthen software and operating systems.

    B. Do not abandon software when it becomes obsolete, you may need to reuse it.

    C. Use access controls to specify who is allowed to access digital material and the type of access that is permitted.

    D. Train only the people whose security awareness is part of their duties.

    E. Build a short-term plan for security.

    F. Use encryption, a cryptographic technique which protects digital material by converting it into a scrambled form.

Correct answers: A, C, F

  7.

    Alice watches the video “How Toy Story 2 Almost Got Deleted: Stories From Pixar Animation: ENTV” and thinks, “What an unbelievable story!”

    She then starts laughing. The director could have avoided this “almost disaster” if he:

    Please select the right answer.

    A. had not typed the command RM*

    B. had multiple independent copies of the digital material of the movie

    C. had used a combination of online and offline storage techniques for the copies of the digital material of the movie

    D. had kept the copies of the digital material of the movie geographically separated into different locations

    E. All the above.

Correct answer: E.

  8.

    ACTIVITY/PRACTICE QUESTION (Short answer)

    Name some types of educational data that need long term preservation.

  9.

    ACTIVITY/PRACTICE QUESTION (Reflect on)

    We encourage you to elaborate on your response in the following reflective tasks. You may reflect on:

    1. Storage issues for preserving educational data

    2. Good practices when preserving educational data

2.3 Educational Data Ethics

2.3.1 Informed Consent

The video “Introduction to data ethics” (in the useful video resources) introduces the basic principles of data ethics.

As Pentland states when describing Big Data, “the ability to track, predict and even control the behaviour of individuals and groups of people is a classic example of Promethean fire: it can be used for good or ill” (Pentland, 2013).

New regulations, like the GDPR (General Data Protection Regulation) (Regulation (EU), 2016) that we will discuss later on, along with recent events such as the Cambridge Analytica and Facebook scandal, have raised awareness of data ethics issues that can arise from data misuse (Open Data Institute, 2018a).

The Open Data Institute (ODI) (Broad et al., 2017) defines data ethics as:

a branch of ethics that evaluates data practices with the potential to adversely impact on people and society – in data collection, sharing and use.

Several frameworks, policies and guidelines have been developed to address data ethics issues, including JISC’s code of practice (Shacklett, 2016), updated in 2018, the LACE (Learning Analytics Community Exchange) framework in 2016 and the ICDE (International Council for Open and Distance Education) Global guidelines (Slade & Tait, 2019). To help identify potential ethical issues associated with a data project or activity, and the steps needed to act ethically, the Open Data Institute also designed the Data Ethics Canvas in 2018 (Open Data Institute, 2018b).

We will further discuss the basic common principles of these practices in Chap. 3.

As emphasized by Shacklock (2016), “Institutions should put in place clear ethical policies and codes of practices that govern the use of educational data. These policies should, at a minimum, address privacy, security of data and consent.”

Before proceeding further, the brief video “What is the GDPR?” (in the useful video resources) provides an overview of the European Union data protection rules, also known as the EU General Data Protection Regulation (GDPR), which have applied since 25 May 2018 to all entities that collect, store and process any personal data belonging to EU citizens and residents (including organisations that are not EU-based). The GDPR has strengthened the conditions for consent (GDPR.eu, 2019).

We will soon discuss this new regulation and how it should be applied by the various entities. First, let’s see what informed consent is all about.

Informed consent is declared in most international guidelines to be one of the pivotal principles of data ethics, and it “is explicitly mentioned as a principle in article 7 of the International Covenant on Civil and Political Rights (1966), a United Nations Treaty” (European Commission, 2013).

According to Griffiths et al. (2016), “Informed consent refers to the requirement for an individual to give consent for the collection and analysis of the data which they generate,” while “Transparency refers to the degree to which users can observe the ways in which the data they generate is used”.

According to the European Commission’s report on Ethics for Researchers (2013), “Informed consent consists of three components: adequate information, voluntariness and competence.”

Thus, prior to consenting, individuals should be clearly informed of the data collection goals, possible adverse impacts and the means available to them to refuse or withdraw consent, without consequences, at any time.

Moreover, individuals must be competent to understand the information and should be fully aware of the consequences of their consent. Greater attention is required for some special categories of people, such as children, vulnerable adults and people with certain cultural or traditional backgrounds.

At this point, it is important to understand the distinction between consent and informed consent. Consent alone is simply agreement; for informed consent, we also need to ensure that individuals genuinely understand how we intend to use their data, e.g., by running focus groups and/or publishing explanatory documents.

According to the European Commission guidelines on the GDPR, “when a company or organisation asks for consent to collect or reuse personal information, the data subjects have to make a clear action agreeing to this, for example by signing a consent form or selecting yes from a clear yes/no option on a webpage” … “It is not enough to simply opt out, for example by checking a box saying they don’t want to receive marketing emails. They have to opt in and agree to their personal data being stored and/or re-used for this purpose.”

The European Commission emphasizes that informed consent means that before you consent, you must be given information about the processing of your personal data, including at least:
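The opt-in requirement can be reflected directly in how consent is recorded: consent exists only once a clear affirmative action has been captured, and silence, inactivity or a pre-ticked box never count. The Python sketch below is a hypothetical record type illustrating this idea, together with the possibility of withdrawal; it is not a legal compliance tool, and the class and field names are assumptions.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional

@dataclass
class ConsentRecord:
    """One data subject's consent for one purpose; deliberately never granted by default."""
    subject_id: str
    purpose: str
    granted_at: Optional[datetime] = None    # set only by an explicit opt-in action
    withdrawn_at: Optional[datetime] = None

    def opt_in(self) -> None:
        """Record a clear affirmative action, e.g. selecting 'yes' on a clear yes/no form."""
        self.granted_at = datetime.now(timezone.utc)
        self.withdrawn_at = None

    def withdraw(self) -> None:
        """Withdrawal must be possible at any time."""
        self.withdrawn_at = datetime.now(timezone.utc)

    @property
    def is_valid(self) -> bool:
        return self.granted_at is not None and self.withdrawn_at is None
```

A newly created record, such as `ConsentRecord("S001", "technology survey")`, is not valid until `opt_in()` is called.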

  • the identity of the organisation processing data;

  • the purposes for which the data is being processed;

  • the type of data that will be processed;

  • the possibility to withdraw consent;

  • where applicable, the fact that the data will be used solely for automated-based decision-making, including profiling;

  • information about whether the consent is related to an international transfer of your data, the possible risks of data transfers to countries outside the EU if those countries are not the subject of a Commission adequacy decision and there are no adequate safeguards.
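The list above works well as a checklist when drafting a consent notice. The Python sketch below flags which of the listed information items a draft notice still lacks, with the two conditional items required only when they apply. The item keys are hypothetical labels for the bullets above, and the check only detects missing or empty sections; whether the wording is clear enough still requires human review.

```python
from typing import Dict, Set

# Hypothetical keys for the minimum information items listed above.
REQUIRED_ITEMS = {
    "controller_identity": "the identity of the organisation processing data",
    "purposes": "the purposes for which the data is being processed",
    "data_types": "the type of data that will be processed",
    "withdrawal": "the possibility to withdraw consent",
}

def missing_items(notice: Dict[str, str], automated_decisions: bool = False,
                  international_transfer: bool = False) -> Set[str]:
    """Return the information items a draft notice lacks (empty text counts as missing)."""
    required = set(REQUIRED_ITEMS)
    if automated_decisions:          # "where applicable"
        required.add("automated_decision_making")
    if international_transfer:       # transfer outside the EU
        required.add("international_transfer_risks")
    return {item for item in required if not notice.get(item, "").strip()}
```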

The way individuals are informed is crucial for the informed consent process. We should ensure that they fully realize the expected consequences of granting or withholding consent (Fig. 2.16).

Fig. 2.16
A branching diagram exhibits needs to, means, must be, and has to as important aspects that should be kept in mind when requesting informed consent.

Conditions for informed consent

With regards to the collection of personal data about children, additional protection should be granted since children are less aware of the risks and consequences of sharing data and of their rights.

In the U.S., the foundational federal law on student privacy, the Family Educational Rights and Privacy Act (FERPA), establishes student privacy rights by restricting with whom and under what circumstances schools may share students’ personally identifiable information. The Data Quality Campaign (DQC) has developed a tool that summarizes some of the main provisions of FERPA and can be used as a guide to help interested parties understand when they need to take a closer look at the law or consult an expert.

Under GDPR, any information addressed specifically to a child should be adapted to be easily accessible, using clear and plain language.

For most online services (such as social networking sites), the consent of the parent or guardian is required in order to process a child’s personal data on the grounds of consent up to a certain age.

The age threshold for obtaining parental consent is established by each EU Member State and can be between 13 and 16 years, as specified by the respective national Data Protection Authority.

According to the European Commission’s clarifications on citizens’ rights, “Companies have to make reasonable efforts, taking into consideration available technology, to check that the consent given is truly in line with the law. This may involve implementing age-verification measures such as asking a question that an average child would not be able to answer or requesting that the minor provides his parents’ email to enable written consent”.

Within the context of education, approaches to consent for collecting learners’ data differ considerably, depending on national guidelines (where available).
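Because the threshold varies by Member State, a system that relies on consent should treat it as a configuration value rather than a hard-coded constant. The Python sketch below assumes the applicable threshold is supplied by the institution (after checking national rules) and only enforces the 13 to 16 range stated above; the function names are hypothetical.

```python
from datetime import date

def age_on(birth_date: date, on: date) -> int:
    """Age in completed years on a given day."""
    had_birthday = (on.month, on.day) >= (birth_date.month, birth_date.day)
    return on.year - birth_date.year - (0 if had_birthday else 1)

def needs_parental_consent(birth_date: date, threshold: int, on: date) -> bool:
    """True if the child is below the Member State's age threshold for consent."""
    if not 13 <= threshold <= 16:
        raise ValueError("the threshold must lie between 13 and 16 years")
    return age_on(birth_date, on) < threshold
```

For instance, a child born on 10 May 2006 is 14 on 10 May 2020, so parental consent would be needed where the threshold is 16 but not where it is 13.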

Figure 2.17 depicts the main principles and challenges that should be taken into consideration to comply with the GDPR. As the figure indicates, a data-related activity can be lawful, by complying with legal obligations such as the GDPR, even though the data may be considered not to be treated ethically. Sclater (2017) also argues that “consent is required for use of sensitive data and in order to take interventions directly with students on the basis of the analytics. This implies that if the data in question are not considered ‘sensitive’, and do not form the basis for any intervention, consent is not required (on the basis that this may be considered as of legitimate interest)”.

Fig. 2.17
A diagram lists personal data, anonymous information, the lawfulness of processing, sensitive personal data, and automated decision-making and profiling for G D P R compliance.

The main principles and challenges that should be taken under consideration to comply with GDPR

Moreover, according to the recent ICDE report (2019), many institutions seek consent at the point of registration to collect student data for additional purposes, beyond institutional reporting and basic student support. As the report emphasizes, the “expectation that users should consent to uses of personal data unknown at the point of registration seems to be an unreasonable and unethical one.”

An alternative approach supported by most of the existing guidelines (Higher Education Commission, JISC’s code of practice, ICDE Global guidelines) might be to differentiate between the granting of initial consent for the collection of data and the obtaining of additional consent at the point where a specific personal intervention is proposed, or in the case where new data is incorporated into the institution’s system, or existing data is used in new ways.
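One way to support this differentiated approach is to record consent per purpose rather than as a single yes/no flag. The sketch below is a hypothetical Python illustration, not a structure prescribed by any of the guidelines cited: `ConsentRecord`, `check_before_processing` and the purpose labels are assumptions. Processing under a purpose that the record does not cover (for example a personal intervention, a new data source or a new use of existing data) leads to a request for additional consent, and withdrawing consent simply removes the purpose.

```python
from dataclasses import dataclass, field


@dataclass
class ConsentRecord:
    """Hypothetical record of the purposes one learner has consented to."""

    learner_id: str
    purposes: set[str] = field(default_factory=set)

    def grant(self, purpose: str) -> None:
        self.purposes.add(purpose)

    def withdraw(self, purpose: str) -> None:
        # Consent can be withdrawn; discard() is a no-op for unknown purposes.
        self.purposes.discard(purpose)

    def covers(self, purpose: str) -> bool:
        return purpose in self.purposes


def check_before_processing(record: ConsentRecord, purpose: str) -> str:
    """Proceed only if the purpose is covered; otherwise ask for new consent."""
    if record.covers(purpose):
        return "proceed"
    return "request additional consent"


# Purpose labels are illustrative assumptions.
record = ConsentRecord("learner-001")
record.grant("data_collection")  # initial consent, e.g. at registration
print(check_before_processing(record, "data_collection"))        # proceed
print(check_before_processing(record, "personal_intervention"))  # request additional consent
```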

As the ICDE report (2019) concludes, “national legislation will influence positions taken, but generally this principle (of consent) should be built around a minimum of informed consent (that is, transparency before registration).”

You may also watch the video “Why develop a data science code of ethics?” (in useful video resources), in which experts from the data science community explain why it is important to have a code of ethics.

Questions and Teaching Materials

  1. 1.

    After watching the video introducing Data Ethics Principles, “Introduction to Data Ethics”, Alice is really concerned. Companies are collecting so much data every day. According to the video, Google can track your searches on your individual devices, even if you are not logged in to your account, for up to:

    1. A.

      7 days

    2. B.

      2 months

    3. C.

      6 months

    4. D.

      3 years

Correct answer: C

  1. 2.

    Before launching her flipped classroom initiative, Alice wants to study Grade 9 students’ perceptions of technology, using an online questionnaire she made with Google Forms.

    Alice wants to prepare an informed parental consent form so that her students (who are under 15) can participate in the survey on students’ perceptions of technology, but she is a bit confused by all this information.

    Can you help Alice to have a better understanding?

    1. A.

      Prior to consenting, individuals should be clearly informed of how the data will be used

      • True

      • False

Correct answer: True

  1. B.

    When individuals give consent for the collection and analysis of the data which they generate, they cannot refuse or withdraw their consent

    • True

    • False

Correct answer: False

  1. C.

    The EU General Data Protection Regulation (GDPR) has applied since 25 May 2018, even to organisations that are not EU-based, as long as they collect, store and process any personal data belonging to EU citizens and residents.

    • True

    • False

Correct answer: True

  1. 3.

    You give Alice some advice to help her prepare the consent form for the students’ perceptions of technology study. Select all that apply.

    A consent request must:

    1. A.

      Include contact details of the company processing the data

    2. B.

      Be anonymized

    3. C.

      Include information about the possibility of withdrawing consent

    4. D.

      Be freely given

    5. E.

      Be included in the terms and conditions

    6. F.

      Be presented in a formal language

    7. G.

      Specify the purpose of the data processing

    8. H.

      Specify the type of data that will be processed

Correct answers: A, C, D, G, H

  1. 4.

    Alice has a colleague, Betty, who has just come on board and wants to conduct an online survey with her 17-year-old students about their eating habits. Betty asks Alice if it is necessary to collect parental consent in order to process her students’ personal data.

    Help Alice decide whether the consent of a parent or guardian is required in order to process the students’ personal data.

    • Yes

    • No

Correct answer: No.

  1. 5.

    Alice’s school relies on the public task basis, one of the six lawful bases for processing personal data under the GDPR, which applies where processing is necessary for the performance of a task carried out in the public interest or in the exercise of official authority vested in the controller.

    Is this lawful basis (public task basis) appropriate for Alice to intervene directly with students on the basis of the participation data recorded within the Learning Management System?

    Help Alice find the correct answer

    • Yes

    • No

Correct answer: No.

  1. 6.

    In the video “Why develop a data science code of ethics?”, Paula Goldman, VP/Head of Omidyar Network’s Tech and Society Solutions Lab, claims that data and algorithms are neutral.

    • True

    • False

Correct answer: False.

  1. 7.

    ACTIVITY/PRACTICE QUESTION (Reflect on)

    We encourage you to elaborate on your response in the following reflective task. You may reflect on:

    1. 1.

      What information must be given to individuals whose data is collected. You can search for additional information on the European Commission’s website.

    2. 2.

      Using information from the European Commission website, create an infographic presenting the General Data Protection Regulation (GDPR).

2.3.2 Sensitive Educational Data Protection

Balancing digital learning with privacy and security is essential to fostering a successful digital culture (iKeepSafe, 2017).

Privacy is a fundamental human right and a core value in the functioning of democratic societies. As discussed in the previous topics, the exponential progress in information and communication technologies and the rapid global development of Educational Data Analytics have given rise to new challenges to privacy and data protection.

The “Privacy Overview for K12 Teachers and Administrators” video (in useful video resources) provides an overview of the privacy issues that may arise and of growing concerns about educational data privacy. Is educational data privacy over in the digital age?

The Quantified Student infographic shows what a day looks like in the data-driven life of the most measured and monitored student in the history of education.

“The data collection begins even before he steps into the school,” says Khaliah Barnes, director of the Student Privacy Project at the Electronic Privacy Information Center. “The issue is that this reveals specifically sensitive information,” says Barnes (Hill, 2014).

Moreover, as Jose Ferreira, CEO of Knewton (one of the biggest players in the field of educational technology software), points out: “We literally know everything about what you know and how you learn best, everything.” Ferreira calls education “the world’s most data-mineable industry by far” (Hill, 2014).

Do educational data analytics challenge the principles of data protection? Is privacy a show-stopper? How can privacy be guaranteed, especially when minors and/or sensitive data are involved?

The European position has been expressed in the European Commission’s report: “New Modes of Learning and Teaching in Higher Education” (European Commission, 2014). In recommendation 14, the Commission clearly stated: “Member States should ensure that legal frameworks allow higher education institutions to collect and analyse learning data. The full and informed consent of students must be a requirement and the data should only be used for educational purposes”, and in recommendation 15: “Online platforms should inform users about their privacy and data protection policy in a clear and understandable way. Individuals should always have the choice to anonymise their data.” This is a widely accepted framework mirrored in the laws of multiple nations and international organisations including many U.S. states (Drachsler & Greller, 2016).

Thus, it is essential that all educators understand how learners’ personal information is used and adequately protect learners’ data in order to strengthen the trust of all parties involved and encourage their participation in digital learning.

In the video by the Data Quality Campaign, “Who Uses Student Data?” (in useful video resources), it is emphasized that most personal student information stays local. Districts, states, and the federal government all collect data about students for important purposes such as informing instruction and providing information to the public. However, the type of data collected, and who can access it, differs at each level.

As clearly stated in the Foundational Principles for Using and Safeguarding Students’ Personal Information, developed by a coalition of US national education organisations, “Everyone who uses student information has a responsibility to maintain the privacy and the security of students’ data, especially when these data are personally identifiable.”

The basic information security techniques, as specified by the Digital Preservation Handbook, include:

Encryption

  • Encryption is a cryptographic technique which protects digital material by converting it into a scrambled form. The use of a key is required to unscramble the data and convert it back to its original form.

Access Control

  • Access control enables an administrator to specify who is allowed to access digital material and the type of access that is permitted (for example, read-only or write).

Redaction

  • Redaction refers to the process of identifying and removing or replacing confidential or sensitive information, using anonymisation or pseudonymisation.
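To make the access control technique concrete, the following minimal Python sketch maps (role, resource) pairs to permitted access types. The roles, resources and the `is_allowed` helper are hypothetical, and refusing anything not explicitly listed (deny by default) is a design choice assumed here rather than a requirement stated in the Digital Preservation Handbook. Encryption and redaction are not shown.

```python
from enum import Flag


class Access(Flag):
    NONE = 0
    READ = 1
    WRITE = 2


# Hypothetical access-control list: (role, resource) -> permitted access types.
ACL = {
    ("teacher", "grades"): Access.READ | Access.WRITE,
    ("teacher", "attendance"): Access.READ,
    ("school_nurse", "medical_reports"): Access.READ,
}


def is_allowed(role: str, resource: str, requested: Access) -> bool:
    """Deny by default: any (role, resource) pair missing from the ACL gets NONE."""
    granted = ACL.get((role, resource), Access.NONE)
    # Every requested access type must be among the granted ones.
    return (granted & requested) == requested


print(is_allowed("teacher", "grades", Access.WRITE))          # True
print(is_allowed("teacher", "attendance", Access.WRITE))      # False (read only)
print(is_allowed("teacher", "medical_reports", Access.READ))  # False (not listed)
```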

Now that we have a better understanding of the different types of data as categorized in terms of privacy, we will further review the levels of data as specified under GDPR.

Figure 2.18 presents the main categories of personal data as defined by the GDPR.

Fig. 2.18
A diagram exhibits personal data, sensitive data, anonymous information, and pseudonymized data as 4 main categories of personal data under GDPR.

The main categories of personal data as defined by GDPR

We need to pay extra attention to sensitive data (special categories of personal data), since an organisation can only process such data under specific conditions (explicit consent may be needed). Even ordinary personal data, as clarified under the GDPR, “should only be processed where it isn’t reasonably feasible to carry out the processing in another manner. Where possible, it is preferable to use anonymous data. Where personal data is needed, it should be adequate, relevant, and limited to what is necessary for the purpose (‘data minimisation’).”
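Data minimisation can be enforced in code by declaring, for each processing purpose, which fields are actually needed and discarding everything else. This is a minimal Python sketch under that assumption; the purposes, field names and the sample record are invented for illustration.

```python
# Hypothetical mapping from each declared purpose to the fields it needs.
FIELDS_BY_PURPOSE = {
    "participation_report": {"student_id", "logins", "forum_posts"},
    "quiz_analysis": {"student_id", "quiz_scores"},
}


def minimise(record: dict, purpose: str) -> dict:
    """Return a copy of `record` restricted to the fields needed for `purpose`.

    Raises KeyError for an undeclared purpose, so nothing is processed
    without a stated purpose.
    """
    allowed = FIELDS_BY_PURPOSE[purpose]
    return {key: value for key, value in record.items() if key in allowed}


raw = {
    "student_id": "s-001",
    "name": "A. Student",
    "home_address": "(not needed)",
    "logins": 42,
    "forum_posts": 7,
    "quiz_scores": [8, 9],
}
print(minimise(raw, "participation_report"))
# {'student_id': 's-001', 'logins': 42, 'forum_posts': 7}
```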

Once data is truly anonymised (it no longer contains any identifying elements, the anonymisation is irreversible and individuals are no longer identifiable), it falls outside the scope of the GDPR and becomes easier to use.

Before anonymising data, we should consider the purposes for which the data is to be used. Anonymisation may devalue the data, so that it is no longer useful for specific purposes.

The ICO’s Code of Practice on Anonymisation provides further guidance on anonymisation techniques (UCL, 2018). Unlike anonymised data, pseudonymised data has its personally identifiable material replaced with artificial identifiers. Pseudonymised personal data can still fall within the scope of the GDPR, depending on how difficult it is to attribute the pseudonym to a particular individual.
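As an illustration of pseudonymisation, the sketch below replaces a student identifier with an artificial one computed by a keyed hash (HMAC-SHA256 from Python’s standard `hmac` module) and drops other direct identifiers. This is one common technique, assumed here for illustration; it is not a method prescribed by the ICO code or the GDPR, and the field names are hypothetical. Because whoever holds the key can recompute the mapping from known identifiers, the output is pseudonymised rather than anonymised, which is why such data can remain within the scope of the GDPR.

```python
import hashlib
import hmac
import secrets


def pseudonymise(identifier: str, key: bytes) -> str:
    """Replace an identifier with an artificial one using HMAC-SHA256.

    The same identifier always maps to the same pseudonym under the same key,
    so records stay linkable for analysis. Anyone holding the key can
    recompute the mapping, so the key must be stored apart from the data.
    """
    return hmac.new(key, identifier.encode("utf-8"), hashlib.sha256).hexdigest()


def pseudonymise_records(records, key, id_field="student_id", drop=("name", "email")):
    """Drop the direct identifiers in `drop` and pseudonymise `id_field`.

    The default field names are assumptions for illustration.
    """
    result = []
    for record in records:
        cleaned = {k: v for k, v in record.items() if k not in drop}
        cleaned[id_field] = pseudonymise(record[id_field], key)
        result.append(cleaned)
    return result


key = secrets.token_bytes(32)  # keep this key apart from the released data
released = pseudonymise_records(
    [{"student_id": "s-001", "name": "A. Student", "email": "a@example.org", "score": 7}],
    key,
)
print(sorted(released[0]))  # ['score', 'student_id']
```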

Whether ‘de-identified’ or pseudonymised data is in use, there is a residual risk of re-identification. Anonymisation is often seen as the “easy way out” of data protection obligations; however, experts around the world are adamant that 100% anonymisation is not possible, because anonymised data can rather easily be de-anonymised when merged with other information sources (Drachsler & Greller, 2016).

L. Sweeney (2000) showed that it is possible to uniquely identify 87% of the U.S. population based on just three data points: five-digit ZIP code, gender and date of birth (Wes, 2018). Later, in 2006, AOL’s release of users’ search logs (Hansell, 2006) and the case of Searcher No. 4417749, reported in “A Face Is Exposed for AOL Searcher No. 4417749” by M. Barbaro and T. Zeller (2006) of The New York Times, became one of the first widely known cases of re-identification. The Netflix case followed in 2007, when researchers de-anonymised some of the Netflix data by matching ratings and timestamps with public information on the Internet Movie Database (Narayanan & Shmatikov, 2008). As reported by Hill (2012), in 2012 the retail company Target, using behavioural advertising techniques, managed to identify a pregnant teenage girl from her shopping patterns and sent relevant coupons to her home (D’Acquisto et al., 2015).
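The ZIP code, gender and date-of-birth example can be checked on a dataset by grouping records on these quasi-identifiers and looking for small groups. The sketch below uses a widely used measure called k-anonymity, which the text does not discuss: a dataset is k-anonymous if every combination of quasi-identifier values is shared by at least k records. The field names, the generalisation rule and the toy records are assumptions for illustration. A high k does not by itself rule out re-identification, since, as the cases above show, data can be linked with other sources.

```python
from collections import Counter

# The three data points from Sweeney's finding; field names are assumptions.
QUASI_IDENTIFIERS = ("zip_code", "gender", "date_of_birth")


def _group_key(record, quasi_identifiers):
    return tuple(record[q] for q in quasi_identifiers)


def k_anonymity(records, quasi_identifiers=QUASI_IDENTIFIERS) -> int:
    """Size of the smallest group of records sharing all quasi-identifier values."""
    groups = Counter(_group_key(r, quasi_identifiers) for r in records)
    if not groups:
        raise ValueError("no records to assess")
    return min(groups.values())


def generalise(record):
    """Coarsen quasi-identifiers: 3-digit ZIP prefix and birth year only."""
    coarse = dict(record)
    coarse["zip_code"] = record["zip_code"][:3] + "**"
    coarse["date_of_birth"] = record["date_of_birth"][:4]
    return coarse


# Toy, invented records; direct identifiers such as names already removed.
records = [
    {"zip_code": "11111", "gender": "F", "date_of_birth": "2008-01-01", "score": 7},
    {"zip_code": "11111", "gender": "F", "date_of_birth": "2008-01-01", "score": 9},
    {"zip_code": "11122", "gender": "F", "date_of_birth": "2008-07-15", "score": 5},
]
print(k_anonymity(records))                           # 1: the third record is unique
print(k_anonymity([generalise(r) for r in records]))  # 3
```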

Thus, although de-identification techniques can reduce the risks to the data subjects concerned and help organisations meet their data protection obligations, we need to properly assess the adequacy of these methods in order to decide whether further de-identification steps are necessary (UCL, 2018).

The GDPR introduces two new principles: data protection by design and data protection by default, whose definitions are presented in Fig. 2.19.

Fig. 2.19
A diagram presents the respective definition of the two principles of data protection under GDPR, namely, by design and by default.

Data protection by design and data protection by default

As specified in the GDPR (Regulation (EU), 2016), the protection of the rights and freedoms of natural persons with regard to the processing of personal data requires that appropriate technical and organisational measures be taken which meet, in particular, the principles of data protection by design and data protection by default.

“Data protection by design minimises privacy risks and increases trust”, while “Data protection by default entails ensuring that your company always makes the most privacy friendly setting the default setting” (European Union, 2018).

Examples of data protection by design include pseudonymisation and encryption, while examples of data protection by default include “data minimisation” (only the data necessary should be processed), limited accessibility and short storage periods.

Let’s now take a closer look at privacy by design strategies and storage privacy (data protection by design), as well as storage limitation (data protection by default).

Figure 2.20 depicts eight privacy by design strategies, as proposed by the European Union Agency for Network and Information Security (D’Acquisto et al., 2015). These strategies enable us to identify the data protection and privacy requirements early in the educational analytics value chain and subsequently to implement the necessary technical and organisational measures. One of the most significant privacy-enhancing technologies that can be used to implement such strategies is storage privacy.

Privacy challenges should be seen as opportunities that, if appropriately handled, can build trust in the big data ecosystem for the benefit of both users and the big data industry (D’Acquisto et al., 2015).

Danezis et al. (2014), in their report “Privacy and Data Protection by Design”, define storage privacy as “the ability to store data without anyone being able to read (let alone manipulate) them, except the party having stored the data (called here the data owner) and whoever the data owner authorises.”

Fig. 2.20
A diagram presents minimize, hide, separate, inform, aggregate, control, enforce, and demonstrate as eight privacy by design strategies.

Eight privacy by design strategies, as proposed by the European Union Agency for Network and Information Security (D’Acquisto et al., 2015)

As further specified in the report, “a major challenge to implement private storage is to prevent non-authorised parties from accessing the stored data. If the data owner stores data locally, then physical access control might help, but it is not sufficient if the computer equipment is connected to a network: a hacker might succeed in remotely accessing the stored data. If the data owner stores data in the cloud, then physical access control is not even feasible.”

A straightforward option for storage privacy is storing the data, either locally or in cloud storage, in encrypted form. One can use full disk encryption (FDE) or file system-level encryption (FSE). As the report clarifies, “encryption and decryption operations must be carried out locally, not by remote service, because both keys and data must remain in the power of the data owner if any storage privacy is to be achieved.” The report further specifies that outsourced data storage on remote clouds is practical and relatively safe as long as only the data owner, not the cloud service, holds the decryption keys, and that such storage may be distributed for added robustness to failures.

When it comes to data protection by default, storage limitation is one of the key principles for processing personal data under the GDPR. It answers a simple question: “For how long can data be kept and is it necessary to update it?” The Regulation’s answer is straightforward: “You must ensure that personal data is stored for no longer than necessary for the purposes for which it was collected”. There are six basic guidelines, specified by the GDPR, which you need to take into consideration when storing personal data (Fig. 2.21).

Fig. 2.21
A diagram exhibits the six basic GDPR guidelines to properly store personal data. Store data for the shortest time possible is one of them.

Six basic guidelines, which you need to take into consideration when storing personal data
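The storage limitation principle can be turned into a periodic review of stored records. The following minimal Python sketch assumes each record carries a collection date and a last-update date; the retention period, update interval and returned actions are hypothetical policy values, since the appropriate periods depend on the purposes for which the data was collected.

```python
from datetime import date, timedelta


def retention_action(record: dict, today: date,
                     retention: timedelta, update_interval: timedelta) -> str:
    """Suggest an action for one record under hypothetical retention rules."""
    if today - record["collected_on"] > retention:
        return "delete or anonymise"  # kept longer than the purpose requires
    if today - record["last_updated"] > update_interval:
        return "review and update"    # data may no longer be accurate
    return "keep"


# Assumed policy values for illustration only.
RETENTION = timedelta(days=365)
UPDATE_INTERVAL = timedelta(days=90)
today = date(2024, 1, 1)

old = {"collected_on": date(2022, 1, 1), "last_updated": date(2022, 1, 1)}
stale = {"collected_on": date(2023, 6, 1), "last_updated": date(2023, 6, 1)}
fresh = {"collected_on": date(2023, 12, 1), "last_updated": date(2023, 12, 1)}
for rec in (old, stale, fresh):
    print(retention_action(rec, today, RETENTION, UPDATE_INTERVAL))
# prints: delete or anonymise, then review and update, then keep
```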

Before closing this chapter, it is essential to analyse individuals’ rights. The main reason for introducing the GDPR was to give European Union citizens better control over their personal data. More specifically, it is designed to:

  • Harmonize data privacy laws across Europe;

  • Protect and empower all EU citizens’ data privacy;

  • Reshape the way organisations across the region approach data privacy.

The GDPR applies to “all companies operating in the EU, wherever they are based” (European Commission, 2018). It introduces stronger rights for data subjects (Intersoft Consulting, 2018) and creates new obligations for data controllers (the person or body that determines the purposes and means of processing the personal data).

Figure 2.22 presents the rights that give individuals control over their personal data under the GDPR. To exercise these rights, individuals should contact the company or organisation processing their personal data, also known as the controller. If the company/organisation has a Data Protection Officer (DPO), they may address their request to the DPO. The company/organisation must respond to their requests without undue delay and at the latest within one month.

Fig. 2.22
A circle diagram presents eight rights of an individual under GDPR. Some of the mentioned rights are the right to be informed and the right of access.

Individuals’ rights to control their personal data under GDPR
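The one-month limit for responding to requests can be computed from the date a request is received. This Python sketch assumes that “one month” means one calendar month and that, when the following month has no matching day (for example, a request received on 31 January), the deadline falls on that month’s last day; this interpretation should be checked with your DPO. It gives only the outer limit, since responses are expected without undue delay.

```python
import calendar
from datetime import date


def response_deadline(received: date) -> date:
    """One calendar month after `received`.

    Assumption: if the next month has no matching day (e.g. 31 January),
    the deadline falls on that month's last day.
    """
    year = received.year + received.month // 12  # December rolls into next year
    month = received.month % 12 + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(received.day, last_day))


print(response_deadline(date(2024, 3, 10)))   # 2024-04-10
print(response_deadline(date(2024, 1, 31)))   # 2024-02-29
print(response_deadline(date(2023, 12, 15)))  # 2024-01-15
```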

A data breach occurs when personal data for which a company/organisation is responsible is disclosed, accidentally or unlawfully, to unauthorised recipients, or is made temporarily unavailable or altered. If a data breach poses a risk to individuals’ rights and freedoms, the company/organisation should notify its Data Protection Authority (DPA) within 72 hours of becoming aware of the breach. Depending on whether or not the breach poses a high risk to those affected, the business may also be required to inform all individuals affected by it (European Commission, 2018h).
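The notification rules in the paragraph above can be summarised as a mapping from risk level to obligations. In this minimal Python sketch, the `Risk` levels and the `breach_obligations` helper are hypothetical, and the risk assessment itself, which is a human judgement, is assumed to have been made already.

```python
from datetime import datetime, timedelta
from enum import Enum


class Risk(Enum):
    UNLIKELY = "unlikely to result in a risk"
    RISK = "risk to rights and freedoms"
    HIGH = "high risk to rights and freedoms"


NOTIFICATION_WINDOW = timedelta(hours=72)


def breach_obligations(became_aware: datetime, risk: Risk) -> dict:
    """Map an already assessed risk level to the obligations described above."""
    notify_dpa = risk in (Risk.RISK, Risk.HIGH)
    return {
        # Deadline for notifying the DPA: 72 hours after becoming aware.
        "notify_dpa_by": became_aware + NOTIFICATION_WINDOW if notify_dpa else None,
        # Affected individuals are informed only when the risk is high.
        "inform_individuals": risk is Risk.HIGH,
    }


print(breach_obligations(datetime(2024, 3, 1, 9, 0), Risk.HIGH))
# {'notify_dpa_by': datetime.datetime(2024, 3, 4, 9, 0), 'inform_individuals': True}
```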

Whenever processing is likely to result in a high risk to the rights and freedoms of individuals, as specified by GDPR, a Data Protection Impact Assessment (DPIA) is required. A DPIA is required at least in the following cases:

  • a systematic and extensive evaluation of the personal aspects of an individual, including profiling;

  • processing of sensitive data on a large scale;

  • systematic monitoring of public areas on a large scale.
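The minimum list of cases above can be encoded as a simple checklist. This Python sketch is a hypothetical illustration: the `Processing` fields mirror the three listed cases, and since the list is only a minimum, the `likely_high_risk` flag lets a controller record that other processing has been judged likely to result in a high risk.

```python
from dataclasses import dataclass


@dataclass
class Processing:
    """Hypothetical description of a planned processing operation."""

    systematic_extensive_evaluation: bool = False  # including profiling
    large_scale_sensitive_data: bool = False
    large_scale_public_monitoring: bool = False


def mandatory_dpia_cases(p: Processing) -> list[str]:
    """Return the listed cases that apply; any one of them requires a DPIA."""
    cases = []
    if p.systematic_extensive_evaluation:
        cases.append("systematic and extensive evaluation, including profiling")
    if p.large_scale_sensitive_data:
        cases.append("processing of sensitive data on a large scale")
    if p.large_scale_public_monitoring:
        cases.append("systematic monitoring of public areas on a large scale")
    return cases


def dpia_required(p: Processing, likely_high_risk: bool = False) -> bool:
    # The three cases are a minimum: any other processing judged likely to
    # result in a high risk also needs a DPIA.
    return likely_high_risk or bool(mandatory_dpia_cases(p))


print(dpia_required(Processing(systematic_extensive_evaluation=True)))  # True
print(dpia_required(Processing()))                                      # False
```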

National Data Protection Authorities, in collaboration with the European Data Protection Board, may provide lists of cases where a DPIA would be required. As emphasized, “the DPIA should be conducted before the processing and should be considered as a living tool, not merely as a one-off exercise. Where there are residual risks that can’t be mitigated by the measures put in place, the DPA must be consulted prior to the start of the processing”.

Figure 2.23 presents the three basic steps to identify and protect sensitive data, as proposed by Krueger (2017).

Fig. 2.23
A diagram lists identify sensitive data, assess and respond to data risks, and monitor implemented security processes as steps to boost data security.

The 3 Basic Steps to Identify and Protect Sensitive Data

A DPIA should be conducted as early as possible in the project lifecycle, so that its findings and recommendations can be incorporated into the design of the processing operation (itgovernance).

You may also watch the video “Protecting Student-Data Privacy: An Expert’s View” (see useful video resources), in which Fordham University Law Professor Joel Reidenberg talks with Education Week Correspondent John Tulenko about student data and the best ways to keep it secure.

Questions and Teaching Materials

  1. 1.

    Alice is a bit confused. Several state and federal laws require privacy protection for students and children. In the video she has just watched, “Privacy Overview for K12 Teachers and Administrators”, which laws concerning data privacy for children are mentioned?

    There is more than one correct answer. Help Alice select the right ones

    1. A.

      FERPA

    2. B.

      CIPA

    3. C.

      COPPA

    4. D.

      CAPTA

Correct answers: A, C

  1. 2.

    From watching the “Who Uses Student Data?” video, Alice understands that teachers have access only to de-identified data (i.e. information about individual students but with identifying information removed).

    Is Alice’s understanding correct?

    Please select the correct answer:

    • Yes

    • No

Correct answer: No.

  1. 3.

    For the purposes of research, Alice intends to release student data.

    Alice asks the responsible DPO about the school’s policy and guidelines for protecting students’ data privacy, confidentiality, integrity and security. She learns about the handling of personal and sensitive data and about the use of anonymisation and pseudonymisation to remove personally identifiable information.

    As student data might be released for the purposes of research, all names, postal codes and other identifiable data are removed. Completely removing fields that could be used in any way to identify a person is considered a strong form of

    1. A.

      data pseudonymisation

    2. B.

      data anonymisation

      Please select the correct term to complete the sentence.

Correct answer: B

  1. 4.

    Alice is concerned that her students’ records, and more specifically medical reports related to students’ learning difficulties, could be accessed by unauthorised third parties. She contacts the responsible DPO and is informed about the appropriate technical and organisational measures taken by the school to secure data protection by design and by default.

    More specifically, the DPO explains to Alice that the School Information System (SIS) has a mechanism for comprehensively logging who consulted the medical reports and for preventing unauthorised access to these sensitive data. Moreover, personal and sensitive data are pseudonymised, and “data minimisation” (only the data necessary should be processed) is applied.

    Alice feels secure because the technical and organisational measures being taken meet in particular the principles of data protection by design and data protection by default.

    Is Alice correct in feeling secure?

    Please select the correct answer:

    • Yes

    • No

Correct answer: Yes

  1. 5.

    Storage privacy is about preventing non-authorized parties from accessing the stored data. This can be achieved only when encryption and decryption operations are carried out locally, not by remote service, because both keys and data must remain in the power of the data owner.

    Alice assumes that if any storage privacy is to be achieved, then data must be stored locally and cloud storage should be avoided.

    Do you agree with the assumption of Alice?

    Please select the correct answer:

    • Yes

    • No

Correct answer: No.

  1. 6.

    Alice’s institution runs a recruitment office and, for that purpose, collects CVs and keeps records of people seeking employment. The office keeps recruitment application forms and interview notes for unsuccessful candidates for 5 years, in case they are needed, without taking any measures to update the CVs.

    Alice doubts that the storage period is proportionate to the purpose of finding employment and thinks that this is not compliant with GDPR. Do you agree with Alice?

    You may review “For how long can data be kept and is it necessary to update it? | European Commission (europa.eu)”.

    Please select the correct answer:

    • Yes

    • No

Correct answer: Yes.

  1. 7.

    Alice is trying to understand the rights for data subjects described in GDPR. She reviews “Data protection and online privacy – Your Europe (europa.eu)” and “It’s your data – take control – Data protection in the EU (europa.eu)”.

    Help Alice match the cases to the appropriate individual right.

Case

A. You’ve bought goods from an online retailer. You can ask the company to give you the personal data they hold about you, including: your name and contact details, credit card information and dates and types of purchases.

B. You bought two tickets online to see your favorite band play live. Afterwards, you’re bombarded with adverts for concerts and events that you’re not interested in. You inform the online ticketing company that you don’t want to receive further advertising material.

C. You apply for a new insurance policy but notice the company mistakenly records you as a smoker, increasing your life insurance payments.

D. When you type your name into an online search engine, the results include links to an old newspaper article about a debt you paid long ago.

E. You apply for a loan with an online bank. You are asked to insert your data and the bank’s algorithm tells you whether the bank will grant you the loan and gives the suggested interest rate.

F. You’ve found a cheaper electricity supplier. You ask your existing supplier to transmit your data directly to the new supplier, if it’s technically feasible or to return your data to you in a commonly-used and machine readable format so that it can be used on other systems.

Individual Right

1. Right to object

2. Right to rectification

3. Right to be forgotten

4. Right of Access

5. Right to data portability

6. Rights related to automated decision making

Correct answer: A4 – B1 – C2 – D3 – E6 – F5.

  1. 8.

    The recruitment office of Alice’s institution decides to implement an innovative recruitment procedure that includes e-recruitment tools which automatically pre-select or exclude candidates without human intervention. Alice thinks that a Data Protection Impact Assessment (DPIA) is required.

    Study the “Decision of the European Data Protection Supervisor of 16 July 2019 on DPIA Lists issued under Articles 39(4) and (5) of Regulation (EU)” and select the “Criteria for processing ‘likely to result in high risk’” that will trigger a DPIA for the new recruitment procedure of Alice’s institution (select 3 criteria).

    Which are the criteria for processing “likely to result in high risk”?

    1. 1.

      Systematic and extensive evaluation of personal aspects or scoring, including profiling and predicting.

    2. 2.

      Automated-decision making with legal or similar significant effect: processing that aims at taking decisions on data subjects

    3. 3.

      Systematic monitoring: processing used to observe, monitor or control data subjects, especially in publicly accessible spaces. This may cover video-surveillance but also other monitoring, e.g. of staff internet use.

    4. 4.

      Sensitive data or data of a highly personal nature: data revealing ethnic or racial origin, political opinions, religious or philosophical beliefs, trade-union membership, genetic data, biometric data for uniquely identifying a natural person, data concerning health or sex life or sexual orientation, criminal convictions or offences and related security measures or data of highly personal nature.

    5. 5.

      Data processed on a large scale, whether based on number of people concerned and/or amount of data processed about each of them and/or permanence and/or geographical coverage

    6. 6.

      Datasets matched or combined from different data processing operations performed for different purposes and/or by different data controllers in a way that would exceed the reasonable expectations of the data subject.

    7. 7.

      Data concerning vulnerable data subjects: situations where an imbalance in the relationship between the position of the data subject and the controller can be identified.

    8. 8.

      Innovative use or applying technological or organisational solutions that can involve novel forms of data collection and usage. Indeed, the personal and social consequences of the deployment of a new technology may be unknown.

    9. 9.

      Preventing data subjects from exercising a right or using a service or a contract.

Correct answer: 1, 2, 8.

  1. 9.

    According to Professor Joel Reidenberg, in the video “Protecting Student-Data Privacy: An Expert’s View”, the worst that could happen because of bad data practices is:

    1. A.

      Students being used as guinea pigs for the development of commercial products

    2. B.

      Educational harm to children, where they are being improperly labelled

    3. C.

      The development of programs that assess teachers’ performance

    4. D.

      The development of flexible mechanisms so parents can consent and opt-in to additional uses of data

Correct answer: B.

  1. 10.

    ACTIVITY/PRACTICE QUESTION (Reflect on)

    We encourage you to elaborate on your response in the following reflective task. You may reflect on:

    1. 1.

      Privacy issues for preserving educational data

    2. 2.

      Educational data protection

2.4 Concluding Self-Assessed Assignment

2.4.1 Introduction

Both Alice and you have come a long way in your understanding of the power of educational data as a key success factor for online and blended teaching and learning, as well as of the fundamentals of Educational Data Collection and Management, including issues related to ethics and privacy.

You are now ready to further develop your Educational Data Literacy Competences, focusing on Educational Data Analysis, Comprehension and Interpretation.

In order to proceed, you are requested to complete a concluding self-assessed assignment. This self-assessed assignment is a real-life scenario activity (based on the use case of our teacher Alice), using a rubric across three proficiency levels and an exemplary solution rating. When you have completed this assignment, you will assess it yourself, following the rubric, which lists the required criteria and gives guidelines for the assessment.

This self-assessed assignment procedure consists of 5 steps:

  • Step 1. Real life scenario

  • Step 2. Getting familiar with the assessment rubric

  • Step 3. Prepare your answer

  • Step 4. Review a sample solution

  • Step 5. Self-evaluate your answer

2.4.2 Step 1. Real Life Scenario

Alice is an enthusiastic English Language teacher who has just been appointed to an Experimental High School in Athens, Greece. She wants to use student data to gain insights and plan her teaching activities accordingly, so as to improve this year’s Grade 9 students’ academic performance.

Alice contacts Mr. Adams, the school’s appointed Data Protection Officer (DPO), to secure all necessary approvals for the data sources handled by her school or by the corresponding district. As soon as Alice signs the required data protection consent form, she is granted permission and downloads the datasets from the various sources.

Alice also requests access to the LMS used by the school, and the LMS administrator creates a new teacher account for her. Before implementing her flipped classroom strategy, she contacts the school’s DPO again to discuss any legal and ethical issues she needs to pay attention to. As advised by the DPO, she accesses the LMS, reviews the existing user agreements via the “User agreements page” and confirms that signed informed consent has been given for all participating students (either parental consent on behalf of minors or consent given directly by the students, as defined by the National Data Protection Authority).

Alice realizes that she must update the current consent form to meet the requirements of the General Data Protection Regulation (GDPR).

You need to help Alice prepare a new consent form for the students participating in her flipped classroom model.

2.4.3 Step 2. Getting Familiar with the Assessment Rubric

Alice reviews the Initial Consent Form.

Please help Alice evaluate this Initial Consent Form, using the Rubric for Assessing the Consent Form, and identify potential issues.

ACTIVITY/PRACTICE QUESTION (Reflect on)

We encourage you to elaborate on your evaluation of the Initial Consent Form that Alice reviewed, in the following reflective task. You may reflect on:

  1. Does this consent form comply with GDPR consent requirements?

  2. If not, what would you advise Alice to modify so that the consent form is GDPR compliant and limits her school’s exposure to regulatory penalties?

2.4.3.1 Initial Consent Form

2.4.3.1.1 Introduction

Welcome to Athens Experimental High School (the “School” or “We”) Learning Management System (LMS). The School provides this LMS to you subject to the following Terms of Use and Privacy Policy (together, the “Terms”). When you use this LMS, you agree to abide by these Terms. If you do not agree to abide by these Terms, you may not use this LMS. Please read the Terms carefully.

The School reserves the right to make changes to this LMS and to modify the Terms at any time at its sole discretion. We encourage you to review the Terms frequently for modifications. By your use of this LMS, you agree to abide by any such modifications to the Terms, which are binding on you.

2.4.3.1.2 Privacy Policy

This Privacy Policy describes the School’s agreement with you regarding how we will handle certain information on the LMS. This Privacy Policy does not address information obtained from other sources such as submissions by mail, phone or other devices or from personal contact. By accessing the LMS and/or providing information to the School on the LMS, you consent to the collection, use and disclosure of certain information in accordance with this Privacy Policy.

2.4.3.1.2.1 Information Collected on Our LMS:

If you merely download material or browse through the LMS, our servers may automatically collect certain information from you which may include: (a) the name of the domain and host from which you access the Internet; (b) the browser software you use and your operating system; and (c) the Internet address of the website from which you linked to the LMS. The information we automatically collect may be used to improve the LMS to make it as useful as possible for our visitors; however, such information will not be tied to the personal information you choose to provide to us.

We do collect and keep personally identifiable information when you choose to voluntarily register to the LMS and submit such information. After your registration, we retain the information you submit for our records and to contact you from time to time. Please note that if we decide to change the manner in which we use or retain personal information, we may update this Privacy Policy, at our sole discretion.

2.4.3.1.2.2 Disclosure of Personal Information to Third Parties:

The School does not rent or sell personal information that you choose to provide to us nor does the School disclose credit card or other personal financial information to third parties other than as necessary to complete a credit card or other financial transaction or as required by law. The School does engage certain third parties to perform functions and provide services, including, without limitation, hosting and maintenance, customer relationship, database storage and management, payment transaction and direct marketing campaigns. We will share your personal information with these third parties, but only to the extent necessary to perform the functions and provide the services, and only pursuant to binding contractual obligations requiring such third parties to maintain the privacy and security of your data.

2.4.3.1.2.3 Receiving Promotional Materials:

We may send you information or materials such as newsletters, ebooks, whitepapers by e-mail or postal mail when you submit your address via the LMS. By your registration in the LMS, you are consenting to our sending you such information or materials.

If you do not want to receive promotional information or material, please send an email with your name, mailing address and email address to athens.expschool.online@gmail.com. When we receive your request, we may take reasonable steps to remove your name from such lists.

2.4.3.1.2.4 Cookies

A cookie is a small text file that a website can place on your computer’s hard drive for record-keeping or other administrative purposes. Our LMS may use cookies to help to personalise your experience on the LMS. Although most web browsers accept cookies automatically, usually you can modify your browser setting to decline cookies. If you decide to decline cookies, you may not be able to fully use the features of the LMS. Cookies may also be used at certain sites accessible through links on the LMS.

2.4.3.1.2.5 Links to Other Websites:

The School is not responsible for the practices or policies of the websites linked to or from the LMS, including without limitation their privacy practices or policies. If you elect to use a link that accesses another party’s website, you will be subject to that website’s practices and policies.

2.4.3.1.3 Terms of Use
2.4.3.1.3.1 For Informational Purposes Only

The School makes available the information on this Website for informational purposes only. You are solely responsible for the information you provide on this Website and for the information you use that you view on this Website. Information on this Website is not intended to be a replacement for direct consultation with the School; if you have questions or concerns, please contact the School directly.

2.4.3.1.3.2 Copyright and Trademark Information

The content included on this LMS, such as data, text, graphics, logos, images and software and its compilation is the property of the School and/or its content suppliers and is protected by copyright and trademark laws. In the event you upload any content including, without limitation, photographs or videos to this LMS, you (i) represent to the School and its affiliates that you have all rights necessary to upload the content; (ii) agree to indemnify the School and its affiliates for any third party infringement or other claims related thereto; and (iii) hereby license to the School and its affiliates a perpetual non-cancellable royalty-free license to use such uploaded content for any purposes in any media now existing or hereafter developed.

2.4.3.1.3.3 License for Your Use

For any period of time that you use this LMS and abide by these terms, the School grants to you a limited, revocable and nonexclusive license to access this LMS for your use but not to copy, download or modify it, or any portion of it, except with the express written consent of the School. This LMS or any portion of this LMS may not be reproduced, duplicated, copied, sold, visited or otherwise exploited without the express written consent of the School. You may not utilize framing to enclose any trademark, logo, content or other proprietary information contained on this LMS without the express written consent of the School. You may not use any meta tags or any other “hidden text” utilizing the School or its affiliates’ name or trademarks without the School’s express written consent.

You agree to use this LMS only for lawful purposes, and you acknowledge that your failure to do so may subject you to civil or criminal liability. You are responsible for ensuring that any materials you upload, post or submit to this LMS do not violate the copyright, trademark, trade secret or other personal or proprietary rights of any third party and you hereby agree to indemnify the School for any third party infringement or personal rights claims. You agree not to disrupt, modify, or interfere with this LMS or its associated software, hardware and servers in any way and you agree not to impede or interfere with others’ use of this LMS. You further agree not to alter or tamper with any information or materials on or associated with this LMS. Any unauthorized use or violation of these terms automatically terminates any permission or license granted by the School to access and use this LMS.

2.4.3.1.3.4 External Links

This LMS may provide links or references to third party websites or applications, including without limitation, third party websites or applications of advertisers or of providers of informational articles or other users. The School is not responsible for any information you choose to provide to those third party websites or applications; any information, products or services you acquire from those third party websites or applications, or any damages arising from your access to or use of those third party websites or applications.

Any links to third party websites and applications are provided as a convenience to the visitors of this LMS and any inclusion of any such links in this Website does not imply an endorsement or warranty of the third party websites or applications or their security, content, products, offerings or services. You are cautioned that any third party websites or applications are governed by their own terms of use and privacy policies, so when linking you should make sure to visit the appropriate pages of those third party websites or applications to determine what terms of use and privacy policies will apply to your use.

  • YES, I GIVE CONSENT FOR MY CHILD TO PARTICIPATE IN THE ONLINE COURSE AND AGREE TO THE CONSENT AS NOTED ABOVE.

  • NO, I DO NOT GIVE CONSENT FOR MY CHILD TO PARTICIPATE IN THE ONLINE COURSE AND AGREE TO THE CONSENT AS NOTED ABOVE.

Adapted from: https://www.whitbyschool.org/privacy-policy

2.4.3.2 Rubric for Assessing the Consent Form

Each criterion is rated on three proficiency levels: 1 Unacceptable, 3 Good/Solid and 5 Exemplary.

Language

  • 1 Unacceptable: The consent request is presented neither clearly nor concisely, and its language is not easy to understand.

  • 3 Good/Solid: The consent request is presented quite clearly and concisely, in language that is quite easy to understand.

  • 5 Exemplary: The consent request is presented very clearly and concisely, in language that is very easy to understand.

Explicit and Distinguishable

  • 1 Unacceptable: The consent request is neither explicit nor distinguishable from other pieces of information.

  • 3 Good/Solid: The consent request is quite distinguishable from other pieces of information, but consent is not given via a positive act.

  • 5 Exemplary: The consent request is clearly distinguishable from other pieces of information, and consent is given via an electronic tick-box that the individual has to check explicitly online.

Freely given consent

  • 1 Unacceptable: The individual does not have a free choice.

  • 3 Good/Solid: The individual has a free choice, and it is quite clear how to refuse consent without being at a disadvantage.

  • 5 Exemplary: The individual has a free choice, and it is very clear how to refuse consent without being at a disadvantage.

Possibility to withdraw the given consent

  • 1 Unacceptable: The consent form does not include the possibility to withdraw consent.

  • 3 Good/Solid: The consent form includes the possibility to withdraw consent but does not explain how to do it.

  • 5 Exemplary: The consent form includes the possibility to withdraw consent and clearly explains how to do it.

Rights of the data subject

  • 1 Unacceptable: The individuals are not informed about their rights as data subjects (GDPR Art. 12 to 23).

  • 3 Good/Solid: The rights of the data subject (GDPR Art. 12 to 23) are stated to some extent, but how to exercise these rights is not clear.

  • 5 Exemplary: The individuals are clearly informed about their rights as data subjects (GDPR Art. 12 to 23) and can effectively exercise these rights.

Identity of the organisation processing data

  • 1 Unacceptable: The consent form does not state the identity of the organisation processing the data.

  • 3 Good/Solid: The consent form states the identity of the organisation processing the data quite clearly.

  • 5 Exemplary: The consent form states the identity of the organisation processing the data very clearly.

Purposes for which the data is being processed

  • 1 Unacceptable: The consent form does not explain the purposes for which the data is being processed.

  • 3 Good/Solid: The consent form explains quite clearly the purposes for which the data is being processed.

  • 5 Exemplary: The consent form explains very clearly the purposes for which the data is being processed.

Describes the type of data that will be processed

  • 1 Unacceptable: The consent form does not describe the type of data that will be processed.

  • 3 Good/Solid: The consent form describes the type of data that will be processed.

  • 5 Exemplary: The consent form describes in detail the type of data that will be processed.

International transfer of data

  • 1 Unacceptable: The consent form does not state whether the consent relates to an international transfer of the individual’s data.

  • 3 Good/Solid: The consent form states quite clearly whether the consent relates to an international transfer of the individual’s data.

  • 5 Exemplary: The consent form states very clearly whether the consent relates to an international transfer of the individual’s data.
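For readers who keep their self-assessment in a script or notebook, the rubric can be encoded as a small lookup table. The Python sketch below is illustrative only: the criterion names and the 1/3/5 scores come from the rubric, while the function name `score_consent_form` and the idea of summing the scores into a total (maximum 45) are assumptions of this sketch, not part of the rubric.

```python
from __future__ import annotations

# Scores attached to the three proficiency levels of the rubric.
LEVELS = {"Unacceptable": 1, "Good/Solid": 3, "Exemplary": 5}

# The nine criteria, in the order in which the rubric lists them.
CRITERIA = (
    "Language",
    "Explicit and Distinguishable",
    "Freely given consent",
    "Possibility to withdraw the given consent",
    "Rights of the data subject",
    "Identity of the organisation processing data",
    "Purposes for which the data is being processed",
    "Describes the type of data that will be processed",
    "International transfer of data",
)


def score_consent_form(ratings: dict[str, str]) -> tuple[int, list[str]]:
    """Return (total score, criteria rated below Exemplary).

    `ratings` maps every criterion to a level name. Adding the scores into a
    total is an assumption of this sketch; the rubric itself defines no total.
    """
    missing = [c for c in CRITERIA if c not in ratings]
    if missing:
        raise ValueError(f"unrated criteria: {missing}")
    total = 0
    to_improve = []
    for criterion in CRITERIA:
        level = ratings[criterion]
        if level not in LEVELS:
            raise ValueError(f"unknown level {level!r} for {criterion!r}")
        total += LEVELS[level]
        if LEVELS[level] < LEVELS["Exemplary"]:
            to_improve.append(criterion)
    return total, to_improve


# Example: a form that is exemplary except on international transfers.
ratings = {criterion: "Exemplary" for criterion in CRITERIA}
ratings["International transfer of data"] = "Good/Solid"
print(score_consent_form(ratings))  # (43, ['International transfer of data'])
```

Returning the list of criteria below Exemplary, rather than the total alone, points directly at the parts of a consent form that still need work.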

2.4.4 Step 3. Prepare Your Answer

Please assist Alice in preparing a consent form for the students participating in the online course for the flipped classroom initiative.

ACTIVITY/PRACTICE QUESTION (Reflect on)

We encourage you to elaborate on your response about the preparation of the consent form for Alice’s students participating in the online course for the flipped classroom initiative, in the following reflective task. You may reflect on:

  1. How should the consent form be formulated so that Alice can obtain consent that complies with GDPR requirements?

  2. What are the key features of an effective opt-in consent form under the GDPR?

2.4.5 Step 4. Review a Sample Solution

Please review a sample Exemplary solution that follows the criteria specified in the Rubric for assessing the Consent Form.

ACTIVITY/PRACTICE QUESTION (Reflect on)

We encourage you to elaborate on your response about the Exemplary solution that follows the criteria specified in the Rubric for assessing the Consent Form, in the following reflective task. You may reflect on:

  1. Do you identify any GDPR requirements that you did not take into consideration when creating your consent form?

2.4.5.1 Exemplary Sample Solution

Consent Form to Register and Participate in the Online Course for the English Language Course of the ninth Grade of Athens Experimental High School.

In order to register and participate in the online course that will be offered for the English Language Course of the ninth Grade, you are invited to indicate your consent for the collection and processing of your personal data for the purposes of the online course, administered by Athens Experimental High School.

Athens Experimental High School (or “we”) uses a variety of resources to support student learning. Moodle™ software has been adopted as Athens Experimental High School’s Learning Management System (LMS). Moodle™ software is free and open source, and allows educators to create a private space online, filled with tools for easily creating courses and various activities, all optimised for collaborative learning. In order to provide our students with access to the online course for the English Language Course of the ninth Grade on this platform/site, we need to collect and store personal information about them. You may also refer to https://moodle.com/privacy-notice/.

Please note:

  1. The online course for the English Language Course of the ninth Grade will be carried out from 15/09/2021 to 15/06/2022.

  2. Before you register for this online course, you will be asked to indicate your consent to the collection and processing of your personal data for the purposes of the course.

  3. For the purposes of the GDPR (Article 4): ‘personal data’ means any information relating to an identified or identifiable natural person (‘data subject’); ‘profiling’ means any form of automated processing of personal data consisting of the use of personal data to evaluate certain personal aspects relating to a natural person, in particular to analyse or predict aspects concerning that natural person’s performance at work, economic situation, health, personal preferences, interests, reliability, behaviour, location or movements; ‘controller’ means the natural or legal person, public authority, agency or other body which, alone or jointly with others, determines the purposes and means of the processing of personal data; where the purposes and means of such processing are determined by Union or Member State law, the controller or the specific criteria for its nomination may be provided for by Union or Member State law.

  4. The Data Controller for data processed under this Notice is:

    Athens Experimental High School (VAT 021 27 76 45)

    20 Makrygianni Road

    116 76 Athens

    Greece

    email: athens.expschool.online@gmail.com

Legal basis for processing the personal and sensitive data:

Personal Data:

In connection with this online course, the Athens Experimental High School’s collection and processing of the following Personal Data is lawful based on:

Article 6.1(a), GDPR, Consent.

Article 6.1(b), GDPR, Contract.

Article 6.1(c), GDPR, Legal Obligation.

Article 6.1(f), GDPR, Legitimate Interest.

The Personal Data concerned are:

□ Name, Surname, Email Address.

□ User activity and contribution data.

Sensitive Data:

In connection with this online course, the Athens Experimental High School’s collection and processing of the following Sensitive Data is lawful based on consent (Article 9.2(a), GDPR):

□ Gender.

Potential Benefits:

Participation in this online course enables data subjects (students) to collaborate effectively with their peers, and enables tutor(s) to collect data and to provide resources, timely feedback and differentiated learning opportunities efficiently.

Potential Risks or Discomforts:

We do not perceive any risk or discomfort in participating in the online course.

Storage of Data:

The Moodle™ software platform is installed on a secure server at Athens Experimental High School’s premises. The collected data is also stored on this secure server for as long as required by the purposes described in this notice, and for a maximum of 5 years.

Data transfer outside the European Union:

We may share some of the data collected with services located outside the European Union, in particular through the aforementioned Moodle™ software services.

Right to Withdraw:

Your participation in this online course is voluntary. You are under no obligation to participate in this online course, and you may withdraw consent at any time, without being at a disadvantage, by contacting the Athens Experimental High School Data Controller for this online course at athens.expschool.online@gmail.com.

Rights of Data Subject:

Whilst Athens Experimental High School is in possession of or processing your personal data, you, the data subject, have the following rights:

  • Right of access – you have the right to request a copy of the information that we hold about you.

  • Right of rectification – you have a right to correct data that we hold about you that is inaccurate or incomplete.

  • Right to be forgotten – in certain circumstances you can ask for the data we hold about you to be erased from our records. The erasure of your information shall be subject to the Athens Experimental High School’s need to retain certain information pursuant to any other identified lawful basis.

  • Right to restriction of processing – where certain conditions apply, you have a right to restrict the processing.

  • Right of portability – you have the right to have the data we hold about you transferred to another organisation.

  • Right to object – you have the right to object to certain types of processing such as direct marketing.

  • Right to object to automated processing, including profiling – you also have the right not to be subject to the legal effects of automated processing or profiling.

  • Right to judicial review – in the event that Athens Experimental High School refuses your request under rights of access, we will provide you with a reasonable explanation.

You can exercise these rights by contacting the Athens Experimental High School Data Controller for this online course at athens.expschool.online@gmail.com.

If the Athens Experimental High School’s use of your information is pursuant to your consent, you have the right to withdraw consent without affecting the lawfulness of the Athens Experimental High School’s use of the information prior to receipt of your request.

If you think your data protection rights have been breached, you have the right to lodge a complaint with the Athens Experimental High School Data Controller for this online course at athens.expschool.online@gmail.com and/or with your national Data Protection Authority (DPA).

Data Subject Concerns and Reporting:

If you have any questions concerning the online course or experience any discomfort related to the online course, please contact the Athens Experimental High School Data Controller for this online course at athens.expschool.online@gmail.com.

Conflict of Interest

We do not perceive any conflicts of interest in the development of this online course.

Compensation:

There is no compensation for data subjects in this online course.

Confidentiality:

The only people processing your data will be the tutor(s) involved in the Athens Experimental High School’s online course(s). The tutor(s) undertake to keep any information provided herein confidential, not to let it out of our possession and to report on the findings from the perspective of the entire participating group and not from the perspective of an individual. Please note that confidentiality cannot be guaranteed while data is in transit over the Internet.

Purposes for which the data is being collected and processed:

The data collected and processed via the online course in the Course Management System (Moodle) is used by the Athens Experimental High School to facilitate teaching and learning. To this end, online teaching resources are uploaded to the course, in which the data subjects (students) enrol and study the lecture material at home. The material is in the form of videos, small activities with automatic feedback (online quizzes) and forum discussions. The data subjects (students) can undertake some additional homework online to further check their understanding and extend their learning. Through this online course and the use of CMS tools, the tutor(s) monitor the data subjects’ (students’) learning process, discover patterns, find indicators of success and indicators of poor marks or drop-out, and proceed with recommendations and revisions of the course’s online learning activities and educational resources, aiming to improve the data subjects’ (students’) academic performance.

We ensure that the information we collect, process and use is appropriate for these purposes.

By indicating consent to participate in this online course you also indicate consent for the possible use of data for automated decision making, such as profiling, to identify data subjects’ (students’) progress against a range of indicators and activities identified to have an impact on data subjects’ (students’) success in the online course.

Consent to register and participate in the Online Course for the English Language Course of the ninth Grade of Athens Experimental High School.

Selecting “YES, I AGREE” below indicates that:

  • You have read the above information;

  • You voluntarily agree to participate in this online course;

  • You understand the procedures described above;

  • You give consent for the use of your Personal Data for the purposes outlined in this notice;

  • You give consent for the use of your Sensitive Data for the purposes outlined in this notice;

  • You are at least 15 years of age.

  • YES, I AGREE

  • NO, I DO NOT AGREE

For students who are under 15 years of age, consent from a parent or guardian is required:

  • YES, I GIVE CONSENT FOR MY CHILD TO PARTICIPATE IN THE ONLINE COURSE AND AGREE TO THE CONSENT AS NOTED ABOVE.

  • NO, I DO NOT GIVE CONSENT FOR MY CHILD TO PARTICIPATE IN THE ONLINE COURSE.
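Behind a form like this, the LMS has to do two things: route the consent request to the student or to a parent or guardian depending on age, and keep evidence of each decision, since GDPR Article 7(1) requires the controller to be able to demonstrate consent and Article 7(3) states that withdrawal does not affect the lawfulness of processing carried out before it. The Python sketch below illustrates both under stated assumptions: the 15-year threshold is taken from the sample form, all names (`required_consenter`, `ConsentRecord` and their fields) are hypothetical and not part of Moodle or any other LMS, and consent is tracked at the granularity of calendar days.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import date

# Threshold used by the sample form. GDPR Article 8 sets 16 by default and
# lets member states lower it, but not below 13.
DIGITAL_CONSENT_AGE = 15


def age_on(birth_date: date, on: date) -> int:
    """Completed years of age on the date `on`."""
    had_birthday = (on.month, on.day) >= (birth_date.month, birth_date.day)
    return on.year - birth_date.year - (0 if had_birthday else 1)


def required_consenter(birth_date: date, on: date) -> str:
    """Who must answer the consent request: the student or a parent/guardian."""
    if age_on(birth_date, on) >= DIGITAL_CONSENT_AGE:
        return "student"
    return "parent_or_guardian"


@dataclass
class ConsentRecord:
    """Hypothetical stored evidence of one consent decision."""

    student_id: str
    given_by: str  # "student" or "parent_or_guardian"
    agreed: bool  # True only if "YES" was actively selected (no pre-ticked box)
    given_on: date
    withdrawn_on: date | None = None

    def withdraw(self, on: date) -> None:
        # Stamp a withdrawal date instead of deleting the record, so the school
        # can still show that earlier processing was covered by consent.
        if on < self.given_on:
            raise ValueError("withdrawal cannot precede consent")
        if self.withdrawn_on is None:
            self.withdrawn_on = on

    def covers(self, on: date) -> bool:
        """Is processing on the date `on` covered by this consent?"""
        if not self.agreed or on < self.given_on:
            return False
        return self.withdrawn_on is None or on < self.withdrawn_on


# Example: a student born on 10/03/2007 registering on 15/09/2021 is 14.
consenter = required_consenter(date(2007, 3, 10), on=date(2021, 9, 15))
record = ConsentRecord("s-001", consenter, agreed=True, given_on=date(2021, 9, 15))
record.withdraw(date(2022, 1, 10))
print(consenter)                         # parent_or_guardian
print(record.covers(date(2021, 12, 1)))  # True: before the withdrawal
print(record.covers(date(2022, 2, 1)))   # False: after the withdrawal
```

Stamping a withdrawal date, rather than deleting the record, is what keeps earlier processing demonstrably lawful; erasing the underlying course data is a separate request under the right to be forgotten.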

2.4.6 Step 5. Self-Evaluate Your Answer

Now that you have seen the Exemplary sample solution, please rate your initial answer (evaluate the consent form you created), using the criteria in the Rubric for assessing the Consent Form.

Language

  1. The consent request is presented neither clearly nor concisely, and its language is not easy to understand.

  2. The consent request is presented quite clearly and concisely, in language that is quite easy to understand.

  3. The consent request is presented very clearly and concisely, in language that is very easy to understand.

Explicit and Distinguishable

  1. The consent request is neither explicit nor distinguishable from other pieces of information.

  2. The consent request is quite distinguishable from other pieces of information, but consent is not given via a positive act.

  3. The consent request is clearly distinguishable from other pieces of information, and consent is given via an electronic tick-box that the individual has to check explicitly online.

Freely given consent

  1. The individual does not have a free choice.

  2. The individual has a free choice, and it is quite clear how to refuse consent without being at a disadvantage.

  3. The individual has a free choice, and it is very clear how to refuse consent without being at a disadvantage.

Possibility to withdraw the given consent

  1. The consent form does not include the possibility to withdraw consent.

  2. The consent form includes the possibility to withdraw consent but does not explain how to do it.

  3. The consent form includes the possibility to withdraw consent and clearly explains how to do it.

Rights of the data subject

  1. The individuals are not informed about their rights as data subjects (GDPR Art. 12 to 23).

  2. The rights of the data subject (GDPR Art. 12 to 23) are stated to some extent, but how to exercise these rights is not clear.

  3. The individuals are clearly informed about their rights as data subjects (GDPR Art. 12 to 23) and can effectively exercise these rights.

Identity of the organisation processing data

  1. The consent form does not state the identity of the organisation processing the data.

  2. The consent form states the identity of the organisation processing the data quite clearly.

  3. The consent form states the identity of the organisation processing the data very clearly.

Purposes for which the data is being processed

  1. The consent form does not explain the purposes for which the data is being processed.

  2. The consent form explains quite clearly the purposes for which the data is being processed.

  3. The consent form explains very clearly the purposes for which the data is being processed.

Describes the type of data that will be processed

  1. The consent form does not describe the type of data that will be processed.

  2. The consent form describes the type of data that will be processed.

  3. The consent form describes in detail the type of data that will be processed.

International transfer of data

  1. The consent form does not state whether the consent relates to an international transfer of the individual’s data.

  2. The consent form states quite clearly whether the consent relates to an international transfer of the individual’s data.

  3. The consent form states very clearly whether the consent relates to an international transfer of the individual’s data.