The validity and reliability of the cross-national comparison of degree programme levels in European countries. What have students learnt?

A cross-national comparison of degree programme levels became relevant when the borders of European countries opened for students and graduates, and higher education institutions were restructured into bachelor’s and master’s programmes. This new situation foregrounded the questions of what students are learning in the degree programmes of European countries and how to compare their achievements. Therefore, we conceptualised a valid and reliable ‘level’ construct that included a cognitive (‘disciplinary thinking’) and an affective aspect (‘professional attitude’). The main research question for our exploratory study was: ‘What procedure can lead to a valid and reliable cross-national comparison of degree programme levels?’ To achieve this comparison, we designed a Three-Step Procedure, in which level was operationalised (step 1), measured and analysed (step 2), and compared cross-nationally (step 3). The study was conducted in collaboration with four bachelor programmes in Hotel Management from four European countries; a total of 783 participants were involved. Four themes were generated to operationalise the concept of level: professional management, hospitality business research, leading management, and strategic management; their respective learning outcomes were measured with a questionnaire. Principal component analysis identified the conceptualised themes and measured their components with eigenvalues ≥1, which explained 66 % of the variance. The reliability of the components exceeded a Cronbach’s alpha coefficient of 0.70. Analysis of the components and of the single samples showed strong validity and reliability for the learning outcomes. Thus, we believe this study has produced a rigorous means to compare degree programme levels across countries.


Introduction
The cross-national comparison of degree programme levels became relevant when the borders of European countries opened for students and graduates, and higher education institutions (HEIs) were restructured into bachelor's and master's programmes in accordance with the Bologna Agreement, which was signed by the European Ministers of Education (1999). Huisman and Westerheijden (2010) indicate that since this European cooperation in quality assurance began, much has been realised in a new system of accreditation (European Association for Quality Assurance [ENQA] 2009) that functions at a supranational level through the development of the European Standards and Guidelines (ESG), the launch of the European Network of Quality Assurance Agencies, and the establishment of the Register of European Higher Education Quality Assurance Agencies. However, they conclude that 'there is too much stress on compliance to rigid procedures and mechanisms, at the cost of a focus on quality improvement and the learning experience' (p. 63; emphasis added). In the USA, a comparable problem is mentioned by Ewell (2010), who concluded that 'changes on quality assurance have rendered the process more intentional, more focused on undergraduate teaching and learning, and far more transparent. But the goal of providing adequate evidence of student learning remains elusive' (p. 173; emphasis added).
The Assessment of Higher Education Learning Outcomes (AHELO) project of the Organisation for Economic Cooperation and Development (OECD) started with a feasibility study (Nusche 2008) and declared that 'in most countries, assessment results are inaccessible…If HEIs would specify the expected student outcomes explicitly and in a measurable way, comparative assessment of learning outcomes would become feasible' (p. 5). AHELO's full report on the feasibility study has since been published, in which they state:
Testing of discipline specific skills was considered useful on a global scale but…the diversity of local contexts and disciplines would create difficulties. In general, this type of testing was thought to be easier and cheaper if you test one discipline but the costs would add up for each discipline you add to the test. Achieving consensus could be hard work but the test could prove more intrinsically interesting and engaging for the participants, provided there is no oversimplification of the test (and the results remain relevant). While several suggestions were put forward…to achieve a blended approach the most prevalent answer was to find a way to assess generic skills within a discipline context. (OECD/AHELO 2013, p. 42; emphases added)
Based on the recommendations in the full report, we designed a blended procedure (the so-called Three-Step Procedure), paying particular attention to the validity and the reliability of the instruments that we selected. Indeed, 'achieving consensus could be hard work', but it leads to valid and reliable outcomes. These outcomes are achieved by a procedure, outlined in this article, which can be used for about 6 years, at which point it should be updated; in the longer term, this is less costly than the approach described in the AHELO report.
In this transnational pilot study, our Three-Step Procedure leads to an elaboration of the degree programme level concept. The concept is operationalised using analysis-based themes and learning outcomes that characterise a typical professional bachelor's programme in Hotel Management. We conducted the research in collaboration with teachers and students from four bachelor programmes in Hotel Management from four European countries. The choice of discipline was a pragmatic one based on availability of similar programmes promising international cooperation.
To create a procedure relevant to student learning experiences, we had to reconsider how the level of degree programmes should be described and defined. Our participating graduates had completed their professional bachelor's degrees within the binary systems that prevail in these four countries. In a binary system, higher education consists of two separate systems: one provides professional education, and the other delivers academic education that includes professional education, but on an academic level. The separation between the two systems is not absolute in the four countries. For example, Norwegian legislation provides space for an institute of professional degree programmes to become an academic institute and to award PhD degrees. The Dutch binary system, by contrast, is currently more rigid. However, various differentiations appear within the system, which suggests that the system is not static.
We proposed defining the level of a degree programme based on two crucial questions that students ask themselves when embarking on higher education: 'What do I have to learn?' and, 'What is expected from a professional in this particular field?' These questions are of paramount importance because the world that graduates enter is an international one. They can apply for jobs in other countries, which means that employers need to know the details of each graduate's degree programme. Employers want to know what the applicant has learned (i.e. 'disciplinary thinking') and how they will behave as a professional (i.e. 'professional attitude'). Thus, we consider these two aspects the basic pillars of employment and used them to conceptualise the degree programme level and thereby lay a solid base for delivering empirical evidence for a comparison of the learning outcomes of the respective programmes. In formulating the two aspects of disciplinary thinking and professional attitude, we realised the potential efficacy of a questionnaire designed for alumni; having graduated from the institutes in question, they would be credible sources of information about the opportunities they were offered during their years of higher education.
Our aim, therefore, was to create a procedure appropriate for measuring and analysing data cross-nationally. The main research question was: What procedure can lead to a valid and reliable cross-national comparison of degree programme levels?
A valid comparison requires a clear concept and unambiguous methods. Thus, in the following sections, we conceptualise 'degree programme level' and then map the critical factors that might affect the validity of a cross-national comparison. Next, in the 'Methods' section, we discuss how the conceptual framework can be used to design a Three-Step Procedure focused on minimising bias and show how our procedure can provide valid empirical evidence from a cross-national comparison, which we carried out in collaboration with four bachelor programmes in Hotel Management across four European countries.

Conceptual framework
The degree programme level concept
The concept of the degree programme level is connected to the mental activities of students in higher education. Vermunt (1996) studied these activities and concluded that learning in higher education taps into mental processes that can be categorised as cognitive, affective, or regulative activities. These activity types have been confirmed by later studies, such as that of Martínez-Fernández and Vermunt (2013), which found that students in South American countries also use cognitive, affective, and regulative activities when learning in higher education. Cognitive activities are used by students to process content. They lay the foundation for learning results in terms of knowledge, understanding, and skills. Examples include looking for relationships between parts of the subject matter (relating) and looking for applications (applying). Affective activities are used by students to cope with the feelings that arise during their studies and may positively or negatively affect the progression of the learning process. Examples include motivating oneself, attributing learning results to causal factors, attaching subjective appraisals to learning tasks, and controlling emotions that impede learning (Liu et al. 2012). Regulative activities are used by students to organise and manage their cognitive and affective activities and therefore lead indirectly to positive learning results. Examples include monitoring the progress of a learning process, diagnosing the cause of difficulties, and adjusting learning processes when necessary. Given that the Three-Step Procedure focused on disciplinary thinking and professional attitude, we decided not to include regulative activities in the outline of the Procedure. Regulative activities are linked only indirectly to the degree programme level and consequently have a different position with respect to the level.
In our Three-Step Procedure, we outlined the desired learning outcomes related only to disciplinary thinking and professional attitude.
Disciplinary thinking
The cognitive activities of the level concept involve the content (knowledge) of the respective discipline or domain (i.e. disciplinary thinking) (Shulman and Shulman 2004; Sternberg 2003). Disciplinary thinking includes higher-order cognitive processes such as analysing, evaluating, critical thinking, and creating (Kek and Huijser 2011; Robinson 2011; Biggs and Collis 1982; Koh et al. 2012), which are applied to complex discipline-specific problems. Biggs and Tang (2007) indicate that sound knowledge is based on interconnections and cognitive growth that lies not just in knowing more but also in restructuring and reconceptualising what is already known to connect it with new knowledge.
Professional attitude
The affective activities of the level concept include the main emotive characteristics of the studied discipline or domain, which we call the professional attitude: for example, the accuracy of the bookkeeper, the conscientiousness of the academic worker, or the discretion of the nurse (O'Connor and Paunonen 2007). The affective aspect of the level subsumes the professional attitude and refers to the most emotive characteristic of the domain in which the programme exists.
We explore the level of disciplinary thinking and professional attitude under 'Instrumentation' in the 'Methods' section below.

Critical factors for validity
It is of utmost importance that the comparison between the levels in different countries is valid. Therefore, 'the beast of bias' must be recognised and minimised (Couper and de Leeuw 2003, p. 173; Field 2013, p. 163). Validity means that what was aimed for conceptually was actually measured. Bias refers to the presence of confounding factors that challenge the comparability of measurements across national groups. According to Harkness et al. (2003a, p. 13), construct bias and method bias are the critical factors for achieving validity in a cross-national project.
Construct bias occurs when the construct being measured is not identical across groups. It can be recognised by overlaps in the definitions of the construct across cultures or by incomplete coverage of all relevant aspects of the construct (Harkness et al. 2003b, p. 145). Construct validity is indicated by the distinction between the constructs, the strength of the loadings, and the extent of reliability.
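The 'extent of reliability' mentioned above is reported in this study as a Cronbach's alpha coefficient, with 0.70 as the threshold named in the abstract. As an illustration only, the following minimal sketch computes alpha from a respondents-by-items table of scores. The function name and the five hypothetical respondents are our assumptions for the example; they are not data or software from the study.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for a list of rows (one row of item scores per respondent)."""
    n_items = len(responses[0])
    if len(responses) < 2 or n_items < 2:
        raise ValueError("need at least two respondents and two items")
    # Sample variance of each item (column) across respondents.
    item_variances = [variance(column) for column in zip(*responses)]
    # Sample variance of each respondent's total score.
    total_variance = variance([sum(row) for row in responses])
    if total_variance == 0:
        raise ValueError("alpha is undefined when total scores do not vary")
    return n_items / (n_items - 1) * (1 - sum(item_variances) / total_variance)


# Hypothetical five-point Likert answers: 5 respondents, 3 items of one component.
hypothetical_scores = [
    [4, 5, 4],
    [3, 3, 3],
    [5, 5, 4],
    [2, 3, 2],
    [4, 4, 5],
]

alpha = cronbach_alpha(hypothetical_scores)
print(f"alpha = {alpha:.2f}, meets 0.70 threshold: {alpha >= 0.70}")
```

The sketch uses sample variances (n - 1 in the denominator), which is the usual convention; a statistics package may offer additional options such as handling of missing values.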
Bias in the methods can be caused by the type of measurement, ambiguous instructions for the respondents, poor translations, and uncertainty about the meaning of terms. Method bias can also result from such factors as sample incomparability, instrument differences, tester and interviewer effects, and the mode of administration (e.g. communication problems and differential familiarity with material). For these reasons, method bias is not a concern of test developers, administrators, or data analysts exclusively but also applies to graduate coordinators, teachers, students, and other members of the educational and examination committees that participate in this type of study (Van de Vijver and Leung 1997, p. 11).
Cross-national comparisons remain complex because of the many triggers for bias (Van de Vijver 2003). In this study, we aimed to minimise bias by carefully designing an outline of a Three-Step Procedure based on a clear concept of the degree programme level in higher education. We believe that a valid measurement requires a clear definition of the concept being used (Koh et al. 2012).

Methods
Having conceptualised the degree programme level in terms of disciplinary thinking and professional attitude, we designed the Three-Step Procedure and carried it out in collaboration with four bachelor programmes from four European countries. Each of the three steps aims to deliver outcomes that validly and reliably reflect the levels of the participating degree programmes to facilitate a cross-national comparison. In this section, we introduce our participants and explain our instrumentation.
This study was initiated by the professional Hotel Management bachelor's programme in a large institute of higher professional education in the Netherlands (i.e. the Dutch HBO). For external quality assurance purposes, this institute required reliable data that reflected the level of the Hotel Management degree programme. Hotel Management schools are inherently internationally oriented and are interested in a rigorous cross-national comparison of their degree programme levels with those of schools in other countries. Additionally, this school was considering developing a more in-depth collaboration with an equivalent institution abroad, including a possible joint degree.

Participants
In total, 783 participants from four Hotel Management bachelor's programmes were involved in this study. The programmes were based in Austria, Belgium, the Netherlands, and Norway. The participants comprised the four deans of the degree programmes; 13 teachers who were members of educational and/or examination committees, some of whom were experts in the professional and/or scientific domain; two members of a central test office; 12 final year students; 18 stakeholders, including teachers and graduate coordinators; one project manager; and 733 graduates. The participating graduates had completed their professional bachelor's degrees within the binary systems that exist in these four countries.

SOLO taxonomy for disciplinary thinking
The taxonomy of the Structure of the Observed Learning Outcome (Biggs and Tang 2007), or SOLO, classifies the learning outcomes related to disciplinary thinking in terms of their complexity (Table 1). SOLO consists of five levels of a student's understanding in a domain that is new for them. The levels are distinguished from one another and reflect increasing structural complexity. Furthermore, students learn disciplinary thinking in two stages: one quantitative and the other qualitative. Students start learning in the quantitative stage, which contains three of the five SOLO levels: pre-structural, uni-structural, and multi-structural. At the pre-structural level students demonstrate almost no understanding of the task and might use tautology to cover this deficiency. At the uni-structural level, students concentrate on a part of the information and thus their conclusions are limited. The multi-structural level ranges from picking up several aspects of the domain information but without elaborating, to picking up many aspects and explaining them. This is illustrated by an example from a study of programmers (Lister et al. 2006), in which the programming students had to learn how to understand code. Data were collected in the form of written and think-aloud responses from students (novices) and educators (experts), using examination questions. Lister et al. (2006) formulate this by offering another way of describing the multi-structural level of understanding: 'The multi-structural SOLO Response is a response where the student manifests an understanding of all parts of the problem, but does not manifest an awareness of the relationships between these parts - the student fails to see the forest for the trees' (p. 119).
The qualitative stage comprises two of the five SOLO levels. The relational level is the first level that is relevant for higher education. Students are capable of relating parts of the domain knowledge and placing them in an appropriate context. They start thinking and understanding as professionals who integrate the parts of the problem into a coherent structure and use that structure to solve a given task. Finally, at the extended abstract level, students are able to transform declarative knowledge into functional knowledge. They are able to theorise, generalise, and reflect across the borders of the relational level and the domain: 'The coherent whole is conceptualised at a higher level of abstraction and is applied to new and broader domains… The trouble is that today's extended abstract is tomorrow's relational' (Biggs and Tang 2007, p. 78).
Table 1 SOLO taxonomy for disciplinary thinking: level, description, and indicative verbs
4 Relational. The student is skilled in relating parts of domain knowledge (e.g. by comparing, contrasting, and explaining causes). This is the first level of understanding in higher education. The student is capable of placing domain knowledge in a perspective. They think more as a professional in the domain. They make initially limited, and then more refined generalisations of ideas.
  Verbs: apply, integrate, analyse, explain, conclude, review, argue, transfer, make a plan, debate, construct, solve a problem
3 Multi-structural. The student's learning results range from picking up a number of independent facets (of the domain knowledge) without elaborating on them, to picking up many facets and explaining them. The student tells what they know (knowing-telling).
  Verbs: classify, describe, report, discuss, illustrate, select, compute, sequence, outline, separate
2 Uni-structural. The student focuses on a part of the (domain) information or task so that their conclusion is limited and probably dogmatic. They are able to recognise, identify, and define one facet.
  Verbs: write, label, count, find, match, memorise, quote
1 Pre-structural. The student misses the quintessence of the (higher education) task or question. They demonstrate hardly any understanding of the question and might use tautology to cover their lack of understanding.
  Verbs: show little evidence of relevant learning

PA taxonomy for professional attitude
Professional attitude is the affective aspect of the level concept. It involves essential affective characteristics of the profession or domain in which a particular higher education programme exists. To measure the learning outcomes for the expression of professional attitude, we constructed the Taxonomy for Professional Attitude (PA) (Table 2). This taxonomy provides a means of indicating how a learner's attitude develops in complexity when learning the affective aspects of a domain or profession.
The SOLO taxonomy was a source of inspiration for the development of our PA; for the taxonomy content, we drew on the taxonomy of Krathwohl, Bloom, and Masia (1974, pp. 107-170) and on Zimmerman's (2006) self-regulation cycles. The learner's development runs through five levels. At the first level, a student becomes aware of the features of the professional attitude for which they are being educated. Then they 'accept' by moving toward one or more features as intended, but in a very inconsistent manner. At the third level, a student gradually 'demonstrates', albeit inconsistently, characteristics of a specific attitude. At the fourth level, a student 'integrates' more features into a consistent behaviour pattern. At the highest level, a student 'internalises' the various features of professional attitude, which means that they consistently place those features into control of their own attitude. The degree of complexity in the attitudinal development as indicated in this PA taxonomy was tested for inter-rater reliability; a Cohen's kappa coefficient of 0.83 indicated strong reliability (Bryman 2012, p. 280).
Table 2 Taxonomy for professional attitude (PA): PA level and description
5 Internalising. At the highest level, the student is able to place most features of the professional attitude consistently into control of their own behaviour. Their behaviour consistently incorporates the characteristics of the domain's professional attitude.

Questionnaire
After the completion of their degree at an institute for higher professional education, graduates received an e-mail invitation to participate in our study. Using a questionnaire, we solicited the graduates' opinions about the acquisition of disciplinary thinking and professional attitudes; the items were based on both the SOLO and the PA taxonomies and reflected the learning outcomes of the desired level of disciplinary thinking and professional attitudes. The existing questionnaire had been used for data collection in four different countries, with different languages and cultures, albeit within the common context of Europe. Thus, there was a great risk that the questions could be misinterpreted. To minimise the threat of construct and method bias, we pretested the draft questionnaire extensively. For this pretest, we made use of the Questionnaire appraisal coding system (Snijkers 2002; Van de Vijver and Leung 1997), a necessary instrument for minimising question ambiguity in an international survey. The draft questionnaire was pretested by teachers and final-year students from the participating programmes. The pretest took the form of an in-depth interview conducted by a teacher from a participating degree programme with individual students as interviewees. The students answered each question and the interviewer observed the question-and-answer process, while using the questionnaire appraisal coding system to detect possible problems (Table 3). The interviewees thought aloud and often made suggestions for improvements to the questionnaire.
They were asked to indicate the extent to which they had mastered the learning outcomes in their respective programmes. It was most important that the students immediately understood the intended meaning of each question; if they hesitated, it suggested that there might be ambiguities in the wording of the questionnaire.

Three-Step Procedure
We started the development of the Three-Step Procedure by conceptualising the degree programme level from a learning psychology perspective, which resulted in the two aspects, disciplinary thinking and professional attitude. These aspects had to be operationalised to reflect the specific domain of Hotel Management. Following this, we measured, analysed, and compared the degree programme level(s). The final design of the Three-Step Procedure, which focuses on minimising or avoiding ambiguities, comprised the following steps (Table 4):
Step 1. Operationalising the degree programme level concept: What is the content of a degree programme and which learning outcomes should be discerned?
Step 2. Measuring and analysing the degree programme levels: How should the data collection related to student performance be organised?
Step 3. Cross-national comparison of the degree programme levels: What conclusion(s) can be drawn?

Results
The 'Results' section is composed of an explication of the three steps.
Step 1: Operationalising the degree programme level concept
The aim of the first step was to determine what content was relevant for the level of the degree programmes. To this end, we analysed various related contexts, created and affirmed themes, and developed and validated learning outcomes.
Table 4 Three-Step Procedure
Step 1 Operationalising the degree programme level concept
  Analysing contexts
  Creating and confirming themes
  Developing and validating learning outcomes
Step 2 Measuring and analysing the degree programme levels
  Selecting and constructing a measuring instrument
  Pretesting and adjusting the questionnaire
  Measuring the level of the degree programmes
  Analysing representativeness, construct validity, and reliability
  Analysing the components of the full sample and the single samples
  Determining the validity and reliability of the degree programme levels
Step 3 Cross-national comparison of the degree programme levels
  Calculating and presenting the cross-national comparison
  Discussing the outcomes of each programme
Conclusion The validity and reliability of the cross-national comparison
Analysing contexts. Experts of the participating programmes carried out a study of four contexts: the (1) professional and (2) academic domains; the (3) external surroundings of the domain, for example, a professional organisation, a council of the sector, or specific legislation; and the (4) current curriculum. The experts analysed these areas and proposed six themes.
Creating themes. The proposed themes were substantiated by peer-reviewed literature, to improve objectivity and transparency. These themes were then mapped and discussed by the experts, a process that produced newly refined themes to be used in the next step: hospitality business research, professional management, internationalisation, leading management, customer service, and strategic management.
Affirming themes. The aim of this step was to facilitate agreement about the validity of the themes. The themes created by the experts were presented to various stakeholders from the participating programmes, namely, managers involved in educational and examination committees, students, and teachers. They came to an agreement about four themes underlying a bachelor's curriculum in Hotel Management: hospitality business research, professional management, leading management, and strategic management (Table 5). The stakeholders selected these four themes based on their high degree of relevance to the programme level, having rated the six themes on a five-point scale ranging from 'not relevant' to 'very relevant'. Face validity was employed in accordance with Kane's (2006) definition.
Table 5 Themes underlying a bachelor's curriculum in Hotel Management
Hospitality Business Research. This research is related to marketing and finance, specifically for the hospitality sector. The most important topics in hospitality marketing today are consumer behaviour, service management, and e-marketing. Hospitality finance includes several aspects, such as risk management, financing, bankruptcy, and capital structure.
  References: Yoo et al. 2011; Jang and Park 2011
Professional Management. Introductory Hotel Management and processes of self-regulation are combined in this theme. Self-regulation concerns the personal and professional growth of the hotel manager that is crucial for the continuation of the company or hotel. It has been established by the educational and examination committees of the degree programmes that hospitality is the main emotive characteristic of the hotel manager's professional attitude. The affective aspects are expressed by adequately operating with guests from different countries, managing in a results-oriented way to obtain satisfied guests, and acquiring staff possessing the ability to deliver hospitality.
  References: Watson 2008; Lee et al. 2011
Leading Management. This management requires the capacity to deal with complicated situations that may occur with guests and colleagues. Leading management asks for curriculum topics that are often difficult to realise in degree programmes and are also included in other business core disciplines such as accounting, finance, and human resource management. The hospitality manager should acquire an international orientation and knowledge about cross-cultural differences between guests and staff members.
  References: Becket and Brookes 2008
Strategic Management. Strategic management refers to management that is grounded in well-considered policy. It implies clarity about the organisation's objectives and has become a key subject in many undergraduate and postgraduate programmes in hospitality schools worldwide. The main curriculum topics are organisational strategy and how to deal with uncertainty in the environment and with changes or differences, while concurrently enhancing effectiveness.
  References: Okumus and Wong 2005; Harrington and Ottenbacher 2011
Developing learning outcomes. Once the themes were accepted, they were categorised into intended learning outcomes and worded from the students' perspectives. The SOLO taxonomy was used for the learning outcomes that reflected the necessary level of disciplinary thinking (Table 1), and the PA taxonomy was used for the learning outcomes that reflected the desired level of professional attitude (Table 2). It was decided that the 'relational' SOLO level (Table 1) was necessary for the Hotel Management professional bachelor's programmes because at this level the students demonstrate higher understanding and begin to think as professionals in their domain (Biggs and Tang 2007). Furthermore, it was decided that the PA 'attitude' level (Table 2), characterised by 'integrating', and preferably 'internalising', was necessary for the same programmes; at this level, the students are able to integrate aspects of professional attitudes without inconsistencies.
Validating learning outcomes. The description of the developed learning outcomes was discussed and finally assessed by other expert teachers and members of examination or curriculum committees. They indicated the degree to which the learning outcomes addressed the respective themes. Based on the two taxonomies, 31 learning outcomes were developed, discussed, and finally assessed by the expert teachers for content validity, resulting in a Cohen's kappa coefficient of 0.71, which signified good inter-rater reliability (Bryman 2012, p. 280).
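Cohen's kappa, the inter-rater statistic reported above, corrects the observed agreement between two raters for the agreement expected by chance. The following minimal sketch shows the calculation for two raters; the function name and the ten hypothetical 'yes'/'no' judgements are our assumptions for illustration and are not the ratings collected in the study.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters who labelled the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same non-empty set of items")
    n = len(rater_a)
    # Proportion of items on which the two raters agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance, from each rater's label frequencies.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (observed - expected) / (1 - expected)


# Hypothetical judgements: does a learning outcome address its theme?
rater_a = ["yes", "yes", "no", "yes", "no", "no", "yes", "yes", "no", "yes"]
rater_b = ["yes", "yes", "no", "yes", "no", "yes", "yes", "yes", "no", "no"]

kappa = cohens_kappa(rater_a, rater_b)
print(f"kappa = {kappa:.2f}")
```

With more than two raters, or with ordered rating categories, a different statistic (for example a weighted kappa) may be more appropriate; the sketch covers only the simple two-rater, nominal case.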
Step 2: Measuring and analysing the degree programme levels
The aim of this step was to collect 'unbiased', clean data that were suitable for proper analysis. We collected these data using our own questionnaire consisting of five-point Likert rating scales (see Appendix 2 for the items in a shortened formulation).
Selecting and constructing a measuring instrument. The potential for bias in a questionnaire exists in both the constructs and the methods. Construct bias often results from misinterpretations of the questions. It can be identified as overlaps in the definitions of the constructs across cultures or as incomplete coverage of all relevant aspects of the construct. Method bias can be caused by factors such as poor translations and uncertainty about the meaning of terms. We assumed that these types of biases would be best managed by conducting an extended pretest.
Pretesting and adjusting the questionnaire. The draft questionnaire was pretested on 12 final-year students from the four Hotel Management bachelor programmes in Austria, Belgium, the Netherlands, and Norway. The pretest was supported by an adapted questionnaire appraisal coding system (Table 3).
Measuring the level of the degree programmes. The managers of the four participating bachelor programmes invited recent graduates (no more than 1.5 years prior) to participate in this study. In their letters of invitation, they provided the internet address for the questionnaire, along with a unique access code. They assured the participants that their answers would be anonymised.
We analysed the data from the completed questionnaires to establish the representativeness of the sample, the construct validity, and the reliability of the created themes forming the components of the degree programmes.
Representativeness. Representativeness refers to quantitative as well as qualitative characteristics of a sample. As mentioned, we drew our sample from the population of recent graduates. We chose the criterion 'recent graduates' to reduce the potential for bias introduced by influences other than the degree programme, and we excluded participants who had graduated more than 1.5 years prior. Using this criterion, a total of 733 graduates were invited to participate in the survey. The gross number of respondents was 535 (73 %). The number of respondents per degree programme ranged from 124 (17 %) to 154 (21 %). However, 142 respondents were eliminated: 107 appeared to have graduated more than 1.5 years before, and another 35 did not complete their questionnaire (30 % or more missing values, or 9 or more unanswered questions out of 31). The net number of respondents was 393. This number was sufficient for a quantitative pilot study (Snijkers 2002, p. 65), in that it could generate a meaningful analysis of construct validity and reliability (Field 2013) (Appendix 2).
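The sample arithmetic reported above can be retraced in a few lines. The sketch below uses only the counts given in the text; the helper `is_incomplete` is a hypothetical rendering of the stated exclusion rule (30 % or more missing values, or 9 or more unanswered questions out of 31), not the authors' actual data-cleaning code.

```python
INVITED = 733
GROSS_RESPONDENTS = 535
EXCLUDED_GRADUATED_TOO_EARLY = 107  # graduated more than 1.5 years before
EXCLUDED_INCOMPLETE = 35            # too many unanswered questions


def is_incomplete(answers, max_share=0.30, max_count=9):
    """Flag a questionnaire whose missing answers reach either threshold.

    `answers` is a list with None for every unanswered question.
    """
    missing = sum(a is None for a in answers)
    return missing >= max_count or missing / len(answers) >= max_share


response_rate = GROSS_RESPONDENTS / INVITED
excluded = EXCLUDED_GRADUATED_TOO_EARLY + EXCLUDED_INCOMPLETE
net_respondents = GROSS_RESPONDENTS - excluded

print(f"gross response rate: {response_rate:.0%}")
print(f"excluded: {excluded}, net respondents: {net_respondents}")

# Hypothetical questionnaires with 31 items each.
almost_complete = [3] * 23 + [None] * 8   # 8 missing: retained
too_sparse = [3] * 22 + [None] * 9        # 9 missing: excluded
print(is_incomplete(almost_complete), is_incomplete(too_sparse))
```

Note that with 31 items the count threshold (9 missing, about 29 %) is reached before the 30 % threshold, so in practice the count rule decides.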
The qualitative data from the responding recent graduates are presented in Appendix 1, which demonstrates the even distribution of respondents across the four degree programmes (about 98 (25 %) per programme). Of these respondents, 94 (24 %) graduated from the Belgian programme, 95 (24 %) from the Norwegian programme, 100 (25 %) from the Austrian programme, and 104 (26 %) from the Dutch programme. Of the 393 respondents, 364 (93 %) graduated from bachelor degree programmes oriented to Hotel and Hospitality Management. For 188 (48 %) of the graduates, the time to completion of their degrees was 3 years, while 179 (46 %) took 4 years; 206 (52 %) graduates worked inside and 122 (31 %) outside the hospitality industry during their time in the programme. Of the recent graduates, 107 (27 %) held a managerial position and 286 (73 %) did not. Thus, the sample also matches the group's characteristics (Appendix 1).
Degree programme. Most of the respondents confirmed that they had graduated from a Hotel Management programme or an international hospitality management school. While most of the Austrian participants had graduated in Travel and Tourism, their programme involved a strong focus on Hotel Management. Furthermore, some of the Austrian respondents indicated that they had graduated in applied science in Hospitality administration. Thus, most of the respondents graduated from a programme within the same general domain.
Duration of study. The Belgian and Norwegian bachelor's programmes took 3 years, while the Austrian and Dutch lasted 4 years.
Type of organisation. Most respondents (65 %) from the Belgian programme worked in a hotel; some of the Austrian (41 %), Dutch (43 %), and Norwegian (26 %) respondents worked outside the hotel domain.
Graduates' positions. The meaning of 'manager' is limited to positions with executive responsibilities (e.g. directing, governing, and making decisions). An example of a nonmanagerial position is a food and beverage controller. Two hundred eighty-six (73 %) of the 393 responding graduates from the participating degree programmes worked in a nonmanagerial position. Graduates from the Austrian (40 %) and Dutch (28 %) degree programmes worked in managerial positions. Appendix 1 gives an overview of these features of the sample.
Construct validity. The full sample of the four degree programmes (N = 393) was analysed using principal component analysis (PCA), which is appropriate for exploratory data and for the assessment and evaluation of treatments. We identified the four conceptualised themes, i.e. Hospitality Business Research, Professional Management, Leading Management, and Strategic Management, as four components with eigenvalues of more than 1 that together explained 66 % of the variance, which is a good percentage for a cross-national study. Consequently, the conceptualised themes were affirmed as existing constructs (Creswell 2007). Of the original 31 learning outcomes, 23 with loadings ≥0.40 remained, a reduction of eight items. The significant loadings ranged from 0.54 to 0.88, a good result in the stable sample of N = 393 (Field 2013). The loadings were high and the components demonstrated no overlap. The data of the full sample met the criteria of construct validity, suggesting that there was sufficient evidence for claiming that the content of the test corresponded to the content of the construct that it was designed to cover (Field 2013, p. 783) (Appendix 1).
Reliability. The components were measured for scale reliability using Cronbach's alpha coefficient. The alpha coefficients exceeded 0.70, meeting the norm of 0.70 for measurements in groups (Committee On Test Affairs Netherlands [COTAN] 2011) (Appendix 2).
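To make the reliability criterion concrete, the sketch below computes Cronbach's alpha from item-level scores using the standard formula (number of items, sum of item variances, variance of the total score). The five hypothetical respondents and three items are invented for illustration and are not the study's data; the function name `cronbach_alpha` is likewise an assumption.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha; `rows` holds one list of item scores per respondent."""
    k = len(rows[0])
    if k < 2 or len(rows) < 2:
        raise ValueError("need at least two items and two respondents")
    # Sample variance of each item across respondents.
    item_variances = [variance([row[i] for row in rows]) for i in range(k)]
    # Sample variance of the respondents' total scores.
    total_variance = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


# Hypothetical five-point Likert answers: 5 respondents x 3 items.
scores = [
    [2, 3, 3],
    [3, 3, 4],
    [4, 4, 4],
    [1, 2, 2],
    [3, 4, 3],
]
alpha = cronbach_alpha(scores)
print(f"alpha = {alpha:.2f}, meets 0.70 norm: {alpha >= 0.70}")
```

A component would be judged sufficiently reliable under the norm cited above when the resulting coefficient is at least 0.70.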
Analysis of the components. The first component, Professional Management, explained 45.59 % of the variance; the loadings ranged from 0.60 to 0.80, which was good, as shown at the bottom of Appendix 2. Conclusions can therefore be drawn from this sound measurement of the learning outcomes, which is confirmed by their high scale reliability (α = 0.91).
The second component, Hospitality Business Research, explained 9.15 % of the variance; it included strong loadings that ranged from 0.83 to 0.88 and sufficient loadings from 0.62 to 0.63. The learning outcomes of Hospitality Business Research were thus soundly measured, which was confirmed by good scale reliability (α = 0.90).
The third component, Leading Management, included five learning outcomes that explained 5.89 % of the variance. This component included more than sufficient loadings, ranging from 0.62 to 0.77, and good scale reliability (α = 0.82).
The fourth and final measured component, Strategic Management, included three learning outcomes that explained 5.07 % of the variance. The loadings were good for a fourth component; the scale reliability was sufficient (α = 0.80).
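As a simple arithmetic check, the four component percentages reported above can be summed and compared with the total explained variance reported for the full sample. The sketch only re-adds the published values; the variable names are illustrative.

```python
from itertools import accumulate

# Percentage of variance explained per component, as reported in the text.
explained = {
    "Professional Management": 45.59,
    "Hospitality Business Research": 9.15,
    "Leading Management": 5.89,
    "Strategic Management": 5.07,
}

# Running total after each successive component.
cumulative = [round(c, 2) for c in accumulate(explained.values())]
total = cumulative[-1]

for name, running in zip(explained, cumulative):
    print(f"{name:32s} cumulative {running:6.2f} %")
print(f"total explained variance: {total:.2f} % (about {round(total)} %)")
```

The total of 65.70 % rounds to the 66 % stated for the full sample, and the first component alone accounts for more than two thirds of that total.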
Analysis of the single samples. The single samples of the four degree programmes (N = 94-104) were also analysed using PCA. The four themes were measured in components with eigenvalues ≥1 and explained 69-80 % of the variance (Austria 69 %, the Netherlands 70 %, Norway 73 %, Belgium 80 %), which were good percentages. The data from the single samples met the criteria of construct validity and of scale reliability (0.70-0.93), suggesting that there was sufficient evidence to support the claim that the content of the test corresponded in the different countries to the content of the construct it was designed to represent (Field 2013, p. 13).
Determining the degree programme level. The analysis of the components of the full as well as the single samples indicated that the measured outcomes were valid and reliable, which is a necessary foundation for the third step in our Three-Step Procedure.
Step 3: Cross-national comparison of degree programme levels
For calculating and presenting the cross-national comparison of the levels of the four programmes, we proposed the following. The level achieved by the students was calculated using the grand mean and grand standard deviation of the themes in this study (Table 5). The respondents indicated on a five-point Likert scale the extent to which they had mastered the intended learning outcomes, from 'too little' through 'somewhat' to 'more than satisfactory'. Most gave ratings of about 2.00 and 3.00, which was not high. The standard deviation was often ≥1, which indicated a wider spread than desired and produced ambiguous interpretations. To overcome this problem, we calculated z-scores from the grand mean and grand standard deviation. The norm was set rather low (≥3.00) because it was the first time that the level had been conceptualised using taxonomies for both disciplinary thinking and professional attitude. The difference between the norm and the grand mean was divided by the grand standard deviation. This computational model makes the outcomes comparable: z = (norm − grand mean) / grand standard deviation. The resulting z-score was converted, using the cumulative standard normal distribution, into a percentage that reflects the degree to which the level component has been achieved. It was determined that the level had been achieved if ≥50 % of the respondents had indicated that they had mastered the component in question.
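The computation described in this step can be sketched as follows. The code assumes that the reported percentage is the share of a normal distribution lying at or above the norm, that is, 1 − Φ(z) with z = (norm − grand mean) / grand standard deviation; the source states only that the cumulative standard normal distribution was used, so this reading is an assumption. The grand means and standard deviations below are invented and do not come from Table 5.

```python
from statistics import NormalDist

NORM = 3.00        # norm on the five-point scale, as stated in the text
THRESHOLD = 0.50   # level counts as achieved if at least 50 % meet the norm


def share_meeting_norm(grand_mean, grand_sd, norm=NORM):
    """Return (z, estimated share of respondents scoring at or above the norm)."""
    if grand_sd <= 0:
        raise ValueError("grand standard deviation must be positive")
    z = (norm - grand_mean) / grand_sd
    # Assumption: upper tail of the standard normal distribution above z.
    share = 1.0 - NormalDist().cdf(z)
    return z, share


# Hypothetical (grand mean, grand standard deviation) per programme.
examples = {"Programme A": (3.50, 1.00), "Programme B": (2.60, 1.10)}
results = {}
for name, (gm, gsd) in examples.items():
    z, share = share_meeting_norm(gm, gsd)
    results[name] = (z, share, share >= THRESHOLD)
    print(f"{name}: z = {z:+.2f}, share = {share:.0%}, achieved = {share >= THRESHOLD}")
```

Under this reading, a grand mean exactly at the norm always yields 50 %, so the 50 % criterion amounts to requiring a grand mean of at least 3.00; the standard deviation only affects how far the percentage moves away from 50 %.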

Discussion
What have students learnt?
The outcomes of the cross-national comparison demonstrate that students from three countries/degree programmes (Austria, the Netherlands, and Belgium) successfully achieved the learning outcomes represented in the theme Professional Management, while students from two countries (Austria and the Netherlands) achieved those of the theme Leading Management. Professional Management includes introductory aspects of the 'professional attitude' of the hotel manager, and Leading Management involves dealing with more complicated situations that may occur with guests and colleagues. Both themes involve 'professional attitude' skills and deliver valid and reliable results.
However, the outcomes from the programmes in the Netherlands, Belgium, and Norway indicate less success for the theme Hospitality Business Research. All the programmes, moreover, were less successful with the theme Strategic Management. Hospitality Business Research involves the most theoretical aspects of the programmes and relates most strongly to 'disciplinary thinking'. Strategic Management implies research-based action and reflection that is often aligned with Hospitality Business Research. These two themes may not seem obvious for a professional bachelor's programme; that assumption is problematic, given that these qualities are so necessary. Employers appreciate and need valid and reliable outcomes from Hospitality Business Research (Appendix 2). The theoretical themes are characteristic of higher education.
We conclude that most of the problems in these programmes are related to 'disciplinary thinking' (Hospitality Business Research, Strategic Management) rather than 'professional attitude' (Professional Management and Leading Management).
Degree programmes in Norway and Belgium had low scores in this pilot study, outcomes that we verified with the relevant participating institutions. It appeared that one of the programmes was in the process of dissolving, while the other programme was more practice-oriented than its stated level indicated. This information helped explain and affirm our results. The outcomes of the degree programmes in Austria and the Netherlands met the norm.
The Austrian and Dutch degree programmes were the most successful in this study (Table 6), as indicated by the percentages of graduates who fill managerial positions with executive responsibilities (directing, governing, and making decisions): 40 % of the Austrian graduates, 28 % of the Dutch graduates, 27 % of the Norwegian graduates, and 13 % of the Belgian graduates (Appendix 1).

What is crucial for students' learning?
Having discussed our results with the managers of the participating institutes, we sought to interpret the results and to draw conclusions from them. We argue that the regulative activities of teaching are crucial for students' successful achievement of disciplinary thinking and professional attitude at a professional bachelor's level. Students can learn the higher-order thinking processes required for solving complex problems from a discipline or domain and its main as well as its emotive characteristics, if the teaching strategies are suitable for students' learning activities. According to Vermunt and Verloop (1999), the development of learning and thinking activities does not occur if the regulative teaching activities are not compatible with the required regulative learning activities. For example, in higher professional education, students' learning styles are often application-directed in nature. By contrast, learning activities like concretising and applying are learner-initiated. Many teachers in higher professional education employ application-oriented teaching methods: they give many tasks, questions, and assignments in which students are asked for possible examples and applications of what they learn. It seems superfluous to stimulate students to employ learning activities that they use of their own initiative. In these situations, other learning activities such as structuring concepts, relating theories, and critically processing ideas, are often left out of the learning process; thus, students do not learn to initiate them, nor do teachers stimulate students to use them. For students to learn disciplinary thinking and professional attitude, then, the more suitable teaching strategy is one with a 'shared form of regulation', as opposed to one with a 'strong teacher regulation'.
Table note: gm grand means, gsd grand standard deviations, % converted z-scores of respondents scoring ≥3.00 from the four degree programmes and full sample.

Conclusion
This study started with the research question: 'What procedure can lead to a valid and reliable cross-national comparison of degree programme levels?' To answer this question, we had to establish the concept of a degree programme level and understand what the most salient threats were to a valid and reliable measure of degree programme levels. The level was conceptualised as comprising a cognitive aspect (i.e. 'disciplinary thinking') and an affective aspect (i.e. 'professional attitude'). Construct bias and method bias were identified as the most relevant risk factors for validity and reliability in cross-national comparisons. Based on this understanding, steps were developed to make the degree programme level concept measurable (step 1), to measure it as accurately as possible, and to analyse the validity and reliability of the outcomes. Once the outcomes met these criteria and a representative sample was assembled, the data could be used for the next step of measuring and analysing (step 2). Following this process, calculations were made using the valid and reliable data and the data were presented for the cross-national comparison of degree programme levels (step 3). This Three-Step Procedure was carried out in collaboration with teachers and students from four bachelor programmes in Hotel Management in four European countries; the outcomes were deemed valid and reliable based on a representative sample.
While this Three-Step Procedure was developed in cooperation with degree programmes in the hospitality domain, we believe that its general outline makes it applicable in other domains in higher professional education. Broadly, the procedure focuses on a learning-psychology perspective, creates necessary themes for evaluating the content of a programme, and uses two taxonomies that deal with disciplinary thinking and professional attitude. Based on these taxonomies, learning outcomes were developed for use (in this case, in Hotel Management) in the construction of a questionnaire. The data were analysed for representativeness, construct validity, and reliability. The Three-Step Procedure facilitates, as we have concluded, a successful and valid comparison of degree programme levels.