Describing a new food group classification system for UK Biobank: analysis of food groups and sources of macro- and micronutrients in 208,200 participants

Purpose: The UK Biobank study collected detailed dietary data using a web-based self-administered 24 h assessment tool, the Oxford WebQ. We aimed to describe a comprehensive food grouping system for this questionnaire and to report dietary intakes and key sources of selected nutrients by sex and education.
Methods: Participants with at least one valid 24-h questionnaire were included (n = 208,200). Dietary data were grouped based on the presence of nutrients as well as culinary use, processing, and plant/animal origin. For each food group, we calculated the contribution to energy intake, key macronutrients, and micronutrients. We also identified the top contributors to energy intake, free sugars and saturated fat by sex and education.
Results: From the 93 food groups, the top five contributors to energy intake (in descending order) were: desserts/cakes/pastries; white bread; white pasta/rice; bananas/other fruit; semi-skimmed milk. Wine, beer, and fruit juices were the top beverage contributors to overall energy intake. Biscuits and desserts/cakes/pastries were the highest contributors to free sugars, total fat, and saturated fat intakes, but also contributed to calcium and iron intakes. Top contributors to energy, saturated fat, and free sugars were broadly similar by sex and education category, with small differences in average nutrient intakes across the population.
Conclusion: This new food classification system will support the growing interest in the associations between food groups and health outcomes and the development of food-based dietary guidelines. Food group variables will be available to all users of the UK Biobank WebQ questionnaire.
Supplementary Information: The online version contains supplementary material available at 10.1007/s00394-021-02535-x.


Introduction
A poor diet is one of the most important modifiable risk factors for chronic disease, especially cardiovascular disease, diabetes, and cancer [1,2]. Most of the existing evidence relates to associations between individual nutrients or foods and health outcomes, and hence dietary guidelines have traditionally been based on nutrient recommendations. However, people consume multiple foods and nutrients that may interact with each other, and studying overall dietary patterns may be more relevant to understand health risk [3]. Accordingly, there is growing interest in developing food-based dietary guidelines, which may also be easier to communicate to consumers to support changes towards more healthful dietary patterns.
Dietary habits are shaped by individual preferences, social context and cultural norms and may reveal differences not apparent in nutrient intakes. A previous study using purchasing data from UK households showed that the overall saturated fat content of purchases did not differ between socio-economic groups, however, there was a higher proportion of energy from cheese and dairy among higher socioeconomic status (SES) households, but a higher proportion of energy from sweet snacks and puddings among lower SES households [4]. These differences in food patterns may be associated with different health outcomes since dairy products will also contribute important micronutrients such as calcium, while sweet snacks will also be high in free sugars. Studying the selection of foods and drinks also provides insights into eating behaviours which might offer new approaches to interventions to support a healthier diet.
The UK Biobank study collected detailed measures of dietary intake using a web-based self-administered 24 h dietary assessment tool, the Oxford WebQ [5,6]. Although this is a unique resource for the study of diet and disease risk, the dietary questionnaire is not easy to work with because it lacks a comprehensive food grouping system. The aims of this study were: 1) to describe the development of a food grouping system to classify the foods consumed by the UK Biobank participants; and 2) to describe the major food sources of energy, macronutrients and main micronutrients in the whole population, as well as by sex and educational attainment.

Study Population
The UK Biobank study is a national prospective cohort in the UK involving 502,655 participants aged 40-69 years at baseline who were recruited between 2006 and 2010 [7]. Approximately 9.2 million eligible adults living within 25 miles of the UK Biobank assessment centres (England, Wales and Scotland) were invited by letter through NHS central registries [8]. Participants who volunteered to take part (5.5% response rate) completed a full baseline assessment with self-reported measurements via touch-screen questionnaires as well as a verbal interview. A wide range of information on socio-demographic factors, lifestyle, and behavioural factors including a short food-frequency questionnaire and medical history were collected, along with physical measurements (such as height and weight), blood and urine samples.
UK Biobank protocols and study details can be found elsewhere (http://www.ukbiobank.ac.uk/wp-content/uploads/2011/11/UK-Biobank-Protocol.pdf) [7]. The UK Biobank study was conducted according to the Declaration of Helsinki and ethical approval was granted by the North West Multi-Centre Research Ethics Committee (reference number 06/MRE08/65). At recruitment, all participants gave informed consent to participate and be followed up through data linkage.

Dietary assessment - Oxford WebQ
Towards the end of the baseline assessment period (April 2009-September 2010), the UK Biobank started collecting detailed dietary intake measures using a web-based self-administered 24 h dietary assessment, the Oxford WebQ. The Oxford WebQ was completed by 70,724 participants attending their baseline assessment. Additionally, between 2011 and 2012 all participants with valid email addresses (n = 331,013) were invited to complete the Oxford WebQ on four separate occasions. This was done every 3-4 months on variable days to maximise the coverage by season and day of the week (cycle 1: February 2011 to April 2011; cycle 2: June 2011 to September 2011; cycle 3: October 2011 to December 2011; cycle 4: April 2012 to June 2012). Approximately 53% of the participants (n = 176,012) who were contacted by email completed at least one assessment, with a total of 211,050 participants completing at least one dietary assessment either online or at the baseline assessment [6].
The Oxford WebQ collects information on foods and beverages consumed over the previous day. Participants were presented with a list of up to 206 foods and 32 beverages commonly consumed in the UK and selected the number of portions consumed of each item. This food list was constructed using information from the UK National Diet and Nutrition Survey (NDNS) as well as a pilot study [5]. At the start of the questionnaire, participants indicated whether their diet over the previous day was typical and whether they were following a special diet. Descriptions and help sections were provided to help participants estimate portion sizes of foods (e.g. slices, cups, servings), and participants were asked to report ingredients of composite dishes separately. Total energy and nutrient intakes were generated by multiplying the number of portions consumed by the set quantity of each food portion size and its nutrient composition obtained from the UK Nutrient Databank Food Composition Tables (FCT) from survey year 6 (2012-2013 and 2013-2014) [9-11]. Dietary fibre was calculated using the Englyst method [12], which includes non-starch polysaccharides but not lignin and resistant starches.
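The intake calculation described above (portions multiplied by the set portion quantity and by nutrient composition) can be illustrated with a minimal Python sketch. The food names, portion weights and energy values below are hypothetical placeholders chosen for the example; they are not values from the Oxford WebQ or the UK Nutrient Databank.

```python
# Hypothetical portion weights (g) and energy composition (kJ per 100 g).
# Placeholder numbers for illustration only, not UK Nutrient Databank values.
PORTION_G = {"white_bread_slice": 36.0, "semi_skimmed_milk_glass": 200.0}
ENERGY_KJ_PER_100G = {"white_bread_slice": 930.0, "semi_skimmed_milk_glass": 195.0}


def energy_intake_kj(portions):
    """Sum energy over foods: portions x portion weight (g) x kJ per 100 g / 100."""
    total = 0.0
    for food, n_portions in portions.items():
        grams = n_portions * PORTION_G[food]
        total += grams * ENERGY_KJ_PER_100G[food] / 100.0
    return total


# One participant's reported portions for the previous day (hypothetical).
reported = {"white_bread_slice": 2, "semi_skimmed_milk_glass": 1}
total_kj = energy_intake_kj(reported)
```

The same pattern would apply to any nutrient by swapping the composition table for that nutrient's values per 100 g.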

Food group system
We classified the 206 foods and 32 beverages reported in the Oxford WebQ into 93 groups (79 food and 14 beverage groups) belonging to 15 main food categories (13 food and 2 beverage categories, Supplemental Table S1). This food group system was mainly based on the classification used in the UK NDNS, but many food groups were further disaggregated to allow investigation of a variety of specific research questions, for example those concerning nutrients that may relate differently to health when consumed as part of different foods (such as free sugars, saturated fat or fibre), as well as culinary use, the extent of processing, and plant/animal origin. For cereal and cereal products, we separated breads and pasta by fibre content (e.g. white and wholemeal), while breakfast cereals were divided by type of cereal (e.g. oat and wheat based) and sugar content. Mixed dishes were divided into pizza, other cereal-based dishes with added fat, Indian snacks, sushi and soups, recognising the differences in fat content of each dish as well as their different cultural and culinary roles. For dairy and dairy products, we separated milk, cheese and yogurt by fat content (e.g. higher and medium/lower fat). Fats, butter and spreads were separated by fat content but also by the source of fat (e.g. animal and vegetable fat). Meat and meat products were divided by the type of animal (e.g. beef and pork), except for the processed meat group, which may include more than one animal source. Fish and fish dishes were separated into white fish/tinned tuna, oily fish and battered fish. Meat substitutes were divided into soy-based and other vegetarian meals. Vegetables were separated into groups which considered micronutrient, carbohydrate or protein content (e.g. green leafy/cabbage, root (excluding potatoes), tomatoes, allium, legumes) as well as fat content (e.g. baked/boiled potatoes; mashed potatoes and fried/roast potatoes).
Fruits were grouped according to micronutrient content (e.g. citrus; berries and apples/pears), as well as processing (e.g. dried and stewed fruit). Nuts and seeds were divided according to salt content. Sugary foods were divided into groups reflecting differences in consumption: added sugars/preserves (including table sugar); chocolate confectionery; other sweets (non-chocolate); biscuits/cookies; desserts, cakes and pastries; milk-based desserts; soy-based desserts; and sweet spreads (including chocolate and peanut butter spreads). Sauces and condiments were divided by fat content. Beverages were first separated into alcoholic and non-alcoholic. Non-alcoholic beverages were further separated by caffeine content, sugar content, or presence of milk. Where possible, milk added to coffee/tea and porridge was disaggregated to be included in the milk group. For other milk-based drinks this was not possible and they were categorised in their own group. Alcoholic drinks were then divided into wine, beer/cider, and spirits.
The final food group classification was further refined and consolidated following consultation with three nutrition scientists from the UK with experience in nutritional epidemiology or public health policy.
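The two-level structure described in this section (items mapped to food groups, and food groups mapped to main categories) can be sketched as a pair of lookup tables. The group and category labels below follow names used in the text, but the item identifiers and the exact assignments are hypothetical; the actual mapping is given in Supplemental Table S1.

```python
# Hypothetical item identifiers mapped to food groups named in the text.
FOOD_TO_GROUP = {
    "sliced_white_bread": "white bread",
    "wholemeal_roll": "wholemeal bread",
    "salmon": "oily fish",
    "lager": "beer/cider",
}

# Food groups mapped to main categories (assumed labels for illustration).
GROUP_TO_CATEGORY = {
    "white bread": "cereal and cereal products",
    "wholemeal bread": "cereal and cereal products",
    "oily fish": "fish and fish dishes",
    "beer/cider": "alcoholic beverages",
}


def aggregate(item_amounts, level="group"):
    """Sum item-level amounts (g or ml) to food-group or main-category level."""
    totals = {}
    for item, amount in item_amounts.items():
        key = FOOD_TO_GROUP[item]
        if level == "category":
            key = GROUP_TO_CATEGORY[key]
        totals[key] = totals.get(key, 0.0) + amount
    return totals


day = {"sliced_white_bread": 72.0, "wholemeal_roll": 50.0, "lager": 568.0}
by_group = aggregate(day)
by_category = aggregate(day, level="category")
```

Keeping the two mappings separate makes it straightforward to report results at either level of the hierarchy.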

Exclusions
Participants who completed at least one valid WebQ were included. Participants with implausible energy intakes (over- and under-reporters) were excluded before analysis. We used the individualised method to calculate the ratio of reported energy intake (EI) to estimated energy requirement (EER) (EI:EER), where EERs were calculated using the Schofield equation [13]. 95% CIs were calculated to classify individuals as plausible reporters (EI:EER within the 95% CIs), over-reporters (EI:EER > upper 95% CI) or under-reporters (EI:EER < lower 95% CI).
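The classification rule above can be sketched as follows. This is a minimal illustration only: the EER is taken as an input rather than derived from the Schofield equation, and the lower and upper bounds (0.6 and 1.4) are hypothetical placeholders, not the confidence limits used in this study.

```python
def classify_reporter(ei_kj, eer_kj, lower, upper):
    """Classify a participant by the ratio of reported energy intake to
    estimated energy requirement (EI:EER) against supplied bounds."""
    ratio = ei_kj / eer_kj
    if ratio < lower:
        return "under-reporter"
    if ratio > upper:
        return "over-reporter"
    return "plausible"


# Hypothetical bounds for illustration; the study derived its own 95% CIs.
LOWER, UPPER = 0.6, 1.4

examples = [
    classify_reporter(4000.0, 10000.0, LOWER, UPPER),   # ratio 0.4
    classify_reporter(10000.0, 10000.0, LOWER, UPPER),  # ratio 1.0
    classify_reporter(16000.0, 10000.0, LOWER, UPPER),  # ratio 1.6
]
```

Only participants classified as plausible reporters would be retained under this rule.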

Demographic and lifestyle characteristics
Demographic and lifestyle characteristics were collected at baseline using a touchscreen questionnaire and were categorised as follows: ethnicity as White vs other (including Asian, Black, mixed background); education as higher degree (college or university degree, or professional qualifications) vs any school degree (A levels, AS levels, O levels, GCSEs or CSEs) vs vocational qualifications (NVQ, HND or HNC) vs no qualifications or other not classified elsewhere; physical activity as high (≥ 3000 metabolic equivalent (MET)-minutes per week) vs moderate (≥ 600 and < 3000 MET-minutes per week) vs low (< 600 MET-minutes per week); smoking as never, current, previous; alcohol intake as 5+ units/week, 1-4 units/week, < 1 unit/week, never. Body mass index (BMI) was calculated using the measured height and weight and categorised as: underweight [< 18.5 kg/m2), healthy weight [18.5

Statistical analyses
We calculated each individual's mean intakes from all their completed dietary assessments. Descriptive statistics (crude means, SD) were computed for total daily energy (kJ/day), weight or volume of food and beverages (g or ml/day), as well as for macronutrients (g/day) and micronutrients (mg or µg/day). For each food and beverage group, we calculated per capita intakes as well as mean intakes among consumers only, including only the participants consuming more than 0 g or kJ from each food group. To identify the major sources of energy intake, the per capita contribution of each food or beverage group to total daily intake was calculated as the energy consumed from each food group divided by the total daily energy. Finally, we investigated the top food group sources of energy, saturated fat, and free sugars by sex and education status. Stata version 14 (StataCorp LP) was used for all analyses.
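The quantities defined above can be illustrated with a short sketch. The analyses in this study were run in Stata; the Python below is only an illustration with hypothetical data, and it assumes that the percentage contribution is computed per participant and then averaged, which is one possible reading of the description.

```python
from statistics import mean


def mean_over_assessments(assessments):
    """Average each food group across one participant's completed assessments.
    Assumes every assessment reports the same set of food groups."""
    groups = assessments[0].keys()
    return {g: mean(a[g] for a in assessments) for g in groups}


def percent_energy(intake, group):
    """Energy from one food group as a percentage of total daily energy."""
    return 100.0 * intake[group] / sum(intake.values())


def per_capita_mean(data, group):
    """Mean intake over all participants, including non-consumers (zeros)."""
    return mean(p[group] for p in data.values())


def consumers_only_mean(data, group):
    """Mean intake among participants with intake greater than zero."""
    values = [p[group] for p in data.values() if p[group] > 0]
    return mean(values) if values else None


# Hypothetical energy intakes (kJ/day) by food group for two participants.
participants = {
    "p1": {"white_bread": 400.0, "beer_cider": 0.0, "other": 7600.0},
    "p2": {"white_bread": 200.0, "beer_cider": 600.0, "other": 9200.0},
}

bread_pct = mean(percent_energy(p, "white_bread") for p in participants.values())
beer_per_capita = per_capita_mean(participants, "beer_cider")
beer_consumers = consumers_only_mean(participants, "beer_cider")
```

In this toy example the per capita mean for beer/cider is half the consumers-only mean, because one of the two participants reported none.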

Results
A final sample of 208,200 participants, 56.1 (SD 7.9) years old at recruitment, was included in the analysis after the exclusion of participants who did not provide any dietary data (n = 292,136) and under- or over-reporters (n = 2,319). The majority of the study population was white (95%), 48% reported a higher education degree, 36% reported high levels of physical activity, 8% were current smokers, and 23% drank 5+ alcohol units/week. The mean body mass index was 27 (SD 5) kg/m2 at recruitment, with 42% classified as overweight and 21% as obese (Supplemental Table S2).

Top sources of energy and macro- and micronutrients
The ten major contributors to EI per capita (in descending order) were (Table 1): desserts, cakes, and pastries (5.2% EI); white bread (3.7% EI); white pasta & rice (3.1% EI); bananas & other fruits (other than citrus, berries, apples and pears, 3.1% EI); semi-skimmed milk (3% EI); biscuits (2.9% EI); fried/roast potatoes (2.9% EI); wholemeal bread (2.7% EI) and mixed bread (2.7% EI); and high-fat cheese (2.7% EI). Among the beverage subcategories, the top contributors to energy intake were fruit juice (1.8% EI) among non-alcoholic beverages and red wine (2.4% EI) and beer/cider (2.4% EI) among alcoholic beverages. However, water/sparkling water (513 g/d) and tea (425 g/d) were the top contributors to beverage volume per day. Overall, cereal and cereal products, followed by sugar, preserves, cakes & confectionery, and vegetables and potatoes, were the three categories contributing the most to total energy intake (Table 1). Figure 1 shows the top food contributors to relevant macronutrients. Desserts, cakes and pastries were top contributors to total carbohydrate, free sugar, total fat and saturated fat intakes (Fig. 1, Supplemental Table S3). Bread (white, mixed) was the top contributor to total carbohydrate, while wholemeal and mixed bread were the top contributors to fibre intake. Bananas & other fruits were major contributors to both total carbohydrate and fibre intakes. High-fat cheese was a top contributor to total fat, saturated fat, and protein intake. Dairy fat spreads (e.g. butter) were top contributors to both total fat and saturated fat. Poultry and beef were the top sources of total protein intake. Semi-skimmed milk was an important contributor to many micronutrients, including vitamin B12, calcium and potassium (Supplemental Table S4). Overall, fruits and vegetables, as well as fruit juice, made a substantial contribution to intakes of folate, vitamin C, and potassium. Beef and oily fish were also important contributors to vitamin B12, and beef, alongside bread and cereals, was an important contributor to intakes of iron.

Sources of energy, saturated fat and free sugars by sex and education status
Total daily energy and alcohol intake were higher in men than in women, but the proportion of energy from macronutrients was similar (Table 2). The top ten foods contributing the most to energy intake were similar in men and women (Fig. 2), with desserts, cakes and pastries ranking at the top. Among men, white bread and beer/cider ranked 2nd and 3rd, whereas in women bananas and other fruit and white pasta/rice ranked 2nd and 3rd. Fruit juice was the top contributor to free sugars in women, while added sugars/preserves was the top contributor among men. For SFA intake, high-fat cheese, desserts and cakes, and dairy-fat spreads provided the most SFA to the diets of both men and women.
Total daily energy and macronutrient intakes were similar regardless of education (Table 2). Desserts, cakes and pastries and white bread were the top sources of energy intake in both education groups (Fig. 2). A few small differences were noted. Among people with higher education, white pasta/rice and higher fat cheese provided a higher proportion of energy intake, while those with lower education obtained a higher proportion of energy from biscuits, white bread, fried/roast potatoes and beer/cider. Fruit juice contributed more free sugars among people with higher education, whereas added sugars and preserves, sugar-sweetened beverages (SSB) and beer & cider provided more free sugars among those with lower education. Differences in SFA sources between education groups were usually small (< 0.6 g), but high-fat cheese, dairy fats, and desserts & cakes made a greater contribution to saturated fat in the diets of people with higher education, whereas biscuits, chocolate confectionery and beef provided slightly more saturated fat to the diet among those with lower education.
Table note: Units for beverages are ml/day

Discussion
We have developed a comprehensive food grouping system to help analyse the UK Biobank dietary data. We created 93 distinctive food groups belonging to 15 main food categories (13 food and 2 beverage categories). The proposed food groups were designed to broadly align with the UK NDNS food group system, but have been expanded to meet the needs of researchers with interests in diverse health outcomes (e.g. different types of red meat, animal vs plant sources, saturated fat, free sugars or fibre content). The top contributors to energy, saturated fat and free sugars were broadly similar by sex and level of education, reflecting small differences in nutrient intakes. Detailed measures of dietary data were collected for a large sample of the UK Biobank population using the Oxford WebQ, which is a major strength. The development of a comprehensive food grouping system is rather complex due to the wide availability of products that people consume, and the wide diversity in food preparation and consumption habits. In addition, foods contain multiple sources and varying levels of important macro- and micronutrients with differential effects on health, which were considered in this study. Although the Oxford WebQ was based on a fixed list of foods, it captures sufficient detail to separate foods by free sugar content, saturated fat or fibre, as well as by culinary use, processing or plant/animal origin, but these groupings could also be combined for specific analyses.
Fig. 1 Top ten food contributors to total carbohydrates, free sugars, fibre, total fat, saturated fat and total protein per capita among all UK Biobank participants
Table 2 Average energy, macronutrient and micronutrient intakes by sex and education status among UK Biobank participants
An important limitation of using a fixed list of foods and beverages in the Oxford WebQ is that it can increase the likelihood of missing foods which are not on the list, although estimates of total energy intake reported here were not notably low, suggesting that it still captures dietary intake reliably, as shown by comparison with recovery biomarkers and interviewer-administered 24 h recalls [6,14]. However, there are other sources of measurement error related to self-reporting of dietary intake. Firstly, each WebQ collected dietary intake over the previous 24 h, which is not representative of usual intake. Here we included all participants who provided at least one dietary assessment, so our estimates will be affected by random error related to day-to-day variability, although this is usually not problematic when calculating population averages [15], which was the main aim of this study. However, it is important to use several 24-h dietary assessments when assessing diet-disease associations in future studies in this cohort in order to capture usual intakes. Secondly, systematic error related to over- and/or under-reporting of dietary intake will likely affect our estimates. This bias can be introduced, for example, when participants forget to report their dietary intake (although this is less problematic when reporting diet over the previous 24 h), or deliberately under- or over-report specific foods and beverages [15]. Regarding the representativeness of the sample, previous analyses have suggested that the participants completing more dietary assessments tended to be older and more educated compared to the general population of the UK Biobank, and this may have limited the ability to detect differences across education groups [6]. However, the direction of risk factor associations in the UK Biobank seems to be generalizable to the wider UK population [16].
In interpreting our findings it is important to consider the breadth of the food groups we describe; broader food groups are more likely to appear as top contributors to nutrient intakes and vice versa. Our approach was to create more food groups than will be necessary to answer most research questions, leaving it open to researchers to collapse categories to create larger groupings. For example, meats are reported as beef, pork, lamb etc. rather than "red meat", so that a reader interested in the contribution of total red meat to saturated fat would need to add the components together.
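As a minimal sketch of how a researcher might collapse the disaggregated groups into a broader one, the example below sums the saturated fat contributions of the individual meat groups into a single "red meat" value. The group labels and the gram values are hypothetical and are used only to show the mechanics.

```python
# Hypothetical labels for the groups to be combined.
RED_MEAT_GROUPS = ("beef", "pork", "lamb")


def collapse(intake, members, new_name):
    """Replace the member groups with one combined group holding their sum."""
    combined = {g: v for g, v in intake.items() if g not in members}
    combined[new_name] = sum(intake.get(g, 0.0) for g in members)
    return combined


# Hypothetical saturated fat intakes (g/day) by food group for one participant.
sfa_g = {"beef": 1.2, "pork": 0.5, "lamb": 0.3, "high_fat_cheese": 3.0}
collapsed = collapse(sfa_g, RED_MEAT_GROUPS, "red_meat")
```

Because the combined group is a simple sum, its contribution to a nutrient will always be at least as large as that of any single component, which is the ranking effect described above.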
Overall, the top food groups contributing to energy intake in this sample of British adults were consistent with the more disaggregated food groups reported by the UK NDNS [17]: desserts, cakes and pastries; white bread; white pasta/rice; fruit; semi-skimmed milk; biscuits; fried/roast potatoes; wholemeal/mixed bread and high-fat cheese. Some of these foods are high in saturated fat and free sugars, contributing to intakes of these nutrients in excess of dietary guidelines, which is associated with ill-health, including obesity, diabetes and cardiovascular disease risk [18-23]. Some food groups (e.g. desserts) which are top contributors to energy, SFA, and free sugars are also important sources of calcium and iron (partly as a result of fortification), which will need to be replaced by other food sources of these important micronutrients if the overall diet quality is to be enhanced. Dairy products such as semi-skimmed milk and high-fat cheese were major contributors to total energy intake as well as total fat, saturated fat, total protein, vitamin B12 and calcium. Most of the dietary fibre was obtained from wholemeal and mixed/granary breads; white bread contributed little to fibre intake but was highly consumed (providing 3.4% EI overall compared to 2.7% EI from wholemeal and mixed/granary). This illustrates the potential for swapping refined grains with whole grains to make substantial progress towards dietary fibre recommendations, which are associated with improved health outcomes [24,25]. Our analyses also highlight the large contribution of alcoholic drinks to total energy intake in this population, averaging approximately 7% EI from all alcoholic beverages, greatly exceeding the contribution of fruit juice (1.8% EI) or sugar-sweetened beverages (1.4% EI). In addition to increasing the risk of excess weight gain as a consequence of increased energy intake, this raises concerns about alcohol-related harms.
Although meta-analyses of observational studies have found moderate alcohol intake to be associated with a lower risk of coronary heart disease [26], other studies have found a higher risk of different types of stroke and other CVD subtypes [27], and of several types of cancer [28], which supports current guidance to limit consumption of alcoholic beverages. Among non-alcoholic beverages, tea and water contributed the most to the total daily beverage intake (g). However, fruit juice and sugar-sweetened beverages were the top sources of total energy intake from non-alcoholic beverages and contributed the most to free sugar intakes, which have been linked to increased risk of weight gain and increased cardiometabolic risk [18,29-32].
We also found relatively few differences in intakes between people with different levels of education, which may reflect the limited variation in educational attainment in this cohort as well as a "healthy volunteer" bias related to the selection of participants who completed the dietary questionnaires [6]. The major differences found here were in beverage consumption and, as a result, people with lower levels of education consumed proportionally more free sugars, in the form of SSBs and beer/cider, whereas among people with higher levels of education fruit juice is the major contributor, which has previously been reported in NDNS [17]. There were also differences in the sources of saturated fat, with high-fat cheese being proportionally more important among people with higher education, compared to biscuits and chocolate confectionery in lower education groups. This is consistent with an analysis of purchases from 25,674 British households showing that dairy sources (mostly cheese) contributed more to SFA in higher socioeconomic groups [4]. These differences are small but may still reflect variability in dietary quality and contribute to inequalities in diet and health across SES groups which have been widely documented in the literature [4,33,34].
In conclusion, this work has developed a food grouping system which will be available to all studies using the Oxford WebQ, including the whole UK Biobank community. We encourage researchers to make use of these food groups in future studies to generate more consistent evidence which can inform food-based dietary guidelines or advice for the public to reduce health risks.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.