Assessment of physical activity and energy expenditure in epidemiological research of chronic diseases
- Cite this article as:
- Lagerros, Y.T. & Lagiou, P. Eur J Epidemiol (2007) 22: 353. doi:10.1007/s10654-007-9154-x
Physical inactivity has emerged as an important risk factor for a number of diseases, but the typically crude exposure assessments in epidemiological studies, and the resulting variation in measurement accuracy, may be a source of heterogeneity contributing to inconsistent results among studies. Consequently, the choice of method for the assessment of physical activity in epidemiological studies is important. Good methods increase our chances of avoiding misclassification and may enhance our understanding of the association between physical activity and health. Since physical activity is also a potential confounder of other lifestyle-health relationships, good methods may enhance our ability to control for confounding. But despite a steadily increasing selection of methods to choose from, no method is suitable for every situation and every population. Although the questionnaire is the most widely used method in epidemiological studies, and laboratory methods are mainly used for validation purposes, improved technology may change our ways of assessing physical activity in the future. This paper describes different methods to measure physical activity and energy expenditure from the epidemiological perspective, and attempts to address the concepts related to the measurement of physical activity.
Keywords: Energy metabolism; Epidemiologic methods; Exercise; Leisure activities; Questionnaires; Review
Industrial progress and increased employment in the service sector generally have significantly reduced occupational physical activity, while at the same time modern technology has also made it increasingly feasible to remain sedentary. Many people lead a life with little or no physical activity and their leisure time is often spent on sedentary activities such as surfing the internet, playing computer games, and watching television. Simultaneously, society is facing new patterns of illness stemming from the combined effects of expanded longevity, reduced physical activity, and body measures. While it is clear that physical activity has a far reaching influence on health, many questions remain to be answered. Several of these questions could be better addressed if the many aspects of physical activity were assessed more specifically and with higher validity and precision.
Some of the earlier epidemiologic studies on the impact of physical activity on cardiovascular health were done in the 1950s by Morris et al. Since the sixties, there has been an explosion of studies focusing on the topic of physical activity and health. Although physical activity has gained increasing attention, the lack of practical, valid, reliable, and sensitive instruments for assessing physical activity has been a limiting factor in this important area of research. Better methods for exposure quantification can not only reveal the link between exposure and disease, but also allow the study of the exposure as a possible confounder or effect modifier. To investigate such implications of physical activity, there is a wide variety of methods of different complexity to choose from. This article describes available methods to measure physical activity or energy expenditure (EE), seen from an epidemiological perspective with respect to validity and practicability. Furthermore, it attempts to review some of the basic concepts in the field of physical activity assessment, interdisciplinary concepts that are just as useful in epidemiology as in physiology.
Physical activity is a multidimensional and complex exposure to measure. Research on physical activity is done in a wide range of disciplines and from different perspectives, which sometimes results in conceptual confusion.
In physics the standard unit of energy is the joule (J), but in the field of energy metabolism the calorie (or the kilocalorie, kcal, which is equivalent to 1,000 calories) is still commonly used. One calorie corresponds to ∼4.19 J.
Resting metabolic rate has been extensively studied and is fairly constant within and between persons. Besides age-related changes, it varies by only 5–10% in adult life, and comparisons made within age, sex and weight groups show that 85% of individuals have a resting metabolic rate within 10% of the mean. It is physical activity that mostly affects the variation of total EE. As indicated, physical activity and EE are not synonymous (Fig. 1), but many researchers extrapolate measures of physical activity to units of EE before analyzing their data.
Optimally, epidemiological studies should identify all body movements and obtain information on dose—intensity, duration, and frequency—and occasionally even purpose of movement. With information on all these dimensions, comparison of results across studies would be more informative. Subtle dose-response relationships could be discovered and epidemiologists would be able to supply public health professionals with more useful evidence. Results from studies inquiring solely about one dimension cannot easily be converted to public health recommendations.
Absolute intensity is obtained from questionnaires by assigning each activity a specific MET value taken from reference lists [8, 9]. Thus, for an individual weighing 60 kg, snow shoveling by hand is estimated to correspond to 6 METs, which requires 360 kcal/h or 1,260 ml oxygen/min. This does not take into account differences in age, sex, efficiency of the shoveling, or the geographic and environmental conditions in which the activity is performed, but it provides a classification system that standardizes measurement of physical activity in survey research.
When physical activity questionnaires give the respondent a few choices in terms of intensity (such as no, low, moderate, high, and vigorous), one overlooks the potential problem that the perception of intensity is highly dependent on age, gender, and fitness, as well as on duration, which itself represents an independent dimension. Only in a homogeneous sample might the relative and absolute intensity be similar.
Some questionnaires ask the respondent to report frequency and duration of activities where physiological parameters such as induced sweating [12–16], increased heart rate and/or breathlessness [12, 17–19] mark the intensity. However, the physiological response for a specific intensity is still likely to be greater for an unfit or older individual with lower cardiorespiratory fitness and less muscle mass.
Expressing intensity relative to peak ability is an alternative method for characterizing physical activities. For example, the American College of Sports Medicine recommends intensity equal to 60–90% of one’s maximum heart rate or 50–85% of one’s maximum oxygen uptake for 20–60 min, 3–5 days a week, to develop and maintain cardiorespiratory fitness, body composition and muscle strength. Intensity relative to peak ability is also common in resistance training, in which the key is repetition of maximal contraction force for a given muscle group. Nevertheless, the use of absolute, as opposed to relative, intensity is favored in epidemiological studies because it is free from the individual’s subjective view of effort.
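The snow-shoveling figures can be reproduced with a few lines of arithmetic. The sketch below assumes the conventional approximations that 1 MET corresponds to about 1 kcal per kg body weight per hour and about 3.5 ml oxygen per kg per minute; these constants and the function names are illustrative assumptions and are not taken from the reference lists cited in the text.

```python
# Conventional approximations (assumptions of this sketch):
KCAL_PER_KG_HOUR_PER_MET = 1.0   # 1 MET ~ 1 kcal per kg body weight per hour
ML_O2_PER_KG_MIN_PER_MET = 3.5   # 1 MET ~ 3.5 ml oxygen per kg per minute
KJ_PER_KCAL = 4.19               # from the text: 1 calorie ~ 4.19 J


def met_to_kcal_per_hour(met: float, weight_kg: float) -> float:
    """Energy cost in kcal/h of an activity with the given MET value."""
    return met * weight_kg * KCAL_PER_KG_HOUR_PER_MET


def met_to_ml_o2_per_min(met: float, weight_kg: float) -> float:
    """Oxygen uptake in ml/min of an activity with the given MET value."""
    return met * weight_kg * ML_O2_PER_KG_MIN_PER_MET


def kcal_to_kj(kcal: float) -> float:
    """Convert kilocalories to kilojoules."""
    return kcal * KJ_PER_KCAL


if __name__ == "__main__":
    # Snow shoveling by hand: 6 METs for a 60 kg individual.
    print(met_to_kcal_per_hour(6, 60))   # 360.0
    print(met_to_ml_o2_per_min(6, 60))   # 1260.0
```

Because body weight is the only individual characteristic that enters the calculation, the sketch also makes visible why age, sex, and efficiency are not reflected in the resulting estimate.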
The response alternatives for duration are usually given as interval options with minutes or hours per day or per week. Duration is a challenge to measure as people engage constantly in physical activity or inactivity, from sleeping or working for hours, to short bursts of muscle contraction lifting something or tapping fingers. To further complicate things, intermittent activity, undertaken in short sessions (≤10 min), has been shown to improve cardiorespiratory fitness to the same degree as an activity of the same intensity undertaken in a longer session for the same total amount of time as the intermittent activity [22–24]. It is reasonable to believe that there is a similar association between intermittent activity and other health outcomes as well. Short bouts of activity, such as walking up the stairs instead of taking the elevator and playing with children are important; small changes that increase daily physical activity may lead to substantial health benefits. Thus, the optimal method to measure physical activity should be sensitive to all achievements—even small ones with short durations.
With what regularity is a certain activity performed? This can be expressed as the number of times per day, week, or month. In countries where the seasons vary greatly, and with them the opportunity to take part in outdoor activities, weather can become a barrier to physical activity. A number of studies have shown a relationship between seasonality and frequency of physical activities [25–29]. As weekly leisure time EE can be higher in spring and summer, seasonality should be considered when planning a study or an intervention.
Since it is possible to trade intensity for duration and intensity for frequency (resulting in the same EE) to achieve a level of health-enhancing physical activity, the primary interest in large epidemiologic studies is often the overall score. This is based on intensity, duration, and frequency of all activities, with consolidation of, rather than distinction between, occupational, leisure, and household activity.
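One common way to express such an overall score is in MET-hours per week, the product of intensity, duration, and frequency summed over activities. The following is a minimal sketch of that idea; the `Activity` class, the function name, and the MET values in the usage example are hypothetical and serve only to show how intensity can be traded for duration or frequency at the same total.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Activity:
    name: str
    met: float                 # absolute intensity (illustrative value)
    hours_per_session: float   # duration
    sessions_per_week: float   # frequency


def met_hours_per_week(activities: Iterable[Activity]) -> float:
    """Overall score: intensity x duration x frequency, summed over all activities."""
    return sum(a.met * a.hours_per_session * a.sessions_per_week for a in activities)


if __name__ == "__main__":
    # Trading intensity for duration: half the intensity, twice the duration.
    walking = [Activity("brisk walking", met=4.0, hours_per_session=1.0, sessions_per_week=3)]
    running = [Activity("running", met=8.0, hours_per_session=0.5, sessions_per_week=3)]
    print(met_hours_per_week(walking), met_hours_per_week(running))  # 12.0 12.0
```

Summing over occupational, leisure, and household activities in one list mirrors the consolidation described above, at the cost of no longer showing which dimension produced the score.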
Methods to assess physical activity and energy expenditure
identification of causal associations between physical activity and health outcomes;
description and quantification of the dose-response relationships between physical activity and health outcomes;
documentation of changes and differences in physical activity within and between individuals, respectively, over time;
formulation of public health recommendations;
validation of intervention programs;
comparison of physical activity levels between populations, particularly when cultural and language differences exist between these populations;
measurement of physical activity in children and other groups of individuals who have a limited capacity for accurate self-appraisal.
Validity, which is high when the instrument measures what it is supposed to measure. Preferably, validity of a method is assessed by comparing it to a “gold standard.” If this is not doable, “relative validity” can be assessed by comparison with a high quality method.
Reliability, which is high when the instrument generates the same measurement each time it is used under the same conditions. Reliability is a necessary, but not a sufficient, condition for validity.
Practicability, which takes into account the time and the cost involved for both the investigator and the respondent, but also the risk of altered behavior due to, for example, cumbersome equipment.
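The statement that reliability is necessary but not sufficient for validity can be illustrated with a small numerical sketch. It assumes, purely for illustration, that reliability is summarized as the correlation between two administrations of the same instrument and validity as the correlation with a criterion method; the Pearson coefficient and the made-up data are assumptions of this example, and real validation studies may use other statistics.

```python
from math import sqrt
from typing import Sequence


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


if __name__ == "__main__":
    # Made-up activity ranks for five participants (hypothetical data).
    criterion = [1, 2, 3, 4, 5]          # criterion ("gold standard") method
    first_admin = [5, 1, 4, 2, 3]        # instrument, first administration
    second_admin = [4, 1, 5, 2, 3]       # instrument, second administration

    print("reliability:", pearson_r(first_admin, second_admin))  # high agreement with itself
    print("validity:", pearson_r(first_admin, criterion))        # poor agreement with criterion
```

In this constructed case the instrument reproduces its own answers well while tracking the criterion poorly, which is the situation the definition warns about.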
Some methods used to measure physical activity and energy expenditure
Doubly labeled water
Less prone to information bias
Labor intensive, time consuming and costly
Heart rate monitoring
Useful for validation of self-report methods
Equipment may affect behavior
Will generally not differentiate between intensity, frequency and duration
Psychophysical rating scales
Easy to administer and low cost, if self administered
If interviewer-administered, can be time consuming and high cost
Appropriate for general population and specific target groups
May influence physical activity
Can answer specific research questions
Susceptible to information bias
Possible to differentiate between intensity, frequency and duration
Susceptible to misclassification
Methods based on biological and physiological approaches (e.g., heart rate monitoring, accelerometry and doubly labeled water) generally require some type of monitoring and are thus harder to apply in large population studies than self-report assessments. Typically these methods have been restricted to relatively small sample sizes. However, with the rapid advancements in technology, some of these methods, such as heart rate monitoring and accelerometry, are currently used in larger studies. In epidemiology, these methods are mainly used to validate other physical activity assessment methods.
Physiological approaches to measure energy expenditure
Since more than 95% of the energy expended by the body is derived from the reaction of oxygen with nutrients, an individual’s metabolic rate can be calculated once the volume of oxygen consumed per unit time (VO2) is known. Different approaches to assess VO2 to estimate EE as a proxy of physical activity are presented below.
Doubly labeled water: This method is a form of indirect calorimetry and is frequently considered the gold standard for estimating total EE. As indicated by the name, doubly labeled water is a water-based method, using the stable isotopes ²H₂O and H₂¹⁸O, which are consumed by the study subject. The isotopes are distributed evenly throughout the body and are gradually excreted in the subject’s urine. Depending on the isotope dose and excretion rate, the latter of which is dictated by the subject’s activity level and the environmental temperature, the urine collection period usually spans between 1 and 2 weeks. Because ¹⁸O leaves the body as both water and CO2, whereas ²H leaves only as water, the difference between the rates at which the two isotopes are eliminated is proportional to metabolic CO2 production (VCO2). Thus, oxygen uptake (VO2) and, accordingly, total EE can be calculated for the study period from the difference in the elimination rates of the isotopes [4, 7, 32]. This method is safe, precise, and non-invasive, and can, for example, be used in children [33, 34] and pregnant women. Doubly labeled water is particularly useful for the assessment of total EE in free-living conditions, as no monitors are worn, and is thus particularly appealing for use in children. This method, being free from information bias and giving an exact measure of EE, could conceivably be a perfect gold standard. However, the isotopes and the measurement methods are expensive, and collection of complete urine samples at the appropriate times following dosing is essential for the method to succeed. Furthermore, the highly self-selected sample of persons who are willing to collect urine samples for weeks may be quite different from the typical study population. Lastly, the method cannot be used to differentiate between intensity, duration and frequency of specific activities. Therefore, doubly labeled water is rarely used in large studies, but mainly for validation of other methods more commonly used in epidemiology.
Indirect calorimetry: The participant wears a mask and carries the equipment needed for analyzing the expired air to measure VO2. Wearing the equipment is likely to affect the physical activity of the carrier (the so-called Hawthorne effect). Furthermore, the method is cumbersome and expensive and thus not appropriate for use in epidemiology.
Heart rate monitors: Although a strong linear relationship between heart rate and VO2 exists at higher levels of EE, this method is less precise for assessing EE at low intensities. Furthermore, other factors such as emotional stress, body temperature and medication also influence heart rate. Despite this, heart rate monitoring has worked well in medium-sized epidemiological studies [36, 37].
Ventilometry: The close relationship between ventilation and VO2 has led to the development of devices to measure ventilatory response to physical activity, but these methods have yet to prove applicable in large-scale studies.
Cardiorespiratory fitness: The ability of the cardiovascular and respiratory systems to supply oxygen to the working muscles is determined by exercise tests and correlates highly with maximal VO2. Cardiorespiratory fitness is sometimes used as a measure of physical activity in epidemiological studies [39, 40]. However, although fitness and total physical activity are correlated, they also have independent components [42, 43]. Fitness is a complex entity influenced by age, gender, and other habits; moreover, genetics play an important role in how well physical fitness responds to training [32, 44].
Calorimetry: Since almost all energy released by metabolism is converted to heat, this heat can be used to calculate EE. Direct calorimetry is based on this principle. Body temperature can also be used to calculate the EE of activity, but it is inconvenient: reaching steady state takes time, which makes it unfeasible for all but experimental studies.
The word pedometer is Greek and means “foot measurement,” as it measures the distance traveled by foot. The pedometer, usually clipped to a belt or worn around the ankle, counts steps in response to the force generated by the body’s mass connecting with the ground via the foot. It measures walking-related activity, but the length of a step varies with setting and different brands seem to detect steps differently. Furthermore, ordinary life often involves more than walking on a flat surface.
Accelerometers measure movement (acceleration and deceleration) in one (vertical), two (vertical and medio-lateral), or three (vertical, medio-lateral, and antero-posterior) planes [46–48]. Intensity, duration and frequency can be assessed. However, some activities do not involve variations in acceleration. Isometric muscle contraction or muscular work against some external force, such as weight lifting, carrying and pushing, or activities like uphill walking, skating or rowing, are not detected well via accelerometry. Thus, physical activity is likely to be underestimated using accelerometry if activities of the nature described above are common.
Recently other portable devices to measure physical activity have been developed. Among them are a combined heart rate recorder and movement sensor and a device for analyzing body motion and posture changes resulting in a detailed record of performed activity. This is a promising area of research. The development of lightweight monitors with small computers that can store large amounts of data will probably make these methods to estimate physical activity more available to epidemiologists in the future. So far, for larger epidemiological studies, the cost of the monitors (between several hundred and several thousand Euros), the Hawthorne effect, and the problem with compliance may be reasons why these methods are not yet in common use.
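The chain of calculations described above, from isotope elimination rates to CO2 production to total EE, can be sketched as follows. This is a deliberately simplified illustration and not the procedure of any particular laboratory: it assumes single-exponential elimination, ignores the isotope fractionation and dilution-space corrections used in real analyses, assumes a respiratory quotient of 0.85 to derive VO2 from VCO2, and uses the abbreviated Weir equation to convert gas volumes to energy. All function names and default values are hypothetical.

```python
from math import log

LITRES_PER_MOLE_GAS = 22.4   # molar gas volume at standard conditions (assumption)


def elimination_rate(enrichment_start: float, enrichment_end: float, days: float) -> float:
    """Rate constant (per day) assuming single-exponential isotope elimination."""
    return log(enrichment_start / enrichment_end) / days


def co2_production_mol_per_day(body_water_mol: float, k_oxygen: float, k_hydrogen: float) -> float:
    """Simplified estimate: CO2 production is proportional to the difference
    between the 18-O and 2-H elimination rates. The factor 1/2 reflects the
    two oxygen atoms carried by each CO2 molecule. No fractionation correction."""
    return (body_water_mol / 2.0) * (k_oxygen - k_hydrogen)


def energy_expenditure_kcal_per_day(co2_mol_per_day: float,
                                    respiratory_quotient: float = 0.85) -> float:
    """Convert CO2 production to EE with the abbreviated Weir equation,
    deriving VO2 from an assumed respiratory quotient (VCO2 / VO2)."""
    vco2_litres = co2_mol_per_day * LITRES_PER_MOLE_GAS
    vo2_litres = vco2_litres / respiratory_quotient
    return 3.941 * vo2_litres + 1.106 * vco2_litres


if __name__ == "__main__":
    # Hypothetical subject: about 40 kg of body water (roughly 2,220 mol).
    k_o = elimination_rate(100.0, 30.0, 10.0)   # 18-O enrichment over 10 days
    k_h = elimination_rate(100.0, 37.0, 10.0)   # 2-H enrichment over 10 days
    co2 = co2_production_mol_per_day(2220.0, k_o, k_h)
    print(round(energy_expenditure_kcal_per_day(co2)), "kcal/day (illustrative only)")
```

The sketch returns a single total for the whole collection period, which is why the method cannot separate intensity, duration, and frequency of individual activities.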
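To make the linear heart rate to VO2 relationship concrete, the sketch below fits a straight line to calibration measurements for one individual by ordinary least squares and uses it to predict VO2 from a monitored heart rate. The calibration data, the function names, and the choice to return no estimate below the lowest calibrated heart rate are assumptions made for illustration; the last of these simply reflects the point that the relationship is less dependable at low intensities.

```python
from typing import Optional, Sequence, Tuple


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two paired calibration points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def predict_vo2(heart_rate: float, slope: float, intercept: float,
                min_calibrated_hr: float) -> Optional[float]:
    """Predicted VO2 (l/min), or None below the calibrated heart rate range."""
    if heart_rate < min_calibrated_hr:
        return None
    return slope * heart_rate + intercept


if __name__ == "__main__":
    # Hypothetical calibration: heart rate (beats/min) vs. measured VO2 (l/min).
    calibration_hr = [100, 120, 140, 160]
    calibration_vo2 = [1.0, 1.5, 2.0, 2.5]
    slope, intercept = fit_line(calibration_hr, calibration_vo2)
    print(predict_vo2(130, slope, intercept, min(calibration_hr)))
    print(predict_vo2(70, slope, intercept, min(calibration_hr)))   # None
```

A per-person calibration of this kind does not remove the influence of emotional stress, body temperature, or medication on heart rate, so the predicted values remain estimates.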
The labor-intensive method of watching and recording a person’s activities is quite straightforward, but not the method of choice in large studies. Studies that base their physical activity estimate on occupation as assessed by an external observer resemble the method of behavioral observation.
More than 10 years ago the American College of Sports Medicine’s journal devoted an entire supplement to more than 30 different instruments for self-reported physical activity. With the growing interest in physical activity, new instruments continuously appear, most likely because physical activity is a complex exposure to measure and no instrument is adequate for every situation and every population.
Psychophysical rating scales
The subjective perception of exertion has been thoroughly studied by Borg [53, 54], who has developed internationally popular scales for the evaluation and monitoring of exercise intensity. The RPE scale is a scale of ratings (R) for perceived (P) exertion (E). The scale, with steps from 6 to 20, is linearly associated with exercise intensity and heart rate (from 60 to 200 per min) during exercise on a bicycle ergometer. The category ratio scale (CR-10) is anchored at the top category of “maximal exertion.” Thus, two individuals working at their maximal working capacities will experience the same degree of exertion although their physical outputs may be different. Based on this, other categories represent equivalent locations with respect to maximum exertion. The scales measure the subject’s perceived “effort sense,” which is a type of intensity, but one that is relative to the subject’s fitness level. Even if relative intensity is seldom the focus of epidemiological studies, the CR-10 scale has been used as a complement to physical activity survey questions for estimating the degree of effort when exercising. Nonetheless, these scales are primarily used to measure subjective physical strain/fatigue on an ordinal scale when self-rating concurrent work load. They are seldom used in epidemiological studies. One important reason may be that the scale steps in the Borg scale are represented by numbers and verbal expressions that lack an intuitive meaning when presented as response alternatives in epidemiological studies.
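The correspondence between the 6 to 20 RPE range and a heart rate range of 60 to 200 beats per minute amounts to multiplying the rating by ten. The sketch below encodes only that rule of thumb; the function name is hypothetical, and the mapping should be read as a rough group-level approximation for bicycle ergometer exercise rather than a prediction for any individual.

```python
def rpe_to_approx_heart_rate(rpe: int) -> int:
    """Approximate heart rate (beats/min) for a Borg RPE rating of 6-20,
    using the linear rule of thumb heart rate ~ 10 x RPE."""
    if not 6 <= rpe <= 20:
        raise ValueError("the RPE scale runs from 6 to 20")
    return 10 * rpe


if __name__ == "__main__":
    for rating in (6, 13, 20):
        print(rating, "->", rpe_to_approx_heart_rate(rating), "beats/min")
```

Because the rating is relative to the individual's capacity, the same RPE value can correspond to very different absolute workloads, which is one reason absolute intensity is preferred in epidemiological studies.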
Physical activity records
Physical activity records are based on the diary idea: the study participant is asked to keep a record of the different types of activities undertaken, and the time spent doing each of them during a specific time period [7, 32]. The record is then processed using coding schemes which classify each activity by, for example, rate of EE or MET value. This method can detail all activities undertaken, but it is cumbersome, and it takes time for the study subject to keep the diary and for the researcher to decode the entries. The recording process may in itself produce changes in physical activity patterns during the time of recording. Thus, it is not the method of choice for large epidemiological studies, but it is a useful method for validation studies.
Physical activity logs
As with the physical activity record, the study participant is asked to report the time spent doing different types of activity during a given time period. Typically, physical activity logs provide a list of specific activities to choose from. The list facilitates the journal keeping for the study participant and the data processing for the researcher. There is a risk of losing important information, as such a list can never be complete. In particular, low intensity activities, such as routine light activity, household chores and spontaneous activity, tend to be underrepresented in physical activity logs. By truncating the lower end of the continuum of physical activity, the instrument could suffer from “floor effects” as the sedentary population would be misclassified. This method is better suited to answer a specific question by the researcher, such as the participation rate in an exercise training program. This method may also influence the participant’s physical activity pattern, just like the diary.
The recall method, in contrast to records and logs, runs a lower risk of affecting the patterns being measured. The study participant is asked to recall past activity, usually in an interview, in person or by phone. The time frame could be anywhere from 24 h to a week, a year, or a lifetime. Skilled interviewers can obtain a good estimate of recalled activity by cueing, i.e., using questions that enhance memory capacity, and by looking back over the period to allow the participant to search his or her memory for activities he or she may have forgotten to mention [52, 56]. The disadvantage of the recall method is the time and cost of training the interviewers, calling the study participants and coding the data.
Compared to other instruments, questionnaires are easy to distribute and administer, are non-reactive, and do not require much motivation or time from the study participant. With a smaller investment of time and money than many other methods, questionnaires allow the collection of information on physical activity and other factors from a greater number of study subjects. Hence, this is the method of choice in large epidemiological studies. There are many different physical activity instruments developed for questionnaires, all with different strengths and weaknesses. By and large, questionnaires can be characterized as global, single-item or comprehensive questionnaires.
Global close-ended multiple choice questions ask the respondent to rate their relative level of physical activity or fitness compared to others of the same age and gender. These self-reports are simple and short and are used in a variety of studies, often in combination with other questions [13, 15, 57–60]. Validity has been assessed against other measures of leisure time physical activity or fitness [13, 61–63]. This type of questionnaire gives a measure on a scale relative to peers. Conceivably the same self-rated answer stands for different levels of activity depending on culture, or even the social context in which friends are made, whether on the soccer field or in internet chat rooms.
Single-item questions lack the ability to capture all daily activities, but give a quick estimate of some components of physical activity. Participants could, for example, rank their overall level of physical activity on a 5-point scale or rate frequency of leisure-time vigorous activity with a duration of at least 20 min. They could rate time spent sitting during leisure time or the time spent sitting, standing/walking, etc., on a working day [66, 67]. The frequency of activities requiring light or vigorous effort, or the question “For how many hours per week, on average, do you engage in activity strenuous enough to build up a sweat?” are examples of single-item questions [14, 16].
While global and single-item questionnaires give a direct approximation of the respondent’s physical activity level, comprehensive questionnaires request more in-depth information. Some give an extensive list of activities and ask participants to indicate the duration and frequency of the activities in which they participate, thereby enabling the calculation of energy output. These questionnaires are often modeled after the Minnesota Leisure Time Physical Activity Questionnaire published in 1978. The questionnaire consists of 63 sports, recreational, yard, and household activities and was originally created for an interview. It has been validated in different countries and with different methods and shown to be a valid method for measuring leisure time activity [69–71].
The growing global health concern about inactivity has increased interest in instruments that can be used internationally and for population surveillance. The International Physical Activity Questionnaire (IPAQ) was developed by a multinational working group and tested for reliability and validity in a multicountry approach [72, 73]. With a short format (time in sedentary, moderate, and vigorous intensity activity and time spent walking) and a long format (covering leisure, work, household, yard, and sedentary activity, as well as self-powered transport), it can be used for both self-administration and telephone interviewing. Reliability has been shown to be high and validity acceptable, but as seen in validation studies of other instruments [74, 75], over-reporters tend to have a lower educational level. This raises the concern that, by attempting to get a more detailed picture of the different dimensions of physical activity, one may increase the risk of misclassification due to misinterpretation of the questions.
Numerous questionnaires have been developed aiming at different target groups, inquiring about different aspects of physical activity at different time periods. Many epidemiological questionnaires have focused on either occupational or leisure-time activities (often with emphasis on premeditated exercise) rather than assessing total physical activity, which also includes unstructured activities of daily living. However, a full-time employed worker spends no more than 20% of the total time in a year at work. Similarly, the time devoted to organized exercise during leisure time typically constitutes only a few percent of the total time, and it tends to be over-estimated. Further, voluntary health-related endeavors like structured sports may be associated with other unmeasured health-promoting behaviors that possibly confound the observed health effects. Thus, optimally all types of physical activity should be of interest.
Physical activity is one of the most important modifiable factors that determine risk of chronic morbidity and mortality, but important questions remain, such as the type and amount of activity required for a protective effect, as well as whether there are critical time periods when physical activity is more important. Accurate measurement of physical activity is therefore of great importance. If EE is the key exposure measure, methods based on biological and physiological approaches are required, but the expense and inconvenience have so far made most of these methods unfeasible in large studies among free-living individuals. Instruments for valid self-estimation of total EE, identifying frequency, duration and intensity of physical activity, are likely to enhance our understanding of the complex associations between physical activity, dietary energy intake, body measures, and disease risk. Errors in the estimation of physical activity in epidemiological studies are indisputably substantial, but better technology and innovative approaches can improve precision, which may in turn improve our understanding of possible mechanisms through which physical activity impacts different biologic systems.
The authors would like to thank Dr. Paul W. Franks, Dr. Mikael Fogelholm, Prof. Olof Nyrén, and Prof. Hans-Olov Adami, who provided helpful comments on the manuscript.