Phys. Ther. Korea 2021; 28(3): 177-185
Published online August 20, 2021
https://doi.org/10.12674/ptk.2021.28.3.177
© Korean Research Society of Physical Therapy
Seo-hyun Kim1, PT, BPT, Chung-hwi Yi2, PT, PhD, Jin-seok Lim1, PT, BPT
1Department of Physical Therapy, The Graduate School, Yonsei University, 2Department of Physical Therapy, College of Software and Digital Healthcare Convergence, Yonsei University, Wonju, Korea
Correspondence to: Chung-hwi Yi
E-mail: pteagle@yonsei.ac.kr
This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
Background: Muscle changes continuously with aging. Sarcopenia, in which muscle mass decreases with aging, is associated with various diseases, an increased risk of falling, and deterioration in quality of life. Obesity and sarcopenia also have a synergistic effect on disease in older adults.
Objectives: This study examined the risk factors for sarcopenia, sarcopenic obesity, and sarcopenia without obesity and developed prediction models.
Methods: This machine-learning study used the 2008–2011 Korea National Health and Nutrition Examination Surveys in the analysis. After data curation, 5,563 older participants were selected, of whom 1,169 had sarcopenia (538 with sarcopenic obesity and 631 with sarcopenia without obesity); the remaining 4,394 were normal. Decision tree and random forest models were used to identify risk factors.
Results: The risk factors for sarcopenia chosen by both methods were body mass index (BMI) and duration of moderate physical activity; those for sarcopenic obesity were sex, BMI, and duration of moderate physical activity; and those for sarcopenia without obesity were BMI and sex. The areas under the receiver operating characteristic curves of all prediction models exceeded 0.75. BMI could predict sarcopenia-related disease.
Conclusion: Risk factors for sarcopenia-related diseases should be identified and programs for sarcopenia-related disease prevention should be developed. Data-mining research using population data should be conducted to enhance the effectiveness of early treatment for people with sarcopenia-related diseases through predictive models.
Keywords: Aging, Body mass index, Exercise, Machine learning, Sarcopenia
Human body composition continuously changes with age: body fat increases and muscle mass decreases, while body weight remains unchanged [1,2]. The loss of muscle mass is related to muscle strength (the force a muscle produces) and endurance (the ability of the muscle to contract continuously) at submaximal levels [3]. Furthermore, the motor units innervated in muscles change with age. A reduction in strength per motor unit is indicative of a decline in muscle quality [4]. The changes in muscle contribute to increased body fat because the loss of muscle mass reduces energy expenditures. Thus, changes in body composition lead to age-related impairment and disabilities [5]. In 1989, Rosenberg called this reduction of muscle mass with age “sarcopenia” [6].
Falls and fall-related injuries are common in older adults; the numbers of both increase exponentially with age [7]. The main causes of falls are old age, fear of falling, reduced balance, and impaired cognition and mobility [8,9]. One recently identified risk factor is sarcopenia [10,11], which plays a major role in the frailty and dysfunction of older adults; notably, it can lower their quality of life and affect mortality [12].
The increase in body fat with aging and decrease in physical activity due to sarcopenia are important risk factors for obesity; reduction in physical activity can lead to further loss of muscle mass [13]. Obesity is important in the development of metabolic syndrome and cardiovascular disease [14]. Therefore, the combination of sarcopenia and obesity (i.e., sarcopenic obesity) in older adults has synergistic effects on physical disability, metabolic disorders, cardiovascular disease, and mortality [15]. Older adults with sarcopenic obesity have higher prevalences of coronary heart and cardiovascular diseases, and higher mortality rates due to these diseases, compared with non-sarcopenic older adults [16]. Therefore, it is necessary to prevent sarcopenia and sarcopenic obesity by identifying the risk factors of both.
Sarcopenia and sarcopenic obesity are affected by various factors, such as age, nutritional imbalance, hormones, metabolism, immunological factors, and physical inactivity [7,15]. One large study examined the role of nutritional factors in sarcopenia [17]. The relationships among sarcopenia, sarcopenic obesity, and prevalences of certain diseases have been investigated [14,16,18]. Because the various risk factors are interrelated, it is difficult to examine them jointly using conventional statistical methods. Moreover, developing cost-effective risk prediction models in clinical settings is challenging. However, no machine-learning studies have yet examined the major risk factors for sarcopenia, sarcopenic obesity, and sarcopenia without obesity.
Therefore, this study examined risk factors for sarcopenia, sarcopenic obesity, and sarcopenia without obesity by means of machine learning; it also evaluated prediction models for these conditions.
Data from the 2008 to 2011 Korea National Health and Nutrition Examination Surveys (KNHANES) were reviewed. KNHANES assesses Koreans using a clustered, multistage, stratified, rolling sample; the assessment consists of a medical examination, health questionnaire, and nutrition survey. More than 800 items are surveyed each year. All assessments are conducted by professional survey teams consisting of nurses, nutritionists, and health researchers. Informed consent was obtained from each participant when the 2008–2011 KNHANES were conducted.
The data used in this study were collected from 37,753 KNHANES participants. However, participants were excluded if they were under 60 years of age, if they had not provided data regarding skeletal muscle mass, or if they had missing values. After exclusion, this study enrolled 5,563 participants. The participants’ characteristics are summarized in Table 1.
Table 1. Participant characteristics.
Characteristics | Normal (n = 4,394) | Sar (n = 1,169) | SO (n = 538) | SnO (n = 631) |
---|---|---|---|---|
Sex (male/female) | 1,874/2,520 | 519/650 | 48/490 | 471/160 |
Age (y) | 68.62 (5.85) | 71.18 (6.15) | 70.84 (6.20) | 71.48 (6.10) |
Height (cm) | 157.82 (8.93) | 156.74 (8.82) | 152.33 (6.63) | 160.51 (8.72) |
Weight (kg) | 61.36 (9.65) | 51.56 (7.48) | 52.23 (6.47) | 50.98 (8.21) |
BMI (kg/m²) | 24.58 (2.94) | 20.97 (2.47) | 22.48 (2.07) | 19.69 (2.02) |
Body fat (%) | 29.25 (7.82) | 28.78 (8.60) | 36.54 (4.16) | 22.16 (5.22) |
Values are presented as number only or mean (standard deviation). Sar, sarcopenia; SO, sarcopenic obesity; SnO, sarcopenia without obesity; BMI, body mass index.
The criteria for sarcopenia followed the approach of the 1998 New Mexico Elder Health Survey. Whole-body dual-energy X-ray absorptiometry (QDR 4500A; Hologic, Waltham, MA, USA) was used to measure body lean mass, and the appendicular skeletal muscle mass (ASM) was defined as the sum of the lean masses of the legs and arms. The skeletal muscle mass index was calculated as ASM (kg) divided by the square of height (m²). Sarcopenia was defined in accordance with the Asian Working Group for Sarcopenia criteria (<5.4 kg/m² in women and <7.0 kg/m² in men).
Obesity was defined by a body fat ratio of at least 30% rather than by body mass index (BMI), because BMI reflects both muscle and body fat [19]. The sarcopenia and normal groups were classified based on skeletal muscle mass index, with 1,169 and 4,394 participants, respectively. The sarcopenic obesity group comprised 538 participants with sarcopenia and a body fat ratio ≥ 30%; the sarcopenia without obesity group comprised 631 participants with sarcopenia and a body fat ratio < 30% (Figure 1).
Data were collected regarding 865 variables for 5,563 KNHANES participants. Variables for which more than 300 individuals did not respond were excluded; participants with missing responses for the remaining 345 variables were also excluded. The variables removed were those related to ophthalmologic disease, dental disease, and childhood. The variables included in this study are summarized in Table 2.
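The grouping rule described above can be sketched in a few lines. This is a minimal illustration in Python rather than the authors' R code; the function names and the lowercase sex labels are assumptions made for the example, while the cutoffs (5.4 and 7.0 kg/m², 30% body fat) are those quoted in the text.

```python
# Cutoffs quoted in the text: skeletal muscle mass index (kg/m^2) by sex,
# and the body fat ratio (%) used to define obesity.
SMI_CUTOFF = {"male": 7.0, "female": 5.4}
BODY_FAT_CUTOFF = 30.0


def skeletal_muscle_index(asm_kg: float, height_m: float) -> float:
    """Appendicular skeletal muscle mass (kg) divided by height squared (m^2)."""
    return asm_kg / height_m ** 2


def classify(sex: str, asm_kg: float, height_m: float, body_fat_pct: float) -> str:
    """Assign one participant to a study group (hypothetical helper)."""
    smi = skeletal_muscle_index(asm_kg, height_m)
    if smi >= SMI_CUTOFF[sex]:
        return "normal"
    if body_fat_pct >= BODY_FAT_CUTOFF:
        return "sarcopenic obesity"
    return "sarcopenia without obesity"
```

For instance, a woman with an ASM of 12.0 kg and a height of 1.52 m has an index of about 5.19 kg/m², below the 5.4 kg/m² cutoff, so her body fat ratio decides between the two sarcopenia subgroups.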
Table 2. Survey variables used in machine learning.
Survey classification | Survey category | Survey items |
---|---|---|
Health survey | General information | Sex, age |
Health survey | Education | Academic background |
Health survey | Economic information | Economic status, reasons for non-employment, employment form, position in the profession, occupation, the longest occupation |
Health survey | Quality of life | Subjective health recognition, EQ-5D (mobility, self-care, usual activities, pain/discomfort, and anxiety/depression) |
Health survey | Morbidity | Affected disease in the last 2 weeks, morbidity by 38 chronic diseases |
Health survey | Activity restrictions | Whether activity is restricted, reason for limiting activity |
Health survey | Smoking | Lifelong smoking, currently smoking, past smoking, no smoking, second-hand smoke |
Health survey | Drinking | Lifelong drinking, drinking start age, drinking frequency, alcohol consumption, binge drinking frequency |
Health survey | Physical activity | Vigorous physical activity, moderate physical activity, walking practice, strength training, flexibility exercise |
Health survey | Mental health | Sleep time, stress perception, depressive symptoms, suicide, mental problem counseling |
Health survey | Obesity and weight control | Subjective body recognition, weight control, weight control |
Examination survey | Body measurement | Height, weight, waist circumference |
Examination survey | Blood pressure and pulse | Systolic blood pressure, diastolic blood pressure, pulse rate |
Examination survey | Respiratory examination | Cough, phlegm, chest pain, difficulty breathing, fever |
During data processing, negative responses (defined as 0 [none] and 88 or 888 [not applicable]) were unified as 0. To confirm risk factors and build prediction models, the following groups were compared: (1) sarcopenia and normal, (2) sarcopenic obesity and normal, and (3) sarcopenia without obesity and normal. Participant numbers in each group were matched by undersampling.
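The two preprocessing steps above, recoding and undersampling, might look like the following sketch. It is written in Python for illustration only; the helper names, the fixed seed, and the choice to draw the retained participants uniformly at random are assumptions, since the text does not specify how the undersampling was performed.

```python
import random

# Survey codes meaning "not applicable", as listed in the text.
NOT_APPLICABLE_CODES = {88, 888}


def recode_negative(value):
    """Map 'not applicable' codes to 0 and leave other values unchanged.

    Intended only for coded questionnaire items; a measured value such as
    a waist circumference of 88 cm must not be passed through this helper.
    """
    return 0 if value in NOT_APPLICABLE_CODES else value


def undersample(cases, controls, seed=0):
    """Randomly shrink the larger group to the size of the smaller one."""
    rng = random.Random(seed)  # fixed seed is an assumption, for repeatability
    n = min(len(cases), len(controls))
    return rng.sample(cases, n), rng.sample(controls, n)
```

Applied to the sarcopenic obesity comparison, this would pair the 538 cases with 538 randomly retained participants from the 4,394 in the normal group.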
2) Decision tree
A decision tree is an analysis method that charts decision rules to classify observations into groups of interest or to make predictions. Popular splitting criteria are the Gini and entropy indexes [20]. This study used the classification and regression trees algorithm to perform separation using the Gini index. A classification tree selects the predictor variables that maximize the reduction in impurity (minimizing the Gini index for categorical variables and the expected sum of variances for continuous variables as a node impurity criterion).
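To make the splitting criterion concrete, the sketch below computes the Gini index of a node and searches for the cutoff on one continuous predictor that gives the largest reduction in impurity. It is a simplified Python illustration of the idea, not the classification and regression trees implementation used in the study, and the function names are hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini index of a node: 1 minus the sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def impurity_reduction(values, labels, threshold):
    """Decrease in Gini index when the node is split at value < threshold."""
    left = [y for x, y in zip(values, labels) if x < threshold]
    right = [y for x, y in zip(values, labels) if x >= threshold]
    n = len(labels)
    weighted = (len(left) / n) * gini(left) + (len(right) / n) * gini(right)
    return gini(labels) - weighted


def best_threshold(values, labels):
    """Try midpoints between sorted distinct values; return (threshold, gain).

    Requires at least two distinct values in `values`.
    """
    xs = sorted(set(values))
    candidates = [(a + b) / 2 for a, b in zip(xs, xs[1:])]
    scored = ((t, impurity_reduction(values, labels, t)) for t in candidates)
    return max(scored, key=lambda pair: pair[1])
```

A full tree repeats this search over every predictor at every node and keeps the split with the largest reduction.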
3) Random forest
Random forest is an ensemble technique that generates many classification trees by randomly selecting subsets of the given data and of the predictor variables, then integrates the results of all trees into a single model [20]. The top seven variables were selected as risk factors by calculating the mean decrease in accuracy and mean decrease in the Gini index through a random forest. Variables in the top seven were excluded if the combined score of the mean decrease in accuracy and mean decrease in Gini index exceeded one standard deviation from the mean.
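The three ingredients named above (random subsets of the data, random subsets of the predictors, and integration of the individual trees) can be sketched as follows. This Python fragment is illustrative only: the helper names are hypothetical, and the choice of roughly the square root of the number of predictors as the subset size is a common default for classification forests that is assumed here, not a setting reported in the text.

```python
import math
import random
from collections import Counter


def bootstrap_indices(n_rows, rng):
    """Draw n_rows row indices with replacement (one bootstrap sample)."""
    return [rng.randrange(n_rows) for _ in range(n_rows)]


def random_feature_subset(features, rng):
    """Pick about sqrt(p) candidate predictors without replacement.

    In a typical random forest this draw is repeated at every split.
    """
    k = max(1, int(math.sqrt(len(features))))
    return rng.sample(features, k)


def majority_vote(predictions):
    """Combine the class predicted by each tree into one forest prediction."""
    return Counter(predictions).most_common(1)[0][0]
```

Each tree would be grown on one bootstrap sample while considering only the drawn predictors at each split; the forest then labels a participant with the class that most trees predict.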
4) Model evaluation
The accuracy, sensitivity, and specificity of six models (a decision tree and a random forest model for each of the three conditions) were evaluated. Sensitivity is the probability of accurately identifying people with a disease, while specificity is the probability of accurately identifying people without disease. The predicted class probabilities for each model’s test set were assessed with receiver operating characteristic (ROC) curve analysis to determine the reliability of the final model. The ROC curve is a graph that shows the trade-off between sensitivity and (1–specificity) across a series of cutoff points. The area under the ROC curve (AUC) is considered an effective measure of the classification performance of the model.
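The evaluation measures defined above can be computed directly from true labels, predicted labels, and predicted scores. The following Python sketch assumes binary labels in which 1 denotes the disease class; the function names are hypothetical. The AUC is computed through its rank interpretation, namely the probability that a randomly chosen person with the disease receives a higher score than a randomly chosen person without it, which equals the area under the ROC curve.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, tn, fn) for binary labels where 1 is the disease class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, fp, tn, fn


def summary_metrics(y_true, y_pred):
    """Accuracy, sensitivity, and specificity from predicted class labels."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "sensitivity": tp / (tp + fn),  # diseased correctly identified
        "specificity": tn / (tn + fp),  # non-diseased correctly identified
    }


def auc(y_true, scores):
    """Share of (diseased, non-diseased) pairs ranked correctly; ties count half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

Sensitivity and specificity depend on the cutoff used to turn scores into class labels, whereas the AUC summarizes performance across all cutoffs.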
Statistical analyses were performed using RStudio (ver. 1.2.5042; RStudio, Boston, MA, USA). The dataset was split into a 75% training set and a 25% test set for each comparative analysis. Decision tree and random forest models have good prediction capabilities, but have been criticized for overfitting. Therefore, 5-fold cross validation was used to evaluate the performances of the decision tree and random forest models.
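The 75%/25% split and 5-fold cross validation can be sketched on row indices alone. This is a Python illustration rather than the R code used in the study; the function names and seed are hypothetical, and running cross validation inside the training set is an assumption, because the text does not state which portion of the data the folds were drawn from.

```python
import random


def train_test_split_indices(n_rows, train_fraction=0.75, seed=0):
    """Shuffle row indices and split them into training and test sets."""
    rng = random.Random(seed)
    indices = list(range(n_rows))
    rng.shuffle(indices)
    cut = int(n_rows * train_fraction)
    return indices[:cut], indices[cut:]


def k_fold_indices(indices, k=5):
    """Yield (train, validation) index lists; each fold validates exactly once."""
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        validation = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, validation
```

Averaging a performance measure over the five validation folds gives an estimate that is less sensitive to overfitting than performance on the data used to fit the model.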
The risk factors selected for sarcopenia using the decision tree method were BMI, duration of moderate physical activity, and the EuroQol 5-dimension descriptive system (EQ-5D) (Figure 2). Moderate physical activity is defined as “a little more difficult than usual or causing some breathlessness”. Participants recorded the duration of moderate physical activity over 1 week. EQ-5D is a preference-based measure commonly used in health technology evaluation, with values ranging from 0 (death) to 1 (perfect health). The risk factors selected for sarcopenia using the random forest method were BMI, waist circumference, age, subjective body recognition (very thin, thin, moderate, slightly plump, and very fat), duration of moderate physical activity, and weight control for 1 year (weight loss, weight retention, weight gain, and none).
The sarcopenia decision tree model had an accuracy (95% confidence interval [CI]) of 76.2% (74.6–77.8), sensitivity (95% CI) of 77.6% (74.5–80.7), specificity (95% CI) of 74.8% (72.3–77.3), and AUC of 0.81. The sarcopenia random forest model had an accuracy (95% CI) of 75.4% (73.9–76.8), sensitivity (95% CI) of 72.2% (68.9–75.5), specificity (95% CI) of 78.6% (77.1–80.1), and AUC of 0.83.
The risk factors selected for sarcopenic obesity using the decision tree method were sex, BMI, duration of moderate physical activity, and occupational classification (Figure 3). Occupational classification is defined as “a position in one’s profession” and was coded as follows: 1, administrator or expert; 2, office worker; 3, service and sales representative; 4, agriculture, fisheries, and proficiency worker; 5, functional personnel and device and machine operation and assembly personnel; 6, simple labor worker; 7, unemployed (housewives, students, etc.). The risk factors selected for sarcopenic obesity using the random forest method were BMI, waist circumference, sex, reasons for non-employment (unnecessary, studying, retirement, health problems, searching for a job, and parenting), occupation (manager, expert, office worker, service worker, agriculture and fisheries worker, mechanic, simple laborer, and military), sleep time, and duration of moderate physical activity.
The sarcopenic obesity decision tree model had an accuracy (95% CI) of 73.6% (72.3–74.9), sensitivity (95% CI) of 76.8% (72.9–81.1), specificity (95% CI) of 70.6% (66–75.2), and AUC of 0.78. The random forest model of sarcopenic obesity had an accuracy (95% CI) of 74% (72–76), sensitivity (95% CI) of 68% (65.2–70.8), specificity (95% CI) of 79.8% (75.6–84), and AUC of 0.80.
The risk factors selected for sarcopenia without obesity using the decision tree method were BMI, sex, and duration of moderate physical activity (Figure 4). The risk factors selected for sarcopenia without obesity using the random forest method were BMI, waist circumference, sex, subjective body recognition, and weight control for 1 year.
The sarcopenia without obesity decision tree model had an accuracy (95% CI) of 87.8% (86.7–88.9), sensitivity (95% CI) of 85.4% (84–86.7), specificity (95% CI) of 89.8% (87.5–92.1), and AUC of 0.9. The sarcopenia without obesity random forest model had an accuracy (95% CI) of 87.6% (86.6–88.6), sensitivity (95% CI) of 83.4% (82.4–84.4), specificity (95% CI) of 92% (90.4–93.6), and AUC of 0.93. The ROC curves for all models are shown in Figure 5.
In this study, we developed prediction models based on cross-sectional data to identify risk factors for sarcopenia, sarcopenic obesity, and sarcopenia without obesity, using decision tree and random forest models.
Machine learning has often been used to predict diseases and verify their risk factors. The decision tree classification model is a preferred method because it allows researchers to inspect the resulting model and its cutoff values, thus justifying the selection of risk factors. A random forest creates multiple classification trees, each of which is trained on a bootstrap sample of the original training data and determines each split by searching over a randomly selected subset of input variables. Random forest is a popular method because it aggregates many such trees to identify risk factors, so that all factors can be considered.
The overall risk factors for sarcopenia resulting from the decision tree and random forest models are BMI and duration of moderate physical activity. BMI is associated with weight and predicts skeletal muscle mass [21]. A previous study found that BMI was an important risk factor in logistic regression, support vector machine, gradient boosting, and random forest models [17]. We found that BMI was an important predictor of sarcopenia, consistent with previous studies.
Physical activity is an important risk factor for sarcopenia [7,22]. Older adults have a greater reduction in muscle mass than younger adults when physical activity is reduced by the same amount, especially with respect to fast-twitch type Ⅱ muscle fibers, which are faster and contract more strongly than type Ⅰ muscle fibers [23]. This difference greatly influences the effects of sarcopenia in older adults; continuous physical activity is needed to stimulate muscles for healthy aging.
The overall risk factors of sarcopenic obesity determined by the two models were BMI, sex, and duration of moderate physical activity. Notably, BMI and duration of moderate physical activity were also risk factors for sarcopenia. While the other models chose BMI as the most important predictor, the decision tree model of sarcopenic obesity chose sex as the most important predictor; male sex had 82% accuracy for the normal group. This was similar to the results of a study that indicated a high prevalence of sarcopenic obesity in menopausal women due to an age-related reduction in sex hormones [24].
The risk factors for sarcopenia without obesity from the two models were BMI and sex. The prediction models of sarcopenia without obesity had fewer risk factors, but performed better than the other models. The AUCs of the decision tree and random forest models performed well, with values of 0.9 and 0.93, respectively; thus, they are valuable for assessing whether older adults exhibit sarcopenia without obesity.
The accuracy, sensitivity, specificity, and AUC of each model were determined. The prediction models of sarcopenia had acceptable accuracy (decision tree, 76.2%; random forest, 75.4%), sensitivity (77.6% and 72.2%), and specificity (74.8% and 78.6%). The prediction models of sarcopenic obesity exhibited the lowest performance, although accuracy (73.6% and 74%), sensitivity (76.8% and 68%), and specificity (70.6% and 79.8%) remained acceptable. The sarcopenia without obesity prediction models exhibited the best performance, with high accuracy (87.8% and 87.6%), sensitivity (85.4% and 83.4%), and specificity (89.8% and 92%). Because X-ray absorptiometry is essential for the diagnosis of sarcopenia, our prediction models may be very useful in that they rely on readily available variables.
Through machine learning, this study examined the major risk factors for sarcopenia, sarcopenic obesity, and sarcopenia without obesity, and evaluated prediction models for each condition. A previous machine-learning study of nutrition-related risk factors reported an AUC of approximately 0.80 [17]. In this study, the AUCs of the sarcopenia classification models were slightly higher, at 0.81 and 0.83. We found that BMI and duration of moderate physical activity are associated with sarcopenia and might be informative predictors in the primary care setting.
This study had several limitations. First, it did not include assessments of muscle strength and function (e.g., gait speed). Recently, the European Working Group on Sarcopenia in Older People suggested that muscle mass, strength, and function should all be considered during diagnosis of sarcopenia. Unfortunately, muscle strength and function were not included as variables in the 2008–2011 KNHANES. Further studies using new sarcopenia criteria are needed. Second, this study was based on cross-sectional data; thus, the causal effects of risk factors on sarcopenia could not be confirmed. Further studies using longitudinal or cohort data are necessary to examine these risk factors.
This machine-learning study confirmed that risk factors derived from population data can predict sarcopenia-related diseases. BMI is an important risk factor for sarcopenia-related diseases, and older adults with low BMI need to take care to prevent sarcopenia. Moderate-intensity physical exercise may help prevent sarcopenia and sarcopenic obesity. Particular care is needed to prevent sarcopenic obesity among older women. Our results suggest that the risk factors identified in this study can help in developing effective sarcopenia-related disease prevention and intervention programs.
This study was supported by the “Brain Korea 21 FOUR Project”, the Korean Research Foundation for Department of Physical Therapy in the Graduate School of Yonsei University.
No potential conflict of interest relevant to this article was reported.
Conceptualization: SK, CY. Data curation: SK, SY. Formal analysis: SK, JL. Investigation: SK. Methodology: SK, JL. Project administration: SK. Resources: SK, CY, JL. Software: SK. Supervision: SK, CY. Visualization: SK. Writing - original draft: SK. Writing - review & editing: SK, CY, JL.
Phys. Ther. Korea 2021; 28(3): 177-185
Published online August 20, 2021 https://doi.org/10.12674/ptk.2021.28.3.177
Copyright © Korean Research Society of Physical Therapy.
Seo-hyun Kim1 , PT, BPT, Chung-hwi Yi2
, PT, PhD, Jin-seok Lim1
, PT, BPT
1Department of Physical Therapy, The Graduate School, Yonsei University, 2Department of Physical Therapy, College of Software and Digital Healthcare Convergence, Yonsei University, Wonju, Korea
Correspondence to:Chung-hwi Yi
E-mail: pteagle@yonsei.ac.kr
This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
Background: Muscle undergoes change continuously with aging. Sarcopenia, in which muscle mass decrease with aging, is associated with various diseases, the risk of falling, and the deterioration of quality of life. Obesity and sarcopenia also have a synergy effect on the disease of the older adults.
Objects: This study examined the risk factors for sarcopenia, sarcopenic obesity, and sarcopenia without obesity and developed prediction models.
Methods: This machine-learning study used the 2008–2011 Korea National Health and Nutrition Examination Surveys in the analysis. After data curation, 5,563 older participants were selected, of whom 1,169 had sarcopenia, 538 had sarcopenic obesity, and 631 had sarcopenia without obesity; the remaining 4,394 were normal. Decision tree and random forest models were used to identify risk factors.
Results: The risk factors for sarcopenia chosen by both methods were body mass index (BMI) and duration of moderate physical activity; those for sarcopenic obesity were sex, BMI, and duration of moderate physical activity; and those for sarcopenia without obesity were BMI and sex. The areas under the receiver operating characteristic curves of all prediction models exceeded 0.75. BMI could predict sarcopenia-related disease.
Conclusion: Risk factors for sarcopenia-related diseases should be identified and programs for sarcopenia-related disease prevention should be developed. Data-mining research using population data should be conducted to enhance the effectiveness of early treatment for people with sarcopenia-related diseases through predictive models.
Keywords: Aging, Body mass index,Exercise, Machine learning, Sarcopenia
Human body composition continuously changes with age: body fat increases and muscle mass decreases, while body weight remains unchanged [1,2]. The loss of muscle mass is related to muscle strength (the force a muscle produces) and endurance (the ability of the muscle to contract continuously) at submaximal levels [3]. Furthermore, the motor units innervated in muscles change with age. A reduction in strength per motor unit is indicative of a decline in muscle quality [4]. The changes in muscle contribute to increased body fat because the loss of muscle mass reduces energy expenditures. Thus, changes in body composition lead to age-related impairment and disabilities [5]. In 1989, Rosenberg called this reduction of muscle mass with age “sarcopenia” [6].
Falls and fall-related injuries are common in older adults; the numbers of both increase exponentially with age [7]. The main causes of falls are old age, fear of falling, reduced balance, and impaired cognition and mobility [8,9]. One recently identified risk factor is sarcopenia [10,11], which plays a major role in the frailty and dysfunction of older adults; notably, it can lower their quality of life and affect mortality [12].
The increase in body fat with aging and decrease in physical activity due to sarcopenia are important risk factors for obesity; reduction in physical activity can lead to further loss of muscle mass [13]. Obesity is important in the development of metabolic syndrome and cardiovascular disease [14]. Therefore, the combination of sarcopenia and obesity (i.e., sarcopenic obesity) in older adults has synergistic effects on physical disability, metabolic disorders, cardiovascular disease, and mortality [15]. Older adults with sarcopenic obesity have higher prevalences of coronary heart and cardiovascular diseases, and higher mortality rates due to these diseases, compared with non-sarcopenic older adults [16]. Therefore, it is necessary to prevent sarcopenia and sarcopenic obesity by identifying the risk factors of both.
Sarcopenia and sarcopenic obesity are affected by various factors, such as age, nutritional imbalance, hormones, metabolism, immunological factors, and physical inactivity [7,15]. One large study examined the role of nutritional factors in sarcopenia [17]. The relationships among sarcopenia, sarcopenic obesity, and prevalences of certain diseases have been investigated [14,16,18]. Since various risk factors are interrelated, it is difficult to review them using statistical methods. Moreover, developing cost-effective risk prediction models in clinical settings is challenging. However, no machine-learning studies have yet examined the major risk factors for sarcopenia, sarcopenic obesity, and sarcopenia without obesity.
Therefore, this study examined risk factors for sarcopenia, sarcopenic obesity, and sarcopenia without obesity by means of machine learning; it also evaluated prediction models for these conditions.
Data from the 2008 to 2011 Korea National Health and Nutrition Examination Surveys (KNHANES) were reviewed. KNHANES assesses Koreans using a clustered, multistage, stratified, rolling sample; the assessment consists of a medical examination, health questionnaire, and nutrition survey. More than 800 items are surveyed each year. All assessments are conducted by professional survey teams consisting of nurses, nutritionists, and health researchers. Informed consent was obtained from each participant when the 2008–2011 KNHANES were conducted.
The data used in this study were collected from 37,753 KNHANES participants. However, participants were excluded if they were under 60 years of age, if they had not provided data regarding skeletal muscle mass, or if they had missing values. After exclusion, this study enrolled 5,563 participants. The participants’ characteristics are summarized in Table 1.
Table 1 . Participant characteristics.
Characteristics | Normal (n = 4,394) | Sar (n = 1,169) | SO (n = 538) | SnO (n = 631) |
---|---|---|---|---|
Sex (male/female) | 1,874/2,520 | 519/650 | 48/490 | 471/160 |
Age (y) | 68.62 (5.85) | 71.18 (6.15) | 70.84 (6.20) | 71.48 (6.10) |
Height (cm) | 157.82 (8.93) | 156.74 (8.82) | 152.33 (6.63) | 160.51 (8.72) |
Weight (kg) | 61.36 (9.65) | 51.56 (7.48) | 52.23 (6.47) | 50.98 (8.21) |
BMI (kg/m2) | 24.58 (2.94) | 20.97 (2.47) | 22.48 (2.07) | 19.69 (2.02) |
Body fat (%) | 29.25 (7.82) | 28.78 (8.60) | 36.54 (4.16) | 22.16 (5.22) |
Values are presented as number only or mean (standard deviation). Sar, sarcopenia; SO, sarcopenic obesity; SnO, sarcopenia without obesity; BMI, body mass index..
The criteria for sarcopenia followed the approach of the 1998 New Mexico Elder Health Survey. Whole-body dual x-ray absorptiometry (QDR 4500A; Hologic, Waltham, MA, USA) was used to measure body lean mass and the appendicular skeletal muscle mass (ASM) was defined the sum of the lean masses of the legs and arms. The skeletal muscle mass index was calculated using the formula ASM (kg)/square of height (m2). Sarcopenia was defined in accordance with Asia Working Group for Sarcopenia criteria (<5.4 kg/m2 in women and <7.0 kg/m2 in men).
The criterion selected for determination of obesity was a body fat ratio of 30%, because the body mass index (BMI) reflects both muscle and body fat [19]. The sarcopenia and normal groups were classified based on skeletal muscle mass index, with 1,169 and 4,394 participants, respectively. The sarcopenic obesity group comprised 538 participants with sarcopenia and a body fat ratio ≥ 30%; the sarcopenic without obesity group comprised 631 participants with sarcopenia and a body fat ratio < 30% (Figure 1).
Data were collected regarding 865 variables for 5,563 KNHANES participants. Variables for which more than 300 individuals did not respond were excluded; participants who did not respond to the remaining 345 variables were also excluded. The variables removed are ophthalmologic disease, dental disease, childhood-related variables and the variables included in this study are summarized in Table 2.
Table 2. Survey variables used in machine learning.
Survey classification | Survey category | Survey items |
---|---|---|
Health survey | General information | Sex, age |
Health survey | Education | Academic background |
Health survey | Economic information | Economic status, reasons for non-employment, employment form, position in the profession, occupation, the longest occupation |
Health survey | Quality of life | Subjective health recognition, EQ-5D (mobility, self-care, usual activities, pain/discomfort, and anxiety/depression) |
Health survey | Morbidity | Affected disease in the last 2 weeks, morbidity by 38 chronic diseases |
Health survey | Activity restrictions | Whether activity is restricted, reason for limiting activity |
Health survey | Smoking | Lifelong smoking, currently smoking, past smoking, no smoking, second-hand smoke |
Health survey | Drinking | Lifelong drinking, drinking start age, drinking frequency, alcohol consumption, binge drinking frequency |
Health survey | Physical activity | Vigorous physical activity, moderate physical activity, walking practice, strength training, flexibility exercise |
Health survey | Mental health | Sleep time, stress perception, depressive symptoms, suicide, mental problem counseling |
Health survey | Obesity and weight control | Subjective body recognition, weight control, weight control |
Examination survey | Body measurement | Height, weight, waist circumference |
Examination survey | Blood pressure and pulse | Systolic blood pressure, diastolic blood pressure, pulse rate |
Examination survey | Respiratory examination | Cough, phlegm, chest pain, difficulty breathing, fever |
During data processing, negative responses (defined as 0 [none] and 88 or 888 [not applicable]) were unified as 0. To confirm risk factors and build prediction models, the following groups were compared: (1) sarcopenia and normal, (2) sarcopenic obesity and normal, and (3) sarcopenia without obesity and normal. Participant numbers in each group were matched by undersampling.
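The two preprocessing steps described above, recoding "not applicable" codes to 0 and undersampling the larger group, can be sketched as follows. This is an illustration under assumptions rather than the authors' pipeline: the column names (`strength_training`, `waist_cm`, `group`) and helper names are hypothetical, and the recoding is restricted to an explicit list of coded columns so that a genuine measurement of 88 is not overwritten.

```python
import random
from collections import defaultdict

NEGATIVE_CODES = {88, 888}  # "not applicable" codes named in the text


def unify_negative(rows, coded_columns):
    """Return copies of rows with 88/888 recoded to 0 in the listed columns only."""
    cleaned = []
    for row in rows:
        new_row = dict(row)
        for col in coded_columns:
            if new_row.get(col) in NEGATIVE_CODES:
                new_row[col] = 0
        cleaned.append(new_row)
    return cleaned


def undersample(rows, label_key, seed=0):
    """Randomly drop rows from larger classes until every class matches the smallest."""
    by_label = defaultdict(list)
    for row in rows:
        by_label[row[label_key]].append(row)
    n_min = min(len(group) for group in by_label.values())
    rng = random.Random(seed)
    balanced = []
    for group in by_label.values():
        balanced.extend(rng.sample(group, n_min))
    return balanced


# Toy records (hypothetical columns and values)
rows = [
    {"group": "normal", "strength_training": 88, "waist_cm": 88},
    {"group": "normal", "strength_training": 0, "waist_cm": 91},
    {"group": "normal", "strength_training": 3, "waist_cm": 84},
    {"group": "sarcopenia", "strength_training": 888, "waist_cm": 79},
]
cleaned = unify_negative(rows, coded_columns=["strength_training"])
balanced = undersample(cleaned, label_key="group")
```

Undersampling discards data from the larger (normal) group, so the matched sample is smaller than the full dataset; the seed makes the random draw repeatable.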
2) Decision tree
A decision tree is an analysis method that charts decision rules and categorizes groups of interest into several groups or makes predictions. Popular splitting criteria are the Gini and entropy indexes [20]. This study used the classification and regression trees algorithm to perform separation using the Gini index. A classification tree selects the predictor variables that maximize the reduction in impurity (minimizing the Gini index for categorical variables and the expected sum of variances for continuous variables as a node impurity criterion).
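To make the Gini criterion concrete, the sketch below computes the Gini impurity of a node, $1 - \sum_k p_k^2$, and the impurity reduction obtained by splitting on a threshold. It is a simplified illustration of one split search, not the full classification and regression trees algorithm; the BMI values and labels are toy data, and the function names are assumptions.

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def impurity_reduction(values, labels, threshold):
    """Parent impurity minus the size-weighted impurity of the two children."""
    left = [y for x, y in zip(values, labels) if x < threshold]
    right = [y for x, y in zip(values, labels) if x >= threshold]
    n = len(labels)
    weighted = len(left) / n * gini(left) + len(right) / n * gini(right)
    return gini(labels) - weighted


def best_threshold(values, labels):
    """Try midpoints between sorted unique values; keep the largest reduction."""
    xs = sorted(set(values))
    candidates = [(a + b) / 2 for a, b in zip(xs, xs[1:])]
    return max(candidates, key=lambda t: impurity_reduction(values, labels, t))


# Toy data: BMI values with group labels
bmi = [19.5, 20.8, 21.0, 24.1, 25.3, 26.0]
label = ["sarcopenia"] * 3 + ["normal"] * 3
threshold = best_threshold(bmi, label)
print(threshold, impurity_reduction(bmi, label, threshold))
```

In this toy case the two classes separate perfectly, so the chosen split removes all of the parent node's impurity.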
3) Random forest
Random forest is an ensemble technique that generates many classification trees by randomly selecting subsets of the given data and of the predictor variables, and then integrates the results of all trees [20]. The top seven variables were selected as candidate risk factors by calculating the mean decrease in accuracy and the mean decrease in the Gini index from the random forest. Variables in the top seven were excluded if the combined score of the mean decrease in accuracy and mean decrease in Gini index exceeded one standard deviation from the mean.
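The idea behind the mean decrease in accuracy can be illustrated with a permutation test: shuffle one predictor, re-score the model, and record how much accuracy drops. The sketch below is a simplification under stated assumptions. Random forest implementations typically compute this quantity on out-of-bag samples of a fitted forest, whereas here a hand-written rule (`predict`) stands in for the model and the data are toy values. It does not reproduce the study's top-seven selection or its one-standard-deviation exclusion rule.

```python
import random


def accuracy(predict, X, y):
    """Fraction of rows for which the prediction equals the label."""
    return sum(1 for row, label in zip(X, y) if predict(row) == label) / len(y)


def mean_decrease_accuracy(predict, X, y, col, n_repeats=20, seed=0):
    """Average drop in accuracy after shuffling one column of X."""
    rng = random.Random(seed)
    baseline = accuracy(predict, X, y)
    drops = []
    for _ in range(n_repeats):
        shuffled = [row[col] for row in X]
        rng.shuffle(shuffled)
        X_perm = [row[:col] + [v] + row[col + 1:] for row, v in zip(X, shuffled)]
        drops.append(baseline - accuracy(predict, X_perm, y))
    return sum(drops) / n_repeats


def predict(row):
    """Stand-in classifier (an assumption): uses only column 0 (BMI)."""
    return 1 if row[0] < 22.0 else 0


# Toy rows: [BMI, sleep hours]; label 1 = sarcopenia, 0 = normal
X = [[19.5, 7], [20.8, 6], [21.0, 8], [24.1, 7], [25.3, 6], [26.0, 8]]
y = [1, 1, 1, 0, 0, 0]
names = ["bmi", "sleep_hours"]
scores = {name: mean_decrease_accuracy(predict, X, y, col) for col, name in enumerate(names)}
ranked = sorted(scores, key=scores.get, reverse=True)
print(ranked, scores)
```

A predictor the model never uses shows no drop in accuracy when shuffled, which is why it ranks below the predictor the rule depends on.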
4) Model evaluation
The accuracy, sensitivity, and specificity of six models (a decision tree model and a random forest model for each of the three comparisons) were calculated. Sensitivity is the probability of correctly identifying people with a disease, while specificity is the probability of correctly identifying people without the disease. The predicted class probabilities for each model’s test set were assessed with receiver operating characteristic (ROC) curve analysis to determine the reliability of the final model. The ROC curve is a graph that shows the trade-off between sensitivity and (1–specificity) across a series of cutoff points. The area under the ROC curve (AUC) is considered an effective measure of the classification performance of the model.
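The evaluation measures defined above can be computed directly from labels and predicted probabilities. The following is a minimal sketch with toy data; the function names are assumptions. The AUC is computed through its rank interpretation, the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, which equals the area under the ROC curve.

```python
def confusion_counts(y_true, y_pred):
    """Return (TP, TN, FP, FN) for binary labels coded 1 = disease, 0 = no disease."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn


def evaluate(y_true, y_pred):
    """Accuracy, sensitivity (TP rate), and specificity (TN rate)."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }


def auc(y_true, scores):
    """Share of positive/negative pairs ranked correctly; ties count as half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy test set: true labels and predicted probabilities of disease
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.6, 0.3, 0.7, 0.4, 0.2, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # a single cutoff at 0.5
metrics = evaluate(y_true, y_pred)
print(metrics, auc(y_true, scores))
```

Accuracy, sensitivity, and specificity depend on the chosen cutoff (0.5 here), whereas the AUC summarizes performance across all cutoffs.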
Statistical analyses were performed using RStudio (ver. 1.2.5042; RStudio, Boston, MA, USA). The dataset was split into a 75% training set and a 25% test set for each comparative analysis. Decision tree and random forest models have good prediction capabilities, but have been criticized for overfitting. Therefore, 5-fold cross validation was used to evaluate the performances of the decision tree and random forest models.
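The 75%/25% split and 5-fold cross-validation described above can be sketched with index lists. This is an illustration under assumptions: the text does not state whether cross-validation was run on the training set or on the full dataset, so the sketch applies it to the training indices, and the sample size of 1,000 and the function names are arbitrary choices.

```python
import random


def train_test_split_indices(n, test_fraction=0.25, seed=0):
    """Shuffle indices 0..n-1 and hold out a fraction as the test set."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    n_test = round(n * test_fraction)
    return indices[n_test:], indices[:n_test]


def k_fold_indices(indices, k=5):
    """Yield (train, validation) index lists; each index is validated exactly once."""
    parts = [indices[i::k] for i in range(k)]
    for i, validation in enumerate(parts):
        train = [j for m, part in enumerate(parts) if m != i for j in part]
        yield train, validation


train_idx, test_idx = train_test_split_indices(n=1000)
folds = list(k_fold_indices(train_idx, k=5))
for fold_train, fold_val in folds:
    print(len(fold_train), len(fold_val))
```

Keeping the test indices out of every fold means the held-out 25% is never seen during model fitting or tuning, which is the point of guarding against overfitting.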
The risk factors selected for sarcopenia using the decision tree method were BMI, duration of moderate physical activity, and the EuroQol 5-dimension descriptive system (EQ-5D) (Figure 2). Moderate physical activity is defined as “a little more difficult than usual or causing some breathlessness”. Participants recorded the duration of moderate physical activity for 1 week. EQ-5D is the preferred preference-based measure for health technology evaluation, with values ranging from 0 (death) to 1 (perfect health). The risk factors selected for sarcopenia using the random forest method were BMI, waist circumference, age, subjective body recognition (very thin, thin, moderate, slightly plump, and very fat), duration of moderate physical activity, and weight control for 1 year (weight loss, weight retention, weight gain, and none).
The sarcopenia decision tree model had an accuracy (95% confidence interval [CI]) of 76.2% (74.6–77.8), sensitivity (95% CI) of 77.6% (74.5–80.7), specificity (95% CI) of 74.8% (72.3–77.3), and AUC of 0.81. The sarcopenia random forest model had an accuracy (95% CI) of 75.4% (73.9–76.8), sensitivity (95% CI) of 72.2% (68.9–75.5), specificity (95% CI) of 78.6% (77.1–80.1), and AUC of 0.83.
The risk factors selected for sarcopenic obesity using the decision tree method were sex, BMI, duration of moderate physical activity, and occupational classification (Figure 3). Occupational classification is defined as “a position in one’s profession” and was coded as follows: 1, administrator or expert; 2, office worker; 3, service and sales representative; 4, agriculture, fisheries, and proficiency worker; 5, functional personnel and device and machine operation and assembly personnel; 6, simple labor worker; 7, unemployed (housewives, students, etc.). The risk factors selected for sarcopenic obesity using the random forest method were BMI, waist circumference, sex, reasons for non-employment (unnecessary, studying, retirement, health problems, searching for a job, and parenting), occupation (manager, expert, office worker, service worker, agriculture and fisheries worker, mechanic, simple laborer, and military), sleep time, and duration of moderate physical activity.
The sarcopenic obesity decision tree model had an accuracy (95% CI) of 73.6% (72.3–74.9), sensitivity (95% CI) of 76.8% (72.9–81.1), specificity (95% CI) of 70.6% (66–75.2), and AUC of 0.78. The random forest model of sarcopenic obesity had an accuracy (95% CI) of 74% (72–76), sensitivity (95% CI) of 68% (65.2–70.8), specificity (95% CI) of 79.8% (75.6–84), and AUC of 0.80.
The risk factors selected for sarcopenia without obesity using the decision tree method were BMI, sex, and duration of moderate physical activity (Figure 4). The risk factors selected for sarcopenia without obesity using the random forest method were BMI, waist circumference, sex, subjective body recognition, and weight control for 1 year.
The sarcopenia without obesity decision tree model had an accuracy (95% CI) of 87.8% (86.7–88.9), sensitivity (95% CI) of 85.4% (84–86.7), specificity (95% CI) of 89.8% (87.5–92.1), and AUC of 0.9. The sarcopenia without obesity random forest model had an accuracy (95% CI) of 87.6% (86.6–88.6), sensitivity (95% CI) of 83.4% (82.4–84.4), specificity (95% CI) of 92% (90.4–93.6), and AUC of 0.93. The ROC curves for all models are shown in Figure 5.
In this study, we developed prediction models based on cross-sectional data to identify risk factors for sarcopenia, sarcopenic obesity, and sarcopenia without obesity, using decision tree and random forest models.
Machine learning has often been used to predict diseases and verify their risk factors. The decision tree classification model is a preferred method because it allows researchers to confirm the resulting model and cutoff values, thus justifying the selection of risk factors. A random forest creates multiple classification trees, each of which is trained on bootstrap samples of the original training data and determines the segmentation by searching for a subset of randomly selected input variables. Random forest is a popular method because it creates several classification trees consisting of randomly selected factors and aggregates them to identify risk factors; all factors can be considered.
The overall risk factors for sarcopenia resulting from the decision tree and random forest models are BMI and duration of moderate physical activity. BMI is associated with weight and predicts skeletal muscle mass [21]. A previous study found that BMI was an important risk factor in logistic regression, support vector machine, gradient boosting, and random forest models [17]. We found that BMI was an important predictor of sarcopenia, consistent with previous studies.
Physical activity is an important risk factor for sarcopenia [7,22]. Older adults have a greater reduction in muscle mass than younger adults when physical activity is reduced by the same amount, especially with respect to fast-twitch type Ⅱ muscle fibers, which are faster and contract more strongly than type Ⅰ muscle fibers [23]. This difference greatly influences the effects of sarcopenia in older adults; continuous physical activity is needed to stimulate muscles for healthy aging.
The overall risk factors of sarcopenic obesity determined by the two models were BMI, sex, and duration of moderate physical activity. Notably, BMI and duration of moderate physical activity were also risk factors for sarcopenia. While the other models chose BMI as the most important predictor, the decision tree model of sarcopenic obesity chose sex as the most important predictor; male sex had 82% accuracy for the normal group. This was similar to the results of a study that indicated a high prevalence of sarcopenic obesity in menopausal women due to an age-related reduction in sex hormones [24].
The risk factors for sarcopenia without obesity from the two models were BMI and sex. The prediction models of sarcopenia without obesity had fewer risk factors, but performed better than the other models. The decision tree and random forest models performed well, with AUCs of 0.9 and 0.93, respectively; thus, they are valuable for assessing whether older adults exhibit sarcopenia without obesity.
The accuracy, sensitivity, specificity, and AUC of each model were determined. The prediction models of sarcopenia had acceptable accuracy (decision tree, 76.2%; random forest, 75.4%), sensitivity (77.6% and 72.2%), and specificity (74.8% and 78.6%). The prediction models of sarcopenic obesity exhibited the worst performance, although their accuracy (73.6% and 74%), sensitivity (76.8% and 68%), and specificity (70.6% and 79.8%) remained acceptable. The sarcopenia without obesity prediction models exhibited the best performance, with high accuracy (87.8% and 87.6%), sensitivity (85.4% and 83.4%), and specificity (89.8% and 92%). Given that X-ray absorptiometry is essential for diagnosis of sarcopenia, our prediction models may be very useful because they use readily available variables.
Using machine learning, this study identified the major risk factors for sarcopenia, sarcopenic obesity, and sarcopenia without obesity, and built and evaluated a prediction model for each. A previous study that used machine learning to examine nutrition-related risk factors reported an AUC of around 0.80 [17]. In this study, the AUCs of the sarcopenia classification models were slightly higher, at 0.81 (decision tree) and 0.83 (random forest). We found that BMI and duration of moderate physical activity are associated with sarcopenia and might be informative predictors in the primary care setting.
This study had several limitations. First, it did not include assessments of muscle strength and function (e.g., gait speed). Recently, the European Working Group on Sarcopenia in Older People suggested that muscle mass, strength, and function should all be considered during diagnosis of sarcopenia. Unfortunately, muscle strength and function were not included as variables in the 2008–2011 KNHANES. Further studies using new sarcopenia criteria are needed. Second, this study was based on cross-sectional data; thus, the causal effects of risk factors on sarcopenia could not be confirmed. Further studies using longitudinal or cohort data are necessary to examine these risk factors.
This machine learning study confirmed that risk factors derived from population data can predict sarcopenia-related diseases. BMI is an important risk factor for sarcopenia-related diseases. Older adults with low BMI need to be careful to prevent sarcopenia. Moderate-intensity physical exercise prevents sarcopenia and sarcopenic obesity. More special care is needed to prevent sarcopenic obesity among older women. Our results suggest that the risk factors identified in this study can help develop effective sarcopenia-related disease prevention and intervention programs.
This study was supported by the “Brain Korea 21 FOUR Project”, the Korean Research Foundation for Department of Physical Therapy in the Graduate School of Yonsei University.
No potential conflict of interest relevant to this article was reported.
Conceptualization: SK, CY. Data curation: SK, SY. Formal analysis: SK, JL. Investigation: SK. Methodology: SK, JL. Project administration: SK. Resources: SK, CY, JL. Software: SK. Supervision: SK, CY. Visualization: SK. Writing - original draft: SK. Writing - review & editing: SK, CY, JL.
Table 1. Participant characteristics.
Characteristics | Normal (n = 4,394) | Sar (n = 1,169) | SO (n = 538) | SnO (n = 631) |
---|---|---|---|---|
Sex (male/female) | 1,874/2,520 | 519/650 | 48/490 | 471/160 |
Age (y) | 68.62 (5.85) | 71.18 (6.15) | 70.84 (6.20) | 71.48 (6.10) |
Height (cm) | 157.82 (8.93) | 156.74 (8.82) | 152.33 (6.63) | 160.51 (8.72) |
Weight (kg) | 61.36 (9.65) | 51.56 (7.48) | 52.23 (6.47) | 50.98 (8.21) |
BMI (kg/m²) | 24.58 (2.94) | 20.97 (2.47) | 22.48 (2.07) | 19.69 (2.02) |
Body fat (%) | 29.25 (7.82) | 28.78 (8.60) | 36.54 (4.16) | 22.16 (5.22) |
Values are presented as number only or mean (standard deviation). Sar, sarcopenia; SO, sarcopenic obesity; SnO, sarcopenia without obesity; BMI, body mass index.