Phys. Ther. Korea 2020; 27(4): 233-240
Published online November 20, 2020
© Korean Research Society of Physical Therapy
Department of Physical Therapy, College of Health and Welfare, Woosong University, Daejeon, Korea
Background: Cross-culturally adapted questionnaires may not be comparable to their original version.
Objects: To examine the concurrent validity of two health-related quality of life (HRQOL) instruments, the Korean versions of the EuroQOL-5 Dimension (EQ-5D) and the abbreviated version of the World Health Organization Quality of Life (WHOQOL-BREF) instrument.
Methods: A total of 139 cancer survivors from two rehabilitation institutes were recruited. All participants were registered for palliative rehabilitation care. Both instruments were concurrently administered by health care providers following the second bout of rehabilitation care. The Rasch partial credit model and Spearman’s correlation analysis were used to investigate: 1) dimensionality, 2) hierarchical item difficulty, and 3) concurrent validity using correlations between the two instruments.
Results: For the WHOQOL-BREF, all items except negative feeling, pain, and dependence of medical aid were found to be acceptable, while all items of the EQ-5D were acceptable. There was evidence of negative correlations between the EQ-5D and the four domains of the WHOQOL-BREF. One correlation was strong (EQ-5D vs. physical health domain, ρ = –0.610, 95% CI = –0.716 to –0.475) and one was moderate (EQ-5D vs. psychosocial domain, ρ = –0.402, 95% CI = –0.546 to –0.236). The other two correlations were weak (EQ-5D vs. social relationship and environmental domains, ρ = –0.242, 95% CI = –0.401 to –0.075 and ρ = –0.364, 95% CI = –0.514 to –0.207, respectively). Item difficulty calibrations of the two measurements ranged from –0.84 to 0.86 for the EQ-5D and from –1.07 to 1.06 for the WHOQOL-BREF.
Conclusion: The study provides some support for the concurrent validity of the two Korean versions of the HRQOL instruments, with evidence of weak to strong correlations between the EQ-5D and the four domains of the WHOQOL-BREF applied to various cancer survivors. Additionally, the cancer survivors tended to view the EQ-5D items as slightly more challenging than the WHOQOL-BREF items.
Keywords: Cancer survivors, Palliative care, Patient outcome assessment, Quality of life
Palliative care for cancer survivors is now practiced in many clinical settings across the world and generally focuses on their adaptation to such overwhelming circumstances [1-4]. This adaptation includes sustained efforts, such as palliative rehabilitation care (PRC), to have a positive impact on health-related quality of life (HRQOL) in the course of terminal illness. In fact, recent evidence has also shown that PRC enhances the quality of life (QOL) of cancer survivors [5,6]. It is essential to determine how PRC impacts the HRQOL of the survivors and how to optimally measure variations in HRQOL status over time. The issue becomes critical when creating cross-culturally adapted versions of an HRQOL instrument, because transposing an instrument from its original cultural context is commonly unsuccessful owing to cultural distinctions [7-9].
Among the almost endless array of cross-culturally adapted patient-reported outcome (PRO) measures used to determine the impact of cancer-related conditions on HRQOL status, the abbreviated version of the World Health Organization Quality of Life (WHOQOL-BREF), developed by the World Health Organization Quality of Life (WHOQOL) Group, is the most widely accepted generic assessment for this purpose [10-14]. The WHOQOL-BREF addresses the need for a genuinely international measure of QOL and the holistic aspects of an individual’s well-being, and it was originally designed for various populations with a wide range of HRQOL status [12,15,16]. The WHOQOL-BREF contains a total of 26 test items, with two items for general QOL/satisfaction with health and 24 items measuring four domains (i.e., physical, psychological, social relationships, and environmental). The EuroQOL-5 Dimension (EQ-5D) is also well known as a reliable and valid measure with fewer items than other HRQOL measures [17-20]. These generic PRO measures of HRQOL have been shown to be reliable and valid. Although they are not primarily designed to estimate patients’ views of their HRQOL status under specific conditions such as cancer, the WHOQOL-BREF and EQ-5D have been applied to cancer survivor groups and cross-culturally adapted into many other languages [8,10,21,22]. Although there is little consensus on how to maintain the psychometric properties of the original versions when creating cross-culturally adapted versions, these two instruments are increasingly being used in other disease groups [23-26]. While the instruments were originally designed to approximate HRQOL status across cultures, adapted versions often fail to maintain optimal psychometric properties such as dimensionality [8,23,24,26,27].
Most, if not all, classical test theory (CTT)-based instruments typically produce ceiling or floor effects when imprecisely applied to various disease populations [27,28]. Ceiling effects commonly occur when measures are not challenging enough for the respondents’ capability. For example, if the items of an HRQOL instrument are too easy for cancer survivors, the outcome scores will not increase for survivors who may have improved with palliative care. Consequently, this type of problem often results in type II errors. That is, the false-negative rate is large among scores at the upper extreme of the HRQOL measure. Likewise, floor effects may occur when the items of an HRQOL measure are too challenging for a group of cancer survivors. In general, such questionnaires lack adequate breadth for the trait being measured and the capability to capture any increment in health-related status [27,29]. Moreover, these psychometric deficits can arise from any mismatch between instrument items and test takers. For example, when challenging items of the HRQOL instrument are administered to cancer survivors with low HRQOL status, or vice versa (i.e., when item difficulty does not closely match the respondents’ level of HRQOL), such errors may result. Several authors have called for a methodology to overcome these psychometric deficits [26,27,31]. They provide convincing arguments for item-level investigation, which addresses those limitations by directly scrutinizing individual items rather than the instrument as a whole.
To overcome the drawbacks of existing CTT-based measures, many authors have encouraged the use of item response theory (IRT) in evaluating and developing HRQOL measures for cancer-related investigations. This methodology has increasingly been used to develop and validate versions of HRQOL instruments [32-37]. In contrast to CTT-based measurement, which primarily focuses on the instrument as a whole, the IRT model focuses on individual items within the instrument. Through item-level exploration of the measurement qualities of HRQOL measures, one can estimate the probability that a cancer survivor will select a particular rating on an item. For example, cancer survivors with low HRQOL status would be likely to endorse ‘not at all’ on less challenging items, while cancer survivors with higher HRQOL status would be likely to select ‘very much’ on more challenging items. The item difficulties produced by item-level analysis (i.e., Rasch analysis, a one-parameter IRT model) can serve as a means of validating the structure of the measures.
The purposes of this study were to investigate: 1) dimensionality, 2) the hierarchical item difficulty continuum, and 3) comparisons between the Korean versions of the WHOQOL-BREF and EQ-5D applied to various cancer survivors undergoing palliative rehabilitation care.
The Korean version of the WHOQOL-BREF used in the present study was validated by Min et al. Twenty-six items are rated on a five-point frequency-of-experience rating scale ranging from 1 (not at all) to 5 (completely) with respect to the past two weeks following the palliative rehabilitation care provided by the institutions. Since higher scores indicate higher HRQOL status, items 3, 4, and 26, which are negatively phrased, are reverse scored. For example, item 3, ‘to what extent do you feel that physical pain prevents you from doing what you need to do’, indicates lower HRQOL status when rated 5. Thus, a higher raw rating on those three items indicates lower HRQOL status for the survivors. The original WHOQOL-BREF groups 24 of its items into four domains: 1) physical health (P), 2) psychosocial (Psy), 3) social relationship (S), and 4) environmental domain (E).
The Korean version of the EQ-5D used in the study was validated by Kim et al. The EQ-5D has five domains of HRQOL: 1) mobility, 2) self-care, 3) usual activities, 4) pain/discomfort, and 5) depression/anxiety. Each of the five items, one per domain, is rated on five categories of severity. The score can be converted into an index score by applying health preference weights elicited from a general population. This index score, ranging from 0 to 1, can provide insight into the status of HRQOL. For direct comparison with the four domain scores of the WHOQOL-BREF, the total raw score of the EQ-5D was used. Thus, higher score indicates higher status of HRQOL. The two Korean versions of the WHOQOL-BREF and EQ-5D were concurrently administered upon the last entry of the second bout of palliative rehabilitation care at the institutions (i.e., the WHOQOL-BREF was administered immediately after the EQ-5D in the questionnaire set).
A total of 142 subjects were recruited from April 16, 2018 to February 20, 2020 at two institutions in Busan and Daejeon, Republic of Korea. All appropriate cancer survivors undergoing palliative rehabilitation care at the institutions were given detailed information about the study, including informed consent, conflicts of interest, and the assurance that the quality of their current palliative rehabilitation care would not be affected by whether they participated in the study. The survivors were eligible if they had been diagnosed with potentially curable cancer conditions within the last year. All 142 subjects initially joined and signed informed consent for the present study. Answers from three subjects were excluded from the analyses due to incomplete data. Of the 139 subjects who completed both measures, 42.4% were male (n = 59) and 57.6% were female (n = 80), with a mean age of 61.8 years (range, 35 to 88 years). Breast cancer accounted for 30.2% of the subjects (n = 42), and the remaining 69.8% (n = 97) were diagnosed with various other cancers (i.e., cancers of the colon, lung, stomach, pancreas, kidney, liver, ovary, lymphatic system, tonsil, bile duct, and brain). The study was approved by the Institutional Review Board of College of Health and Welfare, Woosong University (approval No. 1041549-190114-SB-70).
Scores were analyzed with the Winsteps software program version 3.57.2 (Winsteps.com, Chicago, IL, USA) using the partial credit model for 1) fit statistics for dimensionality and 2) item difficulty calibrations. The goodness of fit for the Rasch model application describes how well a particular item fits the Rasch model. The criteria for fit statistics followed Bond and Fox’s suggestion, under which a mean square (MnSq) ≥ 1.4 or ≤ 0.6 indicates misfit. An item whose fit statistics were out of this range was considered a misfit, which is an indication that responses to the particular item, or from a particular survivor, may have been unexpected.
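The misfit screening described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `is_misfit` and `flag_items` and the example values are hypothetical and are not taken from the study's tables, the sketch assumes the criterion is applied to both infit and outfit mean squares, and it treats values exactly equal to 0.6 or 1.4 as acceptable, which the text does not settle.

```python
def is_misfit(mnsq, lower=0.6, upper=1.4):
    """Return True when a mean square lies outside the acceptable range.

    Assumption: the boundary values themselves count as acceptable.
    """
    return mnsq < lower or mnsq > upper


def flag_items(fit_table, lower=0.6, upper=1.4):
    """Return labels of items whose infit or outfit mean square is a misfit.

    fit_table maps an item label to a tuple (infit_mnsq, outfit_mnsq).
    """
    return [
        label
        for label, (infit, outfit) in fit_table.items()
        if is_misfit(infit, lower, upper) or is_misfit(outfit, lower, upper)
    ]


# Hypothetical values for illustration only (not the study's results).
example = {
    "item A": (0.95, 1.02),
    "item B": (1.55, 1.71),
    "item C": (0.52, 0.58),
}
print(flag_items(example))
```

In practice Winsteps reports these statistics directly, so a helper like this would only be used to screen an exported item table.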
The Rasch model transforms raw scores into estimates of person ability (i.e., levels of HRQOL) and item difficulty (i.e., more or less challenging) on a log-odds unit scale (i.e., logit). Logits are logarithmic transformations of item and person scores into interval scales, which are based on the ratio of the probability of success over failure on an item at a particular category response. Items that are of greater challenge receive higher calibrations, while items that are of less challenge receive lower calibrations. By estimating the probability of selecting a particular rating for an individual item, the Rasch model yields invariant item difficulty calibrations. As test items are presented in order of item difficulty, one can logically expect that easy items pose less of a challenge, while difficult items pose more of a challenge. Hence, either health care professionals or the survivors can get a general sense of which domain of the HRQOL instrument a survivor is having difficulty with. This logical decision-making procedure can be taken into consideration in the process of palliative rehabilitation care.
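To make the logit scale concrete, the following Python sketch computes category probabilities for a single item under the standard partial credit model. It is a minimal illustration, not the estimation routine used by Winsteps: the function names and the step difficulties in `steps` are hypothetical, and the person measure `theta` is assumed to be already known rather than estimated from data.

```python
import math


def pcm_probabilities(theta, thresholds):
    """Category probabilities for one item under the partial credit model.

    theta: person measure in logits.
    thresholds: step difficulties in logits, one per step between categories.
    Returns a list of len(thresholds) + 1 probabilities that sum to 1.
    """
    cumulative = [0.0]  # the lowest category has an empty sum
    for delta in thresholds:
        cumulative.append(cumulative[-1] + (theta - delta))
    peak = max(cumulative)  # subtract the maximum for numerical stability
    weights = [math.exp(c - peak) for c in cumulative]
    total = sum(weights)
    return [w / total for w in weights]


def logit(p):
    """Log-odds of a probability p (0 < p < 1)."""
    return math.log(p / (1.0 - p))


# Hypothetical five-category item with four step difficulties (in logits).
steps = [-1.5, -0.5, 0.5, 1.5]
for theta in (-2.0, 0.0, 2.0):
    probs = pcm_probabilities(theta, steps)
    print(theta, [round(p, 3) for p in probs])
```

Under this model the log-odds of choosing one category over the category just below it equals the person measure minus the corresponding step difficulty, which is the sense in which logits rest on the ratio of success to failure at a particular category response.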
IBM SPSS Statistics version 25 was used to compute descriptive statistics and Spearman’s correlation analysis. Spearman’s rank test was used to estimate correlation coefficients (Spearman’s rho, ρ) between the scores of the EQ-5D and the four domains of the WHOQOL-BREF. The correlation analysis examined the concurrent validity of the two translated versions of the HRQOL instruments, using data from cancer survivors in the palliative rehabilitation program, in order to understand their potential use in the Korean context. The 95% confidence intervals (CIs) were estimated by the bootstrapping method with 1,000 iterations. The degree of correlation was interpreted as: 0–0.19 = very weak, 0.20–0.39 = weak, 0.40–0.59 = moderate, 0.60–0.79 = strong, and 0.80–1.0 = very strong relationship.
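The correlation procedure above can be illustrated with a small, dependency-free Python sketch. It is not the SPSS routine: the bootstrap here is a simple percentile bootstrap, which is an assumption because the text does not say which bootstrap interval was used, and all function names and the example scores are hypothetical. The `strength` helper applies the interpretation bands listed above to the absolute value of ρ.

```python
import random


def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def spearman_rho(x, y):
    """Spearman's rho as the Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def bootstrap_ci(x, y, iterations=1000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for rho (assumed method, see lead-in)."""
    rng = random.Random(seed)
    n = len(x)
    estimates = []
    while len(estimates) < iterations:
        idx = [rng.randrange(n) for _ in range(n)]
        xs = [x[i] for i in idx]
        ys = [y[i] for i in idx]
        if len(set(xs)) < 2 or len(set(ys)) < 2:
            continue  # rho is undefined for a constant resample
        estimates.append(spearman_rho(xs, ys))
    estimates.sort()
    lower = estimates[int((alpha / 2) * iterations)]
    upper = estimates[int((1 - alpha / 2) * iterations) - 1]
    return lower, upper


def strength(rho):
    """Label |rho| using the interpretation bands given in the text."""
    size = abs(rho)
    if size < 0.20:
        return "very weak"
    if size < 0.40:
        return "weak"
    if size < 0.60:
        return "moderate"
    if size < 0.80:
        return "strong"
    return "very strong"


# Hypothetical scores for illustration only (not the study's data).
eq5d_total = [5, 7, 6, 9, 12, 8, 10, 11]
physical_domain = [30, 26, 27, 22, 15, 25, 19, 20]
rho = spearman_rho(eq5d_total, physical_domain)
print(round(rho, 3), strength(rho), bootstrap_ci(eq5d_total, physical_domain))
```

Applied to the reported coefficients, these bands label ρ = –0.610 as strong, ρ = –0.402 as moderate, and both ρ = –0.242 and ρ = –0.364 as weak.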
As an initial means of determining the dimensionality of the two HRQOL instruments, fit statistics were examined against the misfit criterion. Table 1 presents item measures (i.e., item difficulty calibrations), infit/outfit statistics, and standardized Z scores for the 24 items of the WHOQOL-BREF, excluding the two general items. All items except three (‘negative feeling’, ‘pain’, and ‘dependence of medical aids’) exhibited acceptable fit statistics (Table 1). All five items of the EQ-5D showed acceptable fit statistics (Table 2). Item difficulty calibrations of the two measurements ranged from –0.84 to 0.86 for the EQ-5D and from –1.07 to 1.06 for the WHOQOL-BREF.
Concurrent validity between the EQ-5D and the WHOQOL-BREF at a cross-sectional data point was assessed using scatterplots and Spearman’s rank test to estimate correlation coefficients (Spearman’s rho, ρ). Table 3 presents the correlation coefficients between the scores of the EQ-5D and the four domains (i.e., physical health, psychosocial, social relationship, and environmental domains) of the WHOQOL-BREF instrument. There was evidence of negative correlations between the EQ-5D and the four domains of the WHOQOL-BREF. One correlation was strong (EQ-5D vs. physical health domain, ρ = –0.610, 95% CI = –0.716 to –0.475) and one was moderate (EQ-5D vs. psychosocial domain, ρ = –0.402, 95% CI = –0.546 to –0.236). Two correlations were weak (EQ-5D vs. social relationship and environmental domains, ρ = –0.242, 95% CI = –0.401 to –0.075 and ρ = –0.364, 95% CI = –0.514 to –0.207, respectively). Correlations among the four domains of the WHOQOL-BREF were moderate to strong (0.477 to 0.705). All correlations were statistically significant at the 0.01 level. Scatter plots showing the relationship between the EQ-5D and three domains are presented in Figure 1.
The study provides some support for the concurrent validity of the two Korean versions of the HRQOL instruments, with evidence of weak to strong correlations between the EQ-5D and the four domains of the WHOQOL-BREF. Although there was evidence of a strong correlation between the two instruments (i.e., EQ-5D and physical health domain), correlations with the other domains ranged from moderate to weak. A surprising finding in the study was that the EQ-5D appeared to be negatively correlated with all four domains of the WHOQOL-BREF. Although these two instruments are based on the same underpinning theoretical framework, the two scores are negatively correlated. This negative correlation is of concern. Why is there a difference across the two instruments? Which instrument is more valid? Several reasons can be postulated for the two scores being negatively correlated. One likely reason is the use of inadequately translated versions of the instruments. Accordingly, a few authors have provided reasonable suggestions for appropriate procedures when developing cross-culturally adapted versions of instruments [24,28]. The procedures should include both qualitative and quantitative validation methods rather than a quantitative approach only [24,25]. There may also be psychosocial reasons for the survivors to respond inconsistently to the WHOQOL-BREF questions. Three items of the WHOQOL-BREF (i.e., the items on negative feeling, pain, and dependence of medical aid) are negatively phrased, while the EQ-5D questions are positively phrased. In fact, those three items showed erratic response patterns with high fit statistics. The negative correlations between these similar instruments in the study might have reflected inadequately translated phrases and psychosocial reasons for misinterpreting test items [28,29,34].
Rasch fit statistics, based on the probability of obtaining a given rating on an item, can be used to determine whether a particular item falls within the latent trait being measured (i.e., domain or construct). For fit statistic determination, the mean square value is the unstandardized form of the fit statistic, which is the mean of the squared residuals for the item. The residual values represent the difference between the theoretical expectation for the item estimated by the Rasch model and the item performance in the actual data. Thus, one can examine the dimensionality of the instrument. The fit statistic analysis showed acceptable results for the EQ-5D, while three items of the WHOQOL-BREF were out of range (i.e., MnSq < 0.6 or > 1.4 based on Bond and Fox’s suggestion). Those three items appeared to contribute to a dimensionality issue within the instrument as well as to the negative correlation between the EQ-5D and the four domain scores of the WHOQOL-BREF.
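The mean square idea described above can be sketched as follows, using the commonly cited definitions in which outfit is the unweighted mean of squared standardized residuals and infit is the variance-weighted version. This is an illustrative sketch rather than the Winsteps computation: the function name `fit_mean_squares` is hypothetical, and the model expectations and variances are assumed to be supplied by a previously fitted Rasch model.

```python
def fit_mean_squares(observed, expected, variance):
    """Return (infit, outfit) mean squares for one item.

    observed: ratings given by each respondent on the item.
    expected: model-expected rating for each respondent (assumed given).
    variance: model variance of the rating for each respondent (assumed given).
    """
    squared_residuals = [(x - e) ** 2 for x, e in zip(observed, expected)]
    # Outfit: plain mean of squared standardized residuals.
    outfit = sum(r / w for r, w in zip(squared_residuals, variance)) / len(observed)
    # Infit: squared residuals weighted by the model variance (information).
    infit = sum(squared_residuals) / sum(variance)
    return infit, outfit


# Hypothetical dichotomous example: responses agree with expectation on
# average, so both mean squares equal the model expectation.
infit, outfit = fit_mean_squares([1, 0, 1, 0], [0.5] * 4, [0.25] * 4)
print(infit, outfit)
```

Values near 1 mean that the observed residual variation matches what the model predicts, larger values indicate more erratic responses than expected, and smaller values indicate responses that are more predictable than expected.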
The Rasch model can also provide valuable insight into item difficulty, in that a cancer survivor’s response patterns to test items are presented in order of the item difficulty hierarchy. By inspecting the responses in relation to a cancer survivor’s HRQOL level, one can logically infer the next level of HRQOL. For example, the anxiety/depression item of the EQ-5D would logically be expected to be more challenging than the mobility item when a cancer survivor is having difficulty with mobility. Empirically, cancer survivors may not give higher ratings on mobility (i.e., the less challenging item) and lower ratings on anxiety/depression (i.e., the more challenging item). Therefore, one can get a general idea of a cancer survivor’s overall performance and the next level of HRQOL.
The study used data from an institution-based cohort of various cancer survivors recruited from urban areas in the Republic of Korea. Although the inferences that can be drawn from this type of observational study are limited by selection bias, a multicenter clinical database may improve data quality and help address the potential bias [41,42]. The study was originally designed to include four institutions in order to overcome this bias when validating the instruments. However, only two institutions actually participated, because only those two were operating active palliative rehabilitation programs. One limitation of the study is the limited sample size, which cannot produce a conclusive positive result for this validation study. However, as more clinically driven data accrue, future studies with an acceptable sample size can provide more insight into the validation of the psychometric properties of cross-culturally adapted versions of HRQOL measures. Additionally, the medical conditions of the various cancer survivors who participated in the study might have led to response bias, which affects the validity of structured interviews and questionnaires.
The study, using item-level analysis, provides some support for the concurrent validity of the cross-culturally adapted versions of the EQ-5D and WHOQOL-BREF. The correlations between the scores of the EQ-5D and the four domain scores of the WHOQOL-BREF ranged from strong to weak. However, there was evidence of a negative relationship between these two well-known HRQOL instruments. This finding highlights the importance of attending to 1) negatively phrased items and 2) the possibly inadequate translation of the phrases for those particular items (i.e., the negative feeling, pain, and dependence of medical aid items) within the instruments. Other than those three erratically answered items, all items of the EQ-5D and WHOQOL-BREF were acceptable with regard to fit statistics and the hierarchical order of item difficulty.
This research was supported by the 2020 Woosong University Academic Research Funding.
No potential conflict of interest relevant to this article was reported.