Open Access Repository for Measurement Instruments


Adapted German, Arabic and Dari Short-Form Health Survey (SF-12-ENSURE)

  • Author: Richter, L., Brust, O., & Menold, N.
  • In ZIS since: 2024
  • DOI: https://doi.org/10.6102/zis345
  • Abstract: The Short-Form Health Survey (SF-12) assesses physical health-related (PH) and mental health (MH)-related well-being. Its 13-item version, used in the Socio-Economic Panel (SOEP) 2016 (see S1), underwent adaptation, cognitive pretesting, revision, and validation for German, Arabic, and Dari utilizing two samples: a probability individual register sample (for German and Arabic) and a convenience sample recruited via Facebook (for Arabic and Dari). The survey targets the German-speaking population and individuals with a migration background or refugees proficient in Arabic or Dari aged 18 years or older. The two-factor structure of the revised SF-12, including a PH and MH dimension, could be supported by Multi Group Confirmatory Factor Analyses (MG-CFAs). Validity and reliability analyses support its use within each language group and in heterogeneous samples. Measurement invariance analyses showed metric and scalar invariance in the register sample. In the Facebook sample, metric invariance and partial scalar invariance are supported.
  • Language Documentation: English
  • Language Items: German, Arabic, Dari
  • Number of Items: 12
  • Survey Mode: Self-administered online survey
  • Reliability: PH: Raykov’s Rho = .76 to .88; MH: Raykov’s Rho = .87 to .90
  • Validity: Evidence for factorial, construct, and criterion validity
  • Construct: Physical and mental health-related well-being
  • Catchwords: well-being, health related quality of life, mental health, physical health
  • Item(s) used in Representative Survey: no
  • URL Website:

    pretesting report: https://pretest.gesis.org/pretestProjekt/University-of-Dresden-Translation-of-established-public-health-measurement-instruments-into-Arabic-and-Dari-%28ENSURE%29 %28English-Version%29 (Hadler, P., Nießen, D., Lenzner, T., Steins, P., Quint, F., & Neuert, C. (2021): Translation of established public health measurement instruments into Arabic and Dari (ENSURE) (English Version). Cognitive pretest. GESIS Project Reports. Version: 1.0. GESIS - Pretestlab. Text.)

    URL Data Archive: data available upon request

  • Status of Development: validated
    • Instruction

      No instruction is used for items 1 to 4. The instruction for items 5-12 is as follows: 'Bitte denken Sie nun an die letzten 4 Wochen. Wie oft kam es hier vor, dass Sie…' (English translation, for comprehension only: 'Please think back to the last 4 weeks. How often did it happen here that you...'). (For German items see Table 1).

       

      Items

      Table 1

      Items of the German-Language adaption of the SF-12 Version, used in the SOEP 2016

      | No. | Label | German | English translation (for comprehension only) | Polarity | Subscale |
      |---|---|---|---|---|---|
      | 1 | Overall physical | Wie ist ihr physischer, also Ihr körperlicher Gesundheitszustand im Allgemeinen? Würden Sie sagen er ist ... | How is your physical state of health in general? Would you say it is ... | + | Physical Health |
      | 2 | Overall mental | Wie ist ihr psychischer, also ihr seelischer oder emotionaler Gesundheitszustand im Allgemeinen? Würden Sie sagen er ist... | How is your psychological, i.e. your mental or emotional state of health in general? Would you say it is... | + | Mental Health |
      | 3 | Stairs | Wenn Sie innerhalb der letzten 4 Wochen Treppen gestiegen, also bereits wenige Stockwerke zu Fuß hochgegangen sind, in welchem Ausmaß waren Sie hierbei durch Ihren Gesundheitszustand beeinträchtigt? Wenn Sie innerhalb der letzten 4 Wochen keine Treppen gestiegen sind, kreuzen Sie bitte „trifft nicht zu“ an. | If you have climbed stairs in the last 4 weeks, i.e. walked up a few floors, to what extent were you affected by your state of health? If you have not climbed any stairs in the last 4 weeks, please check "not applicable". | - | Physical Health |
      | 4 | Daily functioning | Und wie war das mit anderen anstrengenden Tätigkeiten im Alltag, wenn Sie z.B. etwas Schweres heben mussten oder Beweglichkeit brauchten? In welchem Ausmaß waren Sie hierbei innerhalb der letzten 4 Wochen durch Ihren Gesundheitszustand beeinträchtigt? | And what about other strenuous activities in everyday life, e.g. when you had to lift something heavy or needed to be mobile? To what extent have you been affected by your state of health in the last 4 weeks? | - | Physical Health |
      |  | (Instruction for items 5-12) | Bitte denken Sie nun an die letzten 4 Wochen. Wie oft kam es hier vor, dass Sie… | Please think back to the last 4 weeks. How often did it happen here that you... |  |  |
      | 5 | Blue | sich trübsinnig fühlten? | felt melancholy? | - | Mental Health |
      | 6 | Social functioning | sich zurückziehen wollten? | wanted to withdraw? | - | Mental Health |
      | 7 | Restricted physically | wegen physischen, also körperlichen gesundheitlichen Problemen, bei Ihrer Arbeit oder in Ihrem Alltag eingeschränkt waren? | were restricted at work or in your everyday life, due to health problems of a physical nature? | - | Physical Health |
      | 8 | Less accomplished physically | wegen physischen, also körperlichen gesundheitlichen Problemen, bei Ihren privaten oder beruflichen Tätigkeiten weniger geschafft haben als Sie wollten? | have achieved less than you wanted to in your private or professional activities, due to health problems of a physical nature? | - | Physical Health |
      | 9 | Restricted mentally | wegen psychischen, also seelischen oder emotionalen Problemen, bei ihren privaten oder beruflichen Tätigkeiten eingeschränkt waren? | were restricted in your private or professional activities, due to psychological, i.e. mental or emotional problems? | - | Mental Health |
      | 10 | Less accomplished mentally | wegen psychischen, also seelischen oder emotionalen Problemen, bei Ihrer Arbeit oder in Ihren alltäglichen Beschäftigungen weniger geschafft haben als Sie wollten? | achieved less than you wanted to at work or in your everyday activities, due to psychological, i.e. mental or emotional problems? | - | Mental Health |
      | 11 | Less social physically | wegen physischen, also körperlichen gesundheitlichen Problemen, weniger soziale Kontakte, zum Beispiel zu Freunden, Bekannten oder Verwandten hatten? | had fewer social contacts, for example with friends, acquaintances or relatives, due to health problems of a physical nature? | - | Physical Health |
      | 12 | Less social mentally | wegen psychischen, also seelischen oder emotionalen Problemen, weniger soziale Kontakte, zum Beispiel zu Freunden, Bekannten oder Verwandten hatten? | had fewer social contacts, for example with friends, acquaintances or relatives, due to psychological, i.e. mental or emotional problems? | - | Mental Health |

      Note. Items 7 to 12 use colloquial language. Researchers can test “aufgrund von…” instead of “wegen” to avoid the colloquial wording.

       

      Response specifications

      Table 2

      Response specifications

      | Items | German | English (for comprehension only) |
      |---|---|---|
      | 1-2 | 1 = sehr schlecht; 2 = eher schlecht; 3 = teils-teils; 4 = eher gut; 5 = sehr gut; -9 = nicht beantwortet | 1 = very bad; 2 = rather bad; 3 = partly bad/partly good; 4 = rather good; 5 = very good; -9 = not answered |
      | 3-4 | 1 = gar nicht; 2 = wenig; 3 = mittelmäßig; 4 = ziemlich; 5 = sehr; -9 = nicht beantwortet; -1 = trifft nicht zu | 1 = not at all; 2 = a little bit; 3 = moderately; 4 = almost fully; 5 = fully; -9 = not answered; -1 = not applicable |
      | 5-12 | 1 = nie; 2 = selten; 3 = manchmal; 4 = oft; 5 = immer; -9 = nicht beantwortet | 1 = never; 2 = seldom; 3 = sometimes; 4 = often; 5 = all the time; -9 = not answered |

       

      Scoring

      The instrument covers two dimensions of well-being: physical health-related (PH) and mental health (MH)-related well-being. These dimensions are drawn from theory and supported by factor analysis. Unweighted mean scores can be calculated for PH and MH subscales. In addition, factor scores can be used. When building subscale scores, it is necessary to reverse the polarity of the item values for the negatively-poled items 3 to 12. Pairwise exclusion of missing values is recommended. Lower scores are associated with poorer PH / MH while higher scores indicate better PH / MH.
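
The scoring rules above can be illustrated with a minimal Python sketch. It takes the subscale assignment and polarity from Table 1 and the missing-value codes (-9, -1) from Table 2. Two points are assumptions of this sketch rather than part of the instrument documentation: missing items are simply skipped when averaging, and no minimum number of answered items is enforced. The function names and the example respondent are hypothetical.

```python
from statistics import mean

# Subscale assignment and polarity as listed in Table 1.
PH_ITEMS = (1, 3, 4, 7, 8, 11)
MH_ITEMS = (2, 5, 6, 9, 10, 12)
REVERSED_ITEMS = set(range(3, 13))  # negatively poled items 3 to 12
MISSING_CODES = {-9, -1}            # "not answered", "not applicable" (Table 2)


def recode(item, value):
    """Return the scored value (1-5, higher = better health) or None if missing."""
    if value is None or value in MISSING_CODES:
        return None
    if not 1 <= value <= 5:
        raise ValueError(f"item {item}: unexpected response code {value}")
    return 6 - value if item in REVERSED_ITEMS else value


def subscale_mean(responses, items):
    """Unweighted mean over the answered items of one subscale.

    Assumption of this sketch: missing items are skipped, not imputed.
    """
    scored = [recode(i, responses.get(i)) for i in items]
    valid = [v for v in scored if v is not None]
    return mean(valid) if valid else None


def score_sf12_ensure(responses):
    """Return the PH and MH mean scores for one respondent."""
    return {
        "PH": subscale_mean(responses, PH_ITEMS),
        "MH": subscale_mean(responses, MH_ITEMS),
    }


# Hypothetical respondent: item number -> raw response code.
example = {1: 4, 2: 3, 3: -1, 4: 2, 5: 2, 6: 3,
           7: 1, 8: 2, 9: -9, 10: 2, 11: 1, 12: 1}
scores = score_sf12_ensure(example)
print(scores)
```

In this example, item 3 ("not applicable") and item 9 ("not answered") are left out, so each subscale mean is based on five items. Projects that require a minimum number of valid items per subscale would need to add that check.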

       

      Application field

      The target group of the survey instrument consists of adults aged 18 years or older who speak German, Arabic, or Dari. Use is restricted to the web-based self-administered mode, as the instrument was validated in this mode. The instrument can be used for research purposes only. Considering the results of the reliability and validity analyses, the instrument can be used in each language group separately and also in heterogeneous language samples. Comparability is given between Arabic and German as well as between Arabic and Dari. Further research and analysis are needed to make simultaneous comparisons between the three languages.

    The Short Form Health Survey SF-12 questionnaire (Ware et al., 1996) is a short version of the SF-36 (Morfeld & Bullinger, 2008; Ware et al., 1993) used to assess the impact of health on an individual’s everyday life. Thanks to its brevity, this quality of life measure is appropriate for use in survey research. Like its predecessor, the SF-36, it provides composite scores for both PH and MH across eight domains of health. PH-related scales comprise 1) General Health, 2) Physical Functioning, 3) Role Physical, and 4) Bodily Pain. MH-related scales include 5) Vitality, 6) Social Functioning, 7) Role Emotional, and 8) Mental Health.

     

    In recent decades, there has been a marked shift in prevalent health problems from acute to chronic conditions in industrialized nations, influenced by medical advances and demographic changes (Robert Koch Institute, 2015). Traditionally, evaluating the impact of diseases on a population has relied on mortality data. However, with increasing life expectancy and advances in medical treatments, the need to assess the severity of illness, both physical and mental, has become increasingly apparent. Such assessments need to encompass various aspects of health and their consequences for the well-being of individuals (Bullinger & Quitmann, 2014). Therefore, health-related quality of life measures have become increasingly established in both research and clinical practice, playing a crucial role in assessing the overall well-being of individuals and providing insights for informed decision-making in healthcare settings (Wirtz & Bengel, 2011). While a consistent definition of the concept is lacking, it commonly considers physical, psychological, social, family, and work-related aspects (Otto & Ravens-Sieberer, 2020).

     

    The SF-12 is one of the most widely utilized tools for evaluating health-related quality of life, in both German-speaking contexts and internationally (Morfeld & Bullinger, 2008). The reliability of the SF-12 has been evaluated in the U.S., the UK, and China for general populations (Fong et al., 2010; Ware et al., 1996) as well as for groups with mental illness (Huo et al., 2018). Test-retest reliability scores have been reported as consistently higher for PH (.61 (for clinical samples) to .89) than for MH (.57 (for clinical samples) to .77). With respect to validity, the SF-12 was found to correspond strongly to the longer SF-36, to correlate with other measures of PH and MH to the expected degree (Ware et al., 1996), and to differentiate between groups with and without a corresponding health diagnosis (Fong et al., 2010). Some previous studies have indicated potential problems with the factorial structure, as it varied between different language and ethnic groups (Fleishman & Lawrence, 2003; Fong et al., 2010). A version in German and in languages of migrant groups in Germany has been used in the SOEP (Andersen et al., 2007). The factorial structure of the SF-12 was not supported in the SOEP refugee panel (Tibubos & Kröger, 2020).

    Cognitive pretests

    The German, Arabic, and Farsi versions of the SOEP (Andersen et al., 2007) were adapted. The Farsi version was adapted to Dari using the TRAPD approach (Translation, Review, Adjudication, Pretest, Documentation) with two professional translators. In the next step, cognitive interviews (CIs) were conducted with 18 refugees (Syrian N = 6, Iraqi N = 6, Afghan N = 6) of different gender, age, and education (Hadler et al., 2021). The pretest was conducted through remote interviews using either video or telephone conferences. In addition to the interviewees, interpreters took part in the conferences in order to take cultural and linguistic particularities into account. For several items, interviewees were unsure whether the item referred to PH, to MH, or was intended as an indicator of general health. Further challenges arose from unclear temporal references, as interviewees found it difficult to distinguish between their current feelings and those experienced over the previous four weeks. Refugees also tended to understand “all-day activities” as referring primarily to activities relating to their professional work. Respondents also interpreted the items regarding emotional feelings differently and discussed cultural and gender differences in their expression. Confusion was also caused by the varying number of response categories, with three categories used for some indicators and five for others. Finally, respondents had difficulties with double-barreled questions (DBQs), i.e. questions that include enumerations or more than one stimulus to evaluate in one indicator (Menold, 2020).

     

    Item generation and selection

    Revisions of the instruments were provided to address these problems. Both the participants' feedback and the interpreters' assessments were taken into account. Additional indicators were developed to balance PH and MH. DBQs were split into different indicators. The number of response options was unified and five categories were used for all items. Redundancies were omitted and a clearer visual layout was used. Finally, the findings from the cognitive pretest were used to make adjustments to the introduction, to unify and highlight the temporal reference, and to provide alternatives to emotional terms.

     

    In response to feedback from interviewees and upon splitting DBQs, various alternatives for item formulation were generated and incorporated into the tested form, resulting in a total of 18 items. Items were selected to find the most suitable six items per subscale (PH or MH), applying the principle of a clear separation between PH and MH. In addition, indicators that exhibited high differences in thresholds among the three groups were excluded. The general well-being item was excluded from the instrument, because it cannot be uniquely assigned to PH or MH.

     

    Samples

    Register sample

    A probability sample was drawn in three large cities in Saxony (Dresden, Leipzig, Chemnitz). The target population included German-speaking residents and Arabic-speaking Syrian and Iraqi residents (refugees). The addresses of sampled persons, together with information on gender, age, and country of origin, were provided by municipal registration offices. For refugees, we requested only addresses of persons who had been resident in the city since 2014. In total, contact details for 1,000 German speakers, 2,000 Arabic speakers, and 1,000 Dari speakers were provided. We used mail contact and re-contacts to recruit the participants to a web-based survey. Response rates RR6 (AAPOR, 2023) were 28.42% for German-speaking residents, 19% for Arabic-speaking refugees, and 16% for Dari-speaking refugees. The sample was randomized, with one group of respondents answering the SF-12 questionnaire version prior to the cognitive pretest, and the other group answering the questionnaire version after the cognitive pretest. For the SF-12 version administered after the cognitive pretest, relevant to this report, the sample size was N = 147 for Arabic speakers and N = 125 for German speakers.

     

    Facebook sample

    The Facebook survey was hosted on the servers of TU Dresden and targeted people from Iraq, Syria, and Afghanistan living in Germany. An external service provider was commissioned to advertise the web-based survey. In addition, Bielefeld University supported advertising, but was unable to collect socio-demographic information. To ensure that participants were refugees who spoke these languages, we screened participants to limit them to refugees living in Germany only. The questionnaire was available in Arabic and Dari only. A random subset of participants received the adapted instrument as developed after the cognitive pretests. For the SF-12 version administered after the cognitive pretest, relevant to this report, the sample size was N = 779 for Arabic speakers and N = 503 for Dari speakers. See Tables 3 and 4 for a description of both samples.

     

    Table 3

    Descriptives Register sample

     

    |  | German: Statistics | German: N | Arabic: Statistics | Arabic: N |
    |---|---|---|---|---|
    | Gender (%men) | 44.8 | 125 | 66.7 | 147 |
    | Age (Mean, SD) | 45.7 (19.0) | 125 | 33.8 (8.9) | 147 |
    | Education |  |  |  |  |
    | %School completed | 99.1 | 111 | 91.6 | 119 |
    | %University degree | 49.1 | 108 | 30.6 | 121 |
    | Years of Education (Mean, SD) | 11.3 (1.9) |  | 10.8 (4.0) |  |
    | Country of origin / nationality | n.a. | n.a. |  | 116 |
    | Syria (%) |  |  | 65.3 |  |
    | Iraq (%) |  |  | 34.7 |  |
    | In Germany since 2014 (%) | n.a. |  | 95.4 | 116 |

    Note. N refers to the number of German/Arabic respondents with available demographic data for the specific question. Deviations in N are due to missing data.

     

    Table 4

    Descriptives Facebook sample

     

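    |  | Arabic: Statistics | Arabic: N | Dari: Statistics | Dari: N |
    |---|---|---|---|---|
    | Gender (%men) | 67.4 | 144 | 78.6 | 56 |
    | Age (Mean, SD) | 34.0 (9.0) | 144 | 30.4 (8.7) | 56 |
    | Education |  |  |  |  |
    | %School completed | 94.5 | 493 | 78.3 | 226 |
    | %University degree | 50.0 | 482 | 45.4 | 240 |
    | Years of Education (Mean, SD) | 11.3 (3.8) | 476 | 11.1 (3.3) | 225 |
    | Country of origin / nationality (%) |  | 144 |  | 56 |
    | Syria | 64.6 | 93 | 0.0 | 0 |
    | Iraq | 35.4 | 51 | 0.0 | 0 |
    | Afghanistan | 0.0 | 0 | 100 | 56 |
    | Germany | 0.0 | 0 | 0.0 | 0 |
    | Mother tongue (%) |  |  |  |  |
    | Arabic | 100 | 144 | 0 | 0 |
    | Dari | 0 | 0 | 100 | 56 |
    | Farsi | 0 | 0 | 0 | 0 |
    | In Germany since 2013 | 91.3 | 461 | 91.2 | 251 |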

    Note. N refers to the number of Arabic/Dari respondents with available demographic data for the specific question. Deviations in N are due to missing data.

     

    Item analyses

    Tables 5 and 6 show the distribution parameters of the register and Facebook samples for each item of the instrument.

     

    Table 5

    Register sample: N, Means, Standard Deviations, Skewness, Excess Kurtosis, and Discriminatory Power of Items

     

    | Item | N | Mean | SD | Skewness (SE) | Excess Kurtosis (SE) | Item-scale correlation (corrected) |
    |---|---|---|---|---|---|---|
    | Item 1 | 269 | 3.74 | 0.89 | -0.85 (0.15) | 1.16 (0.30) | .73 |
    | Item 2 | 270 | 3.50 | 1.07 | -0.46 (0.15) | -0.29 (0.30) | .70 |
    | Item 3 | 233 | 3.91 | 1.23 | -1.05 (0.16) | 0.10 (0.32) | .60 |
    | Item 4 | 264 | 3.88 | 1.17 | -0.87 (0.15) | -0.15 (0.30) | .77 |
    | Item 5 | 261 | 3.34 | 1.18 | -0.23 (0.15) | -0.81 (0.30) | .70 |
    | Item 6 | 260 | 3.22 | 1.20 | -0.05 (0.15) | -0.86 (0.30) | .55 |
    | Item 7 | 258 | 3.95 | 1.18 | -0.98 (0.15) | -0.01 (0.30) | .82 |
    | Item 8 | 257 | 3.91 | 1.14 | -0.82 (0.15) | -0.24 (0.30) | .81 |
    | Item 9 | 259 | 3.88 | 1.16 | -0.77 (0.15) | -0.29 (0.30) | .79 |
    | Item 10 | 260 | 3.79 | 1.16 | -0.62 (0.15) | -0.57 (0.30) | .80 |
    | Item 11 | 256 | 4.27 | 1.06 | -1.42 (0.15) | 1.22 (0.30) | .69 |
    | Item 12 | 256 | 4.02 | 1.15 | -1.03 (0.15) | 0.19 (0.30) | .78 |

    Note. The polarity of item values was reversed for the negatively poled items 3-12.

     

    Table 6

    Facebook sample: N, Means, Standard Deviations, Skewness, Excess Kurtosis, and Discriminatory Power of Items

     

    | Item | N | Mean | SD | Skewness (SE) | Excess Kurtosis (SE) | Item-scale correlation (corrected) |
    |---|---|---|---|---|---|---|
    | Item 1 | 1082 | 3.63 | 1.00 | -0.62 (0.07) | 0.13 (0.15) | .57 |
    | Item 2 | 1101 | 3.10 | 1.19 | -0.18 (0.07) | -0.90 (0.15) | .70 |
    | Item 3 | 910 | 3.41 | 1.41 | -0.39 (0.08) | -1.16 (0.16) | .61 |
    | Item 4 | 1032 | 3.65 | 1.28 | -0.64 (0.08) | -0.67 (0.15) | .73 |
    | Item 5 | 924 | 2.89 | 1.28 | 0.13 (0.08) | -0.99 (0.16) | .78 |
    | Item 6 | 925 | 2.82 | 1.22 | 0.25 (0.08) | -0.74 (0.16) | .56 |
    | Item 7 | 918 | 3.74 | 1.20 | -0.59 (0.08) | -0.67 (0.16) | .67 |
    | Item 8 | 911 | 3.58 | 1.21 | -0.44 (0.08) | -0.78 (0.16) | .61 |
    | Item 9 | 908 | 3.41 | 1.34 | -0.28 (0.08) | -1.14 (0.16) | .76 |
    | Item 10 | 911 | 3.24 | 1.32 | -0.10 (0.08) | -1.12 (0.16) | .72 |
    | Item 11 | 914 | 3.83 | 1.22 | -0.75 (0.08) | -0.50 (0.16) | .57 |
    | Item 12 | 917 | 3.34 | 1.34 | -0.21 (0.08) | -1.17 (0.16) | .78 |

    Note. The polarity of item values was reversed for the negatively poled items 3-12.
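
The corrected item-scale correlation reported in the item tables is conventionally the Pearson correlation between an item and the sum of the remaining items of its subscale, so that the item does not correlate with itself. The following Python sketch shows that computation under this conventional definition. It assumes complete, already reverse-coded response vectors; how missing values were handled for the published coefficients is not stated here. The function names and the data are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equally long numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def corrected_item_scale_correlations(rows):
    """Correlate each item with the sum of the other items of the same subscale.

    rows: one list of item scores per respondent (complete cases only).
    """
    n_items = len(rows[0])
    result = []
    for j in range(n_items):
        item = [r[j] for r in rows]
        rest = [sum(r) - r[j] for r in rows]  # scale score without item j
        result.append(pearson(item, rest))
    return result


# Hypothetical scores of five respondents on three items of one subscale.
data = [
    [5, 4, 5],
    [4, 4, 3],
    [2, 3, 2],
    [1, 2, 2],
    [3, 3, 4],
]
correlations = corrected_item_scale_correlations(data)
print([round(r, 2) for r in correlations])
```
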

    Reliability

    To assess reliability, a composite reliability coefficient (Raykov's Rho) based on factor analysis was calculated from the two-factor models reported in the “Factorial Validity” section. In addition, we calculated a single general Rho based on the two factors (Raykov, 2012; see Table 7). Correlated error terms were consistently included in the denominator as part of the measurement error (cf. Raykov, 2012).
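
As an illustration of how correlated errors enter the denominator, the sketch below computes the composite reliability of a unit-weighted sum of indicators of one factor as (sum of loadings)^2 / ((sum of loadings)^2 + sum of error variances + 2 × sum of error covariances). It is a simplified, hedged example, not the authors' Mplus computation: it assumes standardized loadings with the factor variance fixed to 1 and derives error variances as 1 minus the squared loading. The loadings are the standardized MH loadings of the German group in the register sample (Table 8). The error covariance value is hypothetical, because the estimated covariances are not reported in this section.

```python
def composite_reliability(loadings, error_variances=None, error_covariances=()):
    """Composite reliability of a unit-weighted sum of indicators of one factor.

    loadings: standardized factor loadings (factor variance fixed to 1).
    error_variances: residual variances; defaults to 1 - loading**2.
    error_covariances: covariances between residuals; each one counts twice
        in the variance of the sum and is treated as measurement error.
    """
    if error_variances is None:
        error_variances = [1 - l ** 2 for l in loadings]
    true_var = sum(loadings) ** 2
    error_var = sum(error_variances) + 2 * sum(error_covariances)
    return true_var / (true_var + error_var)


# Standardized MH loadings, German group, register sample (Table 8).
mh_german = [0.78, 0.73, 0.64, 0.80, 0.78, 0.77]

# Without any correlated errors.
rho_no_cov = composite_reliability(mh_german)

# With one correlated error (items 9 and 10); 0.10 is a hypothetical value.
rho_with_cov = composite_reliability(mh_german, error_covariances=[0.10])

print(round(rho_no_cov, 3), round(rho_with_cov, 3))
```

A positive error covariance enlarges the denominator and therefore lowers the coefficient. Because the published values are based on the fitted model parameters, this sketch is not expected to reproduce Table 7 exactly.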

     

    Table 7

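    Register and Facebook sample: Composite Reliability (Raykov's Rho) with Standard Errors (SE)

    | Sample | Scale | Language / Group | Composite Reliability (Raykov's Rho (SE)) |
    |---|---|---|---|
    | Register sample | General | Arabic | .84 (.03) |
    | Register sample | General | German | .80 (.02) |
    | Register sample | General | Overall Sample | .83 (.02) |
    | Register sample | Physical health | Arabic | .88 (.02) |
    | Register sample | Physical health | German | .88 (.02) |
    | Register sample | Physical health | Overall Sample | .88 (.02) |
    | Register sample | Mental health | Arabic | .88 (.02) |
    | Register sample | Mental health | German | .87 (.03) |
    | Register sample | Mental health | Overall Sample | .87 (.02) |
    | Facebook sample | General | Arabic | .85 (.01) |
    | Facebook sample | General | Dari | .90 (.01) |
    | Facebook sample | General | Overall Sample | .86 (.01) |
    | Facebook sample | Physical health | Arabic | .83 (.01) |
    | Facebook sample | Physical health | Dari | .76 (.02) |
    | Facebook sample | Physical health | Overall Sample | .80 (.01) |
    | Facebook sample | Mental health | Arabic | .87 (.01) |
    | Facebook sample | Mental health | Dari | .89 (.01) |
    | Facebook sample | Mental health | Overall Sample | .88 (.01) |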

     

    Reliability analyses indicate high composite reliability for the PH scale in the probability sample and moderate values in the Facebook sample. Reliability of MH is high in both samples and all language groups. The two-factor based Rho (General Reliability) is reasonably good in the register sample and excellent in the Facebook sample. Overall, composite reliability of the adapted SF-12-ENSURE was found to be sufficient to excellent.

     

    Validity

    Factorial Validity

    Factorial validity was evaluated by means of MG-CFAs for each sample separately (see Table 8). We used Mplus 8.7 (Muthén & Muthén, 1998-2021) for these analyses. Model fit was evaluated using the chi-square test, the Root-Mean-Square Error of Approximation (RMSEA), and the Comparative Fit Index (CFI) (Beauducel & Wittmann, 2005). A CFI of .95 or higher and an RMSEA of .08 or less indicate an acceptable fit (Hu & Bentler, 1999). The Robust Maximum Likelihood estimator (MLR) was used due to the ordinal nature and non-normality of the data (Muthén & Muthén, 1998-2021). Because five response categories were used, we treated the indicators as continuous (cf. e.g. Beauducel & Herzberg, 2006). The two-factor structure could be supported, as a just acceptable or acceptable model fit was reached in both samples. In both samples, two correlated error terms were included, as suggested by high modification indices. The first correlated error was introduced between the items “restricted mentally” and “less accomplished mentally”. As these items are similar in content and wording and both address limitations in activities due to emotional problems, the error term can be justified. The second correlated error term was indicated between the items “stairs” and “hinder” (items 3 and 4; item 4 is labelled “daily functioning” in Table 1), which seem to be more similar to each other with respect to physical limitations than other items of the PH factor. Both correlations are also reported for the SOEP samples (cf. Tibubos & Kröger, 2020) and are confirmed here, although the number of correlated error terms is reduced from four to two. The factor loadings were high for all items of the two factors in both samples. The factor correlations are of moderate size for the German and Arabic groups, but high for Dari (Facebook sample).

     

    Table 8

    Register and Facebook sample: Results of MG-CFAs

     

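    Standardized factor loadings

    | Item | Register Arabic: PH | Register Arabic: MH | Register German: PH | Register German: MH | Facebook Arabic: PH | Facebook Arabic: MH | Facebook Dari: PH | Facebook Dari: MH |
    |---|---|---|---|---|---|---|---|---|
    | Overall physical | .73 |  | .71 |  | .65 |  | .65 |  |
    | Stairs | .51 |  | .60 |  | .57 |  | .41 |  |
    | Daily functioning | .75 |  | .74 |  | .57 |  | .53 |  |
    | Restricted physically | .90 |  | .91 |  | .87 |  | .69 |  |
    | Less accomplished physically | .89 |  | .90 |  | .82 |  | .72 |  |
    | Less social physically | .75 |  | .64 |  | .73 |  | .74 |  |
    | Overall mental |  | .76 |  | .78 |  | .75 |  | .79 |
    | Blue |  | .69 |  | .73 |  | .83 |  | .83 |
    | Social functioning |  | .53 |  | .64 |  | .53 |  | .74 |
    | Restricted mentally |  | .83 |  | .80 |  | .78 |  | .78 |
    | Less accomplished mentally |  | .87 |  | .78 |  | .73 |  | .76 |
    | Less social mentally |  | .88 |  | .77 |  | .84 |  | .82 |

    Group statistics

    |  | Register Arabic | Register German | Facebook Arabic | Facebook Dari |
    |---|---|---|---|---|
    | N | 148 | 125 | 651 | 467 |
    | Factor correlation | .55 | .50 | .61 | .79 |

    Model statistics

    |  | Register Sample | Facebook Sample |
    |---|---|---|
    | Error correlations | SF-9 and SF-10; SF-3 and SF-4 | SF-9 and SF-10; SF-3 and SF-4 |
    | CMIN | 211.581*** | 350.767*** |
    | df | 102 | 102 |
    | RMSEA | .089 | .066 |
    | CFI | .935 | .946 |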

    Note. Standardized parameters are reported. ***p < .001.

     

    Convergent Validity

    Convergent validity was assessed using Pearson correlations between the SF-12 scales and predefined concepts, which were assumed to be significantly (p < .001) correlated, with moderate to strong correlations of |.4| and above. As expected, the scales correlated with similar concepts in both the register and the Facebook sample (see Table 9). There was a strong negative correlation with the general health status of refugees (Refugee Health Scale, RHS-SOEP, negatively polarized) with regard to both PH (r = -.58, N_Register = 110; r = -.66, N_Facebook = 359) and MH (r = -.80, N_Register = 110; r = -.82, N_Facebook = 359). This means that participants who described their physical and mental health as better in the SF-12-ENSURE also described themselves as healthier in the RHS. There were also correlations in the expected direction with typical measures of MH. There were moderate negative correlations with depression and anxiety (Patient Health Questionnaire, PHQ-4) with regard to both PH (r_Depression = -.40, r_Anxiety = -.36, N_Register = 132; r_Depression = -.42, r_Anxiety = -.45, N_Facebook = 403 and 402, respectively) and MH (r_Depression = -.75, r_Anxiety = -.73, N_Register = 132; r_Depression = -.75, r_Anxiety = -.73, N_Facebook = 402 for both). This means that participants who described their physical and mental health as better in the SF-12-ENSURE also described themselves as mentally healthier in the PHQ-4. Furthermore, in the register sample there is a weak positive correlation between PH and education (number of years of schooling, r = .21, N_Register = 226). This corresponds to the common finding that higher education is usually associated with health-promoting behaviour (Adler & Newman, 2002; Zajacova & Lawrence, 2018). With regard to MH, there is no correlation with education (r = .07, N_Register = 226; r = .00, N_Facebook = 559).

     

    Divergent Validity

    For divergent validity, correlations between the SF-12 scales and unrelated concepts were anticipated to be non-significant (p > .05), with correlation coefficients below |.3|, indicating weak associations. As expected, we observed very weak or no correlations between PH and MH in the SF-12-ENSURE and unrelated concepts. However, there were weak to moderate correlations with attitudes toward democracy as well as weak correlations with authoritarianism and faith (see Table 9).

     

    Table 9

    Register and Facebook sample: Convergent and Divergent Validities of the SF-12

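    Convergent measures: Education (years), Depression (PHQ-4), Anxiety (PHQ-4), Health (RHS). Divergent measures: Democracy, Authoritarianism, Faith, Trust.

    | Sample | Scale | Education (years) | Depression (PHQ-4) | Anxiety (PHQ-4) | Health (RHS) | Democracy | Authoritarianism | Faith | Trust |
    |---|---|---|---|---|---|---|---|---|---|
    | Register Sample | Physical | .21** | -.40*** | -.36*** | -.58*** | -.12 | -.16* | -.10 | .00 |
    | Register Sample | Mental | .07 | -.75*** | -.73*** | -.80*** | -.28** | .05 | -.01 | .02 |
    | Facebook Sample | Physical | .03 | -.42*** | -.45*** | -.66*** | -.12* | -.04 | -.03 | .07 |
    | Facebook Sample | Mental | .00 | -.75*** | -.73*** | -.82*** | -.14* | .06 | .10* | .11 |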

    Note. PHQ-4 = Patient Health Questionnaire 4, RHS = Refugee Health Scale, pearson’s correlation, *p < .05, **p < .01, ***p < .001, Register Sample N = 104 – 241, Facebook Sample N = 329 – 655
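
    The criteria stated for convergent and divergent validity can be applied mechanically to the coefficients in Table 9. The following is a minimal sketch, not the authors' procedure: the function name `meets_criterion` and the use of the printed significance stars as a stand-in for the p-values are assumptions made for illustration. It checks the register-sample coefficients of the Physical scale.

```python
# Criteria stated in the text: convergent |r| >= .4 with p < .001,
# divergent |r| < .3 with p > .05.
CONVERGENT_MIN_R = 0.40
DIVERGENT_MAX_R = 0.30


def meets_criterion(kind: str, r: float, stars: str) -> bool:
    """Check one coefficient against the stated criterion.

    `stars` is the significance marker printed in the table
    ("" = not significant, "*" = p < .05, "**" = p < .01, "***" = p < .001).
    Hypothetical helper, introduced only for this illustration.
    """
    if kind == "convergent":
        return abs(r) >= CONVERGENT_MIN_R and stars == "***"
    if kind == "divergent":
        return abs(r) < DIVERGENT_MAX_R and stars == ""
    raise ValueError(f"unknown kind: {kind!r}")


# Register-sample coefficients for the Physical scale, copied from Table 9.
REGISTER_PHYSICAL = [
    ("Education (years)", "convergent", 0.21, "**"),
    ("Depression (PHQ-4)", "convergent", -0.40, "***"),
    ("Anxiety (PHQ-4)", "convergent", -0.36, "***"),
    ("Health (RHS)", "convergent", -0.58, "***"),
    ("Democracy", "divergent", -0.12, ""),
    ("Authoritarianism", "divergent", -0.16, "*"),
    ("Faith", "divergent", -0.10, ""),
    ("Trust", "divergent", 0.00, ""),
]

for concept, kind, r, marker in REGISTER_PHYSICAL:
    verdict = "meets" if meets_criterion(kind, r, marker) else "does not meet"
    print(f"{concept:<20} {kind:<10} r = {r:+.2f}{marker:<3} {verdict} the criterion")
```

    Read this way, the strict criteria are not met by every coefficient (for example, anxiety with PH at -.36 falls just below |.4|), which is in line with the qualified wording of the text.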

     

    Criterion Validity

    Criterion validity was assessed using Pearson correlations between the SF-12 scales and health outcomes, which were expected to be significant (p < .001) with |r| ≥ .3. There were moderate to strong correlations with the external criteria in both the register and the Facebook sample (see Table 10).

    Reported comorbidities correlated negatively with both PH (r = -.37, NRegister = 260; r = -.40, NFacebook = 777) and MH (r = -.31, NRegister = 260; r = -.37, NFacebook = 777). People who rated their physical and mental health as better in the SF-12-ENSURE thus also reported fewer illnesses. General life satisfaction correlated positively with both PH (r = .50, NRegister = 150; r = .42, NFacebook = 536) and MH (r = .52, NRegister = 151; r = .52, NFacebook = 534). People who rated their physical and mental health as better in the SF-12-ENSURE thus also described themselves as more satisfied overall.

    In the register sample, there was a weak negative correlation between alcohol misuse and PH (r = -.19, NRegister = 251): people who reported consuming alcohol on a regular basis also described their physical health as slightly worse. Finally, loneliness correlated negatively with PH in the Facebook sample (r = -.29, NFacebook = 546) and, more strongly, with MH in both samples (r = -.46, NRegister = 130; r = -.50, NFacebook = 545). People who rated their physical and mental health as better in the SF-12-ENSURE thus also described themselves as less lonely.

     

    Table 10

    Register and Facebook sample: Criterion Validities of the SF-12

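    | Sample   | Scale    | Alcohol | Comorbidities | Life Satisfaction | Loneliness |
    |----------|----------|---------|---------------|-------------------|------------|
    | Register | Physical | -.19**  | -.37***       | .50***            | -.10       |
    | Register | Mental   | -.10    | -.31***       | .52***            | -.46***    |
    | Facebook | Physical | -.04    | -.40***       | .42***            | -.29***    |
    | Facebook | Mental   | .05     | -.37***       | .52***            | -.50***    |

    Note. Pearson’s correlations. *p < .05, **p < .01, ***p < .001. Register sample: N = 129 to 260; Facebook sample: N = 534 to 777.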
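
    The significance markers in Table 10 can be roughly cross-checked from a coefficient and its sample size alone. The sketch below uses the Fisher z approximation, which is an assumption of this illustration: the source does not state how the p-values were computed, and the exact t-test used by most software gives slightly different values. The function names `approx_p_value` and `stars` are hypothetical.

```python
import math
from statistics import NormalDist


def approx_p_value(r: float, n: int) -> float:
    """Two-sided p-value for H0: rho = 0 via the Fisher z approximation.

    atanh(r) is approximately normal with standard error 1 / sqrt(n - 3).
    """
    z = math.atanh(r) * math.sqrt(n - 3)
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


def stars(p: float) -> str:
    """Map a p-value to the significance markers used in the table notes."""
    if p < 0.001:
        return "***"
    if p < 0.01:
        return "**"
    if p < 0.05:
        return "*"
    return ""


# (label, r, N) for register-sample coefficients reported in the text.
EXAMPLES = [
    ("PH x alcohol misuse", -0.19, 251),
    ("PH x comorbidities", -0.37, 260),
    ("PH x life satisfaction", 0.50, 150),
    ("MH x loneliness", -0.46, 130),
]

for label, r, n in EXAMPLES:
    p = approx_p_value(r, n)
    print(f"{label:<24} r = {r:+.2f}, N = {n}: p ~ {p:.4f} {stars(p)}")
```

    Under this approximation, the alcohol coefficient (r = -.19, N = 251) falls between p = .001 and p = .01, matching the two stars printed in Table 10, while the other three coefficients fall below p = .001.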

     

    Measurement Equivalence

    Measurement invariance was analysed by means of MG-CFA, using the models obtained in the evaluation of factorial validity (Table 8). One difference, however, was that the correlated error terms introduced in the section above were kept equal between the respective groups to ensure an equal configuration for unexplained factors as well. The factor variances were fixed at 1 to allow comparison of the loadings, and the factor means were fixed at 0 to allow comparison of the intercepts and to identify the model (cf. Byrne, 2011). The analyses compared Arabic and German in the register sample and Arabic and Dari in the Facebook sample. A significant change in chi-square (Meredith, 1993) or changes of ΔCFI > .010 and ΔRMSEA > .010 were taken to indicate a significant difference in model fit (Chen, 2007) and thus a lack of exact measurement invariance.

    For the register sample, metric invariance was supported by all goodness-of-fit (GOF) statistics, and scalar invariance was supported by the changes in RMSEA and CFI, which did not exceed the cut-offs. For Arabic and Dari in the Facebook sample, metric invariance was supported by RMSEA and CFI, whereas scalar invariance was rejected by the changes in all GOF statistics. Partial scalar invariance was established in the Facebook sample after freeing the intercepts of three items (overall physical health, social functioning and less social physically).

    The adapted SF-12 can therefore be used to compare correlations between German and Arabic or between Arabic and Dari. Latent and summarized means are also fully comparable between Arabic and German. Mean comparisons between Arabic and Dari can be carried out based on partial measurement invariance models (cf. Pokropek et al., 2019). Comparisons among all three languages require further validation.

     

    Table 11

    Results of measurement invariance analysis in the register sample

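    | Model      | χ2(df)         | Δχ2(df)      | RMSEA | ΔRMSEA | CFI  | ΔCFI |
    |------------|----------------|--------------|-------|--------|------|------|
    | configural | 210.79** (104) | -            | .087  | -      | .937 | -    |
    | metric     | 228.79** (116) | 18 (12)      | .084  | -.003  | .933 | .004 |
    | scalar     | 257.27** (128) | 29.53** (12) | .086  | .002   | .923 | .010 |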

    Note. *p < .05, **p < .01, ***p < .001; Δχ2 corrected by scale correction factor for MLR.

     

    Table 12

    Results of measurement invariance analysis in the Facebook sample

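    | Model          | χ2(df)          | Δχ2(df)        | RMSEA | ΔRMSEA | CFI  | ΔCFI |
    |----------------|-----------------|----------------|-------|--------|------|------|
    | configural     | 351.27*** (104) | -              | .066  | -      | .946 | -    |
    | metric         | 409.58*** (116) | 61.40*** (12)  | .068  | .002   | .936 | .010 |
    | scalar         | 642.61*** (128) | 271.56*** (12) | .085  | .017   | .888 | .058 |
    | partial scalar | 456.85*** (125) | 51.75*** (9)   | .069  | .001   | .927 | .009 |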

    Note. *p < .05, **p < .01, ***p < .001; Δχ2 corrected by scale correction factor for MLR.
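
    The CFI and RMSEA part of the decision rule described under Measurement Equivalence can be sketched as follows. This is an illustration under stated assumptions, not the authors' analysis code: the names `Fit` and `compare` are hypothetical, the partial scalar model is compared with the metric model, and the deltas are recomputed from the rounded fit indices in Tables 11 and 12, so they need not match the printed Δ columns exactly. The scaled chi-square difference test is not reproduced because the scaling correction factors are not reported.

```python
from typing import NamedTuple

CUTOFF = 0.010  # cut-off for both ΔCFI and ΔRMSEA, as stated in the text


class Fit(NamedTuple):
    rmsea: float
    cfi: float


def compare(reference: Fit, constrained: Fit) -> dict:
    """Compare a more constrained model with its reference model.

    Deltas are rounded to three decimals so that floating-point noise
    does not push a value of exactly .010 over the cut-off.
    """
    d_rmsea = round(constrained.rmsea - reference.rmsea, 3)  # increase = worse
    d_cfi = round(reference.cfi - constrained.cfi, 3)        # decrease = worse
    return {
        "d_rmsea": d_rmsea,
        "d_cfi": d_cfi,
        "rmsea_exceeds": d_rmsea > CUTOFF,
        "cfi_exceeds": d_cfi > CUTOFF,
    }


# Fit indices copied from Table 11 (register) and Table 12 (Facebook).
REGISTER = {
    "configural": Fit(0.087, 0.937),
    "metric": Fit(0.084, 0.933),
    "scalar": Fit(0.086, 0.923),
}
FACEBOOK = {
    "configural": Fit(0.066, 0.946),
    "metric": Fit(0.068, 0.936),
    "scalar": Fit(0.085, 0.888),
    "partial scalar": Fit(0.069, 0.927),
}

STEPS = [("configural", "metric"), ("metric", "scalar"), ("metric", "partial scalar")]

for sample_name, models in (("Register", REGISTER), ("Facebook", FACEBOOK)):
    for reference, constrained in STEPS:
        if constrained in models:
            result = compare(models[reference], models[constrained])
            print(sample_name, f"{reference} -> {constrained}:", result)
```

    With these inputs, only the full scalar model in the Facebook sample exceeds both cut-offs, which agrees with the conclusion in the text that scalar invariance is rejected there and partial scalar invariance is tenable.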

     

     

    Acknowledgement

    This work was funded by the German Research Foundation (DFG) within the ENSURE project as part of the PH-LENS Research Unit (FOR 2928/GZ: ME 3538/10-1).

     

    Supplementary Material

    S1: 13-item SOEP version of the Short-Form Health Survey

     

    Instruction

    Gesundheit und Krankheit

     

    Items

    | No. | German | Polarity | Subscale |
    |-----|--------|----------|----------|
    | 1 | Wie würden Sie Ihren gegenwärtigen Gesundheitszustand beschreiben? | - | Physical Health, Mental Health |
    | 2 | Wenn Sie Treppen steigen müssen, also mehrere Stockwerke zu Fuß hochgehen: Beeinträchtigt Sie dabei Ihr Gesundheitszustand stark, ein wenig oder gar nicht? | + | Physical Health |
    | 3 | Und wie ist das mit anderen anstrengenden Tätigkeiten im Alltag, wenn man z.B. etwas Schweres heben muss oder Beweglichkeit braucht: Beeinträchtigt Sie dabei Ihr Gesundheitszustand stark, ein wenig oder gar nicht? | + | Physical Health |
    | 4 | Wie oft kam es in den letzten vier Wochen vor, dass Sie sich gehetzt oder unter Zeitdruck fühlten? | + | Mental Health |
    | 5 | Wie oft kam es in den letzten vier Wochen vor, dass Sie sich niedergeschlagen und trübsinnig fühlten? | + | Mental Health |
    | 6 | Wie oft kam es in den letzten vier Wochen vor, dass Sie sich ruhig und ausgeglichen fühlten? | - | Mental Health |
    | 7 | Wie oft kam es in den letzten vier Wochen vor, dass Sie jede Menge Energie verspürten? | - | Mental Health |
    | 8 | Wie oft kam es in den letzten vier Wochen vor, dass Sie starke körperliche Schmerzen hatten? | + | Physical Health |
    | 9 | Wie oft kam es in den letzten vier Wochen vor, dass Sie wegen gesundheitlicher Probleme körperlicher Art in Ihrer Arbeit oder Ihren alltäglichen Beschäftigungen weniger geschafft haben als Sie eigentlich wollten? | + | Physical Health |
    | 10 | Wie oft kam es in den letzten vier Wochen vor, dass Sie wegen gesundheitlicher Probleme körperlicher Art in Ihrer Arbeit oder Ihren alltäglichen Beschäftigungen in der Art Ihrer Tätigkeiten eingeschränkt waren? | + | Physical Health |
    | 11 | Wie oft kam es in den letzten vier Wochen vor, dass Sie wegen seelischer oder emotionaler Probleme in Ihrer Arbeit oder Ihren alltäglichen Beschäftigungen weniger geschafft haben als Sie eigentlich wollten? | + | Mental Health |
    | 12 | Wie oft kam es in den letzten vier Wochen vor, dass Sie wegen seelischer oder emotionaler Probleme in Ihrer Arbeit oder Ihren alltäglichen Beschäftigungen Ihre Arbeit oder Tätigkeit weniger sorgfältig als sonst gemacht haben? | + | Mental Health |
    | 13 | Wie oft kam es in den letzten vier Wochen vor, dass Sie wegen gesundheitlicher oder seelischer Probleme in Ihren sozialen Kontakten, zum Beispiel mit Freunden, Bekannten oder Verwandten, eingeschränkt waren? | + | Physical Health |

     

     

     

     

    Response specifications SOEP

    | Items | Response categories SOEP |
    |-------|--------------------------|
    | 1     | 1 = Sehr gut; 2 = Gut; 3 = Zufriedenstellend; 4 = Weniger gut; 5 = Schlecht; -9 = nicht beantwortet |
    | 2, 3  | 1 = Stark; 2 = Ein wenig; 3 = Gar nicht; -9 = nicht beantwortet |
    | 4-13  | 1 = Immer; 2 = Oft; 3 = Manchmal; 4 = Fast nie; 5 = Nie; -9 = nicht beantwortet |
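
    The polarity column and the response categories above suggest how raw SOEP codes could be brought onto a common direction before analysis. The sketch below rests on assumptions that the source does not spell out: that "+" marks items whose higher codes indicate better health, that items marked "-" (1, 6 and 7) are reversed so that higher values consistently indicate better health, and that -9 is treated as missing. The function name `recode` is hypothetical, and no subscale scoring is attempted.

```python
from typing import Optional

MISSING_CODE = -9  # "nicht beantwortet" (not answered)

# Number of response categories per item, taken from the response specifications.
N_CATEGORIES = {1: 5, 2: 3, 3: 3, **{item: 5 for item in range(4, 14)}}

# Items with polarity "-" in the item table (assumed: higher code = poorer health).
NEGATIVE_ITEMS = {1, 6, 7}


def recode(item: int, value: int) -> Optional[int]:
    """Return the response recoded so that higher values mean better health.

    Missing responses (-9) are returned as None; out-of-range codes raise.
    """
    if value == MISSING_CODE:
        return None
    k = N_CATEGORIES[item]
    if not 1 <= value <= k:
        raise ValueError(f"item {item}: code {value} outside 1..{k}")
    return k + 1 - value if item in NEGATIVE_ITEMS else value


# Example: one respondent's raw codes for items 1, 2, 6 and 8.
raw = {1: 2, 2: 3, 6: 1, 8: -9}
print({item: recode(item, value) for item, value in raw.items()})
```

    For instance, "Gut" (code 2) on item 1 becomes 4 on the reversed five-point scale, while "Gar nicht" (code 3) on item 2 stays 3 because that item is already positively polarized.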

    Luise Richter; Dresden University of Technology; Luise.Richter4@tu-dresden.de

    Oliver Brust; Dresden University of Technology; Oliver.Brust@tu-dresden.de

    Natalja Menold; Dresden University of Technology; Natalja.Menold@tu-dresden.de