Adapted German, Arabic and Dari Short-Form Health Survey (SF-12-ENSURE)

Author: Richter, L., Brust, O., & Menold, N.

In ZIS Since: 2024

DOI: https://doi.org/10.6102/zis345

For use for other purposes contact the authors

Summary:

Abstract:

The Short-Form Health Survey (SF-12) assesses physical health-related (PH) and mental health-related (MH) well-being. Its 13-item version, used in the Socio-Economic Panel (SOEP) 2016 (see S1), underwent adaptation, cognitive pretesting, revision, and validation for German, Arabic, and Dari using two samples: a probability sample of individuals drawn from population registers (for German and Arabic) and a convenience sample recruited via Facebook (for Arabic and Dari). The instrument targets the German-speaking population and individuals with a migration background or refugees proficient in Arabic or Dari, aged 18 years or older. The two-factor structure of the revised SF-12, comprising a PH and an MH dimension, was supported by multi-group confirmatory factor analyses (MG-CFAs). Validity and reliability analyses support its use within each language group and in heterogeneous samples. Measurement invariance analyses showed metric and scalar invariance in the register sample. In the Facebook sample, metric invariance and partial scalar invariance are supported.

Language Documentation: English

Language Items: German, Arabic, Dari

Number of Items: 12

Survey Mode: Self-administered online survey

Reliability: PH: Raykov’s Rho = .76 to .88; MH: Raykov’s Rho = .87 to .90

Validity: Evidence for factorial, construct, and criterion validity

Construct: Physical and mental health-related well-being

Catchwords: well-being, health related quality of life, mental health, physical health

Item(s) used in Representative Survey: no

Url Website:

pretesting report: https://pretest.gesis.org/pretestProjekt/University-of-Dresden-Translation-of-established-public-health-measurement-instruments-into-Arabic-and-Dari-%28ENSURE%29 %28English-Version%29 (Hadler, P., Nießen, D., Lenzner, T., Steins, P., Quint, F., & Neuert, C. (2021): Translation of established public health measurement instruments into Arabic and Dari (ENSURE) (English Version). Cognitive pretest. GESIS Project Reports. Version: 1.0. GESIS - Pretestlab. Text.)

URL Data Archive: data available upon request

Scale development:

Instrument

Instruction

No instruction is used for items 1 to 4. The instruction for items 5-12 is as follows: 'Bitte denken Sie nun an die letzten 4 Wochen. Wie oft kam es hier vor, dass Sie…' (English translation, for comprehension only: 'Please think back to the last 4 weeks. How often did it happen here that you...'). (For German items see Table 1).

Items

Table 1

Items of the German-language adaptation of the SF-12 version used in the SOEP 2016

No.	Label	German	English translation (for comprehension only)	Polarity	Subscale
1	Overall physical	Wie ist ihr physischer, also Ihr körperlicher Gesundheitszustand im Allgemeinen? Würden Sie sagen er ist ...	How is your physical state of health in general? Would you say it is ...	+	Physical Health
2	Overall mental	Wie ist ihr psychischer, also ihr seelischer oder emotionaler Gesundheitszustand im Allgemeinen? Würden Sie sagen er ist...	How is your psychological, i.e. your mental or emotional state of health in general? Would you say it is...	+	Mental Health
3	Stairs	Wenn Sie innerhalb der letzten 4 Wochen Treppen gestiegen, also bereits wenige Stockwerke zu Fuß hochgegangen sind, in welchem Ausmaß waren Sie hierbei durch Ihren Gesundheitszustand beeinträchtigt? Wenn Sie innerhalb der letzten 4 Wochen keine Treppen gestiegen sind, kreuzen Sie bitte „trifft nicht zu“ an.	If you have climbed stairs in the last 4 weeks, i.e. walked up a few floors, to what extent were you affected by your state of health? If you have not climbed any stairs in the last 4 weeks, please check "not applicable".	-	Physical Health
4	Daily functioning	Und wie war das mit anderen anstrengenden Tätigkeiten im Alltag, wenn Sie z.B. etwas Schweres heben mussten oder Beweglichkeit brauchten? In welchem Ausmaß waren Sie hierbei innerhalb der letzten 4 Wochen durch Ihren Gesundheitszustand beeinträchtigt?	And what about other strenuous activities in everyday life, e.g. when you had to lift something heavy or needed to be mobile? To what extent have you been affected by your state of health in the last 4 weeks?	-	Physical Health
Bitte denken Sie nun an die letzten 4 Wochen. Wie oft kam es hier vor, dass Sie…			Please think back to the last 4 weeks. How often did it happen here that you...
5	Blue	sich trübsinnig fühlten?	felt melancholy?	-	Mental Health
6	Social functioning	sich zurückziehen wollten?	wanted to withdraw?	-	Mental Health
7	Restricted physically	wegen physischen, also körperlichen gesundheitlichen Problemen, bei Ihrer Arbeit oder in Ihrem Alltag eingeschränkt waren?	were restricted at work or in your everyday life, due to health problems of a physical nature?	-	Physical Health
8	Less accomplished physically	wegen physischen, also körperlichen gesundheitlichen Problemen, bei Ihren privaten oder beruflichen Tätigkeiten weniger geschafft haben als Sie wollten?	have achieved less than you wanted to in your private or professional activities, due to health problems of a physical nature?	-	Physical Health
9	Restricted mentally	wegen psychischen, also seelischen oder emotionalen Problemen, bei ihren privaten oder beruflichen Tätigkeiten eingeschränkt waren?	were restricted in your private or professional activities, due to psychological, i.e. mental or emotional problems?	-	Mental Health
10	Less accomplished mentally	wegen psychischen, also seelischen oder emotionalen Problemen, bei Ihrer Arbeit oder in Ihren alltäglichen Beschäftigungen weniger geschafft haben als Sie wollten?	achieved less than you wanted to at work or in your everyday activities, due to psychological, i.e. mental or emotional problems?	-	Mental Health
11	Less social physically	wegen physischen, also körperlichen gesundheitlichen Problemen, weniger soziale Kontakte, zum Beispiel zu Freunden, Bekannten oder Verwandten hatten?	had fewer social contacts, for example with friends, acquaintances or relatives, due to health problems of a physical nature?	-	Physical Health
12	Less social mentally	wegen psychischen, also seelischen oder emotionalen Problemen, weniger soziale Kontakte, zum Beispiel zu Freunden, Bekannten oder Verwandten hatten?	had fewer social contacts, for example with friends, acquaintances or relatives, due to psychological, i.e. mental or emotional problems?	-	Mental Health

Note. Items 7 to 12 use colloquial language. Researchers can test the use of “aufgrund von…” instead of “wegen” to avoid the colloquial wording.

Response specifications

Table 2

Response specifications

Items	German	English (for comprehension only)
1-2	1 = sehr schlecht 2 = eher schlecht 3 = teils-teils 4 = eher gut 5 = sehr gut -9 = nicht beantwortet	1 = very bad 2 = rather bad 3 = partly bad/partly good 4 = rather good 5 = very good -9 = not answered
3-4	1 = gar nicht 2 = wenig 3 = mittelmäßig 4 = ziemlich 5 = sehr -9 = nicht beantwortet -1 = trifft nicht zu	1 = not at all 2 = a little bit 3 = moderately 4 = almost fully 5 = fully -9 = not answered -1 = not applicable
5-12	1 = nie 2 = selten 3 = manchmal 4 = oft 5 = immer -9 = nicht beantwortet	1 = never 2 = seldom 3 = sometimes 4 = often 5 = all the time -9 = not answered

Scoring

The instrument covers two dimensions of well-being: physical health-related (PH) and mental health (MH)-related well-being. These dimensions are drawn from theory and supported by factor analysis. Unweighted mean scores can be calculated for PH and MH subscales. In addition, factor scores can be used. When building subscale scores, it is necessary to reverse the polarity of the item values for the negatively-poled items 3 to 12. Pairwise exclusion of missing values is recommended. Lower scores are associated with poorer PH / MH while higher scores indicate better PH / MH.
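The scoring rules above can be sketched in Python as follows. This is a minimal illustration, not the authors' scoring syntax: the function names are hypothetical, the item-to-subscale assignment is taken from Table 1, the missing codes from Table 2, and computing the mean over the answered items only is an assumption about how missing values are handled at the person level.

```python
from statistics import mean

# Item-to-subscale assignment as listed in Table 1.
PH_ITEMS = (1, 3, 4, 7, 8, 11)
MH_ITEMS = (2, 5, 6, 9, 10, 12)
REVERSED_ITEMS = set(range(3, 13))  # negatively poled items 3 to 12
MISSING_CODES = {-9, -1}            # not answered / not applicable (Table 2)


def recode(item, value):
    """Return the value on a 1-5 scale where higher means better health.

    Missing codes (and None) are returned as None.
    """
    if value is None or value in MISSING_CODES:
        return None
    return 6 - value if item in REVERSED_ITEMS else value


def subscale_mean(responses, items):
    """Unweighted mean over the answered items of one subscale.

    Assumption: the mean is taken over available items; returns None
    if no item of the subscale was answered.
    """
    values = [recode(i, responses.get(i)) for i in items]
    values = [v for v in values if v is not None]
    return mean(values) if values else None


def score_sf12(responses):
    """responses: dict mapping item number (1-12) to the raw response code."""
    return {
        "PH": subscale_mean(responses, PH_ITEMS),
        "MH": subscale_mean(responses, MH_ITEMS),
    }


# Hypothetical respondent: item 3 "not applicable", item 12 not answered.
example = {1: 4, 2: 3, 3: -1, 4: 2, 5: 2, 6: 3, 7: 1, 8: 2, 9: 1, 10: 2, 11: 1, 12: -9}
scores = score_sf12(example)
```

After recoding, higher subscale scores indicate better PH or MH, in line with the interpretation given above.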

Application field

The target group of the survey instrument is adults aged 18 years or older who speak German, Arabic, or Dari. Use is restricted to the web-based self-administered mode, as the instrument was validated in this mode. The instrument can be used for research purposes only. Considering the results of the reliability and validity analyses, the instrument can be used in each language group separately and also in heterogeneous language samples. Comparability is given between Arabic and German as well as between Arabic and Dari. Further research and analysis are needed before simultaneous comparisons across the three languages can be made.

Theory

The Short Form Health Survey SF-12 questionnaire (Ware et al., 1996) is a short version of the SF-36 (Morfeld & Bullinger, 2008; Ware et al., 1993) used to assess the impact of health on an individual’s everyday life. Thanks to its brevity, this quality of life measure is appropriate for use in survey research. Like its predecessor the SF-36, it provides composite scores for both PH and MH across eight domains of health: PH-related scales comprise 1) General Health; 2) Physical Functioning; 3) Role Physical and 4) Body Pain. MH-related scales include 5) Vitality; 6) Social Functioning; 7) Role Emotional and 8) Mental Health.

In recent decades, there has been a marked shift in prevalent health problems from acute to chronic conditions in industrialized nations, influenced by medical advances and demographic changes (Robert Koch Institute, 2015). Traditionally, evaluating the impact of diseases on a population has relied on mortality data. However, with increasing life expectancy and advances in medical treatments, the need to assess the severity of illness, both physical and mental, has become increasingly apparent. Such assessments need to encompass various aspects of health and their consequences for the well-being of individuals (Bullinger & Quitmann, 2014). Therefore, health-related quality of life measures have become increasingly established in both research and clinical practice, playing a crucial role in assessing the overall well-being of individuals and providing insights for informed decision-making in healthcare settings (Wirtz & Bengel, 2011). While a consistent definition of the concept is lacking, it commonly considers physical, psychological, social, family, and work-related aspects (Otto & Ravens-Sieberer, 2020).

The SF-12 is one of the most widely used tools for evaluating health-related quality of life, both in German-speaking contexts and internationally (Morfeld & Bullinger, 2008). The reliability of the SF-12 has been evaluated in the U.S., the UK, and China for general populations (Fong et al., 2010; Ware et al., 1996) as well as for groups with mental illness (Huo et al., 2018). Test-retest reliability scores have been reported as consistently higher for PH (.61 (for clinical samples) to .89) than for MH (.57 (for clinical samples) to .77). With respect to validity, the SF-12 was found to correspond strongly to the longer SF-36, to correlate with other measures of PH and MH to the expected degree (Ware et al., 1996), and to differentiate between groups with and without a corresponding health diagnosis (Fong et al., 2010). Some previous studies have indicated potential problems with the factorial structure, as it varied between different language and ethnic groups (Fleishman & Lawrence, 2003; Fong et al., 2010). A version in German and in languages of migrant groups in Germany has been used in the SOEP (Andersen et al., 2007). The factorial structure of the SF-12 was not supported in the SOEP refugee panel (Tibubos & Kröger, 2020).

Development

Cognitive pretests

The German, Arabic, and Farsi versions used in the SOEP (Andersen et al., 2007) were adapted. The Farsi version was adapted to Dari using the TRAPD approach (Translation, Review, Adjudication, Pretest, Documentation) with two professional translators. In the next step, cognitive interviews (CI) were conducted with 18 refugees (Syrian N = 6, Iraqi N = 6, Afghan N = 6) of different genders, ages, and educational levels (Hadler et al., 2021). The pretest was conducted through remote interviews using either video or telephone conferences. In addition to the interviewees, interpreters took part in the conferences in order to take cultural and linguistic particularities into account. For several items, interviewees were unsure whether the items referred to PH, to MH, or were intended as indicators of general health. Further difficulties arose from unclear temporal references, as interviewees found it difficult to distinguish between their current feelings and those experienced over the previous four weeks. Refugees also tended to understand “everyday activities” as referring primarily to activities relating to their professional work. Respondents interpreted the items regarding emotional feelings differently and discussed cultural and gender differences in the expression of such feelings. Confusion was also caused by the varying number of response categories, with three categories used for some indicators and five for others. Finally, respondents had difficulties with double-barreled questions (DBQs), i.e. questions that include enumerations or more than one stimulus to evaluate in one indicator (Menold, 2020).

Item generation and selection

Revisions of the instruments were provided to address these problems. Both the participants' feedback and the interpreters' assessments were taken into account. Additional indicators were developed to balance PH and MH. DBQs were split into different indicators. The number of response options was unified and five categories were used for all items. Redundancies were omitted and a clearer visual layout was used. Finally, the findings from the cognitive pretest were used to make adjustments to the introduction, to unify and highlight the temporal reference, and to provide alternatives to emotional terms.

In response to feedback from interviewees and upon splitting DBQs, various alternatives for item formulation were generated and incorporated into the tested form, resulting in a total of 18 items. Items were selected to find the most suitable six items per subscale (PH or MH), applying the principle of a clear separation between PH and MH. In addition, indicators that exhibited high differences in thresholds among the three groups were excluded. The general well-being item was excluded from the instrument, because it cannot be uniquely assigned to PH or MH.

Samples

Register sample

A probability sample was drawn in three large cities in Saxony (Dresden, Leipzig, Chemnitz). The target population included German-speaking residents and Arabic-speaking Syrian and Iraqi residents (refugees). The addresses of sampled persons, together with information on gender, age, and country of origin, were provided by municipal registration offices. For refugees, we requested only the addresses of persons who had been resident in the city since 2014. In total, contact details for 1,000 German speakers, 2,000 Arabic speakers, and 1,000 Dari speakers were provided. We used mail contact and re-contacts to recruit the participants to a web-based survey. Response rates RR6 (AAPOR, 2023) were 28.42% for German-speaking residents, 19% for Arabic-speaking refugees, and 16% for Dari-speaking refugees. Respondents were randomly assigned to two groups: one group answered the SF-12 questionnaire version from before the cognitive pretest, and the other group answered the version revised after the cognitive pretest. For the SF-12 version administered after the cognitive pretest, which is relevant to this report, the sample size was N = 147 for Arabic speakers and N = 125 for German speakers.

Facebook sample

The Facebook survey was hosted on the servers of TU Dresden and targeted people from Iraq, Syria, and Afghanistan living in Germany. An external service provider was commissioned to advertise the web-based survey. In addition, Bielefeld University supported the advertising, but was unable to collect socio-demographic information. To ensure that participants were refugees who spoke these languages, we screened participants and admitted only refugees living in Germany. The questionnaire was available in Arabic and Dari only. A random subset of participants received the adapted instrument as developed after the cognitive pretests. For the SF-12 version administered after the cognitive pretest, which is relevant to this report, the sample size was N = 779 for Arabic speakers and N = 503 for Dari speakers. See Tables 3 and 4 for a description of both samples.

Table 3

Descriptives Register sample

	German		Arabic
	Statistics	N	Statistics	N
Gender (%men)	44.8	125	66.7	147
Age (Mean, SD)	45.7 (19.0)	125	33.8 (8.9)	147
Education
%School completed	99.1	111	91.6	119
%University degree	49.1	108	30.6	121
Years of Education (Mean, SD)	11.3 (1.9)		10.8 (4.0)
Country of origin / nationality	n.a.	n.a.		116
Syria (%)			65.3
Iraq (%)			34.7
In Germany since 2014 (%)	n.a.		95.4	116

Note. N refers to the number of German/Arabic respondents with available demographic data for the specific question. Deviations in N are due to missing data.

Table 4

Descriptives Facebook sample

	Arabic		Dari
	Statistics	N	Statistics	N
Gender (%men)	67.4	144	78.6	56
Age (Mean, SD)	34.0 (9.0)	144	30.4 (8.7)	56
Education
%School completed	94.5	493	78.3	226
%University degree	50.0	482	45.4	240
Years of Education (Mean, SD)	11.3 (3.8)	476	11.1 (3.3)	225
Country of origin / nationality(%)		144		56
Syria	64.6	93	0.0	0
Iraq	35.4	51	0.0	0
Afghanistan	0.0	0	100	56
Germany	0.0	0	0.0	0
Mother tongue(%)
Arabic	100	144	0	0
Dari	0	0	100	56
Farsi	0	0	0	0
In Germany since 2013	91.3	461	91.2	251

Note. N refers to the number of Arabic/Dari respondents with available demographic data for the specific question. Deviations in N are due to missing data.

Item analyses

Tables 5 and 6 show the distribution parameters for each item of the instrument in the register sample and the Facebook sample, respectively.

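Table 5

Register sample: N, Means, Standard Deviations, Skewness, Excess Kurtosis, and Discriminatory Power of Items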

	N	Mean	SD	Skewness (SE)	Excess Kurtosis (SE)	Item-scale correlation (corrected)
Item 1	269	3.74	0.89	-0.85 (0.15)	1.16 (0.30)	.73
Item 2	270	3.50	1.07	-0.46 (0.15)	-0.29 (0.30)	.70
Item 3	233	3.91	1.23	-1.05 (0.16)	0.10 (0.32)	.60
Item 4	264	3.88	1.17	-0.87 (0.15)	-0.15 (0.30)	.77
Item 5	261	3.34	1.18	-0.23 (0.15)	-0.81 (0.30)	.70
Item 6	260	3.22	1.20	-0.05 (0.15)	-0.86 (0.30)	.55
Item 7	258	3.95	1.18	-0.98 (0.15)	-0.01 (0.30)	.82
Item 8	257	3.91	1.14	-0.82 (0.15)	-0.24 (0.30)	.81
Item 9	259	3.88	1.16	-0.77 (0.15)	-0.29 (0.30)	.79
Item 10	260	3.79	1.16	-0.62 (0.15)	-0.57 (0.30)	.80
Item 11	256	4.27	1.06	-1.42 (0.15)	1.22 (0.30)	.69
Item 12	256	4.02	1.15	-1.03 (0.15)	0.19 (0.30)	.78

Note. The polarity of item values was reversed for the negatively poled items 3-12.

Table 6

Facebook sample: N, Means, Standard Deviations, Skewness, Excess Kurtosis, and Discriminatory Power of Items

	N	Mean	SD	Skewness (SE)	Excess Kurtosis (SE)	Item-scale correlation (corrected)
Item 1	1082	3.63	1.00	-0.62 (0.07)	0.13 (0.15)	.57
Item 2	1101	3.10	1.19	-0.18 (0.07)	-0.90 (0.15)	.70
Item 3	910	3.41	1.41	-0.39 (0.08)	-1.16 (0.16)	.61
Item 4	1032	3.65	1.28	-0.64 (0.08)	-0.67 (0.15)	.73
Item 5	924	2.89	1.28	0.13 (0.08)	-0.99 (0.16)	.78
Item 6	925	2.82	1.22	0.25 (0.08)	-0.74 (0.16)	.56
Item 7	918	3.74	1.20	-0.59 (0.08)	-0.67 (0.16)	.67
Item 8	911	3.58	1.21	-0.44 (0.08)	-0.78 (0.16)	.61
Item 9	908	3.41	1.34	-0.28 (0.08)	-1.14 (0.16)	.76
Item 10	911	3.24	1.32	-0.10 (0.08)	-1.12 (0.16)	.72
Item 11	914	3.83	1.22	-0.75 (0.08)	-0.50 (0.16)	.57
Item 12	917	3.34	1.34	-0.21 (0.08)	-1.17 (0.16)	.78

Note. The polarity of item values was reversed for the negatively poled items 3-12.
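For readers who want to reproduce this kind of item statistic, the following Python sketch computes corrected item-scale correlations under the usual definition, i.e. the Pearson correlation of each item with the sum of the remaining items of the same scale. Whether the values in Tables 5 and 6 were computed per subscale or across all 12 items is not stated here, so the choice of scale is left to the caller; the function names and the toy data are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equally long numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def corrected_item_scale_correlations(rows):
    """rows: complete, already recoded response vectors (one list per respondent).

    For each item, the item itself is removed from the scale sum before
    correlating, which is what "corrected" refers to.
    """
    n_items = len(rows[0])
    correlations = []
    for j in range(n_items):
        item = [row[j] for row in rows]
        rest = [sum(row) - row[j] for row in rows]
        correlations.append(pearson(item, rest))
    return correlations


# Toy data for three items and five respondents (not taken from the study).
toy_rows = [[1, 2, 1], [2, 2, 3], [3, 4, 3], [4, 4, 5], [5, 5, 4]]
toy_result = corrected_item_scale_correlations(toy_rows)
```

Respondents with missing values on any item of the scale would have to be excluded (or handled otherwise) before calling the function.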

Quality criteria

Reliability

To assess reliability, a composite reliability coefficient (Raykov's Rho) based on factor analysis was calculated from the two-factor models reported in the “Factorial Validity” section. In addition, we calculated a single general Rho based on the two factors (Raykov, 2012; see Table 7). Correlated error terms were consistently included in the denominator as part of the measurement error (cf. Raykov, 2012).

Table 7

Sample	Scale	Language / Group	Composite Reliability (Raykov's Rho (SE))
Register sample	General	Arabic	.84 (.03)
		German	.80 (.02)
		Overall Sample	.83 (.02)
	Physical health	Arabic	.88 (.02)
		German	.88 (.02)
		Overall Sample	.88 (.02)
	Mental health	Arabic	.88 (.02)
		German	.87 (.03)
		Overall Sample	.87 (.02)
Facebook sample	General	Arabic	.85 (.01)
		Dari	.90 (.01)
		Overall Sample	.86 (.01)
	Physical health	Arabic	.83 (.01)
		Dari	.76 (.02)
		Overall Sample	.80 (.01)
	Mental health	Arabic	.87 (.01)
		Dari	.89 (.01)
		Overall Sample	.88 (.01)

Reliability analyses indicate high composite reliability for the PH scale in the register sample and moderate values in the Facebook sample. Reliability of the MH scale is high in both samples and in all language groups. The general Rho based on the two factors is reasonably good in the register sample and excellent in the Facebook sample. Overall, the composite reliability of the adapted SF-12-ENSURE was found to be sufficient to excellent.
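The logic of a factor-analysis-based composite reliability coefficient with correlated errors in the denominator can be sketched as follows. This is an illustration under stated assumptions, not the authors' Mplus syntax: the function name is hypothetical, the example uses the standardized PH loadings for Arabic in the register sample from Table 8, error variances are derived as 1 minus the squared standardized loading, and the value of the error covariance between items 3 and 4 is invented because it is not reported here. The result is therefore not expected to reproduce Table 7 exactly.

```python
def composite_reliability(loadings, error_variances, error_covariances=(), factor_variance=1.0):
    """Composite reliability of a unit-weighted sum of congeneric indicators.

    rho = (sum of loadings)^2 * factor variance
          / [(sum of loadings)^2 * factor variance
             + sum of error variances + 2 * sum of error covariances]
    """
    true_variance = (sum(loadings) ** 2) * factor_variance
    error_variance = sum(error_variances) + 2 * sum(error_covariances)
    return true_variance / (true_variance + error_variance)


# Standardized PH loadings, Arabic, register sample (Table 8).
ph_loadings = [0.73, 0.51, 0.75, 0.90, 0.89, 0.75]
# Assumption: standardized solution, so error variance = 1 - loading^2.
ph_errors = [1 - l ** 2 for l in ph_loadings]

rho_without_covariance = composite_reliability(ph_loadings, ph_errors)
# Hypothetical error covariance of 0.10 between "stairs" and "daily functioning".
rho_with_covariance = composite_reliability(ph_loadings, ph_errors, error_covariances=[0.10])
```

Adding a positive error covariance to the denominator lowers the coefficient, which is why treating correlated errors as measurement error is the more conservative choice.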

Validity

Factorial Validity

Factorial validity was evaluated by means of MG-CFAs for each sample separately (see Table 8). We used Mplus 8.7 (Muthén & Muthén, 1998-2021) for these analyses. The model fit in the MG-CFAs was evaluated using the chi-square test, the Root-Mean-Square Error of Approximation (RMSEA), and the Comparative Fit Index (CFI) (Beauducel & Wittmann, 2005). A CFI of .95 or higher and an RMSEA of .08 or less indicate an acceptable fit (Hu & Bentler, 1999). The Robust Maximum Likelihood estimator (MLR) was used due to the ordinal nature and non-normality of the data (Muthén & Muthén, 1998-2021). Because five response categories were used, we treated the indicators as continuous (cf. e.g. Beauducel & Herzberg, 2006). The two-factor structure was supported, as a just acceptable or acceptable model fit was reached in both samples. In both samples, two correlated error terms were included, as suggested by high modification indices. The first correlated error was introduced between the items “restricted mentally” and “less accomplished mentally”. As these items are similar in content and wording and both address limitations in activities due to emotional problems, the error term can be justified. The second correlated error term was indicated between the items “stairs” and “daily functioning”, which seem to be more similar to each other with respect to physical limitations than to the other items of the PH factor. Both correlations are also reported for the SOEP samples (cf. Tibubos & Kröger, 2020) and are confirmed here, although the number of correlated error terms is reduced from four to two. The factor loadings were high for all items of the two factors in both samples. The factor correlations are of moderate size for German and Arabic, but high for Dari (Facebook sample).

Table 8

	Register Sample				Facebook Sample
	Arabic		German		Arabic		Dari
Item	PH	MH	PH	MH	PH	MH	PH	MH
Overall physical	.73		.71		.65		.65
Stairs	.51		.60		.57		.41
Daily functioning	.75		.74		.57		.53
Restricted physically	.90		.91		.87		.69
Less accomplished physically	.89		.90		.82		.72
Less social physically	.75		.64		.73		.74
Overall mental		.76		.78		.75		.79
Blue		.69		.73		.83		.83
Social functioning		.53		.64		.53		.74
Restricted mentally		.83		.80		.78		.78
Less accomplished mentally		.87		.78		.73		.76
Less social mentally		.88		.77		.84		.82
N	148		125		651		467
Factor correlation	.55		.50		.61		.79
Error correlations	SF-9 and SF-10; SF-3 and SF-4
CMIN	211.581***				350.767***
df	102				102
RMSEA	.089				.066
CFI	.935				.946

Note. Standardized parameters are reported. ***p < .001.

Convergent Validity

Convergent validity was assessed using Pearson correlations between the SF-12 scales and predefined concepts that were assumed to be significantly (p < .001) correlated with them, with moderate to strong correlations of |.4| and above. As expected, the scales correlated with similar concepts in both the register and the Facebook sample (see Table 9). There was a strong negative correlation with the general health status of refugees (Refugee Health Scale, RHS-SOEP, negatively polarized) with regard to both PH (r = -.58, N_Register = 110; r = -.66, N_Facebook = 359) and MH (r = -.80, N_Register = 110; r = -.82, N_Facebook = 359). This means that participants who described their physical and mental health as better in the SF-12-ENSURE also described themselves as healthier in the RHS. There were also correlations in the expected direction with typical measures of MH. There were moderate to strong negative correlations with depression and anxiety (Patient Health Questionnaire, PHQ-4) with regard to both PH (r_Depression = -.40, r_Anxiety = -.36, N_Register = 132; r_Depression = -.42, r_Anxiety = -.45, N_Facebook = 403 and 402, respectively) and MH (r_Depression = -.75, r_Anxiety = -.73, N_Register = 132; r_Depression = -.75, r_Anxiety = -.73, N_Facebook = 402). This means that participants who described their physical and mental health as better in the SF-12-ENSURE also described themselves as mentally healthier in the PHQ-4. Furthermore, in the register sample there is a weak positive correlation between PH and education (number of years of schooling, r = .21, N_Register = 226). This corresponds to the common finding that higher education is usually associated with health-promoting behaviour (Adler & Newman, 2002; Zajacova & Lawrence, 2018). MH is not correlated with education (r = .07, N_Register = 226; r = .00, N_Facebook = 559).

Divergent Validity

For divergent validity, correlations between the SF-12 scales and unrelated concepts were expected to be non-significant (p > .05), with correlation coefficients below |.3|, indicating weak associations. As expected, we observed very weak or no correlations between PH and MH in the SF-12-ENSURE and most of these concepts. However, there were weak to moderate correlations with attitudes towards democracy as well as weak correlations with authoritarianism and faith (see Table 9).

Table 9

Register Sample
	Convergent				Divergent
	Education (years)	Depression (PHQ-4)	Anxiety (PHQ-4)	Health (RHS)	Democracy	Authoritarianism	Faith	Trust
Physical	.21**	-.40***	-.36***	-.58***	-.12	-.16*	-.10	.00
Mental	.07	-.75***	-.73***	-.80***	-.28**	.05	-.01	.02
Facebook Sample
Physical	.03	-.42***	-.45***	-.66***	-.12*	-.04	-.03	.07
Mental	.00	-.75***	-.73***	-.82***	-.14*	.06	.10*	.11

Note. PHQ-4 = Patient Health Questionnaire 4, RHS = Refugee Health Scale. Pearson’s correlations. *p < .05, **p < .01, ***p < .001. Register Sample N = 104 – 241, Facebook Sample N = 329 – 655.

Criterion Validity

Criterion validity was assessed using Pearson correlations between the SF-12 scales and health outcomes that were assumed to be significantly (p < .001) correlated with them, with coefficients of |.3| and above. There were moderate to strong correlations with possible external criteria, both in the register sample and in the Facebook sample (see Table 10). There were moderate to strong negative correlations with reported comorbidities with regard to both PH (r = -.37, N_Register = 260; r = -.40, N_Facebook = 777) and MH (r = -.31, N_Register = 260; r = -.37, N_Facebook = 777). This means that people who described their physical and mental health as better in the SF-12-ENSURE also reported having fewer illnesses. There were moderate to strong positive correlations with reported general life satisfaction with regard to both PH (r = .50, N_Register = 150; r = .42, N_Facebook = 536) and MH (r = .52, N_Register = 151; r = .52, N_Facebook = 534). This means that people who described their physical and mental health as better in the SF-12-ENSURE also described themselves as more satisfied overall. In the register sample, there was a weak negative correlation between alcohol misuse and PH (r = -.19, N_Register = 251). This means that people who reported consuming alcohol on a regular basis also described their physical health as slightly worse. Finally, there were moderate to strong negative correlations with loneliness with regard to both PH (r = -.29, N_Facebook = 546) and MH (r = -.46, N_Register = 130; r = -.50, N_Facebook = 545). This means that people who described their physical and mental health as somewhat better in the SF-12-ENSURE also described themselves as being less lonely.

Table 10

Register Sample
	Alcohol	Comorbidities	Life Satisfaction	Loneliness
Physical	-.19**	-.37***	.50***	-.10
Mental	-.10	-.31***	.52***	-.46***
Facebook Sample
Physical	-.04	-.40***	.42***	-.29***
Mental	.05	-.37***	.52***	-.50***

Note. Pearson’s correlations. *p < .05, **p < .01, ***p < .001. Register Sample N = 129 – 260, Facebook Sample N = 534 – 777.

Measurement Equivalence

Measurement invariance was analysed by means of MG-CFA, using the models obtained in the evaluation of factorial validity (Table 8). In contrast to those models, however, the correlated error terms described in the section above were constrained to be equal between the respective groups, to ensure an equal configuration for the unexplained factors as well. The factor variances were fixed at 1 to allow comparison of the loadings, and the factor means were fixed at 0 to allow comparison of the intercepts and to identify the model (cf. e.g. Byrne, 2011). The analyses compare Arabic and German in the register sample and Arabic and Dari in the Facebook sample. A significant change in chi-square (Meredith, 1993) or changes of ΔCFI > .010 and ΔRMSEA > .010 indicated significant differences in model fit (Chen, 2007) and thus a lack of exact measurement invariance. For the register sample, metric invariance is supported by all goodness-of-fit (GOF) statistics, and scalar invariance is supported by the changes in RMSEA and CFI, which remain below the cut-offs. For Arabic and Dari, in contrast, metric invariance is supported by RMSEA and CFI, whereas scalar invariance is rejected by the changes in all GOF statistics. Partial scalar invariance is established in the Facebook sample when the intercepts of three items (overall physical, social functioning, and less social physically) are freed. The adapted SF-12 can therefore be used to compare correlations between German and Arabic or between Arabic and Dari. Latent and summarized means are also fully comparable between Arabic and German. Mean comparisons between Arabic and Dari can be carried out on the basis of partial measurement invariance models (cf. Pokropek et al., 2019). For comparisons among the three languages, further validation is required.

Table 11

Results of measurement invariance analysis in the register sample

Model	χ²(df)	Δχ²(df)	RMSEA	ΔRMSEA	CFI	ΔCFI
configural	210.79** (104)	-	.087	-	.937	-
metric	228.79** (116)	18 (12)	.084	-.003	.933	.004
scalar	257.27** (128)	29.53** (12)	.086	.002	.923	.010

Note. *p < .05, **p < .01, ***p < .001; Δχ² corrected by scale correction factor for MLR.

Table 12

Results of measurement invariance analysis in the Facebook sample

Model	χ²(df)	Δχ²(df)	RMSEA	ΔRMSEA	CFI	ΔCFI
configural	351.27*** (104)	-	.066	-	.946	-
metric	409.58*** (116)	61.40*** (12)	.068	.002	.936	.010
scalar	642.61*** (128)	271.56*** (12)	.085	.017	.888	.058
partial scalar	456.85*** (125)	51.75*** (9)	.069	.001	.927	.009

Note. *p < .05, **p < .01, ***p < .001; Δχ² corrected by scale correction factor for MLR.
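The cut-off rule for the approximate fit indices described in this section can be illustrated with a short Python sketch. It is a simplified reading of the rule as stated above (a model comparison is flagged when both ΔCFI and ΔRMSEA exceed .010) and ignores the chi-square difference test. The function name is hypothetical; the CFI and RMSEA values are taken from Tables 11 and 12, and each model is compared with the next less restrictive model.

```python
def fit_worsened(less_restricted, more_restricted, threshold=0.010):
    """Return True if both the drop in CFI and the rise in RMSEA exceed the threshold.

    Differences are rounded to three decimals to avoid floating-point
    artefacts when a difference equals the threshold exactly.
    """
    delta_cfi = round(less_restricted["cfi"] - more_restricted["cfi"], 3)
    delta_rmsea = round(more_restricted["rmsea"] - less_restricted["rmsea"], 3)
    return delta_cfi > threshold and delta_rmsea > threshold


# CFI and RMSEA as reported in Table 11 (register sample).
register = {
    "configural": {"cfi": 0.937, "rmsea": 0.087},
    "metric": {"cfi": 0.933, "rmsea": 0.084},
    "scalar": {"cfi": 0.923, "rmsea": 0.086},
}
# CFI and RMSEA as reported in Table 12 (Facebook sample).
facebook = {
    "configural": {"cfi": 0.946, "rmsea": 0.066},
    "metric": {"cfi": 0.936, "rmsea": 0.068},
    "scalar": {"cfi": 0.888, "rmsea": 0.085},
    "partial scalar": {"cfi": 0.927, "rmsea": 0.069},
}

register_metric_rejected = fit_worsened(register["configural"], register["metric"])
register_scalar_rejected = fit_worsened(register["metric"], register["scalar"])
facebook_metric_rejected = fit_worsened(facebook["configural"], facebook["metric"])
facebook_scalar_rejected = fit_worsened(facebook["metric"], facebook["scalar"])
facebook_partial_rejected = fit_worsened(facebook["metric"], facebook["partial scalar"])
```

Under this reading, only the full scalar model in the Facebook sample is flagged, which matches the conclusions drawn in the text for the approximate fit indices.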

Acknowledgement

This work was funded by the German Research Foundation (DFG) within the scope of the ENSURE project as part of the PH-LENS Research Unit (FOR 2928/GZ: ME 3538/10-1).

Supplementary Material

S1: 13 Item SOEP-Version of the Short-Form Health Survey

Instruction

'Gesundheit und Krankheit' (English translation, for comprehension only: 'Health and illness')

Items

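No.	German	English translation (for comprehension only)	Polarity	Subscale
1	Wie würden Sie Ihren gegenwärtigen Gesundheitszustand beschreiben?	How would you describe your current state of health?		Physical Health, Mental Health
2	Wenn Sie Treppen steigen müssen, also mehrere Stockwerke zu Fuß hochgehen: Beeinträchtigt Sie dabei Ihr Gesundheitszustand stark, ein wenig oder gar nicht?	When you have to climb stairs, i.e. walk up several floors: does your state of health affect you greatly, a little, or not at all?		Physical Health
3	Und wie ist das mit anderen anstrengenden Tätigkeiten im Alltag, wenn man z.B. etwas Schweres heben muss oder Beweglichkeit braucht: Beeinträchtigt Sie dabei Ihr Gesundheitszustand stark, ein wenig oder gar nicht?	And what about other strenuous activities in everyday life, e.g. when you have to lift something heavy or need to be mobile: does your state of health affect you greatly, a little, or not at all?		Physical Health
4	Wie oft kam es in den letzten vier Wochen vor, dass Sie sich gehetzt oder unter Zeitdruck fühlten?	How often in the last four weeks did you feel rushed or under time pressure?		Mental Health
5	Wie oft kam es in den letzten vier Wochen vor, dass Sie sich niedergeschlagen und trübsinnig fühlten?	How often in the last four weeks did you feel down and melancholy?		Mental Health
6	Wie oft kam es in den letzten vier Wochen vor, dass Sie sich ruhig und ausgeglichen fühlten?	How often in the last four weeks did you feel calm and balanced?		Mental Health
7	Wie oft kam es in den letzten vier Wochen vor, dass Sie jede Menge Energie verspürten?	How often in the last four weeks did you feel full of energy?		Mental Health
8	Wie oft kam es in den letzten vier Wochen vor, dass Sie starke körperliche Schmerzen hatten?	How often in the last four weeks did you have severe physical pain?		Physical Health
9	Wie oft kam es in den letzten vier Wochen vor, dass Sie wegen gesundheitlicher Probleme körperlicher Art in Ihrer Arbeit oder Ihren alltäglichen Beschäftigungen weniger geschafft haben als Sie eigentlich wollten?	How often in the last four weeks did you achieve less than you actually wanted to at work or in your everyday activities, due to health problems of a physical nature?		Physical Health
10	Wie oft kam es in den letzten vier Wochen vor, dass Sie wegen gesundheitlicher Probleme körperlicher Art in Ihrer Arbeit oder Ihren alltäglichen Beschäftigungen in der Art Ihrer Tätigkeiten eingeschränkt waren?	How often in the last four weeks were you restricted in the kind of activities you could do at work or in your everyday activities, due to health problems of a physical nature?		Physical Health
11	Wie oft kam es in den letzten vier Wochen vor, dass Sie wegen seelischer oder emotionaler Probleme in Ihrer Arbeit oder Ihren alltäglichen Beschäftigungen weniger geschafft haben als Sie eigentlich wollten?	How often in the last four weeks did you achieve less than you actually wanted to at work or in your everyday activities, due to mental or emotional problems?		Mental Health
12	Wie oft kam es in den letzten vier Wochen vor, dass Sie wegen seelischer oder emotionaler Probleme in Ihrer Arbeit oder Ihren alltäglichen Beschäftigungen Ihre Arbeit oder Tätigkeit weniger sorgfältig als sonst gemacht haben?	How often in the last four weeks did you do your work or activities less carefully than usual, due to mental or emotional problems?		Mental Health
13	Wie oft kam es in den letzten vier Wochen vor, dass Sie wegen gesundheitlicher oder seelischer Probleme in Ihren sozialen Kontakten, zum Beispiel mit Freunden, Bekannten oder Verwandten, eingeschränkt waren?	How often in the last four weeks were you restricted in your social contacts, for example with friends, acquaintances or relatives, due to health or mental problems?		Physical Health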

Response specifications SOEP

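Items	Response categories SOEP (German)	English (for comprehension only)
1	1 = Sehr gut 2 = Gut 3 = Zufriedenstellend 4 = Weniger gut 5 = Schlecht -9 = nicht beantwortet	1 = very good 2 = good 3 = satisfactory 4 = not so good 5 = bad -9 = not answered
2, 3	1 = Stark 2 = Ein wenig 3 = Gar nicht -9 = nicht beantwortet	1 = greatly 2 = a little 3 = not at all -9 = not answered
4-13	1 = Immer 2 = Oft 3 = Manchmal 4 = Fast nie 5 = Nie -9 = nicht beantwortet	1 = always 2 = often 3 = sometimes 4 = almost never 5 = never -9 = not answered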

Contact

Richter, L., Brust, O., & Menold, N.
