Design and psychometric properties of the Short Form Postpartum Quality of Life Questionnaire (SF-PQOL): an application of multidimensional item response theory and genetic algorithm

Background: Using multidimensional item response theory (MIRT) and a genetic algorithm (GA), we aimed to design a short form of the Postpartum Quality of Life Questionnaire (PQOL) and to test its psychometric properties. Methods: In this methodological study, 500 women aged 18 to 42 years were enrolled through a multistage random sampling scheme in Tabriz, Iran. We used an MIRT model and a GA to identify a short form of the 40-item PQOL (SF-PQOL). Construct validity of the SF-PQOL was assessed by confirmatory factor analysis (CFA), and criterion validity by the correlation of SF-PQOL scores with scores on the 12-item Short Form Health Survey (SF-12) and the Edinburgh Postnatal Depression Scale (EPDS). The internal consistency, test-retest reliability, and feasibility of the measure were also evaluated. Results: A 16-item and a 13-item SF-PQOL were identified by the GA and MIRT, respectively. The results indicated better performance of the MIRT-based 13-item SF-PQOL: construct and criterion validity, test-retest and internal consistency reliability, and feasibility were confirmed for the MIRT-based SF-PQOL, but not for the GA-based SF-PQOL. Conclusion: MIRT yielded a 13-item SF-PQOL with adequate content coverage and satisfactory validity, reliability, and feasibility. The SF-PQOL could be used across the population for both research and clinical purposes.


Introduction
Postpartum complications profoundly affect maternal and neonatal health. They directly influence the infant's development 1-3 and postpartum maternal health, including postpartum quality of life (PQOL). PQOL has therefore recently gained special research attention and could be particularly important in health promotion planning. 4 Few specific measures of PQOL exist in the literature, and each has limitations: the Mother-Generated Index is limited by its qualitative and subjective nature and by the fact that women in developing countries may lack familiarity with the concept of quality of life; 4-7 the maternal PQOL measure does not address reproductive health rights, employment status, or time for rest; 8 and the rural PQOL measure reflects only rural women's viewpoints. 9 A PQOL questionnaire was developed using standard methods to address all aspects of quality of life and some aspects of reproductive health. Following the World Health Organization (WHO) definition, this measure covers the physical, psychological, and social aspects of quality of life in the postpartum period. 10 The translation and psychometric properties of this self-administered measure were evaluated by Nikan et al in Iran. 11 However, with 40 items it is too long for clinical practice; to be useful in both research and clinical practice, the measure needs to be brief. A short form of the PQOL (SF-PQOL) should also have adequate psychometric properties to be reliable and valid, and should cover all the content areas of the full PQOL. An extensive literature search found no such measure.
On the other hand, the common use of multivariate models in health research has increased the appeal of short scales that remain psychometrically sound. Abbreviating a scale saves administration time, but it may also yield a poor measure: the internal structure may change, reliability may decrease, the short form may discriminate less well between persons across the ability range, and test-criterion relationships may weaken. 12 Few approaches address these issues. We focused on two item selection methodologies that have been recommended for this purpose: the genetic algorithm (GA), a relatively new meta-heuristic approach, and multidimensional item response theory (MIRT). Both aim to preserve the psychometric properties of the measure when constructing a short form. 12,13 We compared MIRT and GA for developing the short form of the PQOL. The GA can assemble short versions whose properties are optimal relative to the long version, whereas MIRT can be used to appraise the psychometric properties of an existing scale, to shorten the scale, and to appraise the performance of the abridged scale with respect to the properties mentioned above. [13][14][15] The objectives of the study were (1) to use MIRT and GA to construct the SF-PQOL, (2) to compare MIRT and GA and select the better of the two to construct the final SF-PQOL, (3) to evaluate the test-retest and internal consistency reliability of the final SF-PQOL, (4) to evaluate its construct validity, and (5) to test its feasibility with respect to floor and ceiling effects. We also aimed to use the same scoring algorithm for both forms, so that SF-PQOL scores lie in the same range as scores from the full PQOL item pool.

Study participants
This was a methodological study with cross-sectional data collection. Details of the study participants and the full PQOL have been published elsewhere; 11 a brief description is given here. Participants were 500 women aged 18 to 42 years, enrolled through a multistage random sampling scheme from half of the healthcare centers of Tabriz, Iran, between November 2014 and January 2015. Factor analysis of the 40-item full form with at least 5 subjects per item 16 required a sample size of 200. This number was doubled to account for the design effect (=2) of the multistage sampling design, 17 and was further increased to 500 to allow the required analyses on separate calibration and validation sub-samples. For item response theory analysis, Tsutakawa and Johnson recommend a sample size of about 500 for accurate parameter estimates, 18 although other studies have shown that 200 or fewer observations can be adequate. 19,20 This sample size was also sufficient for applying the GA to our data. 21 The inclusion criteria were having a singleton, healthy, term newborn weighing over 2500 g; having guidance school (middle school) or higher education; being Iranian; and having access to a telephone or mobile phone. In Iran, the second-month vaccination of all children is conducted in public health centers/posts, and the names and phone numbers of those referred for vaccination are recorded in a designated notebook. We used this information to identify eligible participants. Of the 42 health centers and 33 health posts in Tabriz, 22 health centers and 15 health posts were randomly selected. Samples were allocated proportionately according to referrals to each selected center/post for the second-month vaccination. In each health center/post, registered mothers who were on postpartum days 60-67 were selected randomly.
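The sample-size arithmetic above can be checked in a few lines. This is a minimal Python sketch; the function name and defaults are illustrative, and it reproduces only the 5-subjects-per-item rule and the design-effect multiplier stated in the text. The final increase from 400 to 500, which allows two sub-samples of 250, was a study decision and is not derived by a formula here.

```python
import math


def required_sample_size(n_items, subjects_per_item=5, design_effect=2.0):
    """Return (rule-of-thumb size, size after inflating for the design effect)."""
    base = n_items * subjects_per_item          # 40 items x 5 subjects = 200
    adjusted = math.ceil(base * design_effect)  # 200 x design effect 2 = 400
    return base, adjusted


base, adjusted = required_sample_size(40)
print(base, adjusted)  # 200 400
```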
Potentially eligible mothers were invited to attend the health centers/posts to participate in the study. After a brief explanation of the objectives and procedures of the study, eligible participants were asked to complete a set of paper-based questionnaires.

Measures
Because it has the desired properties (usefulness in both research and clinical settings for screening quality-of-life problems, cost-effectiveness, and coverage of all aspects of quality of life), we chose the full PQOL, with permission from its developers. 10 The full PQOL is a 40-item self-administered measure comprising 4 dimensions: physical functioning, child care, psychological functioning, and social support, with 8, 12, 8, and 12 items, respectively. Each item is answered on a 5-point Likert scale assessing intensity, e.g. ''(1) Not at all, (2) Slightly, (3) Moderately, (4) Very, (5) Extremely'', frequency, e.g. ''(1) Never, (2) Rarely, (3) Sometimes, (4) Often, (5) Always'', or evaluation, e.g. ''(1) Very dissatisfied, (2) Dissatisfied, (3) Neither satisfied nor dissatisfied, (4) Satisfied, (5) Very satisfied''. Normalized scores range from 0 to 100, with 0 and 100 indicating the poorest and the best PQOL, respectively. Along with the full PQOL, a demographic questionnaire, the Edinburgh Postnatal Depression Scale (EPDS) 22 and the Short Form Health Survey (SF-12) 23 were given to the mothers. The psychometric properties of the PQOL were evaluated by Nikan et al. Briefly, the PQOL showed good internal consistency (Cronbach's alpha 0.70-0.88) and good test-retest reliability (intraclass correlation coefficients [ICCs] 0.87-0.92). Exploratory factor analysis (EFA) and confirmatory factor analysis (CFA) indicated an acceptable fit for a 4-factor model, supporting construct validity. The strong correlation between the PQOL and the SF-12 supported the criterion-related validity of the measure. Finally, the PQOL discriminated between women classified as depressed and not depressed by the EPDS. 11
Both the EPDS and the SF-12 discriminate well between known groups (supporting their discriminant validity) and have shown good internal consistency reliability (Cronbach's alpha >0.70).

Statistical analyses
Data are expressed as mean (SD) for numeric variables and as frequency (%) for categorical variables. Normality of the numeric variables was assessed with skewness and kurtosis indices; values greater than 3 for skewness and greater than 10 for kurtosis were taken to indicate a serious departure from normality. These analyses were performed on the entire sample. For the subsequent analyses, the data were randomly divided into a test sample (n = 250) and a validation sample (n = 250), and the sample used is indicated for each analysis.
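A minimal stdlib-only Python sketch of the normality screen and the random 250/250 split. Several details are assumptions not stated in the text: kurtosis is taken as excess kurtosis (0 for a normal distribution), the cutoffs are applied to absolute values, moment-based (biased) estimators are used, and the seed is arbitrary. The software the authors used may compute these indices with different estimators.

```python
import random
import statistics


def skewness_kurtosis(xs):
    """Moment-based skewness (g1) and excess kurtosis (g2)."""
    n = len(xs)
    m = statistics.fmean(xs)
    m2 = sum((x - m) ** 2 for x in xs) / n
    m3 = sum((x - m) ** 3 for x in xs) / n
    m4 = sum((x - m) ** 4 for x in xs) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2 - 3.0


def serious_nonnormality(xs, skew_cut=3.0, kurt_cut=10.0):
    """Flag a variable whose |skewness| > 3 or |kurtosis| > 10."""
    g1, g2 = skewness_kurtosis(xs)
    return abs(g1) > skew_cut or abs(g2) > kurt_cut


def split_sample(ids, n_test, seed=2015):
    """Randomly split participant ids into a test and a validation sample."""
    rng = random.Random(seed)   # arbitrary seed, for reproducibility only
    shuffled = list(ids)
    rng.shuffle(shuffled)
    return shuffled[:n_test], shuffled[n_test:]


test_ids, validation_ids = split_sample(range(500), 250)
```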

Genetic algorithm
The GA, introduced by Holland, relies on the fundamental Darwinian evolutionary principles of selection, crossover, and mutation. 12 We used the GA to abbreviate the PQOL, applying the default optimization offered in the R package GAabbreviate. 24 The cost function to be minimized was $\mathrm{Cost} = Ik + (1 - R^2)$, where $I$ is a fixed item cost, $k$ is the number of items retained by the GA in each iteration, and $R^2$ is the proportion of variance in all subscale scores explained by a linear combination of the individual item scores. By changing $I$, more or less weight can be placed on the brevity of the measure relative to its comprehensiveness: 25 high values of $I$ lead to a relatively short measure, and low values to a comparatively longer one. The GA aims to reduce redundancy within a scale and thus to retain the items that best capture the construct of interest. 12,26 In the present study, we varied $I$ to maximize $R^2$ and hence minimize the cost, which led to a 16-item SF-PQOL. The GA was set to the binary type, with a population size of 100, 1000 generations, elitism of 5, a crossover probability of 0.8, and a mutation probability of 0.1.
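The GA search can be sketched as follows. This is a minimal stdlib-only Python illustration, not the GAabbreviate implementation: the cost follows the formula above, but the selection scheme (binary tournament), single-point crossover, and mutation by flipping one random bit with probability p_mut per chromosome are assumptions, and toy_r2 is a hypothetical stand-in for the R² that GAabbreviate obtains by regressing the subscale scores on the retained items.

```python
import math
import random


def ga_cost(selected, r2_fn, item_cost):
    """Cost = I*k + (1 - R^2), with k the number of retained items."""
    return item_cost * sum(selected) + (1.0 - r2_fn(selected))


def run_binary_ga(n_items, r2_fn, item_cost, pop_size=100, generations=1000,
                  elitism=5, p_cross=0.8, p_mut=0.1, seed=1):
    """Search binary item-selection vectors (1 = keep item) for minimum cost."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_items)] for _ in range(pop_size)]

    for _ in range(generations):
        # Score each chromosome once per generation; lower cost = fitter.
        scored = sorted(((ga_cost(s, r2_fn, item_cost), s) for s in pop),
                        key=lambda pair: pair[0])
        nxt = [s[:] for _, s in scored[:elitism]]  # elitism: best pass unchanged

        def tournament():
            # Binary tournament selection: the cheaper of two random chromosomes.
            a, b = rng.sample(scored, 2)
            return (a if a[0] <= b[0] else b)[1]

        while len(nxt) < pop_size:
            p1, p2 = tournament(), tournament()
            c1, c2 = p1[:], p2[:]
            if rng.random() < p_cross:             # single-point crossover
                cut = rng.randrange(1, n_items)
                c1, c2 = p1[:cut] + p2[cut:], p2[:cut] + p1[cut:]
            for child in (c1, c2):
                if rng.random() < p_mut:           # mutate: flip one random bit
                    j = rng.randrange(n_items)
                    child[j] = 1 - child[j]
                if len(nxt) < pop_size:
                    nxt.append(child)
        pop = nxt

    best = min(pop, key=lambda s: ga_cost(s, r2_fn, item_cost))
    return best, ga_cost(best, r2_fn, item_cost)


def toy_r2(selected):
    """Hypothetical R^2 with diminishing returns per retained item (not PQOL data)."""
    return 1.0 - math.exp(-0.5 * sum(selected))


best, best_cost = run_binary_ga(8, toy_r2, item_cost=0.05,
                                pop_size=30, generations=50)
print(sum(best), round(best_cost, 3))
```

Raising item_cost pushes the search toward fewer items, mirroring the role of I described above. The demo uses a smaller population and fewer generations than the study's settings (100 and 1000) only to keep it fast.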

Multidimensional item response theory
IRT is a family of latent variable techniques designed to model the interaction between a participant's ability (latent trait) and item-level characteristics (difficulty, discrimination, guessing, etc). Unidimensional IRT models each item with a single latent trait, whereas MIRT, with its added flexibility, models each item with more than one latent trait. Given the 4 latent traits of the questionnaire, 11 we used MIRT with graded response models (GRMs) to model the ordinal items. 13,27,28 The open-source R package "mirt", which provides multidimensional estimation techniques suitable for real data analysis and research, was used to fit the GRMs. We fitted exploratory MIRT models in the test sample using the Metropolis-Hastings Robbins-Monro (MHRM) and quasi-Monte Carlo EM (QMCEM) estimation methods. 13 To determine the best number of factors, we used the Akaike information criterion (AIC), the Bayesian information criterion (BIC), the corrected AIC (AICc), and the sample size adjusted BIC (SABIC), all of which are based on the -2 log-likelihood; smaller values indicate a better fit of the model to the data. The fit of the selected model was also examined and confirmed in the validation sample. Model adequacy was assessed by goodness-of-fit indices, with the following reasonable values: chi-square/df <2, root mean square error of approximation (RMSEA) <0.05, standardized root mean square residual (SRMR) <0.05, comparative fit index (CFI) >0.95, and Tucker-Lewis index (TLI) >0.95. 16,29

Procedure to conduct the MIRT in constructing the short form
We used item fit analysis with the S-χ² statistic to identify the SF-PQOL items. This statistic compares the observed and expected response frequencies under the fitted MIRT model and measures the differences between them. A significant S-χ² statistic indicates an item that deviates from the model, so removing such items should improve model fit. 15,30 We removed items one at a time, at each step dropping the item with the smallest Hochberg-adjusted P value. This process stopped at a 13-item model, in which no S-χ² statistic was significant.
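A minimal Python sketch of the stepwise item-removal rule described above. hochberg_adjust implements the standard Hochberg step-up adjustment (it gives the same values as R's p.adjust with method "hochberg"). fit_pvalues is a hypothetical callback standing in for refitting the MIRT model and computing the raw S-χ² P values; that refit step is not reproduced here.

```python
def hochberg_adjust(pvals):
    """Hochberg step-up adjusted P values, returned in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i], reverse=True)  # largest first
    adjusted = [0.0] * m
    running_min = 1.0                       # also caps adjusted values at 1
    for rank, i in enumerate(order):
        # The r-th largest P value (r = 1, 2, ...) is multiplied by r;
        # a running minimum keeps the adjusted values monotone.
        running_min = min(running_min, (rank + 1) * pvals[i])
        adjusted[i] = running_min
    return adjusted


def prune_misfitting_items(items, fit_pvalues, alpha=0.05):
    """Repeatedly drop the item with the smallest Hochberg-adjusted S-X2
    P value until no adjusted P value is below alpha."""
    items = list(items)
    while len(items) > 1:
        adjusted = hochberg_adjust(fit_pvalues(items))  # refit, then adjust
        worst = min(range(len(items)), key=lambda j: adjusted[j])
        if adjusted[worst] >= alpha:
            break                                        # no misfitting item left
        items.pop(worst)
    return items


# Toy example: fixed raw P values per item (hypothetical; a real refit would
# change every item's P value after each removal).
raw_p = {"Q1": 0.001, "Q2": 0.5, "Q3": 0.004, "Q4": 0.8}
kept = prune_misfitting_items(raw_p, lambda its: [raw_p[i] for i in its])
print(kept)  # ['Q2', 'Q4']
```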

Preliminary validation of the short form
The validity of the full PQOL has been comprehensively assessed in all respects, including translation validity, linguistic editing, content validity, face validity, construct validity, and discriminant and criterion validity, and the scale was modified accordingly. 11 The procedures used to assess the psychometric properties of the SF-PQOL are detailed below.

Construct validity
To assess construct validity and to compare the full PQOL with the SF-PQOL, we conducted a CFA for both measures in the validation sample, using the weighted least squares estimation method.
The covariance matrix and the asymptotic covariance matrix were used as the input and weight matrices, respectively. Model adequacy was assessed by goodness-of-fit indices, with the following reasonable values: chi-square/df <2, RMSEA <0.05, SRMR <0.05, CFI >0.95, and TLI >0.95. 16,29

Criterion validity
The SF-12 has been established as a standard tool for assessing quality of life in the Iranian population. 23,31 In addition, studies have shown an inverse correlation between quality of life and depression, as measured by the EPDS, in postpartum women. 32 In this study, these measures were used to evaluate the criterion validity of both the full PQOL and the SF-PQOL. Pearson's correlations of the full PQOL and SF-PQOL scores with the SF-12 total and domain scores and with the EPDS score were tested. Absolute values less than 0.1, between 0.1 and 0.3, between 0.3 and 0.5, and greater than 0.5 indicated negligible, poor, medium, and strong correlations, respectively. 33
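The goodness-of-fit cutoffs and correlation bands used in this section can be expressed as simple checks. This is a minimal Python sketch: the function names are illustrative, the input values in the example are hypothetical rather than results of this study, and two choices are assumptions. Correlations are classified by absolute value, so the negative EPDS correlations are labelled by their size, and a value exactly at a band boundary is assigned to the higher band.

```python
def fit_checks(chi2, df, rmsea, srmr, cfi, tli):
    """Compare fit indices with the cutoffs stated in the text."""
    return {
        "chi2/df < 2": chi2 / df < 2,
        "RMSEA < 0.05": rmsea < 0.05,
        "SRMR < 0.05": srmr < 0.05,
        "CFI > 0.95": cfi > 0.95,
        "TLI > 0.95": tli > 0.95,
    }


def correlation_strength(r):
    """Label a correlation by its absolute size using the bands in the text."""
    a = abs(r)
    if a < 0.1:
        return "negligible"
    if a < 0.3:
        return "poor"
    if a < 0.5:
        return "medium"
    return "strong"


# Hypothetical values for illustration only (not results from this study).
print(all(fit_checks(150.0, 100, 0.04, 0.03, 0.97, 0.96).values()))  # True
print(correlation_strength(-0.35))  # medium
```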

Reliability
Internal consistency was assessed with Cronbach's alpha coefficient; 34 coefficients of 0.60 or higher were considered acceptable. Test-retest reliability was assessed by having the same 30 randomly selected women complete the questionnaire twice within 2 weeks, and the ICC was computed to test stability over time. ICCs ≤0.40 were considered poor to fair, 0.41-0.60 moderate, 0.61-0.80 good, and >0.80 excellent. 35
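Both reliability statistics can be computed directly from the item and test-retest data. This is a minimal stdlib-only Python sketch. The text does not state which ICC form was used, so the two-way random-effects, absolute-agreement, single-measure ICC(2,1) here is an assumption, and the statistical software may report a different form.

```python
import statistics


def cronbach_alpha(rows):
    """Cronbach's alpha; rows holds one list of item scores per respondent."""
    k = len(rows[0])
    item_vars = [statistics.variance([r[j] for r in rows]) for j in range(k)]
    total_var = statistics.variance([sum(r) for r in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


def icc_2_1(scores):
    """ICC(2,1): two-way random effects, absolute agreement, single measure.
    scores holds one [test, retest] (or k occasions) list per subject."""
    n, k = len(scores), len(scores[0])
    grand = sum(map(sum, scores)) / (n * k)
    row_means = [sum(r) / k for r in scores]
    col_means = [sum(r[j] for r in scores) / n for j in range(k)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)   # between subjects
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)   # between occasions
    ss_total = sum((x - grand) ** 2 for r in scores for x in r)
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_err = (ss_total - ss_rows - ss_cols) / ((n - 1) * (k - 1))
    return (ms_rows - ms_err) / (
        ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n)


def icc_label(icc):
    """Bands from the text: <=0.40, 0.41-0.60, 0.61-0.80, >0.80."""
    if icc <= 0.40:
        return "poor to fair"
    if icc <= 0.60:
        return "moderate"
    if icc <= 0.80:
        return "good"
    return "excellent"
```

Because ICC(2,1) measures absolute agreement, a constant shift between test and retest lowers it: for the pairs (1, 2), (2, 3), (3, 4) it is 2/3 even though the two occasions are perfectly correlated.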

Feasibility
To assess feasibility, the percentages of respondents with the minimum and maximum possible scores were computed as floor and ceiling effects, respectively.
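The floor and ceiling computation can be sketched as below. This is a minimal Python illustration on the 0-100 normalized scale; the example scores are hypothetical, and the comparison uses a small tolerance as an assumption, to guard against floating-point rounding in normalized scores. The Discussion uses 15% as the benchmark for these percentages.

```python
import math


def floor_ceiling(scores, minimum=0.0, maximum=100.0, tol=1e-9):
    """Percentages of respondents at the lowest and highest possible score."""
    n = len(scores)
    at_floor = sum(1 for s in scores if math.isclose(s, minimum, abs_tol=tol))
    at_ceiling = sum(1 for s in scores if math.isclose(s, maximum, abs_tol=tol))
    return 100.0 * at_floor / n, 100.0 * at_ceiling / n


# Hypothetical normalized scores for illustration only.
print(floor_ceiling([0.0, 50.0, 100.0, 100.0]))  # (25.0, 50.0)
```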
All statistical analyses of the properties above were performed using Stata 14. P values less than 0.05 were considered significant.

Scoring system of SF-PQOL
We used the same scoring algorithm for both forms so that SF-PQOL scores lie in the same range as full PQOL scores. Scores were normalized as $\text{Normalized score} = \dfrac{\text{raw score} - \text{minimum}}{\text{possible range}} \times 100$, which projects item responses onto a 0-100 range. Total and subscale scores for both the full PQOL and the SF-PQOL were computed as the average over the relevant items, so the scores obtained with this formula are comparable between the short and full forms.
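A minimal Python sketch of this normalization for the 5-point items (minimum 1, possible range 4). Two assumptions apply: responses are already keyed so that higher values mean better quality of life (the text does not describe reverse-scored items), and averaging is done before rescaling, which gives the same result as averaging rescaled items because the transformation is linear. Because scores are averages, a 13-item and a 40-item scale land on the same 0-100 range.

```python
def normalized_score(item_responses, low=1, high=5):
    """Average a (sub)scale's Likert responses and rescale to 0-100."""
    raw = sum(item_responses) / len(item_responses)  # mean item response
    return (raw - low) / (high - low) * 100.0        # (raw - min) / range x 100


print(normalized_score([3, 4]))  # 62.5
```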

Results
All data were collected over a 2-month period in 2014-2015 from 500 postpartum women. Missing data, which comprised less than 5% of all item responses, were imputed using multiple imputation, with the PQOL items serving both as predictors and as imputed variables in an expectation-maximization (EM) algorithm.
The mean age of the participants was 28 (SD 5) years, and half of them had secondary education. About half (53%) of the women were primiparous, and 372 (74%) had had a cesarean section (Table 1).
The item content and response percentages for the full PQOL are presented in Table 2.

Splitting data to test and validation samples
The descriptive results in Tables 1 and 2 were obtained from the whole sample. For the subsequent analyses, the data were randomly divided into a test sample (n = 250) and a validation sample (n = 250), and the sample used is indicated for each analysis.

Results of genetic algorithm
We first fitted models with 4, 8, 12, … and 32 items (because the PQOL has 4 subscales, we chose 4 and multiples of 4). A plot of cost versus the number of items (Figure 1) led to a set of 16 items, at which the cost reached its minimum value (cost = 1.054) compared with the models with 12, 8, and 4 items. The mean convergent correlation was 0.87 in the training model and 0.88 in the validation model.
Social support items selected by the GA: PQOL29, PQOL32, PQOL33, and PQOL34

Results of MIRT
To determine the optimal number of factors, we fitted models with 1, 2, … and 6 factors. A plot of the information criteria versus the number of factors (Figure 2) led to a model with 4 factors, at which the information criteria approximately reached their minimum compared with the models with 5 and 6 factors. The theoretical support for a 4-factor solution also led us to choose the 4-factor model.
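The four information criteria can be computed from each fitted model's log-likelihood, number of estimated parameters, and sample size. This is a minimal Python sketch using the standard textbook definitions. The mirt package reports these values itself, and its parameter counting may differ, so treat this as an illustration rather than a reproduction of its output. best_model is a hypothetical helper for picking the number of factors with the smallest value of a chosen criterion.

```python
import math


def information_criteria(loglik, n_params, n_obs):
    """AIC, AICc, BIC and sample-size adjusted BIC from the log-likelihood."""
    dev = -2.0 * loglik                      # the -2 log-likelihood
    p, n = n_params, n_obs
    aic = dev + 2 * p
    # AICc's correction is undefined unless n > p + 1 (it can fail when a
    # multidimensional model has many parameters relative to the sample).
    aicc = aic + 2 * p * (p + 1) / (n - p - 1) if n > p + 1 else math.nan
    return {
        "AIC": aic,
        "AICc": aicc,
        "BIC": dev + p * math.log(n),
        "SABIC": dev + p * math.log((n + 2) / 24),  # Sclove's n* = (n + 2) / 24
    }


def best_model(fits, criterion):
    """Return the key (e.g. number of factors) with the smallest criterion."""
    return min(fits, key=lambda key: fits[key][criterion])
```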

MIRT-selected items
Item fit analyses with the S-χ² statistic were used to identify a shortened instrument. A significant S-χ² statistic, judged by the Hochberg-adjusted P value, indicates an item that deviates from the model. The items with the smallest adjusted P value were removed one at a time, yielding a 13-item model in which no S-χ² statistic was significant (Table 3).
Because the MIRT-based model showed a better fit than the GA-based one, we chose the MIRT-based SF-PQOL as the optimal measure, and all subsequent analyses were conducted on this measure only.

Preliminary validity, reliability and feasibility of the MIRT based SF-PQOL
The validity and reliability results presented in the following sections were obtained in the validation data set; a data set of 30 randomly selected women was used for test-retest reliability.
The CFA results for the SF-PQOL in the validation data set showed an acceptable fit, but the results were suboptimal in the full PQOL ( Table 4).
The SF-PQOL showed satisfactory internal consistency for the subscales (α 0.68-0.85) and the total score (α >0.7), and good test-retest reliability for the subscales (ICC 0.86-0.88) and the total score (ICC >0.7).
Significant negative correlations were observed between the EPDS score and the subscale and total scores of both the full PQOL and the SF-PQOL. In contrast, significant positive correlations were observed between the subscale and total scores of the PQOL and the SF-12 mental and physical subscale scores and total score. In addition, a strong positive correlation (r = 0.75) was observed between the full PQOL and SF-PQOL total scores (Table 6).

Discussion
The results of the study supported the calibration of the final version of the SF-PQOL, abbreviated from the full PQOL item pool using MIRT and the GA. The construct and criterion validity, test-retest and internal consistency reliability, and feasibility of the SF-PQOL were confirmed in a validation data set; in the CFA, all item-scale relationships were statistically significant (P < 0.001). The SF-PQOL thus demonstrated satisfactory validity, reliability, and feasibility, and it may be used across the population for both research and clinical purposes.

Rationale behind utilizing the MIRT and the GA
To the best of our knowledge, this is the first study in Iran to apply MIRT and the GA to PQOL instruments to evaluate the ability of the questionnaire items to identify individual latent traits, and the first study worldwide to use MIRT and the GA to construct a short form of the PQOL. IRT, especially MIRT, and the GA have been used in many studies to construct short forms of health outcome measures. 12,14,[36][37][38][39] IRT has also been recommended for selecting the most informative items when developing tailored instruments. 40 MIRT was very useful and informative in constructing a well-organized short-form instrument for both clinical practice and research: comprehensive enough for research and brief enough to be practical. Although the SF-PQOL is shorter than the most commonly used PQOL measures, it has maintained adequate content. Satisfactory results of IRT have been reported in other studies. 14,41 The graded response model was used because of the ordinal nature of the responses, as recommended. 14,28 However, it has been suggested that this methodology may not suit all measures and should be used alongside traditional psychometric methods. 42 In particular, applying IRT raises methodological issues, such as the underlying assumptions of unidimensional IRT, which must be met to obtain valid results; many of these issues have been resolved in MIRT. 13

Motivation to utilize the SF-PQOL
The main advantage of the SF-PQOL is that it allows the quality of life of postpartum women to be evaluated efficiently: because of its brevity, it saves time and costs. An important concern in postpartum and postnatal care consultations is the length of time patients have with their caregiver, 43 which may be insufficient for patients to describe their health status adequately.
Therefore, one potential use of the SF-PQOL in clinical practice is to evaluate PQOL in a short time (3-4 minutes), well within the typical consultation time in general practice. In addition, because it is self-administered and saves time and cost, it can be used in clinical settings by the midwives, doctors, and nurses involved in postpartum care. Its item content and information coverage also make it suitable for research settings.

Constructing SF-PQOL utilizing MIRT
In line with previous studies, we first evaluated the psychometric properties of the full PQOL. 10,11 Then, to identify a short form, we used the GA and item fit statistics from MIRT; 12,13 the results indicated that MIRT performed better. The MIRT-based SF-PQOL therefore contains about one third (13 of 40 items, 32.5%) of the full PQOL item pool. The SF-PQOL domains cover the content areas addressed by widely used quality-of-life measures according to the WHO definition, including child care and the physical, psychological, and social aspects. 44

Construct validity of the SF-PQOL
The CFA confirmed the structure of the SF-PQOL based on the goodness-of-fit indices, consistent with related studies. 10,11 A CFA of the full PQOL in the validation sample gave weaker support than that of the SF-PQOL. Thus, the construct validity of the SF-PQOL was confirmed in the validation data.

Reliability of the SF-PQOL
In terms of internal consistency reliability assessed by alpha, the SF-PQOL slightly outperformed the full PQOL, and the ICC values showed good stability of the SF-PQOL total score and subscales. These results are consistent with other studies of the full PQOL. 10,11

Criterion-related validity of the SF-PQOL
Significant negative correlations were observed between the EPDS score and the subscale and total scores of both the SF-PQOL and the full PQOL (although the correlation between social support and the EPDS was weak in the original form). This is in line with many studies showing that better postpartum QOL is associated with lower EPDS scores. [45][46][47] Conversely, significant positive correlations were observed between the subscale and total scores of the PQOL and the SF-12 mental and physical subscale scores and total score (except for the social support domain of the original PQOL). This is consistent with other studies showing that better postpartum QOL is correlated with higher SF-12 scores. 11,48 In addition, a strong positive correlation (r = 0.75) was observed between the full PQOL and the SF-PQOL; similar correlations between short and original forms of health outcome measures have been reported in other studies. 14,41 As expected, higher SF-PQOL scores were associated with higher SF-12 scores and lower EPDS scores, indicating satisfactory quality of life, [45][46][47] which confirms the criterion validity of the SF-PQOL. Moreover, the full PQOL and the SF-PQOL showed relatively similar correlations with the other scales. 11

Feasibility of the SF-PQOL
Floor and ceiling effects were all below 15% for the SF-PQOL total score and its physical functioning, child care, psychological functioning, and social support subscales, indicating the feasibility of the SF-PQOL. This is consistent with the previous study of this measure. 11

Strengths and limitations
This is the first study to introduce a short form of a specific QOL measure for postpartum women using MIRT and the GA. The study has some limitations. First, differences in values and cultural systems between rural and urban areas may limit the generalizability of the results to the country's rural areas. Second, all participants were residents of Tabriz, the fifth largest city in Iran, where a different language (Azeri) is spoken; therefore, studies of the reliability and validity of the SF-PQOL in other parts of Iran and in other subgroups of women with different languages and cultures are recommended.

Conclusion
The evidence from this study suggests that we achieved our goal of developing a brief measure of quality of life: a 13-item, relatively precise short form of the PQOL with good content coverage. The study also confirmed the psychometric properties of the SF-PQOL in Iranian women. This measure may help overcome obstacles to evaluating postpartum women's quality of life in both clinical and research settings, and it is recommended for use by those involved in postpartum care, such as midwives in health centers. The SF-PQOL appears able to facilitate postpartum care, help evaluate women's quality of life, and identify potential problems in this important period.

Ethical approval
The study protocol was approved by the institutional review board of Tabriz University of Medical Sciences (ethical approval code: IR.TBZMED.REC.1395.290; date: 12 June 2016). Participants were informed of the research procedure and objectives, confidentiality and privacy, their right to end their participation, and the benefits of the study. Signed informed consent was obtained from all participants before data collection. All questionnaires were anonymous, and files containing participants' contact details were shredded after data collection was complete. Only research personnel could access and use the data. The study was conducted in accordance with the World Medical Association Declaration of Helsinki.