3 Scaling success is the percent of tests out of all possible tests in which an item's correlation with its hypothesized scale is > 2 SEs higher than its correlation with other scales.
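The definition in this note can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `scaling_success`, the data layout, and all correlation values are hypothetical, and the standard error is passed in by the caller because the text does not say how it was computed.

```python
def scaling_success(correlations, hypothesized, se):
    """Percent of item-versus-other-scale tests that count as a success.

    correlations: {item: {scale: correlation of the item with that scale}}
    hypothesized: {item: name of the scale the item is expected to belong to}
    se: standard error of a correlation (supplied by the caller; assumption)
    """
    successes = 0
    tests = 0
    for item, by_scale in correlations.items():
        own = by_scale[hypothesized[item]]
        for scale, r in by_scale.items():
            if scale == hypothesized[item]:
                continue  # an item is only tested against the *other* scales
            tests += 1
            # Success: own-scale correlation exceeds the other by more than 2 SEs
            if own - r > 2 * se:
                successes += 1
    if tests == 0:
        raise ValueError("no item-versus-other-scale tests to evaluate")
    return 100.0 * successes / tests


# Hypothetical correlations, invented purely for illustration
example_corr = {
    "item1": {"A": 0.80, "B": 0.50, "C": 0.75},
    "item2": {"A": 0.40, "B": 0.70, "C": 0.30},
}
example_hyp = {"item1": "A", "item2": "B"}
success_pct = scaling_success(example_corr, example_hyp, se=0.05)
```

Here each item contributes one test per competing scale, so two items and three scales give four tests in total.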

Relative validity compares individual scales with regard to their performance on validity tests. Relative validity in our tests was based on a ratio of F-statistics generated by multiple regression. The dependent variable in each test was self-reported work productivity, and the independent (predictor) variable was a specific scale. Relative validity was the ratio of the F for the scale being compared to the F of the best performing scale in the comparison. Relative validity ratios range from 0 (the minimum) to 1 (the maximum, which represents the best performing scale in the comparison). In this comparison, we assessed the relative validity of the WLQ vs. the SF-36 role limitation scales, self-reported absences, and self-reported job effectiveness for predicting self-reported work productivity loss. In separate regression models, the WLQ Output Demands scale had the best relative validity.
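The ratio described above can be illustrated with a minimal Python sketch. The function name `relative_validity`, the scale labels, and the F-statistics are hypothetical and are not results from the study; in practice each F would come from a separate regression of work productivity on one scale.

```python
def relative_validity(f_stats):
    """Divide each scale's F-statistic by the largest F in the comparison.

    f_stats: {scale name: F-statistic from that scale's regression model}
    Returns ratios between 0 and 1; the best performing scale scores 1.
    """
    if not f_stats:
        raise ValueError("need at least one F-statistic")
    best = max(f_stats.values())
    if best <= 0:
        raise ValueError("the largest F-statistic must be positive")
    return {name: f / best for name, f in f_stats.items()}


# Hypothetical F-statistics, invented purely for illustration
example_f = {"scale_a": 40.0, "scale_b": 30.0, "scale_c": 10.0}
ratios = relative_validity(example_f)
```

With these made-up inputs, the scale with the largest F receives a ratio of 1 and the others are expressed as fractions of it.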

In a further validity test, we evaluated the WLQ's ability to detect differences between chronic condition groups. In the sample of 121 chronically ill patients, the WLQ detected several important differences; WLQ scale scores varied significantly by condition. Additionally, within each scale, the pattern of limitation was logically consistent with the characteristics of the different conditions. For example, headache syndrome involves sleep disturbance, fatigue, and extreme pain, which disrupt activities. The headache group was more limited than either the rheumatoid arthritis group (p=0.02) or the epilepsy group (p≤0.001) on the Time Demands scale. Headaches also involve visual and neurological disturbances, depressed affect, and irritability. The headache group was more limited than either of the other two groups on the Mental-Interpersonal Demands scale (in either comparison, p≤0.01). On the Physical Demands scale, patients with rheumatoid arthritis were more limited than those with headache (p≤0.001) or epilepsy (p=0.03).

A sample of employed chronically ill workers and job-matched healthy coworkers (n=65) participated in a test of recall error. The test compared retrospectively reported WLQ scores to weekly diary reports of work limitations. Diaries were considered to be the gold standard of accuracy because they require minimal recall.

Each subject completed the weekly diaries for four consecutive weeks. In addition, half of the sample completed a 2-week version of the WLQ at the end of study weeks 2 and 4. The other half completed a 4-week version of the WLQ at the end of study week 4. Subjects were randomized to one or the other WLQ group.

Because this test involved repeated administrations of work limitations questions, half of the WLQ item pool was included in the diaries (asking about work in the past week) and the other half was embedded in the questionnaires. The content of the two forms was equivalent.

Multiple regression models tested the impact of memory on each WLQ scale. In separate models, each subject's data (questionnaire and diary data) were regressed on indicator variables for "subject" and on a "method" variable. According to the intraclass correlation coefficients, recall error was not significant. The coefficients ranged from .76 to .90, surpassing the recommended minimum of .70.
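One way to obtain an intraclass correlation from a subject-by-method layout like this is through the two-way analysis of variance decomposition. The Python sketch below is an illustration only: the text does not state which form of the coefficient was used, so the consistency form (often labeled ICC(3,1)) is an assumption, and the function name `icc_consistency` and the scores are hypothetical.

```python
def icc_consistency(scores):
    """Consistency-type intraclass correlation for a subjects x methods table.

    scores: list of rows, one per subject; each row holds one score per
            method (for example, questionnaire score and diary score).
    Assumption: the consistency form (MSR - MSE) / (MSR + (k - 1) * MSE).
    """
    n = len(scores)
    k = len(scores[0]) if n else 0
    if n < 2 or k < 2 or any(len(row) != k for row in scores):
        raise ValueError("need a complete table with >= 2 subjects and methods")

    grand = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]

    # Sums of squares: total, between subjects, between methods, residual
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_err = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_err = ss_err / ((n - 1) * (k - 1))
    return (ms_rows - ms_err) / (ms_rows + (k - 1) * ms_err)


# Hypothetical scores (questionnaire, diary), invented purely for illustration
example_scores = [[10, 14], [20, 18], [30, 33], [40, 39]]
icc = icc_consistency(example_scores)
meets_minimum = icc >= 0.70  # the .70 minimum is the level cited in the text
```

In this form a constant shift between methods does not lower the coefficient; an absolute-agreement form would also penalize such a shift, so the choice matters when the "method" effect itself is of interest.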

Criterion Validity

The relationship of WLQ scores to objectively measured work productivity was tested in a sample of approximately 919 employees of a large New England firm that monitors employee productivity electronically. The WLQ was administered (on-site) during three months to non-absent employees assigned to eligible jobs (n=2,185). Survey responses were compared to work productivity data obtained for the same period. The sample consisted of workers in sedentary and manual jobs.

Sedentary Models. In 16 multiple regression models adjusted for correlated errors (4 scales x 4 productivity criteria), WLQ scores predicted work productivity in 9 models. The Output Demands scale, which is conceptually related to productivity, predicted productivity in all 4 of its tests.

Manual Models. In 4 adjusted models (4 scales x 1 productivity criterion), scores on the Time Management and Output Demands scales explained a significant portion of the variation in productivity.

WLQ Index. Results from the prior regressions confirmed that the Output Demands scale was a valid indicator of work productivity. To further explore the construct validity of the three other scales (Time Management, Physical and Mental/Interpersonal), we sought to identify the best combination of items for predicting productivity. In stepwise regressions, 8 items drawn from the scales were highly predictive of productivity in either sedentary or manual jobs.

The criterion validity of the WLQ was further tested in a study of short-term disability insurance claimants with low back pain. This sample included 167 claimants from firms throughout the nation, who had filed claims within four weeks of baseline. They were followed by survey for 12 weeks. In this study, the WLQ Physical Demands scale scores, obtained longitudinally, predicted duration of disability. The higher the WLQ Physical Demands scale score (WLQphydem), the greater the number of days spent on disability and not working (Duration).

Considerations for Users

The WLQ is available royalty-free for non-commercial applications. Commercial applications are charged a user's fee. If you would like to obtain a copy of the WLQ, please contact us and ask for a user form. Complete and return the form and we will send you the WLQ. We ask that users consider sharing WLQ data they have collected so that we may continue to test the instrument and build a database for users.

The user form can be obtained by contacting Dr. Debra Lerner's office at: 617-636-8636, The Health Institute, New England Medical Center Box 345, 750 Washington Street, Boston, MA 02111. Email: [email protected].

