A strategy at Statistics Sweden to test data collection instruments

Eva Elvers and Andreas Persson
Cognitive methodologist, Process Department, Statistics Sweden, SE-701 89 Örebro
Process owner Design and plan & Build and test, Process Department, Statistics Sweden, Box 24 300, SE-104 51 Stockholm, Sweden, [email protected]

There are many methods to test questionnaires and other data collection instruments (for example, expert reviews, cognitive interviews, and experiments). The methods have different strengths and weaknesses, and their costs also vary. Not every survey can be tested with every method; there has to be a balance with regard to survey importance, consequences of data errors, available resources, and costs. Statistics Sweden (SCB) has developed a strategy for how to test different surveys, taking the abovementioned factors into account. This strategy is based on survey characteristics, mainly taken from a database with a set of classifications for many surveys. Three characteristics have been chosen: official statistics (yes or no), importance for society (two categories), and the importance of correctness (three categories). For other surveys, similar characteristics have been decided. Based on how a survey is classified on the chosen characteristics, the strategy proposes different levels of testing. These levels vary in both ambition and the methods included. SCB introduced this strategy in the autumn of 2010, and since then many surveys have successively been taken

1. Methods to test data collection instruments
The questionnaire plays a fundamental role in the production of survey statistics. The questionnaire is the tool through which the data are collected from the respondents. Flaws in the questionnaire can lead to negative consequences in many different areas. As such, it is important to test the questionnaire in

An important question is, then, how the questionnaire should be evaluated. Survey methodology and practice have established a number of different methods for this purpose [4]. At Statistics Sweden, we regularly use expert reviews, quick reviews (an initial screening of potential problems in a questionnaire), cognitive interviews with probing, vignettes and think-aloud protocols [5], and debriefings with interviewers or data editors. In specific cases, we also use monitoring, behavior coding [3], and quantitative data, with or without experimental design, to evaluate individual questions or the questionnaire as a whole.

These methods differ in many ways. For example, some require collected quantitative data, substantial resources, and developed hypotheses, whereas others do not. The next question is how these methods should best be applied. That is, given a certain set of circumstances, which method or combination of methods is optimal? There are a few studies in the literature examining how different evaluation methods compare or overlap [2], [6]. In general, however, the literature lacks research comparing evaluation methods. Moreover, research also shows that different practices within methods can lead to different results [1]. Hence, the results from the few studies comparing evaluation methods might be difficult to generalize, since many of the evaluation methods (for example, cognitive interviews or interviewer debriefings) lack standardized practices. Which method(s) should be applied in a given situation is therefore still a rather open question.

2. Questionnaire testing at a statistical agency
In practice, at a statistical agency, factors other than methodological ones usually play a role in the choice of evaluation methods. Time and financial resources often put constraints on the evaluation of the questionnaire. All surveys have fixed budgets and, even though questionnaire evaluation is important, it is still only one of many important factors in good survey design, all competing within the same survey budget. One has to consider costs, benefits, and the big picture, that is, the total design of the survey. As such, evaluating the questionnaire is important (see above) but perhaps not always or indiscriminately. The total design, the expected resulting quality, and the budget of the survey have to be considered.

Different surveys have questionnaires that differ in status. One important difference, among many, is whether the questionnaire is new (i.e. draft status) or established and already evaluated. Questionnaire evaluation obviously does not serve the same purpose in these two situations. Such factors should influence in what way, and perhaps even whether, the questionnaire should be evaluated, given the perspective of total design and set budgets.

Statistics Sweden has developed a set of conditions that stipulates whether a survey’s questionnaire should be evaluated before data collection or not. The situations where testing is required are the following:
– The questionnaire has been changed, for example by adding new questions or a new data
– The context has changed in ways that could influence the measurement (for example, changes in society or in words’ meaning).
– There are indications of problems with the questionnaire (based on process data, logs,
– The questionnaire has not been tested previously according to the routines of Statistics
– None of the above conditions are valid, but the client wants to evaluate the questionnaire or testing is necessary due to policy reasons (for example if the questionnaire involves a

These conditions decide whether the questionnaire should be evaluated. Given that a questionnaire should be tested, however, how extensive should the testing be and which method(s) should be used? As shown above, the answer should differ between surveys, depending on, among other things, the status of the questionnaire and the characteristics of the survey. In addition, there is no clear answer from a methodological standpoint. Moreover, resources and total design have to be considered. As such, how extensive the testing should be and which method(s) to apply appear to require some individual investigation for each survey, based on the specific conditions at hand.

Unfortunately, such survey-specific investigations do not fit well with the production of statistics at a statistical agency, where hundreds of surveys are conducted every year in a steady stream, where in-house communication can be challenging, where there is competition for resources, where measurement error has not always been of highest priority, and where both the survey manager and the cognitive lab have to plan and allocate resources for testing well in advance. Ideally, there would instead be an explicit strategy that proposes different evaluations for different surveys and, in that way, facilitates the testing process. To overcome the problems outlined above, it seems that such a strategy cannot be survey-specific but must operate on a more general level.
That is, the strategy must discriminate between different surveys’ needs concerning resources and methodological issues, but not to the extent that it becomes too complicated or too complex to communicate and apply in regular production. Such a strategy should help both the survey managers and the cognitive lab in planning for questionnaire testing. Although such a general strategy would undeniably mean standardization, with the accompanying disadvantage of not acknowledging uniqueness, it should also help make questionnaire testing a part of each survey’s plans rather than a last-minute resort. Thus, an explicit strategy for questionnaire testing should facilitate questionnaire testing at a statistical agency and, thus, improve the

3. The development of the test strategy
The test strategy for data collection instruments should, hence, take survey characteristics into account to discriminate between surveys, but without being too complex. Concerning resources, the strategy has to be rational and not, for example, propose major testing for a survey of minor importance. The question is, then, how to determine survey importance.

One perspective is that of risk. A risk consists of two factors: likelihood and consequences. The likelihood is difficult to estimate in advance (especially with new questionnaires). The consequences, however, can be forecast better. Flaws in a questionnaire influence, for example, the respondents and their responses, the data collection and the editing, and, in the end, the quality of the statistical output. The consequences of flaws in the questionnaire depend on the uses of the statistics. Are important decisions based on the statistics? Are the statistics widely used? Hence, flaws in the questionnaire have different consequences for different surveys. Greater consequences should therefore merit more extensive testing to identify and redesign potential problems in the questionnaire. Such reasoning is, of course, not unique to questionnaires but applies also to other tests, for example tests of IT systems. Here, such reasoning about consequences was used to determine survey importance and, thus, the level of testing for different surveys.

There is a database at Statistics Sweden describing surveys through many, mainly administrative, variables. Its information is used, for instance, in systems for publishing statistics, metadata, and economic administration. The database covers official statistics from Statistics Sweden and all other responsible agencies, as well as further regular statistics from Statistics Sweden. Its use has broadened over time and has increased considerably lately for various management and evaluation issues.
In this case, three characteristics were chosen to discriminate between surveys’ needs for testing:
– Official statistics (yes or no). The Official Statistics Act states that official statistics are statistics for public information, planning and research purposes in specified areas produced by appointed public authorities in accordance with the provisions issued by the Government. Official statistics shall be objective and made available,
– Importance for society (two categories). The survey has this characteristic if its output is considered important for one or more government agencies in case of a critical situation for society or during times of alert.
– Importance of correctness (three categories). The assignment of category for the consequences of incorrect information shall consider errors in decisions, reduced confidence, and costs due to breaches of contract (and in general). The categories mean (1) no or little harm, (2) harm, and (3) serious harm due to the incorrectness.

These three characteristics are all relevant when considering the amount of testing for different surveys. Together, they give twelve possible categories for surveys. A few more characteristics in the database were considered. One of them in particular was meaningful but was not added, since its classification was strongly correlated with those already obtained.

The strategy is based on these three characteristics. How a survey is classified on them determines the level of testing. The assigned level represents a minimum. The survey manager and the management team of the survey can decide to test on a higher level. The main goal of the strategy is not to capture survey uniqueness in terms of questionnaire testing (for example, that an interviewer debriefing would suit a specific survey particularly well) but to establish a baseline, based on the aforementioned characteristics. Three levels were considered appropriate. They are called B, C, and D, in order of increasing ambition.
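The count of twelve follows from multiplying the number of categories of the three characteristics (2 × 2 × 3). A minimal sketch of that enumeration is given here, assuming illustrative labels: the variable names and label strings are hypothetical and are not the codes used in the database at Statistics Sweden. The mapping from category to testing level is deliberately left out, since the text assigns it through Table 1.

```python
from itertools import product

# Illustrative labels only (assumptions); the database's own codes are not given in the text.
OFFICIAL_STATISTICS = ("yes", "no")                                 # yes or no
IMPORTANT_FOR_SOCIETY = ("yes", "no")                               # two categories
HARM_IF_INCORRECT = ("no or little harm", "harm", "serious harm")   # three categories

# Every combination of the three characteristics is one survey category.
categories = list(product(OFFICIAL_STATISTICS, IMPORTANT_FOR_SOCIETY, HARM_IF_INCORRECT))

print(len(categories))  # 2 * 2 * 3 = 12
```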
There is another categorization with other characteristics (such as the size of the survey and the sensitivity of the survey topic) and with a further level A. It is used for surveys not featured in the database. Many of these surveys are financed by fees. The survey, or sometimes just the data collection, is then commissioned by a

The numbers of surveys at different levels were studied in order to strike a balance between testing and resources. Only surveys with questionnaires were included (not surveys based on administrative data or secondary statistics). The number of surveys at each level is based on an investigation in 2010 and is shown in Table 1. There are nearly 90 surveys in all. There are relatively few surveys on the highest level (D). Two categories dominate in number, one in each of levels B and C.

Table 1. The twelve survey categories, the corresponding level of testing, and numbers 2010.
(Columns: official statistics; importance for society; importance of correctness; level of testing; number of surveys in 2010.)
There is one more important feature of the strategy: whether the survey is new or not, or, expressed otherwise, whether there is prior information available that can be used for test purposes. This influences the possible set of testing methods (some methods require prior information, see above). There are four testing levels A–D and two possibilities depending on whether there is prior information available or not. Hence, there are eight combinations in all, as shown in Figure 1.

Figure 1. Testing combinations and the testing methods used in each case

There is no previous collection to analyze (N) (new or changed questionnaire or a changed context)
Testing level A   Testing level B   Testing level C   Testing level D
Combination AN    Combination BN    Combination CN    Combination DN

There is a previous collection to analyze (T) (indications of errors or an untested questionnaire)
Testing level A   Testing level B   Testing level C   Testing level D
Combination AT    Combination BT    Combination CT    Combination DT
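The eight combination codes are simply the four testing levels crossed with the two situations. A small sketch of how the codes can be generated is given here for illustration; the constant names are assumptions and not part of the strategy itself.

```python
from itertools import product

LEVELS = ("A", "B", "C", "D")  # testing levels, in increasing ambition
SITUATIONS = ("N", "T")        # N: no previous collection to analyze; T: the other row of Figure 1

# The situation varies slowest, so the order follows the figure row by row.
combinations = [level + situation for situation, level in product(SITUATIONS, LEVELS)]

print(combinations)
```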
The upper part of the figure shows the situation when there is no prior information available. This is primarily relevant for new surveys or new questionnaires. In such cases, methods that require already collected data (for example, monitoring or debriefings) cannot be applied unless a pilot study is conducted (see combination DN). The choice of method(s) is fairly straightforward. On the highest level, D, the survey manager and the management team have to make an appropriate choice together with the cognitive lab, based on the perspective of survey needs and total design.

When there are previous data, there are more possible methods, as the lower part of the figure shows. On the higher levels, an appropriate choice is made from the list in the figure.

Statistics Sweden aims to get certified according to the international standard ISO 20252:2007 for market, opinion and social research. The requirement of the standard on pre-testing of questionnaires is fulfilled by all four testing levels.

4. Successive implementation of the test strategy
The test strategy has been developed over a long time period. The first phase was a joint development project for IT, cognitive methods, and statistical methods on testing. It was then decided to move forward for data collection instruments. A group of four persons worked in this second phase towards a strategy. Many co-workers served as critical and constructive readers from different points of view in both phases.

When there was a preliminary version of the test strategy, a more formal way was used to communicate within Statistics Sweden. A referral was sent to the departments most affected, i.e. the data collection and the subject matter departments, and also to the group working on implementation of the ISO 20252 standard. The feedback from the referral was very good, with many constructive comments. Two referral rounds were used. This procedure improved the result, and it also meant that the future users both got an understanding of the strategy

A formal decision was taken by the Director General in October 2010. This decision stated the principles for the testing (i.e. the strategy) and also an implementation plan. Even if some surveys were already tested, at least to some extent but perhaps not to their assigned level in the strategy, the new strategy implied increased testing of surveys overall. Everything could, of course, not be achieved at once. An ordering and a pace were decided for the next few years. The order of the surveys was again related to risks. Surveys on a high level were put early in the plan, but their testing may be implemented successively over a few years. A survey on level D, for example, can start on level B and then proceed successively to C and D. New surveys and redesigned surveys have all followed the strategy since the beginning of 2011.

Plans for the next calendar year are made in the autumn. This is a suitable time to plan for implementation in surveys that have not yet been tested on the appropriate level.
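The stepwise implementation can be illustrated with a short sketch. It assumes the levels are ordered A, B, C, D by increasing ambition; the function name `rollout_path` and its arguments are hypothetical and serve only to show the idea of starting below the assigned level and moving up one level at a time.

```python
from typing import List

LEVELS = ["A", "B", "C", "D"]  # increasing ambition


def rollout_path(start: str, assigned: str) -> List[str]:
    """Return the levels a survey passes through when testing is introduced step by step."""
    i, j = LEVELS.index(start), LEVELS.index(assigned)
    if i > j:
        # This sketch covers only the stepwise case of starting at or below the assigned level.
        raise ValueError("start level must not exceed the assigned level")
    return LEVELS[i:j + 1]


print(rollout_path("B", "D"))  # ['B', 'C', 'D']
```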
Most repeated surveys have a management team with some different roles. The methodologist of the team has been given the task to point out the strategy, if need be. The testing is done by cognitive experts.

5. Conclusions
The strategy has several strengths. Since it is based on given classifications, it makes it possible to plan for testing well in advance. Another strength is that the strategy is rational: limited resources are used where they are most needed. Moreover, the classifications are not new for this test strategy but are taken from a database that many are already familiar with. The strategy therefore consists of relatively simple principles, which both cognitive and other staff can grasp and follow. In addition, the many co-workers and departments involved in the development work ensure that the strategy has taken many perspectives into account and fits the big picture.

The strategy also has limitations. Even though it is based on ideas concerning how best to mix test methods (to combine qualitative and quantitative data, or empirical methods with those relying primarily on individual judgment), the proposed combinations of methods might not be optimal for specific surveys. The strategy is a somewhat crude way of capturing surveys’ needs for particular test methods. However, the main goal was not to present optimal testing from a methodological view for every survey but to facilitate and establish a baseline for testing in general. There is some flexibility in the strategy (to adjust according to a specific survey’s needs), but simplicity was regarded as highly important in the initial implementation.

Other factors that have contributed to the successful implementation are:
– standardization overall in statistics production;
– an increased understanding of the danger of measurement errors and the importance of well
– the ISO 20252:2007 standard with pre-testing of questionnaires;
– an understanding and positive reception of the test strategy.

Eight months after the formal decision on the test strategy, around 80 surveys had been tested. About half of these were quick reviews (level A), and the other tests were on higher levels. Some of the tested surveys had not yet reached their minimum level but had been tested on a lower level. There were some reasons to wait, e.g. a redesign of the survey in the near future.

This strategy has received quite a bit of attention at Statistics Sweden for two major reasons. Questionnaires are tested in a structured and well-motivated way that makes planning possible. The approach has been appreciated as such and can also be used in other areas.

6. References
[1] Conrad, F.G. and Blair, J. (2009). Sources of Error in Cognitive Interviews. Public Opinion
[2] DeMaio, T.J. and Landreth, A. (2004). Do different cognitive interview techniques produce different results? In Methods for testing and evaluating survey questions. Presser, S., Couper, M., Lessler, J.T., Martin, E., Martin, J., Rothgeb, J.M., and Singer, E. (eds.). Wiley. NJ.
[3] Fowler, F.J. (2001). Coding the behavior of interviewers and respondents to evaluate survey questions. In Questions Evaluation Methods. Madans, J., Miller, K., Maitland, A., and Willis, G.
[4] Presser, S., Couper, M., Lessler, J.T., Martin, E., Martin, J., Rothgeb, J.M., and Singer, E. (2004). Methods for testing and evaluating survey questions. Public Opinion Quarterly, Vol. 68,
[5] Willis, G. (2005). Cognitive Interviewing: A tool for improving the questionnaire. Thousand
[6] Willis, G.B., Schechter, S., and Whitaker, K. (1999). A comparison of cognitive interviewing, expert review, and behavior coding. What do they tell us? Proceedings of the Section on Survey Research methods, American Statistical Association, 28–37.

