Surveys provide evidence on practice, attitudes, and knowledge. However, conducting good survey research is harder than it looks. The authors aim to provide guidance to both researchers and readers in conducting and interpreting survey research. Like all research, surveys should have clear research question(s) using the smallest possible number of high-quality, essential survey questions (items) that will interest the target population. Both researchers and readers should put themselves in the position of the respondents. The survey questions should provide reproducible results (reliable), measure what they are supposed to measure (valid), and take less than 10 min to answer. Good survey research reports provide valid and reliable answers to the research question with an adequate response rate (at least 40%) and adequate precision (margin of error ideally 5% or less). Possible biases among those who did not respond (nonresponders) must be carefully analyzed and discussed. Quantitative results can be combined with qualitative results in mixed-methods research to provide greater insight.
“It’s always further than it looks. It’s always taller than it looks. And it’s always harder than it looks.”
—The Three Rules of Mountaineering
Surveys remain the foundation of social science research but can be employed in almost any discipline, including medical research. However, good survey research is harder than it looks. Anesthesiology researchers use surveys to study behaviors, attitudes, and knowledge of both physicians and patients or to determine population characteristics, such as disease states, practices, or outcomes. Examples of surveys include transfusion practices among American Society of Anesthesiologists (ASA) members,1 use of ultrasonography for regional anesthesia,2 and parental understanding of informed consent for research.3 However, many journals are reluctant to publish survey research because of poor quality.4–6 Some organizations, such as the Australian and New Zealand College of Anaesthetists and the Society for Pediatric Anesthesia, have introduced formal vetting processes to improve the quality of survey research and decrease respondent fatigue and burden.7,8
Several survey research errors (biases that pull results away from the truth) were seen in what is widely regarded as the greatest survey disaster: the Literary Digest survey of 10 million Americans that incorrectly predicted that Roosevelt would lose the 1936 Presidential election in a landslide when, in fact, the absolute opposite occurred.9 Problems (errors) with this survey included an unrepresentative sample (affluent Americans with phones), a low response rate (20%), and nonresponder bias (Roosevelt voters tended not to respond). As we will discuss, all these errors can be avoided or at least minimized. As Dillman et al.10 note, the entire survey process (from design to reporting) needs to be tailored to the question asked, which in turn is the first step: ask a clear question.
In the absence of international consensus guidelines on conducting and reporting survey research, we aim to discuss the elements of good survey research, outline some of the pitfalls, and introduce some newer approaches to collecting and analyzing survey data. We provide pragmatic toolboxes for survey researchers (table 1) and survey report readers (table 2) and suggest minimum standards for submitting a survey (table 3).
The primary aim of any survey is to answer a good research question that is interesting for the broader target population.4–6,10,11 A good, clear research question has further interrelated advantages: shorter, simpler items that decrease completion time and enhance the response rate. Further, effective surveys focus exclusively on "need to know" questions, not those that might be simply "nice to know."5 The aims of any survey should also be clearly stated in concrete terms, e.g., "To describe the current practice patterns of Nordic anesthesia departments in anesthetic management of endovascular therapy in acute ischemic stroke."12
The choice of survey design will depend on the questions being asked, the population of interest, and available resources.5,10 Each type of survey has advantages and disadvantages (table 4). The questions (items) in a survey should reflect the objectives of the study.4,6,13 Whereas some surveys are designed to simply measure knowledge, others measure constructs, practices, or behaviors. Thus, researchers should consider the research goals when writing and formatting the questionnaire (instrument). In general, surveys should be short, relevant, focused, interesting, easy to read, and complete. Surveys that lack these attributes often suffer from poor response rates and decreased reliability.10,14
When designing a survey, it is important to know your audience. Researchers and readers should put themselves in the position of the intended respondents. How might they react to being approached and how might they respond to the questions asked? Motivated participants, for example, may be more willing to answer more detailed or probing questions. Questions should be written by using simple language4 at a reading level commensurate with the literacy of the intended audience. In the United States and most developed countries, surveys of the general population should be written at no more than an eighth-grade reading level and avoid abbreviations, jargon, colloquialisms, acronyms, or unfamiliar technical terms. Further, language and cultural differences may also be important considerations, including tendencies to want to please, or conversely avoid, perceived authority figures such as doctors. Surveys for professionals, such as physicians, can have more complex technical words, but simple and clear structure and wording help everyone.
Questions (items) validated in previous research should be used whenever possible.4,15 For new or revised questions, Peterson16 developed a guide with the acronym BRUSO: brief, relevant, unambiguous, specific, and objective.4 First, questions should be brief to reduce the length of the survey. Questions should include complete sentences but not be long-winded. Questions should also be relevant to the survey's purpose and focus on "need to know" information. Questions that may not appear intuitively relevant but are deemed necessary require a brief explanation of why they are important. Questions must be unambiguous. For example, asking respondents how often they check social media on a "typical work day" may mean different things to different people, i.e., what counts as "typical"? Questions that evoke a double negative require logical thinking and are often answered incorrectly. Questions should also be specific so that the respondent is clear as to their intent; questions should be unidimensional. For example, "Do you consider yourself an empathetic and sympathetic person?" could evoke different responses because one can be sympathetic without being empathetic. This example would be better split into two questions addressing sympathy and empathy separately. Unless a primary focus of the study, demographic questions should be placed at the end of the survey and kept to a minimum. Objective questions should not contain words that "nudge" the answer or reveal the researchers' beliefs or opinions.
The choice of questions and response options (scales) depends on the type and goals of the survey.5,10,11 Interviews and certain types of written surveys are better served by open-ended questions with responses that can be electronically recorded or manually transcribed. Online and postal surveys typically employ closed-ended questions in which the respondent chooses a response from a structured list of options. Both open and closed responses have advantages and disadvantages. Open-ended responses allow the respondents to answer in their own words in a manner that reflects their personal experiences or beliefs and are less likely to be influenced by the expectations of the investigator.10,17–19 Open-ended questions are particularly helpful when the researchers are unclear how respondents might respond and for developing new response options for closed-ended questions. One example is “Under what circumstances would you cancel anesthesia for the child with an upper respiratory tract infection?” The major disadvantages of open-ended questions are that responses can be long, difficult to transcribe, and difficult to classify and may need experts to identify underlying themes. Further, surveys with a lot of open-ended questions may have incomplete or missing answers because of response fatigue.
Closed-ended (structured) questions differ from open-ended by providing a list of options to choose from.5,10,20 Closed-ended questions are optimal for postal and online surveys because they provide standardized responses, take less time to complete, and are easier to analyze. The major disadvantage of closed-ended questions is that they can be more difficult to write5,10 because the response options must be both exhaustive (include all important options) and mutually exclusive (each option should be distinct). Including every possible option can result in excessively long lists of responses that increase survey fatigue and nonresponse. One strategy to limit the number of responses while avoiding missing important data is to include an “other” response with a clarifying “please describe/specify.” Further, for all surveys, a final open question of “Any further comments?” allows respondents to freely comment on both the topic and the survey itself.21
Including too many questions can result in satisficing, where respondents increasingly fail to carefully consider the questions and subsequently provide answers that are not well thought out.10,19 SurveyMonkey (http://surveymonkey.com) reports from its data22 that respondents will spend an average of 5 min to answer 10 questions in an online survey but only 10 min to answer 25 questions. This suggests that as the number of questions increases, the time spent on each question decreases, i.e., satisficing. Further, if a survey takes 10 min to complete, data show that up to 20% of respondents will abandon the survey before completing it.22 Respondents may also be more likely to abandon surveys with compulsory questions, particularly if they do not include a "Don't know" type option. Compulsory questions should be minimized and, if used, should always include a "Don't know," "Not applicable," or "Don't wish to answer" option.
Response scales (fig. 1)5 are typically categorical/nominal (e.g., male/female, true/false); ordinal, in which the responses are ordered (e.g., very anxious to very calm); or numerical (e.g., age, height). Categorical and ordinal response options (table 5) typically take the form of Likert scales23 with different levels of response, for example, “I preoxygenate patients before general anesthesia” could be answered by using the following list formatted vertically:
□ Strongly disagree
□ Disagree
□ Neither agree nor disagree
□ Agree
□ Strongly agree
When formatting these scales, the endpoints should be mirror opposites, be balanced, be presented from negative to positive, include equal intervals, and be presented as a vertical rather than horizontal list. Vertical formatting is less subject to mistakes when responding and easier to code.
Depending on the degree of precision required, questions should offer three to seven responses, with five probably optimal. Some survey researchers omit a “neutral” response option to force respondents one way or another or because the researchers argue that a neutral option discourages respondents from answering. Others argue that a neutral response provides a natural choice. Decisions regarding the number of response options and inclusion/exclusion of a neutral response should be made during pilot pretesting. Pilot testing with and without a neutral response option can provide a sense of whether responses tend to cluster around a middle point.
Other survey formats include visual analog scales (e.g., visual analog pain scales24 ) that ask respondents to either circle or electronically mark a number (typically 0 to 10) or a 100-mm scale to indicate their level of response. Again, like pain scales, there should be descriptive anchors at each end of the scale to provide context. Other types of scales include ranking scales, where the respondent ranks a set of ideas or preferences; matrix scales, where the respondent evaluates one or more row items using the same set of column choices; magnitude estimation scales; and factorial questions, in which a vignette is presented that requires a judgement or decision-making response.
In addition to consideration of the types of questions and response scales, it is also important to consider how questions transition from one to another. Skip or branch logic is a feature that routes the participant to subsequent questions or page/sections based on their response to a particular question. This is an important process that allows participants to avoid questions that do not apply to them, e.g., “If you replied ‘No’ to question 3, please skip to question 8.” The routes used in skip logic should be thoroughly pretested before implementation. For readers, the easiest way to test the flow of questions is to imagine answering the survey.
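Skip logic of the kind described above amounts to a simple routing rule: given the current question and its answer, determine the next question. A minimal sketch, with hypothetical question IDs and routes (real survey platforms configure this declaratively rather than in code):

```python
def next_question(current_id: str, answer: str) -> str:
    """Return the ID of the next question given the current answer."""
    # Branch routes: if Q3 is answered "No", skip the follow-up block (Q4-Q7).
    routes = {
        ("Q3", "No"): "Q8",
    }
    # Default linear flow when no branch applies.
    default = {"Q3": "Q4", "Q4": "Q5", "Q5": "Q6", "Q6": "Q7", "Q7": "Q8", "Q8": "END"}
    return routes.get((current_id, answer), default.get(current_id, "END"))

print(next_question("Q3", "No"))   # respondent skips ahead to Q8
print(next_question("Q3", "Yes"))  # respondent continues to Q4
```

Pretesting skip logic then reduces to walking every plausible answer path through such a routing table and confirming no respondent can be routed to a question that does not apply to them.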
Reliability and Validity
Not all surveys require formal reliability and validity testing, e.g., simple descriptive surveys (table 6). However, for surveys that are designed to describe or measure constructs, e.g., pain, sleep quality, altruism, empathy, it is critical to ensure that the items in the survey or instrument actually measure what they are designed to measure. All survey measures, whether quantitative or qualitative, are subject to error.25 These errors can either be due to random chance and/or errors in the survey itself: measurement error.10 Measurement errors reflect the accuracy of the survey, i.e., do the questions measure what they are supposed to measure (validity), and are they reproducible across individuals over time (reliability)? The validity and reliability of questions can be quantified statistically, often by strength of association with other metrics.26
As with all research, the first step in survey research is to review the literature for existing surveys or survey questions that have already been formally tested. It makes no sense to generate a new set of questions as substitutes for ones that have already been validated. Therefore, it is preferable to use or adapt existing questions or surveys that have demonstrated validity, with appropriate acknowledgment or citation. Although some reliability/validity testing may still be required, the burden of formal testing of new questions is greatly reduced.
For developing de novo survey questionnaires (instruments), Sullivan26 provides sage advice: "Researchers who create novel assessment instruments need to state the development process, reliability measures, pilot results, and any other information that may lend credibility to the use of homegrown instruments. Transparency enhances credibility." Readers should look for these points in novel surveys and check whether the validity of previously reported items/surveys has been demonstrated.
Reliability is the degree a measurement yields the same results over repeated trials or under different circumstances.25–27 Test–retest reliability reflects the stability of the survey instrument and can be measured by having the same group of respondents complete the identical survey at two points in time. Surveys with good test–retest reliability typically have little variance between the two sets of data. Interrater reliability refers to how two or more respondents respond to the same questions and intraobserver reliability refers to the stability of responses over time in the same individual.
Similar question wording or order is subject to “practice” effects that can be overcome by rewording a question or reordering the responses. Questions with similar responses regardless of wording or order are said to have good alternate-form reliability.
Because not all traits or behaviors are observable or can be measured by a single question, researchers often use several questions to describe the same behavior or trait of interest (constructs). Internal consistency reliability is the degree to which these questions vary together as a group, i.e., the degree to which these different questions consistently measure the same construct. For example, because depression is hard to measure by using a single question (Are you depressed?), researchers employ several different questions that address different but related aspects of depression, e.g., fatigue, trouble concentrating.
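Internal consistency reliability is commonly quantified with Cronbach's alpha, which compares the variance of individual items with the variance of respondents' total scores. A minimal sketch using only the Python standard library (the score matrix is illustrative, not real survey data):

```python
from statistics import variance

def cronbach_alpha(scores):
    """Cronbach's alpha for a respondents x items score matrix.

    alpha = k/(k-1) * (1 - sum(item variances) / variance(total scores))
    """
    k = len(scores[0])                      # number of items
    items = list(zip(*scores))              # transpose to items x respondents
    item_vars = sum(variance(item) for item in items)
    total_var = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - item_vars / total_var)

# Illustrative data: five respondents answering three related items (1-5 scale).
data = [[4, 5, 4], [3, 3, 3], [5, 5, 4], [2, 2, 3], [4, 4, 5]]
print(round(cronbach_alpha(data), 2))  # prints 0.9
```

Values around 0.7 to 0.9 are conventionally taken to indicate acceptable to good internal consistency; items that vary independently of the group pull alpha toward zero.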
Validity measures the degree to which questions in a survey measure what they are intended to measure.25–27 For example, questions designed to measure pain should measure pain and not something else, such as anxiety. Although some validity metrics are relatively easy to measure, some are more complex. Two types of validity that are easy to measure are face and content validity. Face validity refers to how the questions appear (on "face value") to individuals with little expertise in the survey topic. Although face validity is a somewhat casual assessment, it nonetheless reassures the investigator that the questions will make sense at a layperson's level. Content validity, on the other hand, requires input from content experts. Neither face nor content validity is statistically quantifiable, yet both can provide important information to ensure that questions are relevant. For example, a survey of pain techniques by anesthesiologists might benefit from pretesting with a small group of surgeons (face validity) and pain medicine specialists (content validity). The value of expert consultation before implementing any survey cannot be overstated.
Construct validity is harder to conceptualize but is a measure of the degree to which survey questions, when applied in practice, reflect the true theoretical meaning of the concept. Construct validity is typically established over years of use in different settings and populations. Although there is no simple metric for construct validity, social scientists typically use other quantifiable measures, such as comparing against an existing “gold standard.” This type of validity is termed concurrent criterion validity.
Where there is no gold standard, construct validity can be established by measuring the degree to which the questions in a survey correlate with other measures that should theoretically be associated with the same construct (convergent validity). For example, to validate a new survey instrument to measure sleep quality, it might be important to compare it with other measures of sleep quality (e.g., direct observation). If convergent validity is established, a natural follow-up test would be to see whether the same questions are able to discriminate between sleep quality and other related, but different, measures such as sleep quantity. If these two measures do not correlate, we assume (if other validity measures confirm) that they are measuring two separate constructs and that the sleep-quality questions demonstrate good divergent or discriminant validity.
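Convergent validity of this kind is typically assessed with a correlation coefficient between the new instrument and the reference measure. A minimal sketch with Pearson's r, using hypothetical paired scores (a new sleep-quality score versus an observational sleep score for the same five respondents):

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation between two sets of paired scores."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

new_instrument = [2, 3, 4, 4, 5]   # hypothetical sleep-quality survey scores
observed_sleep = [1, 3, 3, 4, 5]   # hypothetical reference measure
print(round(pearson_r(new_instrument, observed_sleep), 2))  # prints 0.95
```

A high correlation with the reference measure supports convergent validity; a deliberately low correlation with a related-but-different measure (e.g., sleep quantity) would support divergent validity.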
Ethics committee or institutional review board approval is typically required before testing and implementing any survey. The primary ethical concerns of surveys relate to content (e.g., could items be psychologically damaging?) and how confidentiality will be maintained. Although surveys may not be identifiable by the participant's name, there are other sources of information, e.g., IP addresses and email addresses, that could potentially link the survey with the participant. This is particularly important when using third-party software services, e.g., SurveyMonkey and Qualtrics. Investigators should thus be aware of the security agreements of each company, assure participants that their information will be kept confidential, e.g., stored on password-protected computers or cloud storage, and explain how any identifying information will be delinked.
Pretesting (Piloting) the Survey
Although there is no such thing as a perfect survey, pretesting or pilot testing can significantly enhance the effectiveness of any survey. Unfortunately, this step is often missing.6 Pretesting is typically conducted in two phases. First, the research team reviews all aspects of the survey, i.e., the instructions, the order and flow of questions, whether it contains skip or branch logic, how long the survey should take to complete, and whether specific questions are ambiguous and/or are being consistently missed. Second, the survey should be distributed among a small subset of the intended audience before it is administered to the larger target group. This can be done somewhat informally but can also involve structured focus groups followed by thorough debriefing. Even if previously validated surveys are used, questions should be pretested because meaning can often be affected by the context of the survey. No matter what the design, the piloted survey should be submitted as part of any manuscript, possibly as an appendix.
Precise estimates of large populations, up to millions of people, can be derived from survey samples of fewer than 2,000 people.10,28,29 Thus, because it is not always practical to survey an entire population, sampling provides an efficient way to collect data that, if done correctly, can be representative of the population of interest. A representative sample should mirror the characteristics of the broader population, ensuring generalizability and reducing the effect of sample bias. However, although representativeness is a primary goal, the sampling approach will also depend on the type of survey, the target population, inclusion of subgroups, and resources/cost.
Because a survey of the entire ASA membership (53,000 members in 2016) might be impractical, an option would be to generate a sample that is representative of important characteristics of the ASA membership, such as sex, ethnicity, and training. This is best achieved by employing some type of simple random sampling.17,29 However, there may be instances in which investigators may want to focus on a subgroup or oversample groups that are underrepresented, e.g., rural practitioners. In these cases, stratified sampling can be employed in which random samples are drawn from each subgroup or stratum, e.g., ASA membership by geography. In cases in which there may be underrepresentation of certain groups, other methods such as oversampling should be employed.
Sample-size estimates should be based on the primary question29 and large enough to be confident (usually 95 or 99%) that results from the entire population will lie within the desired margin of error of the sample (fig. 2).29 Typically, the maximum acceptable margin of error for a proportion (percentage) of the population is set around ±5% (most political polls quote 3 to 5%). That is, if the margin of error is ±5%, and 25% of respondents from a sample of 325 ASA members reply that they use thiopental for induction, the 95% CI (the range expected to contain the true population value in 95 of 100 repeated samples) shows that between 20 and 30% of ASA members use thiopental. Small samples typically produce wider CIs. By using the same example, a sample of 30 ASA members that produces a margin of error of ±20% would result in a 95% confidence estimate that 5 to 45% of ASA members use thiopental: an unhelpful estimate ranging from very few to almost half. However, although increasing the sample size to reduce the margin of error increases the precision of the data, there is an effect of diminishing returns (fig. 2). As the margins of error are tightened to less than 4%, the number of participants required increases disproportionally. This is important when balancing precision with the practicality, availability of resources, and costs of surveying large numbers of subjects.
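The arithmetic behind these margins is the standard normal-approximation formula for a proportion, n = z²p(1−p)/e². A short sketch (stdlib only) reproducing the numbers above, including the diminishing returns as the margin tightens:

```python
from math import ceil, sqrt

Z95 = 1.96  # z-score for 95% confidence

def sample_size(margin, p=0.5, z=Z95):
    """Respondents needed to estimate a proportion within +/- margin.

    p = 0.5 is the most conservative (largest) assumption.
    """
    return ceil(z**2 * p * (1 - p) / margin**2)

def margin_of_error(p, n, z=Z95):
    """Margin of error for an observed proportion p from a sample of n."""
    return z * sqrt(p * (1 - p) / n)

print(sample_size(0.05))                     # 385 respondents for +/-5%
print(sample_size(0.03))                     # 1068: tighter margins cost far more
print(round(margin_of_error(0.25, 325), 3))  # ~0.047, i.e., the 20-30% interval above
```

Note the disproportionate cost of precision: halving the margin of error roughly quadruples the required sample.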
For all investigators, we strongly advise working with a statistician for both planning and analysis. For those with a background in statistics, there are several online resources28,31 and statistical packages such as R (available free from R Foundation for Statistical Computing, Austria), STATA (StataCorp LLC, USA), and SPSS (SPSS Statistics, IBM, USA).
Sample-size calculations can also be based on anticipated proportions, but when comparing groups, the anticipated difference may be important. Notably, calculated sample sizes are for the number of completed surveys. Although a response rate of more than 60% is considered good, less than 50% is common. Recent surveys of anesthesiologists and anesthesia fellows, for example, reported 54 and 33% response rates, respectively.32–36 A conservative approach, therefore, is to send the survey to approximately two to three times the calculated sample size. In general, leading journals are unlikely to publish a survey with a response of less than 30 to 40%, except in exceptional circumstances.
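The conservative distribution strategy above is a one-line calculation once an expected response rate is assumed (the rate itself is an assumption that should be justified from prior, similar surveys):

```python
from math import ceil

def surveys_to_send(completed_needed, expected_response_rate):
    """Surveys to distribute so that the expected completions meet the target."""
    return ceil(completed_needed / expected_response_rate)

# Assuming a 40% response rate and a calculated sample size of 385:
print(surveys_to_send(385, 0.40))  # prints 963, roughly 2.5x the calculated sample
```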
Just as observer bias can adversely affect results in a randomized trial, researcher bias (subtle or overt) can affect the way questions are asked. Care must be taken to ensure that questions are objective and that personal opinions do not bias the framing of questions, e.g., "Do you feel guilty about accepting a Do Not Resuscitate order?"33,36 Interviews can evoke implicit personal bias by both the interviewer and the interviewee. Researchers must avoid words that are potentially charged or could generate an emotional response.10 An extreme version of biased questions is push polling, where the hidden purpose is to drive opinions rather than ask questions,37 e.g., "For fluid resuscitation do you use Dodgy-sol, which is both dangerous and expensive?" Readers should look for these biases in questions.
Along with precision, the response rate is a central metric of survey quality.6 Nonresponse is one of the most frustrating aspects of all survey research, and physicians are among the worst offenders.38 Topics that have widespread practice implications may enhance response rate, e.g., video laryngoscopes (67% response)39 or the effect of fatigue in trainees (59% response).40 However, even well designed, hot-topic studies suffer from nonresponse. Although some nonresponse is expected and acceptable, surveys that have large nonresponse rates are subject to bias, particularly if the nonresponse is related to the survey topic (outcome) or if the nonresponders differ substantively from responders. For example, individuals who have experienced a bad or sensitive outcome may be less willing to report it (report bias), and as such, the true outcome may be underreported. Although nonresponse is often simply a function of a lack of respondent time, its impact is often survey-specific. For example, whereas a response rate of 50% may be adequate for postal and online surveys, 85% would be considered minimally adequate for interviews.41 In any case, it is important to determine whether the nonrespondents differ substantively from respondents. The most pragmatic way to do this is to compare the demographics of the responders with the known demographics of the target population. Another way is to send a brief follow-up survey to the nonrespondents requesting basic demographics and the reason(s) for nonresponse. Using this approach for a survey project, one of us (A.R.T.) found that nonrespondents had similar characteristics to respondents and that most nonresponse was due to a lack of participant time.42 This follow-up may be less appropriate with patient surveys. Because nonresponse bias is an important limitation, it should always be discussed in any written publication.6
There are several tactics to improve response rates and mitigate the effects of nonresponse.43–45 Importantly, prenotification of the survey by email or postcard has increased response rates.43 A professionally written cover letter that explains the importance of the study is also critically important to pique interest. Techniques such as increasing “white space,” emphasizing important points with bolding/underlining, and use of color tend to engender better response rates.15,43 For surveys dealing with sensitive topics, response rates will be greater if the data are anonymized or if confidentiality is assured.
For online surveys, there should be email reminders with opportunities to receive additional surveys or access to the online survey link (maximum of three reminders/follow-up attempts).10 Often researchers will provide small (noncoercive) incentives to encourage respondents to complete their surveys, e.g., gift cards, money, or lottery tickets, but these need to be in the planned budget.46 Online surveys may also be subject to a "speed through" phenomenon, where respondents satisfice by rushing through the survey without due thought. For some online surveys, however, it is possible to measure the time taken to complete the survey; if this time is deemed too quick based on pilot-testing estimates, the results may be unreliable. Online surveys are also limited to those individuals with online access and thus may evoke a selection bias. Despite these concerns, online surveys are supplanting traditional postal surveys: although response rates can be lower than with postal surveys,44,47 online surveys tend to be quicker and cheaper to administer and reach larger or dispersed audiences.
In addition to the issues posed by total nonresponse, problems can also occur when participants choose not to answer certain questions (item nonresponse). Typically, missing values are automatically excluded from the analysis and do not pose a problem. However, if the percentage of missing responses is high, e.g., more than 20%, the investigator may choose to correct for this by imputing the missing data. In any case, it is important that missing data are reported to allow the reader to estimate the potential impact of the item nonresponse.
Recall bias refers to error associated with respondents being unable to adequately recall past events. To minimize recall bias, questions should be framed in time periods calibrated for the events, e.g., “difficult intubations in the last 3 months.”
Self-report bias, often called social desirability bias, refers to the tendency for individuals to downplay negative attributes. Asking parents whether they smoke in the house, for example, is likely to be underreported because parents often know that second-hand smoke is inherently bad for their children. Assuring that responses are either anonymized or that confidentiality will be honored will typically reduce the potential for self-report bias.
Analysis of survey data should be based on a predefined endpoint and will depend on the type of data collected and the question(s) asked.17 Most quantitative survey research involves descriptive frequency data involving proportions and measurements of central tendency, e.g., means and medians, and variability, e.g., SD and range. Comparisons between groups will again depend on the type of data collected, i.e., continuous data versus categorical data. For these data, simple statistics, such as Student's t tests, ANOVA, and the chi-square test, can be used, as appropriate. Analyzing categorical data, such as Likert scales, can present challenges. For example, imagine a five-point Likert scale of "extremely dissatisfied," "dissatisfied," "neither dissatisfied nor satisfied," "satisfied," and "extremely satisfied" used to test the attitude of 1,000 Australian anesthetists to a new laryngoscope: the Bonza-Scope. The proportions giving each response could be stated and compared by using the chi-square test. Another option (with greater statistical power) is to combine "extremely dissatisfied" with "dissatisfied" and "satisfied" with "extremely satisfied." The summed results could thus be that 60% were satisfied, 10% neutral, and 30% dissatisfied with the Bonza-Scope. A simple analysis would be to just compare the proportion who are satisfied with the proportion who are dissatisfied. This provides "headline" statistics, e.g., "In a survey of 1,000 Australian anesthetists, 60% were satisfied with the Bonza-Scope, whereas 30% were dissatisfied (difference 30%, 95% CI: 26 to 34%, P = 0.002)."
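The headline CI above can be sketched with the standard formula for the difference between two proportions. Treating the two proportions as independent is a common simplification even though here they come from the same sample; the numbers below reproduce the 26 to 34% interval:

```python
from math import sqrt

def diff_of_proportions_ci(p1, p2, n, z=1.96):
    """Difference between two proportions from samples of size n, with 95% CI.

    Treats the proportions as independent -- a simplifying assumption when
    both come from the same survey sample.
    """
    diff = p1 - p2
    se = sqrt(p1 * (1 - p1) / n + p2 * (1 - p2) / n)
    return diff, diff - z * se, diff + z * se

# 60% satisfied vs. 30% dissatisfied among 1,000 respondents
diff, lo, hi = diff_of_proportions_ci(0.60, 0.30, 1000)
print(f"difference {diff:.0%}, 95% CI {lo:.0%} to {hi:.0%}")
# prints: difference 30%, 95% CI 26% to 34%
```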
Another approach is to assign numerical codes to categorical data.17,19 For example, data using the same five-point Likert scale of “extremely dissatisfied” to “extremely satisfied” can be coded from 1 to 5 (e.g., 1 = extremely dissatisfied, 5 = extremely satisfied). These codes are ordinal, not continuous, and the intervals between them should not be assumed to be evenly spaced. Parametric statistics, including the mean and SD as descriptive statistics, are therefore not appropriate. These data can instead be analyzed with nonparametric comparative statistics, such as the Mann–Whitney U test, which compares ranks rather than magnitudes. In another Bonza-Scope research project, the attitudes of Australian anesthetists might be compared with those of American anesthesiologists. If the Australian group had a median score of 4 and the American group a median score of 3 on a Likert satisfaction question, rather than reporting a median difference of 1, it is probably more meaningful to say that Australians were more satisfied than Americans (P < 0.005).
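The rank-based logic of the Mann–Whitney U test can be made concrete with a minimal Python sketch. The Likert scores below are invented for illustration; in practice one would use a statistical package (e.g., scipy.stats.mannwhitneyu) to obtain the U statistic and its P value.

```python
def mann_whitney_u(x, y):
    """Mann-Whitney U statistic for group x vs. group y (midranks for ties)."""
    combined = sorted(x + y)
    ranks = {}
    i = 0
    while i < len(combined):
        j = i
        while j < len(combined) and combined[j] == combined[i]:
            j += 1
        ranks[combined[i]] = (i + 1 + j) / 2   # midrank of positions i+1..j
        i = j
    rank_sum_x = sum(ranks[v] for v in x)
    # U = rank sum minus its minimum possible value for a group of this size
    return rank_sum_x - len(x) * (len(x) + 1) / 2

# Invented scores (1 = extremely dissatisfied ... 5 = extremely satisfied)
australian = [5, 4, 4, 4, 3]
american = [4, 3, 3, 2, 2]
print(mann_whitney_u(australian, american))  # 21.5 of a maximum of 25
```

Because the test works on ranks, a U near the maximum (here, len(x) * len(y) = 25) indicates that one group's scores systematically exceed the other's, without assuming the Likert codes are evenly spaced.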
With the advent of powerful desktop statistical programs, more complex statistical analyses can also be applied to survey research, including logistic regression to identify predictive factors and factor analysis to identify which individual questions (factors) explain most of the variance in the data.48,49 Factor analysis is particularly useful for determining which items in a survey are essential and which can be safely removed (data reduction). Again, collaborating with statisticians is likely to produce better survey design and analysis.
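As a hypothetical illustration of logistic regression for a single predictive factor, the sketch below fits an invented binary predictor (prior videolaryngoscope experience, a variable assumed for this example) to a fabricated satisfaction outcome by gradient descent. A real analysis would use an established package (e.g., statsmodels or R's glm) with multiple predictors and proper inference.

```python
import math

# Fabricated illustrative data: does prior videolaryngoscope experience
# (x = 1) predict satisfaction (y = 1) vs. dissatisfaction (y = 0)?
x = [1] * 60 + [0] * 40                        # 60 experienced, 40 not
y = [1] * 50 + [0] * 10 + [1] * 15 + [0] * 25  # satisfaction outcomes

b0 = b1 = 0.0                                  # intercept and slope
lr = 1.0
for _ in range(5000):
    g0 = g1 = 0.0
    for xi, yi in zip(x, y):
        p = 1 / (1 + math.exp(-(b0 + b1 * xi)))  # predicted probability
        g0 += p - yi
        g1 += (p - yi) * xi
    b0 -= lr * g0 / len(x)                     # average-gradient step
    b1 -= lr * g1 / len(x)

odds_ratio = math.exp(b1)
print(f"odds ratio for prior experience: {odds_ratio:.2f}")
# converges toward the sample odds ratio (50/10) / (15/25) = 8.33
```

With one binary predictor the fitted odds ratio simply reproduces the crude odds ratio from the 2 x 2 table; the value of logistic regression in survey analysis comes from adjusting several predictors simultaneously.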
Open-ended questions from both oral interviews and written surveys are analyzed to identify themes.21,50 A theme is a patterned response within the survey data, e.g., repetitions or recurring topics. For example, the question, “Under what circumstances would you cancel anesthesia for a child with an upper respiratory tract infection?” is likely to evoke different responses that can be sorted into themes. These themes might relate to patient, parent, anesthetic, or surgical factors. The importance of a theme is typically determined by its prevalence, i.e., how many respondents articulated it. Unfortunately, like many aspects of survey research, the importance and difficulty of thematic analysis are often underestimated.10,21,50
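Once responses have been coded, tallying theme prevalence is the easy, mechanical step; the hard part is the coding itself. A minimal sketch, using invented coded responses to the cancellation question above:

```python
from collections import Counter

# Hypothetical coded responses: each open-ended answer has already been
# assigned one or more themes by the analysts (the demanding step)
coded_responses = [
    ["patient factors", "surgical factors"],
    ["patient factors"],
    ["parent factors", "patient factors"],
    ["anesthetic factors"],
    ["patient factors", "anesthetic factors"],
]

# Prevalence = number of respondents articulating each theme
prevalence = Counter(theme for resp in coded_responses for theme in resp)
for theme, count in prevalence.most_common():
    print(f"{theme}: {count}/{len(coded_responses)} respondents")
```

Counting is trivial once themes exist; assigning themes consistently (often with multiple independent coders and agreement checks) is where thematic analysis earns its reputation for being underestimated.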
Mixed-methods research (table 7) represents a relatively new approach to analyzing survey data.10 Although most clinical survey data are primarily quantitative, mixed methods allow researchers to integrate both qualitative and quantitative data. By integrating both data types, mixed-methods research provides richer information. Typically, mixed methods are used to corroborate results by using other approaches, develop a theory about a phenomenon, complement the strengths and/or overcome the weaknesses of a single design, or develop and test a new instrument.51
The choice of mixed methods requires a systematic approach, including determining the sequence of data collection (e.g., whether quantitative precedes or follows qualitative); identifying which method will take priority during data collection and analysis; deciding at what stage the qualitative and quantitative data will be integrated and what that integration will involve; and deciding whether a theoretical perspective will be used.51 The advantages of mixed-methods designs are that they combine the strengths and diminish the weaknesses of a single design, can provide a more comprehensive understanding of the questions asked, and may be more effective in developing survey instruments. The disadvantages are that they can be complex, time-consuming, and difficult to integrate and interpret. Continuing our example: quantitative results might show that Americans were less satisfied with the Bonza-Scope (P < 0.005), while qualitative analysis reveals a recurring theme of “The handle is too big.” Again, we strongly recommend collaborating with a biostatistician, a social scientist, or both.
Poor methodologic quality often weighs against publication of survey research. Producing good-quality survey research is a complex process that is harder than it looks. We hope that this article provides investigators with useful tools (table 1) to successfully navigate the survey process and publication, and provides readers with useful points for judging survey research (table 2). We have also provided a short list of suggested minimum standards (table 3) that we think can serve as a threshold for submitting surveys to journals. Survey reports failing these minimums will have far less likelihood of success, and submission to major journals will probably be futile. Therefore, the toolboxes and minimum standards can be used by researchers, editors, and the component anesthesia societies (e.g., Australian and New Zealand College of Anaesthetists, Society for Pediatric Anesthesia) to ensure the conduct, submission, and publication of high-quality surveys for informed readers.
Dr. Story was funded solely from institutional sources. Dr. Tait is supported by grant No. UL1TR000433 from the National Center for Advancing Translational Sciences of the National Institutes of Health (Bethesda, Maryland) and from departmental sources. The funding agency had no role in the concept, design, data collection, interpretation, or writing of this article.
The authors declare no competing interests.