Satisfaction is considered a valuable measure of outcome of healthcare processes. Only a few anesthesia-related validated questionnaires have been reported. Because their scope is restricted to specific clinical contexts, their use remains limited. The objective of the current study was to develop and validate a self-reported questionnaire, Evaluation du Vécu de l'Anesthésie Générale (EVAN-G), assessing satisfaction with the perioperative period surrounding general anesthesia.
Development of the EVAN-G questionnaire comprised a phase of item generation and a phase of psychometric validation. The patient sample was generated to be proportionally matched to the population of patients undergoing general anesthesia in France. The structure of the questionnaire was identified by studying interitem, item-dimension, and interdimension correlations and factor analyses. Data were concurrently gathered to assess external validity. The discriminant validity was determined by comparison of scores across patient groups known to differ. Reliability was assessed by computation of Cronbach alpha coefficients and by test-retest.
Eight hundred seventy-four patients were recruited in eight anesthesia departments. The EVAN-G includes 26 items; six specific scores and one global index score are available. Correlations between EVAN-G scores and other concurrent measures supported convergent validity. The EVAN-G correlated poorly with age, American Society of Anesthesiologists physical status, total anesthesia time, and number of previous anesthesias. Significantly higher satisfaction was reported by patients older than 65 yr and by those in the laryngeal mask group. Reliability and reproducibility were shown.
The EVAN-G adds important information oriented toward patients' perceptions. The authors' approach provides a novel, valid, and reliable tool that may be used in anesthesia practice.
Human satisfaction is a very complex concept, involving many components such as physical, emotional, mental, social, and cultural factors. Multiple definitions of this concept and theories of satisfaction have emerged, mainly from the fields of behavioral sciences and consumerism.1–3 Satisfaction is now considered a valuable measure of outcome of healthcare processes.4 In several countries, it has become a major step in health institution accreditation processes.2 Moreover, satisfaction may have an important influence on various aspects of patients' behavior, such as global consumption of healthcare resources, compliance with treatments, or steadiness of relationship with practitioners.5,6 However, the complexity of the concept, as well as the multidimensional nature of satisfaction, has often impaired the development of reliable evaluation tools.7 In anesthesia, the strong emotional context, the potential effect of drugs on cognition, and the short time interval of the anesthesia process make it even more difficult to assess satisfaction.8 Only a few anesthesia-related validated questionnaires have been reported.9 However, because their scope is either restricted to specific clinical contexts7 or focused on specific anesthetic managements,10 their use remains limited. Most of these tools have been developed relying on the views of experts to determine essential patient concerns.10–15 Furthermore, they are not always psychometrically sound.16 Such weaknesses may lead to unreliable results and may partly explain the high satisfaction ratings usually reported.
Satisfaction should be measured as a multidimensional concept and should reflect the exclusive concerns of the patients. The definition of satisfaction relying on the theory of expectations3 satisfies this demand because it focuses on the discrepancies between patients’ perceptions and their expectations. Finally, a satisfaction tool should meet the needs of clinicians and should therefore be usable for any type of surgery and in routine clinical practice.
The main objective of this study was to develop and validate, following international guidelines,17–19 a self-reported questionnaire, Evaluation du Vécu de l’Anesthésie Générale (EVAN-G), assessing satisfaction with the perioperative period surrounding general anesthesia, that would fulfill those requirements.
Materials and Methods
Patients were recruited between January 2000 and February 2001 in eight anesthesia departments (including four university hospitals) in southeastern France. The study was directed by a steering committee including nine anesthetists, two surgeons, one psychiatrist, and three epidemiologists from different centers. Approval was obtained from the local ethics committee. The criteria for patient inclusion were consent to participate in the study, elective surgery (except obstetric) or endoscopic procedure requiring general anesthesia (excluding monitored anesthesia care and regional anesthesia), age older than 18 yr, and ability to understand and read French and to complete a self-administered questionnaire within 48 h after general anesthesia. To maximize the return rate, all questionnaires were administered and collected before the patients left the hospital. An independent investigator made sure the patients could complete the questionnaire without any external help or influence. Development of the EVAN-G questionnaire comprised two steps: a phase of item generation and reduction and a phase of psychometric validation.
Step 1. Questionnaire Development: Item Generation and Reduction
Face-to-face semistructured interviews were performed by a trained interviewer within 48 h after surgery. The interviews comprised two parts. First, patients were asked to report on their satisfaction in a nondirective way. The second part of the interview was based on a guideline derived from the literature review.19,20 This guideline listed the main domains related to satisfaction as reported in the literature and did not consist of a set of directive questions. The interviews addressed the impact of the perioperative process with general anesthesia on patients’ satisfaction, referring to the theory of expectation, which defines satisfaction as the discrepancy between expectations and current life experience.2,3 Therefore, the views of the patients were gathered regarding both their current experience of health care in the perioperative period and their corresponding expectations. All interviews were recorded and transcribed. Content analysis was performed by three members of the steering committee who were skilled in textual analysis, complemented by a computerized textual analysis (Alceste® software; IMAGE, Toulouse, France). It aimed at identifying recurrent themes that were then used to generate individual questions within the questionnaire.21 Patient interviews were also used to determine the wording in question stems and the range of response options. Interviews were conducted until no new ideas emerged in the content analysis performed in real time, up to a total of 24 patients.
Seventy-five questions were selected from these interviews (item generation). These items were answered using a five-point Likert scale, where 1 = much less than expected, 2 = less than expected, 3 = as expected, 4 = more than expected, and 5 = much more than expected.
The acceptability of this 75-item self-administered questionnaire was pretested in a sample of 29 consecutive patients who had not participated in the interviews. Items that were ambiguous, misunderstood, or rarely answered were removed, leading to a preliminary questionnaire comprising the 66 remaining items.
This preliminary 66-item version of the questionnaire was administered to a new sample of 171 randomly selected patients undergoing surgery. Eleven questions were deleted because of redundancy (interitem correlation), low response rates (missing data over 20%), or skewness of the distribution of the answers (floor–ceiling effect).
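The item-reduction criteria can be illustrated with a minimal sketch. Only the 20% missing-data limit comes from the text; the function name `screen_item` and the 80% cutoff used to flag a floor or ceiling effect are hypothetical choices made for illustration, and the redundancy check based on interitem correlation is not covered here.

```python
def screen_item(responses, low=1, high=5, max_missing=0.20, max_extreme=0.80):
    """Return the reasons for dropping one item; an empty list means "keep".

    responses   : one answer per patient (int in low..high), None when missing
    max_missing : missing-data limit (20%, as stated in the text)
    max_extreme : ASSUMED share of answers at one extreme that signals a
                  floor or ceiling effect (the paper gives no threshold)
    """
    n = len(responses)
    answered = [v for v in responses if v is not None]
    reasons = []
    # Low response rate: more than 20% of patients left the item blank.
    if n and (n - len(answered)) / n > max_missing:
        reasons.append("missing")
    if answered:
        # Skewed distribution: answers pile up on the lowest or highest option.
        if answered.count(low) / len(answered) > max_extreme:
            reasons.append("floor")
        if answered.count(high) / len(answered) > max_extreme:
            reasons.append("ceiling")
    return reasons


if __name__ == "__main__":
    print(screen_item([5, 5, 5, 5, 5, 5, 5, 5, 5, 4]))
    print(screen_item([3, None, None, 4, 2]))
```

An item is kept only when the returned list is empty; in practice the extreme-share cutoff would need to be set according to the guidelines the authors followed.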
Concurrently, this pilot study ensured content validity and guaranteed that the questionnaire was a true reflection of the patients’ experience. Patients were asked to comment on any aspects of the questionnaire (content, wording, response choices) that they thought were irrelevant or could be improved.
Step 2. Questionnaire Validation
Population and Data Collection.
The primary objectives of the validation phase were to check that the questionnaire actually (validity) and accurately (reliability) measured the concept it was designed for, i.e., patient satisfaction. Assessment of the validity of an instrument is meant to evaluate the systematic error of measure (drift), whereas assessment of the reliability is meant to evaluate the random error of measure (scatter). A secondary objective of the validation phase was to further reduce the number of items to make the instrument shorter and usable in routine clinical practice. Patients included in the first stage could not be included in this validation phase.
The patient sample regarding surgical procedures was generated to be proportionally matched to the yearly population of patients undergoing general anesthesia in France,22 excluding monitored anesthesia care and regional anesthesia. A total of 977 patients who met the inclusion criteria were included, over 15 consecutive days in eight anesthesia departments in southeastern France. Sociodemographic and other clinical data were collected by a senior anesthetist.
Within 4–48 h after surgery, the patients were given the self-administered 55-item questionnaire, along with two other questionnaires detailed in the next section.
Statistical Analysis: Psychometric Validation.
The multidimensional structure of the questionnaire was identified by studying interitem, item–dimension, and interdimension correlations (Pearson r) and principal component factor analyses with Varimax rotation.18,19 More specifically, item internal consistency was assessed by correlating each item with its scale, which was corrected for overlap. A correlation of 0.4 is recommended as the standard for supporting item internal consistency. Item discriminant validity was assessed by determining the extent to which items correlate more highly with the dimensions they are hypothesized to represent than with the other ones.18,19 The construct of each dimension was assessed using Rasch Rating Scale model analyses.23,24 This model allows one to ensure the unidimensionality of each dimension. The most meaningful and psychometrically sound solution was kept to produce the final version of the EVAN-G questionnaire, consisting of 26 items assessing six dimensions: attention, privacy, information, pain, discomfort, and waiting times (appendix: EVAN-G questionnaire).
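The item internal consistency check described above (each item correlated with its own scale, corrected for overlap) can be sketched as follows. This is an illustrative reading, not the authors' code: "corrected for overlap" is taken to mean that the item is removed from the scale total before correlating, and the function names are hypothetical. Only the 0.4 standard comes from the text.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences of numbers."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant variable")
    return sxy / sqrt(sxx * syy)


def corrected_item_scale_correlations(rows):
    """Correlate each item with the sum of the OTHER items of its dimension.

    rows: one list per patient holding the k item scores of a single
    dimension, with no missing values.
    """
    k = len(rows[0])
    correlations = []
    for i in range(k):
        item = [r[i] for r in rows]
        rest = [sum(r) - r[i] for r in rows]  # scale total without item i
        correlations.append(pearson_r(item, rest))
    return correlations


def weak_items(rows, standard=0.4):
    """Indices of items whose corrected correlation falls below the standard."""
    return [i for i, r in enumerate(corrected_item_scale_correlations(rows))
            if r < standard]


if __name__ == "__main__":
    toy = [[1, 1, 2], [2, 2, 2], [3, 3, 1], [4, 4, 1]]  # made-up answers
    print(corrected_item_scale_correlations(toy))
    print(weak_items(toy))
```

Removing the item from the total avoids inflating the correlation with the item's own variance, which matters most for short dimensions such as the two-item waiting scale.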
To assess external validity of the EVAN-G, other external data were concurrently gathered from patients and clinicians. Relations were investigated between specific potential dimensions of EVAN-G (e.g., pain, anxiety) and other instruments, such as validated questionnaires (McGill Pain Questionnaire [MGPQ],25 Spielberger State-Trait Anxiety Inventory [STAI]26) or specific visual analog scales (VAS) assessing miscellaneous domains: anxiety, fear, pain, discomfort, trust in medical staff, ability to ask for information, quality of information delivered, kindness of medical staff, attention paid, and overall satisfaction with care. The underlying assumption was that dimension scores of the EVAN-G would be more highly correlated with scores of similar dimensions from the other concurrent instruments than with dissimilar ones (convergent validity).
The discriminant validity of EVAN-G was determined by comparing mean dimension scores across patient groups that were known to differ in their sociodemographic (age, sex) or clinical features (American Society of Anesthesiologists [ASA] physical status, number of previous anesthesias, anesthesia duration) using analysis of variance, the Student t test, the Mann–Whitney U test, or the Pearson correlation.
To check content validity, patients were asked, in an open-ended question at the end of the questionnaire, to point out the different important domains of their life that were not mentioned on the EVAN-G.
Reliability was assessed by computation of Cronbach α coefficients (internal consistency) for each dimension score, and by test–retest using an intraclass correlation coefficient conducted on a subsample of 36 patients evaluated twice with a 15-day interval. To ensure data quality by excluding those answers likely to be unreliable, the validation analysis was performed on records with less than 25% of missing data on the questionnaire to be validated, as recommended in international guidelines. This procedure reinforces the generalizability of the final validated questionnaire.18,19
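As a reminder of what the internal consistency statistic measures, the sketch below computes a Cronbach α coefficient from complete item responses using the standard formula α = k/(k − 1) × (1 − Σ item variances / variance of the total score). It is a generic illustration under the assumption of complete cases; the paper does not describe its software or its handling of missing items at this step, and the function name is hypothetical.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach alpha for one dimension.

    rows: one list per patient holding the k item scores of the dimension
    (complete cases only, an assumption made for this sketch).
    """
    k = len(rows[0])
    if k < 2 or len(rows) < 2:
        raise ValueError("need at least two items and two patients")
    # Variance of each item across patients.
    item_variances = [variance([r[i] for r in rows]) for i in range(k)]
    # Variance of the summed score across patients.
    total_variance = variance([sum(r) for r in rows])
    if total_variance == 0:
        raise ValueError("alpha is undefined when total scores do not vary")
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


if __name__ == "__main__":
    toy = [[1, 2], [2, 1], [3, 3]]  # made-up answers, two items
    print(round(cronbach_alpha(toy), 3))
```

Values closer to 1 indicate that the items of a dimension vary together; the test–retest intraclass correlation is a separate statistic and is not shown here.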
To avoid confusion with intermediate development versions of the questionnaire, only the results of the final psychometric validation phase are reported.
Of the 977 recruited patients, 103 (10.5%) answered less than 75% of the questions and were excluded from the analysis. The 874 remaining patients included in the validation analyses underwent a wide range of surgical procedures: 102 gynecologic (11.7%), 83 orthopedic (9.5%), 80 digestive (9.2%), 71 ear, nose, and throat (8.1%), 71 vascular (8.1%), 65 esthetic (7.4%), 50 endocrine (5.7%), 44 maxillofacial (5.0%), 41 ophthalmologic (4.7%), 30 spine (3.4%), 29 urologic (3.3%), 14 intracranial (1.6%), 12 thoracic (1.4%), 160 endoscopic (18.3%), and 22 other (2.5%). The mean patient age was 51 ± 17 yr, 397 of the patients (45.4%) were male, 678 (77.5%) received a premedication, 118 (13.5%) were ambulatory patients, and the mean number of previous surgeries was 3 ± 2. ASA physical status scores, rated by physicians, were as follows: I, 422 (48.5%); II, 371 (42.6%); III, 74 (8.5%); IV/V, 3 (0.4%). The mean MGPQ score was 6.3 ± 8.6 (range, 0–57), and the STAI mean global score was 51.3 ± 2.9. The mean duration of anesthesia was 110 ± 86 min (median, 90; range, 10–720).
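The exclusion rule applied here (records with less than 75% of the questions answered are dropped) can be written as a short filter. This is a sketch only: the function name is hypothetical, and missing answers are assumed to be encoded as `None`. With the 55-item validation questionnaire, exactly 75% answered cannot occur (0.75 × 55 = 41.25), so the boundary case does not arise.

```python
def is_analyzable(record, min_answered=0.75):
    """True when at least 75% of the questions in the record were answered.

    record: one entry per question, None when the patient gave no answer.
    """
    answered = sum(value is not None for value in record)
    return answered / len(record) >= min_answered


if __name__ == "__main__":
    # Hypothetical 55-item records with 13 and 14 unanswered questions.
    kept = [3] * 42 + [None] * 13
    dropped = [3] * 41 + [None] * 14
    print(is_analyzable(kept), is_analyzable(dropped))
    # Share of recruited patients retained, from the counts in the text.
    print(round(100 * 874 / 977, 1))
```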
The scores for negatively worded items were reversed so that higher scores indicated a higher level of satisfaction. For each individual, the score of each dimension was obtained by computing the mean of the item scores of the dimension. If fewer than one half of the items were missing, the mean of the nonmissing items was substituted for the missing items. All dimension scores were linearly transformed to a 0–100 scale, with 100 indicating the best possible level of satisfaction and 0 indicating the worst. The global satisfaction score was computed as the mean of the dimension scores.
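The scoring rules above can be sketched in a few lines. Several details are assumptions made for illustration: reversal on the five-point scale is taken as 6 minus the response, the linear transformation is taken as 100 × (mean − 1)/(5 − 1), and the global score is left undefined when any dimension score is missing. Function and item names are hypothetical.

```python
from statistics import mean


def dimension_score(items, reverse=(), low=1, high=5):
    """Score one dimension on a 0-100 scale, or None if it cannot be scored.

    items   : dict mapping item name -> response (low..high) or None if missing
    reverse : names of negatively worded items, whose scores are reversed
    """
    answered = {}
    for name, value in items.items():
        if value is None:
            continue
        # Reverse negatively worded items so that higher means more satisfied.
        answered[name] = (low + high - value) if name in reverse else value
    n_missing = len(items) - len(answered)
    # Scored only if fewer than one half of the items are missing.
    if n_missing * 2 >= len(items):
        return None
    # Replacing missing items by the mean of the answered items and then
    # averaging gives the same value as the mean of the answered items.
    raw = mean(answered.values())
    # ASSUMED linear transformation of the 1-5 mean onto 0-100.
    return 100.0 * (raw - low) / (high - low)


def global_score(dimension_scores):
    """Mean of the dimension scores; None if any dimension is unscored (assumed)."""
    values = list(dimension_scores.values())
    if not values or any(v is None for v in values):
        return None
    return mean(values)


if __name__ == "__main__":
    pain = dimension_score({"p1": 5, "p2": 3, "p3": None})
    waiting = dimension_score({"w1": 1, "w2": 5}, reverse={"w1"})
    print(pain, waiting, global_score({"pain": pain, "waiting": waiting}))
```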
Principal component analyses after Varimax rotation isolated 26 questions, grouped into six dimensions. The six-factor structure accounted for 64% of the total variance (table 1). Each dimension was named according to its constitutive items: attention (5 items), privacy (4 items), information (5 items), pain (5 items), discomfort (5 items), and waiting (2 items).
As expected, correlations between items and their corresponding dimension (item internal consistency) ranged from 0.55 to 0.92, whereas correlations between items and the other dimensions (item discriminant validity) ranged from 0.02 to 0.53 (table 2). Correlations between dimension scores were low to moderate, ranging from 0.14 to 0.57 (P < 0.001). The global mean satisfaction score was 75 ± 14. The worst mean dimension score was found for information (64 ± 22), and the best was found for discomfort (84 ± 19) (table 2).
Overall, the levels of correlation between EVAN-G scores and other concurrent measures confirmed the convergent validity of the instrument (table 3). High levels of correlation were found between the MGPQ and pain (r = −0.52, P < 0.01) and between the MGPQ and discomfort (r = −0.36, P < 0.01). Each EVAN-G dimension score correlated more highly with its domain-related VAS than with unrelated ones. The VAS score assessing overall satisfaction with care correlated with all of the EVAN-G dimensions (r = 0.16–0.36, P < 0.01) and the global score (r = 0.40, P < 0.01). The EVAN-G score was not correlated with the STAI score, except for attention. All these results are reported in table 3.
Overall, EVAN-G scores correlated poorly with age, ASA physical status, total anesthesia time, and number of previous anesthesias. The only significant correlations were reported between (1) age and pain (r = 0.18, P < 0.01), discomfort (r = 0.20, P < 0.01), waiting (r = 0.18, P < 0.01), and global index (r = 0.18, P < 0.01); (2) total anesthesia time and pain (r = −0.29, P < 0.01), discomfort (r = −0.13, P < 0.01), and global index (r = −0.13, P < 0.01); and (3) ASA physical status and pain (r = 0.10, P < 0.01) and waiting (r = 0.13, P < 0.01).
Findings regarding the comparisons of EVAN-G scores among clinical groups met the assumptions expressed a priori by the steering committee relying on clinical experience and literature. Significantly higher global satisfaction was reported by patients older than 65 yr, patients in the laryngeal mask group, and patients undergoing minor surgery. The information dimension score was significantly higher in the premedicated group. The pain dimension score was significantly higher in the following groups of patients: ASA physical status greater than II, minor surgery, outpatient, unpremedicated, and laryngeal mask. Discomfort dimension scores were significantly higher in patients older than 65 yr, males, patients with ASA physical status greater than II, patients undergoing minor surgery, outpatients, and patients in unpremedicated and laryngeal mask groups. Waiting dimension scores were significantly higher in patients older than 65 yr, patients with ASA physical status greater than II, and patients in unpremedicated and laryngeal mask groups (table 4).
The steering committee confirmed that the content of each dimension was meaningful and that the six-factor structure dealt with the major domains reported in the patients’ interviews and open comments, reflecting a relevant content validity.
The internal consistency reliability of EVAN-G was high: The Cronbach α value ranged from 0.73 to 0.91 (table 2). The stability of the EVAN-G score evaluated by test–retest correlation conducted on 36 patients without any intercurrent health events was satisfactory: The intraclass correlation coefficient ranged from 0.72 to 0.81.
The average time of completion of the EVAN-G was 9 ± 7 min, fully compatible with clinical practice. Missing data per dimension were low, ranging from 0.6 to 2.7% (table 2).
The objective of this study was to report the stages of development and validation of a self-administered satisfaction questionnaire of the perioperative period surrounding general anesthesia based on issues pertinent to patients.
The first issue to be discussed refers to the psychometric validity of the questionnaire. None of the satisfaction instruments commonly used in anesthesia studies concurrently include development from the exclusive point of view of patients, perioperative period as time frame, and data demonstrating high psychometric validity.
The perspectives to which the questionnaire refers are an important issue because of discrepancies between patients’ and physicians’ points of view.27 Therefore, physicians should favor standardized questionnaires that are not only self-completed by the patients but also ask them questions about their own concerns. The EVAN-G was developed with 24 individual interviews, conducted by a nonclinician who was not part of the medical staff, including a wide range of surgical procedures, followed by cognitive debriefing with patients at different steps of the development. These qualitative data were collected. Content validity was ensured by developing items on the basis of in-depth interviews with patients rather than relying on literature or expert opinions. The content of the questionnaire encompasses experiences of great importance to patients. Any items that were criticized by patients for being inappropriate or ambiguous were removed. The domains covered by the EVAN-G express the two-fold originality of its development process, conveying patients’ views and focusing on the perioperative period. The EVAN-G describes themes commonly reported when evaluating patients’ satisfaction or perceptions related to surgery, such as pain and discomfort. In that, there is overlap with the Quality of Recovery questionnaire,28 for example. Nevertheless, the EVAN-G explores complementary aspects specific to satisfaction, such as attention and privacy. These dimensions of anesthesia are seldom evaluated in studies on patients’ satisfaction relying on expert-based questionnaires, but patients believed these aspects of anesthesia care were important.
Internal consistency reliability of the six domains and reproducibility have been shown to be high. Construct validity was explored by comparison with well-established measures (ASA, STAI, MGPQ, and others). Moreover, because there is no accepted criterion that can serve as a definitive standard of satisfaction, following the lead of previous investigators, we chose to compare the EVAN-G to the VAS scores representing different domains of satisfaction.29 To limit the risk of bias due to improper patient selection, a large sample was taken from eight hospitals, and the recruitment mirrored the distribution of the surgical population in France.22 This reinforced the reliability and validity of the EVAN-G as a measure of patient perioperative satisfaction.
The acceptability of the EVAN-G was good. The rate of missing data in the final 26-item version was less than 3% for all the dimension scores, and the rate of spontaneous refusal was low, ranging from 1 to 2% across the various steps of development and depending on the surgical procedure. The average completion time of the 26-item EVAN-G is short enough (approximately 9 min) to allow its use in routine practice, unlike other instruments, comprising 40 items, which are too lengthy.28,30
The second issue concerns the relevance of developing a specific questionnaire assessing satisfaction in the perioperative period and the clinical interest of the EVAN-G. The results presented above did not aim at explaining the links between clinical and sociodemographic status and satisfaction; clinical and sociodemographic data were only required to test hypotheses to reinforce the validity of the questionnaire. However, our findings are consistent with those reported in the literature, especially regarding sex and age, described as strong predictors of patients’ satisfaction level.31–33
The time frame of interest, the perioperative period lasting from the first visit with the anesthetist before surgery up to the 48th hour after surgery, is noteworthy. This issue has often been discussed in previously published works.32 It thus becomes possible to take the whole perioperative period into account and to examine the factors related to anesthesia care that influence satisfaction, in addition to the more usual factors reflecting surgical procedures. Moreover, the perioperative period is a unique process and a global experience from a patient’s point of view, and EVAN-G items cover both the successive steps in the care process and the different caregivers involved in the process.
Restriction of the period of questionnaire administration to the first 48 h after surgery is also found in other studies exploring anesthesia care management.34 Conversely, the assessment of satisfaction at a later point could mainly reflect perceptions related to surgery. This probably led to an underestimation of the reliability of the questionnaire using the test–retest procedure in 36 patients at a 15-day interval.
Increasingly, clinical researchers and healthcare providers have focused on measuring patients’ perceptions of the outcomes of care, namely patient-reported outcomes. In anesthesiology, efficacy and safety drive anesthesia providers’ determination of quality improvement. Anesthesia-related mortality has been dramatically reduced during the past 20 yr,35 but still, little consideration is given to integrating patients’ satisfaction in the decision-making process. Nevertheless, the evaluation of services by patients should be an integral part of continuous quality improvement in anesthesia. Therefore, it is important to identify the factors for patient dissatisfaction. An important question is, what exactly does the questionnaire evaluate? Because there is no accepted standard of patient satisfaction, the global value of the EVAN-G in a population is not an absolute value reflecting a given level of satisfaction. Moreover, one can imagine that the simple fact of having passed the perioperative period without any major adverse event would overestimate patient satisfaction. This is probably true and may explain very high levels of satisfaction when the question is whether a patient is “satisfied,” “somewhat dissatisfied,” or “dissatisfied.”36 But a 26-item questionnaire based on patients’ expectations is likely to be less contaminated by the happy feeling of a good surgical outcome. It could also be argued that one severe discomfort (e.g., postoperative pain or frequent nausea or vomiting) might lead to a low satisfaction level even if the other expectations were met, but the multidimensional construction of the scale, with low correlation values between dimensions, should allow analysis of the causes of dissatisfaction.
Furthermore, the EVAN-G may be useful to assess the comparative effects of alternative anesthesia procedures. From this perspective, one obvious limitation of the EVAN-G is that it does not cover regional anesthesia or monitored anesthesia care. Further work should explore patients’ satisfaction in these specific contexts. One alternative would be to develop modules specific to these special anesthetic contexts in addition to a core questionnaire. Research with patients undergoing regional anesthesia has already been conducted, confirming the relevance of this modular approach.
Last, for anesthetists, one major interesting quality of the EVAN-G is the wide range of surgical procedures studied. In most satisfaction studies, validated questionnaires focus on a single type of surgery, meeting the specific needs of surgeons but restricting the usefulness of the questionnaire with regard to anesthetists’ decision making. For example, the pain component is very important for major surgical procedures but is less relevant to procedures associated with a low level of pain. Therefore, the same global score could reflect various perceptions of the anesthesia process.
The EVAN-G is not intended to replace conventional outcome measures such as mortality, adverse events, or other information dealing with recovery. The EVAN-G adds important information, oriented toward patients’ views and perceptions, to the information traditionally collected in anesthesia. Further work is needed to test its strengths and weaknesses in different cultural contexts. However, reliability and validity of the EVAN-G are evidenced by our results. Our approach provides a novel, valid, and valuable tool that may be used in routine anesthesia practice.
The authors thank Jean-Pierre Carpentier, M.D. (Chief of Department, Department of Anesthesia, Hopital Laveran, Marseilles, France); Claude Churlaud, M.D. (Staff Anesthesiologist, Department of Anesthesia, Hopital Laveran); Marc Dupont, M.D. (Chief of Department, Department of Anesthesia, Hopital St Joseph, Marseilles, France); Béatrice Eon, M.D. (Staff Anesthesiologist, Department of Anesthesia, Ste Marguerite University Hospital, Marseilles, France); Florence Ettori, M.D. (Staff Anesthesiologist, Department of Anesthesia, Nord University Hospital, Marseilles, France); Sylvie Le May, M.Sc., Ph.D. (Staff Anesthesiologist, Department of Anesthesia, Montreal Heart Institute, Montreal, Quebec, Canada); and Laurent Michot, M.Sc., Ph.D. (Engineer, Nancy, France), for their valuable contributions to this study.