Gastric sonography can provide information about gastric content and volume that can help determine aspiration risk at the bedside. The primary objective of this study is to assess the intrarater and interrater reliability of a previously validated method of gastric volume assessment based on gastric antral area. The secondary objective is to evaluate the agreement between two different methods to measure gastric antral area.
Three independent raters performed a standardized gastric ultrasound assessment in healthy subjects who had been randomly allocated to ingest a predetermined volume of clear fluid (apple juice) from 0 to 400 ml. Each rater measured the gastric antral area twice with the two-diameter method and twice with the free-tracing method. The rater order was allocated at random, and raters were unaware of the volume ingested and of one another’s measurements. The Guidelines for Reporting Reliability and Agreement Studies were followed for conducting and reporting this study.
Twenty-two volunteers were studied. Ultrasound assessment of antral cross-sectional area and volume was found to have “nearly perfect” intrarater and interrater reliability (correlation coefficient >0.8) with maximum differences within 13%. A Bland–Altman analysis suggests that the free-tracing method and the two-diameter method are essentially equivalent, within a clinically acceptable level of agreement.
Ultrasound assessment of gastric volume by clinical anesthesiologists is highly reproducible with high intrarater and interrater reliability. The free-tracing method to measure antral cross-sectional area is equivalent to the two-diameter method.
We attempt to bring patients to the operating room with small quantities of gastric content to prevent aspiration.
Gastric sonography can provide information about the volume of the stomach. Ultrasound assessment of gastric volume by clinical anesthesiologists is highly reproducible with high intrarater and interrater reliability.
ASPIRATION of gastric contents is a serious perioperative event associated with morbidity and mortality.1–3 Patients with a “full stomach” are at increased aspiration risk when sedation or general anesthesia impairs their lower esophageal sphincter tone and protective airway reflexes.4 Gastric ultrasound assessment is the first validated noninvasive imaging tool that can provide information about the nature and volume of gastric content at the bedside.5–8 Furthermore, several studies suggest that the cross-sectional area of the gastric antrum (antral CSA) can accurately predict gastric fluid volume.6–8 However, the reliability of ultrasound gastric volume assessment has not yet been established. The primary aim of this study is to evaluate the interrater and intrarater reliability of ultrasound measurements of antral CSA and gastric volume. The secondary objective is to evaluate the agreement between the traditional two-diameter method (TDM) and a less-reported but simpler and more convenient free-tracing method (FTM) of antral CSA measurement.7,9–11
Materials and Methods
After obtaining Research Ethics Board approval from the University Health Network (Toronto, Ontario, Canada) and informed consent, we conducted this prospective cross-sectional study on healthy volunteers. The study took place at the Toronto Western Hospital in Toronto, Ontario, Canada. Healthy volunteers were sought by posting study ads on community message boards within the hospital. Volunteers were enrolled between November 2011 and June 2012. Inclusion criteria were: age of 18 to 85 yr, American Society of Anesthesiologists’ physical status class I to II, body mass index less than 35 kg/m2, height greater than 145 cm, and the ability to understand the protocol and provide informed consent. Exclusion criteria were: pregnancy, diabetes mellitus, a history of upper gastrointestinal disease (including hiatus hernia and gastric tumors), and previous surgical procedures on the esophagus, stomach, or upper abdomen.
Ultrasound examinations were conducted with a low-frequency (2 to 5 MHz) curvilinear-array transducer using a Philips (CX 50) (Bothell, WA) or Sonosite (M-Turbo) system (Bothell, WA), with image-compounding technology. After an 8-h fasting period for both solids and liquids, subjects underwent a baseline gastric ultrasound examination by a certified sonographer in the supine and right lateral decubitus positions to rule out the presence of significant gastric volume at baseline. After the baseline assessment, each subject was randomized to ingest one of five predetermined volumes of apple juice (0, 100, 200, 300, or 400 ml). Randomization was performed with a computer-generated list of random numbers and concealed in opaque envelopes. A standardized scanning protocol was carried out beginning 3 min after ingestion. Subjects underwent three gastric ultrasound examinations by three independent raters in random order.
The three raters had variable proficiency levels, but all had previous experience in gastric sonography. The first rater was a certified sonographer with more than 10-yr clinical experience and 4-yr experience (>500 gastric scans) in gastric ultrasound assessment. A second rater was a clinical anesthesiologist with more than 10-yr experience in other ultrasound clinical applications and 4-yr experience (>500 previous gastric scans) in gastric sonography. The third rater was an anesthesia fellow with 3-yr experience in other ultrasound applications and 6-month experience (>50 scans) in gastric sonography.
The gastric antrum was imaged in a sagittal plane, between the left lobe of the liver and the pancreas, at the level of the aorta, with the subjects in the right lateral decubitus, as previously reported (fig. 1).5–7 Frequent peristaltic contractions are a normal occurrence after ingestion of fluid and are readily recognized during gastric ultrasound assessment as temporary decreases in antral diameter. Raters were instructed to obtain the images between (and not during) peristaltic contractions to avoid underestimating antral CSA and gastric volume. Raters were instructed to obtain the images within 5 min. Each rater obtained six independent still images of the gastric antrum. Each image was labeled and stored for review by the rater at the end of the study day. These steps were followed to minimize the time interval between the three raters, which could negatively impact reliability due to gastric emptying. At the end of the study day, each rater independently reviewed his/her generated images and measured antral CSA using both the TDM and the FTM. The TDM is frequently reported in the literature. It consists of measuring two perpendicular diameters of the antrum (fig. 2) and calculating the antral area, assuming that the antrum has a perfect elliptical shape, using a standard formula for the surface area of an ellipse as follows:
Antral CSA = (AP × CC × π) / 4
where AP represents the antero-posterior antral diameter, CC represents the cranio-caudal antral diameter, and π is 3.1416.
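As an illustration of the TDM calculation, the following minimal sketch applies the standard ellipse-area formula (π × AP × CC / 4) to two perpendicular diameters. The function name and the example diameters are hypothetical and are not taken from the study data.

```python
import math


def antral_csa_two_diameter(ap_cm: float, cc_cm: float) -> float:
    """Return the antral cross-sectional area (cm^2) of an ellipse whose
    perpendicular diameters are ap_cm (antero-posterior) and cc_cm
    (cranio-caudal), both in centimeters."""
    if ap_cm < 0 or cc_cm < 0:
        raise ValueError("diameters must be non-negative")
    return math.pi * ap_cm * cc_cm / 4.0


# Hypothetical diameters, for illustration only.
example_csa = antral_csa_two_diameter(2.0, 4.0)
print(f"Antral CSA: {example_csa:.2f} cm^2")
```

When both diameters are equal, the expression reduces to the area of a circle, which is why the TDM is exact only for circular or elliptical cross-sections.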
The FTM consists of measuring the antral area by using the free-tracing caliper of the ultrasound unit (fig. 3). This method is simpler to apply and does not need an intermediate calculation. It would therefore be a preferable method to use at the bedside if it is proven to be accurate.
The first set of three images was used by each rater to generate one data point for the TDM (an average of three measurements) and one data point for the FTM (an average of three measurements). Similarly, the second set of three images was used by each rater to obtain a second data point for both TDM and FTM as an average of three measurements in both cases. Using three images per data point is a standard practice in gastric sonography and has been previously reported by several authors.7,8,10 Each rater generated a total of four measurements per subject (twice with the TDM and twice with the FTM). The examiners were blinded to the volume ingested and unaware of the other raters’ findings.
The antral CSA was used to calculate total gastric volume based on a previously validated model as follows:
where Right-lat CSA is the antral CSA measured in the right lateral decubitus position.
We followed the Guidelines for Reporting Reliability and Agreement Studies in conducting and reporting our investigation.12
We estimated the sample size required to test the null hypothesis H0: ρ = ρ0 versus the alternative hypothesis H1: ρ > ρ0, where ρ is the “true” reliability coefficient and ρ0 a specified value for ρ. We aimed to demonstrate an intrarater reliability coefficient of 0.8 and an interrater reliability coefficient of 0.6 (denoted by ρintra and ρinter) which have been previously considered to express “almost perfect” and “substantial” reliability, respectively.13 Referencing the graphs by Eliasziw et al.,13 we estimated that 22 subjects, 3 raters, and 2 measurements per rater would result in a power larger than 80%, with a significance level of α = 0.05.
The intraclass correlation for random-effects models based on repeated-measures ANOVA14 was used to evaluate intrarater and interrater reliability as initially described by Shrout and Fleiss.15 In addition, we estimated the absolute and relative differences between the two measurements using the same method (TDM or FTM) among raters as an indicator of interrater agreement.16
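To make the intraclass correlation concrete, the sketch below computes one common Shrout and Fleiss form, the two-way random-effects, single-measurement coefficient often written ICC(2,1), from the ANOVA mean squares. This is an assumption made for illustration: the text does not state which ICC form was used, and the study's own analysis was run in Stata. The function name and the volume values are hypothetical.

```python
def icc_2_1(ratings):
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    ratings is a list of rows, one row per subject, each row holding one
    measurement per rater (no missing values).
    """
    n = len(ratings)          # number of subjects
    k = len(ratings[0])       # number of raters
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Sums of squares for the two-way ANOVA without replication.
    ss_subjects = k * sum((m - grand) ** 2 for m in row_means)
    ss_raters = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_subjects - ss_raters

    bms = ss_subjects / (n - 1)             # between-subjects mean square
    jms = ss_raters / (k - 1)               # between-raters mean square
    ems = ss_error / ((n - 1) * (k - 1))    # residual mean square
    return (bms - ems) / (bms + (k - 1) * ems + k * (jms - ems) / n)


# Hypothetical gastric volumes (ml): 4 subjects, 3 raters each.
volumes = [
    [105.0, 110.0, 98.0],
    [210.0, 201.0, 215.0],
    [310.0, 322.0, 305.0],
    [15.0, 22.0, 18.0],
]
print(f"ICC(2,1) = {icc_2_1(volumes):.3f}")
```

The coefficient approaches 1 when differences between subjects dominate the rater and residual variance, which is the situation the "nearly perfect" label describes.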
The level of agreement between the TDM and the FTM was estimated with a Bland–Altman16,17 analysis to place the magnitudes of the differences in a clinical context. This analysis plots the difference between the two methods against the mean of both methods for each subject. The 95% limits of agreement for the differences were also calculated. The assumption of normal distribution of the differences was checked by the Shapiro–Wilk test for normal data. We also estimated proportions of agreement between the two methods among raters within specific limits.12,18
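The Bland–Altman quantities described above can be sketched as follows. This is a minimal illustration that assumes the conventional limits of agreement (mean difference ± 1.96 SD of the differences); the function name and the paired area values are hypothetical.

```python
import statistics


def bland_altman(method_a, method_b):
    """Return the bias, SD of differences, 95% limits of agreement, and the
    (mean, difference) pairs that would be plotted for two paired methods."""
    if len(method_a) != len(method_b):
        raise ValueError("both methods need the same number of observations")
    diffs = [a - b for a, b in zip(method_a, method_b)]
    means = [(a + b) / 2 for a, b in zip(method_a, method_b)]
    bias = statistics.mean(diffs)
    sd = statistics.stdev(diffs)  # sample SD (n - 1 denominator)
    return {
        "bias": bias,
        "sd": sd,
        "lower": bias - 1.96 * sd,
        "upper": bias + 1.96 * sd,
        "points": list(zip(means, diffs)),
    }


# Hypothetical antral CSA values (cm^2) for the same images.
tdm = [5.1, 9.8, 14.2, 18.9, 22.5]
ftm = [5.4, 10.1, 14.0, 19.6, 23.1]
result = bland_altman(tdm, ftm)
print(f"bias {result['bias']:.2f}, limits {result['lower']:.2f} to {result['upper']:.2f}")
```

Each entry in `points` is one plotted observation: the x value is the mean of the two methods and the y value is their difference.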
The concordance correlation coefficient between the two methods was calculated.14,19 This coefficient measures the variation of their linear relation from an ideal 45° line through the origin (perfect agreement). It measures how far each observation deviates from the line that best fits the data (precision), and also how far this line deviates from the 45° line through the origin (accuracy).20 Although precision is expressed by the Pearson correlation coefficient, which as a sole measure could potentially over-estimate agreement, the accuracy is expressed by a bias correction factor.16 To visually represent what concordance correlation coefficient evaluates, the TDM measurements were plotted against the FTM measurements, and the line that best fit the data, compared with the line of perfect agreement. Statistical analyses were performed using Stata/IC 12.0 for Mac (Stata Corp, College Station, TX).
Twenty-two volunteers (7 women and 15 men) were enrolled and completed the study. There were no missing data. Demographics are presented in table 1. Ultrasound assessment of gastric volume showed “nearly perfect” overall interrater reliability with intraclass correlation coefficient of 0.96 for both the TDM and FTM (table 2). The interrater reliability remained substantial for each gastric volume level between 0 and 400 ml, even though this secondary calculation may not be sufficiently powered as only four to five subjects were randomized to any given volume level (table 2). Similarly, the intrarater reliability was nearly perfect for all three sonographers with intraclass correlation coefficients of 0.96 to 0.99 (table 3). To place these results in a clinical context, the median absolute difference from mean values was 9.5 ml (p25–p75: 3 to 22 ml), whereas the median relative difference was 2.7% (p25–p75: 1.1 to 5.2%) when using the TDM. The FTM displayed differences of similar magnitudes. The maximum relative difference within each method was not higher than 13%.
Regarding the secondary outcome, a Bland–Altman analysis showed a mean observed difference of −0.33 cm2 with an SD of 0.88 cm2 (table 4). The upper and lower values of the 95% limits of agreement band were 1.4 and −2.07 cm2, which correspond to volumes of 13.1 and −20.5 ml, respectively (fig. 4). Ninety-two percent, 96%, and 100% of the values fell within an absolute difference to the mean observation of 15, 20, and 25 ml, respectively. In other words, the TDM and FTM yield similar volume assessments within a difference of 15 ml in 92% of cases, within 20 ml in 96% of cases, and within 25 ml in 100% of cases. For greater clarity, in figure 4, we can appreciate that, for example, for a volume of 300 ml in the stomach, the difference between the two methods is not greater than 25 ml. Thus, the two methods are essentially equivalent. Furthermore, the concordance correlation coefficient was nearly perfect (0.995) with high precision and accuracy (Pearson correlation coefficient of 0.996 and bias correction factor of 0.999) (table 4 and fig. 5).
We previously validated a mathematical model for calculating gastric volume (up to 500 ml) based on an ultrasound assessment of gastric antral CSA.7 Gastric sonography is a novel point-of-care application of diagnostic ultrasound that allows the clinical anesthesiologist to evaluate a patient’s gastric content and volume and thus aspiration risk at the bedside, and help guide anesthetic and airway management. However, for this new diagnostic tool to be clinically applicable, it needs to be not only valid (accurate under ideal study conditions) but also reliable (i.e., reproducible) with low intrarater and interrater variability.
The results confirm our hypothesis that measurements of gastric antral CSA using bedside ultrasound are highly reliable both within the same rater and among raters. The median relative difference between measurements was only 2.7% with a maximum relative difference not greater than 13%. Because fasted individuals have baseline gastric volumes of up to approximately 1.5 ml/kg (approximately 100 ml for the average adult) without a significant aspiration risk, the absolute volume differences observed of 9.5 ml (interquartile range of 3 to 22 ml) are well within clinically acceptable margins of error.21,22
Intrarater reliability is often higher than interrater reliability in many studies evaluating diagnostic tools, because one potential source of variance, the rater, is eliminated.10,23–26 However, in the current study, both intrarater and interrater reliability were similarly high. This may be a result of a rigorous definition and standardization of the scanning protocol used by all three raters. Lack of such standardization has been implicated in less optimal reproducibility measurements in other studies.10 Four fundamental components of our protocol included: locating the antrum in cross-section in the sagittal epigastric plane that coincides with the long-axis view of the aorta, placing the patient in the right lateral decubitus position, taking measurements between peristaltic contractions when the antrum is at rest, and measuring the antrum from serosa to serosa including the full thickness of the gastric wall. All of these factors affect antral size as well as the sensitivity of the prediction model particularly at low gastric volumes.6
A strength of our reliability assessment lies in the varied training and expertise of the three raters. Other studies assessing inter- and intrarater reliability have created idealized conditions for the reliability assessment which limited the generalizability of their results.23 The results suggest that gastric ultrasound assessment is reproducible not only by very experienced sonographers but also when performed by clinical anesthesiologists marginally exceeding the number of previous scans required to achieve 95% accuracy in qualitative gastric sonography (in this case, just over 50 previous gastric scans and 6 months’ experience).27 However, as with any new diagnostic tool, there are many aspects of training that remain to be defined. A previous study suggested that a minimum of 33 gastric examinations followed by feedback is required to achieve a 95% accuracy rate in qualitative gastric sonography (differentiating an empty stomach from clear fluid or solid content).27 Due to the additional steps required for a quantitative volume assessment as evaluated in this study, we expect that a greater number of scans would be required to achieve a similar success rate. This remains to be studied.
The secondary objective of the current study was to compare a well-established and frequently used method to measure antral CSA (TDM) with the less often used but simpler and more convenient FTM. We confirmed our hypothesis that the FTM is equivalent to the TDM. All differences between the two methods fell within 25 ml, which is clinically inconsequential. The FTM of area measurement relies on single dimensional measurements and requires no geometrical assumptions.11 By contrast, the TDM extrapolates area from the product of linear diameters and relies on the assumption that a cross-section of the gastric antrum is either a perfect circle or ellipse.5,9,10 The fact that the two methods were essentially identical may be explained by the observation that an antral cross-section is usually close to a perfect circle or ellipse.7–9 An advantage of the FTM is that it can be calculated with the technical capabilities of most portable ultrasound units and does not require an intermediate step of area calculation using the formula of the area of an ellipse. This makes the FTM more attractive and user-friendly for daily clinical application.
There are several limitations to our study. First, this method of gastric volume assessment has been validated for nonpregnant adults with normal gastric anatomy and a body mass index of up to 40 kg/m2. It is therefore not immediately applicable to other patient populations such as children, parturients, or patients with previous gastric surgery. Second, an inherent limitation of any study involving gastric sonography is the dynamic nature of the organ. Peristaltic contractions and gastric emptying start immediately after clear fluid ingestion and this may add an element of variability between successive measurements, artificially lowering the resulting reliability. To minimize this possible confounding factor, we used apple juice (rather than water) as the caloric content in the apple juice prolongs gastric emptying time.28 In addition, we minimized the time over which the three successive ultrasound scans were performed as described in the Materials and Methods. To further reduce the influence of gastric emptying on interrater reliability, the order of the raters was allocated at random.
Many questions regarding the clinical applicability and implementation of ultrasound gastric volume assessment remain. These are related to cost-effectiveness, the time and training required to achieve competence, and its applicability to specific patient subgroups as previously discussed. Further studies are required to establish whether the information obtained from this new diagnostic tool can improve the accuracy of aspiration risk assessment and thus improve clinical decision making and patient outcome.
In summary, ultrasound assessment of gastric CSA and gastric volume by clinical anesthesiologists is highly reproducible with high intrarater and interrater reliability. The FTM to measure gastric antral CSA is equivalent to the TDM.
Dr. Perlas received support for academic time through a University of Toronto Merit Award 2011–2013, Toronto, Ontario, Canada.
Dr. Chan received equipment and research support from SonoSite (Bothell, Washington), Philips (Bothell, Washington), and BK Medical (Peabody, Massachusetts). The other authors declare no competing interests.