Abstract
Undetected apnea can lead to severe hypoxia, bradycardia, and cardiac arrest. Tracheal sound entropy has been shown to be a robust method for estimating respiratory flow and may therefore be a more reliable way to detect obstructive and central apnea during sedation.
A secondary analysis of a previous pharmacodynamics study was conducted. Twenty volunteers received propofol and remifentanil until they became unresponsive to the insertion of a bougie into the esophagus. Respiratory flow rate and tracheal sounds were recorded using a pneumotachometer and a microphone. The logarithm of the tracheal sound Shannon entropy (Log-E) was calculated to estimate flow rate. An adaptive Log-E threshold was used to distinguish between the presence of normal breath and apnea. Apnea detected from tracheal sounds was compared to the apnea detected from respiratory flow rate.
The volunteers stopped breathing for 15 s or longer (apnea) 322 times during the 12.9-h study. Apnea was correctly detected 310 times from both the tracheal sounds and the respiratory flow. Periods of apnea were not detected by the tracheal sounds 12 times. The absence of tracheal sounds was falsely detected as apnea 89 times. Normal breathing was detected correctly 1,196 times. The acoustic method detected obstructive and central apnea in sedated volunteers with 95% sensitivity and 92% specificity.
We found that the entropy of the acoustic signal from a microphone placed over the trachea may reliably provide an early warning of the onset of obstructive and central apnea in volunteers under sedation.
Pulse oximetry and capnography are recommended by the American Society of Anesthesiologists for monitoring spontaneous breathing in patients receiving moderate sedation.
This study investigated whether changes in the entropy of tracheal sounds provide an early warning of the onset of obstructive and central apnea in sedated patients.
A microphone placed over the trachea detected obstructive and central apnea in sedated volunteers with 95% sensitivity and 92% specificity. These data suggest that the entropy of tracheal sounds may provide an early warning of the onset of apnea in sedated patients.
UNDETECTED apnea can lead to severe hypoxia, bradycardia, and even cardiac arrest.1 Pulse oximetry and capnography are recommended by the American Society of Anesthesiologists for monitoring spontaneous breathing in patients receiving moderate sedation.2 A pulse oximeter detects hypoxemia by continuously measuring the oxygen saturation of arterial blood so that apnea can be discovered, but there can be significant delay between the onset of apnea and oxygen desaturation, especially when the patient receives supplemental oxygen or when hypothermia and vasoconstriction are present.3–5 Capnography detects apnea by the absence of carbon dioxide in expired gas, but reliably sampling exhaled gas with a face mask or nasal cannula is problematic in nonintubated patients.6,7
Tracheal sounds originate from the vibrations of the tracheal wall and surrounding soft tissues caused by gas pressure fluctuations in the trachea.8 The signals from a microphone or a piezo-electric film transducer placed over the trachea have been processed to monitor respiratory rate and to estimate respiratory flow in awake subjects,9–12 and to diagnose sleep apnea–hypopnea syndrome during normal sleep.13,14 Entropy provides a measure of the complexity of tracheal sounds and has been shown to be a robust method for estimating respiratory flow.10–12 Our study hypothesis is that changes in the entropy of tracheal sounds will provide an early warning of the onset of obstructive and central apnea in sedated patients.
Materials and Methods
Volunteer Recruitment and Instrumentation
After approval by the Human Institutional Review Board at the University of Utah Health Sciences Center (Salt Lake City, Utah), informed written consent was obtained from 24 healthy adult volunteers. The study was originally designed and powered to develop a propofol–remifentanil pharmacodynamic model and 24 volunteers were chosen because Short et al.15 had shown that 20 subjects would be required to define a model surface adequately using a crisscross study design for assessing drug interaction. The ethics approval was to collect the original volunteer data. We conducted a secondary analysis of the data collected during the pharmacodynamics study. All the data in our study were collected by the original investigators. Eligible subjects had an American Society of Anesthesiologists Physical Status of I or II, were nonsmokers, 18 yr of age or older, and had a body mass index between 18 and 28 kg/m2. Subjects were not eligible if they had a history of significant alcohol or drug abuse, allergy to opioids or propofol, sleep apnea, or chronic drug requirements or medical illnesses that are known to alter the pharmacokinetics or pharmacodynamics of opioids or intravenous anesthetics.
We followed the same study protocol that LaPierre et al.16 used to develop their pharmacodynamic model. Subjects were sedated by injecting a combination of propofol and remifentanil intravenously.16 The amount of each drug was varied according to the chart in figure 1. For each period, the primary infusion stepped through five predetermined effect-site concentration targets, while the secondary infusion target was held at a constant effect-site concentration target. The maximum targeted concentrations were those that would allow insertion of a 42F (14 mm diameter) flexible blunt end bougie (model 215542; Teleflex Medical, Research Triangle Park, NC) through the oropharynx and advancement to 40 cm into the esophagus (as if to prepare a patient for an endoscopy procedure) with no gag reflex, no voluntary or involuntary movement, and no change in heart rate or arterial blood pressure >20% from baseline values.
Volunteers were supine in a study room that resembled a single patient postanesthesia care unit (PACU) room. Subjects were monitored with an electrocardiograph, pulse oximeter, noninvasive blood pressure monitor, and expired carbon dioxide monitor (Ultima, Datex/Ohmeda, Helsinki, Finland). Inspired and expired airway flow and volumes were measured using a pneumotachometer (CO2SMO, Novametrix, Wallingford, CT) attached to a tight-fitting mask. All subjects received oxygen by face mask at 2 l/min. A Mapleson E circuit was used to provide manual ventilation when required to maintain adequate oxygenation. Before administration of the sedative drugs, volunteers received 0.2 mg glycopyrrolate intravenously to prevent bradycardia and decrease secretion, and 30 ml sodium citrate by mouth.
Recording Tracheal Sounds
Tracheal sounds were recorded using a microphone (WM-56A103 Panasonic, Secaucus, NJ) placed in a metal precordial stethoscope cup (Wenger #00-390-C, AINCA, San Marcos, CA). The cup was affixed to the neck with a double-stick disc (#2181 3M, St Paul, MN) just below the larynx and above the suprasternal notch (fig. 2). The audio signals were digitized at 22,050 Hz and recorded to a computer hard drive using an audio sound card (Sound-Blaster Audigy, Creative, Singapore). Because data from the pneumotachometer and the microphone were recorded on different computers, acoustic data were synchronized to the flow rate by recording the time stamp from the flow rate computer on the second channel of the audio data recording.
Of the 24 subjects who participated, data from four subjects were discarded leaving data from 20 subjects to be processed (table 1). Data from two of the subjects were lost during transfer of the audio data between computers. Flow data from one subject were lost during data transfer between two computers. Data from one subject were recorded with a microphone different from that used for other recordings.
After each step increase in drug effect-site concentration (fig. 1), and after the model predicted concentration was within 5% of the target concentration, tracheal sounds and corresponding airway flow were recorded continuously. The recording was divided into 10-min blocks. We analyzed only those 10-min blocks that were recorded when the volunteer was quietly at rest, when the volunteer was not talking, and when there were no other study investigator initiated interruptions such as bougie insertion. The final block in the series was often less than 10 min in length but was included in our analysis if the above criteria were met. Some blocks collected at low drug concentrations, where apnea was not present, were eliminated from analysis so that apnea was present in the signals selected for analysis approximately 20% of the time. This resulted in 90 blocks being used for further analysis (26 of the 90 blocks were less than 10 min in length).
Analysis of Tracheal Sounds
Using the mixed programming approach available in LabVIEW 8.6 (National Instruments Inc., Austin, TX) and MATLAB 2008b (MathWorks Inc., Natick, MA),17 the recorded sounds were bandpass filtered by a fifth order Butterworth filter with a passband of 150–800 Hz to reduce the amplitude of heart sounds, common electronic noise at 60 Hz, and high-frequency noise. An example of the filtered tracheal sounds is shown in figure 3A. The bandpass filtered sounds were segmented into windows of 20 ms in duration with 75% overlap between adjacent windows. The window size and overlap were selected based on the results of studies on acoustical flow estimation.10,11 The logarithm of the tracheal sound Shannon entropy (Log-E) in each window was calculated. For a set of events with probability density function $p_i$, $i = 1, \ldots, N$, the Shannon entropy is defined as $H = -\sum_{i=1}^{N} p_i \log p_i$.18 The Log-E signal was sampled at 100 Hz in order to match the 100-Hz sampling rate for the reference pneumotachometer. An offset equal to the absolute value of the most negative value in each 10-min block was added to the Log-E to make it more convenient to compare Log-E to the airway flow signal, as shown in figure 3, B and C.
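The windowed Log-E computation can be sketched as follows. This is a minimal illustration, not the LabVIEW/MATLAB implementation used in the study. It assumes the signal has already been bandpass filtered and scaled to the range −1 to 1, and it assumes that the probabilities are estimated from a fixed-range amplitude histogram (the number of bins, `n_bins`, the small constant `eps`, and all function names are hypothetical choices, because the estimator is not specified in the text). Resampling of the Log-E signal to 100 Hz is not shown.

```python
import math
import random

def shannon_entropy(samples, n_bins=16, lo=-1.0, hi=1.0):
    """Shannon entropy (in nats) of an amplitude histogram.

    Assumption: p_i is the fraction of samples falling in bin i of a
    fixed-range histogram; the paper does not state the estimator.
    """
    counts = [0] * n_bins
    width = (hi - lo) / n_bins
    for x in samples:
        idx = int((x - lo) / width)
        counts[min(max(idx, 0), n_bins - 1)] += 1  # clamp out-of-range samples
    total = len(samples)
    return -sum((c / total) * math.log(c / total) for c in counts if c)

def log_entropy_series(signal, fs=22050, win_s=0.020, overlap=0.75, eps=1e-12):
    """Log of the Shannon entropy in 20-ms windows with 75% overlap."""
    win = int(round(fs * win_s))                      # 441 samples at 22,050 Hz
    hop = max(1, int(round(win * (1.0 - overlap))))   # 110 samples
    # eps (hypothetical) avoids log(0) when a window falls in a single bin
    return [math.log(shannon_entropy(signal[i:i + win]) + eps)
            for i in range(0, len(signal) - win + 1, hop)]

def add_block_offset(log_e):
    """Shift the block by the absolute value of its most negative value."""
    most_negative = min(log_e)
    offset = abs(most_negative) if most_negative < 0 else 0.0
    return [v + offset for v in log_e]

# Synthetic demonstration: 0.1 s of near silence followed by 0.1 s of noise.
random.seed(0)
quiet = [random.uniform(-0.01, 0.01) for _ in range(2205)]
loud = [random.uniform(-0.8, 0.8) for _ in range(2205)]
log_e = add_block_offset(log_entropy_series(quiet + loud))
```

With a fixed histogram range, a quiet window occupies few bins and yields a low entropy, whereas a loud window spreads over many bins, which is why the offset Log-E rises from the quiet segment to the loud one in this synthetic example.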
To find a Log-E threshold that distinguishes between the presence of normal tracheal sounds and apnea, each 10-min block of Log-E data was divided into 200 3-s windows. The data point in each 3-s window with the smallest amplitude was recorded to a data pool. The Log-E threshold (λ) for that subject and that drug plateau was calculated as two times the 80th percentile of the Log-E data in the data pool. We decided to use two times the percentile because of a recommendation from a previously developed tracheal sounds envelope threshold determination algorithm.19,20 We decided to use the 80th percentile based on our experience analyzing our pilot data. Using the Log-E threshold, the tracheal sounds present during an inspiration or expiration period were classified as an “inspiration or expiration” when the Log-E signal crossed the threshold. We were not able to distinguish the difference between inspiratory and expiratory sounds from tracheal sounds alone, and thus we classified each period as an inspiration or expiration. Each crossing in which the Log-E(n) was less than Log-E(n + 1) indicated the beginning of an inspiration or expiration. Each crossing in which Log-E(n) was greater than Log-E(n + 1) indicated the end of an inspiration or expiration.
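The threshold rule and the crossing logic described above can be sketched as follows. This is an illustration under stated assumptions: the percentile uses linear interpolation (one common definition; the text does not say which was used), the Log-E input is the offset, nonnegative signal sampled at 100 Hz, and the function names are hypothetical.

```python
def percentile(values, q):
    """Percentile with linear interpolation between order statistics (q in 0..100)."""
    s = sorted(values)
    if len(s) == 1:
        return s[0]
    pos = (len(s) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)

def adaptive_threshold(log_e, fs=100, win_s=3.0, q=80.0, factor=2.0):
    """Two times the 80th percentile of the per-window minima of Log-E."""
    win = int(fs * win_s)
    minima = [min(log_e[i:i + win]) for i in range(0, len(log_e), win)]
    return factor * percentile(minima, q)

def sound_periods(log_e, threshold):
    """(start, end) sample indices of periods in which Log-E is above threshold."""
    periods, start = [], None
    for n in range(len(log_e) - 1):
        below_now = log_e[n] < threshold
        below_next = log_e[n + 1] < threshold
        if below_now and not below_next:        # upward crossing: sound begins
            start = n + 1
        elif not below_now and below_next and start is not None:
            periods.append((start, n + 1))      # downward crossing: sound ends
            start = None
    return periods

# Synthetic 30-s Log-E trace: baseline of 1.0 with two bursts of 5.0.
trace = [1.0] * 3000
for a, b in [(100, 250), (1000, 1200)]:
    for n in range(a, b):
        trace[n] = 5.0
lam = adaptive_threshold(trace)
periods = sound_periods(trace, lam)
```

Taking the minimum in each 3-s window samples the quiet floor between sounds, so the threshold follows the background level of the block rather than the loudness of the breaths.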
Figure 4 is an example of the tracheal sounds, Log-E, and airway flow, as recorded during a period of apnea. Figure 4B shows that apnea was identified within the Log-E signal, when the time between the end of the last inspiration or expiration and the start of the next inspiration or expiration was longer than 15 s. The red line in figure 4B shows a 21.6-s period of apnea.
The airway flow signal was used as a reference to identify valid breaths in each data block. For a breath to be valid, the inspired volume had to exceed 50 ml and the expiratory flow rate had to be less than −3 l/min at the start of exhalation and greater than −3 l/min at the end of exhalation, as shown in figure 5. The inspired volume and the expiratory flow threshold ensure that “valid breaths” do not include those breaths (with insufficient inspired volume and expiratory flow) that occurred when the airway was severely or completely obstructed. However, those breaths that occurred during mild or moderate airway obstruction were classified as valid breaths if the inspired volume was >50 ml and if the expiratory flow rate was <−3 l/min at the start of exhalation and >−3 l/min at the end of exhalation.
Figure 4C shows that apnea was identified within the airway flow signal when the time between the end of the last breath and the start of the next breath was longer than 15 s. The green line in figure 4C shows a 20.6-s period of apnea.
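The same 15-s rule is applied to both signals: to the detected inspirations or expirations in the Log-E signal and to the valid breaths in the flow signal. A minimal sketch, assuming the events are given as time-ordered (start, end) pairs in seconds; the function name is hypothetical, and an apnea still in progress at the end of a recording is not handled here.

```python
def apnea_episodes(periods, min_gap_s=15.0):
    """Return (start, end) of every gap longer than min_gap_s between events.

    periods: time-ordered list of (start_s, end_s) for breaths or sounds.
    """
    episodes = []
    for (_, prev_end), (next_start, _) in zip(periods, periods[1:]):
        if next_start - prev_end > min_gap_s:
            episodes.append((prev_end, next_start))
    return episodes

# Synthetic example: a 21.6-s silent gap between the second and third events.
events = [(0.0, 1.2), (3.0, 4.1), (25.7, 26.9), (30.0, 31.0)]
found = apnea_episodes(events)
```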
The entropy algorithm’s performance was classified as true positive (TP) if apnea was detected from both the tracheal sounds and respiratory flow, false negative (FN) if detected from flow but not from sounds, false positive (FP) if detected from sounds but not from flow, and true negative (TN) if detected from neither flow nor sounds. Sensitivity (TP/[TP + FN]), specificity (TN/[TN + FP]), positive predictive value (PPV = TP/[TP + FP]), and negative predictive value (NPV = TN/[TN + FN]) were calculated. Sensitivity and specificity are properties of our test, whereas PPV and NPV also depend on the prevalence of apnea in our subject population.
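As a worked example of these four definitions, the sketch below uses the pooled event counts reported in this paper (310 TP, 12 FN, 89 FP, 1,196 TN). The function name is hypothetical. Values computed from pooled counts are not expected to equal the cross-validated averages exactly, because the latter are means over five validation sets.

```python
def detection_metrics(tp, fn, fp, tn):
    """Sensitivity, specificity, PPV, and NPV from a 2 x 2 table of counts."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }

pooled = detection_metrics(tp=310, fn=12, fp=89, tn=1196)
for name, value in pooled.items():
    print(f"{name}: {value:.3f}")
```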
Tracheal sounds recorded over the neck contain unwanted heart sounds and ambient noise. To remove these artifacts, an inspiration or expiration was considered valid only if the time between its beginning and its end was greater than an appointed time value. We measured the apnea detection algorithm’s reliability for appointed time values that ranged from 0.3 s to 1.2 s, in 0.1 s increments. The lower value of 0.3 s was used because it is very unlikely that a spontaneously breathing adult will have a respiratory rate greater than 75 breaths/min and, hence, an inspiratory or expiratory time less than 0.3 s (60 s/min ÷ 75 breaths/min = 0.8 s/breath: inspiratory time = 0.3 s, expiratory time = 0.3 s, expiratory pause = 0.2 s).21 The upper value of 1.2 s (23 breaths/min) was selected so as not to eliminate too many real inspirations or expirations.
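The duration-based artifact rejection can be sketched as a simple filter over the detected sound periods. The function and variable names are hypothetical, and periods are assumed to be (start, end) pairs in seconds.

```python
def reject_short_sounds(periods, appointed_time_s):
    """Keep only sounds whose duration is greater than the appointed time value."""
    return [(start, end) for start, end in periods
            if (end - start) > appointed_time_s]

# The 10 candidate appointed time values: 0.3 s to 1.2 s in 0.1-s increments.
candidates = [round(0.3 + 0.1 * k, 1) for k in range(10)]

# Synthetic example: a 0.2-s heart sound, a 1.5-s breath sound, a 0.5-s sound.
sounds = [(0.0, 0.2), (1.0, 2.5), (3.0, 3.5)]
kept = reject_short_sounds(sounds, appointed_time_s=0.84)
```

Each rejected sound merges the silent intervals on either side of it, so a longer appointed time value lengthens the gaps that are then tested against the 15-s apnea criterion.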
We used receiver operating characteristic (ROC) analysis and a fivefold cross-validation method22 to find the appointed time value that best discriminates between normal tracheal sounds and heart sounds (including other artifacts). The 20 study subjects were first randomly partitioned into five equal sized groups (four subjects in each group). Four of the five groups were used together to find the best appointed time value (training set of 16 subjects). The remaining group (four subjects) was retained as the validation set to test the algorithm’s performance. Sensitivity and specificity were measured for the training set for each of the appointed time values from 0.3 to 1.2 s in 0.1 s increments. The sensitivities and specificities of all the tested appointed time values were presented as 10 points in a ROC space. The optimal operating point, which should correspond with the best appointed time value in the artifact rejection algorithm, was determined by finding the sensitivity and specificity pair that maximized the function sensitivity − m(1 − specificity),23 where m = (CFP/CFN) × ((1 − P)/P), CFP is the cost of false-positive detection of apnea (alarm fatigue), CFN is the cost of false-negative detection of apnea (missed events), and P is the prevalence of apnea in the studied population. We used a ratio of CFP/CFN of 1/10, assuming that the cost of a missed apnea is 10 times higher than the cost of a false alarm. P is 20% because we selected a mixture of study subject recordings that contained apnea 20% of the time. Using the data recorded from the four subjects in the validation set, sensitivity, specificity, PPV, and NPV were calculated using the best appointed time value that had been identified from the training set.
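The cost-weighted selection of the operating point can be sketched as follows. The ROC points in this example are invented for illustration only and are not study data; the function names are hypothetical.

```python
def cost_slope(cfp_over_cfn, prevalence):
    """m = (CFP/CFN) x ((1 - P)/P)."""
    return cfp_over_cfn * (1.0 - prevalence) / prevalence

def best_operating_point(roc_points, m):
    """Pick the point that maximizes sensitivity - m * (1 - specificity).

    roc_points: list of (appointed_time_s, sensitivity, specificity).
    """
    return max(roc_points, key=lambda p: p[1] - m * (1.0 - p[2]))

m = cost_slope(cfp_over_cfn=1 / 10, prevalence=0.20)   # 0.1 x 4 = 0.4

# Hypothetical ROC points (NOT study data), for illustration only.
example_points = [
    (0.3, 0.85, 0.98),
    (0.6, 0.93, 0.96),
    (0.9, 0.95, 0.92),
    (1.2, 0.96, 0.85),
]
best = best_operating_point(example_points, m)
```

A small m means that a loss of specificity costs little relative to a gain in sensitivity, so the selected point moves toward higher sensitivity; a large m has the opposite effect.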
The process was repeated four more times, each time using a different training set to calculate the best appointed time value and the corresponding validation set to calculate sensitivity, specificity, PPV, and NPV based on that appointed time value. During the five iterations, each of the five groups was used exactly once as the validation set. The statistics from the five validation sets were averaged to produce a single estimation of our apnea detection algorithm’s sensitivity, specificity, PPV, and NPV. The average of the five best appointed time values was then used on the data from all 20 subjects to evaluate the entropy algorithm’s performance (TP, FN, FP, and TN). In our analysis, we assumed that apnea events were independent of each other. There is the possibility that the errors in identifying apnea occurred in only select individuals. In our analysis, we did not consider the interindividual aspects of the apnea detection algorithm’s accuracy.
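The subject-level partitioning used in the fivefold cross-validation can be sketched as follows. The subject identifiers, the random seed, and the function name are hypothetical; only the partitioning is shown, not the fitting of the appointed time value.

```python
import random

def subject_folds(subject_ids, k=5, seed=0):
    """Yield (training, validation) subject lists; each group validates once."""
    ids = list(subject_ids)
    random.Random(seed).shuffle(ids)           # random partition of subjects
    size = len(ids) // k
    groups = [ids[i * size:(i + 1) * size] for i in range(k)]
    for i, validation in enumerate(groups):
        training = [s for j, g in enumerate(groups) if j != i for s in g]
        yield training, validation

# 20 subjects -> five folds of 16 training and 4 validation subjects.
folds = list(subject_folds(range(1, 21)))
```

Partitioning by subject rather than by recording block keeps all data from one volunteer on the same side of each split.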
Performance was calculated a second time by counting the number of discrete apnea episodes in the respiratory flow and tracheal sounds recordings. Figure 6 shows an example with one apnea event detected from both respiratory flow and tracheal sounds, one apnea event detected from flow but not from sounds, and one event detected from sounds but not from flow. In order to calculate the number of nonapnea events, we first calculated the total time both methods reported normal breathing (TN1 + TN2 + TN3 + TN4 + …). We divided this total by the average length of the periods of apnea detected from respiratory flow. In figure 6, the total time both methods reported normal breathing is (TN1 + TN2 + TN3 + TN4). In figure 6, the average length of the periods of apnea in the respiratory flow signal is (15 s + 25 s) / 2 = 20 s. Therefore, the number of nonapnea events in figure 6 is (TN1 + TN2 + TN3 + TN4) / 20 s. Statistical analyses were performed with SPSS for Windows version 13.0 (SPSS Inc., Chicago, IL).
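The count of nonapnea events described above can be sketched as follows. The apnea lengths of 15 s and 25 s are those of the figure 6 example; the normal-breathing durations are hypothetical, because the TN1 to TN4 values are not given in the text. The function name is also hypothetical.

```python
def nonapnea_event_count(normal_durations_s, flow_apnea_durations_s):
    """Total agreed normal-breathing time divided by the mean flow-apnea length."""
    mean_apnea_s = sum(flow_apnea_durations_s) / len(flow_apnea_durations_s)
    return sum(normal_durations_s) / mean_apnea_s

# Hypothetical TN1..TN4 durations (s); apnea lengths from the figure 6 example.
n_events = nonapnea_event_count([30, 40, 10, 20], [15, 25])
```

Dividing by the mean apnea length expresses normal breathing in units comparable to one apnea episode, so that TN can be counted on the same event scale as TP, FN, and FP.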
Results
Table 1 lists the demographics of the 20 study volunteers. Apnea occurred 322 times during the 12.9-h study as propofol and remifentanil were administered to simulate endoscopy procedures. Table 2 lists the results of the ROC analysis and the fivefold cross-validation process. When tracheal sounds that were shorter than 0.84 s were rejected as artifact, the entropy-based apnea detection algorithm had a 95% sensitivity (95% CI: 91–98%), 92% specificity (95% CI: 87–97%), 76% PPV (95% CI: 65–87%), and 98% NPV (95% CI: 97–100%).
Table 3 shows that the volunteers were apneic for a total of 147 min, as detected from both the tracheal sounds and the respiratory flow. Tracheal sounds failed to detect apnea for a total of 8 min. Tracheal sounds misclassified periods of normal breathing as apnea for a total of 44 min.
Table 4 shows that the tracheal sounds correctly detected apnea 310 times of the 322 times it occurred. Meanwhile, tracheal sounds falsely reported apnea 89 times during a period that contained 1,285 normal breaths.
Discussion
The acoustic signal from a microphone placed over the trachea detected obstructive and central apnea in sedated volunteers with a sensitivity of 95% and a specificity of 92%. The entropy of the tracheal sounds proved to be a reliable way to detect apnea. The entropy of tracheal sounds may reliably provide an early warning of the onset of apnea in sedated patients.
We found that the entropy of tracheal sounds might have better performance for the detection of apnea than electrocardiogram, or acoustic or capnographic methods. Mazzanti et al.24 detected apnea in patients in a sleep laboratory with a sensitivity of 87% and a specificity of 85% using an electrocardiography-derived respiration monitoring method. Ramsay et al.25 evaluated a new acoustic monitor from Masimo Inc. (Irvine, CA) and a capnometer from Oridion Inc. (Needham, MA) by collecting data from patients in the PACU. The acoustic monitor detected apnea with 81% sensitivity and 99% specificity. The capnometer detected apnea with 62% sensitivity and 98% specificity.25 Although both monitors had high specificity, their low sensitivity may result in more apneic events being missed than might be tolerated in a PACU.
Suprasternal notch stethoscopes were used in almost all anesthetics 30 yr ago to monitor breathing effectively and accurately during anesthesia.26,27 However, listening to the patient’s tracheal sounds is inconvenient and requires constant attention. Sierra et al.9 were successful in automatically measuring respiratory rate from the amplitude envelope and frequency content of tracheal sounds, but they did not measure the ability of their algorithm to detect apnea. Nakano et al.13 and Yadollahi et al.14 demonstrated that tracheal sounds analysis has a relatively high performance for the diagnosis of sleep apnea–hypopnea syndrome during normal sleep. However, they did not focus on the early detection of the onset of apnea, especially sedation-induced apnea.
Entropy is a term borrowed from thermodynamics, where it is used to characterize the degree of uncertainty (complexity) of the state of a system. Shannon used entropy to characterize the information content of a transmitted signal and the capacity of transmission channels. Shannon entropy has been shown to be a robust method for estimating respiratory flow rate from tracheal sounds.10–12 In our study, we used the logarithm of entropy because it can better handle the change in the intensity of the tracheal sounds, which range from very quiet during restful breathing to very noisy during snoring.
Our second innovation is an adaptive breath detection threshold. Our prototype breath detection algorithm used a fixed amplitude Log-E threshold to identify the start and end of each inspiration or expiration.19,20 It performed poorly over the range of test subjects and propofol and remifentanil concentrations. Our final algorithm identifies and uses an adaptive breath detection threshold that automatically and continuously adjusts the Log-E threshold, thus compensating for differences in tracheal sound amplitude between patients and for changes that accompany deeper levels of sedation.
Our third innovation is the introduction of a ROC method to identify the optimum threshold for our artifact rejection algorithm. Tracheal sounds shorter in duration than 0.84 s are rejected as artifact. As we increased the duration of this appointed artifact rejection time, sensitivity improved as more of the heart sounds and longer periods of ambient noise were rejected. Specificity decreased as sounds produced by shallow inspirations or expirations failed to remain above the Log-E threshold long enough to be detected as valid inspirations or expirations. We measured the algorithm’s sensitivity and specificity when using appointed time values ranging in length from 0.3 to 1.2 s. We used the ROC space to find the appointed time value that resulted in the maximum value for the function: sensitivity-m(1-specificity), where m = CFP/CFN ×((1-P)/P). CFP is the cost of false-positive alarms, CFN is the cost of false-negative alarms (missed events), and P is the prevalence of apnea in the studied population. In our study, apnea was present 20% of the time (P = 0.2) because we selected for analysis the tracheal sounds recorded just before insertion of a bougie, when drug levels were high.
Assigning a cost to false alarms and missed events is complex and is a specialized field in medicine. Financial costs or health costs can be viewed from the perspective of the patient, the care provider, the insurer, society, and others. In our study, we used the ratio for CFP/CFN of 1/10, which results in a sensitivity of 95% and a specificity of 92%. When anesthesiologists monitor sedated patients during endoscopy, the cost of missing a period of apnea is approximately 10 times higher than the cost of a false alarm. Although there is cost associated with false alarms (alarm fatigue and delayed treatment), the cost of undetected apnea (hypoxia, bradycardia, hypercapnia, and cardiac arrest) seems much higher.
If our apnea detection method were to be used by a nurse during an endoscopy procedure, a more appropriate ratio for CFP/CFN may drop to 1/20 because a nurse has less training and experience in the detection of apnea than an anesthesiologist, and the cost of undetected apnea seems higher. The best appointed time value in our artifact rejection algorithm would then be 0.98 s, providing 96% sensitivity and 87% specificity. Tuning the algorithm to perform with higher sensitivity (fewer missed periods of apnea) and lower specificity (more false alarms) seems desirable, given a nurse’s need for support in detecting each period of apnea and a nurse’s increased tolerance for false alarms.
If our apnea detection method were to be used in a PACU where the prevalence of apnea is approximately 5%, a more appropriate CFP/CFN may be 2/10 because PACU nurses care for multiple patients at a time and the cost of false alarms and alarm fatigue is more of an issue. The best appointed time value in our artifact rejection algorithm would be 0.5 s, resulting in 91% sensitivity and 97% specificity. The increase in specificity (fewer false alarms) hopefully decreases alarm fatigue and decreases response time when apnea occurs.
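The weighting factor m for the three scenarios discussed above can be worked out directly from its definition. This is a sketch of the arithmetic only; the prevalence for the nurse-administered endoscopy scenario is not stated in the text and is assumed here to remain at 20%.

```latex
\begin{align*}
m &= \frac{C_{FP}}{C_{FN}} \cdot \frac{1-P}{P} \\[4pt]
\text{Anesthesiologist, endoscopy } (P = 0.20)\text{:}\quad
m &= \frac{1}{10} \cdot \frac{0.80}{0.20} = 0.4 \\[4pt]
\text{Nurse, endoscopy (assumed } P = 0.20\text{):}\quad
m &= \frac{1}{20} \cdot \frac{0.80}{0.20} = 0.2 \\[4pt]
\text{PACU } (P = 0.05)\text{:}\quad
m &= \frac{2}{10} \cdot \frac{0.95}{0.05} = 3.8
\end{align*}
```

A smaller m favors sensitivity, which matches the longer appointed time value reported for the nurse scenario, and a larger m favors specificity, which matches the shorter value reported for the PACU scenario.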
The more conventional approach to determining the optimal operating point is to choose the point closest to the upper left corner (coordinate [0,1]) of the ROC space. This approach would result in a best appointed time value for our artifact rejection algorithm of 0.64 s and 93% sensitivity and 96% specificity. Because this conventional approach assigns equal cost to sensitivity and specificity and fails to consider the prevalence of apnea, we feel our new approach will result in a better outcome.
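For comparison, the conventional rule can be sketched as choosing the ROC point with the smallest Euclidean distance to the corner (false-positive rate 0, sensitivity 1). The ROC points below are invented for illustration and are not study data; the function name is hypothetical.

```python
import math

def closest_to_corner(roc_points):
    """Pick the point nearest to (1 - specificity, sensitivity) = (0, 1).

    roc_points: list of (appointed_time_s, sensitivity, specificity).
    """
    return min(roc_points,
               key=lambda p: math.hypot(1.0 - p[2], 1.0 - p[1]))

# Hypothetical ROC points (NOT study data), for illustration only.
example_points = [
    (0.3, 0.85, 0.98),
    (0.6, 0.93, 0.96),
    (0.9, 0.95, 0.92),
    (1.2, 0.96, 0.85),
]
conventional = closest_to_corner(example_points)
```

Unlike the cost-weighted criterion, this distance involves neither the cost ratio nor the prevalence, so the selected point does not change with the clinical setting.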
Limitations
In our study, ambient noise levels were similar to those found in a single patient private room. In the operating room, noise levels can be higher, with average levels ranging from 55 to 86 dB.28 In a five-bed PACU, noise levels are at 67 dB.29 In our tracheal sounds monitor, we used a metal precordial stethoscope cup over the microphone to dampen background noise. Adaptive noise cancellation technology may be necessary to remove ambient noise and improve sensitivity to apnea before our system can be used in these noisier environments.30
Unfortunately, our recordings in young healthy volunteers did not contain sufficient data recorded during severe airway obstruction to measure our algorithm’s ability to detect apnea during severe airway obstruction. When a patient’s airway is severely obstructed, small inspired tidal volumes and low expiratory flow rates can generate high-intensity tracheal sounds (snoring). Because the Log-E is also high during snoring, our algorithm might report the occurrence of a normal inspiration and expiration when a breath’s inspired tidal volume and expiratory flow rate are too small to provide adequate ventilation. However, our algorithm should perform well during complete airway obstruction because there will be no tracheal sounds during complete airway obstruction. Our algorithm should perform well during mild and moderate airway obstruction because snoring sounds will be classified as valid periods of inspiration and expiration, thus minimizing the number of false apnea alarms when breaths are adequate to maintain ventilation.
In our analysis of the results, we assumed that apnea events were independent of each other. There is the possibility that the errors in identifying apnea occurred in only select individuals. However, almost every volunteer had FN and FP events, and the sensitivity and specificity appeared to be acceptable for every individual volunteer. Still, the interindividual aspects of the apnea detection algorithm’s accuracy should be evaluated in a future study that includes patients with abnormal airways or abnormal pulmonary physiology.
We did not explore how stethoscope placement affects performance. We did not evaluate the real-time performance of our algorithm in that the adaptive Log-E threshold for breath detection was determined using recorded data. Performance during start-up should be measured in a prospective real-time study, to see how quickly the Log-E threshold adapts.
In conclusion, a microphone placed over the trachea detected obstructive and central apnea in sedated volunteers with 95% sensitivity and 92% specificity. The entropy of the tracheal sounds may provide an early warning of the onset of apnea in sedated patients. Future work could be directed toward the detection of apnea during severe airway obstruction and the development of noise cancellation to improve the algorithm’s sensitivity.