CNN Model for Sleep Apnea Detection Based on SpO2 Signal



Introduction
Sleep apnea-hypopnea syndrome (SAHS) is a recognized sleep disorder that significantly affects a patient's health. Its most common form, obstructive sleep apnea (OSA), occurs when the airway becomes completely or partially blocked during sleep, causing breathing difficulties (Shifren and Byers 2006; Penzel, Schöbel, and Fietze 2018). SAHS is diagnosed using traditional polysomnography (PSG), which is considered the gold standard. However, because PSG is costly and time-consuming, 90% of people with SAHS are reluctant to be diagnosed with this method (Lakhan et al. 2018). This research therefore aims to support the diagnosis of SAHS by combining a deep learning technique with SpO2 signals, offering an alternative approach to maintaining patient health.

Problem Statement
Polysomnography (PSG) is currently regarded as the gold standard for identifying sleep disorders, including SAHS, and is used in hospitals for full-night sleep studies. Despite its diagnostic advantages, PSG is expensive and inconvenient for patients, as many sensors must be attached to the patient's body during the recording. In addition, apnea and hypopnea events are annotated manually, which is difficult and time-consuming and requires prior experience (Choi et al. 2018). These steps can considerably delay the start of treatment, which can cause patients to lose interest in being diagnosed (Mathew Dirjish 2017). For these reasons, 90% of people with SAHS do not want to be diagnosed because of the complexity and high cost of PSG (Lakhan et al. 2018). This reluctance to be diagnosed significantly affects patient health, as untreated SAHS contributes to hypertension, coronary artery disease, stroke, and other serious conditions (Penzel et al. 2018; Rofouei et al. 2011).

Solution
Based on the factors above, there is a fundamental need for an alternative method that detects SAHS using fewer and simpler signals than PSG. A deep learning model that diagnoses SAHS from SpO2 data alone would enable earlier detection of SAHS and thereby help improve patient health.

Literature Review
Researchers have proposed various alternative methods to overcome the drawbacks of PSG by detecting SAHS from fewer and simpler signals, such as electrocardiogram (ECG) signals, airflow (AF) signals, and peripheral oxygen saturation measured by pulse oximetry (SpO2). Some researchers have used a combination of signals to detect SAHS. Choi et al. (2018) detected apnea-hypopnea (AH) events in real time using CNNs with a single-channel nasal pressure signal. Deviaene et al. (2018) screened for SAHS at home rather than in hospitals using a random forest applied to blood oxygen saturation (SpO2) signals. Lakhan et al. (2018) classified the severity of SAHS using a deep neural network (DNN) with airflow (AF) signals. Nakano et al. (2007) developed a signal-processing algorithm to detect sleep-disordered breathing (SDB) from airflow (AF) signals. In addition, Ashraf et al. (2021) developed NRSM (node redeployment shrewd mechanism) for wireless sensor networks that collect airflow signals. Gutierrez-Tobal et al. (2016) evaluated the usefulness of the AdaBoost (AB) boosting algorithm for detecting and diagnosing SAHS and the possibility of achieving high performance with single-channel AF. Xie and Minn (2012) proposed a classifier combination that improves SAHS detection performance by exploiting the information supplied by the individual classifiers; they used electrocardiogram (ECG) and peripheral oxygen saturation (SpO2) signals, both separately and in combination. Mostafa et al. (2017) detected sleep apnea from the SpO2 signal using a deep belief network (DBN), an unsupervised deep learning technique. Almazaydeh, Faezipour, and Elleithy (2012) predicted OSA with a neural network (NN) based on SpO2.
These signals can also be acquired through wireless systems (Ashraf 2020). The tuned Bodacious-instance Coverage Mechanism (tuned BiCM) and the FOA algorithm can be used to detect wireless data.
The literature review shows that various machine learning and deep learning approaches have been used to detect SAHS. The studies were grouped by the type of signal used (AF, SpO2, or a combination of signals), as shown in Fig. 1.

System Design and Methodology
This section provides an overview of the system design and methodology. We first discuss why we chose the SpO2 signal rather than one of the other PSG signals, then describe the datasets used to build the model, and finally present the proposed technique and its architecture. The design procedure of the system is shown in Fig. 2.

Choosing the Signal
This study has developed a deep learning-based SAHS detection model that uses pulse oximetry (SpO2) to reduce the complexity of polysomnography, with all of its diagnostic signals, and to address some of its drawbacks. The model detects SAHS by analyzing a patient's blood oxygen saturation.
There are several reasons for choosing the SpO2 signal over other PSG signals. The main reason is that the signal can be collected easily at home, rather than in hospitals or care centers, using small and inexpensive devices such as smartwatches. Furthermore, the SpO2 signal is highly informative for detecting SAHS events, as many apneic events are accompanied by oxygen desaturation (Deviaene et al. 2018).

Dataset
This study used two datasets to build the model: the Apnea-ECG Database (Goldberger et al. 2002; T Penzel et al. 2000) and the St. Vincent's University Hospital/University College Dublin Sleep Apnea Database (UCD database) (Saint Vincent Hospital). Both databases are openly available on PhysioNet (PhysioNet), which provides free access to a large collection of physiologic signals. The Apnea-ECG Database contains recordings from 32 subjects, eight of which include SpO2 signals. The recordings range from less than 7 hours to about 10 hours. The subjects were 27 men and 7 women between the ages of 27 and 63 (Almazaydeh et al. 2012; T Penzel et al. 2000). The UCD database contains recordings from 25 subjects with various signals, including SpO2, lasting from 5.9 to 7.7 hours. The subjects were 21 men and 4 women over 18 years of age (Saint Vincent Hospital).
We found that men greatly outnumbered women in both datasets. This likely reflects the male-to-female ratio of OSA in the general population, which is estimated to be between 3:1 and 5:1 (Lin et al. 2008).

Apnea-ECG Dataset Preprocessing
The Apnea-ECG dataset is stored in the WFDB binary format, and the WFDB-Python package (GitHub) was used to read its binary files. The SpO2 data in this dataset were sampled at 100 Hz, giving 6,000 data points per minute. The dataset is annotated minute by minute with "A" (apnea present) and "N" (no apnea present) tags, so each minute of data has exactly one tag.
Given this structure, we treated each minute as a separate classification sample, following the approach of Mostafa et al. (2017). Each sample was represented by a 6,000-dimensional input vector containing the SpO2 value at each sampling tick. In our implementation, each sample was a Python tuple containing the input vector as a NumPy array and the class as a Python Boolean. To provide straightforward access to the whole dataset, the samples were stored as a list of tuples and written to disk as a Python pickle file. We did not perform any explicit downsampling, as we considered it unnecessary at this stage: our model employs pooling layers, and a stack of convolution and pooling layers acts as a "smarter", trainable form of downsampling that can retain the most relevant features better than simple downsampling. Dataset statistics are summarized in Tab. 1.
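The per-minute sample construction described above can be sketched as follows. It is a minimal sketch that assumes the SpO2 channel has already been read (for instance with `wfdb.rdrecord`) into a 1-D NumPy array sampled at 100 Hz, and that the per-minute "A"/"N" tags have been read (for instance with `wfdb.rdann`) into a list of strings. The function name `make_minute_samples` and the synthetic signal are illustrative assumptions, and an in-memory buffer stands in for the pickle file on disk.

```python
import io
import pickle

import numpy as np

FS_HZ = 100                    # Apnea-ECG SpO2 sampling rate
SAMPLES_PER_MIN = 60 * FS_HZ   # 6,000 values per one-minute sample


def make_minute_samples(spo2, labels, samples_per_min=SAMPLES_PER_MIN):
    """Split a SpO2 signal into one-minute (vector, is_apnea) tuples.

    spo2:   1-D array holding the whole-night SpO2 signal.
    labels: one "A" (apnea) or "N" (no apnea) tag per minute.
    Minutes without a complete 60 s of signal are dropped.
    """
    n_minutes = min(len(labels), len(spo2) // samples_per_min)
    samples = []
    for m in range(n_minutes):
        window = np.asarray(spo2[m * samples_per_min:(m + 1) * samples_per_min],
                            dtype=np.float32)
        samples.append((window, labels[m] == "A"))  # class stored as a Python bool
    return samples


# Synthetic stand-in for one record: 3 minutes of signal and 3 minute labels.
signal = np.full(3 * SAMPLES_PER_MIN, 96.0)
dataset = make_minute_samples(signal, ["N", "A", "N"])

# Serialize the list of tuples with pickle (hypothetically a file in practice).
buffer = io.BytesIO()
pickle.dump(dataset, buffer)
restored = pickle.loads(buffer.getvalue())
```

Keeping the class as a plain Boolean next to the raw vector means the pickled list can be fed directly to a training loop without a separate label file.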

UCD Dataset Preprocessing
In the UCD dataset, the SpO2 data were stored in the binary EDF format, which we read with the PyEDFLib package. Unlike Apnea-ECG, the UCD annotations record the exact times at which apnea events occurred rather than minute-by-minute labels. The SpO2 data in the UCD database were sampled at 8 Hz, giving 480 data points per minute.
Annotations in the UCD dataset are stored as text files that record the recording start time and the time and duration of each apnea event. To make these data compatible with the Apnea-ECG approach, we divided all the data into 1-minute intervals and generated the annotations ourselves, labeling a minute as "A" if an apnea episode occurred during that segment or up to 10 seconds before it, again following the procedure of Mostafa et al. (2017).
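The UCD labeling rule can be sketched as below. It is a minimal sketch under several assumptions: event times are clock strings in an `HH:MM:SS` format, durations are in seconds, and the rule is read as "a minute is labeled A if any apnea event overlaps the interval from 10 seconds before the minute starts to the minute's end". The helper names `parse_clock` and `label_minutes` and the example times are hypothetical.

```python
from datetime import datetime, timedelta

MARGIN_S = 10.0  # events ending up to 10 s before a minute still mark it "A" (assumed reading)


def parse_clock(text, record_start):
    """Place an 'HH:MM:SS' clock time (assumed format) on the recording's timeline."""
    t = datetime.strptime(text, "%H:%M:%S")
    stamp = record_start.replace(hour=t.hour, minute=t.minute, second=t.second)
    if stamp < record_start:  # event after midnight: move it to the next day
        stamp += timedelta(days=1)
    return stamp


def label_minutes(record_start, events, n_minutes, margin_s=MARGIN_S):
    """Return one "A"/"N" label per one-minute segment of a recording.

    events: list of (onset datetime, duration in seconds) apnea events.
    """
    # Offsets in seconds relative to the start of the recording.
    intervals = []
    for onset, duration_s in events:
        start_s = (onset - record_start).total_seconds()
        intervals.append((start_s, start_s + duration_s))

    labels = []
    for m in range(n_minutes):
        seg_start, seg_end = 60.0 * m - margin_s, 60.0 * (m + 1)
        # Interval overlap test between the (extended) minute and each event.
        hit = any(ev_start < seg_end and ev_end > seg_start
                  for ev_start, ev_end in intervals)
        labels.append("A" if hit else "N")
    return labels


start = datetime(2000, 1, 1, 23, 59, 0)
events = [(parse_clock("00:00:15", start), 20.0),   # 75-95 s after the start
          (parse_clock("00:01:52", start), 4.0)]    # 172-176 s after the start
labels = label_minutes(start, events, n_minutes=5)
```

In this example the second event ends 4 seconds after minute 3 would begin minus the margin, so it marks both minute 2 and minute 3, which is the effect of the 10-second allowance.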
To determine whether an apnea event occurred within a given time segment, we converted the date strings into timestamps using the Python datetime module and computed intervals relative to the start of the recording by subtracting the recording start timestamp from each event time. Finally, we produced a pickle file with the same structure as the one described for the Apnea-ECG dataset. Dataset statistics are summarized in Tab. 2.
The Convolutional Neural Network (CNN), inspired by the structure of receptive fields in the visual cortex, was initially developed for image classification tasks (LeCun et al. 2014). CNNs often outperform regular neural networks with fully connected layers because they have an important property known as translational invariance: they can recognize important image features regardless of their precise spatial location in the image. Since apnea events can also occur at any time within a given timeframe (in our case, 60 seconds), translational invariance was also very important for developing the SAHS detection model. In addition, CNNs have previously been used to process one-dimensional data, for example to classify sounds from raw waveforms (Purohit and Agrawal 2018), and they have been applied to apnea detection (Choi et al. 2018), which supports our decision.
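The translational-invariance argument can be illustrated with a toy example. A single hand-set kernel that responds to a drop in the signal, followed by ReLU and a maximum over all positions, gives the same output wherever a synthetic desaturation dip sits in a 60-second, 8 Hz window. The kernel, the dip shape, and the global maximum (standing in for the model's stack of 8-wide pooling layers, which give only approximate invariance) are illustrative assumptions, not learned parameters of the proposed model.

```python
import numpy as np

FS_HZ = 8
WINDOW = 60 * FS_HZ  # one-minute segment, 480 samples


def synthetic_minute(dip_start, baseline=97.0, depth=5.0, dip_len=80):
    """Flat SpO2 trace with a rectangular desaturation dip (toy signal)."""
    x = np.full(WINDOW, baseline)
    x[dip_start:dip_start + dip_len] -= depth
    return x


# Toy "drop detector": positive response where the signal falls.
kernel = np.array([1.0, 1.0, 1.0, -1.0, -1.0, -1.0])


def conv_then_global_max(x):
    response = np.correlate(x, kernel, mode="valid")  # convolution (cross-correlation)
    return np.maximum(response, 0.0).max()            # ReLU, then max over all positions


early = conv_then_global_max(synthetic_minute(dip_start=40))
late = conv_then_global_max(synthetic_minute(dip_start=300))
```

The convolution itself only shifts its response along with the dip (equivariance); it is the maximum over positions that makes the final value independent of where the event occurred.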
In a classical setup, a CNN contains two main types of layers, convolutional layers and pooling layers:
• Convolutional layers: The convolutional layer is regarded as the fundamental building block of a CNN and is consistently the first layer of the model (Seif 2018). In this layer, the convolution operation is performed according to Eq. 1:
$$x_k^l = f\left(b_k^l + \sum_{i=1}^{N} \mathrm{conv}\left(w_{ik}^{l-1}, y_i^{l-1}\right)\right) \qquad (1)$$
where $x_k^l$ is the kth feature map in layer $l$, $b_k^l$ is the bias of the kth feature map in layer $l$, $w_{ik}^{l-1}$ is the convolutional kernel from the ith feature map in layer $l-1$ to the kth feature map in layer $l$, $y_i^{l-1}$ is the output of the ith feature map in layer $l-1$, $N$ is the number of elements (feature maps) in layer $l-1$, $\mathrm{conv}(\cdot,\cdot)$ is the vector convolution, and $f(\cdot)$ is the non-linear activation function.
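Eq. 1 can be written out directly in NumPy as a sketch. The assumptions are: ReLU as f, no padding ("valid" convolution), and conv implemented as a cross-correlation without kernel flipping, as in most deep learning libraries. The function name `conv1d_layer` and the random weights are hypothetical; the shapes match the Apnea-ECG configuration (one 6,000-sample input map, 32 feature maps, filter size 11).

```python
import numpy as np


def conv1d_layer(y_prev, weights, biases, stride=1):
    """One 1-D convolutional layer following Eq. 1 with f = ReLU.

    y_prev:  (N, T) array, the N feature maps y_i of layer l-1.
    weights: (K, N, F) array, kernel w_ik for output map k and input map i.
    biases:  (K,) array, bias b_k of each output feature map.
    Returns a (K, T_out) array of feature maps x_k ('valid' padding assumed).
    """
    n_in, length = y_prev.shape
    n_out, n_in_w, filter_size = weights.shape
    if n_in != n_in_w:
        raise ValueError("kernel depth must match the number of input maps")
    t_out = (length - filter_size) // stride + 1
    out = np.empty((n_out, t_out))
    for k in range(n_out):
        acc = np.full(t_out, float(biases[k]))   # b_k
        for i in range(n_in):                    # sum over input maps i = 1..N
            acc += np.correlate(y_prev[i], weights[k, i], mode="valid")[::stride]
        out[k] = np.maximum(acc, 0.0)            # non-linear activation f(.) = ReLU
    return out


rng = np.random.default_rng(0)
minute = rng.normal(size=(1, 6000))            # one Apnea-ECG sample, 1 input map
kernels = rng.normal(size=(32, 1, 11)) * 0.1   # 32 feature maps, filter size 11
feature_maps = conv1d_layer(minute, kernels, np.zeros(32))
```

The explicit double loop mirrors the sum in Eq. 1 term by term; a framework layer computes the same quantity in a vectorized form.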
• Pooling layers: These layers follow the convolutional layers and condense their output, reducing the spatial dimensions of the representation and therefore the computational cost (Choi et al. 2018; O'Shea and Nash 2015). The most common pooling operation is max-pooling, which applies a max filter to non-overlapping regions of the input representation (here, the signal).
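Non-overlapping max-pooling, as described above, can be sketched in a few lines of NumPy. The sketch assumes that trailing samples which do not fill a complete window are dropped; frameworks also offer padded variants. The function name `max_pool1d` is hypothetical.

```python
import numpy as np


def max_pool1d(x, window=8):
    """Non-overlapping 1-D max-pooling over the last axis.

    Trailing values that do not fill a complete window are dropped
    (an assumption; padded variants keep them).
    """
    n_windows = x.shape[-1] // window
    trimmed = x[..., :n_windows * window]
    return trimmed.reshape(*x.shape[:-1], n_windows, window).max(axis=-1)


print(max_pool1d(np.array([1, 3, 2, 5, 4, 0]), window=2))  # [3 5 4]
```

Applied to 32 feature maps of length 5,990 with a window of 8, this yields 32 maps of length 748, an eightfold reduction of the work done by the next layer.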
In addition, CNNs used for classification tasks can also contain fully connected layers and classification layers:
• Fully connected layers: As their name suggests, these layers are fully connected to the outputs of the previous layer. Their values are computed according to Eq. 2:
$$z = h(Wx + b) \qquad (2)$$
where $x$ is the input vector, $W$ is the weight matrix, $b$ is the bias vector, $h$ is a non-linear activation function, and $z$ is the layer output. $W$ and $b$ are trainable parameters.
• Classification layer: The classification layer is usually a fully connected layer in which the activation of each neuron represents a specific class. In a two-class model, the classification layer can consist of one neuron or two neurons (one neuron per class).
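The fully connected layer of Eq. 2 and a two-neuron softmax classification layer with a cross-entropy loss can be sketched as follows. The 320-dimensional input (a flattened CNN output), the class order (index 0 for "N", index 1 for "A"), and the random weights are illustrative assumptions; the 32-unit sigmoid hidden layer follows the proposed model's description.

```python
import numpy as np


def dense(x, W, b, h):
    """Fully connected layer, Eq. 2: z = h(Wx + b)."""
    return h(W @ x + b)


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def softmax(z):
    e = np.exp(z - z.max())  # subtract the max for numerical stability
    return e / e.sum()


def cross_entropy(probs, label):
    """Negative log-probability assigned to the true class index."""
    return -np.log(probs[label])


rng = np.random.default_rng(1)
features = rng.normal(size=320)   # flattened CNN output (assumed size)
hidden = dense(features, rng.normal(size=(32, 320)) * 0.05, np.zeros(32), sigmoid)
probs = dense(hidden, rng.normal(size=(2, 32)), np.zeros(2), softmax)  # [P(N), P(A)]
loss = cross_entropy(probs, label=1)  # true class: apnea
```

With two output neurons the softmax probabilities sum to one, so the same decision could equivalently be made by a single sigmoid neuron, which is the one-neuron option mentioned above.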

Proposed Model
The architecture of the CNN for sleep apnea detection (Fig. 3) consists of eight layers:
• Three one-dimensional convolutional layers with rectified linear unit (ReLU) activation.
• Three max-pooling layers.
• One fully connected layer with a sigmoid activation function.
• One softmax layer with a cross-entropy loss function.
For the Apnea-ECG dataset, the final model used convolutional layers with a filter size of 11, a stride of 1, and 32 feature maps, and max-pooling layers with a pooling window size of 8. The final fully connected layer contained 32 neurons. For the UCD dataset, the final model used convolutional layers with a filter size of 5, a stride of 1, and 32 feature maps, and max-pooling layers with a pooling window size of 8. The final fully connected layer again contained 32 neurons.
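The signal length after each convolution and pooling block can be traced with simple arithmetic. The padding mode is not stated above, so the sketch assumes Keras-style "same" padding (convolutions keep the length, pooling rounds up). The "valid" alternative shows why this matters: three 8-wide poolings would shrink the 480-sample UCD input to zero length. The function name `trace_lengths` is hypothetical.

```python
import math


def trace_lengths(input_len, filter_size, n_blocks=3, pool=8, padding="same"):
    """Signal length after each conv -> max-pool block (stride-1 convolutions).

    padding="same":  the convolution keeps the length, pooling rounds up.
    padding="valid": the convolution trims filter_size - 1 samples,
                     pooling rounds down.
    """
    lengths = [input_len]
    n = input_len
    for _ in range(n_blocks):
        if padding == "same":
            n = math.ceil(n / pool)
        elif padding == "valid":
            n = max(n - filter_size + 1, 0) // pool
        else:
            raise ValueError("padding must be 'same' or 'valid'")
        lengths.append(n)
    return lengths


for name, n, f in [("Apnea-ECG", 6000, 11), ("UCD", 480, 5)]:
    same = trace_lengths(n, f, padding="same")
    # 32 feature maps are flattened before the 32-neuron fully connected layer.
    print(name, same, "-> flattened features:", 32 * same[-1])
```

Under these assumptions the Apnea-ECG model flattens 32 × 12 = 384 values and the UCD model only 32 × 1 = 32 values before the fully connected layer.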
The hyper-parameters, including the number of layers, were selected by random search, which has been shown to be superior to grid search (Bergstra and Bengio 2012), using five runs in total. The architecture that produced the best results on the development (validation) set was chosen. Because of limited computational resources, a more thorough hyper-parameter search was not possible.
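A random search of this kind can be sketched as follows. The search space is hypothetical (the actual ranges are not reported), and `toy_evaluate` is a stand-in for training the CNN and measuring its accuracy on the development set; only the five-run budget comes from the text.

```python
import random

# Hypothetical search space; the ranges used in the study are not reported.
SEARCH_SPACE = {
    "n_conv_layers": [1, 2, 3, 4],
    "filter_size": [3, 5, 7, 11, 15],
    "n_feature_maps": [16, 32, 64],
    "pool_size": [2, 4, 8],
}


def random_search(evaluate, space=SEARCH_SPACE, n_runs=5, seed=0):
    """Sample n_runs configurations uniformly at random and keep the best.

    evaluate(config) must return a development-set score (higher is better).
    """
    rng = random.Random(seed)
    best_config, best_score = None, float("-inf")
    for _ in range(n_runs):
        config = {name: rng.choice(values) for name, values in space.items()}
        score = evaluate(config)
        if score > best_score:
            best_config, best_score = config, score
    return best_config, best_score


def toy_evaluate(config):
    """Toy stand-in for 'train the CNN and measure validation accuracy'."""
    return (1.0 - abs(config["filter_size"] - 11) / 20
            - abs(config["pool_size"] - 8) / 20)


best_config, best_score = random_search(toy_evaluate)
```

Unlike a grid, each random draw tries a new value for every hyper-parameter, which is why a small budget such as five runs still explores each dimension.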

Experimental Results
This section first describes the application and programming language used, then the training and testing phases of the model, and finally the optimization details.

Application and Programming Language
In this research, we used Python 3.7 as the programming language (Python 2014). The most widely used deep learning frameworks are built around Python, making it the de facto standard language for deep learning.

Training and Testing Phase
The model was trained with backpropagation using mini-batch gradient descent with 50 training samples per mini-batch, with the learning rate adapted by the Adam optimization algorithm (Kingma and Ba 2015). Training stopped when the loss stopped improving on the validation set.
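The training procedure (mini-batches of 50, Adam updates, and early stopping on held-out validation loss) can be sketched as below. To keep the example self-contained, a logistic model with an analytic gradient stands in for the CNN and its backpropagated gradients, and the data are synthetic. The learning rate of 0.01, the patience of 3 epochs, and all function names are assumptions; the Adam update follows Kingma and Ba (2015) with its usual default betas and epsilon.

```python
import numpy as np


def adam_step(theta, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update; returns the new parameters and moment estimates."""
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad ** 2
    m_hat = m / (1 - beta1 ** t)   # bias-corrected first moment
    v_hat = v / (1 - beta2 ** t)   # bias-corrected second moment
    return theta - lr * m_hat / (np.sqrt(v_hat) + eps), m, v


def bce_loss_and_grad(theta, X, y):
    """Binary cross-entropy of a logistic model p = sigmoid(X @ theta)."""
    p = np.clip(1.0 / (1.0 + np.exp(-(X @ theta))), 1e-12, 1 - 1e-12)
    loss = -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))
    return loss, X.T @ (p - y) / len(y)


def train(X_tr, y_tr, X_val, y_val, batch_size=50, lr=1e-2, patience=3,
          max_epochs=200, seed=0):
    rng = np.random.default_rng(seed)
    theta = np.zeros(X_tr.shape[1])
    m, v, t = np.zeros_like(theta), np.zeros_like(theta), 0
    best_theta, best_val, bad_epochs = theta.copy(), np.inf, 0
    for _ in range(max_epochs):
        order = rng.permutation(len(y_tr))
        for start in range(0, len(order), batch_size):  # mini-batches of 50
            idx = order[start:start + batch_size]
            _, grad = bce_loss_and_grad(theta, X_tr[idx], y_tr[idx])
            t += 1
            theta, m, v = adam_step(theta, grad, m, v, t, lr=lr)
        val_loss, _ = bce_loss_and_grad(theta, X_val, y_val)
        if val_loss < best_val - 1e-6:                  # validation loss improved
            best_theta, best_val, bad_epochs = theta.copy(), val_loss, 0
        else:
            bad_epochs += 1
            if bad_epochs >= patience:                  # early stopping
                break
    return best_theta, best_val


data_rng = np.random.default_rng(42)
X = data_rng.normal(size=(700, 5))
true_w = np.array([2.0, -1.0, 0.5, 0.0, 1.5])
y = (X @ true_w + 0.3 * data_rng.normal(size=700) > 0).astype(float)
theta, val_loss = train(X[:500], y[:500], X[500:], y[500:])
val_acc = np.mean(((X[500:] @ theta) > 0) == (y[500:] == 1.0))
```

Returning the parameters from the best validation epoch, rather than the last one, is what makes the patience window safe: the extra epochs only confirm that the loss has stopped improving.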
The dataset was split into training, validation, and test sets in proportions of 70%, 20%, and 10%, respectively. The validation set was used to select the hyper-parameters, as described above, and for early stopping, so it was important to have enough data in the validation set to limit sampling error.
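A 70/20/10 split of the sample indices can be sketched as follows, assuming a single random shuffle with a fixed seed; the function name `split_70_20_10` is hypothetical. Because this split is over one-minute samples, minutes from the same recording can land in different subsets; splitting by recording is a common alternative when subject-independent results are needed.

```python
import numpy as np


def split_70_20_10(n_samples, seed=0):
    """Shuffle sample indices and split them 70/20/10 into train/val/test."""
    idx = np.random.default_rng(seed).permutation(n_samples)
    n_train = int(round(0.7 * n_samples))
    n_val = int(round(0.2 * n_samples))
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]


train_idx, val_idx, test_idx = split_70_20_10(1000)
```

The test set receives whatever remains after rounding, so the three index arrays always cover every sample exactly once.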

Performance Measures
To assess the classification performance of our model, this study adopted several measures commonly used for SAHS detection: accuracy, sensitivity, specificity, precision, and F1 score.
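These measures follow their standard definitions from the confusion matrix. The sketch below computes them from binary per-minute labels, treating apnea minutes (1) as the positive class; returning 0.0 for an undefined ratio is an assumption, and the function names are hypothetical.

```python
def _ratio(num, den):
    """Safe division: 0.0 when the denominator is zero (assumed convention)."""
    return num / den if den else 0.0


def sahs_metrics(y_true, y_pred):
    """Accuracy, sensitivity, specificity, precision, and F1 (1 = apnea minute)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t and p)
    tn = sum(1 for t, p in pairs if not t and not p)
    fp = sum(1 for t, p in pairs if not t and p)
    fn = sum(1 for t, p in pairs if t and not p)
    sensitivity = _ratio(tp, tp + fn)   # recall on apnea minutes
    precision = _ratio(tp, tp + fp)
    return {
        "accuracy": _ratio(tp + tn, len(pairs)),
        "sensitivity": sensitivity,
        "specificity": _ratio(tn, tn + fp),
        "precision": precision,
        "f1": _ratio(2 * precision * sensitivity, precision + sensitivity),
    }


scores = sahs_metrics([1, 1, 1, 1, 0, 0, 0, 0, 0, 0],
                      [1, 1, 1, 0, 0, 0, 0, 0, 1, 1])
```

Sensitivity and specificity are reported separately because accuracy alone can look high when non-apnea minutes dominate a recording.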

Performance Evaluation
Our experiments were designed and conducted to study the efficiency and effectiveness of the proposed model. We evaluated the CNN's performance in detecting SAHS on the Apnea-ECG and UCD databases. The results are shown in Tab. 3.

Comparison with Other Works
To demonstrate its efficacy, we compared the proposed model's performance with that of other sleep apnea detection techniques. The results are shown in Tab. 4 and Tab. 5. As these tables show, our proposed model achieved performance comparable to or better than the other proposed models.

Ablation Study
We conducted an ablation study to determine how important the CNN architecture is for identifying SAHS. In this study, the convolutional and pooling layers were removed. We evaluated a model with only a fully connected layer, equivalent to a regular ANN with one hidden layer, and a model with only a classification layer, equivalent to logistic regression. The results are shown in Tab. 6. Both networks obtained lower results on all metrics, demonstrating that the CNN architecture is well suited to this task.

Discussion
Based on the tables, figures, and results, the model performs better on the Apnea-ECG dataset than on the UCD dataset. This may be because the Apnea-ECG database is annotated at the minute level, whereas for the UCD database we had to convert the annotations into the required format using an automated procedure. It is also possible that the higher sampling rate of the Apnea-ECG data (100 Hz versus 8 Hz) contributed to better predictions.
As mentioned earlier, CNNs can outperform regular neural networks with fully connected layers because of translational invariance, which lets them recognize important features regardless of their exact spatial location in the input. Because apnea events can occur at any time within a given timeframe (in our case, 60 seconds), translational invariance is also important for our task; this is consistent with the ablation study, in which removing the convolutional and pooling layers lowered the results on all metrics.

Summary and Conclusion
A Sleep Apnea-Hypopnea Syndrome detection system is important for monitoring and maintaining patient health.
In this research, we implemented a CNN model to detect SAHS based on the SpO2 signal only. The developed model offers a simple way to detect SAHS that is also less costly than the standard method. This research was carried out to investigate the efficacy and efficiency of the proposed model using data from the Apnea-ECG and UCD databases. We evaluated the performance of the proposed CNN model and compared it with the results of similar approaches in the literature. The results showed that our model achieved performance comparable to or better than previous studies, with an accuracy of 95.5% on the Apnea-ECG database and 90.2% on the UCD database. Based on these results, we conclude that using a CNN to detect SAHS from the SpO2 signal alone can reduce the complexity of diagnosis and the dependency on PSG. The developed model provides a useful and practical technique for predicting whether a patient has SAHS and can therefore play an important role in maintaining patient health.