Acoustic lung signals analysis based on Mel frequency cepstral coefficients and self-organizing maps

This study analyzes acoustic lung signals with different abnormalities using Mel Frequency Cepstral Coefficients (MFCC) and Self-Organizing Maps (SOM).


I. Introduction
Chronic Respiratory Diseases (CRD) are a critical public health problem in developing countries [1,2]. Diagnosing these diseases can be challenging for medical staff with limited resources (for example, in rural regions far from large cities), so the diagnostic process varies with each patient's access to medical care [3]. New technological tools can therefore assist clinicians and physicians in diagnostic tasks by providing additional information. Moreover, the traditional methods used to assess lung function in respiratory disease diagnosis are based on auscultation with a stethoscope. The main drawback of auscultation is its subjectivity: the result depends on the characteristics of the stethoscope and on the skill of the physician [4,5].
Computer-aided decision support systems are commonly used in the biomedical field because of the information they provide for diagnostic tasks [6]. This information is useful when the medical staff needs extra help. Most of these systems take advantage of previously stored data through a procedure known as data mining [7], in which Artificial Neural Networks are often preferred because of their flexibility in handling many kinds of data.
Artificial Neural Networks (ANN) are mathematical tools for modeling high-dimensional classification problems. In supervised learning, an ANN establishes a non-linear relationship between input variables and known outputs [8]. ANN can also be used in unsupervised learning, for example in clustering tasks. Examples of ANN applied to respiratory disease diagnosis can be found in [9][10][11], where clinical and epidemiological variables are used to train the neural models.
To train an ANN, the parameters that form the input vector must first be extracted from the dataset. These features can be obtained from patient data through signal-processing representations or image-processing parameters. In this study, the lung acoustic signal is acquired and processed with Mel Frequency Cepstral Coefficients (MFCC) to obtain representative parameters of each signal. A database was built with these coefficients and then used for ANN training. This signal-processing technique has shown good results in representing acoustic signals of the respiratory system [12][13][14].
The present work studies the use of ANN based on Self-Organizing Maps (SOM) as a pre-processing step for a clustering task carried out with the K-means algorithm. The SOM was then used as a classifier of acoustic lung sounds resulting from respiratory abnormalities. Experiments are presented in which signal processing is used to obtain the features that serve as inputs for ANN training. Three classes were defined: two representing abnormal sounds and a third representing normal sounds. The results are compared with previous studies that used Gaussian Mixture Models and Support Vector Machines for classification [12]. Other studies [15,16] used methodologies that include neural networks, but without MFCC.

II. Materials and Methods
First, the database used and the signal processing implemented to extract features from each signal are presented. Then, the neural network architecture and training are described.

A. Database
The RALE database [17], developed by the University of Manitoba, Winnipeg, Canada, was used in this study. This repository consists of thirteen recordings obtained from patients with normal breath sounds, crackles, wheezes and other abnormalities found in acoustic lung sounds. The signals were high-pass filtered with a first-order Butterworth filter at 7.5 Hz to suppress any DC offset. Additionally, an eighth-order low-pass Butterworth filter at 2.5 kHz was applied to avoid aliasing. All signals were sampled at 10 kHz.
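The filter specifications above can be illustrated with a minimal digital sketch. It assumes SciPy's `butter` and `sosfilt` and a 10 kHz sampling rate (a 2.5 kHz anti-aliasing cutoff implies a sampling rate above 5 kHz). In RALE the filtering was part of the original acquisition chain, so this only reproduces the stated specifications; the name `rale_style_filter` is hypothetical.

```python
import numpy as np
from scipy.signal import butter, sosfilt

FS = 10_000  # assumed sampling rate in Hz


def rale_style_filter(x, fs=FS):
    """Apply the two Butterworth stages described for the recordings."""
    # First-order high-pass at 7.5 Hz: removes any DC offset.
    hp = butter(1, 7.5, btype="highpass", fs=fs, output="sos")
    # Eighth-order low-pass at 2.5 kHz: limits the bandwidth
    # (the anti-aliasing role it had in the acquisition chain).
    lp = butter(8, 2500, btype="lowpass", fs=fs, output="sos")
    return sosfilt(lp, sosfilt(hp, x))


# Usage: a 200 Hz tone on top of a 0.5 DC offset keeps the tone and loses the offset.
t = np.arange(2 * FS) / FS
y = rale_style_filter(0.5 + np.sin(2 * np.pi * 200 * t))
```

Second-order sections (`output="sos"`) are used because a high-order Butterworth filter in transfer-function form can be numerically fragile.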
Table 1 shows the number of signals in each class, according to the abnormalities present in the database. Crackles are discontinuous, explosive adventitious lung sounds recorded from patients with cardiorespiratory disorders. They last less than 20 milliseconds, and their frequency typically ranges from 100 to 2000 Hz [18]. Wheezes have a waveform similar to a sine wave, with a fundamental frequency of around 100 to 2000 Hz and a duration between 80 and 250 milliseconds [18].

The Mel scale approximates how human hearing perceives frequency, so representing sound on this scale models it closer to human perception and allows more efficient signal processing [19].
To compute the MFCC, the signal is first segmented into frames and the Discrete Fourier Transform (DFT) of each frame is calculated. The spectrum is then filtered with thirteen triangular windows distributed according to the Mel frequency scale. The logarithm of the energy in each Mel frequency band is computed, and the Discrete Cosine Transform (DCT) is applied to these log-energies. Finally, the MFCC are the amplitudes of the spectrum obtained after the DCT (Figure 1). In this study, the signal was divided into 30-millisecond frames with a 10-millisecond frame shift. For each frame, thirteen coefficients are computed to represent the acoustic lung signal, a choice based on the performance obtained in previous studies [12,14,19].
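The MFCC steps above can be sketched in NumPy as follows. This is an illustration under assumptions the text does not state: a 10 kHz sampling rate, a Hamming window, a 512-point FFT, the common `2595·log10(1 + f/700)` Mel mapping, filters spanning 0 Hz to the Nyquist frequency, and an orthonormal DCT-II. Because the text uses thirteen filters and thirteen coefficients, the DCT keeps all of its outputs. All function names are hypothetical.

```python
import numpy as np


def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filterbank(n_filters, n_fft, fs):
    """Triangular filters equally spaced on the Mel scale between 0 Hz and fs/2."""
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(fs / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / fs).astype(int)
    fbank = np.zeros((n_filters, n_fft // 2 + 1))
    for m in range(1, n_filters + 1):
        left, center, right = bins[m - 1], bins[m], bins[m + 1]
        for k in range(left, center):          # rising edge of the triangle
            fbank[m - 1, k] = (k - left) / (center - left)
        for k in range(center, right):         # falling edge of the triangle
            fbank[m - 1, k] = (right - k) / (right - center)
    return fbank


def dct_ii(x):
    """Orthonormal DCT-II along the last axis."""
    n = x.shape[-1]
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    basis = np.cos(np.pi * k * (2 * i + 1) / (2 * n))  # row k = k-th cosine basis
    scale = np.full(n, np.sqrt(2.0 / n))
    scale[0] = np.sqrt(1.0 / n)
    return (x @ basis.T) * scale


def mfcc(signal, fs=10_000, frame_ms=30, shift_ms=10,
         n_filters=13, n_coeffs=13, n_fft=512):
    frame_len = int(fs * frame_ms / 1000)   # 300 samples at 10 kHz
    hop = int(fs * shift_ms / 1000)         # 100 samples at 10 kHz
    n_frames = 1 + (len(signal) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hamming(frame_len)                # framing + window
    power = np.abs(np.fft.rfft(frames, n=n_fft)) ** 2             # DFT power spectrum
    energies = power @ mel_filterbank(n_filters, n_fft, fs).T     # Mel band energies
    log_energies = np.log(energies + 1e-10)                     # log (offset avoids log 0)
    return dct_ii(log_energies)[:, :n_coeffs]                   # DCT -> MFCC


coeffs = mfcc(np.random.default_rng(0).standard_normal(10_000))  # 1 s of noise
print(coeffs.shape)  # (98, 13)
```

Each row of the result is one 13-dimensional input vector for the map, which matches the frame-based classification described later.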

C. SOM neural networks
SOM neural networks arrange the input data in a discretized two-dimensional space, known as the map, that attempts to preserve the topological properties of the input space. This design is inspired by the organization of the visual, auditory and sensory areas of the human cerebral cortex [20,21].
The main advantage of SOM architectures over other neural network models lies in their training, which in most cases is unsupervised. This is useful in clustering tasks, because the map can find similarities in the data by itself [20].
A SOM represents the input through a nonlinear mapping onto an output space of reduced dimensionality. This new space is used to analyze the original dataset graphically, since different areas of the map preserve characteristics of the classes employed in the training process.
The learning process consists of three stages: competitive, cooperative and adaptive. In the competitive stage, the Euclidean distance between each input and the weight vector of every unit (neuron) is computed, and the unit whose weights are most similar to the input is defined as the best matching unit (BMU). In the cooperative stage, the units around the BMU are selected for updating according to a neighborhood function. Finally, in the adaptive stage, the weights are moved toward the input [20] through the update rule

$$\omega_j(t+1) = \omega_j(t) + \eta(t)\, h_{ij}(t)\,\left[x(t) - \omega_j(t)\right] \qquad (1)$$

where $\omega_j(t)$ is the weight vector of unit $j$, $\eta(t)$ is the learning coefficient, $h_{ij}(t)$ is the neighborhood function between the BMU $i$ and unit $j$, and $x(t)$ is the input vector.
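The three learning stages can be sketched as a single NumPy update step. The neighborhood is passed in as a function so that any form, for instance a Gaussian, can be plugged in; `som_step` and its arguments are hypothetical names, and the learning coefficient in the usage is an arbitrary example value.

```python
import numpy as np


def som_step(weights, x, neighborhood, lr):
    """One SOM learning step.

    weights: (n_units, dim) array of unit weight vectors
    x: (dim,) input vector
    neighborhood: callable mapping the BMU index i to an (n_units,) array of h_ij
    lr: learning coefficient eta(t)
    """
    # Competitive stage: Euclidean distance from the input to every unit.
    dists = np.linalg.norm(weights - x, axis=1)
    bmu = int(np.argmin(dists))
    # Cooperative stage: neighborhood strength of each unit j around the BMU i.
    h = neighborhood(bmu)
    # Adaptive stage: move each unit toward the input, scaled by eta * h_ij.
    new_weights = weights + lr * h[:, None] * (x - weights)
    return new_weights, bmu


# Usage with a winner-take-all neighborhood (only the BMU is updated).
w = np.array([[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]])
w_new, bmu = som_step(w, np.array([0.9, 1.2]), lambda i: np.eye(3)[i], lr=0.5)
# Unit 1 is the BMU; with lr=0.5 its weights move halfway toward the input.
```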
To train a SOM network, the following must be specified: the number of units, the map size, the lattice type and the neighborhood function parameters. The number of units and the map size define the map resolution, the lattice type defines whether the units are arranged in a regular or irregular pattern, and the width of the neighborhood function controls the cooperative process [20].
There are heuristic rules to compute the number of units and the map dimensions; one of them is based on principal component analysis (PCA). The ratio between the first and second principal components of the training dataset provides an initial value for the length-to-width ratio of the map [20,21]. In addition, the map should be sized so that the data activate all of its units. These rules were followed to determine the number of units and the map size. A hexagonal lattice was used because all adjacent units in it are equidistant.
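A hypothetical helper that turns the PCA rule into a map shape. The text gives only the principle, so two assumptions are made here: the side ratio is taken as the square root of the ratio of the two largest covariance eigenvalues (that is, the ratio of standard deviations along the first two principal components), and the total number of units is fixed beforehand, for example by checking that the data activate all units.

```python
import numpy as np


def map_shape_from_pca(data, n_units):
    """Split n_units into (rows, cols) with rows/cols close to the PCA side ratio."""
    centered = data - data.mean(axis=0)
    # Covariance eigenvalues in descending order (variance along each principal component).
    eigvals = np.linalg.eigvalsh(np.cov(centered, rowvar=False))[::-1]
    # Side ratio: standard deviation along PC1 over standard deviation along PC2.
    ratio = np.sqrt(eigvals[0] / eigvals[1]) if eigvals[1] > 0 else 1.0
    cols = max(1, int(round(np.sqrt(n_units / ratio))))
    rows = max(1, int(round(n_units / cols)))
    return rows, cols


# Usage: a cloud four times wider along one axis yields an elongated map.
rng = np.random.default_rng(0)
features = rng.normal(size=(5000, 2)) * np.array([4.0, 1.0])
print(map_shape_from_pca(features, 64))
```

Rounding means `rows * cols` can differ slightly from `n_units`; the result is meant only as an initial value, as the text describes.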
Finally, the neighborhood function establishes the strength of the connections between units. In the present work, it is based on a Gaussian distribution:

$$h_{ij}(t) = \exp\left(-\frac{d_{ij}^{2}}{2\sigma^{2}(t)}\right) \qquad (2)$$

where $d_{ij}$ is the Euclidean distance between unit $j$ and the BMU $i$, and $\sigma(t)$ is the width of the function at iteration $t$. This parameter changes during training, starting at four units and ending at one unit. The map size and the area in which the neighborhood function takes significant values determine the classification accuracy and the generalization [20].
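The Gaussian neighborhood on a hexagonal lattice, with a width that shrinks from four units to one, can be sketched as below. Two assumptions are not in the text: odd rows of the lattice are shifted by half a unit (with row spacing √3/2, so every interior unit has six neighbors at distance 1), and σ(t) decreases linearly. The function names are hypothetical.

```python
import numpy as np


def hex_coordinates(rows, cols):
    """Planar positions of the units of a hexagonal lattice, in row-major order."""
    r, c = np.meshgrid(np.arange(rows), np.arange(cols), indexing="ij")
    x = c + 0.5 * (r % 2)        # odd rows shifted by half a unit
    y = r * np.sqrt(3.0) / 2.0   # row spacing that makes the six neighbors equidistant
    return np.column_stack([x.ravel(), y.ravel()])


def sigma_schedule(t, n_iter, sigma0=4.0, sigma_end=1.0):
    """Neighborhood width at iteration t, decreasing linearly from 4 to 1 units."""
    return sigma0 + (sigma_end - sigma0) * t / max(n_iter - 1, 1)


def gaussian_neighborhood(coords, bmu, sigma):
    """h_ij = exp(-d_ij^2 / (2 sigma^2)), d_ij = lattice distance from unit j to the BMU i."""
    d2 = np.sum((coords - coords[bmu]) ** 2, axis=1)
    return np.exp(-d2 / (2.0 * sigma ** 2))


# Usage: neighborhood around unit 0 at the first iteration of a 1000-iteration run.
coords = hex_coordinates(6, 4)
h = gaussian_neighborhood(coords, bmu=0, sigma=sigma_schedule(0, 1000))
```

Note that $d_{ij}$ is measured on the map lattice, not in the 13-dimensional feature space.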

D. Training process
In the training process, the database size is important, because it must allow building sets to train the neural networks and sets to validate the best model, which normally demands a large number of events. Nevertheless, several strategies have been studied to overcome this limitation; cross-validation and bootstrap techniques are examples of these solutions [22][23][24].
In this study, where only thirteen registers (events) are available, a specific cross-validation technique known as Leave-One-Out (LOO) was implemented. Therefore, the number of ANN obtained equals the number of events in the database (thirteen). Each network is trained with all events but one, and the SOM approach is validated as a whole through the error of all trained ANN.
The LOO error is a statistical estimator of the performance of a learning algorithm, and it is very useful for model selection because it is only slightly biased, unlike the empirical (training) error. Moreover, when the algorithm is stable, the deviation of the LOO error is low [21,22]. The LOO error is calculated as

$$R_{LOO} = \frac{1}{m}\sum_{i=1}^{m} \ell\left(f_i, z_i\right) \qquad (3)$$

where $m$ is the number of samples in the set $D = \{z_1, \dots, z_m\}$, $f_i$ is the function obtained after training on $D$ without $z_i$, and $\ell$ is the loss of $f_i$ on the left-out sample $z_i$. These methods have also been used in applications that require regression models or time-series structures.
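The LOO estimator can be written as a generic loop: with thirteen events it trains thirteen models, each one evaluated on the single event it never saw. The `train` and `loss` callables are placeholders for whatever learning procedure is evaluated (here a toy majority-class learner with 0/1 loss), not the SOM+K-means pipeline itself.

```python
import numpy as np


def loo_error(data, labels, train, loss):
    """Leave-one-out error R_LOO = (1/m) * sum_i loss(f_i, z_i).

    train(X, y) returns a fitted model f_i; loss(model, x, y) scores it on one sample.
    """
    m = len(data)
    total = 0.0
    for i in range(m):
        keep = np.arange(m) != i                 # every event except the i-th
        model = train(data[keep], labels[keep])  # f_i trained on D without z_i
        total += loss(model, data[i], labels[i])
    return total / m


def majority(X_train, y_train):
    """Toy learner: always predicts the most frequent training label."""
    return np.bincount(y_train).argmax()


def zero_one(model, x, label):
    """0/1 loss: 1 for a misclassified sample, 0 otherwise."""
    return float(model != label)


X = np.zeros((4, 1))
y = np.array([0, 0, 0, 1])
print(loo_error(X, y, majority, zero_one))  # 0.25
```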
An unsupervised training was developed in which only the information extracted from the acoustic lung signals was used: a vector with thirteen features was presented at the input of the neural network. The trained maps were then analyzed by clustering their neurons, looking for a division of the map into groups that represent the information at the map input.
Additionally, a supervised training based on the labels of the acoustic lung signals was implemented. As shown in Table 1, three classes were established for map training, and the K-means clustering of the trained map neurons was fixed in advance to three groups.

E. Post-training process
After SOM training, the map neurons were clustered with the K-means algorithm, which is based on proximity measures between data representations. In this case, the information stored in the map neurons was used as the input to the algorithm, so the proximity between neurons is measured through their synaptic weights [25].
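A plain K-means (Lloyd's algorithm) over the SOM weight vectors can be sketched with NumPy only. The text does not say how K-means was initialized or stopped; the farthest-point initialization and the convergence test below are assumptions, and `kmeans` is a hypothetical name.

```python
import numpy as np


def kmeans(points, k, n_iter=100, seed=0):
    """Lloyd's K-means; `points` are the SOM weight vectors, one row per neuron."""
    rng = np.random.default_rng(seed)
    # Farthest-point initialization: each new center is the neuron farthest from those chosen.
    centers = [points[rng.integers(len(points))]]
    for _ in range(1, k):
        d = np.linalg.norm(points[:, None, :] - np.array(centers)[None, :, :], axis=2)
        centers.append(points[d.min(axis=1).argmax()])
    centers = np.array(centers, dtype=float)
    for _ in range(n_iter):
        # Assignment: proximity between neurons is measured on their weight vectors.
        d = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        assign = d.argmin(axis=1)
        # Update: each center moves to the mean of its neurons (empty clusters keep theirs).
        new_centers = np.array([
            points[assign == c].mean(axis=0) if np.any(assign == c) else centers[c]
            for c in range(k)
        ])
        converged = np.allclose(new_centers, centers)
        centers = new_centers
        if converged:
            break
    # Final assignment consistent with the returned centers.
    assign = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2).argmin(axis=1)
    return assign, centers


# Usage: 60 toy prototypes forming three groups, clustered into k = 3.
rng = np.random.default_rng(1)
prototypes = np.vstack([rng.normal(c, 0.1, (20, 2)) for c in [(0, 0), (10, 10), (-10, 10)]])
neuron_cluster, cluster_centers = kmeans(prototypes, k=3)
```

The returned `neuron_cluster` maps every neuron to a cluster, which is what the post-training labeling step needs.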
For the unsupervised training, the number of clusters extracted from the map with the K-means algorithm was evaluated using clustering quality measures, namely the Davies-Bouldin and Silhouette indices [26][27]. The best number of clusters was chosen accordingly, and each cluster was labeled according to the number of hits that each signal produced in the different regions of the map. A classification rate was then computed for the unsupervised proposal.
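Both quality measures can be computed from the neuron weight vectors and their cluster assignments. The sketch below follows the standard definitions (a lower Davies-Bouldin index is better, a higher Silhouette is better); giving singleton clusters a silhouette of 0 is a common convention assumed here. In practice the indices are evaluated for the partition obtained with each candidate number of clusters.

```python
import numpy as np


def davies_bouldin(points, labels):
    """Davies-Bouldin index (lower is better)."""
    ks = np.unique(labels)
    cents = np.array([points[labels == k].mean(axis=0) for k in ks])
    # Average distance of each cluster's members to its centroid.
    scatter = np.array([
        np.linalg.norm(points[labels == k] - c, axis=1).mean() for k, c in zip(ks, cents)
    ])
    sep = np.linalg.norm(cents[:, None, :] - cents[None, :, :], axis=2)
    with np.errstate(divide="ignore", invalid="ignore"):
        ratio = (scatter[:, None] + scatter[None, :]) / sep
    np.fill_diagonal(ratio, -np.inf)  # a cluster is not compared with itself
    return float(ratio.max(axis=1).mean())


def silhouette(points, labels):
    """Mean silhouette coefficient (higher is better); needs at least two clusters."""
    d = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=2)
    ks = np.unique(labels)
    scores = np.zeros(len(points))
    for n in range(len(points)):
        same = labels == labels[n]
        if same.sum() == 1:
            continue  # singleton cluster: silhouette taken as 0
        a = d[n, same].sum() / (same.sum() - 1)  # mean distance to the other members
        b = min(d[n, labels == k].mean() for k in ks if k != labels[n])
        scores[n] = (b - a) / max(a, b)
    return float(scores.mean())


# Usage: a correct two-cluster partition scores better than a scrambled one.
rng = np.random.default_rng(0)
pts = np.vstack([rng.normal((0, 0), 0.1, (20, 2)), rng.normal((10, 0), 0.1, (20, 2))])
good = np.repeat([0, 1], 20)
bad = np.tile([0, 1], 20)
print(davies_bouldin(pts, good) < davies_bouldin(pts, bad))  # True
print(silhouette(pts, good) > silhouette(pts, bad))          # True
```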
For the supervised training, the number of clusters was set to three, corresponding to the number of classes in the database (two abnormal and one normal). A classification rate was then calculated for the supervised proposal.
The classification rate was computed on a frame basis. Each signal was divided into frames to extract the MFCC values, and each frame was presented as an input vector to the SOM+K-means proposal. Each signal was then assigned to the cluster with the highest number of activations produced by its frames, that is, to the class with the most activations among the neurons of the network output. This calculation was performed for each of the thirteen networks trained in each proposal, and the overall efficiency was then computed with the LOO technique.
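The hit-based cluster labeling and the frame-level majority vote can be sketched as follows. The inputs (weight matrix, neuron-to-cluster map, labeled training frames) and all function names are hypothetical; ties are broken toward the lowest index, and a cluster that receives no training hits falls back to class 0, both of which are choices made for this sketch only.

```python
import numpy as np


def bmus(weights, frames):
    """Best matching unit of each frame (Euclidean distance to the unit weights)."""
    d = np.linalg.norm(frames[:, None, :] - weights[None, :, :], axis=2)
    return d.argmin(axis=1)


def label_clusters(weights, neuron_cluster, frames, frame_labels, n_clusters, n_classes):
    """Label each cluster with the class whose training frames hit it most often."""
    hits = np.zeros((n_clusters, n_classes), dtype=int)
    clusters = neuron_cluster[bmus(weights, frames)]
    np.add.at(hits, (clusters, frame_labels), 1)
    return hits.argmax(axis=1)  # a cluster with no hits falls back to class 0


def classify_signal(weights, neuron_cluster, cluster_class, signal_frames, n_clusters):
    """Assign a whole signal to the class of the cluster its frames activate most."""
    clusters = neuron_cluster[bmus(weights, signal_frames)]
    winner = np.bincount(clusters, minlength=n_clusters).argmax()
    return int(cluster_class[winner])


# Usage: a 1-D toy map with four units grouped into two clusters.
W = np.array([[0.0], [1.0], [10.0], [11.0]])
neuron_cluster = np.array([0, 0, 1, 1])
train_frames = np.array([[0.1], [0.9], [10.2], [11.1]])
train_labels = np.array([2, 2, 0, 0])
cluster_class = label_clusters(W, neuron_cluster, train_frames, train_labels, 2, 3)
test_frames = np.array([[0.2], [10.4], [0.8]])
print(classify_signal(W, neuron_cluster, cluster_class, test_frames, 2))  # 2
```

Within the LOO scheme, `label_clusters` would use only the frames of the twelve training signals, and `classify_signal` the frames of the left-out one.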

III. Results
With the LOO method, each validation is obtained by leaving one sample out. Therefore, one neural network was trained for each left-out acoustic lung signal, giving thirteen networks for each of the unsupervised and supervised proposals. The results of the two approaches are shown below.

A. Unsupervised proposal
Figure 2 shows the values of the Davies-Bouldin and Silhouette indices. For both indices, the best values correspond to a small number of clusters. Therefore, three clusters were kept, which also allows these results to be compared with the supervised approach, and the SOM+K-means proposal was adjusted to this number of clusters.

B. Supervised proposal
For comparison, Figure 4 shows the Davies-Bouldin and Silhouette indices of the maps obtained by supervised training. Both indices showed better results for a small number of clusters. According to the classes in the database, the number of clusters was fixed to three. Table 3 shows the results of the supervised proposal, and Figure 5 shows the clustered map, in which the three regions of the supervised proposal can be seen. Red, yellow and green were used to label the crackles, wheezes and normal clusters, respectively.

IV. Discussion
As previously mentioned, the LOO method evaluates the efficiency of the methodology in a general way. This means that no single network solves the classification problem; instead, the study relies on a set of neural network models to assess the classification methodology.
The unsupervised and supervised approaches differed substantially in classification efficiency: the first proposal reached 54 % on the testing set, whereas the second reached 85 %. This shows that the label information included in the supervised proposal was important for the classification rates.
Compared with other approaches [12,28], the classification rates of the supervised proposal were the closer of the two to those reported results. Table 4 summarizes these comparisons. Index values were comparable for both types of training, and the best indices were obtained with a small number of clusters and in the supervised proposal.
Differences between the clustered maps can be seen in Figures 3 and 5: the clusters of the supervised proposal perform better and form well-defined regions. However, some irregular clustering also appeared in the supervised proposal: for the crackle signals (Map 1 to Map 4), one cluster was split between the corners of the map (Figure 5).
The testing results could only be reported as a mean over thirteen maps, because of the limited number of available signals. Complementary experiments with larger databases would therefore provide more information about the current methodology. Databases with more samples would also allow other validation methods to be studied without the limitations described here.

Figure 3 shows the first of the thirteen maps obtained, with the three clusters shown in red, green and yellow (crackles, wheezes and normal).

Fig. 3. Map from unsupervised training using the first signal.

Fig. 5. Map from supervised training using the first signal.

Table 1. Registers for each class

… Transform. The main difference is that the bands are placed logarithmically, according to the Mel scale.

Table 2 shows the results of the unsupervised proposal. The best classification rate in training was 71 %, for the crackles class. The Normal class had a low classification rate, which shows the poor capability of the neural network to learn this pattern. The Wheezes class had a classification rate of 69 %.

Table 2. Unsupervised proposal results

Table 3. Supervised proposal results

… [17] can be explained by intra-cluster differences. The signals in this class were obtained from vesicular, tracheal, bronchial and bronchovesicular sounds, which differ from one another but all belong to the normal class [17].