Predicting Heart Attacks in Patients Using Artificial Intelligence Methods


Introduction
The rapid development of information technology has driven significant growth across the sciences, medical science among them. Applying artificial intelligence techniques throughout medicine, and to cardiovascular diseases in particular, has made it possible to design medical assistant systems. With the rise of new diseases and the spread of new technologies, diagnosis has moved beyond the traditional practice of internal medicine, and most efforts of doctors and specialists now focus on early prediction of diseases from the available signs. A medical information retrieval system is a well-suited tool for managing clinical data. Such a system supports healthcare operations in diagnosing diseases and plays an important role in clinical decision making [1].
Cardiovascular diseases are among the most widespread causes of death worldwide. One main type is coronary artery disease (CAD), which suddenly strikes about 25% of the population without any previous signs, causing a severe heart attack and death [2]. At present, angiography is used to determine the degree and location of narrowing in the arteries of the heart, but it is expensive and has several side effects. Using data mining for the diagnosis of heart diseases may be much cheaper and much faster [3].
According to statistics announced by the World Health Organization, in 2005 there were 17.5 million deaths from cardiovascular diseases, which is 30% of all deaths worldwide, and this figure was predicted to rise to 23 million by 2030. Studies in Iran showed that 38% of all deaths are caused by cardiovascular diseases, a share that is expected to increase. Statistics from the evaluation of cardiovascular diseases show that 16.1% of people have high blood pressure, 43.9% are overweight, 38.9% have low physical activity, and 10.8% use tobacco, all of which are among the most important risk factors for heart disease.
Diagnosis of heart diseases is a significant and tedious task, and an important duty in medical science that requires extreme attention. Tools for data extraction and analysis are available, however, and the existence of large sets of medical data can support correct diagnosis. Using medical data including age, sex, blood pressure, and blood sugar, it is possible to improve the prediction of heart diseases. These data must be collected in an organized manner so that they can be integrated into a prevention system [4].
In different countries, artificial intelligence techniques and various algorithms have made predicting this type of death (from heart disease) possible to some extent. In Iran, many efforts in this area have been made through the cooperation of the software and medical communities. The current study therefore focuses on field studies with the aim of decreasing cost and enabling early prediction of events occurring in heart patients in Iran.
One of the important methods in this field is clustering. In clustering, the data are split into clusters such that the data in each cluster have maximum similarity with each other and minimum similarity with the data of other clusters. Knowing which cluster a patient falls into can therefore help predict whether he or she is at risk of a heart attack. Our aim is to use this method to reach a more precise diagnosis for heart patients.
Traditional clustering methods such as K-Means judge data by the distance between points. In this study that property is the main objection to such methods, because the available data about heart patients [5] include binary and nominal attributes. Using improved clustering methods, we can replace the distance between data with other metrics [6] that focus on qualitative properties, to increase precision and obtain a more correct diagnosis.
A comparison of the techniques and algorithms used in the field of heart disease diagnosis, shown in Table (1), indicates that no algorithm always achieves maximum performance and that various factors affect the performance of algorithms. As Table (1) shows, the studies were carried out on various algorithms, with various methods, on different data sets with particular applications, and even on data of different types. This evaluation of data mining for the diagnosis of coronary artery disease shows that an algorithm can never be declared optimal on the basis of its internal structure alone; other factors affect its performance. In addition to the selection of a subset of properties, another influential factor was observed, which not only changed the performance of the algorithms but also led to contradictory results.
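To illustrate why a distance-based judgment does not suit nominal attributes, the sketch below uses the simple matching dissimilarity, which only counts the attributes on which two records differ. This is an assumption made for illustration: the text refers to "other metrics [6]" without naming one, and the attribute names and values here are hypothetical.

```python
def matching_dissimilarity(a, b):
    """Count the attributes on which two categorical records differ."""
    if len(a) != len(b):
        raise ValueError("records must have the same number of attributes")
    return sum(1 for x, y in zip(a, b) if x != y)


# Hypothetical records: (sex, chest pain type, exercise-induced angina).
patient_a = ("male", "typical", "yes")
patient_b = ("male", "asymptomatic", "no")

print(matching_dissimilarity(patient_a, patient_b))  # prints 2
```

No ordering or arithmetic on the category labels is needed, which is what makes a measure of this kind usable on binary and nominal data.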
In general, as Figure (1) shows, preparing a model that could serve as a software application for predicting heart disease involves several operational stages:
1) Selecting Data: First, an existing database must be used for creating the models.
2) Removing Data: Incorrect and outlying data are removed using statistical methods.
3) Integrating Data: Data from various resources are collected into one general data-set.
4) Converting Data: All existing data must be normalized and converted to one format. This can be done using min-max normalization.
5) Modeling: Creating the model is the main goal of a data mining project. In this stage, complex techniques and analysis methods are used to extract knowledge and information from the data-set.
6) Clustering: Heart patients are diagnosed using a combination of a fuzzy clustering algorithm and a genetic algorithm, to obtain a more precise diagnosis of this disease.
b. Value 1: The narrowing of the vessel diameter is more than 50% (patient).
Of the 17 described attributes, attribute No. 1 is the identification attribute and attribute No. 17 is the class label. In general, 14 attributes are used in diagnosis.
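The min-max normalization mentioned in the "Converting Data" stage can be sketched as follows. This is a minimal illustration rather than the authors' preprocessing code; the function name and the sample blood pressure readings are hypothetical.

```python
def min_max_normalize(values):
    """Rescale numeric values linearly to the interval [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column carries no information; map it to 0.0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical resting blood pressure readings (mm Hg).
resting_bp = [120, 140, 160, 130]
scaled = min_max_normalize(resting_bp)
print(scaled)  # prints [0.0, 0.5, 1.0, 0.25]
```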

Methodology
The goal of this article is to identify, among all of a patient's information, the factors that play a vital role in heart disease for the individual, and to index heart patients accordingly. To do so, we aim to improve the precision, robustness and performance of disease diagnosis in the medical community while setting aside patient information that is of no use for diagnosis. This article therefore provides a new method for diagnosing heart diseases using multi-objective fuzzy clustering techniques, taking the qualitative data of the database as its basis, in the hope of helping both the medical community and the public. A clustering algorithm for nominal data with a multi-objective fuzzy approach, which uses a genetic algorithm to optimize the internal objectives of the clusters, makes patient grouping more precise and differentiates this work from previous ones.
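Assume that $m_i = [m_{i1}, m_{i2}, \ldots, m_{ip}]$ is the mode of a cluster $C_i$, chosen in such a way as to minimize the following criterion:

$$D(m_i, C_i) = \sum_{x \in C_i} D(m_i, x)$$

where $D(m_i, x)$ is the dissimilarity between $m_i$ and $x$. Note that $m_i$ is not necessarily a member of the set $C_i$. The fuzzy K-modes clustering algorithm partitions the data $X = \{x_1, \ldots, x_n\}$ into $K$ clusters by minimizing the following criterion:

$$J_m(U, Z : X) = \sum_{k=1}^{n} \sum_{i=1}^{K} u_{ik}^{m} \, D(z_i, x_k)$$

The conditions for probabilistic fuzzy clustering are as follows:

$$0 \le u_{ik} \le 1, \qquad \sum_{i=1}^{K} u_{ik} = 1 \;\; (1 \le k \le n), \qquad 0 < \sum_{k=1}^{n} u_{ik} < n \;\; (1 \le i \le K)$$

The fuzzy exponent is $m$, $U = [u_{ik}]$ is the $K \times n$ fuzzy partition matrix, and $u_{ik}$ is the membership degree of the $k$-th discrete object in the $i$-th cluster.
$Z = \{z_1, z_2, \ldots, z_K\}$ denotes the centers of the clusters (modes).
The fuzzy K-modes algorithm is based on an alternating optimization strategy, which repeatedly estimates the partition matrix and then calculates the new cluster centers (modes). It starts with $K$ random initial modes, and in each iteration the fuzzy membership of every data point in every cluster is calculated by the following equation:

$$u_{ik} = \frac{1}{\sum_{j=1}^{K} \left( \dfrac{D(z_i, x_k)}{D(z_j, x_k)} \right)^{\frac{1}{m-1}}}, \qquad 1 \le i \le K, \; 1 \le k \le n$$

If $D(z_j, x_k)$ becomes zero for some $j$, then $u_{ik}$ is set to $0$ for all $i \ne j$ and $u_{jk} = 1$. Based on the membership values, the cluster centers (modes) are recalculated. With the membership values fixed, the modes that minimize the objective function are $z_i = [z_{i1}, z_{i2}, \ldots, z_{ip}]$, where $z_{ij} = a_j^{r}$ is a category of attribute $A_j$ such that

$$\sum_{k:\, x_{kj} = a_j^{r}} u_{ik}^{m} \;\ge\; \sum_{k:\, x_{kj} = a_j^{t}} u_{ik}^{m} \qquad \text{for every other category } a_j^{t} \text{ of } A_j$$

The algorithm terminates when there is no significant improvement in the value of $J_m$. Finally, every object is assigned to the cluster in which it has the maximum membership value. The main disadvantages of the fuzzy K-modes clustering algorithm are that, as with other alternating optimization schemes of this kind, its result depends on the choice of the initial modes and it can become trapped in a local optimum.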
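One iteration of the fuzzy K-modes scheme described above (membership update followed by mode update) can be sketched as below. This is a minimal sketch under stated assumptions: the simple matching dissimilarity stands in for $D$, and the helper names and the toy records are hypothetical, not taken from the study.

```python
from collections import defaultdict


def dissim(a, b):
    """Simple matching dissimilarity (assumed choice for D)."""
    return sum(1 for x, y in zip(a, b) if x != y)


def update_memberships(data, modes, m=2.0):
    """Return U where U[i][k] is the membership of object k in cluster i."""
    K, n = len(modes), len(data)
    U = [[0.0] * n for _ in range(K)]
    for k, x in enumerate(data):
        dists = [dissim(z, x) for z in modes]
        zero = [i for i, d in enumerate(dists) if d == 0]
        if zero:
            # The object coincides with a mode: full membership there.
            U[zero[0]][k] = 1.0
            continue
        for i in range(K):
            U[i][k] = 1.0 / sum(
                (dists[i] / dists[j]) ** (1.0 / (m - 1.0)) for j in range(K)
            )
    return U


def update_modes(data, U, m=2.0):
    """Pick, per cluster and attribute, the category with the largest sum of u^m."""
    p = len(data[0])
    modes = []
    for row in U:
        mode = []
        for j in range(p):
            weight = defaultdict(float)
            for k, x in enumerate(data):
                weight[x[j]] += row[k] ** m
            mode.append(max(weight, key=weight.get))
        modes.append(tuple(mode))
    return modes


# Toy categorical records and two initial modes (hypothetical values).
data = [("a", "x"), ("a", "x"), ("b", "y"), ("b", "y")]
modes = [("a", "x"), ("b", "y")]
U = update_memberships(data, modes)
new_modes = update_modes(data, U)
```

In a full run the two updates would be repeated until the objective $J_m$ stops improving.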

Displaying Chromosome
Every chromosome is a sequence of attribute values representing the K modes of the clusters. If every discrete object has p properties, the chromosome has length K × p.

Primary Population
The K initial cluster modes encoded in each chromosome are selected as K random objects from the discrete data-set. This process is repeated for every chromosome in the population.
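The encoding and initialization described above can be sketched as follows, assuming each chromosome is simply the K modes concatenated into one list of K × p genes. The function names, the seed, and the toy data-set are hypothetical.

```python
import random


def random_chromosome(data, K, rng):
    """Encode K randomly chosen objects as one flat list of K * p genes."""
    modes = rng.sample(data, K)
    return [value for mode in modes for value in mode]


def decode(chromosome, K, p):
    """Split a flat chromosome back into its K modes of p attributes each."""
    return [tuple(chromosome[i * p:(i + 1) * p]) for i in range(K)]


def initial_population(data, K, size, seed=0):
    """Build `size` chromosomes, each from K random objects of the data-set."""
    rng = random.Random(seed)
    return [random_chromosome(data, K, rng) for _ in range(size)]


# Toy data-set with p = 3 categorical attributes (hypothetical values).
data = [
    ("a", "x", "u"),
    ("a", "y", "v"),
    ("b", "x", "v"),
    ("b", "y", "u"),
    ("c", "x", "u"),
]
population = initial_population(data, K=3, size=4)
first_modes = decode(population[0], K=3, p=3)
```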

Calculating the Evaluation Function
In this study two objective functions are optimized simultaneously: the global compactness $\pi$ of the clusters and the fuzzy separation $Sep$. First the encoded modes are extracted from the chromosome; denote them $z_1, z_2, \ldots, z_K$. The membership values $u_{ik}$ are computed from these modes, and each mode is then updated.
That is, the category of attribute $A_j$ in the cluster center $z_i$ is the value that maximizes the sum of the memberships in the $i$-th cluster over the objects taking that value. Based on the updated modes, the cluster membership values are recalculated. The variation $\sigma_i$ and the fuzzy cardinality $n_i$ of the $i$-th cluster, $i = 1, \ldots, K$, are calculated with the following equations:

$$\sigma_i = \sum_{k=1}^{n} u_{ik}^{m} \, D(z_i, x_k), \qquad n_i = \sum_{k=1}^{n} u_{ik}$$

and the global compactness is

$$\pi = \sum_{i=1}^{K} \frac{\sigma_i}{n_i}$$

To calculate the fuzzy separation function $Sep$, assume that the mode $z_i$ of the $i$-th cluster is the center of a fuzzy set $\{ z_j \mid 1 \le j \le K, \; j \ne i \}$. The membership degree of every $z_j$ toward $z_i$, $j \ne i$, is calculated as follows:

$$\mu_{ij} = \frac{1}{\sum_{l=1,\, l \ne j}^{K} \left( \dfrac{D(z_j, z_i)}{D(z_j, z_l)} \right)^{\frac{1}{m-1}}}, \qquad i \ne j$$

The fuzzy separation is then defined as

$$Sep = \sum_{i=1}^{K} \sum_{j=1,\, j \ne i}^{K} \mu_{ij}^{m} \, D(z_i, z_j)$$

Compact clusters are obtained by minimizing $\pi$. In turn, to obtain well-separated clusters, the fuzzy separation $Sep$ must be maximized. These two criteria are therefore taken as the two objectives of the evolutionary algorithm, which tries to minimize $\pi$ and $1/Sep$.
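The two objectives can be evaluated as in the sketch below. It assumes the simple matching dissimilarity for $D$, pairwise distinct modes, and no empty cluster (otherwise a division by zero occurs); the function names and the toy values are hypothetical.

```python
def dissim(a, b):
    """Simple matching dissimilarity (assumed choice for D)."""
    return sum(1 for x, y in zip(a, b) if x != y)


def compactness(data, modes, U, m=2.0):
    """Global compactness: sum over clusters of sigma_i / n_i."""
    total = 0.0
    for i, z in enumerate(modes):
        sigma = sum(U[i][k] ** m * dissim(z, x) for k, x in enumerate(data))
        cardinality = sum(U[i])
        total += sigma / cardinality
    return total


def separation(modes, m=2.0):
    """Fuzzy separation Sep between the modes (modes must be distinct)."""
    K = len(modes)
    sep = 0.0
    for i in range(K):
        for j in range(K):
            if i == j:
                continue
            d_ji = dissim(modes[j], modes[i])
            denom = sum(
                (d_ji / dissim(modes[j], modes[l])) ** (1.0 / (m - 1.0))
                for l in range(K)
                if l != j
            )
            mu = 1.0 / denom
            sep += mu ** m * d_ji
    return sep


# Toy example with a hand-written membership matrix (hypothetical values).
data = [("a", "x"), ("a", "y"), ("b", "y")]
modes = [("a", "x"), ("b", "y")]
U = [[1.0, 0.5, 0.0], [0.0, 0.5, 1.0]]

pi = compactness(data, modes, U)
sep = separation(modes)
objectives = (pi, 1.0 / sep)  # both entries are to be minimized
```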

Selection, Cross-Over and Mutation
The goal of multi-objective clustering is the simultaneous optimization of more than one objective, improving performance through the choice of these objectives. A careful selection of objectives can provide acceptable results, whereas a poor selection leads to poor results.
In this study we used common genetic operators: binary tournament selection, single-point crossover and single-point mutation.
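The three operators named above can be sketched as follows. The scalar rank used in the tournament (lower is better) and the per-gene list of admissible categories are assumptions introduced for illustration; the text does not specify how candidates are compared or how a mutated gene is chosen.

```python
import random


def binary_tournament(population, rank, rng):
    """Pick two distinct individuals at random and keep the better-ranked one."""
    a, b = rng.sample(range(len(population)), 2)
    return population[a] if rank[a] <= rank[b] else population[b]


def single_point_crossover(parent1, parent2, rng):
    """Swap the tails of two equal-length chromosomes at one random cut point."""
    point = rng.randrange(1, len(parent1))
    child1 = parent1[:point] + parent2[point:]
    child2 = parent2[:point] + parent1[point:]
    return child1, child2


def single_point_mutation(chromosome, domains, rng):
    """Replace one random gene by a different category from its domain."""
    child = list(chromosome)
    gene = rng.randrange(len(child))
    choices = [v for v in domains[gene] if v != child[gene]]
    if choices:
        child[gene] = rng.choice(choices)
    return child


rng = random.Random(1)
p1 = ["a", "x", "b", "y"]
p2 = ["c", "z", "d", "w"]
# Hypothetical admissible categories for each gene position.
domains = [["a", "c"], ["x", "z"], ["b", "d"], ["y", "w"]]

winner = binary_tournament([p1, p2], rank=[1, 0], rng=rng)
c1, c2 = single_point_crossover(p1, p2, rng)
mutant = single_point_mutation(p1, domains, rng)
```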

Discussion
In this section, the provided fuzzy clustering algorithm is applied to the heart patients' data. Running the multi-objective fuzzy clustering based on the genetic algorithm forms an optimized Pareto front, meaning that the result is a set of optimized final solutions rather than a single answer.
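The notion of a Pareto front can be made concrete with a small filter that keeps only the non-dominated solutions when both objectives are minimized. The objective pairs below are made-up numbers for illustration, not results from the study.

```python
def dominates(a, b):
    """True if a is no worse than b in every objective and better in at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(points):
    """Keep the points that no other point dominates (minimization)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical (compactness, 1/separation) pairs for four candidate solutions.
candidates = [(0.30, 0.50), (0.40, 0.25), (0.50, 0.60), (0.35, 0.50)]
front = pareto_front(candidates)
print(front)  # prints [(0.3, 0.5), (0.4, 0.25)]
```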

Silhouette Validation Method
This index is defined from the average distance between each sample of a cluster and all other samples of that cluster, and the average distance between that sample and the samples in other clusters. From this point of view, the degree of diversity and correlation of the data is determined, and the maximum value of the index is used to determine the optimal number of clusters:

$$S(i) = \frac{b(i) - a(i)}{\max\{a(i), b(i)\}}$$

where $a(i)$ is the dissimilarity of a sample to the other samples in its own cluster and $b(i)$ is the dissimilarity of the sample to the samples of other clusters (in the standard definition, the lowest average dissimilarity to any other cluster).
The value of the Silhouette validation index lies between -1 and +1. Higher values (near +1) show that the clustering is correct. If the index is near zero, the sample lies at about the same distance from two clusters and could equally be assigned to the neighboring cluster. If the index is near -1, the sample has been assigned to the wrong cluster.
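A per-sample Silhouette computation can be sketched as below. It assumes the standard definition of $b(i)$ as the lowest average dissimilarity to any other cluster and uses the simple matching dissimilarity; both choices, as well as the function names and the toy records, are illustrative assumptions.

```python
def dissim(a, b):
    """Simple matching dissimilarity (assumed choice of distance)."""
    return sum(1 for x, y in zip(a, b) if x != y)


def silhouette_values(data, labels, dist=dissim):
    """Return the Silhouette value S(i) of every sample."""
    values = []
    for i, x in enumerate(data):
        by_cluster = {}
        for j, y in enumerate(data):
            if j != i:
                by_cluster.setdefault(labels[j], []).append(dist(x, y))
        own = by_cluster.pop(labels[i], [])
        if not own or not by_cluster:
            # Convention: singleton clusters (or a single cluster) score 0.
            values.append(0.0)
            continue
        a = sum(own) / len(own)
        b = min(sum(d) / len(d) for d in by_cluster.values())
        values.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return values


data = [("a", "x"), ("a", "x"), ("b", "y"), ("b", "y")]
good = silhouette_values(data, [0, 0, 1, 1])  # identical records grouped together
bad = silhouette_values(data, [0, 1, 0, 1])   # identical records split apart
```

With the well-separated labelling every sample reaches the maximum value of 1, while the mixed labelling yields negative values.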

Method of Validating the Dunn Index
This index is defined by the following equation:

$$D = \min_{1 \le i \le K} \; \min_{j \ne i} \; \frac{d(c_i, c_j)}{\max_{1 \le k \le K} \operatorname{diam}(c_k)}$$

In the above equation, $d(c_i, c_j)$ and $\operatorname{diam}(c_k)$ are defined in terms of the pairwise dissimilarity $d(x, y)$ as follows:

$$d(c_i, c_j) = \min_{x \in c_i,\; y \in c_j} d(x, y), \qquad \operatorname{diam}(c_k) = \max_{x,\, y \in c_k} d(x, y)$$

If a data-set is separable, it is expected to have large distances between clusters and small cluster diameters. As a result, a higher value of this index is more favorable.
Finally, to evaluate the validity of the results of each clustering algorithm, we applied a t-test to each of them at the 5% significance level. The P-value gives the probability that the Dunn index values obtained for the clusters arose by chance, and so indicates the statistical significance of each algorithm's results. Under the null hypothesis, there is no significant difference between the values of the Dunn index in any cluster; under the alternative hypothesis, there is a significant difference between them. The P-values are reported in Table (3).
In Figure (7) the relationship between the different classes of heart attack probability and the age and sex of the patient is apparent. By studying the second cluster, colored red in this figure, we can identify the signs of heart attack and use them for later diagnoses. Figure (8) shows the relationship between the different classes of heart attack probability and the blood pressure and cholesterol of the patient. In this figure too, the second cluster indicates the heart attack probability.
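The Dunn index can be computed as in the sketch below: the smallest distance between samples of different clusters divided by the largest cluster diameter. The simple matching dissimilarity, the function names and the toy records are assumptions made for illustration.

```python
def dissim(a, b):
    """Simple matching dissimilarity (assumed choice of d(x, y))."""
    return sum(1 for x, y in zip(a, b) if x != y)


def dunn_index(data, labels, dist=dissim):
    """Smallest between-cluster distance over the largest cluster diameter."""
    clusters = {}
    for x, label in zip(data, labels):
        clusters.setdefault(label, []).append(x)
    groups = list(clusters.values())
    if len(groups) < 2:
        raise ValueError("the Dunn index needs at least two clusters")
    max_diameter = max(dist(x, y) for g in groups for x in g for y in g)
    min_separation = min(
        dist(x, y)
        for a in range(len(groups))
        for b in range(a + 1, len(groups))
        for x in groups[a]
        for y in groups[b]
    )
    if max_diameter == 0:
        # All clusters consist of identical records.
        return float("inf")
    return min_separation / max_diameter


data = [("a", "x", "p"), ("a", "x", "q"), ("b", "y", "r"), ("b", "y", "s")]
labels = [0, 0, 1, 1]
print(dunn_index(data, labels))  # prints 3.0
```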

Conclusion
The heart is a pulsating pump composed of four chambers, two atria and two ventricles, which delivers blood to all organs of the body, and it is therefore a vital organ. Unfortunately, a substantial share of deaths today is caused by heart diseases, and cardiovascular diseases are among the most important healthcare challenges worldwide. Prevention and management of cardiovascular diseases require a pervasive and comprehensive system for recording data. Patient records are among the most important data and must be classified to make treatment easier and faster. The goal of the current study was to provide a classification system for cardiovascular diseases in order to improve healthcare policies against them. The main goal of classification is to place people into a predetermined number of patient groups, so that future patients can be helped by evaluating their signs.
In all developed countries, a disease classification system is a fundamental basis for addressing the country's healthcare requirements. A national system for classifying cardiovascular diseases will play an effective role in improving the management and prevention of these diseases in Iran. Using this approach, we can contribute substantially to early prediction of the disease.
In this article we introduced some of the most useful artificial intelligence algorithms and techniques in recent use and briefly described their properties. In recent years several studies in artificial intelligence have been carried out on heart patients' data, and many of the algorithms mentioned earlier were successful. The important point, however, is that the success of these algorithms depends on various factors, and no single method can be chosen as the best. Factors such as the data types in the database, the selection of a subset of properties and risk factors, the number of properties, a larger database, few missing values, and access to suitable and correct data increase the chance of success in data exploration and the quality of the algorithms' results. Approaches that can improve the algorithms include a self-adaptive neural network derived from fuzzy clustering, a combination of clustering and SVM classification, and fuzzy rule discovery on heart patients' data.

[Figure residue: plot titled "Clustering Heart Disease"; axis label "Trestbps"]

Figure 1. The provided methodology

With p properties per object, the chromosome has length K × p: the first p positions (genes) encode the p dimensions of the first cluster's mode, the next p positions encode the second cluster's mode, and so on. For example, with p = 3 and K = 3 the chromosome has nine genes, which encode three cluster modes.

The Silhouette index was used to evaluate how well the clusters formed by the suggested algorithm fit the classification of heart patients. Figure (3) shows that the Silhouette index results indicate maximum similarity within the clusters formed by the suggested algorithm.

Figure 3. Comparison of the Silhouette index with other algorithms

Based on Table (3), all of the obtained P-values are less than 0.05, so the results are statistically significant at the 5% level. Figure (4) shows the comparison of the Pareto front for different numbers of clusters.

Figure 4. Comparison of the Pareto front for different numbers of clusters

Figure 5. Relationship between the classes of heart attack probability and the age and sex of the patient

Figure 7. Relationship between the classes of heart attack probability and the blood pressure and sex of the patient

Table 1. Comparing past works

Table (1) shows some samples of heart data.

Table 2. Comparing the Dunn index for various numbers of clusters

The results of the Dunn validation method are shown in Table (2). They indicate that the suggested algorithm achieves better Dunn index values than the other algorithms considered, so the quality of the clusters it forms is higher. The suggested fuzzy clustering algorithm was run with 2 to 5 clusters. Based on Tables (3) and (4), comparing the Dunn index across the clusterings produced by the suggested algorithm suggests that the best case is obtained with 3 clusters of the heart patients' data.