Diagnostic heart

Kaan Uyar et al. / Procedia Computer Science 120 (2017) 588–593

2. Methodology
2.1. Dataset
The Cleveland dataset (UCI, 1990) used in this study was obtained from the University of California Irvine (UCI)
Machine Learning Repository heart disease dataset, which includes four independent databases contributed by four
independent medical institutions. The Cleveland dataset has 303 instances of patient data, 6 of which contain
missing values. Table 1 shows the Cleveland dataset attributes with their definitions.
Table 1. Attributes of the UCI Cleveland dataset
No   Name       Definition
1    age        Age in years
2    sex        Sex
3    cp         Chest pain type
4    trestbps   Resting blood pressure on admission to the hospital, in mm Hg
5    chol       Serum cholesterol, in mg/dl
6    fbs        Whether fasting blood sugar is greater than 120 mg/dl
7    restecg    Resting electrocardiographic results
8    thalach    Maximum heart rate achieved
9    exang      Exercise-induced angina
10   oldpeak    ST depression induced by exercise relative to rest
11   slope      The slope of the peak exercise ST segment
12   ca         Number of major vessels (0-3) colored by fluoroscopy
13   thal       The heart status
14   num        Diagnosis of heart disease (0 = healthy; 1 = Sick1; 2 = Sick2; 3 = Sick3; 4 = Sick4)
In this study, the 6 instances of the Cleveland dataset that contain missing entries are omitted, leaving 297
instances. The diagnosis of heart disease attribute (num) was recoded into two classes: absence (num = 0) and
presence (num = 1, 2, 3 or 4) of heart disease. The resulting class distribution is 54% absence and 46% presence of
heart disease. The dataset was separated into two subsets for training (252 instances) and testing (45 instances).
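The preparation steps described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the records are assumed to be comma-separated in the column order of Table 1 with "?" marking a missing entry, the helper names (parse_rows, binarize, split) are hypothetical, and the seeded shuffle is an assumption, since the text does not say how the 252/45 partition was chosen. The demo records are made up and are not taken from the dataset.

```python
import random

# Column order follows Table 1.
COLUMNS = ["age", "sex", "cp", "trestbps", "chol", "fbs", "restecg",
           "thalach", "exang", "oldpeak", "slope", "ca", "thal", "num"]

def parse_rows(lines):
    """Parse comma-separated records, dropping any record with a missing value ('?')."""
    rows = []
    for line in lines:
        fields = [f.strip() for f in line.strip().split(",")]
        if len(fields) != len(COLUMNS) or "?" in fields:
            continue  # omit incomplete instances, as in the text
        rows.append([float(f) for f in fields])
    return rows

def binarize(rows):
    """Split each record into 13 inputs and a binary label (0 = absence, 1 = presence)."""
    return [(r[:-1], 0 if r[-1] == 0 else 1) for r in rows]

def split(samples, n_train, seed=0):
    """Shuffle with a fixed seed (an assumption) and cut into training and testing subsets."""
    shuffled = samples[:]
    random.Random(seed).shuffle(shuffled)
    return shuffled[:n_train], shuffled[n_train:]

# Made-up records in the assumed format, for illustration only.
demo_lines = [
    "50,1,2,130,220,0,0,150,0,1.0,1,0,3,0",
    "61,0,4,140,260,1,2,120,1,2.5,2,2,7,3",
    "45,1,3,125,210,0,0,165,0,0.0,1,?,3,0",  # missing ca, so this record is dropped
    "58,1,4,150,280,0,2,110,1,3.0,2,1,7,1",
]
samples = binarize(parse_rows(demo_lines))
train, test = split(samples, n_train=2)
```

With the full dataset, the same call would be split(samples, n_train=252), leaving the remaining 45 instances for testing.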
2.2. GA based trained RFNN
The RFNN used in this study has 13 inputs, 7 hidden neurons and 1 output neuron, as shown in Fig. 1. The
weights and biases of the RFNN were encoded as 64-bit genes. The GA was run with a mutation probability of 0.05,
multi-point crossover with a probability of 0.25, and a population size of 100. Fig. 2 shows the GA based training
process (Aliev et al., 2007).
Fig. 1. The structure of RFNN
Fig. 2. GA based training of RFNN network
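To make the GA settings concrete, the following Python sketch runs a generic genetic algorithm with the parameters stated in the text (64-bit genes, population size 100, mutation probability 0.05, multi-point crossover probability 0.25). It is not the authors' training procedure. Everything else is an assumption made for illustration: the linear decoding of a gene onto [-1, 1], two crossover points at gene boundaries, one bit flip per mutated gene, tournament selection, elitism, and the toy error function, which stands in for the RFNN training error.

```python
import random

GENE_BITS = 64        # from the text: each weight or bias is a 64-bit gene
POP_SIZE = 100        # from the text
P_MUTATION = 0.05     # from the text
P_CROSSOVER = 0.25    # from the text

def decode(gene, lo=-1.0, hi=1.0):
    """Map a 64-bit integer gene linearly onto [lo, hi] (assumed decoding)."""
    return lo + (hi - lo) * gene / (2**GENE_BITS - 1)

def crossover(a, b, rng, points=2):
    """Multi-point crossover: swap alternating segments between two chromosomes."""
    points = min(points, len(a) - 1)
    if points < 1:
        return a[:], b[:]
    cuts = sorted(rng.sample(range(1, len(a)), points))
    child_a, child_b = a[:], b[:]
    swap, prev = False, 0
    for cut in cuts + [len(a)]:
        if swap:
            child_a[prev:cut], child_b[prev:cut] = b[prev:cut], a[prev:cut]
        swap, prev = not swap, cut
    return child_a, child_b

def mutate(chrom, rng):
    """Flip one random bit of each gene with probability P_MUTATION (assumed granularity)."""
    return [g ^ (1 << rng.randrange(GENE_BITS)) if rng.random() < P_MUTATION else g
            for g in chrom]

def tournament(pop, errors, rng, k=3):
    """Return the lowest-error individual among k random ones (assumed selection scheme)."""
    best = min(rng.sample(range(len(pop)), k), key=lambda i: errors[i])
    return pop[best]

def run_ga(error_fn, n_genes, generations=50, seed=0):
    """Minimize error_fn over decoded chromosomes; returns (best weights, best error)."""
    rng = random.Random(seed)
    pop = [[rng.getrandbits(GENE_BITS) for _ in range(n_genes)] for _ in range(POP_SIZE)]
    best, best_err = None, float("inf")
    for _ in range(generations):
        errors = [error_fn([decode(g) for g in chrom]) for chrom in pop]
        i = min(range(POP_SIZE), key=lambda j: errors[j])
        if errors[i] < best_err:
            best, best_err = pop[i][:], errors[i]
        nxt = [best[:]]  # elitism (assumption): keep the best chromosome found so far
        while len(nxt) < POP_SIZE:
            a, b = tournament(pop, errors, rng), tournament(pop, errors, rng)
            if rng.random() < P_CROSSOVER:
                a, b = crossover(a, b, rng)
            nxt.extend([mutate(a, rng), mutate(b, rng)])
        pop = nxt[:POP_SIZE]
    return [decode(g) for g in best], best_err

def toy_error(weights):
    """Stand-in for the network training error: squared distance from 0.5 per weight."""
    return sum((w - 0.5) ** 2 for w in weights)

weights, err = run_ga(toy_error, n_genes=4)
```

In an actual training run, error_fn would evaluate the RFNN on the training set with the decoded weights and biases, and n_genes would equal the number of trainable parameters of the network.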
2.3. Evaluation criteria
The performance of the GA based trained RFNN approach on the training set, on the testing set and overall is
evaluated with Root Mean Square Error (RMSE), sensitivity (recall), specificity, precision, F-score, probability of
the misclassification error (PME) and accuracy, computed using Eqs. (1)-(7), respectively. Here $Y_i$ is the actual
value and $R_i$ is the obtained result for the $i$-th diagnosis of heart disease attribute (num), and $N$ is the
number of instances. True Negative (TN) is the number of patients predicted to have no heart disease who were found
to have no heart disease, False Negative (FN) is the number of patients predicted to have no heart disease who were
found to have a heart disease, True Positive (TP) is the number of patients predicted to have a heart disease who
were found to have a heart disease, and False Positive (FP) is the number of patients predicted to have a heart
disease who were found to have no heart disease.




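$$\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(Y_i - R_i\right)^2} \tag{1}$$

$$\mathrm{Sensitivity} = \frac{TP}{TP + FN} \tag{2}$$

$$\mathrm{Specificity} = \frac{TN}{TN + FP} \tag{3}$$

$$\mathrm{Precision} = \frac{TP}{TP + FP} \tag{4}$$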

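$$\text{F-Score} = \frac{2 \cdot \mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}} \tag{5}$$

$$\mathrm{PME} = \frac{FP + FN}{TP + FP + TN + FN} \tag{6}$$

$$\mathrm{Accuracy} = \frac{TP + TN}{TP + FP + TN + FN} \tag{7}$$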
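As a worked illustration of Eqs. (1)-(7), the short Python sketch below computes the seven criteria from actual labels and network outputs. The function names, the 0.5 decision threshold and the sample values are assumptions made for this example and are not results from the study. The denominators are assumed to be nonzero, which means both classes must be present and at least one positive prediction must be made.

```python
import math

def rmse(actual, results):
    """Eq. (1): root mean square error between actual values Y_i and results R_i."""
    n = len(actual)
    return math.sqrt(sum((y - r) ** 2 for y, r in zip(actual, results)) / n)

def confusion_counts(actual, predicted):
    """Count TP, TN, FP and FN for binary labels (1 = presence, 0 = absence)."""
    tp = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 1)
    tn = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 0)
    fp = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 1)
    fn = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 0)
    return tp, tn, fp, fn

def classification_metrics(tp, tn, fp, fn):
    """Eqs. (2)-(7), computed from the four confusion counts."""
    total = tp + fp + tn + fn
    sensitivity = tp / (tp + fn)                 # Eq. (2), also called recall
    specificity = tn / (tn + fp)                 # Eq. (3)
    precision = tp / (tp + fp)                   # Eq. (4)
    f_score = 2 * precision * sensitivity / (precision + sensitivity)  # Eq. (5)
    pme = (fp + fn) / total                      # Eq. (6)
    accuracy = (tp + tn) / total                 # Eq. (7)
    return {"sensitivity": sensitivity, "specificity": specificity,
            "precision": precision, "f_score": f_score,
            "pme": pme, "accuracy": accuracy}

# Illustrative values only: actual classes and hypothetical network outputs.
actual = [1, 1, 1, 0, 0, 0, 0, 1]
outputs = [0.9, 0.8, 0.3, 0.2, 0.1, 0.6, 0.4, 0.7]
predicted = [1 if r >= 0.5 else 0 for r in outputs]  # assumed 0.5 threshold

error = rmse(actual, outputs)
tp, tn, fp, fn = confusion_counts(actual, predicted)
scores = classification_metrics(tp, tn, fp, fn)
```

For these sample values the counts are TP = 3, TN = 3, FP = 1 and FN = 1, so PME is 2/8 = 0.25 and accuracy is 6/8 = 0.75. The two always sum to 1, as Eqs. (6) and (7) imply.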
