Figure 2. Head of the initial version of the MIMIC-IV dataset.

3.2 Data pre-processing

Data pre-processing is an important step in the machine learning pipeline because it prepares raw data for analysis and modeling. Several tasks make it essential. Data cleaning identifies and removes errors, inconsistencies, and missing values, ensuring that the data are accurate and reliable. Data normalization scales or transforms the data to a common range or format, so that the model is not skewed by outliers or extreme values. Data transformation converts the data into a format better suited to the chosen model, for example by encoding categorical variables as numbers or deriving new features from existing ones. Feature selection keeps only the most relevant features, reducing the dimensionality of the data, suppressing noise, and mitigating overfitting. Finally, handling missing values (by imputing or removing incomplete instances) and handling imbalanced data (by oversampling or undersampling) complete the process. Overall, data pre-processing ensures that the data are clean, consistent, and in a suitable format for analysis and modeling.
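As an illustration, the imputation and normalization steps above can be sketched in a few lines of pandas; the frame and column names below are hypothetical examples, not the actual MIMIC-IV schema:

```python
import pandas as pd

# Hypothetical vital-sign readings; column names are illustrative only.
df = pd.DataFrame({
    "hr":  [72, 110, None, 58, 95],    # heart rate (bpm)
    "sbp": [118, 150, 92, None, 125],  # systolic blood pressure (mmHg)
    "rr":  [14, 20, 18, 11, 16],       # respiratory rate (breaths/min)
})

# Handle missing values: impute each column with its median.
df = df.fillna(df.median())

# Normalize: min-max scale every feature to the [0, 1] range so that no
# single vital sign dominates the model because of its raw units.
scaled = (df - df.min()) / (df.max() - df.min())
```

Oversampling or undersampling of minority event classes would be applied afterwards, and on the training split only, so that resampled copies do not leak into the test set.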
The initial version of the dataset contains no clinical event target. PEACE-Home [28], however, proposed a system for monitoring patients in a home-based setting using vital signs such as heart rate, blood pressure, and respiratory rate. The system uses probabilistic estimation to identify abnormal clinical events, such as deterioration in a patient's condition, by analyzing correlations among vital signs, and it separates clinical events into target classes through clustering combined with an expert system. Data labeling is the process of assigning labels or tags to data so that they can be used to train or evaluate machine learning models. In the context of PEACE-Home, data labeling involves identifying and tagging instances of abnormal clinical events within the vital sign data collected from patients in a home-based setting. This can be done through manual annotation by healthcare professionals or through algorithms that automatically identify and label events of interest.
The labeled data are used to train machine learning models to identify and classify abnormal clinical events from vital sign correlations. The trained model can then monitor patients in a home-based setting and flag potential health problems early. Table 1 shows the clinical event labels derived from the MIMIC-IV dataset using the PEACE-Home method.
Table 1. Clinical event labels, threshold values, and class composition.

| Condition          | Reason   | Threshold values                       | Label 0 | Label 1 | Label 2 | Label 3 | Label 4 |
|--------------------|----------|----------------------------------------|---------|---------|---------|---------|---------|
| Hypertension       | High BP  | (SBP ≥ 120 and DBP ≥ 80) or MBP ≥ 105 | X       | X       | X       | ○       | X       |
| Hypotension        | Low BP   | (SBP ≤ 90 and DBP ≤ 60) or MBP ≤ 70   | X       | ○       | ○       | X       | ○       |
| Tachycardia        | High HR  | HR ≥ 100                               | X       | ○       | X       | ○       | ○       |
| Bradycardia        | Low HR   | HR ≤ 60                                | X       | X       | ○       | X       | X       |
| Tachypnea          | High RR  | RR ≥ 17                                | X       | ○       | ○       | ○       | X       |
| Bradypnea          | Low RR   | RR ≤ 12                                | X       | X       | X       | X       | ○       |
| Hypoxia            | Low SpO2 | SpO2 ≤ 93%                             | X       | ○       | ○       | ○       | ○       |
| Acronym            |          |                                        | NNN     | THTH    | BHTH    | TTTH    | THBH    |
| Number of samples  |          |                                        | 700     | 500     | 340     | 470     | 410     |
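The thresholds in Table 1 can be expressed as simple rules. The function below is an illustrative sketch; its name and signature are hypothetical, not taken from the PEACE-Home code:

```python
# Rule-based event flags following the thresholds in Table 1.
# Function and parameter names are illustrative, not from PEACE-Home.
def event_flags(sbp, dbp, mbp, hr, rr, spo2):
    return {
        "hypertension": (sbp >= 120 and dbp >= 80) or mbp >= 105,
        "hypotension":  (sbp <= 90 and dbp <= 60) or mbp <= 70,
        "tachycardia":  hr >= 100,
        "bradycardia":  hr <= 60,
        "tachypnea":    rr >= 17,
        "bradypnea":    rr <= 12,
        "hypoxia":      spo2 <= 93,
    }

# A reading matching label 1 (THTH: tachycardia, hypotension,
# tachypnea, hypoxia).
flags = event_flags(sbp=85, dbp=55, mbp=65, hr=110, rr=20, spo2=90)
```

Mapping each flag combination to one of the five class labels (NNN through THBH) then reduces labeling to a dictionary lookup over the patterns in Table 1.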
Ruzaliev R: Federated Learning for Clinical Event Classification Using Vital Signs Data. VOLUME XX, 2023.