4. Experimental Results

This study used the Gachon University laboratory as the environment for evaluating machine learning performance. Federated learning for the clinical event classification task ran on the following setup: an NVIDIA RTX 3090 GPU (24 GB), 64 GB of RAM, an Intel Core i9 CPU (4.5 GHz), Python, and CUDA. The choice of model parameters can also affect a model's performance; for example, the number of trees in a random forest or the regularization strength in logistic regression. The choice of evaluation metrics is likewise an integral part of the experimental setup, as different metrics are appropriate for different types of problems and data. The most common way to compare machine learning models is to evaluate them with relevant metrics, including accuracy, precision, recall, and F1-score, which quantitatively assess a model's ability to solve a specific problem. Accuracy refers to how often the model makes correct predictions. Precision (3) is the proportion of correct positive predictions among all positive predictions. Recall (4), also called sensitivity, is the proportion of correct positive predictions among all actual positive cases. Finally, the F1-score (5) is the harmonic mean of precision and recall. Overall, these factors should be considered together when comparing machine learning models to determine which best suits a specific problem.
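For reference, these metrics admit the standard definitions in terms of true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN); the numbering below mirrors the in-text references (3)–(5):

\[
\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}
\]
\[
\text{Precision} = \frac{TP}{TP + FP} \quad (3)
\]
\[
\text{Recall} = \frac{TP}{TP + FN} \quad (4)
\]
\[
\text{F1} = 2 \times \frac{\text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} \quad (5)
\]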
Table 5 presents the machine learning performance on the MIMIC-IV dataset for clinical event classification using the Flower federated learning framework. This study investigated the performance of several machine learning models, namely Random Forest, Logistic Regression, Stochastic Gradient Descent (SGD), AdaBoost, and Gaussian Naïve Bayes, in a federated learning setting. The models were tested with different numbers of clients (3, 5, and 10) and communication rounds (5, 10, and 15). The goal was to assess the impact of these factors on overall performance and to determine the most effective combination for clinical event classification.
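To make this setup concrete, the following is a minimal sketch of how one of these models (logistic regression) can be wrapped as a Flower client, assuming the Flower 1.x NumPyClient API. The feature count, synthetic data, addresses, and round/client numbers shown are illustrative placeholders rather than the paper's actual configuration; in the real experiments each client would hold its own private MIMIC-IV partition.

```python
# Minimal sketch: federating a scikit-learn LogisticRegression with Flower.
# Assumes the Flower 1.x API; all data and constants below are illustrative.
import flwr as fl
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

N_FEATURES = 20  # illustrative feature count, not the paper's value

def init_model() -> LogisticRegression:
    """Create a model with pre-set weights so parameters exist before round 1."""
    model = LogisticRegression(warm_start=True, max_iter=100)
    model.classes_ = np.array([0, 1])        # binary clinical event label
    model.coef_ = np.zeros((1, N_FEATURES))  # weights aggregated by FedAvg
    model.intercept_ = np.zeros(1)
    return model

class SklearnClient(fl.client.NumPyClient):
    def __init__(self, X_train, y_train, X_test, y_test):
        self.model = init_model()
        self.X_train, self.y_train = X_train, y_train
        self.X_test, self.y_test = X_test, y_test

    def get_parameters(self, config):
        # Model weights travel between server and clients as NumPy arrays.
        return [self.model.coef_, self.model.intercept_]

    def fit(self, parameters, config):
        # Install the aggregated global weights, then train locally.
        self.model.coef_, self.model.intercept_ = parameters
        self.model.fit(self.X_train, self.y_train)
        return self.get_parameters(config), len(self.X_train), {}

    def evaluate(self, parameters, config):
        self.model.coef_, self.model.intercept_ = parameters
        accuracy = self.model.score(self.X_test, self.y_test)
        return 1.0 - accuracy, len(self.X_test), {"accuracy": accuracy}

# Server process: FedAvg over one point of the round/client grid in Table 5,
# e.g. 10 communication rounds waiting for 3 clients.
strategy = fl.server.strategy.FedAvg(min_fit_clients=3, min_available_clients=3)
fl.server.start_server(
    server_address="0.0.0.0:8080",
    config=fl.server.ServerConfig(num_rounds=10),
    strategy=strategy,
)

# Each client process would run (with its own private data partition):
# X, y = make_classification(n_samples=200, n_features=N_FEATURES, random_state=0)
# client = SklearnClient(X[:150], y[:150], X[150:], y[150:])
# fl.client.start_numpy_client(server_address="127.0.0.1:8080", client=client)
```

Varying `num_rounds` (5, 10, 15) and the number of connected clients (3, 5, 10) in this sketch corresponds to the experimental grid reported in Table 5; models without native weight vectors (e.g., Random Forest, AdaBoost) require a different parameter-exchange scheme than the coefficient sharing shown here.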
Table 5. Evaluating the performance of machine learning models in federated learning with varying rounds and clients.