Non-iid data and Continual Learning processes in Federated Learning: a long road ahead


participants. That is, making it possible for each user $i \in N$ to have a somewhat personalized model $M_i$, distinct from the others. Some of the personalization strategies discussed in Section 3 can be adapted to deal with different behaviours (see Fig. 3). For instance, having a proper metric of the error would allow the Group-level personalization strategies to cluster the participants according to their behaviour. This approach is very similar to Cohort-based Federated Learning [107], which organizes clients into cohorts with very similar data distributions.
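
As a rough illustration of clustering participants by an error metric, the sketch below assigns each client to whichever candidate model gives it the lowest error. This is only one possible reading of the idea, not the method of [107]: the function `assign_cohorts`, the client names, and the error values are all hypothetical.

```python
from collections import defaultdict


def assign_cohorts(client_errors):
    """Group clients by the candidate model that gives them the lowest error.

    client_errors maps a client id to a list of errors, one per candidate
    model (hypothetical values, e.g. a loss each client measures locally).
    """
    cohorts = defaultdict(list)
    for client_id, errors in client_errors.items():
        # Index of the candidate model with the smallest error for this client.
        best_model = min(range(len(errors)), key=errors.__getitem__)
        cohorts[best_model].append(client_id)
    return dict(cohorts)


# Toy example: clients a and c behave alike, client b behaves differently.
client_errors = {
    "client_a": [0.12, 0.80],
    "client_b": [0.75, 0.10],
    "client_c": [0.15, 0.90],
}
cohorts = assign_cohorts(client_errors)
```

Here clients with similar error profiles end up in the same cohort, which could then share a model; how the candidate models are obtained is left open.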
There are few approaches specifically designed to deal with this kind of non-IID data. One of the closest techniques for tackling this issue, although it does not specifically address FL, is the one deployed in [108]. This strategy consists of adding further pieces of information to the input data, such as a task identifier $z$. This allows two very similar data samples $x_{k_1}$, $x_{k_2}$ to be distinct: $(x_{k_1}, z_1)$, $(x_{k_2}, z_2)$.
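
The effect of a task identifier can be sketched in a few lines. The example below assumes a one-hot encoding of $z$ appended to the feature vector; the encoding, the helper `with_task_id`, and the toy values are assumptions made for illustration and are not taken from [108].

```python
def with_task_id(x, task_id, num_tasks):
    """Return the features x extended with a one-hot task identifier z."""
    if not 0 <= task_id < num_tasks:
        raise ValueError("task_id out of range")
    z = [0.0] * num_tasks
    z[task_id] = 1.0
    return list(x) + z


# Two identical samples coming from different tasks (hypothetical values).
x_k1 = [0.5, 1.0]
x_k2 = [0.5, 1.0]
aug_1 = with_task_id(x_k1, task_id=0, num_tasks=2)
aug_2 = with_task_id(x_k2, task_id=1, num_tasks=2)
```

Although `x_k1` and `x_k2` are equal, the augmented inputs `aug_1` and `aug_2` differ, so a model can map them to different outputs.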
To conclude, there are also works at the intersection of FL and Multi-task Learning [48, 109]. The first one was already discussed in Section 3.2. The latter also combines the multi-task framework with the federated setting. It performs experiments using the MNIST, EMNIST, and Shakespeare datasets, and compares itself with well-known federated algorithms such as FedAvg and FedProx.
4. Non-IID Data in Continual Learning: Concept Drift
In this section, we consider Continual Learning (CL) problems, which involve training models over time. In the standard ML setting, the objective is to build a prediction model using a certain amount of data. A key point is that the training dataset is typically assumed to be fully available from the beginning, which may conflict with realistic situations, where data is collected progressively and changes over time. For that reason, it is convenient to talk about CL, an ML setting in which models continuously learn and evolve using new streams of data samples, while aiming to retain preceding concepts. This kind of framework has been given different names over the years [110], like Lifelong Learning [111, 112], Never-Ending Learning [113, 114] and Incremental Learning [115–117], but all of them rely on the same ideas: training a model gradually with data collected over different periods of time, adapting to the new instances and trying to preserve the previous knowledge.

M.F. Criado et al., Information Fusion 88 (2022) 263–280, p. 270
We introduce the CL framework because we aim to discuss the time-evolving condition of FL problems. However, throughout this section we cite and briefly describe works focused on CL that do not necessarily consider the FL framework. This is because, as already mentioned, almost no works focus on both FL and CL simultaneously [9, 10, 118]. Nonetheless, the works we consider are, from our point of view, the ones that would be most easily adaptable to the FL framework, with multiple devices collaborating to achieve the same global model. We explain how each strategy could be modified when discussing it.
Training a model using CL techniques presents some specific problems, which have already been studied in recent literature. The most challenging ones are, as with FL, related to the data distribution. CL was conceived as a centralized ML paradigm, so non-IID data across devices has not been discussed or handled so far; the data can, however, evolve over time. This is a complication, as the model may be unable to converge to a solution if the training data shifts constantly. Another undesirable situation, named catastrophic forgetting, is that the model completely and abruptly forgets previously learned concepts if they are no longer present in the current data [119, 120]. For these reasons, we focus on how data behaves as time goes by, and on how to act if the data shifts drastically, in unpredictable ways. This is commonly known as concept drift [7, 110].
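
A minimal way to observe data shifting over time is to compare how often each label appears in consecutive windows of a stream. The sketch below is only an illustration under assumed names (`window_frequencies`, the toy `stream`, and the fixed, non-overlapping window size are not from the original text); it is not a drift detector.

```python
from collections import Counter


def window_frequencies(labels, window_size):
    """Relative frequency of each label in consecutive, non-overlapping windows."""
    freqs = []
    for start in range(0, len(labels), window_size):
        window = labels[start:start + window_size]
        counts = Counter(window)
        # Labels absent from a window are simply missing, i.e. frequency zero.
        freqs.append({label: n / len(window) for label, n in counts.items()})
    return freqs


# Toy stream: "bird" is unseen at first and dominates later.
stream = ["cat", "cat", "dog", "cat", "dog", "bird", "bird", "bird"]
freqs = window_frequencies(stream, window_size=4)
```

In this toy stream, "bird" has frequency zero in the first window and a positive frequency in the second, while "cat" disappears; a model trained only on the latest window would no longer see "cat" at all.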
4.1. Concept drift definition
The non-stationary data distribution is caused by changes in data over time. These changes can be seen as variations in the frequency with which certain kinds of data appear: a concept has frequency zero if it has not yet appeared in the dataset, and when it shows up its frequency becomes a positive number. This kind of variation, called concept drift, is one of the most important CL challenges [110, 121]. We can formally define it as follows:
