Non-iid data and Continual Learning processes in Federated Learning: a long road ahead

participant to obtain its own model after the training process. When trying to personalize a global model trained in a distributed setting, researchers have studied and proposed a multitude of options. These approaches fall into two classes:
(A) On the one hand, pre-existing ML techniques, such as Transfer Learning or Multi-task Learning, adapted to the FL framework to develop a better global model.
(B) On the other hand, different implementations of simultaneous two-level learning, local and global. Once the global model is obtained, each participant combines it with their local model, which was trained simultaneously with the global one, and thus attains their personalized model.
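As a minimal illustration of the combination step in class (B), the sketch below mixes a global and a local parameter vector with a weighted mean, which is the rule this section later attributes to [51]. The function name `personalize`, the mixing weight `alpha`, and the representation of a model as a flat list of floats are hypothetical choices for illustration, not taken from any specific proposal.

```python
from typing import List


def personalize(global_params: List[float], local_params: List[float], alpha: float) -> List[float]:
    """Hypothetical combination rule: convex mix of global and local parameters.

    alpha = 1.0 returns the global model, alpha = 0.0 the purely local one.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    if len(global_params) != len(local_params):
        raise ValueError("both models must have the same number of parameters")
    return [alpha * g + (1.0 - alpha) * loc for g, loc in zip(global_params, local_params)]


global_w = [1.0, 2.0, 3.0]  # model agreed on by all clients
local_w = [3.0, 2.0, 1.0]   # model trained only on this client's data
print(personalize(global_w, local_w, alpha=0.5))  # [2.0, 2.0, 2.0]
```

Choosing `alpha` per client lets participants with plenty of representative local data lean towards their own model, while data-poor clients stay closer to the global one.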
The first type of solutions (A) includes proposals such as Transfer Learning techniques that adapt models pre-trained on public datasets to a set of devices [40, 41]. Other approaches in this line search for a common representation of the data at each client, so that local updates are resilient to domain shifts and the global model therefore performs well under non-IID data distributions [42]. There are also adaptations of Regularization Methods, which add a penalization term to the loss function. This is the case of pFedMe [43] and pFedAtt [44]. The first of them, pFedMe, adds a term to the loss function that penalizes local (personalized) models that deviate strongly from the global model. Moreover, the resulting loss function is a Moreau envelope, a well-known mathematical object that allows the authors to

Information Fusion 88 (2022) 263–280
267
M.F. Criado et al.
provide convergence guarantees and further accuracy analysis. pFedAtt, in turn, introduces a regularization term that increases the contribution of similar updates, grouping analogous clients' updates and facilitating convergence in non-IID scenarios where the differences lie in the input distributions. Another idea is to adapt algorithms from the Model-Agnostic Meta-Learning (MAML) setting, such as Reptile, to a federated setting [45, 46]. These works aim to train a model that can easily adapt to different tasks. To achieve that, they explore how different data representations affect model training and choose the most general representation, since it adapts faster to any of the target tasks. In a similar vein, some authors consider that the particularities of each device are different enough to assume that the training participants are performing different tasks [47, 48], thus bringing methods based on Multi-task Learning, such as MOCHA [48], into the FL paradigm. All of the above ideas reinterpret what one could understand by personalization, using different points of view to reuse already known techniques.
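To make the Moreau-envelope idea behind pFedMe more concrete: given a client loss f_i and the current global model w, the personalized model is θ_i = argmin_θ { f_i(θ) + (λ/2)·||θ − w||² }, and the client's envelope objective is the value of that minimum. The sketch below approximates θ_i with plain gradient descent on a toy quadratic loss f_i(θ) = ½·||θ − c_i||², whose exact minimizer (c_i + λw)/(1 + λ) lets us check the result. The toy loss, the vector `c`, the step size and the iteration count are assumptions for illustration; this is not the pFedMe authors' implementation.

```python
from typing import Callable, List

Vector = List[float]


def personalized_model(grad_f: Callable[[Vector], Vector], w: Vector, lam: float,
                       lr: float = 0.1, steps: int = 200) -> Vector:
    """Approximate argmin_theta f(theta) + lam/2 * ||theta - w||^2 by gradient descent."""
    theta = list(w)  # warm start at the global model
    for _ in range(steps):
        g = grad_f(theta)
        # Gradient of the regularized objective: grad f(theta) + lam * (theta - w).
        theta = [t - lr * (gi + lam * (t - wi)) for t, gi, wi in zip(theta, g, w)]
    return theta


# Hypothetical client loss f(theta) = 0.5 * ||theta - c||^2, so grad f(theta) = theta - c.
c = [4.0, -2.0]  # optimum of this client's local data (illustrative)
w = [0.0, 0.0]   # current global model
lam = 1.0
theta = personalized_model(lambda th: [t - ci for t, ci in zip(th, c)], w, lam)
exact = [(ci + lam * wi) / (1 + lam) for ci, wi in zip(c, w)]
print([round(x, 4) for x in theta], exact)  # [2.0, -1.0] [2.0, -1.0]
```

A larger λ pulls θ_i towards the global model w, while λ → 0 recovers purely local training; this is the knob the regularization term provides.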
The latter class of approaches (B) focuses on training two distinct models in parallel for each device. One of them is the global model, trained jointly by every client on its own data following the standard federated baseline, whereas the other remains private and is used to adjust the result of the federated training. The proposals vary both in how the global model is obtained and in how the private model is taken into account. Here we present four alternatives. In [49, 50], clients train a Deep Neural Network (DNN) with the requirement that the last few layers are not shared: each client trains them separately. The shared layers thus play the role of the global model, while the last layers act as the personalization model, allowing different participants to obtain different outputs for similar inputs. In [51], clients follow probabilistic training steps. A probability 𝑝 ∈ (0, 1) is fixed at the beginning, and in each training round participants execute one local step of Stochastic Gradient Descent with probability 𝑝, or send their local models to the server with probability 1 − 𝑝. Unlike in common federated frameworks, the global model is computed and distributed to each client, but clients do not apply their subsequent updates on top of that model. Instead, each client computes another model as a weighted mean of the global model and its local one, so in the end every client obtains its own model. The FedProx method [52], for its part, trains a model similarly to FedAvg, but allows some variation among the local updates before they converge and keeps a measure of the dissimilarity between updates throughout the training process. At the end of the training procedure, each participant receives the global model and is allowed to modify it slightly according to its dissimilarity measure. Finally, in [53] devices train a global model as is usually done in FL, but whenever a client participates in a training round, it trains a local model at the same time as the global one. Once training is finished, each client adjusts the obtained global model using its local model.
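The split between shared and private layers used in [49, 50] can be sketched as follows: the server averages only the layers marked as shared, and each client keeps its last layers untouched. Here a model is a dict from layer name to a flat list of weights; the layer names, the `SHARED` set and the uniform (unweighted) averaging are assumptions for illustration, and the cited works may weight clients differently, for example by local dataset size as FedAvg does.

```python
from typing import Dict, List

Model = Dict[str, List[float]]  # layer name -> flattened weights

# Hypothetical base layers shared with the server; any other layer ("head") stays private.
SHARED = {"conv1", "conv2"}


def aggregate_shared(client_models: List[Model]) -> Model:
    """Server step: average only the shared layers across clients (uniform weights)."""
    n = len(client_models)
    return {
        name: [sum(vals) / n for vals in zip(*(m[name] for m in client_models))]
        for name in SHARED
    }


def apply_global(client: Model, shared: Model) -> Model:
    """Client step: overwrite the shared layers, keep the private layers as they are."""
    updated = dict(client)
    updated.update({name: list(weights) for name, weights in shared.items()})
    return updated


clients = [
    {"conv1": [1.0, 1.0], "conv2": [0.0], "head": [5.0]},
    {"conv1": [3.0, 3.0], "conv2": [2.0], "head": [-5.0]},
]
shared = aggregate_shared(clients)
clients = [apply_global(c, shared) for c in clients]
print(clients[0])  # {'conv1': [2.0, 2.0], 'conv2': [1.0], 'head': [5.0]}
print(clients[1])  # {'conv1': [2.0, 2.0], 'conv2': [1.0], 'head': [-5.0]}
```

After aggregation both clients hold identical base layers but different heads, which is what lets them produce different outputs for similar inputs.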
The experimental results of some of these proposals are quite remarkable. For instance, in [42] the authors show experimentally that the global model obtained with their method, trained on the MNIST dataset, also performs well on a rotated version of the same dataset, whereas the accuracy of standard FedAvg drops significantly. [45] compares the proposed meta-learning method with other personalization strategies, such as a fine-tuning baseline [54] and a k-nearest neighbour baseline [55], outperforming both in accuracy by over 10%. [49] trains two well-known DNNs, MobileNet-
