v1 and ResNet-34, adding some personalization layers. They use the
CIFAR-100 and FLICKR-AES [56] datasets, and split the data among
clients imposing a non-IID division. Although they only compare both
DNNs with standard FedAvg, they greatly improve the accuracy, reaching
80% on both datasets, while FedAvg barely reaches 60% with CIFAR-100,
and 40% with FLICKR-AES. [52] compares FedProx with FedAvg using
the MNIST and Shakespeare datasets in both IID and non-IID situations.
In the IID scenario, FedAvg performs slightly better. However, FedProx
outperforms it by a wide margin in the non-IID setting, and also
improves the convergence rate in both cases. Lastly, [53] uses the
MNIST, EMNIST and CIFAR-10 [57] datasets to test the accuracy of
their APFL [53] strategy. They obtain an accuracy of 89% over a non-
IID split of the dataset, as opposed to FedAvg and FedAvg with
fine-tuning, which obtain 32% and 83% accuracy, respectively.
3.2.2. Group-level personalization

In contrast to attaining a distinct model for each client, another
alternative is what we call Group-level personalization, which consists
of gathering the devices in clusters and training a different model for
each of them. This line of research emerged only a couple of years ago,
so it presumably has not achieved its full potential yet. For this
reason, there are only a few approaches to mention, and all of them
are very recent.
The most discussed topic in this area of research is how to split (or
assemble) participants in order to obtain the groups that would benefit
the most from sharing a model. One set of approaches is based on
adopting hierarchical clustering techniques to partition the clients
[58–60]. Their strategy relies on a measure of distance between the
weight updates the clients send to the server, used either to gather
[58,59] or to divide [60] them. However, there is no mathematical
guarantee that participants who send similar updates would benefit each
other: it could be the case that some participants collect very
different data, yet the updates they generate are close to each other.
Similarly, nothing ensures that the ones who send different updates
would be better off apart. The only justifiable statement is that these
participants would reach convergence faster than others, regardless of
the performance of the obtained model.
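The clustering step these works share can be sketched as follows: flatten each client's weight update, compute pairwise distances, and cut an agglomerative tree into groups. This is an illustrative reconstruction (the distance metric, linkage, and cluster count are assumptions; [58–60] each differ in these details):

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import pdist

def cluster_clients(updates, n_clusters=2):
    """Group clients by hierarchical clustering of their weight updates.

    `updates` is an (n_clients, n_params) array of flattened model
    deltas sent to the server; clients whose updates point in similar
    directions (small cosine distance) fall into the same cluster.
    """
    dists = pdist(updates, metric="cosine")   # condensed pairwise distances
    tree = linkage(dists, method="average")   # agglomerative cluster tree
    return fcluster(tree, t=n_clusters, criterion="maxclust")

# Two synthetic client populations pushing the model in opposite directions.
rng = np.random.default_rng(0)
base = rng.normal(size=10)
updates = np.stack(
    [base + 0.1 * rng.normal(size=10) for _ in range(3)]
    + [-base + 0.1 * rng.normal(size=10) for _ in range(3)]
)
labels = cluster_clients(updates, n_clusters=2)
```

Note that this sketch illustrates precisely the caveat raised above: the grouping depends only on update geometry, with no guarantee about the underlying data distributions.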
The other approach to performing this kind of personalization concerns
the global data distribution and the local distributions of the clients.
The main point of the works in this line of research [61–63] is
that if data is non-IID among the devices, then a shared global model
cannot fit all of the data samples of every client. When this
happens, the global distribution may not represent the singularities of
some participants, and thus the global model should not be trained
according to that distribution. What these works propose is a method
to determine the global distribution 𝐷_𝛬 that best represents the
different clients. This distribution does not necessarily coincide with
the weighted mean global distribution (Eq. (1)). Clients are then
grouped according to some private parameters that depend on 𝐷_𝛬. This
strategy avoids using the local updates to form the clusters and
develops theoretical guarantees to justify that clustering the
participants this way benefits the final model.
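To make the contrast with update-based clustering concrete, a distribution-based grouping can be sketched with local label histograms as a simple stand-in for each client's data distribution. This is only an illustration of the general idea; [61–63] instead derive private parameters from the estimated global distribution 𝐷_𝛬, and the greedy rule and threshold below are our own assumptions:

```python
import numpy as np

def group_by_distribution(client_labels, n_classes=10, eps=0.3):
    """Greedily group clients by local label distribution: a client joins
    the first existing group whose representative histogram is within
    total-variation distance `eps`, otherwise it starts a new group.
    Unlike update-based clustering, this never looks at model weights.
    """
    hists = [np.bincount(y, minlength=n_classes) / len(y) for y in client_labels]
    reps, groups = [], []
    for h in hists:
        for g, r in enumerate(reps):
            if 0.5 * np.abs(h - r).sum() <= eps:   # total-variation distance
                groups.append(g)
                break
        else:                                       # no close group found
            reps.append(h)
            groups.append(len(reps) - 1)
    return groups

# Two client populations: labels drawn from classes 0-4 vs. classes 5-9.
rng = np.random.default_rng(1)
clients = [rng.integers(0, 5, 200) for _ in range(3)] \
        + [rng.integers(5, 10, 200) for _ in range(3)]
groups = group_by_distribution(clients)
```

In a real deployment the histograms could not be shared in the clear; the cited works rely on private parameters precisely to avoid exposing local distributions.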
Concerning the experimental results, [58–60] perform training on the
MNIST, FEMNIST, and CIFAR-100 [57] benchmark datasets, dividing
the samples among the clients and swapping labels in some of them
to produce different behaviours, 𝑃(𝑦|𝑥). They achieve higher accuracy
and a better convergence rate than standard FedAvg. However, these
methods do not compare themselves with any algorithm other than FedAvg,
which is designed to tackle problems only in IID scenarios. On the
other hand, [61] improves the result obtained by FedAvg on the task
of digit image recognition using the MNIST dataset, in a centralized
framework. [62,63] compare themselves with FedAvg in decentralized
settings using the Fashion MNIST and Extended MNIST (EMNIST) [64]
datasets, and they obtain a similar accuracy. On the whole, the most
remarkable improvement accomplished with these kinds of personalization
methods so far is their convergence speed. In Table 2 we summarize the
datasets used in these works.
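The label-swapping scheme mentioned above, used to induce different conditional distributions 𝑃(𝑦|𝑥) across clients, can be sketched in a few lines (which class pairs are swapped, and on which clients, varies across the cited works):

```python
import numpy as np

def swap_labels(y, pairs):
    """Return a copy of the label vector with each pair of classes
    exchanged.  Two clients holding identically distributed inputs x
    then disagree on the labels, i.e. they have different P(y|x)."""
    y_new = y.copy()
    for a, b in pairs:
        y_new[y == a] = b
        y_new[y == b] = a
    return y_new

y = np.array([0, 1, 2, 1, 0])
y_swapped = swap_labels(y, pairs=[(0, 1)])
# classes 0 and 1 are exchanged: [1, 0, 2, 0, 1]
```

Applying different swap pairs to different subsets of clients yields exactly the kind of concept-shifted partition these works cluster on.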
3.3. Statistical taxonomy for non-IID strategies

Once we have explained the existing personalization methods in FL,
we now want to delve deeper into the other classification we made,
based strictly on the kind of non-IID data the different works face.
Going
Information Fusion 88 (2022) 263–280 268
M.F. Criado et al.