own data samples unbalanced over the possible classes, the difference between probabilities would lie in the terms 𝑃(𝑦) or 𝑃(𝑦|𝑥). However, in this work we focus on the factorization given by Eq. (3), since the conditional probability 𝑃(𝑥|𝑦) in Eq. (4) is at odds with the natural way of training a model, which consists of predicting 𝑦 based on 𝑥 and not the other way around.
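For readability, we restate here the two factorizations of the joint distribution, as inferred from the surrounding discussion of Eqs. (3) and (4):
\[
P(x,y) = P(x)\,P(y \mid x) \quad \text{(Eq. (3))}, \qquad
P(x,y) = P(y)\,P(x \mid y) \quad \text{(Eq. (4))}.
\]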
If the data probabilities belonging to two participants, 𝑃𝑖(𝑥, 𝑦) and 𝑃𝑗(𝑥, 𝑦), are the same, then both factors are equal: 𝑃𝑖(𝑥) = 𝑃𝑗(𝑥) and 𝑃𝑖(𝑦|𝑥) = 𝑃𝑗(𝑦|𝑥). Notice that every time we say two probability distributions are the same we mean they are alike in statistical terms, i.e., they cannot be recognized as different using a standard hypothesis test. If, on the contrary, the joint probabilities are not the same, there are three possible scenarios according to the previous factorization (illustrated in the code sketch after the list):
(i) 𝑃𝑖(𝑥) ≠ 𝑃𝑗(𝑥) and 𝑃𝑖(𝑦|𝑥) = 𝑃𝑗(𝑦|𝑥). In this kind of situation, clients own data samples from different domains, but they share the same goal. This could be the case of, for example, participants collecting data for training an autonomous car. Some users may drive on the left and some others on the right, and they will face different circumstances. This skews the input spaces of the different participants. However, they gather data with one common objective, and they are expected to act similarly.
(ii) 𝑃𝑖(𝑥) = 𝑃𝑗(𝑥) and 𝑃𝑖(𝑦|𝑥) ≠ 𝑃𝑗(𝑦|𝑥). This sort of scenario occurs when the input spaces perceived by the clients are analogous, but their outputs are not. A real situation where this could happen, related to the training of an autonomous car, is a yellow traffic light. When encountering a yellow traffic light, the correct output for some participants would be to stop the car, and for some others to continue driving without changes. This causes incompatibilities among the clients.
(iii) 𝑃𝑖(𝑥) ≠ 𝑃𝑗(𝑥) and 𝑃𝑖(𝑦|𝑥) ≠ 𝑃𝑗(𝑦|𝑥). This is a combination of the two previous situations: participants want to learn a common task, such as driving, but their input spaces are significantly unequal, and their reactions to some of the inputs are different too.
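To make these cases concrete, the following Python sketch (ours, not from the paper; the distributions and the toy labeling rule are hypothetical choices) simulates scenarios (i) and (ii) for two clients, and uses a two-sample Kolmogorov–Smirnov test as one possible instance of the standard hypothesis testing mentioned above. Scenario (iii) simply combines both shifts.

```python
# Minimal synthetic illustration of non-IID scenarios (i) and (ii).
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

def label(x, flipped=False):
    """Toy conditional P(y|x): a threshold on x. 'flipped' models a
    client that reacts differently to the same input (concept shift)."""
    y = (x > 0).astype(int)
    return 1 - y if flipped else y

# (i) Covariate shift: P_i(x) != P_j(x), identical labeling rule P(y|x).
x_i, x_j = rng.normal(-1.0, 1.0, 1000), rng.normal(1.0, 1.0, 1000)
y_i, y_j = label(x_i), label(x_j)

# (ii) Concept shift: same P(x), different P(y|x).
x_i2, x_j2 = rng.normal(0.0, 1.0, 1000), rng.normal(0.0, 1.0, 1000)
y_i2, y_j2 = label(x_i2), label(x_j2, flipped=True)

# A two-sample Kolmogorov-Smirnov test is one standard way to decide
# whether the marginals P_i(x) and P_j(x) are statistically distinguishable.
print(stats.ks_2samp(x_i, x_j).pvalue)    # ~0: marginals differ (case i)
print(stats.ks_2samp(x_i2, x_j2).pvalue)  # large: same marginal (case ii)
```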
On the whole, this gives us a total of four different situations to account for. We represent them in Table 1, along with the different works that deal with each situation. Many techniques could fit the cell of IID data, such as FedAvg [11]. However, that situation is outside the scope of our work, and we will focus on the non-IID scenarios. Most of the works that consider heterogeneous data problems do not provide a classification of non-IID data, nor do they worry about the kind of heterogeneity they are trying to deal with. However, we locate them in Table 1 according to the kind of non-IID data situation that they can solve. For this reason, we establish two possible classifications of the FL non-IID research.
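For context, a minimal sketch of the FedAvg aggregation step mentioned above [11]: the server averages the clients' parameter vectors, weighted by their local sample counts. The function and variable names here are ours.

```python
import numpy as np

def fedavg(client_weights, client_sizes):
    """FedAvg server step: weighted average of client parameters,
    with weights proportional to each client's number of samples."""
    total = sum(client_sizes)
    return sum((n / total) * w for w, n in zip(client_weights, client_sizes))

# Example round with three clients holding different data volumes.
w_global = fedavg(
    [np.array([0.2, 1.0]), np.array([0.4, 0.8]), np.array([0.3, 0.9])],
    [100, 300, 600],
)
print(w_global)  # [0.32 0.88]
```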
The first way of classifying the different strategies is based on how the works place themselves in the FL context. Most of the FL works that deal with heterogeneous data describe their approach as Personalization, aiming to increase accuracy over the different clients. This personalization can be performed at different levels. In Section 3.2 we briefly explain these kinds of methods. It should be noticed that, although these strategies deal with data heterogeneity, they are unaware of which probability density function is varying in each situation.
On the other hand, once we have discussed and explained the different types of non-IID data that could exist, it seems very reasonable to classify the strategies according to the type of non-IID data they face. This classification encompasses the previous one, and at the same time it opens the door to considering other ML techniques that also deal with heterogeneous data and are close to FL. We describe these techniques in Section 3.3.
Table 1
Non-IID learning scenarios in Federated Learning, and the strategies that could potentially solve each situation. Strategies that deal with changes in both the input space and the behaviour are placed only in the last column, and not in the previous ones.
3.2. State-of-the-art classification: Personalization strategies

In this section, we focus on the first classification we just mentioned.
Some of the works present in the FL literature try to balance the generalized knowledge learned from the whole set of clients with the specificity of each of them. This line of thinking gives rise to personalization techniques, which precisely aim to grant more importance to the particular information of individual clients. It has been empirically shown that in realistic situations one global model cannot fit the particularities of all clients [37]. In fact, some clients may have opposite interests, so we must open the door to the possibility of having more than just one global model. Personalization arises as a compromise between model generalization and individualization, so that the model can learn not only the general knowledge but also the uncommon, client-specific one.
Personalization can be implemented at different levels: each participant could have its own model, distinct from all of the others, which we will refer to as Client-level Personalization; or there could be groups of clients sharing the same model, i.e., Group-level Personalization. Both options present some advantages, as they achieve better model performance, although their main drawback is that they are computationally more demanding. A minimal sketch of both levels follows.
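The sketch below is our own illustration (not a method from the surveyed works), assuming a linear model and a generic local training routine; the names `local_train`, `personalize_per_client`, `personalize_per_group`, and the clustering assignment `group_of` are hypothetical.

```python
import numpy as np

def local_train(w, data, lr=0.1, steps=10):
    """Stand-in for any local routine: a few gradient steps of a
    linear least-squares model on the client's (x, y) pairs."""
    x, y = data
    for _ in range(steps):
        w = w - lr * 2 * x.T @ (x @ w - y) / len(y)
    return w

# Client-level personalization: each client fine-tunes its own copy
# of the global model on its local data.
def personalize_per_client(w_global, client_data):
    return [local_train(w_global.copy(), d) for d in client_data]

# Group-level personalization: clients assigned to the same group
# (e.g., via clustering) share a single model trained on pooled data.
def personalize_per_group(w_global, client_data, group_of):
    pooled = {}
    for k, (x, y) in enumerate(client_data):
        xs, ys = pooled.setdefault(group_of[k], ([], []))
        xs.append(x)
        ys.append(y)
    return {
        g: local_train(w_global.copy(), (np.vstack(xs), np.concatenate(ys)))
        for g, (xs, ys) in pooled.items()
    }
```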
3.2.1. Client-level personalization

Client-level personalization refers to the approaches that allow each