Scenario D. Behaviour heterogeneity. This last scenario emulates
a situation where some of the participants present different behaviours.
As in Scenario B, each client only has data samples from 1 domain,
and the clients selected for testing are randomized. However, we have
altered all of the labels from the SVHN dataset, so 10 of the clients
present data labelled differently with respect to the others, and 3 of
them are selected for testing. In Figs. 7(g) and 7(h) we can see that
FedAvg and FedProx obtain similar accuracies, 63% and 64% respectively,
because neither of them is designed to face this kind of heterogeneity.
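As an illustration of how this kind of behaviour heterogeneity could be emulated, the following is a minimal sketch, not the procedure actually used in the experiments. The text only states that the SVHN labels were altered, so the cyclic shift of class indices, the client layout (dictionaries with "domain" and "labels" keys) and the function names are all assumptions made for the example.

```python
def alter_labels(labels, shift=1, num_classes=10):
    """Hypothetical alteration: cyclically shift every class index.

    The source does not specify how the labels were changed; a shift is
    just one simple way to make a client label its data differently.
    """
    return [(y + shift) % num_classes for y in labels]


def apply_behaviour_heterogeneity(clients, altered_domain="SVHN"):
    """Return a copy of the clients where one domain is relabelled.

    `clients` is an assumed layout: a list of dicts, each with a
    'domain' name and a list of integer 'labels'.
    """
    result = []
    for client in clients:
        labels = list(client["labels"])
        if client["domain"] == altered_domain:
            labels = alter_labels(labels)
        result.append({"domain": client["domain"], "labels": labels})
    return result


clients = [
    {"domain": "SVHN", "labels": [0, 9]},
    {"domain": "other", "labels": [0, 9]},  # placeholder domain name
]
altered = apply_behaviour_heterogeneity(clients)
```

Under these assumptions, only the clients holding SVHN data end up with a labelling function that disagrees with the rest of the federation, while the input features remain untouched.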
6.2. Temporal non-IID scenarios
In this scenario, data varies over time, but it presents the same
characteristics and properties in all of the clients, and it varies at the
same time for all of them. This type of temporal evolution is much
more predictable and manageable than what we would expect in a real-life
problem, but it is enough to deteriorate the federated models, as we
illustrate below. Again, for our experiments we selected two different
algorithms, FedAvg and CDA-FedAvg. Recall that CDA-FedAvg [8] is a
method designed to face changes in 𝑃(𝑥) over time. On the other hand,
FedAvg is not a method designed for CL. It expects to have all the data
available from the beginning of the training process, and that is not the
case here.
To naively adapt FedAvg to this setting, we have fixed the number of
data samples that each client selected for training processes from its
dataset, 800, and uses to perform the training stage. After that, those
data samples are stored, and the next time that client trains a model it
will use those 800 samples together with 800 new data samples. We have
also fixed a maximum amount of data, 5000, that each client can store
and use for training, i.e., once that number of samples has been
processed, the training dataset starts forgetting the first samples
collected. The purpose of this restriction is to emulate a real problem,
in which the different devices may not be capable of handling all the
information they capture. These same quantities and limitations are
fixed for CDA-FedAvg, although this method only stores data samples
when detecting a drift, so its management of resources is more efficient.
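The storage policy described for the naive FedAvg adaptation can be sketched as a bounded first-in, first-out buffer. This is only an illustrative sketch under stated assumptions: the class name ClientBuffer, its method names and the use of an iterable as the client's data stream are hypothetical, and only the two constants (800 new samples per training round, at most 5000 stored samples) come from the text.

```python
from collections import deque

CHUNK_SIZE = 800   # new samples processed each time the client trains
MAX_STORED = 5000  # maximum number of samples a client may keep


class ClientBuffer:
    """Hypothetical local storage for one client in the naive FedAvg setup."""

    def __init__(self, stream, chunk_size=CHUNK_SIZE, max_stored=MAX_STORED):
        self._stream = iter(stream)
        self.chunk_size = chunk_size
        # A deque with maxlen silently drops the oldest item on overflow,
        # which mimics "forgetting the first samples collected".
        self.stored = deque(maxlen=max_stored)

    def next_training_set(self):
        """Read up to chunk_size new samples and return all stored samples."""
        for _ in range(self.chunk_size):
            try:
                self.stored.append(next(self._stream))
            except StopIteration:
                break  # the client has no more data to read
        return list(self.stored)


# Usage: integers stand in for data samples.
buffer = ClientBuffer(range(10_000))
sizes = [len(buffer.next_training_set()) for _ in range(7)]
```

With these values the training set grows by 800 samples per round until the seventh round, where the cap of 5000 is reached and the oldest 600 samples are discarded. A drift-aware method such as CDA-FedAvg would fill its storage differently, since it only stores samples when a drift is detected.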