Non-iid data and Continual Learning processes in Federated Learning: a long road ahead


participant could carry out these calculations and the central server
could build the common input space based on all of the local spaces
estimated by the clients.
A serious difficulty when deploying this kind of method is that
feature spaces in realistic problems tend to be very high-dimensional,
which demands more processing capacity and can lead to inaccurate
results due to the curse of dimensionality. For instance, [69–71]
consider that each domain may have its own set of features to
characterize the samples, causing incompatibilities across domains,
and they develop methods to extract a common feature representation.
A different approach, followed in [72–75], consists of constructing a
factorization of the feature space with certain properties. [72,73]
split the feature space into two orthogonal subspaces: one of them
contains the domain variations, whereas the other one keeps the common
parts, and both are used separately to perform learning. [74,75]
divide the input space into an arbitrary number of low-dimensional
spaces and apply Distance Metric Learning (DML) techniques [76,77] in
each of them.
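As a minimal illustration of the orthogonal split described for
[72,73], the sketch below decomposes a sample into a component lying in
a "common" subspace and an orthogonal residual that would carry the
domain variations. How the cited works actually learn these subspaces
is not covered here: the orthonormal basis `common_basis` and the
function names are assumptions made only for this example.

```python
def dot(u, v):
    """Euclidean inner product of two equally sized vectors."""
    return sum(a * b for a, b in zip(u, v))


def split_orthogonal(x, common_basis):
    """Split x into (common, domain) parts.

    common_basis is a hypothetical orthonormal basis of the shared
    subspace; 'common' is the projection of x onto that subspace and
    'domain' is the residual, which is orthogonal to it.
    """
    common = [0.0] * len(x)
    for b in common_basis:
        coeff = dot(x, b)  # coordinate of x along basis vector b
        common = [c + coeff * bi for c, bi in zip(common, b)]
    domain = [xi - ci for xi, ci in zip(x, common)]
    return common, domain


# Toy example: the first two axes are assumed to be shared by all domains.
x = [3.0, 4.0, 5.0]
common_basis = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
common, domain = split_orthogonal(x, common_basis)
```

Each part could then be passed to a separate learner, which is the
sense in which both subspaces are "used separately" above.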
On the other hand, Domain Adaptation methods [78–80] work with a
distinct situation, close to Transfer Learning, and they are in
general deployed in centralized frameworks. However, some works
aligned with this line of research address Federated Transfer
Learning [81] and consider the FL settings. In these cases, users are
assumed to train a model with data belonging to some domains, and then
apply that same model to get predictions on other domains. In this
kind of setting, the samples drawn for training are commonly said to
belong to Source Domains, while the samples used for prediction belong
to Target Domains. The most important difference with the previous
category is that in this case there are no training data from the
Target Domains, so it is not possible to create an appropriate feature
space for learning those domains. Some other works in this line of
research [82–85] present a variety of methods to measure the
dissimilarity between the source and target domains, such as Maximum
Mean Discrepancy (MMD) [82] or Moment Matching for Multi-source DA
(M³SDA) [83], and adjust the training model in an unsupervised manner
[84,85].
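To make the notion of a source/target dissimilarity concrete, the
following is a minimal sketch of the classic (biased) kernel estimate
of the squared MMD between two sample sets. It is not the exact
procedure of [82], which may use other kernels or estimators; the RBF
kernel, the `gamma` value and the toy data are assumptions chosen for
illustration.

```python
import math


def rbf_kernel(a, b, gamma=1.0):
    """Gaussian (RBF) kernel between two points."""
    sq_dist = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return math.exp(-gamma * sq_dist)


def mean_kernel(xs, ys, gamma):
    """Average kernel value over all pairs (x, y)."""
    total = sum(rbf_kernel(x, y, gamma) for x in xs for y in ys)
    return total / (len(xs) * len(ys))


def mmd_squared(source, target, gamma=1.0):
    """Biased estimate of MMD^2: E[k(s,s')] + E[k(t,t')] - 2 E[k(s,t)]."""
    return (mean_kernel(source, source, gamma)
            + mean_kernel(target, target, gamma)
            - 2.0 * mean_kernel(source, target, gamma))


# Toy data: one target close to the source, one far away from it.
source = [[0.0, 0.0], [0.1, -0.1], [-0.1, 0.1]]
near_target = [[0.05, 0.0], [0.0, 0.05], [-0.05, -0.05]]
far_target = [[3.0, 3.0], [3.1, 2.9], [2.9, 3.1]]

mmd_near = mmd_squared(source, near_target)
mmd_far = mmd_squared(source, far_target)
```

A small value suggests that the two sets of samples are hard to tell
apart under the chosen kernel, so in this toy case `mmd_near` is much
smaller than `mmd_far`.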
Another possibility for adapting the domains, which also consists of
calculating distances among the data distributions, makes use of that
information to re-weight the samples from the closest ones and so
improve the model performance. [86,87] apply this method using the
Kullback–Leibler divergence [88], a well-known measure from
information theory.
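The re-weighting idea can be sketched as follows, assuming each
distribution is summarized as a histogram over the same bins. The
weighting rule used here (an exponential of the negative divergence,
normalized to sum to one) and the `temperature` parameter are
hypothetical choices for illustration, not the scheme of [86,87].

```python
import math


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions over the same bins."""
    return sum(pi * math.log((pi + eps) / (qi + eps))
               for pi, qi in zip(p, q) if pi > 0)


def domain_weights(target, sources, temperature=1.0):
    """Hypothetical rule: weight each source by exp(-KL / temperature)."""
    scores = [math.exp(-kl_divergence(target, s) / temperature)
              for s in sources]
    total = sum(scores)
    return [s / total for s in scores]


target = [0.5, 0.3, 0.2]
sources = [
    [0.5, 0.3, 0.2],  # identical to the target
    [0.4, 0.4, 0.2],  # slightly different
    [0.1, 0.1, 0.8],  # very different
]
weights = domain_weights(target, sources)
```

Sources whose distribution is closer to the target receive larger
weights. Note that the Kullback–Leibler divergence is not symmetric,
so the direction in which it is computed (here, target relative to
each source) is itself a design choice.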
A different approach for Domain Adaptation is based on Generative
Adversarial Networks (GANs) [89–93]. This strategy, which achieves
remarkable results, consists of training two neural networks
simultaneously: one of them is designed to create fake input data from
the different domains, and the other one aims to distinguish the real
data samples from the fake ones. Article [93] is particularly
interesting because it employs GANs in federated settings. In these
works, the method presented is compared with some pre-existing
methods, such as Deep Adaptation Network (DAN) [94], Deep Domain
Confusion (DDC) [95] and Residual Transfer Network (RTN) [96]. These
are state-of-the-art techniques in Domain Adaptation. However,
GAN-based methods outperform them in well-known tasks, such as object
recognition using the Office-31 [97], Office-Home [98] and VisDA2017
[99] datasets, and digit recognition using the MNIST [33], USPS and
SVHN [100] datasets.
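For reference, the simultaneous training of the two networks is
usually written as the minimax objective of the original GAN
formulation, shown below. The symbols are not taken from the text
above and are introduced only for this illustration: G is the
generator, D the discriminator, p_data the distribution of real
samples and p_z the noise distribution fed to the generator. The
domain adaptation variants in [89–93] typically modify this objective.

```latex
\min_{G}\,\max_{D}\;
  \mathbb{E}_{x \sim p_{\mathrm{data}}}\big[\log D(x)\big]
  + \mathbb{E}_{z \sim p_{z}}\big[\log\big(1 - D(G(z))\big)\big]
```

The discriminator maximizes the expression by assigning high scores to
real samples and low scores to generated ones, while the generator
minimizes it by producing samples that the discriminator accepts as
real.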
In general, comparing the different methods is difficult. Each work
is free to choose different synthetic datasets for its experiments,
and also to modify them to generate the heterogeneity it wants to
address (see Table 3). For these reasons, it is impossible to fairly
compare the diverse strategies we presented. However, there are some
remarkable results that we would like to highlight: regarding Domain
Transformation methods, the experimental

Information Fusion 88 (2022) 263–280
M.F. Criado et al.
Table 3
Summary of the datasets employed in the works presented in Section
3.3.1. Asterisks indicate that the datasets have been modified in
particular ways, making a fair comparison between them impossible.
Some of the datasets mentioned were not referenced so far:
Bing-caltech256 [102], COREL5000 [103], ImageNet [104] and NYUD [105].

Article   Datasets used in experiments
[69]      MNIST + SVHN + USPS
[71]      Office-31; Bing-caltech256
[74]      COREL5000; Trecvid2005ᵃ
[75]      MNIST*; Olivetti FRᵇ
[79]      MNIST + SVHN
[80]      Office-31; ImageNet; VisDA2017
[82]      Office-31; Image CLEF-DA
[83]      Digit5; Office-31
[84]      MNIST + MNIST-M + USPS; VisDA2017
[89]      MNIST + SVHN + USPS; Office-31; NYUD
[90]      Office-31; Image CLEF-DAᶜ
[91]      Office-Home; VisDA2017
[92]      MNIST + SVHN + USPS; Office-31

ᵃ Available at http://www-nlpir.nist.gov/projects/trecvid.
ᵇ Available at http://www.uk.research.att.com/facedatabase.html.
ᶜ Available at http://imageclef.org/2014/adaptation.

results of two of the works stand out [71,75]. They present a wide
variety of experiments and contrast their results with other
well-known methods, obtaining significantly better error rates and
accuracies. On the other hand, the most outstanding results achieved
with Domain Adaptation methods are those from [84,91,92]. The first of
these, [84], proposes the SimNet method and experimentally compares it
with other methods such as DAN, RTN and a baseline method on the
MNIST, Office-31 and VisDA2017 datasets. It improves on the accuracy
obtained by every other method in all three cases. As for [91,92],
both employ the Office-31 dataset and obtain impressive results
compared to the other methods they test.
Besides the strategies discussed above, there are several other
methods to deal with domain shifts. One of them is [101], which also
considers Source and Target Domains, but factorizes the input space to
search for a Grassmann manifold that fits all of the data samples.
Training is then performed only on that manifold, instead of on the
whole feature space. Lastly, some of the federated personalization
strategies explained in Section 3.2 can also deal with the kind of
heterogeneity discussed in this section [40,41,45,46,52,63].
3.3.2. Changes in the behaviour throughout clients
Differences in the behaviour of the clients refer to discrepancies in
their conditional probabilities 𝑃(𝑦|𝑥). A variation of this nature
means that, for at least some data samples, the correct output is not
the same for all of the clients. More formally:
