Data Mining: The Textbook




[Figure: (a) Perceptron — input nodes $x_i^1, \ldots, x_i^5$ in the input layer, connected through weights $w_1, \ldots, w_5$ to a single output node producing $z_i$. (b) Multilayer — an input layer, a hidden layer, and an output layer producing $z_i$.]

Figure 10.10: Single and multilayer neural networks


A question arises as to how the learning rate η should be chosen. A high value of η results in fast learning, but may yield suboptimal solutions. Smaller values of η result in convergence to higher-quality solutions, but the convergence is slow. In practice, the value of η is initially chosen to be large and is gradually reduced as the weights approach their optimal values. The idea is that large steps are likely to be helpful early on, but may result in oscillation between suboptimal solutions at later stages. For example, the value of η is sometimes selected to be proportional to the inverse of the number of cycles through the training data (or epochs) so far.
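As an illustration, the following minimal Python sketch implements such an inverse-decay schedule; the initial rate eta0 and the decay constant are hypothetical hyperparameters, not values prescribed by the text.

    def learning_rate(epoch, eta0=0.5, decay=1.0):
        # Inverse-decay schedule: eta shrinks in proportion to the inverse
        # of the number of epochs completed so far.
        # eta0 and decay are illustrative choices, not values from the text.
        return eta0 / (1.0 + decay * epoch)

    # Example: large steps early, smaller steps as the weights stabilize.
    for epoch in range(5):
        print(f"epoch {epoch}: eta = {learning_rate(epoch):.3f}")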

10.7.2 Multilayer Neural Networks


The perceptron model is the most basic form of a neural network, containing only a single input layer and an output layer. Because the input layer only transmits the attribute values without applying any mathematical function to them, the function learned by the perceptron model is a simple linear model based on a single output node. In practice, more complex models may need to be learned with multilayer neural networks.
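Concretely, the perceptron's single output node computes the sign of a weighted sum of the inputs. The following one-function Python sketch shows this linear prediction function, assuming a learned weight vector w and bias b (all names illustrative).

    import numpy as np

    def perceptron_predict(x, w, b=0.0):
        # Single output node: sign of the linear function of the inputs,
        # yielding a class label in {-1, +1}.
        return 1 if np.dot(w, x) + b >= 0 else -1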


Multilayer neural networks have a hidden layer, in addition to the input and output layers. The nodes in the hidden layer can, in principle, be connected in different types of topologies. For example, the hidden layer can itself consist of multiple layers, and nodes in one layer might feed into nodes of the next layer. This is referred to as the multilayer feed-forward network. The nodes in one layer are also assumed to be fully connected to the




nodes in the next layer. Therefore, the topology of the multilayer feed-forward network is automatically determined once the number of layers, and the number of nodes in each layer, have been specified by the analyst. The basic perceptron may be viewed as a single-layer feed-forward network. A popularly used model is one in which the multilayer feed-forward network contains only a single hidden layer. Such a network may be considered a two-layer feed-forward network. An example of a two-layer feed-forward network is illustrated in Fig. 10.10b. Another aspect of the multilayer feed-forward network is that it is not restricted to the use of linear signed functions of the inputs. Arbitrary functions such as the logistic, sigmoid, or hyperbolic tangent may be used in different nodes of the hidden layer and output layer. An example of such a function, when applied to the training tuple $X_i = (x_i^1, \ldots, x_i^d)$ to yield an output value of $z_i$, is as follows:





$$z_i = \sum_{j=1}^{d} w_j \cdot \frac{1}{1 + e^{-x_i^j}} + b \qquad (10.70)$$
The value of $z_i$ is no longer a predicted output of the final class label in {−1, +1} when it refers to a function computed at the hidden-layer nodes. This output is then propagated forward to the next layer.
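A minimal NumPy sketch of the computation in Eq. 10.70 follows, assuming each hidden node applies a sigmoid to one input attribute and the output node forms a weighted sum with a bias term; the function and variable names are illustrative.

    import numpy as np

    def two_layer_output(x, w, b):
        # x: training tuple (x_i^1, ..., x_i^d); w: weights w_1, ..., w_d; b: bias.
        hidden = 1.0 / (1.0 + np.exp(-x))     # sigmoid computed at each hidden node
        return float(np.dot(w, hidden) + b)   # weighted sum at the output node (Eq. 10.70)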


In the single-layer neural network, the training process was relatively straightforward because the expected output of the output node was known to be equal to the training label value. The known ground truth was used to create an optimization problem in least-squares form, and to update the weights with a gradient-descent method. Because the output node is the only neuron with weights in a single-layer network, the update process is easy to implement. In the case of multilayer networks, the problem is that the ground-truth outputs of the hidden-layer nodes are not known, because no training labels are associated with the outputs of these nodes. Therefore, a question arises as to how the weights of these nodes should be updated when a training example is classified incorrectly. Clearly, when a classification error is made, some kind of "feedback" is required from the nodes in the later layers to the nodes in earlier layers about the expected outputs (and corresponding errors). This is achieved with the use of the backpropagation algorithm. Although this algorithm is not discussed in detail in this chapter, a brief summary is provided here. The backpropagation algorithm contains two main phases, which are applied in the weight update process for each training instance:



  1. Forward phase: In this phase, the inputs for a training instance are fed into the neural network. This results in a forward cascade of computations across the layers, using the current set of weights. The final predicted output can be compared to the class label of the training instance to check whether or not the predicted label is in error; a sketch of one full update step appears after this list.
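The following sketch combines the forward phase with the backward (error-feedback) phase in a single weight update, for a network with one hidden layer, sigmoid activations, and a squared-error objective. This is a hedged illustration under those assumptions, not the book's exact algorithm, and all names (W_hidden, w_out, and so on) are illustrative.

    import numpy as np

    def sigmoid(v):
        return 1.0 / (1.0 + np.exp(-v))

    def backprop_step(x, y, W_hidden, w_out, b_out, eta):
        # Forward phase: cascade computations across the layers.
        h = sigmoid(W_hidden @ x)                # hidden-layer activations
        z = float(np.dot(w_out, h) + b_out)      # predicted output
        err = z - y                              # error against the training label

        # Backward phase: propagate the error from the output node back
        # to the hidden-layer weights via the chain rule.
        grad_w_out = err * h
        grad_b_out = err
        grad_W_hidden = np.outer(err * w_out * h * (1.0 - h), x)

        # Gradient-descent updates with learning rate eta.
        w_out -= eta * grad_w_out
        b_out -= eta * grad_b_out
        W_hidden -= eta * grad_W_hidden
        return W_hidden, w_out, b_out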




