15.5. HIDDEN MARKOV MODELS


The value α_1(V, s_j) = π_j · θ_j(a_1) initializes the recursion. This approach requires O(n^2 · m) time. Then, the overall probability is computed by summing the values of α_m(V, s_j) over all possible final states s_j. Therefore, the final fit F(V) is computed as follows:


F(V) = \sum_{j=1}^{n} \alpha_m(V, s_j)

This algorithm is also known as the Forward Algorithm. Note that the fit probability has direct applications to many problems, such as classification and anomaly detection, depending upon whether the HMM is constructed in a supervised or unsupervised fashion. By constructing a separate HMM for each class, it is possible to determine the best-fitting class for a test sequence. More generally, the fit probability is useful in problems such as data clustering, classification, and outlier detection. In data clustering and classification, the fit probability can be used to model the probability of a sequence belonging to a cluster or class, by creating a group-specific HMM. In outlier detection, it is possible to determine poorly fitting sequences with respect to a global HMM and report them as anomalies.
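For concreteness, the following is a minimal sketch of the forward recursion in Python with NumPy. The parameter names pi (initial state probabilities π_j), P (transition probabilities p_ij), and theta (symbol-generation probabilities θ_j(a)) are hypothetical choices for this sketch rather than notation fixed by the text.

```python
import numpy as np

def forward_fit(V, pi, P, theta):
    """Forward algorithm: fit probability F(V) of an observed sequence V.

    Hypothetical layout assumed for this sketch:
      V     -- list of symbol indices a_1 ... a_m
      pi    -- shape (n,), initial state probabilities pi_j
      P     -- shape (n, n), transition probabilities p_ij
      theta -- shape (n, k), symbol-generation probabilities theta_j(a)
    """
    # Initialization: alpha_1(V, s_j) = pi_j * theta_j(a_1)
    alpha = pi * theta[:, V[0]]
    # Recursion: alpha_r(V, s_j) = [sum_i alpha_{r-1}(V, s_i) * p_ij] * theta_j(a_r)
    for a_r in V[1:]:
        alpha = (alpha @ P) * theta[:, a_r]
    # Fit probability: sum of alpha_m(V, s_j) over all possible final states s_j
    return float(alpha.sum())
```

Because these probabilities shrink rapidly with sequence length, practical implementations usually work with logarithms or per-step scaling; the sketch omits this for readability.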


15.5.3 Explanation: Determining the Most Likely State Sequence for Observed Sequence


One of the goals in many data mining problems is to provide an explanation for why a sequence fits a part of the data (e.g., a class or cluster), or does not fit the data set as a whole (e.g., an outlier). Since the sequence of (hidden) generating states often provides an intuitive explanation for the observed sequence, it is sometimes desirable to determine the most likely sequence of states for the observed sequence. The Viterbi algorithm provides an efficient way to determine this most likely state sequence.


One approach for determining the most likely state path of the test sequence V = a_1 . . . a_m would be to enumerate all n^m possible sequences of states (paths) in the HMM, and compute the probability of each of them, based on the observed sequence, the symbol-generation probabilities, and the transition probabilities. The maximum of these values can be reported as the most likely path. Note that this problem is similar to that of computing the fit probability, except that the maximum of the fit probabilities, rather than their sum, is needed over all possible paths. Correspondingly, it is possible to use a recursive approach similar to that of the previous case to determine the most likely state sequence.


Any subpath of an optimal state path must also be optimal for generating the corresponding subsequence of symbols. This property, in the context of an optimization problem over sequences, is what enables dynamic programming methods. The best possible state path for generating the first r symbols (with the rth state fixed to j) can be recursively computed in terms of the corresponding best paths for the first (r − 1) observable symbols and the different penultimate states. Specifically, let δ_r(V, s_j) be the probability of the best state sequence for generating the first r symbols in V and also ending at state s_j. Then, the recursive computation is as follows:




\delta_r(V, s_j) = \max_{i=1}^{n} \delta_{r-1}(V, s_i) \cdot p_{ij} \cdot \theta_j(a_r)

This approach recursively computes the maximum of the probabilities over the n different choices of penultimate state. The approach is applied iteratively for r = 1 . . . m, with the recursion initialized by δ_1(V, s_j) = π_j · θ_j(a_1). This approach requires O(n^2 · m) time. The final best path is then obtained from the maximum value of δ_m(V, s_j) over all possible states s_j (the path itself can be recovered by recording the maximizing penultimate state at each step). This approach is, essentially, a dynamic programming algorithm. In the anomaly example of student grades, an oscillation between the doer and slacker states would be identified by the Viterbi algorithm as the cause of the outlier behavior. In a clustering application, a consistent presence in the doer state would explain the cluster of diligent students.
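A minimal sketch of this dynamic program, including the back-pointer traceback that recovers the state path, is shown below under the same hypothetical parameter layout as the earlier forward sketch.

```python
import numpy as np

def viterbi(V, pi, P, theta):
    """Most likely hidden state sequence for an observed sequence V.

    Uses the same hypothetical parameter layout as the forward sketch.
    Returns (best_state_path, probability_of_that_path).
    """
    m = len(V)
    # Initialization: delta_1(V, s_j) = pi_j * theta_j(a_1)
    delta = pi * theta[:, V[0]]
    back = np.zeros((m, len(pi)), dtype=int)   # best penultimate state per position
    # Recursion: delta_r(V, s_j) = max_i delta_{r-1}(V, s_i) * p_ij * theta_j(a_r)
    for r in range(1, m):
        cand = delta[:, None] * P               # cand[i, j] = delta_{r-1}(s_i) * p_ij
        back[r] = cand.argmax(axis=0)
        delta = cand.max(axis=0) * theta[:, V[r]]
    # Trace back from the best final state to recover the full path
    path = [int(delta.argmax())]
    for r in range(m - 1, 0, -1):
        path.append(int(back[r, path[-1]]))
    return path[::-1], float(delta.max())
```

As with the forward sketch, a production implementation would typically work in log-space to avoid numerical underflow on long sequences.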


15.5.4 Training: Baum–Welch Algorithm


The problem of learning the parameters of an HMM is a very difficult one, and no known algorithm is guaranteed to determine the global optimum. However, options are available to determine a reasonably effective solution in most scenarios. The Baum–Welch algorithm is one such method. It is also known as the forward–backward algorithm, and it is an application of the EM approach to the generative hidden Markov model. First, a description of training with the use of a single sequence T = a_1 . . . a_m will be provided. Then, a straightforward generalization to N sequences T_1 . . . T_N will be discussed.


Let α_r(T, s_j) be the forward probability that the first r symbols in a sequence T of length m are generated by the model, and the last state is s_j. Let β_r(T, s_j) be the backward probability that the portion of the sequence after and not including the rth position is generated by the model, conditional on the fact that the state at the rth position is s_j. Thus, the forward and backward probability definitions are not symmetric. The forward and backward probabilities can be computed from the model probabilities in a way similar to the evaluation procedure discussed above in Sect. 15.5.2. The major difference for the backward probabilities is that the computation starts from the end of the sequence and proceeds in the backward direction. Furthermore, the probability value β_{|T|}(T, s_j) is initialized to 1 at the bottom of the recursion to account for the difference between the two definitions. Two additional probabilistic quantities need to be defined to describe the EM algorithm.
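As a companion to the earlier forward sketch, the following minimal sketch computes the backward probabilities β_r(T, s_j) in Python with NumPy, under the same hypothetical parameter layout; the backward direction of the recursion and the initialization β_{|T|}(T, s_j) = 1 mirror the description above.

```python
import numpy as np

def backward_probabilities(T, P, theta):
    """Backward probabilities beta_r(T, s_j) used by the Baum-Welch algorithm.

    Uses the same hypothetical parameter layout as the earlier sketches.
    beta[r, j] is the probability of generating the symbols strictly after
    position r, given that the state at position r is s_j.
    """
    n, m = P.shape[0], len(T)
    beta = np.zeros((m, n))
    beta[m - 1] = 1.0   # beta_{|T|}(T, s_j) is initialized to 1
    # Backward recursion, starting from the end of the sequence:
    # beta_r(T, s_j) = sum_i p_ji * theta_i(a_{r+1}) * beta_{r+1}(T, s_i)
    for r in range(m - 2, -1, -1):
        beta[r] = P @ (theta[:, T[r + 1]] * beta[r + 1])
    return beta
```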


