\hat{f}(T, U_j) = \frac{f(T, U_j)}{|T|}
Then, the anomaly score of the training sequence Ti with respect to the test sequence V is defined by subtracting the relative frequency of the comparison unit in the training sequence from its relative frequency in the test sequence. Therefore, the anomaly score A(Ti, V, Uj) is defined as follows:
A(T_i, V, U_j) = \hat{f}(V, U_j) - \hat{f}(T_i, U_j)
The final anomaly score is the absolute value of the average of these scores over all the sequences in the database D = T1 . . . TN.
A useful output of this approach is the subset of the user-specified comparison units that are the most anomalous. This provides intensional knowledge and feedback to the analyst about why a particular test sequence should be considered anomalous. A method called TARZAN uses suffix tree representations to efficiently determine all the anomalous subsequences in a comparative sense between a test sequence and a training sequence. Readers are referred to the bibliographic notes for pointers to this method.
15.5 Hidden Markov Models
Hidden Markov Models (HMM) are probabilistic models that generate sequences through a sequence of transitions between states in a Markov chain. Hidden Markov Models are used
for clustering, classification, and outlier detection. Therefore, the applicability of these models is very broad in sequence analysis. For example, the clustering approach in Sect. 15.3.4.2 uses Hidden Markov Models as a subroutine. This section will use outlier detection as a specific application of HMM to facilitate understanding. In Sect. 15.6.5, it will also be shown how HMM may be used for classification.
So how are Hidden Markov Models different from the Markovian techniques introduced earlier in this chapter? In those techniques, each state is well defined and is based on the last k positions of the sequence. This state is also directly visible to the user because it is defined by the most recent sequence combination of length k. Thus, the generative behavior of the Markovian model is always known deterministically, in terms of the correspondence between states and sequence positions for a particular input string.
In a Hidden Markov Model, the states of the system are hidden and not directly visible to the user. Only a sequence of (typically discrete) observations, generated by symbol emissions from the states after each transition, is visible to the user. The generated sequence of symbols corresponds to the application-specific sequence data. In many cases, the states may be defined (during the modeling process) on the basis of an understanding of how the underlying system behaves, though the precise sequence of transitions may not be known to the analyst. This is why such models are referred to as “hidden.”
Each state in an HMM is associated with a set of emission probabilities over the symbol set Σ.
In other words, a visit to the state j leads to an emission of one of the symbols σi ∈ Σ with probability θj(σi). Correspondingly, a sequence of transitions in an HMM corresponds to an observed data sequence. Hidden Markov Models may be considered a kind of mixture model of the type discussed in Chap. 6, in which the different components of the mixture are not independent of one another, but are related through sequential transitions. Thus, each state is analogous to a component in the multidimensional mixture model discussed in Chap. 6, and each symbol generated by this model is analogous to a data point generated by that mixture model. Furthermore, unlike multidimensional mixture models, the successive generation of individual data items (sequence symbols) is not independent. This is a natural consequence of the fact that the successive states emitting the data items depend on one another through the probabilistic transitions. Thus, Hidden Markov Models are designed for sequential data that exhibits temporal correlations.
To better explain Hidden Markov Models, an illustrative example will be used for the specific problem of using HMMs for anomaly detection. Consider a scenario in which a set of students register for a course and generate a sequence corresponding to the grades received in each of their weekly assignments. Each grade is drawn from the symbol set Σ = {A, B}. The model created by the analyst is that the class contains students who, at any given time, are either doers or slackers, with different grade-generation probabilities. A student in a doer state may sometimes transition to a slacker state and vice versa. These represent the two states in the system. Weekly homework assignments are handed out to each student and are graded with one of the symbols from Σ. This results in a sequence of grades for each student, and it represents the only output observable to the analyst. The state of a student is only a construct created by the analyst to explain the grade sequences and is, therefore, not observable in and of itself. It is important to understand that if this model is a poor reflection of the true generative process, then it will impact the quality of the learning process.
Assume that a student in a doer state is likely to receive an A grade in a weekly assignment with 80% probability and a B with 20% probability. For slackers, these probability values are reversed.