Data Mining: The Textbook



CHAPTER 15. MINING DISCRETE SEQUENCES




[Figure: a two-state Hidden Markov Model of student grades.

    DOER state:     grade distribution A = 80%, B = 20%
    SLACKER state:  grade distribution A = 20%, B = 80%

    Transitions:    DOER -> DOER        0.99
                    DOER -> SLACKER     0.01
                    SLACKER -> SLACKER  0.90
                    SLACKER -> DOER     0.10

    Some examples of student grade sequences:

    AAABAAAABAAAA   DOER (very common)
    BBBBABBBABBBB   SLACKER (less common)
    AAABAAABBABBB   DOER turns SLACKER (very rare)
    ABABABABABABA   UNPREDICTABLE (extremely rare)]

Figure 15.6: Generating grade sequences from a Hidden Markov Model


For the slacker state, these grade-emission probabilities are reversed. Although these probabilities are explicitly specified here for illustrative purposes, they need to be learned or estimated from the observed grade sequences of the different students and are not known a priori. The precise status (state) of any student in a given week is not known to the analyst. These grade sequences are, in fact, the only observable outputs for the analyst. Therefore, from the perspective of the analyst, this is a Hidden Markov Model, which generates the sequences of grades from an unknown sequence of states representing the state transitions of the students. The precise sequence of transitions between the states can only be estimated for a particular observed sequence.
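To make the generative process concrete, the following sketch samples grade sequences from the two-state model of Fig. 15.6. It is a minimal illustration rather than the textbook's own code: the dictionary names and the 13-week sequence length are invented for the example, and the initial state probabilities are set to the steady state of the transition matrix (10/11 and 1/11), per the assumption in footnote 3.

import random

# Model parameters taken from Fig. 15.6 (dictionary names are illustrative).
STATES = ("doer", "slacker")
TRANSITION = {                                   # P(next state | current state)
    "doer":    {"doer": 0.99, "slacker": 0.01},
    "slacker": {"doer": 0.10, "slacker": 0.90},
}
EMISSION = {                                     # P(grade | state)
    "doer":    {"A": 0.80, "B": 0.20},
    "slacker": {"A": 0.20, "B": 0.80},
}
# Initial probabilities set to the steady state of the transition matrix,
# consistent with the assumption stated in footnote 3.
INITIAL = {"doer": 10 / 11, "slacker": 1 / 11}

def sample(dist):
    """Draw one key from a {outcome: probability} dictionary."""
    r, cumulative = random.random(), 0.0
    for outcome, p in dist.items():
        cumulative += p
        if r < cumulative:
            return outcome
    return outcome  # guard against floating-point round-off

def generate_grades(weeks=13):
    """Walk the hidden states and emit one observable grade per week."""
    state, grades = sample(INITIAL), []
    for _ in range(weeks):
        grades.append(sample(EMISSION[state]))
        state = sample(TRANSITION[state])
    return "".join(grades)

print(generate_grades())  # e.g., 'AAABAAAABAAAA'

Note that the analyst would see only the returned grade string; the hidden state trajectory inside generate_grades is discarded, which is precisely what makes the model "hidden."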

The two-state Hidden Markov Model for the aforementioned example is illustrated in Fig. 15.6. This model contains two states, denoted by doer and slacker, that represent the state of a student in a particular week. A student may transition from one state to the other in any week, though the likelihood of this is rather low. It is assumed that a set of initial state probabilities governs the a priori distribution of doers and slackers. This distribution represents the a priori understanding about the students when they join the course. Some examples of typical sequences generated from this model, along with their rarity level, are illustrated in Fig. 15.6. For example, the sequence AAABAAAABAAAA is most likely generated by a student who is consistently in the doer state, and the sequence BBBBABBBABBBB is most likely generated by a student who is consistently in the slacker state. The second sequence is typically rarer than the first because the population mostly contains³ doers. The sequence AAABAAABBABBB corresponds to a doer who eventually transitions into a slacker. This case is even rarer because it requires a transition from the doer state to the slacker state, which has a very low probability. The sequence ABABABABABABA is extremely anomalous because it does not represent the temporally consistent doer or slacker behavior implied by the model. Correspondingly, such a sequence has a very low probability of fitting the model.
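These rarity levels can be quantified with the forward algorithm, which sums the probability of an observed sequence over all possible hidden-state paths. The sketch below is one way to do this, reusing the model dictionaries from the previous example; under those parameters, the four sequences of Fig. 15.6 are printed in decreasing order of likelihood.

def sequence_probability(grades):
    """Forward algorithm: P(observed grade sequence | model),
    summed over all possible hidden-state paths."""
    # alpha[s] = P(grades observed so far, current state = s)
    alpha = {s: INITIAL[s] * EMISSION[s][grades[0]] for s in STATES}
    for g in grades[1:]:
        alpha = {s: sum(alpha[t] * TRANSITION[t][s] for t in STATES)
                    * EMISSION[s][g]
                 for s in STATES}
    return sum(alpha.values())

for seq in ("AAABAAAABAAAA", "BBBBABBBABBBB",
            "AAABAAABBABBB", "ABABABABABABA"):
    print(seq, sequence_probability(seq))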


A larger number of states in the Markov Model can be used to encode more complex scenarios. It is possible to encode domain knowledge with the use of states that describe different generating scenarios. In the example discussed earlier, consider the case in which doers sometimes slack off for short periods and then return to their usual state. Alternatively,








3. The assumption is that the initial set of state probabilities is approximately consistent with the steady-state behavior of the model for the particular set of transition probabilities shown in Fig. 15.6.

