2.6 Hidden Markov Models (HMM) and Hidden Semi-Markov Models (HSMM)
To understand Hidden Markov Models we must first understand Markov chains. A Markov chain is a sequence of states such that the probability of a transition to the next state depends only on the current state. A Markov chain in aviation would be the probability of going from one maneuver to the next (e.g., the pilot is more likely to stop the aircraft after touching down than after taking off).
An HMM is a hidden Markov chain (the x level in Figure 4) with observable states (the o level in Figure 4) that allow us to infer the most probable state of the Markov chain. In the instance of aviation, the Markov chain is hidden because we are inferring the maneuvers of the aircraft, x_t, from the inputs of the pilot, o_t. For instance, the pilot turning the yoke to the left, pulling the yoke back slightly, and applying left rudder would be the observable state, o_t, and the hidden state would be the aircraft turning to the left, x_t.
Figure 4: An example of a Hidden Markov Model with state transition probabilities omitted [6].
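As a minimal sketch of this idea (the maneuver labels, pilot-input labels, and all probabilities below are invented for illustration), the Viterbi algorithm can recover the most probable sequence of hidden maneuvers from a sequence of observed pilot inputs:

    import numpy as np

    # Hypothetical hidden maneuvers (x_t) and observable pilot inputs (o_t);
    # all probabilities are invented for illustration.
    states = ["cruise", "left_turn", "descend"]
    inputs = ["yoke_neutral", "yoke_left_and_rudder", "yoke_forward"]
    start = np.array([0.6, 0.2, 0.2])
    trans = np.array([[0.7, 0.2, 0.1],    # P(x_t | x_{t-1})
                      [0.3, 0.6, 0.1],
                      [0.2, 0.1, 0.7]])
    emit = np.array([[0.8, 0.1, 0.1],     # P(o_t | x_t)
                     [0.1, 0.8, 0.1],
                     [0.1, 0.1, 0.8]])

    def viterbi(obs):
        """Most probable sequence of hidden maneuvers given observed inputs."""
        T, n = len(obs), len(states)
        log_d = np.zeros((T, n))
        back = np.zeros((T, n), dtype=int)
        log_d[0] = np.log(start) + np.log(emit[:, obs[0]])
        for t in range(1, T):
            scores = log_d[t - 1][:, None] + np.log(trans)
            back[t] = scores.argmax(axis=0)
            log_d[t] = scores.max(axis=0) + np.log(emit[:, obs[t]])
        path = [int(log_d[-1].argmax())]
        for t in range(T - 1, 0, -1):
            path.append(int(back[t][path[-1]]))
        return [states[s] for s in reversed(path)]

    # Observed: neutral yoke, then yoke left with rudder twice.
    print(viterbi([0, 1, 1]))  # ['cruise', 'left_turn', 'left_turn']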
An HSMM is an HMM that accounts for the change of transition probabilities between states over the duration of a state. This is necessary because the duration between actions alone can make a pattern anomalous. An abnormally long time between a plane touching down and the plane stopping is an example of a sequence that is anomalous due to duration.
Figure 5: An example of a Hidden Semi-Markov Model with state transition probabilities omitted [6].
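One way to see the difference is that the time an HMM spends in a state is implicitly geometric, governed only by the self-transition probability, while an HSMM draws each state's duration from an explicit distribution. The sketch below illustrates this contrast; the shifted Poisson duration is an arbitrary choice for illustration, not the distribution used in [6].

    import numpy as np

    rng = np.random.default_rng(0)

    def hmm_duration(p_stay):
        """In an HMM the time spent in a state is implicitly geometric:
        it depends only on the self-transition probability."""
        d = 1
        while rng.random() < p_stay:
            d += 1
        return d

    def hsmm_duration(mean_d):
        """In an HSMM the duration is drawn from an explicit distribution
        (a shifted Poisson here, purely as an illustrative assumption)."""
        return 1 + rng.poisson(mean_d - 1)

    # Both have a mean of about 10 time steps, but the geometric durations are
    # far more spread out, so an HMM cannot express "touchdown to full stop
    # normally takes about 10 steps" the way an HSMM can.
    print(np.std([hmm_duration(0.9) for _ in range(10000)]))   # roughly 9.5
    print(np.std([hsmm_duration(10) for _ in range(10000)]))   # roughly 3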
3. METHODS
Next, we will explore different methods of data mining being used in the aviation industry today. Two of the three methods search for anomalies in flight recording data. Anomalous data in this context is defined as "unusual order of actions, unusual duration between actions, forgotten actions, etc." [6].
3.1 Data Mining Flight Recording Data Using Multiple Kernel Learning
As discussed earlier in the background (Section 2.1), the data to be analyzed can be a mixture of discrete and continuous data. To find anomalies in such mixed data, we must introduce Multiple Kernel Learning (MKL) methods. Using multiple kernel learning allows us to analyze discrete and continuous data simultaneously.
The kernels for discrete and continuous data are combined using the function:

k(\vec{x}_i, \vec{x}_j) = \eta K_d(\vec{x}_i, \vec{x}_j) + (1 - \eta) K_c(\vec{x}_i, \vec{x}_j)
In this function K_d is the kernel over discrete sequences, and K_c is the kernel over continuous sequences. The constant η is the weight of the kernel, which was 0.5 for this research [2]. We use the following function to determine the discrete kernel:
K_d(\vec{x}_i, \vec{x}_j) = \frac{|\mathrm{LCS}(\vec{x}_i, \vec{x}_j)|}{\sqrt{l_{\vec{x}_i} \, l_{\vec{x}_j}}}
The value of K_d is equal to the length of the Longest Common Subsequence (LCS) of the two sequences divided by the square root of the product of the two sequences' lengths (l_x denotes the length of a sequence x). Consider an example of an LCS made from nonsensical data:
Sequence 1: ABBCBBAC
Sequence 2: ABABAACB
LCS:        ABBAC
The LCS is found by using the Hunt-Szymanski algorithm,
which is explained in greater detail in [3].
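A rough sketch of how K_d and the combined kernel k could be computed is shown below. It uses a simple dynamic-programming LCS rather than Hunt-Szymanski, and η = 0.5 as in [2]; the function and variable names are our own.

    def lcs_length(a, b):
        """Length of the longest common subsequence via simple dynamic
        programming (the approach in [2] uses the Hunt-Szymanski
        algorithm [3], but the result is the same)."""
        dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
        for i, ai in enumerate(a, 1):
            for j, bj in enumerate(b, 1):
                dp[i][j] = dp[i - 1][j - 1] + 1 if ai == bj else max(dp[i - 1][j], dp[i][j - 1])
        return dp[len(a)][len(b)]

    def k_discrete(x_i, x_j):
        """K_d: LCS length normalized by the square root of the product of
        the two sequence lengths, as in the formula above."""
        return lcs_length(x_i, x_j) / (len(x_i) * len(x_j)) ** 0.5

    def k_combined(x_i_disc, x_j_disc, x_i_sax, x_j_sax, eta=0.5):
        """k = eta * K_d + (1 - eta) * K_c, where K_c is the same kernel
        applied to the SAX-discretized continuous data (described next)."""
        return eta * k_discrete(x_i_disc, x_j_disc) + (1 - eta) * k_discrete(x_i_sax, x_j_sax)

    # The two sequences from the LCS example above:
    print(lcs_length("ABBCBBAC", "ABABAACB"))            # 5, i.e. "ABBAC"
    print(round(k_discrete("ABBCBBAC", "ABABAACB"), 3))  # 5 / sqrt(8 * 8) = 0.625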
We use the same function to find the continuous kernel, but we first preprocess the continuous data to make it discrete. To do this, a variant of the Symbolic Aggregate approXimation (SAX) representation was used on the data. To get the SAX representation, we find the averages of set intervals along the time series and map each average to a symbol. An example of this is shown in Figure 6.
Figure 6: The SAX representation of data. This continuous time series was translated to the sequence baabccbc [5].
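A simplified sketch of this discretization (piecewise averaging followed by mapping each average to a letter) is given below; it uses quantile breakpoints for brevity, whereas the full SAX method z-normalizes the series and uses breakpoints from a standard normal distribution.

    import numpy as np

    def sax(series, n_segments, alphabet="abc"):
        """Simplified SAX: average the series over equal-length segments,
        then map each segment average to a letter."""
        series = np.asarray(series, dtype=float)
        means = np.array([seg.mean() for seg in np.array_split(series, n_segments)])
        cuts = np.quantile(series, np.linspace(0, 1, len(alphabet) + 1)[1:-1])
        return "".join(alphabet[s] for s in np.searchsorted(cuts, means))

    # A toy continuous time series discretized into an 8-letter word over {a, b, c}.
    t = np.linspace(0, 8 * np.pi, 400)
    signal = np.sin(t) + 0.1 * np.random.default_rng(0).normal(size=t.size)
    print(sax(signal, 8))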
To test the effectiveness of the MKL approach, the Orca and SequenceMiner algorithms were used as a baseline. The Orca algorithm specializes mainly in finding patterns in continuous data, while SequenceMiner finds patterns in discrete data [2].
To initially test the Multiple
Kernel Anomaly Detection (MKAD) algorithm, data was
generated with 12 pre-determined anomalies. Of these 12
pre-determined anomalies, 3 were continuous, and 9 were
discrete. On the generated data, Orca was unable to detect
any discrete faults, but did detect every continuous fault;
SequenceMiner was able to detect 89% of discrete faults,
but no continuous faults; and the MKL method was able to detect all faults of both types.
The real-world data analyzed using the Multiple Kernel Anomaly Detection method was a set of 3500 flights consisting of 160 parameters, sampled once per second, for an average flight duration of 1.7 hours [2].
3.2 Data Mining Flight Recorder Data Using HMM and HSMM
Hidden Markov Models and Hidden Semi-Markov Models are analyzed in this paper to gauge their effectiveness in finding anomalous patterns in flight recording data. As discussed earlier, HMMs have a distinct disadvantage versus HSMMs: HSMMs have the ability to account for the duration of states, whereas HMMs do not.
The two methods use a dataset of "110 landings under regular operating conditions" from a flight simulator to define normal operation. This data came from a flight simulator called FlightGear, which is introduced in Section 2.1 [6]. For these simulations, there were 12 discrete pilot commands being recorded. Five different types of anomalous landings were then created using FlightGear:
1. Throttle is kept constant and flaps are not put down. The rest of the flight is the same as in the normal case.
2. No initial throttle increase; the rest of the operation is normal.
3. The flight is similar to normal, except that the flaps are not put down.
4. At the end of the flight the brakes are not applied; the rest of the operation is normal.
5. The pilot overshoots the airport runway and lands somewhere behind it.
Each of these scenarios was repeated 10 times, yielding 50 anomalous landings.
The log of the probability of a sequence divided by the length of the sequence was then computed to determine the likelihood of the sequence. If a sequence of states was found to be anomalous, the probability of each state, given the sequence of states before it, was used to locate the anomalous state [6].
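A small sketch of this scoring is given below; the per-state log-probabilities and the threshold are made up, and in practice they would come from the trained model and from scores on the normal landings.

    import numpy as np

    def sequence_score(state_log_probs):
        """Log-probability of the whole sequence divided by its length;
        state_log_probs[t] is assumed to be log P(state_t | earlier states)
        under the trained model."""
        return sum(state_log_probs) / len(state_log_probs)

    def suspect_state(state_log_probs):
        """Index of the least likely state given its history -- the candidate
        anomalous state once the sequence as a whole has been flagged."""
        return int(np.argmin(state_log_probs))

    # Hypothetical per-state log-probabilities for one landing sequence.
    lp = [-0.2, -0.3, -0.25, -4.1, -0.3]
    if sequence_score(lp) < -0.8:   # threshold picked from the normal landings
        print("anomalous sequence; suspect state index", suspect_state(lp))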
A simple set of synthetic data was used to check that the HSMM was able to detect anomalous state durations and the HMM was not. This data set had 25 sequences with normal duration between states and 25 of the same sequences with abnormal duration between states. The ability of the HSMM to detect anomalous state durations can be seen in Figure 7.
To interpret a Receiver Operating Characteristic (ROC) curve, one must know that as the line is followed from (0, 0) to (1, 1), the threshold used to classify the data is being changed. The knotted line depicts the ability of the HSMM to detect anomalous state durations. Since the knotted line approaches the coordinate (0, 1), we can see that the HSMM has a threshold value that produces minimal false positives while catching most true positives. The solid line, however, shows that the HMM is fairly unreliable at any threshold level.
Figure 7: Detection of anomalous state duration of
HMM and HSMM [6].
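For reference, an ROC curve like the one in Figure 7 can be produced from a set of anomaly scores with scikit-learn; the labels and scores below are made up.

    from sklearn.metrics import auc, roc_curve

    # Made-up ground truth (1 = anomalous landing) and model anomaly scores.
    y_true = [0, 0, 0, 0, 1, 1, 1, 1]
    scores = [0.10, 0.30, 0.20, 0.40, 0.90, 0.80, 0.35, 0.70]

    fpr, tpr, thresholds = roc_curve(y_true, scores)
    print(auc(fpr, tpr))  # area under the curve; closer to 1.0 is better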
3.3 Data Mining Aviation Incident Reports
Conventional methods of text classification are found to be ineffective when used on aviation incident reports [8]. This is due to the limited amount of information given in each incident report. An especially difficult task is the classification of minority classes, due to the limited number of samples available for training. A minority class is a cause that accounts for less than 10% of the incidents. This paper explores the use of a semi-supervised bootstrapping algorithm designed to more effectively label the causes of aviation incident reports, with an emphasis on minority classes.
The aviation incident reports for this section came from the Aviation Safety Reporting System (ASRS) database [8]. There are 14 potential labels (or causes of the incident) for the data. These labels are called shapers. Examples of these shapers are familiarity, physical environment, physical factors, preoccupation, and psychological pressure [8]. To assign shapers to incident reports, algorithms commonly search for words, referred to as expanders, which are indicative of certain labels.
The data taken from the ASRS database includes descriptions and opinions of the persons filing the incident report. These descriptions and opinions are written with abbreviated words and often contain improper grammar. This data must be preprocessed before the algorithms can process it. To do this, the abbreviations are mapped to their expanded forms. For example, "HAD BEEN CLRED FOR APCH BY ZOA AND HAD BEEN HANDED OFF TO SANTA ROSA TWR" expands to "had been cleared for approach by ZOA and had been handed off to santa rosa tower" [8].
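A sketch of this preprocessing step is shown below, with a small hypothetical abbreviation dictionary; the actual mapping used in [8] is far larger.

    # Hypothetical abbreviation dictionary; the real mapping in [8] is far larger.
    ABBREVIATIONS = {"CLRED": "cleared", "APCH": "approach", "TWR": "tower"}
    CALLSIGNS = {"ZOA"}  # facility identifiers are kept as-is

    def expand_report(text):
        """Expand abbreviated words and lowercase everything except callsigns."""
        out = []
        for word in text.split():
            if word in ABBREVIATIONS:
                out.append(ABBREVIATIONS[word])
            elif word in CALLSIGNS:
                out.append(word)
            else:
                out.append(word.lower())
        return " ".join(out)

    print(expand_report("HAD BEEN CLRED FOR APCH BY ZOA AND HAD BEEN "
                        "HANDED OFF TO SANTA ROSA TWR"))
    # -> had been cleared for approach by ZOA and had been handed off to santa rosa tower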
Once the data is finished being preprocessed, it can be
run through the baseline software and the bootstrapping
algorithm. This bootstrapping algorithm is able to classify
the text by adding key words (expanders) from pre-labeled
data to a set.
It then checks to see if the reports being
labeled have more than 3 words from the set of key words.
If the report being labeled has 3 or more words in common