
Aviation Data Mining

David A. Pagels

Division of Science and Mathematics

University of Minnesota, Morris

Morris, Minnesota, USA 56267

pagel093@morris.umn.edu

ABSTRACT

We explore different methods of data mining in the field of aviation and their effectiveness. The field of aviation is always searching for new ways to improve safety. However, due to the large amounts of aviation data collected daily, parsing through it all by hand would be impossible. Because of this, problems are often found by investigating accidents. With the relatively new field of data mining we are able to parse through an otherwise unmanageable amount of data to find patterns and anomalies that indicate potential incidents before they happen. The data mining methods outlined in this paper include Multiple Kernel Learning algorithms, Hidden Markov Models, Hidden Semi-Markov Models, and Natural Language Processing.

Keywords

Aviation, Data Mining, Multiple Kernel Learning, Hidden Markov Model, Hidden Semi-Markov Model, Natural Language Processing

1. INTRODUCTION

On January 31st, 2000, an Alaska Airlines plane travelling from Puerto Vallarta, Mexico to Seattle, Washington dove from 18,000 feet into the Pacific Ocean, killing all 88 people on board. The cause of this accident was found to be “a loss of airplane pitch control resulting from the in-flight failure of the horizontal stabilizer trim system jackscrew assembly’s acme nut threads. The thread failure was caused by excessive wear resulting from Alaska Airlines’ insufficient lubrication of the jackscrew assembly” [2]. The cause of this accident was predictable through analysis of flight data recordings. Many other incidents would also be preventable through such analysis.

Data mining is a broad field of data science that was developed to make predictions on future data based on patterns found in collected data. Finding patterns in aviation data manually is impracticable due to the massive amount of data produced every day. Data mining has begun to address this problem. Although they are not yet optimized for mining of aviation data in their current state, some common data mining methods, such as kernel methods, text classification, and Hidden Semi-Markov Models, are being explored. Kernel techniques have been largely developed around either discrete or continuous data. This limitation makes them unsuited for use on the combined discrete and continuous data collected in aviation. Hidden Markov Models are limited to analyzing sequences without the ability to take into account the duration of actions. Aviation incident reports often contain a small amount of information per report, while current methods of text classification require large amounts of descriptive data. Although these approaches are not optimal for aviation data, we can use these concepts to produce new approaches for data mining.

This work is licensed under the Creative Commons Attribution-Noncommercial-Share Alike 3.0 United States License. To view a copy of this license, visit http://creativecommons.org/licenses/by-nc-sa/3.0/us/ or send a letter to Creative Commons, 171 Second Street, Suite 300, San Francisco, California, 94105, USA.

UMM CSci Senior Seminar Conference, December 2014 Morris, MN.

In section 2 of this paper we explain the concepts necessary to understand the approaches to aviation data mining outlined in sections 3 and 4. In section 3, three different methods of data mining in aviation are introduced and explained. The first of these three methods is data mining using Multiple Kernel Learning, which finds patterns in combined discrete and continuous data. The second compares the effectiveness of the Hidden Markov Model versus the Hidden Semi-Markov Model in detecting anomalies. The third analyzes the effectiveness of a text classification algorithm. Section 4 discusses the relative success found in the results of these methods, and section 5 summarizes the effectiveness of the methods.

2. BACKGROUND

To discuss several methods of data mining used in aviation today, we need to understand several data mining concepts. These concepts include supervised, semi-supervised, and unsupervised learning; text classification; Natural Language Processing (NLP); Support Vector Machines (SVMs); Hidden Markov Models (HMMs); Hidden Semi-Markov Models (HSMMs); and kernels. To summarize, we classify the data by searching for key words in text (NLP), by finding clusters of data (kernels), or by observing the probability of a sequence of events (HMMs and HSMMs).

2.1 Aviation Data

This paper shows these methods applied to three types of aviation data. The first is flight recording data collected by the flight data recorder, informally known as the black box. Planes equipped with flight data recorders typically record up to 500 variables of data per second for as long as the plane is being operated [2]. Some of the variables described in these flight data recordings are time, altitude, vertical acceleration, and heading [1]. Some of these variables are discrete and some are continuous.

Pagels: Aviation Data Mining

Published by University of Minnesota Morris Digital Well, 2015

The second type of data is synthetic data. This data is generated with flight anomalies intentionally placed in it to test the ability of the algorithms to recognize each anomaly. These anomalies are referred to as dispersed anomalies. They may be an unconventional sequence of events, an unusual duration between events, etc. Some of the synthetic data used in this paper is generated from a robust flight simulator, FlightGear. The FlightGear simulator is often used in the aviation industry and in academia due to its accuracy [6].

The third type of data is aviation incident reports. These reports do not have any strict conventions, do not require pilots to use specific terms, and include narratives. Since this data is not uniform, we must find a method to determine the relevant and important data.

2.2 Labels and Labeled Data

A label is a descriptive word assigned to data based on some property of the data. The labels in this paper are called shaping factors, or shapers, of an aviation incident. Examples of shapers in an aviation incident might include illness, hazardous environment, a distracted pilot, etc.

2.3 Supervised, Semi-Supervised, and Unsupervised Learning

There are many methods of finding a function to describe data. This function is commonly called the model, as it is made to model some set of data. Three such methods are supervised, semi-supervised, and unsupervised learning. Supervised learning uses labeled data to form the function. Semi-supervised learning uses some labeled data along with some unlabeled data to form the function. Unsupervised learning uses no labeled data to form the function. The term supervised in this context means that the labels for the data have already been found and are being used to construct the new model in a somewhat predictable way. The set of data used in supervised and semi-supervised learning is called the training set.
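To make the distinction concrete, here is a minimal sketch in Python. It is an illustration only, not a method from the research discussed in this paper: the nearest-centroid model, the self-training step, and the vertical speed values are all assumptions chosen for brevity. The supervised model is built from labeled data alone, while the semi-supervised model also uses unlabeled data by first labeling it with the supervised model. An unsupervised method would receive only the `unlabeled` list.

```python
from statistics import mean

def fit_centroids(labeled):
    """Supervised step: average the feature value observed for each label."""
    groups = {}
    for value, label in labeled:
        groups.setdefault(label, []).append(value)
    return {label: mean(values) for label, values in groups.items()}

def predict(centroids, value):
    """Assign the label whose centroid is closest to the value."""
    return min(centroids, key=lambda label: abs(centroids[label] - value))

# Hypothetical training set: (vertical speed in ft/min, label).
labeled = [
    (-700.0, "normal"),
    (-800.0, "normal"),
    (-2900.0, "anomalous"),
    (-3100.0, "anomalous"),
]
# Hypothetical unlabeled observations.
unlabeled = [-750.0, -3000.0, -900.0]

# Supervised learning: the model is formed from labeled data only.
model = fit_centroids(labeled)

# Semi-supervised learning (self-training, one round): label the unlabeled
# data with the current model, then refit on both sets together.
pseudo_labeled = [(value, predict(model, value)) for value in unlabeled]
semi_model = fit_centroids(labeled + pseudo_labeled)
```

Self-training is only one of several semi-supervised strategies; it is used here because it fits in a few lines.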

2.4 Natural Language Processing

Natural Language Processing (NLP) is a field of computer science focused on gathering meaningful data from text generated by humans. Aviation incident reports are not uniform, as they are filled out by humans. To get meaningful data from these reports, we first have to identify the overall picture of the data. This process is called text classification. Text classification is a general term and there are several different methods of text classification. The research outlined in this paper classifies text by using some prelabeled incident reports. Using the reports and the shapers associated with these reports, we can then find words in the reports that are commonly associated with a shaper. These words are referred to as expanders. While these expanders are being found, we can label an unlabeled report as likely to be associated with a shaper if it contains a minimum number of that shaper's expanders.
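The idea of expanders can be sketched in a few lines of Python. This is a simplified single-pass illustration under stated assumptions, not the algorithm used in the research: the selection rule (a word must appear in at least `min_reports` reports with the shaper and in no report without it), the function names, and the sample reports are all hypothetical.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())

def find_expanders(labeled_reports, shaper, min_reports=2):
    """Return words seen in at least min_reports reports labeled with the
    shaper and in no report that lacks the shaper (an assumed rule)."""
    with_shaper = Counter()
    without_shaper = set()
    for text, shapers in labeled_reports:
        words = set(tokenize(text))
        if shaper in shapers:
            with_shaper.update(words)
        else:
            without_shaper |= words
    return {
        word
        for word, count in with_shaper.items()
        if count >= min_reports and word not in without_shaper
    }

def has_shaper(text, expanders, min_expanders=2):
    """Label a report with the shaper if it contains enough expanders."""
    return len(set(tokenize(text)) & expanders) >= min_expanders

# Hypothetical prelabeled reports: (narrative, set of shapers).
labeled_reports = [
    ("Captain felt dizzy and nauseous during descent", {"illness"}),
    ("First officer was nauseous and dizzy before takeoff", {"illness"}),
    ("Heavy icing and turbulence during descent", {"hazardous environment"}),
]

expanders = find_expanders(labeled_reports, "illness")
```

With these sample reports, a new narrative such as "Pilot reported feeling dizzy and nauseous" would be labeled with the illness shaper by `has_shaper`, while a narrative containing only one expander would not.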

2.5 Kernels and Support Vector Machines

A Support Vector Machine (SVM) classifies new data into one of two categories. The data is represented by vectors, which are denoted by an arrow over a variable. The SVM classifies by separating the data with a hyperplane. A hyperplane is a line (in two dimensions) or a plane (in higher dimensions) that best separates the two categories of data. This hyperplane is constructed by the SVM. For example, the hyperplane in Figure 1 is the line separating the two clusters. Sometimes the data is not linearly separable, and an SVM is unable to produce a hyperplane. When this is the case, a kernel trick is used. A kernel trick maps the data into a higher dimension so that a hyperplane may be found by the SVM [Figure 2]. The hyperplane in the left image of Figure 3 is the plane separating the two clusters; it is then shown mapped back into two dimensions in the right image of Figure 3. These clusters are considered labeled after the hyperplane is constructed. The label is determined by the location of the data point relative to the hyperplane. A kernel is a function used to find the similarity between an unlabeled data point and the labeled data points, so that the new point can be labeled accordingly.
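As a rough illustration of mapping data to a higher dimension, the following Python sketch uses a degree-2 polynomial feature map. The data points, the feature map, and the fixed separating plane are assumptions made for the example; a real SVM would learn the hyperplane from the training set rather than use a hand-picked threshold. The inner points lie inside the triangle formed by the outer points, so no straight line separates the two groups in two dimensions, but a plane separates them after lifting.

```python
import math

def lift(point):
    """Explicit feature map: (x, y) -> (x^2, sqrt(2)*x*y, y^2)."""
    x, y = point
    return (x * x, math.sqrt(2) * x * y, y * y)

def dot(a, b):
    """Ordinary dot product of two equal-length vectors."""
    return sum(p * q for p, q in zip(a, b))

def poly_kernel(a, b):
    """Degree-2 polynomial kernel. It equals dot(lift(a), lift(b)) but is
    computed from the original two-dimensional points."""
    return dot(a, b) ** 2

def side(point, threshold=1.0):
    """Classify with the hand-picked plane z1 + z3 = threshold in the
    lifted space (z1 + z3 is the squared distance from the origin)."""
    z = lift(point)
    return "outer" if z[0] + z[2] > threshold else "inner"

# Hypothetical data: a tight cluster surrounded by a ring of points.
inner = [(0.5, 0.0), (0.0, -0.5), (-0.3, 0.4)]
outer = [(2.0, 0.0), (0.0, -2.0), (-1.2, 1.6)]

labels = {point: side(point) for point in inner + outer}
```

The `poly_kernel` function shows why kernels are convenient: the similarity of two points in the lifted space can be computed without ever constructing the lifted vectors.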

Figure 1: Linearly separable data [4].

Figure 2: In the case of non-linearly separable data, we can use a kernel trick to map the data to a higher dimension [4].

Figure 3: Once the data is mapped to a higher dimension, we find a hyperplane to separate it [4].
