


Lecture 12

Descriptive Statistics


The task of descriptive statistics is to describe states and processes on the basis of observed data. The main tools to tackle this task are tabular and graphical representations and the computation of characteristic measures.
      1. Tabular Representations





Tables are used to display observed data in a clearly arranged form, and also to collect and display characteristic measures. The simplest tabular representation of a (one-dimensional) data set is the frequency table, which is the basis for many graphical representations. A frequency table records for every attribute value its (absolute and/or relative) frequency in a sample, where the absolute frequency $f_k$ is simply the occurrence frequency of an attribute value $a_k$ in the sample, and the relative frequency $r_k$ is defined as $r_k = \frac{f_k}{n}$ with the sample size $n$. In addition, columns for the cumulated (absolute and/or relative) frequencies (also simply referred to as frequency sums) may be present. As an example, we consider the data set
x = (3, 4, 3, 2, 5, 3, 1, 2, 4, 3, 3, 4, 4, 1, 5, 2, 2, 3, 5, 3, 2, 4, 3, 2, 3),



which may be, for instance, the grades of a written exam at school.¹ A frequency table for this data set is shown in Table A.2. This table provides a much clearer view of the data than the raw data set shown above, which merely lists the sample values (and not even in sorted order).

Table A.2 A simple frequency table showing the absolute frequencies $f_k$, the relative frequencies $r_k$, and the cumulated absolute and relative frequencies $\sum_{i=1}^{k} f_i$ and $\sum_{i=1}^{k} r_i$, respectively

a_k   f_k   r_k            sum f_i   sum r_i
1     2     2/25 = 0.08     2         2/25 = 0.08
2     6     6/25 = 0.24     8         8/25 = 0.32
3     9     9/25 = 0.36    17        17/25 = 0.68
4     5     5/25 = 0.20    22        22/25 = 0.88
5     3     3/25 = 0.12    25        25/25 = 1.00

Table A.3 A contingency table for two attributes A and B

       a1    a2    a3    a4    sum
b1      8     3     5     2     18
b2      2     6     1     3     12
b3      4     1     2     7     14
sum    14    10     8    12     44
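The entries of Table A.2 can be recomputed directly from the data set. The following is a minimal sketch using only the Python standard library; the function name `frequency_table` is an illustrative choice and does not come from the text.

```python
from collections import Counter
from itertools import accumulate

# Data set from the text (for instance, grades of a written exam).
x = (3, 4, 3, 2, 5, 3, 1, 2, 4, 3, 3, 4, 4, 1, 5, 2, 2, 3, 5, 3, 2, 4, 3, 2, 3)


def frequency_table(sample):
    """Return rows (a_k, f_k, r_k, cumulated f, cumulated r), sorted by a_k."""
    n = len(sample)                       # sample size
    counts = Counter(sample)              # absolute frequencies f_k
    values = sorted(counts)               # attribute values a_k
    abs_freq = [counts[a] for a in values]
    cum_abs = list(accumulate(abs_freq))  # cumulated absolute frequencies
    return [(a, f, f / n, c, c / n)
            for a, f, c in zip(values, abs_freq, cum_abs)]


rows = frequency_table(x)
for a, f, r, cum_f, cum_r in rows:
    print(f"{a}  {f:2d}  {r:.2f}  {cum_f:2d}  {cum_r:.2f}")
```

Each printed row corresponds to one line of the frequency table: the attribute value, its absolute and relative frequency, and the two cumulated frequencies.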


A two- or generally multidimensional frequency table, into which the (relative and/or absolute) frequency of every combination of attribute values is entered, is also called a contingency table. An example of a contingency table for two attributes A and B (with absolute frequencies), which also records the row and column sums, that is, the frequencies of the values of the individual attributes, is shown in Table A.3.
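As a rough illustration of how such a table arises, the sketch below counts value combinations and the row and column sums from a list of (A, B) pairs. The raw sample behind Table A.3 is not given in the text, so the list `pairs` is rebuilt here from the table's counts purely for demonstration; the names `contingency_table`, `table_a3`, and `pairs` are hypothetical.

```python
from collections import Counter

# Absolute frequencies of Table A.3, indexed as table_a3[b][a].
table_a3 = {
    "b1": {"a1": 8, "a2": 3, "a3": 5, "a4": 2},
    "b2": {"a1": 2, "a2": 6, "a3": 1, "a4": 3},
    "b3": {"a1": 4, "a2": 1, "a3": 2, "a4": 7},
}

# Hypothetical sample: one (a, b) pair per counted observation.
pairs = [(a, b)
         for b, row in table_a3.items()
         for a, f in row.items()
         for _ in range(f)]


def contingency_table(sample):
    """Count value combinations plus the marginal frequencies."""
    joint = Counter(sample)                   # frequency of each (a, b)
    col_sums = Counter(a for a, _ in sample)  # frequencies of A's values
    row_sums = Counter(b for _, b in sample)  # frequencies of B's values
    return joint, row_sums, col_sums


joint, row_sums, col_sums = contingency_table(pairs)
print(dict(row_sums))  # row sums of the table
print(dict(col_sums))  # column sums of the table
```

The row and column sums are simply the one-dimensional frequency tables of the individual attributes.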
      2. Graphical Representations


Graphical representations serve to make tabular data easier to comprehend. The main idea is to use geometric quantities (such as lengths, areas, and angles) to represent numbers, since humans interpret such geometric properties more quickly than abstract numbers. The most important types of graphical representations are:




¹ In most of Europe it is more common to use numbers for grades, with 1 being the best and 6 being the worst possible, while in the United States it is more common to use letters, with A being the best and F being the worst possible. However, there is an obvious mapping between the two scales. We chose numbers here to emphasize that nominal scales may use numbers and thus may look deceptively metric.



Fig. A.1 Pole (a) and bar chart (b) and frequency polygons (c) for the data shown in Table A.2




Fig. A.2 Area chart for the data shown in Table A.2




pole/stick/bar chart


Numbers, which may be, for instance, the frequencies of different attribute values in a sample, are represented by the lengths of poles, sticks, or bars. In this way a good impression especially of ratios can be achieved (see Figs. A.1a and b, in which the frequencies of Table A.2 are displayed).



area and volume charts


Area and volume charts are closely related to pole and bar charts: the difference is merely that they use areas and volumes instead of lengths to represent numbers and their ratios (see Fig. A.2, which again shows the frequencies of Table A.2). However, area and volume charts are usually less comprehensible (except perhaps if the represented quantities are actually areas and volumes), since human beings usually have trouble comparing areas and volumes and often misjudge their numerical ratios. This can already be seen in Fig. A.2: only very few people correctly estimate that the area of the square for the value 3 (frequency 9) is three times as large as that of the square for the value 5 (frequency 3).
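One way to see why such ratios are hard to judge is to compare side lengths with areas. The sketch below assumes that each frequency is drawn as a square whose area is proportional to the frequency, as the description of Fig. A.2 suggests; the helper `square_side` and the `unit_area` parameter are illustrative assumptions.

```python
import math

# Absolute frequencies from Table A.2 (attribute value -> frequency).
freqs = {1: 2, 2: 6, 3: 9, 4: 5, 5: 3}


def square_side(frequency, unit_area=1.0):
    """Side length of a square whose area is frequency * unit_area."""
    return math.sqrt(frequency * unit_area)


sides = {a: square_side(f) for a, f in freqs.items()}

area_ratio = freqs[3] / freqs[5]  # ratio of the areas for values 3 and 5
side_ratio = sides[3] / sides[5]  # ratio of the corresponding side lengths

print(f"area ratio: {area_ratio:.2f}, side ratio: {side_ratio:.2f}")
```

The areas differ by a factor of 3, but the side lengths differ only by a factor of about 1.73 (the square root of 3), which may be part of the reason viewers tend to underestimate the ratio.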



frequency polygons and line chart


A frequency polygon results if the ends of the poles of a pole chart are connected by lines, so that a polygonal course results. This can be advantageous if the attribute values have an inherent order and one wants to show the development of the frequency along this order (see Fig. A.1c). In particular, it can be used if numbers are to be represented that depend on time. This particular case is usually referred to as a line chart, even though the name is not exclusively reserved for this case.

Fig. A.3 A pie chart (a) and a stripe chart (b) for the data shown in Table A.2


Fig. A.4 A mosaic chart for the contingency table of Table A.3


Fig. A.5 A bar chart for the contingency table of Table A.3



pie and stripe chart


Pie and stripe charts are particularly well suited if proportions or fractions of a total, for instance, relative frequencies, are to be displayed. In a pie chart proportions are represented by angles, and in a stripe chart by lengths (see Fig. A.3).
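The mapping from relative frequencies to angles and lengths can be sketched as follows for the data of Table A.2. The function names `pie_angles` and `stripe_segments`, as well as the `total_length` parameter, are illustrative assumptions and not part of the text.

```python
# Absolute frequencies from Table A.2 (attribute value -> frequency).
freqs = {1: 2, 2: 6, 3: 9, 4: 5, 5: 3}


def pie_angles(frequencies):
    """Angle in degrees of each pie slice: 360 times the relative frequency."""
    n = sum(frequencies.values())
    return {a: 360.0 * f / n for a, f in frequencies.items()}


def stripe_segments(frequencies, total_length=1.0):
    """(start, end) position of each segment along a stripe of given length."""
    n = sum(frequencies.values())
    segments, cumulated = {}, 0
    for a, f in frequencies.items():
        start = total_length * cumulated / n
        cumulated += f
        segments[a] = (start, total_length * cumulated / n)
    return segments


angles = pie_angles(freqs)
segments = stripe_segments(freqs)
for a in freqs:
    start, end = segments[a]
    print(f"value {a}: {angles[a]:6.1f} degrees, stripe {start:.2f} to {end:.2f}")
```

The segment borders of the stripe chart coincide with the cumulated relative frequencies of the frequency table.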




mosaic chart


Contingency tables (that is, two- or generally multidimensional frequency tables) can nicely be represented as mosaic charts. For the first attribute, the horizontal direction is divided as in a stripe chart. Each section is then divided according to the second attribute along the vertical direction, again as in a stripe chart (see Fig. A.4). Mosaic charts can have advantages over two-dimensional bar charts, because bars at the front can hide bars at the back, making it difficult to see their height, as shown in Fig. A.5. In principle, arbitrarily many attributes can be displayed by subdividing the resulting mosaic pieces alternately along the horizontal and vertical axes. However, even if one uses the widths of the gaps and colors to help a viewer identify attribute values, mosaic charts can easily become confusing if one tries to display too many attributes.

Fig. A.6 A simple scatter plot
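The two-step subdivision used by a mosaic chart can be sketched numerically for the contingency table of Table A.3. The code below places the tiles in a unit square and ignores gaps and colors; the function `mosaic_tiles` and the nested-dictionary layout are hypothetical choices made for this illustration.

```python
# Absolute frequencies of Table A.3, indexed as table[a][b].
table = {
    "a1": {"b1": 8, "b2": 2, "b3": 4},
    "a2": {"b1": 3, "b2": 6, "b3": 1},
    "a3": {"b1": 5, "b2": 1, "b3": 2},
    "a4": {"b1": 2, "b2": 3, "b3": 7},
}


def mosaic_tiles(tab):
    """Return {(a, b): (x, y, width, height)} for tiles in the unit square."""
    n = sum(sum(column.values()) for column in tab.values())
    tiles, x = {}, 0.0
    for a, column in tab.items():
        column_sum = sum(column.values())
        width = column_sum / n            # horizontal split: first attribute
        y = 0.0
        for b, f in column.items():
            height = f / column_sum       # vertical split: second attribute
            tiles[(a, b)] = (x, y, width, height)
            y += height
        x += width
    return tiles


tiles = mosaic_tiles(table)
for (a, b), (x, y, w, h) in tiles.items():
    print(f"{a},{b}: x={x:.3f} y={y:.3f} width={w:.3f} height={h:.3f}")
```

With this construction the area of each tile equals the relative frequency of the corresponding value combination, so the tiles together fill the unit square.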




histogram


In principle, a histogram looks like a bar chart, with the only difference that the domain of the underlying attribute is metric (numerical). As a consequence, it is usually impossible to simply enumerate the frequencies of the individual attribute values (because there are usually too many different values), so one has to form counting intervals, which are usually called bins or buckets. The width (or, if the domain is fixed, equivalently the number) of these bins has to be chosen by the user. All bins should have the same width, since histograms with varying bin widths are usually more difficult to read, for the same reasons that area charts are more difficult to interpret than bar charts (see above). In addition, how good an impression of the data a histogram provides depends both on the chosen bin width and on the values onto which the borders of the bins fall (see Sect. 4.3.1).
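The influence of the bin borders can be illustrated with a small sketch of equal-width binning. The sample `data` below is invented for this example, and the function `histogram` with its `origin` parameter is an illustrative assumption rather than a method described in the text.

```python
import math


def histogram(values, bin_width, origin=0.0):
    """Count values in equal-width bins [origin + i*w, origin + (i+1)*w).

    Returns a dict mapping the left border of each non-empty bin to its count.
    """
    counts = {}
    for v in values:
        i = math.floor((v - origin) / bin_width)  # index of the bin holding v
        counts[i] = counts.get(i, 0) + 1
    return {origin + i * bin_width: c for i, c in sorted(counts.items())}


# Hypothetical metric sample (not from the text).
data = [0.9, 1.1, 1.9, 2.1, 2.9, 3.1]

h0 = histogram(data, bin_width=1.0, origin=0.0)  # borders at 0, 1, 2, 3, ...
h1 = histogram(data, bin_width=1.0, origin=0.5)  # borders at 0.5, 1.5, 2.5, ...

print(h0)
print(h1)
```

With the same bin width, the first choice of borders yields four bins with counts 1, 2, 2, 1, while shifting the borders by 0.5 yields three bins with two values each, so the same data can leave rather different impressions.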



scatter plot


A scatter plot displays a two-dimensional data set of metric attributes by interpreting the sample values as coordinates of a point in a metric space (see Fig. A.6). A scatter plot is very well suited if one wants to see whether the two represented quantities depend on each other or vary independently (see also Sects. A.2.4 and 8.3).
Examples of how graphical representations can be misleading can be found in the highly recommended books [6, 8]. This property is sometimes (all too often, actually) exploited to convey a deceptively favorable or unfavorable impression, in particular in the press and in advertisements.