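Observed scatter diagram and estimated least squares line
[Figure: scatter diagram of y against x with the estimated line ŷ = b0 + b1x. For one point, the figure marks y (actual), ŷ (estimated), and the deviation between them.]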
Example from SLID 2005
According to human capital theory, increased education is associated with greater earnings.
Random sample of 22 Saskatchewan males aged 35-39 with positive wages and salaries in 2004, from the Survey of Labour and Income Dynamics, 2005.
Let x be total number of years of school completed (YRSCHL18) and y be wages and salaries in dollars (WGSAL42).
Source: Statistics Canada, Survey of Labour and Income Dynamics, 2005 [Canada]: External Cross-sectional Economic Person File [machine readable data file]. From IDLS through UR Data Library.
ID#   YRSCHL18   WGSAL42
1     17         62500
2     12         15500
3     12         67500
4     11         9500
5     15         38000
6     15         36000
7     19         70000
8     15         47000
9     20         80000
10    16         28000
11    18         65000
12    11         48000
13    14         72500
14    12         33000
15    14.5       6000
16    13.5       62500
17    15         77500
18    13         42000
19    10         36000
20    12.5       21000
21    15         41000
22    12.3       52500
YRSCHL18 is the variable “number of years of schooling”
y = β0 + β1 x, where x is the independent variable (on the horizontal axis) and y is the dependent variable (on the vertical axis).
β0 and β1 are the two parameters that determine the equation of the line.
β0 is the y intercept – determines the height of the line.
β1 is the slope of the line.
Positive, negative, or zero.
The size of β1 measures how x is related to y: it is the change in y associated with a one-unit change in x.
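The roles of the two parameters can be checked numerically. The sketch below is only an illustration: the parameter values and the helper name `line` are hypothetical and are not taken from the SLID example. It shows that the height of the line at x = 0 is β0 and that the ratio Δy/Δx equals β1 regardless of where on the line the change is measured.

```python
def line(b0, b1, x):
    """Return the height of the line y = b0 + b1*x at the point x."""
    return b0 + b1 * x

# Hypothetical parameter values, chosen only for illustration.
b0, b1 = 10.0, 2.5

# Move along the line from x_start by an amount dx and record the change in y.
x_start, dx = 4.0, 3.0
dy = line(b0, b1, x_start + dx) - line(b0, b1, x_start)

print("height at x = 0 (the intercept):", line(b0, b1, 0.0))
print("change in y for the chosen change in x:", dy)
print("slope recovered as dy/dx:", dy / dx)
```

A negative value for `b1` would make `dy` negative for a positive `dx`, and `b1 = 0` would give `dy = 0`, which matches the three cases of positive, negative, and zero slope.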
Positive Slope: β1 > 0
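[Figure: upward-sloping line on x and y axes with y intercept β0; Δx and Δy mark the changes in x and y along the line.]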
Example – schooling (x) and earnings (y).
Negative Slope: β1 < 0
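[Figure: downward-sloping line on x and y axes with y intercept β0; Δx and Δy mark the changes in x and y along the line.]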
Example – higher income (x) associated with fewer trips by bus (y).
Zero Slope: β1 = 0
[Figure: horizontal line on x and y axes at height β0; Δx is marked, with no corresponding change in y.]
Example – amount of rainfall (x) and student grades (y)
Infinite Slope: β1 = ∞
[Figure: vertical line on x and y axes.]
An infinite number of possible lines can be drawn through a scatter diagram. The task is to find the straight line that best fits the points.
Least squares method (ASW, 469)
Find estimates of β0 and β1 that produce a line that fits the points the best.
The most commonly used criterion is least squares.
The least squares line is the unique line for which the sum of the squares of the deviations of the y values from the line is as small as possible.
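As a rough illustration of this criterion, the sketch below applies it to the 22 observations in the SLID 2005 sample. The function names `sse` and `least_squares` are hypothetical. The closed-form expressions used for b0 and b1 are the standard textbook ones (b1 is the sum of cross-deviations divided by the sum of squared deviations of x, and b0 = ȳ − b1·x̄); they are an assumption here, since they are not stated in this part of the notes. The final loop compares the sum of squared deviations for the fitted line with that of a few nearby lines.

```python
# Years of schooling (YRSCHL18) and wages and salaries in dollars (WGSAL42)
# for the 22 respondents, in ID order.
x = [17, 12, 12, 11, 15, 15, 19, 15, 20, 16, 18,
     11, 14, 12, 14.5, 13.5, 15, 13, 10, 12.5, 15, 12.3]
y = [62500, 15500, 67500, 9500, 38000, 36000, 70000, 47000, 80000, 28000, 65000,
     48000, 72500, 33000, 6000, 62500, 77500, 42000, 36000, 21000, 41000, 52500]

def sse(b0, b1):
    """Sum of squared deviations of the y values from the line b0 + b1*x."""
    return sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))

def least_squares(x, y):
    """Return (b0, b1) from the standard closed-form least squares formulas."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1

b0, b1 = least_squares(x, y)
print(f"b0 = {b0:.1f}, b1 = {b1:.1f}")
print(f"sum of squared deviations at the fitted line: {sse(b0, b1):.0f}")

# Any other line should give a larger sum of squared deviations.
for db0, db1 in [(1000, 0), (0, 100), (-1000, -100)]:
    larger = sse(b0 + db0, b1 + db1) > sse(b0, b1)
    print(f"shift b0 by {db0}, b1 by {db1}: larger sum of squares? {larger}")
```

The comparison at the end is only a spot check on three alternative lines, not a proof that the fitted line is the minimum.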
Minimize the sum of the squares of the errors ε.
Or, equivalent to this, minimize the sum of the squares of the differences of the y values from the values of E(y). That is, find b0 and b1 that minimize: