The Newton Raphson Algorithm for Function Optimization
1 Introduction
Throughout this course we will be interested in calculating maximum likelihood estimates
(MLEs). Such estimates are often extremely complicated nonlinear functions of the observed
data. As a result, closed form expressions for the MLEs will generally not exist for the models
we are working with.
The Newton Raphson algorithm is an iterative procedure that can be used to calculate
MLEs. The basic idea behind the algorithm is the following. First, construct a quadratic
approximation to the function of interest around some initial parameter value (hopefully
close to the MLE). Next, adjust the parameter value to that which maximizes the quadratic
approximation. This procedure is iterated until the parameter values stabilize.
These notes begin with the easy-to-visualize case of maximizing a function of one variable.
After this case is developed, we turn to the more general case of maximizing a function of k
variables.
2 The Newton Raphson Algorithm for Finding the Maximum of a Function of 1 Variable
2.1 Taylor Series Approximations
The first part of developing the Newton Raphson algorithm is to devise a way to approximate
the likelihood function with a function that can be easily maximized analytically. To do this
we need to make use of Taylor’s Theorem.
Theorem 1 (Taylor’s Theorem (1 Dimension)). Suppose the function f is k + 1 times
differentiable on an open interval I. For any points x and x + h in I there exists a point w
between x and x + h such that
$$f(x+h) = f(x) + f'(x)h + \frac{1}{2}f''(x)h^2 + \cdots + \frac{1}{k!}f^{[k]}(x)h^k + \frac{1}{(k+1)!}f^{[k+1]}(w)h^{k+1} \tag{1}$$
It can be shown that, as $h$ goes to 0, the higher order terms in equation (1) go to 0 much faster than $h$ itself. This means that (for small values of $h$)
$$f(x+h) \approx f(x) + f'(x)h.$$
This is referred to as a first order Taylor approximation of $f$ at $x$. A more accurate approximation to $f(x+h)$ can be constructed for small values of $h$ as:
$$f(x+h) \approx f(x) + f'(x)h + \frac{1}{2}f''(x)h^2.$$
This is known as a second order Taylor approximation of $f$ at $x$.
Note that the first order Taylor approximation can be rewritten as
$$f(x+h) \approx a + bh$$
where $a = f(x)$ and $b = f'(x)$. This highlights the fact that the first order Taylor approximation is a linear function in $h$.
Similarly, the second order Taylor approximation can be rewritten as
$$f(x+h) \approx a + bh + \frac{1}{2}ch^2$$
where $a = f(x)$, $b = f'(x)$, and $c = f''(x)$. This highlights the fact that the second order Taylor approximation is a second order polynomial in $h$.
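As a quick numerical illustration, the Python sketch below (using $f(x) = \ln(x)$ around $x = 1$ as an arbitrary test function, not one from these notes) shows the second order approximation shrinking its error much faster than the first order one as $h$ gets small.

```python
import math

# Compare the first and second order Taylor approximations of
# f(x) = ln(x) around x = 1, where f'(x) = 1/x and f''(x) = -1/x^2.
x = 1.0
f, b, c = math.log(x), 1.0 / x, -1.0 / x**2

for h in [0.5, 0.1, 0.01]:
    exact = math.log(x + h)
    first = f + b * h                    # a + b h
    second = f + b * h + 0.5 * c * h**2  # a + b h + (1/2) c h^2
    print(f"h = {h:4}: first order error = {abs(exact - first):.2e}, "
          f"second order error = {abs(exact - second):.2e}")
```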
2.2 Finding the Maximum of a Second Order Polynomial
Suppose we want to find the value of x that maximizes
$$f(x) = a + bx + cx^2.$$
First, we calculate the first derivative of $f$:
$$f'(x) = b + 2cx.$$
We know that $f'(\hat{x}) = 0$, where $\hat{x}$ is the value of $x$ at which $f$ attains its maximum. In other words, we know that
$$0 = b + 2c\hat{x}.$$
Solving for $\hat{x}$ we find that $\hat{x} = -\frac{b}{2c}$. The second order condition is $f''(x) = 2c < 0$. This implies that $f\left(-\frac{b}{2c}\right)$ will be a maximum whenever $c < 0$.
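A quick numerical sanity check of this formula, with arbitrarily chosen coefficients satisfying $c < 0$:

```python
# Arbitrarily chosen coefficients with c < 0, so the quadratic is concave.
a, b, c = 1.0, 4.0, -2.0
x_hat = -b / (2 * c)  # the claimed maximizer, here 1.0

f = lambda x: a + b * x + c * x**2
# f at x_hat should be at least as large as f at any nearby point.
assert all(f(x_hat) >= f(x_hat + d) for d in (-0.1, -0.01, 0.01, 0.1))
print(x_hat, f(x_hat))
```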
2.3 The Newton Raphson Algorithm
Suppose we want to find the value of x that maximizes some twice continuously differentiable
function f (x).
Recall
$$f(x+h) \approx a + bh + \frac{1}{2}ch^2$$
where $a = f(x)$, $b = f'(x)$, and $c = f''(x)$. This implies
$$f'(x+h) \approx b + ch.$$
The first order condition for the value of $h$ (denoted $\hat{h}$) that maximizes $f(x+h)$ is
$$0 = b + c\hat{h}$$
which implies $\hat{h} = -\frac{b}{c}$. In other words, the value that maximizes the second order Taylor approximation to $f$ at $x$ is
$$x + \hat{h} = x - \frac{b}{c} = x - \frac{f'(x)}{f''(x)}.$$
With this in mind we can specify the Newton Raphson algorithm for 1 dimensional
function optimization.
Algorithm 2.1: NewtonRaphson1D(f, x_0, tolerance)

    comment: Find the value x̂ of x that maximizes f(x)

    i ← 0
    while |f'(x_i)| > tolerance
        do i ← i + 1
           x_i ← x_{i-1} − f'(x_{i-1}) / f''(x_{i-1})
    x̂ ← x_i
    return (x̂)
Caution: Note that the Newton Raphson algorithm doesn't check the second order conditions necessary for $\hat{x}$ to be a maximizer. This means that if you give the algorithm a bad starting value for $x_0$ you may end up with a min rather than a max.
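For concreteness, here is a minimal Python sketch of Algorithm 2.1. The argument names fprime and fprime2 (for $f'$ and $f''$) and the max_iter safeguard are our own choices; the pseudocode above has no iteration cap.

```python
# A minimal sketch of Algorithm 2.1. The caller supplies the first and
# second derivatives of f as Python functions.
def newton_raphson_1d(fprime, fprime2, x0, tolerance=1e-5, max_iter=100):
    x = x0
    for _ in range(max_iter):
        if abs(fprime(x)) <= tolerance:  # |f'(x_i)| within tolerance: stop
            break
        x = x - fprime(x) / fprime2(x)   # Newton step: x - f'(x)/f''(x)
    # Per the caution above, this only finds a stationary point; check
    # fprime2(x) < 0 yourself to confirm it is a maximum.
    return x
```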
2.4 Example: Calculating the MLE of a Binomial Sampling Model
To see how the Newton Raphson algorithm works in practice, let's look at a simple example with an analytical solution: a simple model of binomial sampling.
Our log-likelihood function is:
$$\ell(\pi \mid y) = y \ln(\pi) + (n - y)\ln(1 - \pi)$$
where n is the sample size, y is the number of successes, and π is the probability of a success.
The first derivative of the log-likelihood function is
$$\ell'(\pi \mid y) = \frac{y}{\pi} - \frac{n-y}{1-\pi}$$
and the second derivative of the log-likelihood function is
$$\ell''(\pi \mid y) = -\frac{y}{\pi^2} - \frac{n-y}{(1-\pi)^2}.$$
Analytically, we know that the MLE is $\hat{\pi} = \frac{y}{n}$.
For the sake of example, suppose $n = 5$ and $y = 2$. Analytically, we know that the MLE is $\hat{\pi} = \frac{y}{n} = 0.4$. Let's see how the Newton Raphson algorithm works in this situation.
We begin by setting a tolerance level. In this case, let's set it to 0.01 (in practice you probably want something closer to 0.00001). Next we make an initial guess (denoted $\pi_0$) as to the MLE. Suppose $\pi_0 = 0.55$. Then $\ell'(\pi_0 \mid y) \approx -3.03$, which is larger in absolute value than our tolerance of 0.01. Thus we set
$$\pi_1 \leftarrow \pi_0 - \frac{\ell'(\pi_0 \mid y)}{\ell''(\pi_0 \mid y)} \approx 0.40857.$$
Now we calculate $\ell'(\pi_1 \mid y) \approx -0.1774$, which is still larger in absolute value than our tolerance of 0.01. Thus we set
$$\pi_2 \leftarrow \pi_1 - \frac{\ell'(\pi_1 \mid y)}{\ell''(\pi_1 \mid y)} \approx 0.39994.$$
Finally, $\ell'(\pi_2 \mid y)$ is approximately equal to 0.0012, which is smaller in absolute value than our tolerance of 0.01, so we can stop. The Newton Raphson algorithm here returns a value of $\hat{\pi}$ equal to 0.39994, which is reasonably close to the analytical value of 0.40. Note that we can make the Newton Raphson procedure more accurate (within machine precision) by setting the tolerance level closer to 0.
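The short Python script below is a direct transcription of this worked example using the score and Hessian formulas above; it reproduces the iterates 0.40857 and 0.39994 computed by hand (the variable names are our own).

```python
# Reproducing the worked example: n = 5, y = 2, pi_0 = 0.55, tolerance 0.01.
n, y = 5, 2

def score(p):    # first derivative of the log-likelihood at pi = p
    return y / p - (n - y) / (1 - p)

def hessian(p):  # second derivative of the log-likelihood at pi = p
    return -y / p**2 - (n - y) / (1 - p)**2

p, tolerance = 0.55, 0.01
while abs(score(p)) > tolerance:
    p = p - score(p) / hessian(p)    # Newton step
    print(f"pi = {p:.5f}, score = {score(p):+.4f}")

print(f"Newton Raphson MLE: {p:.5f} (analytical MLE: {y / n})")
```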
3 The Newton Raphson Algorithm for Finding the Maximum of a Function of k Variables
3.1 Taylor Series Approximations in k Dimensions
Consider a function $f : \mathbb{R}^k \to \mathbb{R}$ that is at least twice continuously differentiable. Suppose $x \in \mathbb{R}^k$ and $h \in \mathbb{R}^k$. Then the first order Taylor approximation to $f$ at $x$ is given by
$$f(x+h) \approx f(x) + \nabla f(x)'h$$
and the second order Taylor approximation to $f$ at $x$ is given by
$$f(x+h) \approx f(x) + \nabla f(x)'h + \frac{1}{2}h'D^2 f(x)h$$
where $\nabla f(x)$ is the gradient (vector of first derivatives) of $f$ at $x$, and $D^2 f(x)$ is the Hessian (matrix of second derivatives) of $f$ at $x$.
3.2 Finding the Maximum of a Second Order Polynomial in k Variables
Consider
$$f(x) = a + b'x + x'Cx$$
where $a$ is a scalar, $b$ and $x$ are $k$-vectors, and $C$ is a $k \times k$ symmetric, negative definite matrix. The gradient of $f$ at $x$ is
$$\nabla f(x) = b + 2Cx.$$
Since the gradient at the value that maximizes $f$ will be a vector of zeros, we know that the maximizer $\hat{x}$ satisfies
$$0 = b + 2C\hat{x}.$$
Solving for $\hat{x}$ we find that
$$\hat{x} = -\frac{1}{2}C^{-1}b.$$
Since $C$ is assumed to be negative definite, we know that this is a maximum.
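A small numpy check of this formula, with an arbitrarily chosen symmetric, negative definite $C$:

```python
import numpy as np

# Arbitrarily chosen b and symmetric, negative definite C (k = 2).
b = np.array([1.0, -2.0])
C = np.array([[-2.0, 0.5],
              [0.5, -1.0]])

x_hat = -0.5 * np.linalg.solve(C, b)  # x_hat = -(1/2) C^{-1} b

# The gradient b + 2 C x should be (numerically) zero at x_hat.
print(x_hat, b + 2 * C @ x_hat)
```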
3.3 The Newton Raphson Algorithm in k Dimensions
Suppose we want to find the $\hat{x} \in \mathbb{R}^k$ that maximizes the twice continuously differentiable function $f : \mathbb{R}^k \to \mathbb{R}$.
Recall
$$f(x+h) \approx a + b'h + \frac{1}{2}h'Ch$$
where $a = f(x)$, $b = \nabla f(x)$, and $C = D^2 f(x)$. Note that $C$ will be symmetric. This implies
$$\nabla f(x+h) \approx b + Ch.$$
Once again, the first order condition for a maximum is
$$0 = b + C\hat{h}$$
which implies that
$$\hat{h} = -C^{-1}b.$$
In other words, the vector that maximizes the second order Taylor approximation to $f$ at $x$ is
$$x + \hat{h} = x - C^{-1}b = x - \left(D^2 f(x)\right)^{-1}\nabla f(x).$$
With this in mind we can specify the Newton Raphson algorithm for k-dimensional
function optimization.
Algorithm 3.1: NewtonRaphsonKD(f, x_0, tolerance)

    comment: Find the value x̂ of x that maximizes f(x)

    i ← 0
    while ‖∇f(x_i)‖ > tolerance
        do i ← i + 1
           x_i ← x_{i-1} − (D^2 f(x_{i-1}))^{-1} ∇f(x_{i-1})
    x̂ ← x_i
    return (x̂)
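Finally, a minimal numpy sketch of Algorithm 3.1, exercised on a concave quadratic (a test function of our own choosing). As in the 1-D case, the max_iter cap is our own addition, and we solve a linear system rather than forming the Hessian inverse explicitly.

```python
import numpy as np

# A minimal sketch of Algorithm 3.1. The caller supplies the gradient and
# Hessian as functions; the stopping rule uses the Euclidean norm of the
# gradient.
def newton_raphson_kd(grad, hess, x0, tolerance=1e-5, max_iter=100):
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) <= tolerance:       # gradient near zero: stop
            break
        x = x - np.linalg.solve(hess(x), g)      # solve D^2 f(x) h = grad f(x)
    return x

# Example: maximize f(x) = -(x1 - 1)^2 - 2 (x2 + 3)^2, maximized at (1, -3).
grad = lambda x: np.array([-2.0 * (x[0] - 1.0), -4.0 * (x[1] + 3.0)])
hess = lambda x: np.array([[-2.0, 0.0], [0.0, -4.0]])
print(newton_raphson_kd(grad, hess, [0.0, 0.0]))  # -> [ 1. -3.]
```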