MATH49111/69111: Scientific Computing: 5th October 2016

This document summarizes key points from a lecture on scientific computing and numerical error: 1) Numerical solutions to real-world problems involve approximating equations to describe the problem, finding a numerical solution to the approximated equations, and interpreting the solution in context of the original problem. 2) There are two main types of numerical error - discretization error from approximating problems, and roundoff error from limitations of floating-point arithmetic. 3) Errors are quantified using norms of the difference between approximate and exact solutions, with common norms being L1, L2, and L∞ norms. Relative errors are also used to measure accuracy. 4) Roundoff error can accumulate over calculations and lead to

MATH49111/69111: Scientific Computing

Lecture 5
5th October 2016

Dr Chris Johnson
[email protected]
Mathematical modelling

[Diagram: Real-world problem → (modelling error) → Equations → (numerical error) → Numerical solution → (interpretation error) → Problem solution]

Solving real-world problems is a three-stage process:


1. Formulate equations that describe (an approximation of) the
real-world problem
2. Find an (approximate) numerical solution to these equations
3. Interpret solutions in context of problem

. It is important to keep the stages separate


. We will focus on stage 2 (the easiest!)
Types of numerical error

Discretisation/truncation error
. Many problems must be approximated or discretised before
being solved numerically
. For example, we may approximate an infinite sum by a finite
sum of many terms
. The error introduced by approximating the problem is
truncation or discretisation error

Roundoff error
. Operations on floating-point (fractional) numbers are inexact
. Often we cannot solve even our approximated problem exactly
Measuring errors

. In order to quantify errors in our solutions we need to define a
measure for the error
. If $\phi^*$ is an approximation to a scalar quantity $\phi$ then the
absolute error is defined by

$$|\phi - \phi^*|$$

. The relative error is defined by

$$\frac{|\phi - \phi^*|}{|\phi|} \quad (\text{when } \phi \neq 0)$$
Measuring errors

. Often our solution will be a vector $v = (v_i)$, rather than a scalar,
with an approximation $v^\star$.
. Rather than list the error for every component, we define a
scalar error measure with a norm
. Three norms are commonly used for absolute errors:
. The L1 norm, $\|v^\star - v\|_1 := \sum_i |v^\star_i - v_i|$,
. The L2 norm, $\|v^\star - v\|_2 := \sqrt{\sum_i (v^\star_i - v_i)^2}$,
. The L∞ norm, $\|v^\star - v\|_\infty := \max_i |v^\star_i - v_i|$, the maximum absolute
error

. Relative errors for an Lp norm are defined by $\|v^\star - v\|_p / \|v\|_p$

. If v represents a (discretisation of a) function, use the norm of
the function, not the norm of the vector.
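The three norms can be sketched in C++ as below. This is a minimal illustration rather than code from the lecture: the function names are hypothetical, and both vectors are assumed to have the same length.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helpers (names are not from the lecture).
// Each returns a norm of the error vector (approx - exact).

// L1 norm: sum of the absolute componentwise errors.
double error_l1(const std::vector<double>& approx, const std::vector<double>& exact)
{
    assert(approx.size() == exact.size());
    double sum = 0.0;
    for (std::size_t i = 0; i < exact.size(); ++i)
        sum += std::abs(approx[i] - exact[i]);
    return sum;
}

// L2 norm: square root of the sum of squared componentwise errors.
double error_l2(const std::vector<double>& approx, const std::vector<double>& exact)
{
    assert(approx.size() == exact.size());
    double sum = 0.0;
    for (std::size_t i = 0; i < exact.size(); ++i)
    {
        const double diff = approx[i] - exact[i];
        sum += diff * diff;
    }
    return std::sqrt(sum);
}

// L-infinity norm: the largest absolute componentwise error.
double error_linf(const std::vector<double>& approx, const std::vector<double>& exact)
{
    assert(approx.size() == exact.size());
    double largest = 0.0;
    for (std::size_t i = 0; i < exact.size(); ++i)
        largest = std::max(largest, std::abs(approx[i] - exact[i]));
    return largest;
}

// Relative L2 error: ||approx - exact||_2 / ||exact||_2 (exact must be non-zero).
double relative_error_l2(const std::vector<double>& approx, const std::vector<double>& exact)
{
    const std::vector<double> zeros(exact.size(), 0.0);
    return error_l2(approx, exact) / error_l2(exact, zeros);
}
```

For an error vector (3, −4, 0) these give 7, 5 and 4 respectively, which shows how the L1 norm weights every component while the L∞ norm reports only the worst one.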
Roundoff error

. When using floating-point fractional numbers (float and
double), we can only store around 7 decimal digits (for float)
or 15-16 decimal digits (for double).
. Truncation of digits beyond this (called roundoff error) means
that most floating-point calculations are inexact
. Roundoff error is particularly bad when subtracting two
numbers of similar size, or when adding numbers of very
dissimilar sizes.
. Errors may be magnified repeatedly under certain sequences
of calculations (this is called numerical instability)
. We can sometimes control this by altering the algorithm or
rearranging the order of operations
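The two problem cases named above (adding numbers of very dissimilar size, and subtracting numbers of similar size) can be seen in a short sketch. The helper names are hypothetical, and the behaviour described assumes IEEE 754 single precision for float with no extra intermediate precision.

```cpp
#include <cmath>

// Hypothetical helper names, chosen for illustration only.

// Adds `small` to `big` in float arithmetic, then subtracts `big` again.
// In exact arithmetic this would return `small`.
float add_then_subtract(float big, float small)
{
    float sum = big + small;   // rounded to the nearest float
    return sum - big;
}

// Relative error made when recovering x from (1 + x) - 1 in float arithmetic.
float cancellation_relative_error(float x)
{
    float recovered = add_then_subtract(1.0f, x);
    return std::abs(recovered - x) / std::abs(x);
}
```

Under those assumptions, add_then_subtract(1.0e8f, 1.0f) returns 0 because neighbouring floats near 10^8 are 8 apart, so the added 1 is lost entirely. Similarly, cancellation_relative_error(1.0e-7f) is roughly 0.2: the subtraction itself is exact, but it exposes the rounding made when 1 + x was stored.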
Example: roundoff error in sums
We wish to evaluate

$$e^{-2\pi} = 1 - 2\pi + \frac{(2\pi)^2}{2!} - \frac{(2\pi)^3}{3!} + \frac{(2\pi)^4}{4!} - \cdots = 1.86744\ldots \times 10^{-3}$$

using float variables (∼ 7 decimal digits of precision)
. The sum has heavy cancellation
. the largest terms are ≈ ±80
. the final result is ≈ $10^{-3}$

. We estimate the error as arising from the largest terms:

$$\text{roundoff error} \approx 10^{-7} \times 80 \approx 10^{-5}$$

. Result obtained with float variables: $1.8714\ldots \times 10^{-3}$

. Accurate only to two digits (result = $O(10^{-3})$; error = $O(10^{-5})$)
Example: roundoff error in sums
How can we get a more accurate answer?

$$e^{-2\pi} = \frac{1}{e^{2\pi}} = \frac{1}{1 + 2\pi + \frac{(2\pi)^2}{2!} + \frac{(2\pi)^3}{3!} + \cdots}$$

. The largest terms are ≈ 80, as before
⇒ the roundoff error from these terms is ≈ $10^{-5}$, as before
. The value of the sum is now $e^{2\pi} \approx 500$
. Expect ∼ 7-digit accuracy (result = $O(10^{2})$; error = $O(10^{-5})$)
. about as good as we could ever expect from float

. Final result obtained with float: $1.86744236\ldots \times 10^{-3}$

. Reordering sums can reduce the roundoff error [1]
(floating-point addition is not associative!)

[1] N. J. Higham, 1993. SIAM J. Sci. Comput. 14(4), 783–799
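The two ways of evaluating e^{−2π} can be compared with a sketch like the following. The function names and the cut-off of 60 terms are assumptions made for this illustration, and the digits obtained may differ from those quoted on the slides depending on how the terms are accumulated.

```cpp
#include <cmath>

// Sums the Taylor series of exp(x) term by term in float arithmetic.
// n_terms is a hypothetical cut-off, large enough that the neglected
// terms are negligible for |x| = 2*pi.
float exp_series(float x, int n_terms = 60)
{
    float term = 1.0f;  // x^0 / 0!
    float sum = 1.0f;
    for (int n = 1; n < n_terms; ++n)
    {
        term *= x / static_cast<float>(n);  // now x^n / n!
        sum += term;
    }
    return sum;
}

// Direct alternating series for exp(-x): heavy cancellation when x is large.
float exp_minus_direct(float x)
{
    return exp_series(-x);
}

// Reciprocal form 1 / exp(x): every term is positive, so nothing cancels.
float exp_minus_reciprocal(float x)
{
    return 1.0f / exp_series(x);
}
```

Calling both functions with x = 6.2831853f and comparing against std::exp evaluated in double should show the reciprocal form agreeing to about float precision, while the direct sum loses several digits.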
Question
What will the output of this program be?
#include <iostream>

int main()
{
    float f = 4.0/3.0;
    double d = 4.0/3.0;
    double difference = f - d;

    std::cout << difference << std::endl;
}

A. A number around $10^{-7}$
B. A number roughly between $-10^{-7}$ and $10^{-7}$
C. A number around $10^{-15}$
D. A number roughly between $-10^{-15}$ and $10^{-15}$
E. 0
Discretisation/truncation error
. Many problems are impossible to solve exactly on a computer
. Infinite expressions:

$$\gamma = \sum_{k=1}^{\infty} \left[ \frac{1}{k} - \log\left(1 + \frac{1}{k}\right) \right]$$

(we cannot sum an infinite number of terms)

. Continuous problems:

$$f''' + \frac{1}{2} f f'' = 0$$

(we cannot represent f exactly, only an approximation)

. To solve these problems we must approximate them as a
discrete, finite problem that can be solved numerically.
. This approximation introduces error, often more significant
than roundoff error.
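As an illustration of truncation error, the infinite sum for γ can be cut off after a finite number of terms. The sketch below uses a hypothetical function name, and it sums from the smallest term to the largest, which is a design choice of this sketch rather than something the slides prescribe.

```cpp
#include <cmath>

// Partial sum of the first n_terms terms of the series for gamma.
double gamma_partial_sum(long n_terms)
{
    double sum = 0.0;
    // Sum from the smallest terms to the largest to limit roundoff.
    for (long k = n_terms; k >= 1; --k)
    {
        const double inv_k = 1.0 / static_cast<double>(k);
        sum += inv_k - std::log1p(inv_k);  // log1p(y) computes log(1 + y)
    }
    return sum;
}
```

Each neglected term behaves like 1/(2k²) for large k, so the truncation error after N terms is roughly 1/(2N). That is about 5 × 10^{-4} for N = 1000, far larger than the roundoff error of double arithmetic.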
Numerical error: calculating a derivative
How do we calculate the derivative $f'(x)$ of a function $f(x)$?
We cannot directly evaluate the limit

$$f'(x) = \lim_{h \to 0^+} \frac{f(x+h) - f(x)}{h}.$$

However, writing f local to x as a Taylor series,

$$f(x+h) = f(x) + h f'(x) + \frac{h^2}{2} f''(x) + \ldots$$

we find

$$\frac{f(x+h) - f(x)}{h} = f'(x) + \frac{h}{2} f''(x) + \ldots \approx f'(x) \quad \text{when } h \ll |2f'(x)/f''(x)|$$

We approximate $f'(x)$ by choosing a small but finite h.
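The finite-h approximation can be written as a small function template. This is a minimal sketch: the name forward_difference is hypothetical, and F is assumed to be any callable taking and returning a double.

```cpp
#include <cmath>

// Forward-difference approximation (f(x + h) - f(x)) / h to f'(x).
// F is any callable mapping double to double.
template <typename F>
double forward_difference(F f, double x, double h)
{
    return (f(x + h) - f(x)) / h;
}
```

For example, forward_difference([](double t) { return std::sin(t); }, 1.0, 1e-8) can be compared with std::cos(1.0), and repeating the call over a range of h shows how the accuracy depends on the step size.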


Numerical error: calculating a derivative
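Our approximate derivative, calculated with double variables, is

$$\hat{f}' := \frac{f(x+h) - f(x)}{h}.$$

Example:

$$f(x) = \sin(x), \quad x = 1, \quad f'(x) = \cos(1) = 0.5403\ldots$$

[Figure: $\hat{f}'$ (vertical axis, 0.4 to 0.6) plotted against $h$ (logarithmic horizontal axis, $10^{-16}$ to $10^{0}$)]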
Numerical error: calculating a derivative
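$$f(x) = \sin(x), \quad x = 1, \quad f'(x) = \cos(1) = 0.5403\ldots$$

[Figure: $\hat{f}'$ (vertical axis, 0.4 to 0.6) plotted against $h$ (logarithmic horizontal axis, $10^{-16}$ to $10^{0}$)]

Our numerical estimate $\hat{f}'$ is
. apparently accurate over a wide range of $h < 1$
. very inaccurate for $h > 10^{-2}$ and $h < 10^{-14}$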
Numerical error: calculating a derivative

Truncation error
$$\frac{f(x+h) - f(x)}{h} = f'(x) + \frac{h}{2} f''(x) + O(h^2)$$

Approximation of $f'$ has a truncation error of ≈ $(h/2) f''(x)$

Roundoff error
. $f(x)$ and $f(x+h)$ are accurate only to a relative accuracy $\epsilon$
. $\epsilon$ is the machine precision (≈ $10^{-7}$ for float, ≈ $10^{-16}$ for double)
. The absolute error in $f(x)$ and $f(x+h)$ is therefore $\delta \approx |f(x)|\,\epsilon$
. We are therefore calculating a value in the range

$$\frac{f(x+h) - f(x) \pm 2\delta}{h} = \frac{f(x+h) - f(x)}{h} \pm \frac{2\delta}{h}$$
Numerical error: calculating a derivative
The combined truncation and roundoff error is:
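$$\frac{h}{2} f''(x) + \frac{2\delta}{h}$$

[Figure: measured error $|\hat{f}' - f'|$ and the error estimate, plotted against $h$ on logarithmic axes ($h$ from $10^{-16}$ to $10^{0}$)]

. Truncation error dominates for $h \gtrsim \sqrt{\delta} \approx 10^{-8}$

. Roundoff error dominates for $h \lesssim \sqrt{\delta} \approx 10^{-8}$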
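The combined estimate can be evaluated directly, and setting its derivative with respect to h to zero gives the step size at which the two contributions balance, h = 2√(δ/|f''(x)|). The sketch below uses hypothetical function names and takes std::numeric_limits<double>::epsilon() (about 2.2 × 10^{-16}) as the machine precision, which is an assumption of this example and of the same order as the 10^{-16} quoted above.

```cpp
#include <cmath>
#include <limits>

// Estimated total error of the forward difference:
// (h/2)|f''(x)| + 2*delta/h, with delta = |f(x)| * eps.
double error_estimate(double f_value, double f_second, double h,
                      double eps = std::numeric_limits<double>::epsilon())
{
    const double delta = std::abs(f_value) * eps;
    return 0.5 * h * std::abs(f_second) + 2.0 * delta / h;
}

// Step size minimising the estimate: h = 2*sqrt(delta / |f''(x)|).
// Requires f_second != 0.
double optimal_step(double f_value, double f_second,
                    double eps = std::numeric_limits<double>::epsilon())
{
    const double delta = std::abs(f_value) * eps;
    return 2.0 * std::sqrt(delta / std::abs(f_second));
}
```

For f(x) = sin(x) at x = 1, where f'' = −sin(1), optimal_step returns about 3 × 10^{-8}, in line with the crossover near 10^{-8}.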
Comparison of floating-point variables

. Values of float or double variables are rarely exact


. Comparison of values
. Avoid if (a == b) (fails unless values are identical)
. Prefer if (std::abs(a-b) < eps) for some eps ≪ |a|, |b|

. Floating-point loop indices


. Avoid: for (double d=0; d<=1; d+=0.1)
. Prefer:
for (int i=0; i<=10; i++)
{
    double d = i*0.1;   // recomputed from the integer index each time
    // ...
}
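Both recommendations can be sketched as below. The helper names are hypothetical, and nearly_equal scales the tolerance by the larger of |a| and |b|, which is one possible choice of eps ≪ |a|, |b| rather than the only one.

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

// Hypothetical helper: true when a and b agree to within a relative
// tolerance rel_tol (an assumed default, adjust to the problem).
bool nearly_equal(double a, double b, double rel_tol = 1e-12)
{
    const double scale = std::max(std::abs(a), std::abs(b));
    return std::abs(a - b) <= rel_tol * scale;
}

// Grid points 0, 0.1, ..., 1 built from an integer loop index, so the
// number of points does not depend on accumulated roundoff.
std::vector<double> grid_points()
{
    std::vector<double> points;
    for (int i = 0; i <= 10; i++)
        points.push_back(i * 0.1);
    return points;
}
```

With IEEE 754 doubles, 0.1 + 0.2 == 0.3 evaluates to false, whereas nearly_equal(0.1 + 0.2, 0.3) is true. A purely relative test like this one is not suitable when comparing against exactly zero, where an absolute tolerance is needed instead.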
Numerical error: summary
Discretisation/truncation error
. Error from solving a discretised version of a continuous/infinite
problem
. Minimised by:
. careful choice of discretisation
. ‘finer’ discretisation (often slower)

Roundoff error
. Error from finite precision of floating-point (float/double)
numbers.
. Minimised by:
. careful choice of algorithm
. use of double rather than float
. high-precision arithmetic (slower than either)
