Last Changed: Nov. 19, 1997

This is a conversion from Oberon text to HTML. The converter software is still under development, and some features or information may be missing in this converted version. HTML hypertext facilities are not yet active in this document. To exploit the interactive facilities, use Oberon System 3 and the source of this text, available as ASCII-coded Oberon System 3 documents. A previous version is also available for Oberon V4. To access this and other additional material use ftp.
For the convenience of our students, most of this information and the related material is in German. Sorry if this is not one of your languages.

Introduction to the Programming Language Oberon.

G. Sawitzki <gs@statlab.uni-heidelberg.de>



13 Timing and Optimization

This is a preliminary version. Please do not circulate. Comments to <gs@statlab.uni-heidelberg.de>
For recent material, see <http://statlab.uni-heidelberg.de/>.

In this chapter, we discuss some aspects of timing and optimizing programs. More generally, the topic of this chapter is the resources used by a program.

Time consumption, or more generally, consumption of resources, may have several components. Comparison is made easier if we transform consumption of resources to a general scale. Our personal preference is to use human time consumed as a general base scale. On the other hand, resource usage may depend on computational parameters. If we have a model for the computation, comparison is made simpler if we can combine these parameters into a general measure of complexity, usually referred to as operations of some abstract machine. Time consumption is then modelled as operations/speed + setup time. Historical units for speed are MIPS = million instructions per second, where instructions refer to the instruction set of the classical DEC VAX, or FLOPS = floating point operations per second, where "operations" still needs an additional definition.

Other resources can often be converted to human time needed. For example, if you use an internet browser which requires xxx MB of RAM above your usual requirements, add the time you need to work to pay for xxx MB of RAM, plus the "system time" (= time needed to order, pay for and install the RAM) to the cost of the software. If the software is used by a group, add the time for training, configuration and support.

We make two fundamental assumptions. The first is that a program is developed and used. Timing and optimizing a program has to take both into account. Sometimes, development and usage may emphasize different aspects. For example, time optimizing a program may be time consuming during the development process, but time saving during application. If a program were used for eternity, the application aspect would prevail. With a real life program, however, we have a finite scope, and there is a balance between development and application effort. For illustration, and to call attention to secondary expenses, we think of a cost model as
  cost  =  development  +  development testing
    +  maintenance  +  maintenance testing
    +  application    +  application testing
    +  replacement  +  replacement testing
We separated testing cost for each step to draw attention to the fact that correct performance is not given for free at any stage of software usage. Maintenance cost includes the cost of maintaining subsequent versions of the software. Replacement costs cover the effort to install, and possibly remove, versions of the software. While the development side (development and maintenance) is under control of the software developers, the application side (application and replacement) usually is subject to speculation. A helpful estimate is to factorize the cost e.g. as
  application= nr of users * expected time of usage per user
    * expected usages per time unit * mean time per usage

The second assumption is that any optimization needs to be visible. In particular, we assume a measurement process, and optimizations are judged within the limitation of this measurement process. While development and maintenance resources can (in principle) be observed directly, application and replacement resources can only be estimated or predicted.

We have several levels of complexity. On the top level, we have the task as seen by the user. On the lowest level, we have machine instructions. Ideally, time calculations would add up and be consistent across all levels. In reality, we have sequence effects. One operation on a lower level may change the state of the global system, and this may affect the performance of the next operation. As a first approach, we can use an additive model, but we should always check for violations of this assumption. Caching, pipelining and other effects may force us to use more elaborate models.

Timing and optimization is full of surprises. It is a helpful habit to formulate assumptions explicitly, and to include checks of these assumptions. Two basic assumptions generally are false. Time is not a continuum: on a computer, time has a discrete structure. And time is not consistent: on a computer, two different timers will generally show inconsistent times. Here is a small program to check the timing of your Oberon installation:
  Time.Grain ~ returns the minimal time difference, over 100 runs
  Time.Clock ~ returns the number of time units per second
The source code is in Kurs/Time.Mod.
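For readers who want to experiment without the course sources, a minimal sketch of such a grain measurement is given below. It is not the Kurs/Time.Mod original; it assumes the Oberon System modules Out and Oberon (with Oberon.Time() returning the current clock reading in ticks), and the module and procedure names are chosen for this example only.
  MODULE TimeGrainSketch;
    IMPORT Oberon, Out;

    PROCEDURE Grain*;  (* smallest visible difference between two clock readings, over 100 runs *)
      VAR i: INTEGER; t0, t1, min: LONGINT;
    BEGIN
      min := MAX(LONGINT);
      FOR i := 1 TO 100 DO
        t0 := Oberon.Time();
        REPEAT t1 := Oberon.Time() UNTIL t1 # t0;  (* busy-wait for the next tick *)
        IF t1 - t0 < min THEN min := t1 - t0 END
      END;
      Out.String("minimal time difference [ticks]: "); Out.Int(min, 0); Out.Ln
    END Grain;

  END TimeGrainSketch.
Executing such a command a few times should reproduce the order of magnitude reported by Time.Grain.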



Exercises:

Repeat the measurements on the grain size. Is there any variation? Compare with the documentation of your Oberon installation. Is the result plausible?

Is there any variation in the clock measurement? The clock measurement output has an additional line which marks whether a measurement is below or above the overall mean. Do any peculiarities (patterns etc.) occur? Try to formulate your observations on the measurement process explicitly. Can you verify whether your observations are reliable, or whether they just reflect artifacts?


Possibilities for optimization will be discussed first. Timing questions are discussed at the end of this chapter.

Optimization

The main optimization is to avoid a computation altogether. This cannot be over-emphasized: if a computation is not necessary, don't do it.

If you have to do a computation, a list of common optimization techniques can be applied. In principle, these optimizations can be done by a compiler. Unfortunately, not all compilers perform these common optimizations, and proper documentation of the optimizations done by individual compilers is rare.

Although the common optimizations can be done automatically, the complexity of the necessary optimization algorithm sometimes forbids extensive optimization on the compiler level. On the other hand, the programmer may have a view of the intent and structure of the program which can be used to call the available optimizations into action. In many cases a clean program design is the most important step towards an effective implementation. In addition to this, a combination of an optimization-aware programming style and automatic optimization by the compiler is helpful. So we give a list of common optimization techniques, and add it as a check list for your programs or compilers at the end of this chapter.

Computation timing is not easy to interpret. The common convention is to represent an optimization gain as the reduction factor (time with optimization)/(time without optimization). However, the time without optimization enters as a norming constant here, which can be affected severely by other implementation issues. The other common choice is to assign an implementation independent complexity to a problem, and to standardize using this complexity.

The programmer's view of code is generally in terms of statements, and a raw estimate can be derived by adding the statement costs. Hidden costs are introduced by implicit actions. In particular,
- procedure calls may involve an overhead for setting up a procedure frame and for passing parameters.
- control statements, like FOR...DO...END, may involve a control overhead.
- memory allocation, like NEW, may involve housekeeping and garbage collection.
- access to components of structured data types, like A[i,j,k], may involve an address calculation (usually the evaluation of a polynomial such as Addr(A) + (LEN(A,1)*LEN(A,2)*i + LEN(A,2)*j + k) * size of an element).

To compare optimizations, we may refer to a coarser segmentation. We will refer to a basic block as a group of statements which is entered by a unique initial statement and left at a unique statement, with no exception or branching except at the end of the block. This means a block is a "run" of statements which is always executed as a whole and can be treated as a self contained unit. Since there are no branches in a basic block, all statements in the basic block are executed the same number of times, and time consumption factorizes as
  nr of block executions * block execution time

As a programmer, we have essentially two possibilities for optimization: reduction of operation cost and code motion.

Reduction of operation cost can be achieved by replacing expensive operations or data structures by cheaper ones with equivalent system status. The benefit is obvious if we do not lose information. Some possibilities are exploited by nearly all compilers. For example, for integers on common CISC processors
  i*2  i+i  ASH(i,1)  are equivalent,
          the latter being the cheapest
  i:=i+k    INC(i,k)  are equivalent,
          the latter being the cheaper
Others are rarely found in compilers, and hand optimization is helpful. For example, we can check that a vector (x,y) has norm zero by
  (x*x)+(y*y) = 0
or
  (x=0)&(y=0)
the latter of course being cheaper (and more stable). But a compiler will rarely be able to make this reduction.
Reduction of operation cost strongly depends on the actual performance of the processor. For example, on common RISC processors both i+i and ASH(i, 1) take one cycle and can be considered equivalent. More advanced processors use one or more instruction pipelines, and the performance may depend on the previous state of the pipeline, making reduction of operation cost a non-trivial task.

Other reductions of operation cost may have a trade off. For example, we may have the choice of representing a data structure as a linked list or as an array. Using a linked list replaces the polynomial address calculation by a step through a pointer, which may be more time efficient when it comes to higher dimensions. On the other hand, we have to spend the additional memory for a pointer per cell. Again the actual performance depends on the architecture of the processor. If the processor uses a data cache, the cache behaviour may change drastically with a linked list if locality is poor.
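For illustration, here is a hedged sketch of the two representations in Oberon; the type and procedure names are ours, chosen for the example only. The list traversal needs no index-to-address polynomial, but pays one pointer per cell and may show poorer locality.
  TYPE
    Vector = POINTER TO ARRAY 1000 OF REAL;  (* array: element address computed from the index *)
    Node = POINTER TO NodeDesc;
    NodeDesc = RECORD  (* linked list: one extra pointer per cell *)
      val: REAL;
      next: Node
    END;

  PROCEDURE SumList(l: Node): REAL;
    VAR s: REAL;
  BEGIN
    s := 0;
    WHILE l # NIL DO s := s + l.val; l := l.next END;  (* pointer step instead of index arithmetic *)
    RETURN s
  END SumList;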

Code motion can be used to move operations to places where they are executed less frequently, or at less time critical instances. The most prominent example is to move loop-invariant code outside a loop.

These possibilities can be applied to a range of targets. Simplest of all: constant elimination. The aim is to reduce run time (and possibly code size) by avoiding repeated evaluation of expressions with constant values. It comes in two flavours: expressions which are proper constants can be evaluated at compile time; expressions which are not proper constants, but are invariant during repeated invocations, can be stored in temporary variables to avoid repeated calculations.

- Constant Folding
Expressions consisting of constants only can be evaluated at compile time.
Hand optimization: move to the constant section. Potential drawback: most compilers evaluate constants using the precision of the compile time environment, not the run time environment. In particular, mathematical constants like e or pi may have different values at run time than at compile time if you use a cross-platform environment.
Implementation: generally implemented for INTEGER and BOOLEAN constants, rarely for other cases.
Floating point constant folding is very tricky since the order of evaluation is critical and on some architectures even the use of registers or memory locations might change the result. Compilers usually don't do it.
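A minimal sketch of the hand optimization mentioned above, moving the constant expression into the CONST section so that it is evaluated once by the compiler instead of at every use (the names pi and degToRad are ours):
  CONST
    pi = 3.14159265358979;
    degToRad = pi/180;  (* proper constant expression: folded at compile time *)

  PROCEDURE Clip(VAR x: REAL);
  BEGIN
    IF x > 45*degToRad THEN x := 45*degToRad END  (* no run time evaluation of pi/180 *)
  END Clip;
Note that the caveat above applies: the folded value carries the precision of the compile time environment.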

- Common Subexpression Elimination
Common subexpressions are parts of a block which have a matching form and do not change their values between computations.
If the same expression occurs repeatedly, the program can be simplified.
Hand optimization: Introduce a temporary variable to hold the common subexpression.
Implementation: Used to reduce space in OMI-based compilers.
One important point is the following: compilers might optimize common subexpressions only if the subexpression does not contain procedure calls. E.g. in the following example the compiler might translate statement (1) into statement (2):

(1)  IF x > pi/180*45 THEN x := pi/180*45 END;
(2)  t := pi/180*45; IF x > t THEN x := t END;

However, if the expression contains procedure calls, then the compiler does not optimize common subexpressions due to possible side effects. In the following example the programmer has to perform the common subexpression elimination by hand:

(1)  IF x > f(y, z) THEN x := f(y, z) END;
(2)  t := f(y, z); IF x > t THEN x := t END;


- Loop invariant code motion
This is a special case of common subexpressions. If a subexpression does not change its value during repeated evaluations of a loop, it can be moved before the loop. For example
  WHILE (i >limit -2) DO... END;
can be changed to
  t:=limit-2; WHILE i>t DO... END;
if limit is a loop invariant. Identification of loop invariants, reduction of induction variables and code motion are usually used in combination.

- Induction variables
Induction variables are variables in loops which keep a fixed relation to the loop index. Since typically these variables are evaluated very often, reduction of operation cost can be very effective. A first attempt may be to reduce operation cost by the introduction of new auxiliary loop constants. If that is not feasible, the next attempt is to keep the computation time for induction variables at a minimal level, e.g. by sorting nested loops, that is by using code motion.
For an example, consider the code fragment
  i := 0;
  WHILE i < j DO
    a[i] := 0;
    INC(i)
  END;

A straightforward compiler generates:

  i := 0;
  WHILE i < j DO
    adr := baseadr(a);
    idx := size(basetyp)*i;
    Mem[adr+idx] := 0;
    i := i + 1
  END;

A compiler with loop invariant code motion generates:

  i := 0; adr := baseadr(a);
  WHILE i < j DO
    idx := size(basetyp)*i;
    Mem[adr+idx] := 0;
    i := i + 1
  END

A compiler with loop invariant code motion and induction variables generates:

  i := 0; adr := baseadr(a);
  WHILE i < j DO
    Mem[adr] := 0;
    i := i + 1;
    adr := adr + size(basetyp)
  END



Techniques for elimination of constants may be obstructed by aliasing, that is by different names used for the same logical entity. A common strategy to avoid part of this problem is propagation of copies. This strategy means to use the oldest reference path if multiple reference paths are possible. It is a strategy to find "hidden" constants. For example,
  x:=y;
  a[z]:=b;
  a[w]:=x;
is transformed to
  x:=y;
  a[z]:=b;
  a[w]:=y;

The next group of strategies addresses proper use of the control flow of a program.

- Range-tracking and dead check removal
Range tracking means to move range checks for variables or expressions from a lower to a more global level. For example, access to an array in a loop does not need a range check at every step if we can guarantee that the maximum and minimum indices are within the proper bounds. More generally, dead check removal means to remove check code which can be proven never to fail. For example, assignments do not need an overflow check if the global minimum and maximum are under strict control.
If there is compiler control on a statement level, appropriate compiler options can be used to avoid compiler generated checks in safe code regions. If the compiler can only be controlled on a module level, this asks for a clean separation of "safe" modules and common ones.
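A hedged illustration of the idea: establish the bounds once, so that every access inside the loop is provably in range and the per-element checks become dead. Whether a compiler actually removes them depends on the compiler and its options; the procedure name is ours, and ASSERT is assumed to be available.
  PROCEDURE Clear(VAR a: ARRAY OF REAL; lo, hi: LONGINT);
    VAR i: LONGINT;
  BEGIN
    ASSERT((0 <= lo) & (hi < LEN(a)));  (* one global range check ... *)
    FOR i := lo TO hi DO a[i] := 0 END  (* ... instead of one check per element *)
  END Clear;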

- Pre-calculation of array access
Reduces the operation cost for the special case of array access. For example, repeated access of the form [variable+constant] may be removed by either shifting the array one position, or using a redefined variable. In general, a change of the data structure should be considered as an alternative in these instances.

- Dead code elimination
Code fragments which can never be reached can be eliminated. If the code is guarded by a constant expression, this is usually done by the compiler, as is the elimination of procedures which are never called. If the code is a partial code of a procedure guarded by a variable expression, hand removal may be necessary. Dead code is often generated by propagation of constants ("IF debugging THEN END;").
With cached memory access, code elimination may have the side effect to improve locality, thus reducing cache miss rate and leading to better time performance.
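A small hedged sketch of the constant-guard idiom mentioned above; with debug declared as a constant, a folding compiler can drop the guarded statements entirely (the names are ours, and the Out module is assumed for output).
  CONST debug = FALSE;

  PROCEDURE Step;
  BEGIN
    IF debug THEN  (* constant guard: dead code when debug = FALSE *)
      Out.String("entering Step"); Out.Ln
    END;
    (* proper computation follows here *)
  END Step;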

- Optimization of Boolean expressions
This is achieved by arranging Boolean expressions to stop evaluation as soon as the result is established. In Oberon, shortcut evaluation is done by definition. In other languages a careful use of shortcut operators or nested IF statements may be necessary.
Care has to be taken that no intended side effects are broken, as when the evaluation of one part of the expression intentionally changes some variable which is used later on.
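A small hedged example: since & is a shortcut operator in Oberon, putting the cheap, guarding test first both protects the second operand and avoids its evaluation in the frequent case. Here p is assumed to be a pointer into a linked list of records with fields key and next, and x a key value.
  (* p.key is only evaluated if p # NIL *)
  WHILE (p # NIL) & (p.key # x) DO p := p.next END;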

- Tabulation
Pre-calculation and storage of tabulated values may be used to reduce time in search and decision situations. This can rarely be achieved in an automatic way. Of course tabulation only shifts the problem, and there is a memory/speed trade off which has to be taken into account.
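A hedged sketch of tabulation: a sine table paid for once at start-up and traded against memory. The table size and the names are ours; Math.Sin is used as in the strength reduction example below.
  CONST TabSize = 1024; pi = 3.14159265358979;
  VAR sinTab: ARRAY TabSize OF REAL;

  PROCEDURE InitTab;
    VAR i: INTEGER;
  BEGIN
    FOR i := 0 TO TabSize-1 DO sinTab[i] := Math.Sin(2*pi*i/TabSize) END
  END InitTab;
After initialization, sinTab[k MOD TabSize] replaces a call of Math.Sin at the price of TabSize REALs of memory.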

- Special casing and use of recursive structures
A more dynamic variant of tabulation. Care should be taken to avoid implicit use of recursive structures. A tail recursion can always be replaced by an iteration, usually with less overhead.
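For illustration, a tail recursive search and its iterative equivalent, using the Node type from the list sketch above (a non-empty list is assumed); the iteration avoids the call overhead and the stack growth.
  PROCEDURE LastRec(l: Node): Node;  (* tail recursive; l # NIL assumed *)
  BEGIN
    IF l.next = NIL THEN RETURN l ELSE RETURN LastRec(l.next) END
  END LastRec;

  PROCEDURE LastIter(l: Node): Node;  (* same result: one loop, no procedure frames *)
  BEGIN
    WHILE l.next # NIL DO l := l.next END;
    RETURN l
  END LastIter;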

- Loop unrolling
Loop unrolling reduces the number of loop passes by repeating the loop code inline. This allows adapting to particularities of processors (e.g. using one LONGINT access instead of two INTEGER accesses). High-quality compilers may provide loop unrolling, but it is still not commonplace.
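A hedged sketch of a fourfold unrolling; it assumes that n is a multiple of 4 (otherwise a short clean-up loop is needed) and that a, i and n are declared appropriately.
  (* rolled: one comparison and one branch per element *)
  i := 0;
  WHILE i < n DO a[i] := 0; INC(i) END;

  (* unrolled by 4: the loop overhead is paid once per four elements *)
  i := 0;
  WHILE i < n DO
    a[i] := 0; a[i+1] := 0; a[i+2] := 0; a[i+3] := 0;
    INC(i, 4)
  END;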

- Software pipelining
The goal of software pipelining is to overlap the execution of several loop iterations, just like a processor pipeline overlaps the execution of several instructions. Software pipelining is often considered superior to loop unrolling as it allows a higher degree of overlap.

- Variable placement
This has been an important technique for CPUs with a small number of registers, and its relevance for programmers has decreased with the use of RISC processors and caches. Some languages and compilers allow programmer hints for variable placement, such as reserving registers for a variable. Variable placement and register allocation may be done more effectively by a compiler.

- Algebraic/analytic reduction (Strength reduction)
This is one of the more powerful forms of reduction of operation costs. If we can replace an expression by an equivalent one whose evaluation consumes less time, we contribute to time optimization.
The most trivial of these reductions, such as removal of operations on neutral elements (x+0=0+x=x), are done by most compilers. Others require programmer support. Example: To get equidistant points on a circle, we can use
  fak := 2 * pi / n;
  FOR i := 1 TO n DO
    NodeTable[i].Posx := -Math.Sin(i * fak);
    NodeTable[i].Posy := Math.Cos(i * fak)
  END;
We can instead use
  fak := 2 * pi / n;
  nsin:=Math.Sin(fak);
  ncos:=Math.Cos(fak);
  x:=0; y:=1;  (* start values (-sin 0, cos 0), so that the recurrence reproduces the loop above *)
  FOR i := 1 TO n DO
    NodeTable[i].Posx := ncos*x-nsin*y;
    NodeTable[i].Posy := nsin*x+ncos*y;
    x:=NodeTable[i].Posx; y:=NodeTable[i].Posy
  END;
thus avoiding the repeated calculation of sin and cos at the expense of two multiplications and an addition per value - generally cheaper operations. On the other hand, roundoff errors may differ between the two implementations and should be taken into account.

To conclude, we should mention types of optimization which are usually the exclusive domain of the compiler. It may make sense to optimize a compiler for a certain hardware architecture or implementation. On this level, the compiler can try for

- Peephole optimization: local machine level variant of reduction of operation cost

- Instruction scheduling: arrange the instruction sequence to match the architecture of CPUs with independent execution units, avoiding interlocks and wait cycles

- Cache optimizations: arrange data and code access to make the most efficient use of any caches available.

Timing

At first glance, timing is the simple process of getting the time needed for an operation. Unfortunately, time measurement is not a smooth process, and the granularity of computer time adds some difficulty. Moreover, most operations are not directly observable, which makes timing more complicated. Finally, each measurement changes the process under observation, and we have to work to separate the effects of the timing from those of the process under investigation.

Performance is usually reported as operations per time, so that bigger means better. For a critical analysis it is more convenient to use the reciprocal scale, time per operation. Ultimately we want to compare times needed, and assessment of error propagation is simpler on this scale. So we calculate coefficients of variation on the time scale, and any error bounds for speed are calculated on the time scale and then converted to speed.

As an indicator of the quality of a measurement, we use the relative error, or coefficient of variation
  stddev of estimate / true value of parameter
which indicates the relative precision if the estimator is unbiased. Of course, since we do not know the true value offhand, we have to resort to estimators like
  estimated stddev of estimate / est. mean value of estimate.
We should not start discussion unless we can guarantee a coefficient of variation of less than 10%, and we should get below 1% if we want to get serious.
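As a sketch (the procedure name is ours, and a MathL module providing sqrt for LONGREAL is assumed), the estimated coefficient of variation from n recorded timings might be computed as follows:
  PROCEDURE CoeffOfVariation(VAR t: ARRAY OF LONGREAL; n: INTEGER): LONGREAL;
    (* estimated stddev of t[0..n-1] divided by its estimated mean; requires n >= 2 and a nonzero mean *)
    VAR i: INTEGER; mean, ss: LONGREAL;
  BEGIN
    mean := 0;
    FOR i := 0 TO n-1 DO mean := mean + t[i] END;
    mean := mean/n;
    ss := 0;
    FOR i := 0 TO n-1 DO ss := ss + (t[i]-mean)*(t[i]-mean) END;
    RETURN MathL.sqrt(ss/(n-1))/mean
  END CoeffOfVariation;
A value above 0.1 would, by the rule above, mean that the measurement is not yet worth discussing.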

To get a usable description, we have to introduce some formalization. We assume that the compute process to be timed takes some time
  C  compute process time
If some preparation or cleanup is needed, we collect this as
  P  preparation and clean up time
We can choose the number of repetitions, and may have the freedom to run the preparation without doing the proper process calculation. With N repetitions of the compute process, preparation included, and M additional repetitions of the preparation process, we have measurements for the times
  TC = Σ i=1..N (Ci + Pi) + err
  TP = Σ i=1..M Pi + err
where err is the error term reflecting variations in the process to be timed. The measurements will not give the true time of the processes, but may be affected by some additional measurement error. We denote the measurements by T'C, T'P respectively.

In theory, we will assume that we take samples under equivalent conditions, so the systematic terms are constant and these errors are independent, and the error terms in each row have a common identical distribution. In practice, we have to take care to establish these conditions. If we do not have additional information or measurements on the preparation process, we cannot separate timing of the preparation and the proper computing process, and all we can achieve is a joint estimate for both.

If we have measurements on the preparation process, we can use
  C^ = (T'C - N·(T'P/M)) / N = T'C/N - T'P/M
as an estimate for C. Writing down this estimate is a straightforward approach. Work has to be done to judge the quality of this estimator. How reliable are the figures it produces? If we can establish stochastic independence of the timing for C and P, variance decomposes as
  Var(C^) = Var(T'C)/N² + Var(T'P)/M².

Clock discretization

Since reading and registering the time changes the time behaviour, we try to keep the proper timing and registering operations to a minimum and include them in the preparation and clean up time P, or ignore them. For any time T we want to take, the best we can get is a time reading T' = t1-t0 within the precision of the clock.


The time reading error T-T' = d0-d1 is best modelled as a stochastic error. The errors d0 and d1 are related to T in a fixed relation: d1 is the remainder of (d0+T) divided by delta. This relation can be understood completely. For simplicity, we will scale by delta, that is we assume delta=1 and T = k + epsilon for some k=0,... and epsilon<1. If d0 < 1-epsilon, T' = k and the measurement error is -epsilon. If d0 > 1-epsilon, T+d0 > k+1, so T' = k+1 and the measurement error is 1-epsilon. So given epsilon we have this error dependency on d0:



Now d0 reflects the starting time of our measurement. If we can assume a starting time independent of the clock cycle, we are led to a uniform distribution for d0. If the execution length does not depend on the starting time, we have independence between epsilon and d0. If we can assume a uniform distribution for the "true discretization error" epsilon, this gives a triangular distribution for the observable discretization error d0-d1 on [-delta,delta], with mean = 0 and variance (1/6)·delta². We denote the mean of the discretization error d0-d1 by E(d) and the variance by Var(d).

We can of course try to avoid the initial discretization offset and start the measurement after synchronizing with the clock, forcing d0 = 0. This may however be counterproductive. We end up with the conditional distribution given d0=0, which is uniform with a worse bias of E(d) = delta/2 but a more favorable variance of Var(d) = (1/12)·delta².

Immediate timing precision

From observations on T' we get an estimated mean E(T') and estimated variance V(T'). Estimating the mean of T by E(T'), we write T as T'-d. We can estimate the variance of T' from observations, and we have the true variance of d, but unfortunately T' and d are not independent due to the common (unobservable) term epsilon. So for the variance of T, we have to use a conservative estimate
  V(T) = (Std(T') + Std(d))²
where Std(T'), Std(d) are estimates of the standard deviation of T' resp. d. If we estimate a coefficient of variation from this, we must take into account that we do not know the true mean for T. We know the true mean of E(d), and for practical purposes where we use the coefficient of variation to derive confidence intervals, we can take an adjusted estimate of V(T')+Var(d) for Var(T) to give
  √V(T) / (E(T') - E(d))
as an estimate for the coefficient of variation, and thus as an indicator of the relative precision. If we have to guarantee for the estimate of the coefficient of variation itself, we would take a more conservative estimate for the mean of T, yielding
  √V(T) / (E(T') - delta)
as an estimate for the coefficient of variation.

If we can chain execution of operations, we effectively increase the time of execution from T' to an order of NT' without adding additional time truncation error. This leads to a coefficient of variation which combines the usual sample size effect with a deflation of the discretization error, leading to an improved estimator on the original scale.

Scaffold timing precision

More often, we have to prepare an initial setup before we can do the proper calculation: we build up a "scaffold" to prepare for the real timing. So the time T=TC is a sum TC=C+P with empirical timing T'C. We will usually have the possibility to time the preparation without doing the proper computation, giving a timing T'P for P, but we will not be able to time C without preparation. A plug-in estimator is to estimate C by C^ := T'C - T'P. The above considerations apply to T'C and T'P. To judge the quality of C^, we try to avoid interaction of the measurement processes for TC and TP. As an estimate, we plug in the adjusted estimators for the variance under the independence assumption as above, yielding
  V(C) ≈ V(TC) + V(TP)
and use
  √V(C) / C^
as an estimate for the coefficient of variation, or
  √V(C) / (C^ - 2·delta)
as a conservative estimate for the relative precision.

If we can chain execution of operations, we now have a choice. We can, but need not, use the same number of repetitions for the measurement of the scaffold P as for the proper measurement with target P+C. We can use N repetitions for the full computation and M repetitions for the setup timing only, yielding
  E(TC) = N·(C + P) + E(d)
  E(TP) = M·P + E(d)
Assuming independence of the measurement series, if VarC and VarP are the original (single execution) variances of the computing time, and EC, EP the corresponding means, this leads to
  Var(C^) ≈ (N·VarC + Var(d))/N² + (M·VarP + Var(d))/M²  (*)
consuming an expected compute time
  ET= E(TC) + E(TP)                          (**)
where we have to choose N and M to meet our time constraints and resources while meeting the target. We can either minimize (*) subject to ET ≤ Tlimit, or we can minimize (**) while keeping the variance below a requested limit. Either way we have to balance variance reduction for T'C and T'P, compensating for the discretization aspect. This optimization problem can be solved directly, but the simple rule is to keep in mind that the total variance reduction contribution is proportional to the variance of the components, with weight proportional to 1/effort attributed to the components.

If the setup is a major source of variance, we should consider an approximate decomposition
  Var(C^) ≈ ((N·VarC + Var(d))/N² +
    (M·VarC + M·Var(P-C) + Var(d))/M²)
  = (((N+M)·VarC + Var(d))·(1/N² + 1/M²) + M·Var(P-C))
which shows that the sum N+M contributes to the reduction of the T' variance estimate, and M is used for the T'' estimate.

Immediate timing revisited

Looking again at the immediate timing, we see that actually it has an implicit scaffold. We can treat the setup to do the proper timing as a preparation and apply the rules for scaffold timing.
So if timing creates considerable overhead, we can reduce the variance of the estimates for our process to be timed by running additional empty timing runs and correct for the timing overhead as above.

Real World Timing

In the examples above, we used coefficients of variation as an informative measure of relative precision. This requires knowledge of mean and variance of the target variable, and of course both are not known a priori but must be estimated from observations, thus introducing an additional source of error.

We can stay with a naive application of coefficients of variation, and be pleased with small interval estimators. But we run the risk of being led astray by random variation, discussing seeming timing effects which are but an artifact introduced by incomplete knowledge of the target mean and variance.

To avoid being led astray by random variation, we have to use confidence intervals. The number of independent observations used to determine the variance estimate is critical for the reliability. A strict analysis and a proper model may be too much of an effort. Instead, we use rules of thumb to get a first approximation. Assuming a Gaussian distribution as an approximate error distribution, and allowing for a chance of 1 in 10 to be out of bounds, i.e. a confidence level of 90%, we can use the table of the t-distribution to get approximate confidence intervals. With k samples to estimate the variance (k-1 degrees of freedom) we catch the true parameter with 90% (99%) confidence using
  confidence interval = estimated mean ± t · estimated standard deviation
with
  k    t(90%)  t(99%)
  2    6.3     63.7
  3    3       10
  6    2       3.5
  16   1.75    2.9
  ∞    1.65    2.58
As a rule of thumb, we can use these figures as factors to get confidence intervals from our estimated means and standard deviations. To be more precise, we can use exact quantiles from the t distribution as factors.
We did not model the exact dependence of the error contributions, but made some simplifying plausible assumptions, and we are assuming a Gaussian distribution as a convenient smooth approximation. So we will not have exact confidence intervals.
But we have left vague approximations and reached explicit statements: the calculated intervals should cover the true (unobservable) time with guaranteed confidence of 90% (99%). This can be used to derive predictions on observable timings. So if we have additional timing resources, we can repeat the timing process and check the model assumptions and see whether we can improve the timing process.

A step by step illustration

Most timing routines will have one or two groups of three lines of output.
The first of these lines gives confidence intervals based on the assumption that you have an exact (continuous time) measurement and exact knowledge of the error variance. This is the simplest approximation, but the most error-prone.
The second line corrects for the discretization error. It will give larger intervals than the first one, but these are more reliable.
The third line additionally takes into account that you do not have exact knowledge of the error variance. Instead, the error variance is estimated from observations. Again, the assumption of an approximate Gaussian distribution is used. If the information used to estimate the error variance is increased, the interval limits shown here will converge to the Gaussian approximation given in the second line. The figures given in the third line can be used as conservative bounds for practical use.

Timing gets more complicated if we move to smaller times. The extreme case is to time an empty loop, i.e. to time the timing overhead.
An initial solution is
  Time.Nothing
To illustrate the various choices, Time.Nothing gives an abundant report. In a first table, the plain measurements and elementary derived statistics are listed.
Next, the time statistic is evaluated. Confidence intervals are given for three variants: blue-eyed statistics assuming a plain Gaussian model, a Gaussian model with error correction for discretization, and a conservative correction adding a worst-case correction for the mean and using studentized confidence intervals.
As a complement, these results are translated to speed scale.
Next, the speed statistics (not recommended) are evaluated. As for the time scale, we give confidence intervals for blue-eyed statistics assuming a plain Gaussian model, for a Gaussian model with error correction for discretization, and for a conservative correction adding a worst-case correction for the mean. As a complement, these results are translated to the time scale.
The recommended variant is to use a time based model with variance correction, using derived intervals for speed estimation.
The output is not corrected for artefacts. So if the Gauss approximation leads to unreasonable results, you will see garbage in the output.

Time.Nothing is a parametrized command so that we can iterate executions.
We can use it as
  Time.Nothing
    <minimal nr of time grains> <nr of repeated runs>
e.g.
  Time.Nothing 2 5
  Time.Nothing 4 5
  Time.Nothing 8 5
  Time.Nothing 16 5
to see how variance is reduced if we move away from the initial time discretization.

Given real world limited resources, we have a choice of either fixing a time span and taking observations until the time is used up, or fixing a number of observations and taking the time when the observations are finished. The difference is the number of times we have to read the clock. For N observations, this is N+1 in the first case, and 2 in the second. If reading the clock takes negligible time, the choice does not matter; otherwise the second strategy introduces less measurement overhead.

The simple timing loop consists of several components
  - increasing a counter
  - reading the time
  - checking limits
  - enclosing control structure.
To get an idea about the relative contribution, we can unroll the loop. For example, we still use a counter increment of 1, but we read the time and check the limits only in 1 of 8 steps.
  Time.Nothing8
implements this variant, with parametrizations as above
e.g.
  Time.Nothing8 2 5
  Time.Nothing8 4 5
  Time.Nothing8 8 5
  Time.Nothing8 16 5
The example illustrates the large contribution from reading the time. So if we can avoid it, we should do so.
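For orientation, here is a hedged sketch of such an unrolled timing loop (not the Kurs implementation); the counter still advances in steps of 1, but the clock is read and the limit checked only once per eight steps. The variables n, t0, t1, limit and grains are assumed to be LONGINTs.
  n := 0; t0 := Oberon.Time(); limit := t0 + grains;
  REPEAT
    INC(n); INC(n); INC(n); INC(n);
    INC(n); INC(n); INC(n); INC(n);  (* eight counted steps *)
    t1 := Oberon.Time()              (* but only one clock read and one limit check *)
  UNTIL t1 >= limit;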

To get a time measurement with prescribed precision, we can use a compound strategy. Here is one strategy, aiming at a measurement with relative error (coefficient of variation) of alpha=1%; a sketch of the first step follows the list.
-   Get an initial timing.
  If the process is above time resolution, take
  just one measurement.
  If it is below time resolution, find a sufficient
  measurement count.
-   Using this measurement count, get an initial
  estimate for the mean and variance of the speed, and
  the time per execution.
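A hedged sketch of the first step, finding a sufficient measurement count; Run is a hypothetical procedure executing the process under study n times, and grain is the clock grain in ticks.
  PROCEDURE FindCount(grain: LONGINT): LONGINT;
    VAR n, t0, t1: LONGINT;
  BEGIN
    n := 1;
    LOOP
      t0 := Oberon.Time();
      Run(n);  (* hypothetical: n executions of the process to be timed *)
      t1 := Oberon.Time();
      (* stop when the span is large against the grain, so that the discretization error is small against the 1% target *)
      IF t1 - t0 >= 100*grain THEN RETURN n END;
      n := 2*n  (* otherwise double the count and try again *)
    END
  END FindCount;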
A verbose example is
  Time.SomeThing
You can restrict the total time used by adding a time limit [in seconds], such as
  Time.SomeThing 1 ~ 1 second. Not yet implemented
Time.SomeThing tries to do a timing within this limit (but it does not guarantee it). If you add a second parameter, it will be used as a limit for the relative precision, for example
  Time.SomeThing 60 0.01
      ~ 1% precision or 60 seconds. Not yet implemented
For the second form, the time usually will serve as an emergency exit to avoid too lengthy calculations.



Exercises:
Givens' rotation is used to rotate a general vector to the x axis. It is used as an elementary step to solve linear equations.
The file Kurs/DROTG.Mod contains two variants of Givens' rotation, taken from the BLAS system (BLAS=basic linear algebra subroutines).
DROTGF is machine translated from the original FORTRAN source. DROTGO is a manual translation which should be equivalent to DROTGF. Compare these implementations. Use a random number generator to time both routines with random arguments DA, DB and compare the timings. Try to improve DROTGF.






