|
|
|
|
|
|
|
A2-1. Errors and Uncertainties. A2-2. Significant figures and rounding. A2-3. Reading digital meters. A2-4. Instrumental defects. A2-5. Calibration of instruments. A2-6. Propagation of uncertainties. |
A2-7. Combining uncorrelated measurements. A2-8. Weighted average. A2-9. Correlation. A2-10. Linear regression. A2-11. Weighted linear regression. A2-12. Reduced Chi² test. |
|
A2-1. Errors and Uncertainties. We can divide sources of error into 3 groups: |
|
Precision: The value of a random variable is meaningless without some measure of its precision or uncertainty. |
| One possible way to determine the uncertainty in an experimental result is to repeat the measurement a large number of times. This will only work if the resolution of the measuring device is considerably greater than the uncertainty of the measurement; e.g., it doesn't matter how many times you connect a digital multimeter to a resistor, it will always give the same number (provided that the temperature doesn't change). The resolution of a digital measuring instrument is usually the same as its precision (i.e., provided that the least significant digit remains stable). In many situations however, the noise in the measurement is clearly apparent to the experimenter, in which case it is possible to determine a distribution of results. |
![]() Distribition of results. |
| The complete set of all possible measurements of a random variable is called the parent population. The parent population may be finite or infinite. An example of a finite parent population is the permeability of type K transformer cores in last week's production batch. An example of an infinite parent population is the noise voltage of an RF amplifier. In most cases we do not have access to the parent population, or it is impractical or too expensive to measure the whole population. We must therefore take sample measurements and work with a smaller sample population. |
|
Sample mean: For measurements with randomly distributed errors, the best estimate for the true value of the variable is the average of the sample population. This is usually defined as the mean value: |
|
|
n |
S i=1 |
xi | Sample mean |
|
The equation above implies the convention that we have made a
set of measurements and have numbered then from 1 to n. The subscript
i is used to refer to the measurements generically, i.e., xi is an individual measurement with a unique
identifier i which has a value between 1 and n. The symbol S (capital Sigma) indicates that a summation
is to be performed on objects of the type indicated to its right;
with the range over which the summation is to be carried out
written below and above. The mean of the sample population Note that in some derivations, the subscript i may already have been used to indicate (say) parameters associated with a current sampling network. In that case, as a precaution against ambiguity, another letter can be used as the sample number when carrying out data analysis. Strictly it should not be necessary to take such precautions, because the i below the summations symbol relates only to the subscripts on the variables in the summation itself and has no global significance, but we can just as well write: |
|
|
n |
S k=1 |
xk | Sample mean |
| without changing the meaning of the statement in any way. |
|
Parent mean: The mean of the parent population is defined as the sample mean in the limit that the number of samples goes to infinity (i.e., encompasses everything). |
|
m = |
Lim |
|
n |
S i=1 |
xi |
|
| The parent mean is usually given the symbol m ("mu"), which is italicised here in an attempt to distinguish it from magnetic permeability. |
|
Deviation: All data are meaningless in the absence of some implied or actual measure of uncertainty. We may guess that real-estate agents are likely to get the dimensions of rooms right to within an inch or so; but for scientific and engineering purposes, we require some kind of rigorously defined interval which can be carried through into other calculations. The most widely used definition is that involving the mean of the squares of the deviations of the individual measurements. This gives rise to a quantity known as the variance of the data, which is given the symbol s² ("sigma squared"): |
|
s² = |
Lim |
|
n |
S i=1 |
(xi - m)² |
|
Variance |
|
Notice that the variance is defined in terms of the parent mean
m, because we can only know
it absolutely when we have included the deviation of every member
of the population into the average. Note also the reason for
averaging the squares of the deviations, the point being
that the deviations themselves can be positive or negative, but
the square is always positive. The average of the deviations
themselves (unsquared) is by definition zero. The problem with the rigorous definition of variance, is that we do not usually know the parent mean. All we have is an estimate of it in the form of the sample mean |
|
s² = |
n |
S i=1 |
(xi - |
(only true for large n). |
| The problem with this definition is that if the number of samples is only 1 it tells us that s²=0, i.e., it implies that there is no uncertainty if only one measurement is made. The reality is that we can have no confidence whatsoever in a single uncorroborated measurement, and so this definition is false. The flaw, expressed in mathematical language, is that the equation has an incorrect boundary condition, i.e., it does the wrong thing when pushed to one of the limits of its range of applicability. There only way in which we can repair it is by changing it to: |
|
s² = |
n-1 |
S i=1 |
(xi - |
Estimated variance (true for any n). |
| Dividing by n-1 instead of n has no effect if n is large, but when n®1, s²®0/0, i.e., the corrected expression tells us that uncertainty is undefined if we only make one measurement. The quantity n-1 is the number of degrees of freedom in the data, i.e., it is the number of independent data or measurements at our disposal. The reason why we must divide the square error sum by n-1 instead of n is that we have lost one degree of freedom by using the data themselves as the source of the estimate of the population mean. |
|
The quantity s, which is
defined as: s = +Ös² is known as the standard deviation of the data, and is the RMS average deviation. If it is derived from a variance using It is good practice to state all measurements (or at least those with normally distributed errors) in the form 1.231(2) is the same as 1.231±0.002 1.4097(19) is the same as 1.4097±0.0019 Note that a standard deviation usually has the same units as the quantity to which it relates, hence it is appropriate to write: I = 632±3 mA [not I =632mA ±3]. One situation in which the uncertainty is not in the same units as the measurement however, is when it is expressed as a percentage. In this case it is correct to write: I = 632mA ±0.5% [but definitely not I = 632±0.5% mA]. To convert an uncertainty in % back into its native units, divide it by 100 and multiply it by the measurement to which it relates: 632mA ±0.5% = 632 ±(0.005×632) mA = 632±3 mA |
|
Manufacturing tolerances: Resistors, capacitors, and other manufactured components usually have "brick wall" tolerances associated with their nominal values; i.e., the manufacturer guarantees that the true value will definitely lie within the stated range (provided that other conditions such as operating temperature are also within the stated range). Hence, the manufacturers tolerance is (with some reservations) a 100% confidence interval, and the associated errors are not normally distributed. The actual distribution of errors then depends on the manufacturing process. If the manufacturing process is imprecise, components are tested and binned according to the preferred-value range in which the actual measured value lies. In that case, the errors have a flat distribution and the most probable error is 50% of the tolerance. For a normally distributed population on the other hand, the most probable error is s. Hence, when analysing data which are subject to both measurement errors and component tolerance errors, a reasonable approximation to the standard deviation of a component value is given by taking half of the manufacturers tolerance. Sometimes a manufacturing process has considerably greater precision than the stated component tolerances seem to imply. This is a danger area for those attempting to deduce the most probable error. The problem here is that for some components, particularly rolled capacitors and wirewound resistors, the manufacturer can save materials and maximise profits by always producing components with values which are close to the lower tolerance limit. Hence, if working with stated tolerances, it is always necessary to ask the question "can the manufacturer save money by engineering a difference between the most probable value and the nominal value?" If the answer is "yes", then the nominal value cannot be trusted and independent measurement is the only sensible course of action. |
|
A2-2. Significant figures and rounding: There is usually no point in recording a number with significant digits beyond the point where the standard deviation renders those digits meaningless. We might, for example, perform a calculation which gives the result: x = 5.13782041 ± 0.00362148 Taking the most austere attitude towards rounding, this number should be reported as: x = 5.138 ± 0.004 although, for those who dislike throwing anything away: x = 5.1378 ± 0.0036 is a little less draconian. The usual rounding rule taught in schools is that, if the digit after the last significant digit to be reported is between 5 and 9, round up; but if it is between 0 and 4, round down. This rule however only works unambiguously if we know several more digits than are to be reported. If, for example, if we wish to round the number 21.351723 to one decimal place, it is obvious that the result should be 21.4; and if we wish to round the number 21.347816 likewise, the result should be 21.3. A difficulty will only occur if we have to round a number like 21.350000, because then we have no way of deciding whether to go up or down. In that case we should decide randomly (tossing a coin will do the trick), or we can use a quasi-random rule like 'round up if the digit to be rounded is odd' (or vice versa). Fortunately, the cases where such decisions have to be made are very rare, and so the old school rule 'always round up' will not introduce significant bias into the data. A more serious problem arises when someone or something has already truncated the number we wish to round. Take for example the case where we wish to round the number 21.35 to one decimal place. How do we know whether the last 5 arose from a string like 51723 or from a string like 47816? If we always round up, we will on average add a bias of +0.25 to the least significant digit, which will introduce a systematic error admittedly smaller than, but nevertheless of the order of, the standard deviation. The solution, if forced into this position, is once again to round up or down on a random or pseudo-random basis. The solution if using a spreadsheet or writing a computer program, is to prevent the problem from occurring by formatting the output so that numbers are always given to a few more places than are significant. There is another situation in which it is not a good idea to round a number too drastically. This is when the number is to be used as a seed for the regeneration of a data set or for the prediction of new data. A typical situation is when the data in a table have been fitted to a mathematical function. The parameters determined from the fitting process may have uncertainties which suggest that they should be rounded; but if the numbers are rounded and put back into the function, they will no longer give the best estimates of the original data or good predictions for new data. There is no harm in reporting: x = 5.13782041 ± 0.0036 and it is sometimes useful to retain a long string of computed digits in this way. The only problem is that it is sometimes necessary to argue with editors and reviewers who think that they know better. The last word on rounding is: only round at the end of a computation. Never round intemediate results. A2-3. Reading digital meters: When using digital measuring instruments, the best we can usually manage is to assume that the designer of the instrument has incorporated statistically reasonable rounding rules into the firmware. This raises the question: 'what do you write down when the last digit of the reading flickers between two numbers?' Take for example, the situation where a reading flickers between 100.1 and 100.2 when measuring a resistance. If the reading is steady at 100.1, then it is reasonable to assume that a number between 100.05000... and 100.14999... is implied. If the reading is steady at 100.2, then a number between 100.15000... and 100.24999... is implied. Hence if the instrument cannot decide, then logically the number is exactly 100.15. There is no convention for recording the extra phantom digit obtained in this way, and indeed most workers round up or down arbitrarily. It is fallacious to argue that arbitrary rounding does not matter however, because many digital instruments are designed in such a way that the uncertainty is only in the last digit. This ensures that arbitrary rounding will introduce bias of the order of the standard deviation. Thus the extra digit should be included in any calculations which use the measurement; and the author's recommendation (in the event that other users of the data will wonder where it came from) is to put it in italics or a reduced font size when reporting; i.e, for the above example, the measurement can be reported as 100.15 or 100.15. |
|
A2-4. Instrumental defects: Measuring instruments suffer from some or all of the following defects: In electrical measuring instruments, scale error is due to the incorrect adjustment of a shunt or series resistor, or some other gain or sensitivity control. A scale error implies that all readings of the instrument are multiplied by some (perhaps unknown) scaling factor. Offset error occurs when an instrument does not read zero for zero input. Its effect is to introduce a systematic error which is greater for low readings than it is for high readings. For digital instruments, offset nulling is part of the calibration procedure. For moving-coil meters, the offset may change according to the physical orientation of the instrument, and it is important to check the zero adjustment before making readings. Linearity errors arise from a variety of causes and are often difficult to quantify. Sometimes the cause of non-linearity is well understood (e.g., diode detectors), and a correction function can be applied to the readings. Often however, non linearity is due to instrumental ideosyncracies, or failure to care for the instrument correctly. For moving coil instruments, linearity can be affected by steel swarf or stray parts adhering to the magnet and upsetting the uniformity of the magnetic field. Variable reference capacitors are often calibrated by means of knifing-vanes, and kinks in the relationship between dial markings and actual capacitance can occur at the boundaries between vanes. Monotonicity error is a type of range changing error. In digital instruments it occurs when changes take place in the more significant (left-most) digits of the binary number, and manifests itself as the reading or the output going down when it should go up, or vice versa. Monotonicity error limits the effective resolution of A-D and D-A converters, e.g., a converter might have an physical resolution of 16 bits, but it might only be monotonic at the 14 bit level (the last two digits must be rounded off to guarantee a monotonic output). In analogue instruments, it is due to the change in scale error which can occur on changing ranges. It manifests itself as a jump or discontinuity in the measurements, but can be controlled by careful calibration or application of a range-dependent correction factor to the readings. |
|
A2-5. Calibration of instruments: In the user manual provided with a measuring instrument, there will be a section specifying the accuracy of the instrument on its various ranges. Such specifications usually refer to scale errors and are normally expressed in %. At time of calibration, there are two possible interpretations of the specification for a particular instrument range: There are no prizes for guessing which interpretation is correct. Instruments calibrated according to the first philosophy always introduce systematic errors. The fact that there will be some uncertainty in the value of the standard, and some error in the reading of the standard, means that an instrument which appears to be working within its tolerance may not be. Exact agreement with the standard maximises the probability that the instrument will perform according to its specification. |
|
A2-6. Propagation of Uncertainties: A common situation when making use of scientific measurements is the need to compute the value of some other quantity from those measurements. To do that, we must use a mathematical formula, and the first question we need to ask is: 'how is the uncertainty in a measured quantity transmitted through a formula into the final result?'. Let us begin by considering a generalised formula with a single variable parameter: y = f(x) where the parameter x has an uncertainty dx (here we will start by using d instead of s because s is always positive by definition, whereas an uncertainty transmitted through a formula can change sign). We want to know the uncertainty of y; to which, by obvious convention, we will asssign the symbol dy. The solution to this problem is given by straightforward logic: The uncertainty of y is the rate of change of y with respect to x multiplied by the uncertainty in x, i.e.: dy = dx dy/dx All we have to do is find the derivative dy/dx and multiply it by dx. So far, so good, but what we have at this stage usually relates to very trivial situations. An example might be that of measuring a current in Amps, and wanting to know it and its standard deviation in milliAmps. This is so easy it barely warrants a thought, but if we state the problem formally we can examine the underlying process: Let y be the current in mA and x the current in A. Then: y = 1000x dy/dx = 1000 dy = 1000dx In this case the derivative is positive, so we can substitute standard deviations directly in place of the deltas: sy = 1000sx Things get a little more complicated when, for example, powers of x or logarithms are involved, because then the uncertainty which x transmits into y depends on the value of x, but the general principle is the same. Take for example: y = 1/x In this case: dy/dx = -1/x² hence: dy = -dx/x² which means, for this particular relationship, that large values of x transmit less uncertainty into y than small values of x. We can also see from this example why the deltas are not the same as standard deviations; because we have the situation in this case that if dx is positive, then dy is negative. We can force the general technique to work with standard deviations however, either by using the magnitude of the derivative (i.e., by just ignoring the sign), or algebraically by using the RMS value of the transmitted uncertainty, i.e.: sy = +Ö[(sx dy/dx)²] The point being to make the answer positive by squaring it and the taking the square root. Now let us consider the general situation, which is frequently encountered when the problem at hand is sufficiently difficult to warrant some programming or the production of a spreadsheet. This is when several measured quantities of different origin must be combined in the calculation of a desired quantity. We can express this situation generically by writing: y = f(p, q, r, s . . . ) where p, q, r, s, etc. are variables with uncertainties dp, dq, dr, ds, etc.. If we consider each variable in turn, we can find its contribution to the uncertainty in y by taking the partial derivative of the function with respect to the variable (i.e., we differentiate the function with all of the other variables held constant). If we assign the symbol dyp to the contribution which the uncertainty in p makes to the uncertainty in y, then: dyp = dp ¶y/¶p and so on. Thus we can determine all of the partial error contributions from all of the parameters, but we are left with the problem of how to add them together. Our first guess might be that we should simply add them directly, i.e.: dy = ( dp ¶y/¶p ) + ( dq ¶y/¶q ) + ( dr ¶y/¶r ) + . . . . . The failing here is that derivatives can be positive or negative; hence for some formulae and data there may be total cancellation of all errors, and this is physically unreasonable. We might therefore try to remove this flaw by using magnitudes: dy = | dp ¶y/¶p | + | dq ¶y/¶q | + | dr ¶y/¶r | + . . . . . This looks better, but if we translate the statement into ordinary language, we will find that it is still not correct. The statement tells us to compute the uncertainty in y on the assumption that each of the parameters is most likely to be displaced from its parent mean by an amount equal to its uncertainty (which is reasonable), but then assumes that the displacements will always conspire in such a way that their contributions increase the total. This amounts to a worst-case combination of errors, and the probability that it will occur in practice is very low. The reality is that the difference between a variable and its parent mean is just as likely to be negative as it is to be positive. Hence there is just as much chance that the uncertainties of a pair of variables will partially cancel as there is that they will augment. We must deal with this issue by combining the parameter uncertainties in the way which is most probable; and indeed, if the uncertainties are standard deviations, we must combine them to produce a result which is also a standard deviation. This requires us to consider the extent to which the parameter uncertainties are correlated (i.e., literally, co-related). If an experiment is properly designed, then by definition, all possible steps have been taken to eliminate systematic errors. This does not necessarily mean that all systematic errors have been eliminated, but it does mean that such errors have been eliminated as far as is known. In that case, it is sensible to assume that all of the remaining errors are random. If an error in a parameter is random, then its magnitude or direction cannot be influenced by the error in any other parameter. If the deviations in two or more variables are able to change independently, each without affecting any other, then they are uncorrelated; and, by virtue of their independence they must exist in different dimensions. Hence the uncertainties in any pair of parameters can be considered to be like vectors which move at right angles to each other. Objects having this property of mutual independence are said to be orthogonal. If we transmit the uncertainties through a formula and turn them into partial error contributions, the partial errors are still orthogonal, but are now all measured in the same units and can be treated like the components of an actual vector. The the proper way to combine them therefore is to take the vector sum, i.e., to calculate the magnitude of the resultant by using Pythagoras' theorem. Hence the correct way to find the sum of a set of uncorrelated error contributions is: dy = +Ö[( dp ¶y/¶p )² + ( dq ¶y/¶q )² + ( dr ¶y/¶r )² + . . . . . ] Notice that this sum is an RMS value. Therefore it can be used with standard deviations instead of arbitrarily defined uncertainties, and a standard deviation will be obtained as the result: sy = +Ö[( sp ¶y/¶p )² + ( sq ¶y/¶q )² + ( sr ¶y/¶r )² + . . . . . ] A practical example of the use of an error function of this type is given in section 1-31a. |
|
A2-7. Combination of uncorrelated measurements: There is one application of the error analysis procedure just outlined which should be memorised by anyone engaged in the business of making measurements. It is the answer to the question: 'how much improvement in precision can be obtained by repeating a measurement several times and taking the average?' This question of course only relates to measuring devices which are capable of resolving the noise in the quantity being measured, or which in some sense add noise to a level which is greater than the instrumental resolution. Let us first consider the case where two measurements are made of a quantity x, and the average is taken: x = (x1/2) + (x2/2). If both measurements have the same standard deviation s, and the errors are uncorrelated, then: sx = Ö[( s ¶x/¶x1 )² + ( s ¶x/¶x2 )²] where ¶x/¶x1=1/2 and ¶x/¶x2=1/2, hence: sx = Ö[( s/2 )² + ( s/2 )²] which simplifies to: sx = s/Ö2 Averaging two uncorrelated measurements improves the precision by a factor of 1/Ö2. Now consider the general case when x is the average of n measurements: x = (x1/n) + (x2/n) + (x3/n) + . . . . + (xn/n) The derivatives are all the same, i.e.: ¶x/¶xi=1/n Hence: sx = Ö[n(s/n)² ] which simplifies to: sx = s/Ön Averaging n uncorrelated measurements improves the precision by a factor of 1/Ön. Note that these techniques will only work with measurements which have genuine randomness in the least significant digits of the result. There is no use in writing down the unchanging reading of a digital multimeter several times (the uncertainty in that case must be taken from the manual or deduced by means of some calibration process), but it is valid and sensible to make the same measurement with two different digital multimeters and take the average. |
|
A2-8. Weighted average: Another important problem related to the propagation of uncertainties is: 'how do you combine two measurements having different standard deviations to obtain a meaningful average?' If one measurement has a significantly smaller uncertainty than the other, then it is not fair to the better measurement to take the straight average and bias will be introduced by doing so. On the other hand, it is a waste of data to throw away the inferior measurement. The solution is to take a weighted average, the idea being that if measurement A is n times better than measurement B, we should add n of A to 1 of B and divide by n+1. This leaves us with the problem of defining what is meant by 'better'; but it transpires, for reasons of dimensional consistency which will become apparent in the later discussion of least-squares fitting, that the variance of an observation from a normally distributed population is the proper measure of its goodness. The weight of an observation is consequently a number which gets bigger as the variance gets smaller, i.e.: wi = 1/si² The weight of an observation is a type of frequency, it is the number of times it should be included in an average relative to an observation with unit weight (i.e., a weight of 1). It does not matter that it is not an integer. There is no problem in adding 1.7 of A to 3.1 of B and dividing by 4.8. Hence the general form for a weighted average is: |
|
x = |
Swi |
|
where: wi = 1/sxi² Here we have used the summation symbol without indicating the range, an informal shorthand which means "sum over the whole range". The weighted average is more compactly written: x = (Swixi)/(Swi) Now to work out the uncertanty in x, we note that the derivatives are given by: ¶x/¶xi = wi/(Swi) Hence: sx = Ö[ (x1 w1 / Swi)² + (x2 w2 / Swi)² + (x3 w3 / Swi)² + . . . . . + (xn wn / Swi)² ] but 1/(Swi)² can be factored out of the square root bracket: sx = (1/Swi)Ö[ (x1 w1)² + (x2 w2)² + (x3 w3)² + . . . . . + (xn wn)² ] Hence: sx = [ÖS(wixi)²]/(Swi) Relative weight: The absolute weight of an observation is given by the reciprocal of its variance. There are occasions however, when the variances of the observations are not known in advance of the averaging process but there are instrumental or mathematical considerations which permit the relative variances to be computed. In such cases we can take an average using relative weights in place of absolute weights and still obtain an unbiased result. This follows by inspection of the expression for the weighted average (above), where it is apparent that multiplying all of the weights by the same arbitrary constant k (say) will not change the result: x = [S(k/sxi²)xi]/[S(k/sxi²)] = [kS(1/sxi²)xi)]/[kS(1/sxi²)] It is for this reason that we give the weights their own symbol rather than writing them explicitly as reciprocal variances: they do not have to be variances as long as they are proportional to the variances of the observations. A situation in which a weighted average is required but absolute variance might not be known in advance is that which arises when data have to be processed by means of some mathematical formula before they can be averaged. This statement is actually a description of the generalised process of mathematical modeling or data reduction. It is often the case that the input data all have the same uncertainty, but are are scaled by some non-linear function. This means that the amount by which the uncertainty is scaled will vary depending on the value of the measurement. If we give the original measurements a relative weight of 1, we can easily work out the new weight after scaling and so take the weighted average. Furthermore, we can then work out the variance of the average, and from it deduce the standard deviation of an observation of unit weight. This subject is examined in more detail in a section A2-11. |
| Now imagine the process of trying to fit the data by guesswork. To do that we would plot the data as points on a graph, and then, using a ruler and pencil, we would try to draw the best straight line through the points. That would involve sliding the ruler up and down and rotating it until the errors appeared to be scattered equally above and below the line all the way along it. To do the same mathematically (and without guesswork), we satisfy two conditions. The first is that the sum of all the errors should add up to zero (equivalent to sliding the ruler up and down), i.e.: |
|
S i=1 |
ei = 0 |
| The second is that the sum of the squares of the errors should be as small as possible (equivalent to rotating the ruler). The point here is that by making the errors add up to zero, we have found a line which passes through the mid-point of the data set. We still need to rotate the line however, because it may be that all of the residuals on one side of the mid point are positive while all of those on the other side are negative. By squaring the errors, we make them all positive, so the square error sum will only be minimised when the line is optimally embedded in the data. The process of fitting data by minimising the square error sum is called the method of least squares. |
|
To minimise the square error sum, we note that it will vary as
each parameter is varied, and that a minimum occurs when its
rate of change with respect to a parameter is zero. Hence we
want to find the two conditions: ¶Sei²/¶a = 0 and ¶Sei²/¶b = 0 To find the derivatives, we start by writing the square error sum explicitly using equation 10.1: Sei² = S(yi - a - bxi)² which expands to: Sei² = S(yi²+ a² + b²xi² -2ayi - 2bxiyi + 2abxi ) Differentiating a summation is just a matter of differentiating the individual terms and adding them together, and so: ¶Sei²/¶a = S(2a -2yi + 2bxi ) = 0 hence: S(a -yi + bxi ) = 0 The term Sa just means 'add a to itself n times', and so Sa=na, hence:
¶Sei²/¶b = S(2bxi² - 2xiyi + 2axi ) = 0 hence: S(bxi² - xiyi + axi ) = 0
b = [ (Sxiyi) - a(Sxi) ] / Sxi² Substitutuing this into 10.2 gives: Syi = na + { [(Sxiyi) - a(Sxi)](Sxi) / Sxi² } Multiplying throughout by Sxi² then gives: (Sxi²) Syi = na(Sxi²) + (Sxiyi)(Sxi) - a(Sxi)² Which gives a full solution for a by rearrangement: |
|
a = |
n(Sxi²) - (Sxi)² |
|
|
Equation 10.3 also gives
a solution for a in terms of b: a = [ (Sxiyi) - bSxi² ] / Sxi Substitutuing this into 10.2 gives: Syi = n{ [ (Sxiyi) - bSxi² ] / Sxi } + bSxi and multiplying throughout by Sxi gives: (Sxi) Syi = n(Sxiyi) - nb(Sxi²) + b(Sxi)² Which gives a full solution for b by rearrangement: |
|
b = |
n(Sxi²) - (Sxi)² |
|
|
Note that equations 10.4
and 10.5 share the same denominator.
Also notice that solutions for a and b can be obtained by calculating
four sums: Sxi,
Syi, Sxiyi,
and Sxi²;
i.e., all it takes is four columns of a spreadsheet and two simple
equations involving the totals to find the optimum regression
line. Example: Diode correction function. When using a diode detector to measure the level of an AC signal, the output voltage of the detector is always slightly lower than that predicted on the basis that the diode is a perfect rectifier. Consequently, in order to obtain accurate results, it is necessary to correct the data for the diode forward voltage drop. Correction can be achieved by making a set of measurements of diode forward voltage vs forward current and using them to determine a correction function. If a signal diode with low junction resistance, such as the 1N5711 is used, the diode forward characteristic can be fitted to the following expression (see section 6.1-8c): Vf = V0 + V1 Loge(If) This relationship holds good over several decades, and is therefore more than adequate for the correction of readings taken from a single meter range. In the spreadsheet shown below, a set of diode measurements has been entered in columns A and B, and a linear regression analysis (explained below) has determined the formula: Vf = 0.15834185 + 0.02906031 Loge(If /[mA]) Volts Note that the units of the current are included as a divisor in the logarithm bracket. The statement is meaningless unless this is done, because the quantity inside the log bracket must be dimensionless (and it should be obvious that the formula will give the wrong results if the wrong units are used). The milliameter readings in column A are nominal (i.e., uncalibrated), because this is a calibration function for a particular diode used in conjunction with a particular meter. The meter should be the same as that used for the acquisition of the experimental data to be corrected, the point being to cancel calibration errors by exploiting negative correlation as discussed earlier. |

|
(see open-document spreadsheet 1N5711.ods) In this case, defining the regression line as y=a+bx, we identify: y = Vf , a = V0 , b = V1 , and x = Loge(If) In columns C, D, and E we calculate the quantities xi, xiyi, and xi². The sums in row 13 are then calculated using the spreadsheet S function, so that: B13 = Syi , C13 = Sxi , D13 = Sxiyi , and E13 = Sxi². The fitting parameters a and b (column F) are then calculated using equations 10.4 and 10.5; where, for the data above, the number of observations n=11. A regression line has now been determined, but it would be unwise to discontinue the analysis at this point. If we use the fitting function to produce a set of calculated yi values and compare them with the observed values, the exercise becomes self-policing with regard to any mistakes which might have been made in entering the formulae. Note that the formula used in column G must use absolute addressing to pick-up a and b from cells F3 and F9. In Open Office Calc (and other spreadsheets), this is done by placing a $ symbol in front of any row or column designators which must be held constant; i.e., the seed formula for column H is: H2 = F$3 + F$9 * C2 The residuals (observed - calculated) are then obtained in column H using the seed formula: H2 = B2 - G2 The sum in cell H13 shows that the least-squares fitting procedure has indeed established the condition Sei=0. Standard deviation of fit: The calculated values from a least-squares fit are, in effect, a set of estimates for the parent means (i.e., true values) of the individual observations. Consequently, if all of the observations have the same precision, the residual errors can be treated as though they are all deviations from the same mean. This implies that we can calculate the variance of the fit and relate it to the variance of an observation as if we had made repeated measurements at that particular setting of x. To this end, colum I of the spreadsheet shown above is used to calculate the quantity Sei² (cell I13), and the standard deviation of an observation is estimated (cell I14) as: sfit = Ö[ (Sei²) / (n-2) ] The divisor n-2 is the number of degrees of freedom in the data, and is two less than the number of observations because the estimate is based on the use of two parameters (a and b) which have been extracted from the data. Once again, we can see that this divisor is correct by considering boundary conditions. If a data set consisting of only two observations is fitted to a function having two variable parameters, then the fit will be exact and the square error sum will be zero. In that case, all knowledge of uncertainties vanishes, and so the number of degrees of freedom must be n-2. In general, the number of degrees of freedom for any fitting process is the number of observations included in the fit minus the number of variable parameters. In the example above, we obtained sfit=0.00028 Volts, and it is interesting to consider how this number came about. The experiment was conducted using a moving-coil milliAmmeter, and a digital Voltmeter giving 3 decimal places on its 2V range. By gently tapping the case of the moving-coil meter with a pencil to minimise errors due to bearing friction, it was estimated that the current could be set with a precision of about ±0.2mA. These current settings were then destined to converted into logarithms; a process which, although non-linear, has the effect of scaling-down the uncertainties. Hence it was considered reasonable to ignore the setting errors, and since the voltage measurements all have near identical precision, the data were deemed suitable for a simple regression analysis. The digital voltmeter used had, in its manual, a stated accuracy of ±0.5%, ±0.001 for the 2V DC range. The first uncertainty is however primarily a measure of the expected scale error, and the second uncertainty (±1 in the last digit) is primarily the offset error. These quantities say very little about the actual precision of the measurements under the conditions encountered, and sfit is a measure of precision, not accuracy. We can however deduce the likely precision by noting that the range of voltages involved was very small (0.272 to 0.292), and that the meter is unlikely to have exhibited significant non-linearity over such a range. Therefore, the principal causes of deviation between observed and calculated values are either rounding error, or invalid assumptions about setting error, or failure of the data to conform to the model. Now, the meter reads to three decimal places, which means that there will be a maximum rounding error of ±0.0005 in any reading. On average however, we should expect a rounding error of about half the maximum, on which basis we should expect sfit=0.00025 if the deviations are due entirely to rounding. We got sfit=0.00028, which tells us that the data do conform to the model within the uncertainty of the method, and that rounding error is the primary cause of deviation. Parameter uncertainties: Parameter ESDs, given in column F, in this case serve to indicate the number of decimal places required when using a and b. These are given by: sa = sfit Ö[(Sxi²) / D] and sb = sfit Ö[n / D] where D is the denominator of equations 10.4 and 10.5: D = n(Sxi²) - (Sxi)² The derivation of these two formulae, for the general case of weighted linear regression, is given at the end of the next section. |
|
A2-11. Weighted Linear Regression. The problem with the simple linear regression process described above is that it is only valid when the precision of the dependent variable is the same for all samples. This is often not the case, and it definitely ceases to be the case if the dependent variable must be subjected to some non-linear transformation before it can be equated with y. Take, for example, a situation in which the dependent variable is governed by an exponential relationship: z = k exp(bx) where 'exp' means "e to the power of", z is the dependent variable, x is the independent variable, and k and b are parameters. We can linearise this relationship by taking the logarithms of both sides: Loge(z) = Loge(k) + bx which means that we can determine a regression line y=a+bx, with y=Loge(z) and a=Loge(k). Now let us suppose that all of the measurements of z have the same uncertainty sz.(which may or may not be known). The uncertainty of a given value of y will be: si = Ö[(sz dy/dz )²] where, in this case, dy/dz = 1/z Hence, expressed in terms of variances: si² = sz² / zi² The variance of a given yi will depend strongly on the value of the corresponding zi. Hence the deviations of the yi can no longer be treated as though they relate to the same mean. The immediate practical consequence is that simple linear regression analysis will not give the best fitting parameters because it will give equal weight to all observations. The solution to this problem is related to dimensional analysis. No two quantities can be added unless they are expressed in the same units, and so all terms in the error sum and in the square error sum must be measured in the same units. If the data are all subjected to some non-linear scaling process, then each element of each sum should be subjected to the reciprocal of this scaling process, or the arithmetic will be invalid. The most consistent action therefore is to divide each error by some external measure of relative error which has the same units. The resulting dimensionless square error sum is known as the goodness of fit (GooF) and is famously referred to as Chi squared: c² = S(ei²/si²) This can also be written: c² = S(wi ei²) where wi = 1/si² is called the weight of an observation. We have met this quantity before in the discussion of weighted averages, but now we can see that it is a normalisation parameter, i.e., it serves to scale deviations so that they all behave as though they belong to the same mean. Hence c², in this context, is the weighted square error sum. |
|
Now, for the weighted regression analysis, using equation 10.1 we have: c² = Swi(yi - a - bxi)² which expands to: c² = Swi(yi²+ a² + b²xi² -2ayi - 2bxiyi + 2abxi ) Th find the weighted least squares fit, we minimise the weighted square error sum with respect to each variable parameter, i.e., ¶c²/¶a = 0 and ¶c²/¶b = 0 hence: ¶c²/¶a = Swi(2a -2yi +2bxi) = 0 Which gives:
¶c²/¶b = Swi(2bxi² -2xiyi +2axi ) = 0 Which gives:
|
|
a = |
(Swi)(Swixi²) -(Swixi)² |
|
|
b = |
(Swi)(Swixi²) -(Swixi)² |
|
|
Once again, the expressions for the parameters have the same
denominator. Also notice that if the weights are all set to 1,
the expressions revert to the non weighted form (equations 10.4 and 10.5).
The weighted fitting procedure will work just as well with relative weights, and these can be deduced from the rule by which raw data are converted into y values. Also notice that an observation can be excluded from the fit by setting its weight to zero. This is useful when a particular observation is found to have a large residual and an illegitimate error (i.e. a mistake) or a breakdown of the model is suspected. Note that if an observation is excluded in this way, then the number of observations is reduced by one, and the number of degrees of freedom used in calculating the variance of the fit (see below) must be reduced accordingly. When programming, this requires the operation: For i=1 to n: If wi=0, n®n-1 or something equivalent. A better technique, which allows data to be excluded and re-included quickly to see what happens, is to define a fitting flag for each observation. The fitting flag can have a value of 1 or 0. The theoretical weight of an observation is multiplied by the flag to decide whether it will be included, and the number of observations is determined by taking the sum of the flags. For weighted linear regression, the input data must be given as a three-column list (xi, yi, wi); although these quantities may of course be computed according to some transformation of the raw data. We then need to compute the quantities: Swi , Swixi , Swiyi , Swixiyi , and Swixi² If using a spreadsheet, Swi is the sum at the bottom of the wi column, and so four extra columns are needed to assemble the quantities needed for equations 11.3 and 11.4. By incorporating weights, we have effectively scaled all of the sample populations so that they have the same variance. Hence, we can estimate the standard deviation of an observation of unit weight by treating all of the normalised deviations as though they are deviations from the same mean. A deviation is normalised by dividing it by its standard deviation (or by a quantity proportional to its standard deviation), i.e.: ei / si = [yi(observed) - yi(calculated) ] / si Note that this expression says that the deviation of a y value with a large uncertainty should be taken less seriously than than the deviation of a y value with a small uncertainty. Hence: sfit² = (Sei²/si²) / n where n (nu) is the number of degrees of freedom, defined as: n = n-p n being the number of observations of finite weight, and p the number of variable parameters. For a linear regression analysis, n=n-2. Now observe that: sfit² = c² / n This quantity is known as "reduced chi-squared " and has a very special propertity. It is the variance of an observation of unit weight. If the weights used in the fit are absolute, i.e., they are derived from the known variances of the observations, then: c²/n » 1 If the fit is based on absolute weights, then there must be something wrong if the variance of an observation of unit weight is not found to be approximately 1. This relationship is the basis of the reduced chi-squared test, which will be discussed in section A2-12. If we use relative weights, then the true variance of an observation of unit weight is not expected to be 1. In this case, the quantity: sfit = Ö[ (Swiei²) / (n-p) ] becomes the estimated standard deviation (ESD) of an observation of unit weight, and the ESD of any particular observation is given by: si = sfit /Öwi Note that the si is the ESD of a yi, not of a quantity (zi say) from which yi was obtained by transformation. To obtain the uncertainty of zi, it is necessary to apply the reverse transformation, i.e., if: zi = f(yi) then szi = si dz/dy |
|
Uncertainties in the fitting parameters: If the parameters obtained from a least-squares fit are to be used solely for the purpose of recreating the data, then we have little interest in them apart from plugging them into a formula to make use of the values they produce. In many situations however, the parameters themselves can be related to physical quantities, in which case we are also interested in their uncertainties. These uncertainties can be obtained by applying the normal rules relating to the propagation of uncertainties, i.e., we calculate the error contribution from the uncertainty in each observation, and add the squares of these contributions to find the variance of the parameter. This, of course, will only apply if the error contributions are uncorrelated, i.e, the calculated variance will only tell us about the precision of the parameter, it will not tell us its accuracy. Hence, for the parameters from a linear regression analysis: sa² = S(si ¶a/¶yi)² and sb² = S(si ¶b/¶yi)² but si=sfit /Öwi , hence: sa² = sfit²S[(¶a/¶yi)²/wi] and sb² = sfit²S[(¶b/¶yi)²/wi] The required derivatives are of course obtained by differentiating equations 11.3 and 11.4, a task facilitated by the fact that the expressions for a and b both have the same denominator, and that y does not appear in the denominator. This means that the denominator will remain constant in the differentiation; and it will already have been evaluated for the purpose of obtaining a and b. We will therefore give it the symbol D, a constant, defined as:
Hence: a = [ (Swixi²)(Swiyi) -(Swixi)(Swixiyi) ] / D ¶a/¶yi = [ (Swixi²)wi -(Swixi)wixi ] / D = wi[(Swixi²) -xi(Swixi)] / D sa² = sfit²S(¶a/¶yi)²/wi = (sfit²/D²)Swi²[(Swixi²) -xi(Swixi)]²/wi = (sfit²/D²)Swi[(Swixi²)² -2xi(Swixi)(Swixi²) +xi²(Swixi)²] = (sfit²/D²)[(Swixi²)²(Swi) -2(Swixi)²(Swixi²) +(Swixi²)(Swixi)²] = (sfit²/D²)(Swixi²)[(Swixi²)(Swi) -(Swixi)²] = (sfit²/D²)(Swixi²) D
b = [ (Swi)(Swixiyi) -(Swixi)(Swiyi) ] / D ¶b/¶yi = [ (Swi)wixi -(Swixi)wi ] / D = wi[xi(Swi) -(Swixi)] / D sb² = sfit²S[(¶b/¶yi)²/wi] = (sfit²/D²)Swi²[xi(Swi) -(Swixi)]²/wi = (sfit²/D²)Swi[xi²(Swi)² -2xi(Swi)(Swixi) +(Swixi)²] = (sfit²/D²)[(Swixi²)(Swi)² -2(Swixi)²(Swi) +(Swixi)²(Swi)] = (sfit²/D²)(Swi)[(Swixi²)(Swi) -(Swixi)²] = (sfit²/D²)(Swi) D
Thus the parameter uncertainties are easily computed from quantities already determined during the fitting process. If the weights of all of the observations are set to 1 (except for any exclusions) then: sa² = sfit² (Sxi²) / D and sb² = sfit² n / D where n is the number of observations included in the fit. Example spreadsheets: Open document spreadsheet files demonstrating the weighted linear regression procedure. View using Open Office. |
|
A2-12. The reduced c²
test: In sections A2-10 and A2-11 we alluded to the idea that the goodness of a fitting process can evaluated on the basis of a comparison between the observed and the expected variances of the measurements. This test is known as the reduced c² test [1] and is a very powerul tool for the analysis of scientific data (and also a method for detecting scientific fraud). c² in this case is defined as the weighted square error sum based on absolute weights . 'Reduced c² ' is c² divided by the number of degrees of freedom in the data (n=n-p) and is given by: |
|
c²/n = (1/n) |
S i=1 |
[(yi(obs) - yi(calc))/si]² |
|
| This quantity is, of course, the variance of the fit based on absolute weights, and is therefore the variance of an observation of unit weight. Since an absolute weight is a reciprocal variance, the variation of an observation with a weight of 1 should be 1. |
| The point about the c² test is that, if the variance of the fit is considerably larger than the estimated variance of an observation of unit weight, then the model does not agree with the data and should be modified or discarded. If the variance of the fit is about the same as the variance of an observation of unit weight, then the model accounts for the data and is valid within the accuracy of the experimental technique. If the variance of the fit is considerably smaller than the variance of an observation of unit weight, then the data may have been faked, or too many variable parameters have been used and and the model is fitting noise, or the estimates of the si are too pessimistic. The corresponding ranges of c²/n for the three possible outcomes are accordingly: |
|
|
The model is incorrect, the errors are not random, or the si are underestimated. |
|
|
The data agree with the model. |
|
|
The data are fake, there are too many parameters, or the si are overestimated. |
| Note, that if all observations (yi) have the same standard deviation s, then the weights can be factored out of the summation in equation 12.1 above and we have: |
|
c²/n = (1/s²)(1/n) |
S i=1 |
(yi(obs) - yi(calc))² |
| Hence, in this simplified case, we can define the variance of an observation as estimated from the fit as: |
|
sfit² = s² c²/n = (1/n) |
S i=1 |
(yi(obs) - yi(calc))² |
|
To obtain c²/n,
all we have to do is calculate sfit² from the data and divide it by s². In the example at the end of section A2-10 we obtained sfit=0.00027634 Volts and deduced that if the errors were entirely due to rounding of the readings of the DVM used to acquire the data then a standard deviation of 0.00025 V should be expected. For this example: c²/9 = (0.00027634/0.00025)² = 1.22 The reduced c² test can be applied with considerable sophistication by reference to the c² distribution (see ref. [1] or similar text on data analysis), but for a simple qualitative interpretation, it is fair to say that the model gave an excellent fit to the data, and the main effect of least-squares fitting was to drive a line between the DVM rounding errors. |
|
References: [1] Data Reduction and Error Analysis for the Physical Sciences, Philip R Bevington. McGraw-Hill, 1969. Library of Congress cat. card # 69-16942. 2-2: Sample mean and standard deviation. Area under the Gaussian distribution: Table C-2, p308. 5-4: c² test of distribution. |
|
|
|
|
|
|