8.4.1.4 Measures of dispersion
Sometimes it is necessary to know how spread out (dispersed) the data is around the average. For example, the data in Table 8.4.1.1.1 covers a wide range of years that is not apparent from simply knowing the average. It is possible to measure how spread out the data is. Measures that do this are called measures of dispersion. There are three commonly used ones: the range, the interquartile range and the standard deviation.
8.4.1.4.1 Range
The range is the difference between the highest and lowest value for a variable. When the data is interval the range can be expressed as a single number. The range for the ages in Table 8.4.1.1.1 is 98 - 12 = 86 years.
We can also talk about a range of ordinal values from 'strongly agree' to 'strongly disagree'. It is not really meaningful to talk about the range of nominal categories, such as 'Design and Printing' to 'Catering and Tourism' (V34 in the CASE STUDY survey). Nominal categories have no inherent order, so naming two of them gives us no way of knowing that 'Government and Sociology' students are included in the range.
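As a rough illustration of the two cases above, the following Python sketch computes a range for interval data and for ordinal data. The ages and responses are hypothetical values chosen for illustration (the ages share only the extremes 12 and 98 quoted above), and the five-point scale with a 'neutral' midpoint is an assumed ordering, not one taken from the survey.

```python
# Interval data: the range is a single number (highest minus lowest).
ages = [12, 16, 18, 21, 98]  # hypothetical values for illustration
age_range = max(ages) - min(ages)
print("Range of ages:", age_range)

# Ordinal data: the range runs from the lowest to the highest category observed.
# SCALE is an assumed ordering of response categories.
SCALE = ["strongly agree", "agree", "neutral", "disagree", "strongly disagree"]
responses = ["agree", "strongly disagree", "neutral", "agree"]  # hypothetical

# Convert each response to its position on the scale, then take the extremes.
positions = [SCALE.index(r) for r in responses]
ordinal_range = (SCALE[min(positions)], SCALE[max(positions)])
print("Range of responses:", ordinal_range[0], "to", ordinal_range[1])
```

The ordinal case only works because the categories can be placed in order. There is no equivalent ordering list for nominal categories such as course names.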
8.4.1.4.2 Interquartile range
The interquartile range is the difference between the third quartile and the first. This means splitting the distribution in half, using the median, then finding the median of each half, i.e. splitting the distribution into quarters.
The first quarter point is known as the first quartile, Q1; the second, Q2, is the median of the whole distribution; and the third quarter point is known as the third quartile, Q3. The interquartile range is the difference between Q3 and Q1.
The interquartile range can be applied to ordinal, interval and ratio data.
Why bother with the interquartile range? The range can be very misleading when a distribution has a few very high or very low values: it makes the distribution seem spread out even though the vast majority of values may be clustered around the median (as in Table 8.4.1.1.1). The interquartile range shows the range of just the middle half of the distribution and is sometimes a better indication of its spread or concentration.
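The contrast between the range and the interquartile range can be sketched in Python using the value and frequency pairs listed in Table 8.4.1.4.1. The function name quartiles is hypothetical, and the handling of an odd-sized sample (leaving the middle value out of both halves) is one common convention assumed here. Other conventions exist and statistical packages may give slightly different quartiles.

```python
from statistics import median

# (value, frequency) pairs as listed in Table 8.4.1.4.1
FREQ_TABLE = [
    (12, 1), (14, 2), (15, 1), (16, 28), (17, 2), (18, 33), (19, 1), (20, 8),
    (21, 26), (25, 2), (30, 2), (60, 1), (78, 1), (80, 1), (98, 3),
]


def quartiles(values):
    """Return (Q1, Q2, Q3) by splitting at the median, then taking the
    median of each half. Needs at least two values."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]
    # Assumed convention: with an odd sample size the middle value
    # belongs to neither half.
    upper = ordered[half + 1:] if n % 2 else ordered[half:]
    return median(lower), median(ordered), median(upper)


# Expand the frequency table into one entry per respondent.
ages = [x for x, f in FREQ_TABLE for _ in range(f)]

q1, q2, q3 = quartiles(ages)
print("Range:", max(ages) - min(ages))
print("Q1, Q2, Q3:", q1, q2, q3)
print("Interquartile range:", q3 - q1)
```

Under this convention the range is 86 years but the interquartile range is only 5 years (Q1 = 16, Q3 = 21), which shows how strongly a handful of extreme values stretches the range.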
8.4.1.4.3 Standard deviation
The standard deviation is the most useful measure of dispersion for several reasons, not all of which are initially apparent. Its most obvious advantage is that it takes into account all the data in a distribution, as does the arithmetic mean. The standard deviation is a measure of the dispersion around the arithmetic mean and thus interval (or ratio) scale data is required. Like the arithmetic mean, the standard deviation is distorted by highly skewed data.
The standard deviation is, at first glance, a peculiar measure. It is the square root of the variance. The variance is defined as the mean of the squared deviation of each value of the variable from the mean of the variable. That sounds complex and not at all intuitive. The following is what is involved in computing the standard deviation.
1. Work out the arithmetic mean of the values in the sample.
2. Subtract the mean from each separate value in the sample. (This gives you the difference between each value and the mean.)
3. Square all the differences (the minus differences all become positive).
4. Add up all the squared differences. (This gives you the total of the squared differences.)
5. Divide the total by the sample size. (This gives you the mean of the squared differences from the sample mean.) This is the variance.
6. The standard deviation is the square root of the variance.
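The six steps above can be followed line by line in a short Python sketch. The function name standard_deviation and the sample values are hypothetical, chosen so that the arithmetic is easy to check by hand. Note that the steps divide by the sample size n, which matches pstdev in Python's statistics module. Its stdev function divides by n - 1 instead and so gives a slightly larger figure.

```python
from math import sqrt
from statistics import pstdev


def standard_deviation(sample):
    """Standard deviation computed by the six steps, dividing by n."""
    n = len(sample)
    mean = sum(sample) / n                     # step 1: arithmetic mean
    differences = [x - mean for x in sample]   # step 2: value minus mean
    squared = [d ** 2 for d in differences]    # step 3: square each difference
    total = sum(squared)                       # step 4: total of squared differences
    variance = total / n                       # step 5: mean squared difference
    return sqrt(variance)                      # step 6: square root of the variance


sample = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical values, mean = 5
print(standard_deviation(sample))  # 2.0
print(pstdev(sample))              # library check, same divide-by-n definition
```

For this sample the squared differences total 32, so the variance is 32/8 = 4 and the standard deviation is 2.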
This appears to be rather complicated just to measure the spread of a distribution, and conceptually it is a bit cumbersome. However, it is a measure that takes account of all the values in a frequency table, unlike the range, which is only concerned with the extremes. In addition, it is an important measure of dispersion when it comes to making generalisations from samples of interval or ratio scale data (see Section 8.4.3.3 on confidence limits).
The following formula provides a short cut for working out the variance from a frequency table:
variance = (Total of (Frequency multiplied by value of x squared) divided by sample size) minus (Mean of x) squared
or, more compactly:
Variance = ∑fx²/n - (∑fx/n)²
where
x = a value of the variable,
∑ = 'total of',
/ = divide,
n = sample size,
f = frequency,
fx = multiply x by f,
fx² = square x and multiply by f,
Mean of x = total of fx divided by n = ∑fx/n
Before you decide that you have seen quite enough formulas and this one is beyond a joke, don't give up because this is actually much less complicated than it seems.
Using the data in Table 8.4.1.1.3 this works as shown in Table 8.4.1.4.1, below. The standard deviation for the data in the table is 15.6 years. Loosely translated this means that the values deviate from the arithmetic mean by an average of 15.6 years.
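Table 8.4.1.4.1 Calculation of standard deviation
| Value (x) | Frequency (f) | Frequency × value (fx) | Frequency × value squared (fx²) |
|---|---|---|---|
| 12 | 1 | 12 | 144 |
| 14 | 2 | 28 | 392 |
| 15 | 1 | 15 | 225 |
| 16 | 28 | 448 | 7168 |
| 17 | 2 | 34 | 578 |
| 18 | 33 | 594 | 10692 |
| 19 | 1 | 19 | 361 |
| 20 | 8 | 160 | 3200 |
| 21 | 26 | 546 | 11466 |
| 25 | 2 | 50 | 1250 |
| 30 | 2 | 60 | 1800 |
| 60 | 1 | 60 | 3600 |
| 78 | 1 | 78 | 6084 |
| 80 | 1 | 80 | 6400 |
| 98 | 3 | 294 | 28812 |
| Total | n = 112 | ∑fx = 2478 | ∑fx² = 82172 |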
Variance = 82172/112 - (2478/112)²
Variance = 733.678 - 489.516
= 244.162
Standard deviation = √variance = √244.162 = 15.62568 = 15.6 to one decimal place
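The worked calculation can be checked with a short Python sketch that applies the short-cut formula to the value and frequency pairs in Table 8.4.1.4.1. The variable names are illustrative only. Because the script carries full precision rather than rounding intermediate figures, the third decimal place of the variance may differ slightly from the hand calculation.

```python
from math import sqrt

# (value x, frequency f) pairs from Table 8.4.1.4.1
FREQ_TABLE = [
    (12, 1), (14, 2), (15, 1), (16, 28), (17, 2), (18, 33), (19, 1), (20, 8),
    (21, 26), (25, 2), (30, 2), (60, 1), (78, 1), (80, 1), (98, 3),
]

n = sum(f for _, f in FREQ_TABLE)                 # sample size
sum_fx = sum(f * x for x, f in FREQ_TABLE)        # total of fx
sum_fx2 = sum(f * x ** 2 for x, f in FREQ_TABLE)  # total of fx squared

mean = sum_fx / n
variance = sum_fx2 / n - mean ** 2                # short-cut formula
sd = sqrt(variance)

# Cross-check: mean of the squared deviations, straight from the definition.
variance_direct = sum(f * (x - mean) ** 2 for x, f in FREQ_TABLE) / n

print("n =", n, " sum fx =", sum_fx, " sum fx^2 =", sum_fx2)
print("mean =", mean)
print("variance =", round(variance, 3))
print("standard deviation =", round(sd, 1))
```

The short-cut formula and the definition give the same variance, so the short cut saves arithmetic without changing the result.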
While it is easy to grasp the idea and point of an average it is not so easy to grasp the idea of the standard deviation or the point of it. The standard deviation measures the variation around the mean. In other words, the smaller the standard deviation the more concentrated the data is around the mean. If the standard deviation were small, say two years, then this would mean that most people in the sample would agree that the age of consent should be within a narrow range either side of the mean.
So in this case, 15.6 years is a large standard deviation, because there are several very high values that make the variation much larger on average.
Why is it important to calculate the standard deviation? Because it is important when generalising the sample results to the population. If the sample varies a lot then it is less likely that one can be precise about the average of the population from which the sample was taken (see Section 8.4.3.1 and Section 8.4.3.2).
Activity 8.4.1.4.3.1
Compute the standard deviation and range for the age of consent for women (V21) and compare them with those for men (V22).