8.1 Introduction to surveys
8.2 Methodological approaches
8.3 Doing survey research
8.4 Statistical analysis
8.4.1 Descriptive statistics
8.4.2 Exploring relationships
8.4.3 Analysing samples
8.4.3.1 Generalising from samples
8.4.3.2 Dealing with sampling error
8.4.3.3 Confidence limits
8.4.3.4 Statistical significance
8.4.3.5 Hypothesis testing
8.4.3.6 Significance tests
8.4.3.6.1 Parametric tests of significance
8.4.3.6.1.1 z tests of means and proportions
8.4.3.6.1.2 t tests of independent means
8.4.3.6.1.3 F test of variances
8.4.3.6.1.4 Parametric tests of differences in means of matched pairs
8.4.3.6.1.5 Analysis of variance
8.4.3.6.2 Non-parametric tests of significance
8.4.3.7 Summary of significance testing and association: an example
8.4.4 Report writing
8.5 Summary and conclusion
Introduction
The analysis of significance tests (z, t and F) has, so far, been restricted to comparisons of one sample with a hypothesised mean, or of two independent samples. In some cases, though, the two samples being compared are not independent, and this calls for a different approach.
When two samples are related, the ordinary test of two independent samples is not appropriate.
The samples are not independent in the case of matched pairs, for example, a 'before and after' interview situation. This is not actually two samples but one sample, drawn at random and then re-interviewed.
What, then, is being tested when investigating matched pairs for significance? The point of taking the sample is to see if a population (too large to investigate in its entirety) has changed over time. Within the time span of the before and after samplings, some known stimulus would have been applied to the population. The researcher wants to know if it has had any effect. Where it is only practical to investigate a sample of the population, the difference between samplings must show not only a change, but a significant change. In other words, the change between samplings must be shown to be more than could be accounted for by sampling error.
This type of research has often been taken one stage further. Only the selected sample is exposed to the stimulus and the resultant change, if significant, is regarded as indicative of a change that would occur to the entire population if similarly exposed to the stimulus.
Consequently, comparing matched pairs is in fact not comparing two samples, but a set of differences derived from the measures of a single sample. These differences are then investigated to see if they are indicative of a change in the population.
Why bother to use matched samples at all? There are drawbacks. Operationally it may be difficult to ensure that each member of the original sample is matched on the second sampling. In addition, what would be two samples under independent sampling has become one sample of differences under matched sampling. Consequently, degrees of freedom are lost: under independent sampling, degrees of freedom = n1+n2-2, whereas as a single related sample, degrees of freedom = n-1, where n is the number of differences; the degrees of freedom are therefore halved. For example, two independent samples of 12 give 22 degrees of freedom, but 12 matched pairs give only 11. The effect of a reduced number of degrees of freedom is that the critical t value is larger and the ability to reject a false null hypothesis is thereby reduced.
However, there may be advantages in using matched pairs. First, the variation in results should be reduced. Two independent samples may vary considerably whereas related samples will tend to have a smaller dispersion. If the variance of matched pairs is less than the variance of two independently drawn samples, then the loss of degrees of freedom will be offset.
Second, it has also been contended that significance tests on related samples are more indicative of causality than tests of independent data. This is because there may be many other variables affecting independent samples, other than the controlled one, which have not been taken into account. When samples are directly related, the chance of variables other than the controlled variables causing significant changes is much reduced. This, however, is a very contentious issue.
There are several reasons why related samples may be no more indicative of causality than independent samples. First, a change may be induced in the sample simply because they are part of an experimental situation (see the famous Hawthorne experiment). This change may be wrongly inferred as being caused by the stimulus. Second, even related samples may be prone to influences between samplings that accentuate or diminish changes, and these are again falsely attributed to the stimulus. Third, the time lag between samplings may be too long or too short to record the full effects of the stimulus.
While it is possible that variables may be more closely controlled using related samples rather than independent samples, this is by no means conclusive.
When to use the parametric test of differences in means of matched pairs
The test is used when there are matched pairs (for example, a resampled sample), and the investigation is concerned to establish whether the population from which the sample was taken has changed. Interval scale data is necessary to carry out a test of means.
Hypotheses
H0: µ1=µ2
The subscripts 1 and 2 refer to the population before and after, i.e. the null hypothesis is that the population mean has not changed from the first to the second measurement. The alternative hypothesis is that it has changed, which can be in one of three ways:
HA: µ1≠µ2 not equal; a two-tail test
HA: µ1>µ2 greater than; upper-tail test
HA:µ1<µ2 less than; lower-tail test
What we need to know
Sample mean before: SM1
Sample mean after: SM2
hence the difference in the sample means: SMD
The sample standard deviation of the differences in matched pairs: sD
The sample size n (as the data is paired, n1 necessarily equals n2)
Consequently, all the data has to be paired off to compute the differences.
Sampling distribution
The sampling distribution is the distribution of sample mean differences. This takes on the shape of a t-distribution, given the assumption that the distribution of the population differences is normal.
For an explanation of the concept of the distribution of sample means see Section 8.4.3.5
The standard error of the distribution of sample mean differences of matched pairs is denoted SED and is estimated by
SED = sD/√(n-1)
where sD is the standard deviation of the differences in matched pairs:
sD = √(∑(D-SMD)²/n) = √((∑D²/n) - (∑D/n)²)
where D is the difference in each matched pair
and ∑ means 'sum of'.
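As an illustrative sketch, these formulas translate directly into Python; the function names sd_of_differences and standard_error below are our own labels, not from any particular package:

```python
# A minimal sketch of the formulas above, assuming plain Python lists
# of differences; the function names are illustrative only.
import math

def sd_of_differences(diffs):
    """sD: standard deviation of the matched-pair differences,
    using the n-divisor form given above."""
    n = len(diffs)
    smd = sum(diffs) / n                     # SMD, the mean difference
    return math.sqrt(sum((d - smd) ** 2 for d in diffs) / n)

def standard_error(diffs):
    """SED = sD / sqrt(n - 1), the standard error of the
    distribution of sample mean differences."""
    n = len(diffs)
    return sd_of_differences(diffs) / math.sqrt(n - 1)
```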
Assumptions
The assumption when using the t-test of matched pairs is that the distribution of sample mean differences is a normal distribution. When the sample size is large (i.e. n > 30), the t-distribution closely approximates a normal distribution, and statisticians maintain that the assumption of a normal population may then be relaxed.
Testing statistic
The formula for the testing statistic, t, is the difference between the sample mean difference and the mean of the distribution of sample mean differences under the null hypothesis (which is zero), divided by the standard error:
t = (SMD - 0)/SED = SMD/SED
which when using the sample estimate is
t = SMD/(sD/√(n-1)) = (SMD√(n-1))/sD
The resultant value of t is compared to critical values to determine whether the mean difference is greater than zero.
Critical values
These are found from tables of critical t values, such as the Engineering Statistics Handbook (accessed 30 April 2020) or MedCalc (accessed 30 April 2020), for n-1 degrees of freedom (or generated by an appropriate computer program), where n is the number of differences.
See NOTE on degrees of freedom
The decision rule is that the null hypothesis is rejected if the calculated value of t is:
greater than the critical +t value for an upper-tail test;
less than the critical -t value for a lower-tail test;
greater than +t or less than -t for a two-tail test.
(See Section 8.4.3.5).
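Pulling the pieces together, a hedged sketch of the whole procedure in Python is given below. It assumes scipy is available and uses scipy.stats.t.ppf to supply the critical values that printed tables provide; the function name matched_pairs_t is our own.

```python
# A sketch of the full test under the stated assumptions; not a
# definitive implementation. Differences are taken as after - before.
import math
from scipy import stats

def matched_pairs_t(before, after, alpha=0.05, tail="two"):
    """t test of differences in means of matched pairs.
    Returns the calculated t, the critical value and the decision."""
    diffs = [a - b for b, a in zip(before, after)]
    n = len(diffs)
    smd = sum(diffs) / n                                 # mean difference
    s_d = math.sqrt(sum((d - smd) ** 2 for d in diffs) / n)
    t = smd / (s_d / math.sqrt(n - 1))                   # testing statistic
    df = n - 1                                           # degrees of freedom
    if tail == "two":
        crit = stats.t.ppf(1 - alpha / 2, df)
        reject = abs(t) > crit
    elif tail == "upper":
        crit = stats.t.ppf(1 - alpha, df)
        reject = t > crit
    else:                                                # lower-tail test
        crit = stats.t.ppf(alpha, df)
        reject = t < crit
    return t, crit, reject
```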
Worked example 1: One-tail test
A group of students were given a statistics test and two weeks later 12 of them, selected at random, were given a very similar test without prior warning. The two sets of results for the 12 students are shown in the table below. Have the students as a whole improved?
| Student | 1st test | 2nd test | D | (D-SMD) | (D-SMD)² | D² |
|---------|----------|----------|----|---------|----------|-----|
| A | 12 | 14 | 2 | 0 | 0 | 4 |
| B | 15 | 15 | 0 | -2 | 4 | 0 |
| C | 16 | 13 | -3 | -5 | 25 | 9 |
| D | 19 | 18 | -1 | -3 | 9 | 1 |
| E | 14 | 15 | 1 | -1 | 1 | 1 |
| F | 10 | 12 | 2 | 0 | 0 | 4 |
| G | 8 | 9 | 1 | -1 | 1 | 1 |
| H | 3 | 8 | 5 | 3 | 9 | 25 |
| I | 6 | 7 | 1 | -1 | 1 | 1 |
| J | 8 | 8 | 0 | -2 | 4 | 0 |
| K | 4 | 15 | 11 | 9 | 81 | 121 |
| L | 5 | 10 | 5 | 3 | 9 | 25 |
| Totals | | | 24 | | 144 | 192 |
As the data is interval and related, a t test of differences in means of matched pairs is appropriate.
Hypotheses:
H0: µ1=µ2
HA: µ1<µ2
One-tail test to see if the students have improved. (Improvement means µ2 > µ1; as the differences are taken as 2nd test - 1st test, this is an upper-tail test on the mean difference, so H0 is rejected for large positive t.)
Significance level: 5%
Testing statistic: t= SMD/SED
SED = sD/√(n-1)
therefore t is estimated by t = SMD/(sD/√(n-1))
= (SMD√(n-1))/sD
Critical value: t = 1.796 (one-tail test at 5% and n-1 = 11 degrees of freedom)
Decision rule: reject H0 if calculated t>1.796
Computation:
The last four columns in the table (D, (D-SMD), (D-SMD)² and D²) are computed from the test scores.
The mean difference, SMD, = 24/12 = 2
The standard deviation of the differences, sD = √(∑(D-SMD)²/n)
sD = √(144/12) = √12
[sD could also be computed using
sD = √((∑D²/n) - (∑D/n)²) = √(192/12 - (24/12)²) = √(16 - 4) = √12]
Therefore
t = SMD/(sD/√(n-1)) = 2/(√12/√(12-1)) = 2/(√12/√11) = 2/(3.464/3.317)
t = 2/1.044 = 1.915
Decision: Reject the null hypothesis (H0: µ1=µ2) as 1.915 > 1.796 (the critical value of t in a one-tail test at 5% significance level and 11 degrees of freedom); there is a significant improvement, i.e. the population from which the sample was drawn at random has improved (or would have improved had they sat the test again).
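As a check, the example can be run through the matched_pairs_t sketch above; scipy.stats.ttest_rel, which implements the equivalent paired t test, gives the same statistic:

```python
# Worked example 1 re-run in code; the scores are those in the table.
first  = [12, 15, 16, 19, 14, 10, 8, 3, 6, 8, 4, 5]
second = [14, 15, 13, 18, 15, 12, 9, 8, 7, 8, 15, 10]

t, crit, reject = matched_pairs_t(first, second, alpha=0.05, tail="upper")
print(t, crit, reject)          # t ≈ 1.915, crit ≈ 1.796, reject = True

# Equivalent scipy call (differences taken as second - first):
from scipy import stats
print(stats.ttest_rel(second, first))   # statistic ≈ 1.915
```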
Worked example 2: Two-tail test
Thirty-four workers were shown two methods of assembling an electronic component. They were then asked to assemble as many components as they could in an hour using each method. The table below shows the results for each worker (the last two columns are subsequently computed). Is either method significantly speedier than the other?
| Worker | Method 1 | Method 2 | D (Meth 2 - Meth 1) | D² |
|--------|----------|----------|---------------------|-----|
| 1 | 12 | 14 | 2 | 4 |
| 2 | 12 | 14 | 2 | 4 |
| 3 | 12 | 16 | 4 | 16 |
| 4 | 13 | 11 | -2 | 4 |
| 5 | 13 | 11 | -2 | 4 |
| 6 | 14 | 13 | -1 | 1 |
| 7 | 14 | 16 | 2 | 4 |
| 8 | 14 | 14 | 0 | 0 |
| 9 | 14 | 15 | 1 | 1 |
| 10 | 14 | 16 | 2 | 4 |
| 11 | 14 | 18 | 4 | 16 |
| 12 | 15 | 12 | -3 | 9 |
| 13 | 16 | 14 | -2 | 4 |
| 14 | 16 | 15 | -1 | 1 |
| 15 | 16 | 18 | 2 | 4 |
| 16 | 16 | 19 | 3 | 9 |
| 17 | 16 | 19 | 3 | 9 |
| 18 | 16 | 21 | 5 | 25 |
| 19 | 16 | 21 | 5 | 25 |
| 20 | 16 | 21 | 5 | 25 |
| 21 | 16 | 22 | 6 | 36 |
| 22 | 17 | 15 | -2 | 4 |
| 23 | 17 | 14 | -3 | 9 |
| 24 | 17 | 15 | -2 | 4 |
| 25 | 17 | 18 | 1 | 1 |
| 26 | 17 | 20 | 3 | 9 |
| 27 | 18 | 15 | -3 | 9 |
| 28 | 18 | 16 | -2 | 4 |
| 29 | 18 | 18 | 0 | 0 |
| 30 | 18 | 21 | 3 | 9 |
| 31 | 19 | 18 | -1 | 1 |
| 32 | 19 | 20 | 1 | 1 |
| 33 | 19 | 23 | 4 | 16 |
| 34 | 20 | 20 | 0 | 0 |
| Totals | | | ∑D = 34 | ∑D² = 272 |
Hypotheses:
H0: µ1=µ2
HA: µ1≠µ2
Two-tail test to see if either method is better.
Significance level: 5%
Testing statistic: t= SMD/SED estimated by t = SMD/(sD/√(n-1))
= (SMD√(n-1))/sD
Critical value: t = 1.96 (two-tail test at 5% and n-1 = 33 degrees of freedom)
This is the same as the z test critical value, as the t-distribution closely approximates a normal distribution when n > 30. (The exact critical t for 33 degrees of freedom, 2.035, would lead to the same decision here.)
Decision rule: reject H0 if calculated t>1.96 or t<-1.96
Computation:
The last two columns in the table (D and D²) are computed from the workers' scores.
The mean difference, SMD, = 34/34 = 1
The standard deviation of the differences, sD
sD = √((∑D²/n) - (∑D/n)²) = √(272/34 - (34/34)²) = √(8 - 1) = √7
Therefore t = SMD/(sD/√(n-1)) = 1/(√7/√(34-1))
= 1/(√7/√33) = 1/(2.646/5.745) = 1/0.4606 = 2.17
Decision: Reject the null hypothesis (H0: µ1=µ2) as 2.17 > 1.96 (the critical value of t in a two-tail test at 5% significance level and 33 degrees of freedom); there is a significant difference in the output of the two methods. As the difference is positive on average (when method 1 scores are subtracted from method 2 scores), method 2 would seem to be better than method 1.
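Again as a sketch, the computation can be reproduced from the differences column of the table alone:

```python
# Worked example 2 re-run from the D column of the table.
import math

diffs = [2, 2, 4, -2, -2, -1, 2, 0, 1, 2, 4, -3, -2, -1, 2, 3, 3,
         5, 5, 5, 6, -2, -3, -2, 1, 3, -3, -2, 0, 3, -1, 1, 4, 0]
n = len(diffs)                                    # 34
smd = sum(diffs) / n                              # 34/34 = 1
s_d = math.sqrt(sum((d - smd) ** 2 for d in diffs) / n)   # √7 ≈ 2.646
t = smd / (s_d / math.sqrt(n - 1))                # ≈ 2.17
print(n, smd, s_d, t)
```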
Two reservations about the test of differences in means of matched pairs
The first reservation that has been voiced with respect to the test is that it takes no account of the proportionate change. Effectively the test ignores the original sample values and merely considers the differences.
Hence a change in a matched pair from 1 to 3 is regarded as equal to a change from 1001 to 1003. The proportionate change is quite different. However, the relative variation in the data is taken into account by the test, via the standard error of differences. If the difference is statistically significant, this is all the test purports to show, i.e. that the difference cannot be accounted for by sampling error. It is up to the researcher to comment on whether the statistically significant difference is in fact substantively different.
A discussion of this point can be found at https://achimkemmerling.wordpress.com/2013/04/30/statistical-vs-substantive-significance/ (accessed 30 April 2020).
The second reservation is more fundamental. It is best shown by an example. Consider the two cases below. Both are of samples of 6 matched pairs. In each case all 6 pairs show an increase on second sampling.
CASE 1
| Item | X1 | X2 | D | (D-SMD) | (D-SMD)² |
|------|----|----|----|---------|----------|
| A | 4 | 7 | 3 | 1 | 1 |
| B | 5 | 7 | 2 | 0 | 0 |
| C | 6 | 8 | 2 | 0 | 0 |
| D | 8 | 10 | 2 | 0 | 0 |
| E | 9 | 11 | 2 | 0 | 0 |
| F | 10 | 11 | 1 | -1 | 1 |
| Totals | | | 12 | | 2 |
SMD = 12/6 = 2
sD = √(∑(D-SMD)²/n) = √(2/6) = √(1/3)
t = SMD/(sD/√(n-1)) = 2/(√(1/3)/√5) = 2/√(1/15) = 7.746
With 5 degrees of freedom at a 5% level of significance, t = 2.571 (two-tail test). Therefore, there is a significant change from first to second sampling.
CASE 2
| Item | X1 | X2 | D | (D-SMD) | (D-SMD)² |
|------|----|----|----|---------|----------|
| A | 4 | 5 | 1 | -7 | 49 |
| B | 5 | 6 | 1 | -7 | 49 |
| C | 5 | 6 | 1 | -7 | 49 |
| D | 8 | 9 | 1 | -7 | 49 |
| E | 9 | 31 | 22 | 14 | 196 |
| F | 10 | 32 | 22 | 14 | 196 |
| Totals | | | 48 | | 588 |
SMD = 48/6 = 8
sD = √(∑(D-SMD)²/n) = √(588/6) = √98
t = SMD/(sD/√(n-1)) = 8/(√98/√5) = 8√5/√98 = 1.807
With 5 degrees of freedom at a 5% level of significance, t = 2.571 (two-tail test). Therefore, there is no significant change from first to second sampling.
So there is no significant difference, despite the fact that the average increase is much greater than in Case 1 and all the differences are positive. The implication is that the high variation in Case 2 is indicative of sufficient negative changes in the population to suggest that the population has not in fact changed. This test, therefore, considers a change from minus to plus as equal to a change from plus to a higher plus. However, consider the probability of a sample of 6 giving 6 positive changes where the chances of a positive and a negative change are the same (the implication of the test). If a positive change occurs half the time, then the probability of 6 positive changes out of six is 0.5⁶ = 0.0156. This is a probability of less than 5%. Hence there is less than a 5% chance of obtaining six positive changes from a population that has not changed, which suggests that the population has in fact changed.
(This procedure is the basis of the Sign test, see Section 8.4.3.6.2.5).
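The two cases, and the probability calculation just given, can be reproduced with the matched_pairs_t sketch from earlier:

```python
# Cases 1 and 2 re-run, followed by the sign-test style probability.
case1_x1, case1_x2 = [4, 5, 6, 8, 9, 10], [7, 7, 8, 10, 11, 11]
case2_x1, case2_x2 = [4, 5, 5, 8, 9, 10], [5, 6, 6, 9, 31, 32]

print(matched_pairs_t(case1_x1, case1_x2))  # t ≈ 7.746: significant
print(matched_pairs_t(case2_x1, case2_x2))  # t ≈ 1.807: not significant

# Probability of six positive changes out of six when positive and
# negative changes are equally likely:
print(0.5 ** 6)                             # 0.015625, i.e. less than 5%
```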
A test that overcomes this dilemma is the Wilcoxon test, used in preference to the rather crude sign test where data is interval. It allows for the size of differences in each direction, but only where differences in both directions actually occur in the sample data. (See Section 8.4.3.6.2.6 Wilcoxon test.)
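scipy provides an implementation of the Wilcoxon signed-rank test; as a sketch, it could be applied to the Case 2 data like this:

```python
# Wilcoxon signed-rank test on the Case 2 data (a sketch; see the
# scipy documentation for the options governing ties and zeros).
from scipy import stats

x1 = [4, 5, 5, 8, 9, 10]
x2 = [5, 6, 6, 9, 31, 32]
print(stats.wilcoxon(x2, x1))   # statistic and p-value for the
                                # paired differences x2 - x1
```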
Unworked Examples
1. A sample of 60 factory workers were asked what the difference in their take-home pay was when the factory changed from "piece" to "time" rate. 35 workers indicated an increase and 10 a decrease. The remaining 15 indicated no change. If the overall average difference was £20 per week with a standard deviation of differences of £7.84, have wages significantly increased as a result of changing from "piece" to "time" rate?
2. The data in the table below shows the number of leading political figures known by students on a politics module at a college. The first sampling was taken on the students' first day and the second sampling half way through the second term. Is there a significant difference in the number of political figures known by students?
| Student | 1st time | 2nd time |
|---------|----------|----------|
| A | 3 | 11 |
| B | 4 | 10 |
| C | 5 | 9 |
| D | 7 | 7 |
| E | 7 | 8 |
| F | 9 | 6 |
| G | 9 | 9 |
| H | 11 | 12 |
| J | 12 | 14 |
| K | 14 | 13 |
| L | 15 | 17 |
| M | 18 | 16 |
| N | 18 | 20 |
| P | 20 | 19 |
| Q | 20 | 20 |