RESEARCHING THE REAL WORLD




© Lee Harvey 2012–2024

Page updated 8 January, 2024

Citation reference: Harvey, L., 2012–2024, Researching the Real World, available at qualityresearchinternational.com/methodology
All rights belong to the author.


 

A Guide to Methodology

8. Surveys

8.1 Introduction to surveys
8.2 Methodological approaches
8.3 Doing survey research
8.4 Statistical Analysis

8.4.1 Descriptive statistics
8.4.2 Exploring relationships
8.4.3 Analysing samples

8.4.3.1 Generalising from samples
8.4.3.2 Dealing with sampling error
8.4.3.3 Confidence limits
8.4.3.4 Statistical significance
8.4.3.5 Hypothesis testing
8.4.3.6 Significance tests

8.4.3.6.1 Parametric tests of significance

8.4.3.6.1.1 z tests of means and proportions
8.4.3.6.1.2 t tests of independent means

8.4.3.6.1.2.1 One sample t tests of means
8.4.3.6.1.2.2 Two sample t tests of means

8.4.3.6.1.3 F test of variances
8.4.3.6.1.4 Parametric tests of differences in means of matched pairs
8.4.3.6.1.5 Analysis of variance

8.4.3.6.2 Non-parametric tests of significance

8.4.3.7 Summary of significance testing and association: an example

8.4.4 Report writing

8.5 Summary and conclusion

 

8.4.3.6.1.2 t tests of independent means

The t test is used when samples are smaller than 30. It is applicable when testing a sample mean against an assumed or claimed mean, or when testing two independent sample means for a significant difference. It is not applicable to tests of sample proportions, as the binomial distribution does not approximate a t distribution for small samples (see Section 8.4.3.6.1.1.3). Consequently, there are two cases of the t test of independent means: a one-sample test and a two-sample test.

Note that the t test can also be used for samples larger than 30, but for such samples the t distribution closely approximates a normal distribution in most circumstances and so a z test can be used instead.

There is also a t test for related samples (such as matched pairs) and this is covered in Section 8.4.3.6.1.4.


8.4.3.6.1.2.1 One sample t test of means

When to use the one sample t test of means
Hypotheses
What we need to know
Sampling distribution
Assumptions
Testing Statistic
Critical values
Worked examples

This test is similar to the one sample z test of means but uses a different sampling distribution (the t distribution). Like the normal distribution used in the z test, the t distribution is bell shaped, but unlike the normal distribution its spread changes as the sample size changes.

 

Hypotheses
H0: µ = µ0

i.e. a null hypothesis that the claimed value of the population mean is equal to the true population mean. The alternative hypothesis is that it is not, which can take one of three forms (not equal, greater or smaller):

HA: µ ≠ µ0 not equal; a two-tail test
HA: µ > µ0 greater than; an upper-tail test
HA: µ < µ0 less than; a lower-tail test

 

What we need to know
Claimed value of population mean: µ0
Sample mean: SM.
Sample (or population) standard deviation: s (or σ)
Sample size: n

See Section 8.4.3.6.1.1 for an explanation of notation.

 

Sampling distribution
The sampling distribution is the distribution of sample means. The mean of the distribution of sample means is equal to the mean of the population from which the samples are drawn.

The one sample t test compares a sample mean with a specified (or hypothesised) population mean.

The distribution of sample means for small samples is not normally distributed but approximates Student's t distribution. The t distribution is similar in shape to the normal distribution (for the same mean and standard deviation) but for small samples (under 30) tends to be 'shorter and fatter'. Unlike the normal distribution, it changes shape as the sample size changes: the smaller the sample, the less peaked and the wider the t distribution.

This is logical: if samples are very small, say n=2, then the mean of the sample will vary widely and the distribution of sample means will be almost the same shape as the distribution of the sample itself. Whereas if the sample is much larger, the mean of the sample will be much closer to the mean of the population and the distribution of sample means will be much narrower than the distribution of the sample itself.
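This narrowing of the distribution of sample means as n grows can be illustrated with a short simulation (a hedged sketch; the function name and the normal population with mean 0 and standard deviation 1 are illustrative assumptions):

```python
import random

random.seed(1)

def sd_of_sample_means(n, trials=2000):
    """Simulate the spread of the distribution of sample means for
    samples of size n drawn from a normal population (mean 0, sd 1)."""
    means = [sum(random.gauss(0, 1) for _ in range(n)) / n
             for _ in range(trials)]
    grand = sum(means) / trials
    return (sum((m - grand) ** 2 for m in means) / trials) ** 0.5

# The spread of sample means narrows as n grows (roughly as 1/sqrt(n)):
# sd_of_sample_means(2) is around 0.7, sd_of_sample_means(25) around 0.2.
```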

Given the value of the mean and standard deviation of the normal curve, the area under the curve for any given value of x is fixed, irrespective of the size of the sample. However, the area under the t curve changes as n changes, even though the mean and standard deviation remain the same.

This means that (for example) the 95% confidence limits cannot be simply found by adding and subtracting a constant number of standard units to and from the mean of the distribution (as is the case with the normal curve where critical z equals 1.96 for the 95% limits).

The number of standard t units that must be added to enclose 47.5% of the curve either side of the mean decreases as the sample size increases. The t value for 95% confidence limits ranges from 12.7 for the smallest samples down towards 1.96 as the sample becomes large (i.e. over 30).

Naturally the same principle applies for other levels of confidence and the t-value changes as n changes. This requires looking up values for different sample sizes in tables. When using a computer program the appropriate critical values or probability of rejecting the null hypothesis will be included in the output, without the need to look up critical values.

The standard deviation of the distribution of sample means is the standard error: SE.

The standard error equals the population standard deviation (σ) divided by the square root of the sample size, i.e. σ/√n.

But it is unlikely that the population standard deviation is known and so it has to be estimated on the basis of the sample.

So the estimated standard error (est SE) is the sample standard deviation divided by the square root of the sample size minus 1:

est SE = s/√(n-1)

See CASE STUDY: Illustration that the sample variance is a biased estimate of the population variance for an explanation of why the sample size is reduced by 1 in estimating the standard error.

See NOTE on degrees of freedom
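The estimated standard error above can be sketched as a small Python function (a hedged illustration; the function name is my own, and the sample standard deviation s follows the text's convention of using divisor n):

```python
from math import sqrt

def est_standard_error(sample):
    """Estimated standard error of the mean, following the text's
    convention: s is the sample standard deviation computed with
    divisor n, and est SE = s / sqrt(n - 1)."""
    n = len(sample)
    mean = sum(sample) / n
    s = sqrt(sum((x - mean) ** 2 for x in sample) / n)  # divisor n
    return s / sqrt(n - 1)
```

This is numerically identical to the more common textbook form (unbiased s computed with divisor n-1, divided by √n), since both reduce to √(sum of squared deviations / (n(n-1))).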

 

Assumptions
The underlying assumption when using a t test is that the population from which the sample was taken is normally distributed (z tests do not require this assumption). Before the sampling distribution can be assumed to adhere to a t distribution, the population must be normal. Hence this test cannot be used as indiscriminately with small samples as the z test can with large samples when testing population means.

 

Testing statistic
The formula for the testing statistic, t, is the difference between the sample mean and the hypothesised population mean divided by the standard error:

t = (SM - µ0)/SE

where SE = σ/√n

and is estimated by

est SE = s/√(n-1)

The resultant value of t is compared to critical values to determine whether there is any significant difference.

 

Critical values
The critical values for t are derived from tables of areas under the Student's t distribution for different sample sizes.

The critical value can be read from the appropriate table once the significance level and degrees of freedom have been established. The number of degrees of freedom is dependent upon the sample size; in the one-sample case, degrees of freedom = n - 1.

If using a computer program the appropriate degrees of freedom will be provided along with critical values. Otherwise use available tables, such as Engineering Statistics Handbook (accessed 30 April 2020) or Medcalc (accessed 30 April 2020).

The decision rule is that the null hypothesis is rejected if the calculated value is:

greater than the critical +t value for an upper-tail test;
less than the critical -t value for a lower-tail test;
greater than +t or less than -t for a two-tail test (see Section 8.4.3.5).
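The decision rule can be coded directly, as in this hedged sketch; the handful of critical values below are taken from standard t tables (two-tail test, 5% significance level) purely for illustration, and the function name is my own:

```python
# A few critical t values from standard tables (two-tail, 5% significance),
# keyed by degrees of freedom -- for illustration only.
CRITICAL_T_5PC_TWO_TAIL = {1: 12.706, 5: 2.571, 10: 2.228, 20: 2.086, 24: 2.064}

def reject_null(t_calculated, df):
    """Apply the two-tail decision rule at the 5% level: reject H0 if
    the calculated t is greater than +t critical or less than -t critical."""
    crit = CRITICAL_T_5PC_TWO_TAIL[df]
    return t_calculated > crit or t_calculated < -crit
```

In practice a statistics package looks up (or computes) the critical value for any degrees of freedom; the dictionary here just stands in for the printed table.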

 

Worked example

In a factory producing DAB radios the standard method of assembling the radios resulted in an average of fifteen radios per shift. Five workers used a new approach on a trial basis to see if it was as efficient at producing radios. The output for the next five shifts using the new method is in the table below. Is the new system as efficient?

Output per shift: new method. Five shifts for five workers

Worker 1: 12  11  13  15  15
Worker 2: 10  10  12  12  12
Worker 3: 13  13  13  14  15
Worker 4: 12  12  15  15  16
Worker 5: 14  13  15  16  17

 

An appropriate parametric test would be to compare the mean of the twenty-five new method outputs with the old method mean of 15. The old method mean represents the claimed mean, and the hypothesis being tested is that the sample mean is not significantly different from the claimed mean.

Hypotheses:

H0: µ = µ0
HA: µ < µ0

i.e. a null hypothesis that the new method mean is equal to the old method mean. The alternative hypothesis is that the new method is not as efficient as the old method (its mean is lower). A one-tail (lower-tail) test.

Significance level: 1%

Testing statistic: t = (SM - µ0)/SE

Critical value: From t tables for 1%, one-tail test, with n-1 (25-1) degrees of freedom, t = 2.492

Decision rule: reject H0 if t<-2.492

Computation:

Mean of the sample is total number of radios produced divided by 25 shifts = 13.4

Standard deviation = 1.83 (see Section 8.3.12.7.3 for how to compute the standard deviation).

SE = s/√(n-1) = 1.83/√24 = 0.374

t = (SM - µ0)/SE = (13.4 - 15)/0.374 = -1.6/0.374 = -4.28

Decision: Reject the null hypothesis at the 1% level (one-tail test); the new method is not as efficient as the old method.
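The whole worked example can be checked with a short Python sketch (reusing the text's convention est SE = s/√(n-1); variable names are my own):

```python
from math import sqrt

# Output per shift for the five workers under the new method.
data = [12, 11, 13, 15, 15,
        10, 10, 12, 12, 12,
        13, 13, 13, 14, 15,
        12, 12, 15, 15, 16,
        14, 13, 15, 16, 17]

n = len(data)                                      # 25 shifts
mean = sum(data) / n                               # 13.4
s = sqrt(sum((x - mean) ** 2 for x in data) / n)   # sd with divisor n
est_se = s / sqrt(n - 1)                           # about 0.374
t_stat = (mean - 15) / est_se                      # about -4.28

# -4.28 is below the critical value of -2.492, so H0 is rejected.
```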


8.4.3.6.1.2.2 Two sample t test of means

When to use the two sample t test of means
Hypotheses
What we need to know
Sampling distribution
Assumptions
Testing Statistic
Critical values
Worked examples

This test compares two sample means to see if the difference between them is significant: that is, whether it is indicative of any difference between the populations from which the samples are taken. The approach is very similar to the two-sample z test of means (see Section 8.4.3.6.1.1).

 

Hypotheses
H0: µ1=µ2

i.e. the mean of the population from which sample one was drawn is equal to the mean of the population from which sample two was drawn.

The alternative hypothesis is that the two populations have different means, which can be in one of three ways (not equal, greater or smaller):

HA: µ1≠µ2 not equal; a two-tail test
HA: µ1>µ2 greater than; upper-tail test
HA: µ1<µ2 less than; lower-tail test

 

What we need to know
Mean of sample 1: SM1

Mean of sample 2: SM2

Standard deviation of sample 1 (or the standard deviation of the population from which sample 1 was taken): s1 (or σ1)

Standard deviation of sample 2 (or the standard deviation of the population from which sample 2 was taken): s2 (or σ2)

Size of sample 1: n1

Size of sample 2: n2

 

Sampling distribution
The sampling distribution is the distribution of sample mean differences. This is similar to the z test (see Section 8.4.3.6.1.1). The two-sample t test usually compares the difference between two sample means for small samples. The distribution of sample mean differences approximates a Student's t distribution provided certain assumptions are upheld (see Assumptions below).

The mean of the distribution of sample mean differences equals zero (as with the z test of two sample means). However, the standard deviation of the distribution of sample mean differences is not quite the same as the simple standard error of sample mean differences that is used in z tests.

There is no condition of homoscedasticity (i.e. equal variance, see Assumptions below) for z tests. The variance of the distribution of sample mean differences in the z test is the simple linear combination of the variances of the distributions of sample means for the distributions from which sample one and sample two were taken (which means that the standard error of the distribution of sample mean differences will be bigger than the standard error of either of the component distributions). The simple combination is thus:

σSM1-SM2 = √(varianceSM1 + varianceSM2)

est σSM1-SM2 = √((var1/n1) + (var2/n2))

But in the t test the condition of homoscedasticity means that var1 must be equal to var2. Denoting the common variance varc, for the t test

est σSM1-SM2 = √((varc/n1) + (varc/n2)) = √(varc(1/n1 + 1/n2)) = σc√((1/n1) + (1/n2))

An unbiased estimate of the common variance is the weighted average of the unbiased estimates of the sample variances, also known as the estimated pooled variance. So:

varc = (n1(var1) + n2(var2))/(n1 + n2 - 2)

σc = √((n1(var1) + n2(var2))/(n1 + n2 - 2))

Therefore, the estimate of the standard error of sample mean differences with a common variance (est common σSM1-SM2), valid for t tests, is

est common σSM1-SM2 = est σc√((1/n1) + (1/n2))

where est σc = √((n1(var1) + n2(var2))/(n1 + n2 - 2))

so est common varSM1-SM2 = (((n1(var1) + n2(var2))/(n1 + n2 - 2)))(1/n1 + 1/n2)

and

est common σSM1-SM2 = √(common varSM1-SM2)

This formula differs from the formula for z tests (of larger samples) because z tests do not have any condition of homoscedasticity (i.e. common population variance) to adhere to.

It is also important to note that pooling sample variances to derive a common variance does not mean that the assumption of homoscedasticity is thereby fulfilled. On the contrary, the common variance is an estimate of the pooled variance provided the assumption is fulfilled.

In most cases there is no knowing whether the assumption is fulfilled and probably, in practice, it is not. If the sample variances are significantly different (see the F test, Section 8.4.3.6.1.3), there is little likelihood of both samples coming from populations with identical variances, and no amount of 'pooling' of sample variances will fulfil the condition of equal population variances. The pooling to create a common variance is just a device to enable the t test to be used irrespective of the basic requirements of common variance and normally distributed populations.

That the test is widely used (including on larger samples) without due regard for these assumptions, and the results subsequently treated as 'objective fact', raises fundamental questions about the validity of conclusions based upon them; questions that tend to be ignored when the results are presented and scrutinised.
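The pooled estimate derived above can be sketched as a small Python function (a hedged illustration; the function name is my own, and var1 and var2 are assumed to be sample variances computed with divisor n, as in the text):

```python
from math import sqrt

def pooled_standard_error(var1, n1, var2, n2):
    """Estimated common (pooled) standard error of sample mean
    differences, following the text's formula:
    est common SE = sqrt(((n1*var1 + n2*var2) / (n1 + n2 - 2))
                         * (1/n1 + 1/n2))
    where var1 and var2 use divisor n."""
    common_var = (n1 * var1 + n2 * var2) / (n1 + n2 - 2)
    return sqrt(common_var * (1 / n1 + 1 / n2))
```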

 

Assumptions
The two populations from which the samples are taken are assumed to be normal and to have equal variances; this is known as the condition of homoscedasticity. Technically, the distribution of differences in sample means will not approximate a t distribution unless these assumptions are fulfilled. However, the t test is regarded as being 'robust': it is claimed that the test is still applicable even when the two populations do not exactly fulfil the above conditions. Provided the two populations do not differ appreciably in shape the test is still good, particularly if both distributions are symmetrical. This notion of 'appreciably' is rather loose and is a matter of practice rather than theory.

Note that equal population variance is not necessarily reflected by equal sample variances. Sample variances are affected by sampling error in the same way as sample means.  Consequently, the common, or 'pooled' variance is the result of combining sample variances on the assumption that the differences in sample variances are not significant and are solely the result of sampling error.

 

Testing statistic
The formula for the testing statistic, t, is the difference between sample mean 1 and sample mean 2 divided by the standard error (which in this case is the pooled, common standard error of differences):

t= (SM1 -SM2)/est common σSM1-SM2

where

est common σSM1-SM2= √(((n1(var1) + n2(var2))/(n1 + n2 - 2)))(1/n1 + 1/n2)

The resultant value of t is compared to critical values to determine whether there is any significant difference.

 

Critical values
The critical value can be read from the appropriate table once the significance level and degrees of freedom have been established. The number of degrees of freedom is dependent upon the sample sizes; in the two-sample case, degrees of freedom = n1 + n2 - 2.

Computer programs will generate the appropriate critical values.

 

Worked example
The number of people per square kilometre in two large areas of England (Area A and Area B) was computed on the basis of taking 11 sub-areas selected at random from Area A and 10 sub-areas selected at random from Area B. The sample means were 31 and 22 for Areas A and B respectively, with corresponding variances of 134.7 and 145.6. Is there any significant difference in population density between the two areas?

Hypotheses:

H0: µ1=µ2
HA: µ1≠µ2

Two-tail test to see if there is any difference, direction unspecified.

Significance level: 5%

Testing statistic: t= (SM1 -SM2)/est common σSM1-SM2

Critical value: t = 2.093 (two-tail test at 5% and 11+10-2 degrees of freedom)

Decision rule: reject H0 if t>2.093 or t<-2.093

Computation:

SM1 = 31; var1 = 134.7; n1 = 11

SM2 = 22; var2 = 145.6; n2 = 10

est common σSM1-SM2= √(((n1(var1) + n2(var2))/(n1 + n2 - 2)))(1/n1 + 1/n2)

= √(((11(134.7) + 10(145.6))/(11 + 10 - 2)))(1/11 + 1/10)

= √((2938/19)(21/110)) = √29.52 = 5.43

Thus:

t = (SM1 - SM2)/est common σSM1-SM2 = (31 - 22)/5.43 = 1.658

Decision: Cannot reject the null hypothesis (H0: µ1=µ2) since 1.658 < 2.093 (the critical value of t in a two-tail test at the 5% significance level with 19 degrees of freedom); the two areas do not have significantly different population densities.
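The whole two-sample calculation can be checked in a few lines of Python (a hedged sketch using the text's figures; the critical value 2.093 comes from the t table cited above, and the variable names are my own):

```python
from math import sqrt

sm1, var1, n1 = 31, 134.7, 11   # Area A: mean, variance, sample size
sm2, var2, n2 = 22, 145.6, 10   # Area B: mean, variance, sample size

# Pooled (common) variance and standard error of mean differences.
common_var = (n1 * var1 + n2 * var2) / (n1 + n2 - 2)
est_se = sqrt(common_var * (1 / n1 + 1 / n2))   # about 5.43
t_stat = (sm1 - sm2) / est_se                   # about 1.66

# Two-tail test at 5% with 19 degrees of freedom: critical t = 2.093.
reject = t_stat > 2.093 or t_stat < -2.093      # False: cannot reject H0
```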


Unworked Examples

1. The average weekly sales of a local newspaper in 2001 was 24,300 copies. A change in editorial policy was inaugurated at the beginning of 2002 and weekly sales for the first 8 weeks of 2002 were as follows: 23600, 25800, 26600, 28900, 27200, 27100, 28500, 28300. Has there been a significant increase in circulation?

2. Two soap-producing machines of different manufacture were installed in a factory and a close check made on their respective outputs during a fifteen-day trial period. The machines produced an average of 14500 and 15300 bars of soap per day respectively. The standard deviation was the same for both machines, namely 50 bars per day. The machine with the higher average, however, costs slightly more to run, and the management of the firm will only install further machines of this type if they produce significantly higher output than the machines with the lower average. Carry out an appropriate significance test at a 1% level of significance.

3. A school had two streams for its 16-year-old students: the A-stream and the B-stream. The absentee records for both streams were checked for a 20-day period. In the A-stream there were on average 11 students absent per day, with a standard deviation of 3.8. The B-stream had an average of 14 absentees per day, with a standard deviation of 3.5. Is there a significantly different average absentee rate for the two streams (at a 5% level)? Is this indicative that the B-stream students are less interested in school than the A-stream students?

 


Next 8.4.3.6.1.3 F tests of variances
