8.1 Introduction to surveys
8.2 Methodological approaches
8.3 Doing survey research
8.4 Statistical Analysis
8.4.1 Descriptive statistics
8.4.2 Exploring relationships
8.4.2.1 Association
8.4.2.2 Crosstabulation
8.4.2.3 Correlation and regression (bivariate)
8.4.2.4 Multivariate analysis
8.4.3 Analysing samples
8.4.4 Report writing
8.5 Summary and conclusion
For example, we know how many people had heard of Clause 28 (Table 8.4.1.1.1) but the frequency table does not tell us anything about those who had, and those who had not, heard of the clause.
An important analytic technique in sociology is crosstabulation, which allows data to be broken down. Instead of a single frequency table for the entire sample, a crosstabulation breaks the frequency table down according to the values of a related variable.
For example, as part of the analysis of hypothesis 3 it is necessary to find out whether knowing about Clause 28 is related to the age of the respondent. Thus, construct the crosstabulation of the answer to question 1 'Had you heard about Clause 28?' (V1) by age (V30) (see Table 8.4.2.2.1) [A2.5].
Table 8.4.2.2.1 Crosstabulation V1 'Had you heard about Clause 28?' by V30 'Age in years'
Each cell shows: Count / Row % / Column %

| V1 | 16 or 17 (1) | 18 (2) | 19 or 20 (3) | Row total |
|---|---|---|---|---|
| Yes (1) | 12 / 24.0 / 22.2 | 23 / 46.0 / 36.5 | 15 / 30.0 / 46.9 | 50 (33.6%) |
| No (2) | 42 / 42.4 / 77.8 | 40 / 40.4 / 63.5 | 17 / 17.2 / 53.1 | 99 (66.4%) |
| Column total | 54 (36.2%) | 63 (42.3%) | 32 (21.5%) | 149 (100.0%) |
As there were not many sixteen-year-olds or twenty-year-olds in the sample, the sixteen- and seventeen-year-olds have been combined into one group (value 1), the eighteen-year-olds make up the second group (value 2) and the nineteen- and twenty-year-olds form a third group (value 3).
Each cell of the table has three figures in it. The top one is the count, the actual number of people who fall into that category. For example, 23 eighteen-year-olds had heard of Clause 28. The second figure in each cell is the row percentage, that is, the percentage of the total for the row that falls into the cell. So the 23 eighteen-year-olds who had heard of Clause 28 made up 46% of the 50 people who had heard of Clause 28. The third figure is the column percentage; this is the percentage of the total for the column that falls into the cell. So again, the 23 eighteen-year-olds who had heard of the clause constituted 36.5% of the total of 63 eighteen-year-olds in the sample. Note that the row and column percentages have quite different meanings.
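The three figures in each cell can be reproduced from the counts alone. The following Python sketch is an illustration only, not part of the original analysis: it takes the counts from Table 8.4.2.2.1 and derives the row and column percentages. The names `counts`, `row_percent` and `column_percent` are assumptions made for the example.

```python
# Counts from Table 8.4.2.2.1: rows are V1 (heard of Clause 28),
# columns are V30 (age group).
counts = {
    "Yes": {"16 or 17": 12, "18": 23, "19 or 20": 15},
    "No": {"16 or 17": 42, "18": 40, "19 or 20": 17},
}

# Row totals: everyone who gave the same answer to V1.
row_totals = {answer: sum(ages.values()) for answer, ages in counts.items()}

# Column totals: everyone in the same age group.
column_totals = {}
for ages in counts.values():
    for age, n in ages.items():
        column_totals[age] = column_totals.get(age, 0) + n


def row_percent(answer, age):
    """Percentage of the row total that falls into this cell."""
    return round(100 * counts[answer][age] / row_totals[answer], 1)


def column_percent(answer, age):
    """Percentage of the column total that falls into this cell."""
    return round(100 * counts[answer][age] / column_totals[age], 1)


# Count, row % and column % for eighteen-year-olds who answered 'Yes'.
print(counts["Yes"]["18"], row_percent("Yes", "18"), column_percent("Yes", "18"))
# prints: 23 46.0 36.5
```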
The easiest way to construct a crosstabulation by hand is as follows. Make a grid with large squares. You need as many squares across the page as there are values of the first variable and as many squares down the page as there are values of the second variable. Leave room for labels on the rows and columns. Draw a separate box and mark it ‘missing values’. Now look at the two columns of the data file that represent the variables to be crosstabulated. Go down the two columns row by row and, for each row, put a mark in the appropriate square of the grid. If a row has a missing value in either column, put a mark in the missing values box instead. At the end you should have as many marks as there are rows in the data file. Now count the marks in each square. This is the ‘count’ for each cell of the crosstabulation. Doing this by hand is slow and tedious, and a computer program that does it makes life much easier.
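The hand-tallying procedure translates almost directly into a short program. The Python sketch below is a minimal illustration: the ten `rows` are invented for the example (they are not taken from the case study data file), and `None` is assumed to stand for a missing value.

```python
from collections import Counter

# Hypothetical extract of a data file: one (V1, V30) pair per respondent.
# None marks a missing value. These rows are made up for illustration.
rows = [
    (1, 2), (2, 1), (2, 2), (1, 3), (None, 2),
    (2, 1), (1, 2), (2, None), (2, 3), (1, 1),
]

cell_counts = Counter()  # one tally per square of the grid
missing = 0              # the separate 'missing values' box

for v1, v30 in rows:
    if v1 is None or v30 is None:
        missing += 1                 # missing in either column
    else:
        cell_counts[(v1, v30)] += 1  # one mark in the matching square

# Check: there should be as many marks as rows in the data file.
assert sum(cell_counts.values()) + missing == len(rows)

for (v1, v30), n in sorted(cell_counts.items()):
    print(f"V1={v1} V30={v30}: {n}")
print("missing:", missing)
```

Each key of `cell_counts` corresponds to one square of the grid, and the final check mirrors the rule that the number of marks must equal the number of rows.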
Activity 8.4.2.2.1
Referring to Table 8.4.2.2.1:
1. How many people are there in the sample? Why do you think this is less than the total sample of 151?
2. What percentage of the sample were over the age of eighteen?
3. What percentage of over eighteen-year-olds had not heard of the clause?
4. How many people in total said they had not heard of the clause? What percentage of the valid sample is this?
5. What percentage of people aged eighteen had heard of the clause?
6. What percentage of those who had heard of the clause were eighteen?
7. Is there any relationship between age and awareness of the clause?
There were 23 eighteen-year-olds who said they had heard of Clause 28. It is important to note that the percentage of people who had heard of the clause who were eighteen (the row percentage = 46%) is not the same as the percentage of eighteen-year-olds who had heard of the clause (the column percentage = 36.5%). This is not a trick of the data. The two figures represent very different things.
The row per cent in this case says that, of all the people in the sample who had heard of the clause, 46% were eighteen years old. This figure can be compared to the percentage of people who had heard of the clause who were under eighteen years old (24%) and the percentage of those who had heard of the clause who were over eighteen years old (30%). The three figures total 100%.
The column per cent says that 36.5% of eighteen-year-olds had heard of the clause. Compare this to the 22.2% of under-eighteens and the 46.9% of over-eighteens who had heard of the clause. If we take the row percentages it appears that the eighteen-year-old group is more informed about the clause than any other group. Yet if we take the column percentages it is the over-eighteen group that has the highest percentage of people who had heard of the clause. How can we explain this and which figure do we use?
The eighteen-year-old group has the highest row percentage of 'Yes' responses because there are more eighteen-year-olds in the sample. In fact, there are about twice as many eighteen-year-olds (63) as over-eighteens (32). So it is not surprising that there are more of them in the 'Yes' row. To use the row percentages and claim that the eighteen-year-olds are more likely to have heard of the clause is misleading, as the higher percentage simply reflects the larger number of eighteen-year-olds in the sample.
The correct comparison is the column percentage, as this takes into account the different sample sizes for each age group. We can then conclude that the older the respondent the more likely they are to have heard of the clause.
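The effect of unequal group sizes can be checked numerically. The sketch below is an illustration using the counts in Table 8.4.2.2.1, with variable names assumed for the example. It compares each age group's share of the whole sample with its share of the 'Yes' row, and then gives the percentage aware within each group, which removes the effect of group size.

```python
# Counts from Table 8.4.2.2.1, by age group.
yes = {"16 or 17": 12, "18": 23, "19 or 20": 15}
no = {"16 or 17": 42, "18": 40, "19 or 20": 17}

group_size = {age: yes[age] + no[age] for age in yes}
sample_size = sum(group_size.values())
total_yes = sum(yes.values())

share_of_sample = {}  # how big each age group is within the sample
share_of_yes = {}     # row percentage for the 'Yes' row
aware_in_group = {}   # column percentage for the 'Yes' row

for age in yes:
    share_of_sample[age] = 100 * group_size[age] / sample_size
    share_of_yes[age] = 100 * yes[age] / total_yes
    aware_in_group[age] = 100 * yes[age] / group_size[age]
    print(f"{age:>8}: {share_of_sample[age]:.1f}% of sample, "
          f"{share_of_yes[age]:.1f}% of 'Yes' row, "
          f"{aware_in_group[age]:.1f}% of group aware")
```

The eighteen-year-olds supply the largest share of the 'Yes' row largely because they are the largest group in the sample; within each group, the percentage aware of the clause rises with age.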
It is not always the case that the column percentage is the correct one for comparison purposes. The decision whether to use the row or column percentage depends on which variable is the independent variable and which is the dependent variable. The dependent variable is the one that may be affected by the independent variable. In the example, the age is independent and the knowledge of the clause is dependent. After all, knowing about Clause 28 may be dependent on how old the respondent is, that is, the older the respondent the more likely she or he is to have come across the clause. There is no possibility that the respondent's age is dependent upon knowing about the clause!
The percentage you then use for comparison is the one calculated within each category of the independent variable. If the independent variable is the column variable then you use the column percentage, as in Table 8.4.2.2.1. If the independent variable is the row variable then you use the row percentage.
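This rule can be written as a small helper. The function below is a hypothetical sketch (the name `percent_within_independent` and its arguments are assumptions, not part of any statistics package): given a table of counts and the position of the independent variable, it returns the percentages to compare.

```python
def percent_within_independent(table, independent):
    """Return percentages computed within each category of the
    independent variable.

    table: dict mapping row label -> dict of column label -> count.
    independent: "row" or "column", the position of the independent variable.
    """
    if independent not in ("row", "column"):
        raise ValueError("independent must be 'row' or 'column'")

    result = {}
    if independent == "row":
        # Row percentages: each row sums to 100.
        for r, cols in table.items():
            total = sum(cols.values())
            result[r] = {c: round(100 * n / total, 1) for c, n in cols.items()}
    else:
        # Column percentages: each column sums to 100.
        col_totals = {}
        for cols in table.values():
            for c, n in cols.items():
                col_totals[c] = col_totals.get(c, 0) + n
        for r, cols in table.items():
            result[r] = {c: round(100 * n / col_totals[c], 1)
                         for c, n in cols.items()}
    return result


# Table 8.4.2.2.1: age (the independent variable) is the column variable.
table = {
    "Yes": {"16 or 17": 12, "18": 23, "19 or 20": 15},
    "No": {"16 or 17": 42, "18": 40, "19 or 20": 17},
}
pct = percent_within_independent(table, independent="column")
print(pct["Yes"])
```

If the same table were laid out with age down the side, passing `independent="row"` would give the same percentages.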
The crosstabulation, using the column percentages, shows that the proportion of the oldest group who had heard of the clause (46.9%) is more than twice that of the youngest group (22.2%). This suggests that there is a relationship between age and awareness of the clause, that is, that the two variables are associated. However, there may be other reasons why age and awareness seem to be associated and these would need to be investigated; see Section 8.4.2.1 on association and Section 8.4.2.4 on multivariate analysis.
Activity 8.4.2.2.2
Using the case study files:
Attitudes Towards Homosexuality CASE STUDY hypotheses
Attitudes Towards Homosexuality CASE STUDY questions
Attitudes Towards Homosexuality CASE STUDY data file
Attitudes Towards Homosexuality CASE STUDY coding frame
1. Construct a crosstabulation for the answer to question 1 (V1) by sex of the respondent (V31). Is there a larger percentage of women than men who have heard of the clause?
2. Construct the crosstabulation to find out whether more men than women agree with Clause 28. What conclusions can you draw about hypothesis 3?
3. Construct other suitable crosstabulations to analyse hypotheses 4 and 5.