A Guide to Methodology

8. Surveys


8.3.7 Designing the research instrument
Social surveys usually involve either formal interviews or questionnaires.

Formal interviews are a structured dialogue between two people, the interviewer and the respondent. The interviewer reads questions from an interview schedule in a predetermined order and records the respondent's answers.

Questionnaires do not involve interviewers; respondents are asked to complete questions by themselves, although in some cases respondents do have the option of having the questions read to them and the answers written in by the researcher. For example, in their study of violence against prostitutes, Farley and Barkan (1998) 'offered to read the questions and write in the answers for those who appeared hesitant to write or who had difficulty reading'.

Whether a questionnaire or schedule is used, the basic principles for designing the set of questions are much the same. It is important to ask questions in a logical sequence so that the respondent is led easily from one point to the next. It is also a good idea to start with questions that are easy to answer and generally interesting, leaving the more difficult ones until later.

It is usually advisable to leave until the last 'personal' questions (usually called classificatory questions) such as age, occupation, marital status, education, income, ethnicity and so on. In some circumstances, the survey only requires answers from some types of people and so some of the classificatory questions need to be asked at the beginning.

When designing questions you need to consider both the content and the style.

Question content
Writing questions involves imagination, although three things determine the content.

First, the hypotheses you are testing. Make sure your questions relate to your hypotheses and are such that you are able to test your hypotheses on the basis of the answers you receive. Avoid including 'interesting' questions that do not relate to your hypotheses.

Second, the questions will also be circumscribed by the operationalisation of the concepts that are being investigated. It is possible to have a large number of indicators for each dimension of a concept, especially if it is a complex concept such as alienation, culture, discrimination, violence or deprivation. This may lead to a very large questionnaire. A pragmatic decision may be needed to reduce the number of indicators and thus the associated questions.

For example, Peter Townsend et al. (1987) (CASE STUDY Operationalising Poverty) had 94 variables for deprivation. This is rather a lot and would be too many in most research settings. Instead of Townsend's five indicators for dietary deprivation, for instance, one or two could be selected to represent them all. Reducing the number of indicators should not be done in an arbitrary way but should be the result of careful theoretical consideration, although the selection is made easier if one adopts the notion of the interchangeability of indicators (see Section

Third, 'lead-in' questions are often included to help the flow of the questionnaire or schedule. While this makes the survey more user-friendly, it is important not to overload the survey schedule or questionnaire with lead-in questions: just include those that help the questionnaire move from one subject to another.

Question style
The style of the questions also depends on circumstances but there are some general guidelines to avoid the most frequent errors.

Meaningful questions
The questions should make sense to the informant and be possible for the informant to answer. This can be done by using words that will be familiar to the sampled population.

Do not use technical terms or slang if they are not familiar to the respondents. For example:

Do you think that your socialisation has affected your gender role?

The terms 'gender role' and 'socialisation' are well known to sociologists but not necessarily to the public at large. On the other hand, do not be afraid of using technical terms if they are appropriate, such as medical jargon when surveying a sample of doctors.

Non-ambiguous questions
Ambiguity should also be avoided. For example:

When did you leave school?

is a very vague question and could be interpreted differently by different respondents. It would be much better to ask something like:

At what age did you complete your school education?

If different informants read the meaning of the questions differently then you will be unable to compare their answers.

Another example is from the Sheffield Lib Dems' Crime Survey, February 2007:

10. Some people favour the introduction of ID cards to fight crime. Others think ID cards wouldn't be effective. The £18billion it may cost could be better spent on 10,000 more police on our streets. Do you think they're a good idea? Yes/No/Don't Know.

The “they're” could be police or ID cards (or even streets!). This is also a leading question (see below).

Single questions
Make sure that each question asks only one thing at a time. Do not combine two questions in one. For example:

Did you know that it is possible to work part-time up to 10 hours and to claim an attendance allowance? Yes/No

Respondents could quite easily answer 'yes' to the first part of the question and 'no' to the second part but be unable to indicate their different answers.

Questions should be short and clear rather than complex. This can sometimes be done by using a series of questions, for example:

Has it happened to you that over a long time, when you neither practised abstinence, nor used birth control, you did not conceive? Yes/No (from The Family Limitation Survey, 1949)

This question asks so much at once and contains so many negatives, that it is initially impossible to know whether to answer 'yes' or 'no'. The information that this question is trying to find out could be collected by using a sequence of questions such as:

Have you conceived in the last 'x' months? Yes/No
Did you practise abstinence? Yes/No
Did you use birth control methods? Yes/No

If the informant can make a wide range of possible responses to a question it may be easier for the respondent to ignore the question altogether rather than try to sort it out.

Leading questions
A common error in questionnaire design is the use of leading questions that may direct the informant to a response that they would not normally have given. This can bias the results.

Leading questions arise in several ways. First, by providing a restricted set of answers to choose from, which exclude other possible answers.

Second, by using a leading structure to the question, such as 'You don't think do you...?' (which leads to a negative answer) or 'Shouldn't something be done about...?' (which leads to a positive answer) or by providing a biased set of precoded answers. For example,

How has the service been? extremely good/good/quite good/average/poor

Third, a leading sequence of questions can also lead the respondent to answer in a way desired by the researcher, for example:

Did you know that privatisation of water cost millions of pounds to the tax payer? Yes/No
Should the privatisation of water be stopped? Yes/No

Another example is also from the Sheffield Lib Dems' Crime Survey, February 2007:

9. Tony Blair's Labour government spent nearly £6 million pounds on plans to merge the UK's police forces. This plan has now been shelved and local police forces are asking for this money back from the government. Do you agree that the government should put this money back into policing? Yes/no/don't know

Summary of question style
Great care must be taken with question design to avoid ambiguity, asking more than one question at once and leading the respondents.

Care must also be taken when attempting to design questions that involve the respondents recalling information from their past as this can often be inaccurate.

Some respondents will agree with what they perceive as the researcher's opinion, especially if they have not got much interest in the survey. Try to avoid giving clues, in questions, that might enable the respondent to do this.

In summary, the problems associated with question design are as follows:

1. Using technical and undefined terms.
2. Leading questions and sequences.
3. Ambiguous or vague questions.
4. Presumptuous questions.
5. Complex questions.
6. Multiple questions.
7. Uneven and overlapping categories in pre-coded answers.
8. Restricted range of categories in pre-coded answers.

Self-completion questionnaires
The points already discussed apply to both self-completion questionnaires and to interviewer schedules. However, for self-completion questionnaires there are special problems resulting from the absence of an interviewer. The first few questions on the questionnaire must capture the interest of the respondent to motivate her or him to participate in the survey.

The researcher must provide clear guidance and instructions to respondents about how to record their answers. For example, should they write out their answers in full, or tick a suitable box? Should only one box be ticked per question or can the respondent tick as many as are relevant? Researchers must accept that without an interviewer to probe or follow up leads, the information gleaned will be in less depth or detail than an interview schedule and will also be more suited to closed questions.

In the case of self-completion questionnaires it is not possible to guarantee the order in which the respondent will read the questions. This can be overcome to some extent by taking great care in the design so that the answers to a question are not influenced by questions that come later but which the respondent may read in advance. This is more of an issue with paper questionnaires. Online questionnaires that prevent later questions being seen before preceding ones are answered do not encounter this problem, unless there is a facility for respondents to go back and change earlier responses.

Closed and open questions
Questions on schedules and questionnaires may be either closed or open. CASE STUDY Attitudes Towards Homosexuality (Todd, 1990) will be used to illustrate the points below in a paper questionnaire format.

Closed questions are designed with a set of alternative answers. These are known as pre-coded answers. Recording pre-coded answers is easy. Where they occur on questionnaires the respondent normally just has to mark an appropriate box or code (tick or place a cross in a box, or circle an answer). (See CASE STUDY Attitudes Towards Homosexuality.)

When they occur on an interview schedule the interviewer just has to indicate the appropriate code based on the response from the interviewee.

Sometimes pre-coded questions permit only one answer (for example, questions 1 to 8 in the CASE STUDY Attitudes Towards Homosexuality). Sometimes the respondent can indicate more than one answer (for example, question 9).

In an interview, the list of alternative answers may have to be read out by the interviewer or they may be typed on to a card and handed to the respondent to read.

If the pre-coded categories are not known to the respondent (that is, are just on the interviewer's schedule) then the interviewer has to fit the response given by the interviewee to one of the pre-coded categories.

Pre-coding questions in advance enables the data to be analysed rapidly; indeed in some cases the descriptive statistics (such as percentage agreement with a statement) for each answer can be generated instantly when the survey is delivered online.
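As a rough illustration of why pre-coded answers can be summarised so quickly, the sketch below turns a list of coded responses into a percentage for each answer. The code frame, the function name percentage_distribution and the response data are all invented for the example; they are not taken from any of the case studies.

```python
from collections import Counter

# Hypothetical code frame for one closed question (labels are illustrative).
CODE_LABELS = {1: "Agree", 2: "Disagree", 3: "Don't know"}


def percentage_distribution(codes, labels):
    """Return {label: percentage of respondents giving that answer}."""
    counts = Counter(codes)
    total = len(codes)
    if total == 0:
        # No responses yet: report zero for every category.
        return {label: 0.0 for label in labels.values()}
    return {
        label: round(100 * counts.get(code, 0) / total, 1)
        for code, label in labels.items()
    }


# Made-up coded responses from eight respondents.
responses = [1, 1, 2, 3, 1, 2, 1, 1]
summary = percentage_distribution(responses, CODE_LABELS)
print(summary)
```

Because every answer is already a code, no reading or interpretation is needed before counting, which is what makes instant summaries possible.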

The problem with pre-coded questions is that the list of alternatives is restricted to the researcher's preconceived ideas about what is important.

It is normal to have a code such as 'None of these' to capture any response that does not match any of the precoded categories.

Open questions do not have any pre-coded answers. Thus the respondent's answer has to be written down in full, either by an interviewer recording the reply verbatim on a schedule or the respondent writing down their answers on a questionnaire (CASE STUDY Attitudes Towards Homosexuality does not have any open questions but invites further comments, which would need to be analysed).

Open questions thus require more work from the respondent or interviewer than closed questions. In practice, in an interview situation interviewers may not be able to record every word because they cannot ask the respondent to slow down while they write the answer as this may upset the flow of the interview.

The researcher also has to decide how to deal with open questions. There are two options: to 'post-code' the answers or to treat the data qualitatively. Post-coding involves reading through the answers and deciding on a set of categories into which the answers can be put. This might be a simple set of categories such as 'broadly in favour' and 'broadly opposed', or a more detailed list of alternatives that arise from the answers given. The alternative is to deal with the responses in a qualitative way, similar to in-depth interview answers (see Section 4.5). This option is used when the answers cannot be fitted into convenient categories or when more detailed case material is needed than can be provided from statistical summaries (see Analysis (Section 8.3.12)).
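Post-coding is a matter of the researcher's judgement, but once a set of categories has been decided a first pass can sometimes be mechanised. The sketch below is one possible approach, assuming a simple keyword match; the keywords, the code numbers and the function name post_code are hypothetical, and any answer that does not fit exactly one category is set aside for reading by the researcher or for qualitative treatment.

```python
# Hypothetical post-coding frame: code -> (label, keywords that suggest it).
# The keywords are illustrative; in practice they come from reading the answers.
POST_CODES = {
    1: ("Broadly in favour", ("support", "good idea", "in favour")),
    2: ("Broadly opposed", ("against", "bad idea", "oppose")),
}
UNCODED = 9  # fits no single category; needs manual or qualitative treatment


def post_code(answer):
    """Assign an open answer to a category code using keyword matching."""
    text = answer.lower()
    matches = [
        code
        for code, (_label, keywords) in POST_CODES.items()
        if any(keyword in text for keyword in keywords)
    ]
    # Accept only an unambiguous match; everything else is left uncoded.
    return matches[0] if len(matches) == 1 else UNCODED


# Made-up verbatim answers.
answers = [
    "I think it is a good idea on the whole.",
    "I am against it.",
    "It depends on who pays for it.",
]
coded = [post_code(a) for a in answers]
print(coded)
```

A keyword pass of this kind is crude and should be checked against the verbatim answers; it does not replace reading them.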

Activity 8.3.7
Critique these example questions and their precoded answers (NOTE: These questions are in no way connected with each other, that is, they do not form a complete questionnaire)
1. How old are you?
Under 20 18–21 21–33 33–50 50–65 65 and over

2. How much do you earn?
Over £20000/Under £20000

3. How much do you drink per day?
Less than 1 unit/5–10 units/More than 10 units

4. Do you spend a lot of time watching television?

5. Do you feel influenced by your peer group?

6. 'Britain should scrap nuclear weapons.'
Very strongly agree/Strongly agree/Agree/Disagree

Suggested answers are given at the end of this section.

Layout
Designing the questions is only part of the process of constructing a schedule or questionnaire. The questions then have to be put in the right order and then laid out on the paper or computer screen to make a usable questionnaire or schedule.

Taking time preparing a proper layout is important for several reasons. First, it makes it easier to extract the data for analysis later. Second, it makes it easier for respondents or interviewers to use. Third, a good layout makes the research look more 'professional' and thus it gets taken more seriously by both interviewers and respondents.

Questionnaires and schedules should be typed without any errors. Instructions to respondents on questionnaires or interviewers on interview schedules should be clear and unambiguous.

It is also important to make a clear distinction between the questions, the variables and the values used to denote the answers.

Number each question in order; this is the question number. A question may refer to one or more variables depending on the type of question. Each variable should also be clearly numbered on an interview schedule.

On a questionnaire, to avoid confusion for the respondent, questions should be numbered but the variables should be identified on a separate coding sheet. The coding sheet is necessary when variables are not included on a questionnaire, whether it is a paper questionnaire or an online one. Ultimately, the coding frame will be built into the online questionnaire but a coding template is necessary to ensure that the online version is structured correctly (see the coding sheet in CASE STUDY Attitudes Towards Homosexuality). See Section 8.3.11 for more detail on coding data.

It is important that the focus is on variables rather than questions, as questions can include several variables (for example, question 10 in CASE STUDY Attitudes Towards Homosexuality).
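To illustrate how one question can give rise to several variables, the sketch below takes a 'tick as many as apply' question and records one variable per answer option. The question number, the options and the function name to_variables are invented for the example, and coding a ticked box as 1 and an unticked box as 2 is an assumption rather than a fixed rule.

```python
# Hypothetical multiple-response question: "Where do you get your news?"
QUESTION_12_OPTIONS = ["newspaper", "television", "radio", "internet"]


def to_variables(question_number, options, ticked):
    """Return one coded variable per option (1 = ticked, 2 = not ticked)."""
    unknown = set(ticked) - set(options)
    if unknown:
        raise ValueError(f"Options not on the questionnaire: {sorted(unknown)}")
    return {
        f"q{question_number}_{option}": (1 if option in ticked else 2)
        for option in options
    }


# One respondent ticked two of the four boxes.
row = to_variables(12, QUESTION_12_OPTIONS, {"television", "internet"})
print(row)
```

A single question here produces four variables, which is why the coding sheet is organised by variable rather than by question.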

For each variable, there are at least two possible answers (otherwise it would not be a variable); these are known as the values for each variable. So a question such as:

Do you have a current driving licence? Yes/No

consists of a variable 'driving licence' and two values, 'yes' and 'no'. When coding questions it is advisable to use numbers to represent the answers. So, for example, use '1' to represent 'yes' and '2' to represent 'no', in the question above. It is advisable to use '0' only to represent a 'not applicable' or 'missing' answer. There is a tendency to code 'no' answers as '0' but this can lead to confusion as missing answers and 'no' answers end up being coded in the same way. So, use non-zero numbers to code actual answers, including 'don't know' answers.
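The coding advice above can be sketched as follows for the driving licence question. The names MISSING, DRIVING_LICENCE_CODES and code_answer are hypothetical, as is the choice of 3 for a 'don't know' answer; the point is only that 0 is kept for missing or not applicable answers and every actual answer gets a non-zero code.

```python
MISSING = 0  # reserved for 'not applicable' or missing answers only

# Every actual answer, including "don't know", gets a non-zero code.
DRIVING_LICENCE_CODES = {"yes": 1, "no": 2, "don't know": 3}


def code_answer(raw, codes):
    """Convert a raw answer to its numeric code; blank answers become MISSING."""
    if raw is None or not raw.strip():
        return MISSING
    key = raw.strip().lower()
    if key not in codes:
        raise ValueError(f"Unrecognised answer: {raw!r}")
    return codes[key]


# Made-up raw answers, including two that were left blank.
raw_answers = ["Yes", "no", None, "Don't know", ""]
coded_answers = [code_answer(a, DRIVING_LICENCE_CODES) for a in raw_answers]
print(coded_answers)
```

With this scheme a 'no' (2) can never be confused with a question that was skipped (0).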

Design a set of questions that will form the basis of an interview schedule, to test your hypotheses (from Activity 8.3.1). Use these questions to prepare an interview schedule.

Suggested answers to Student Activity 8.3.7
Question 1: The categories are not even. For example, there are 3 years between 18–21, 12 between 21–33. The categories must reflect what you are trying to find out. The categories overlap, thus a person could be in two categories. For example, if you were 21 years old you could tick either the 18–21 box or the 21–33 box.
Question 2: People find this a sensitive issue and do not usually like to say how much they earn. The question is also ambiguous: it is not clear whether hourly, weekly, monthly or annual earnings are required, whether the figure is gross or net of tax, or whether it includes all income or just that from paid work (or just from primary employment). Unless you are trying to identify very low earners, the 'over £20000' category includes a very large range of possibilities and you might want more, non-overlapping, categories.
Question 3: The question doesn't specify alcohol, which is presumably what the question is referring to. Categories are too restrictive as they do not allow for people to answer that they do not drink. Therefore, a 'none' category is required. The categories are too large. There is a lot of difference between drinking 5 units and 10 units per day. Using technical terms like units can cause confusion as not everybody knows what they mean.
Question 4: This question is too vague. The term 'a lot' needs to be defined. For example: Do you spend 1 hour per day? Do you spend 2–4 hours per day? And so on.
Question 5: Using technical or specialised terms can cause problems. Respondents may not know what you mean by 'peer group'. People may not agree about what is meant by the term 'influenced'.
Question 6: This is a leading question because, although respondents are able to agree or disagree, there are three agreement categories and only one disagreement category. This implies that 'agreement' is the suitable answer. How much difference is there between 'very strongly' and 'strongly' agree? This may be an unnecessary set of distinctions. When analysing this question, what benefit is there in distinguishing between these different forms of agreement? The term 'scrap' also has connotations of 'rubbish' and this may further influence uncommitted respondents to agree with the proposition.
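The overlap problem noted for Question 1 can also be checked mechanically before a questionnaire is piloted. The sketch below is a minimal example, assuming each pre-coded category is written as an inclusive range of whole numbers; the function name check_bands, the upper age limit of 120 and the whole-unit treatment of Question 3 are assumptions made for illustration.

```python
def check_bands(bands):
    """Return (overlaps, gaps) for categories given as inclusive (low, high) pairs."""
    if not bands:
        return [], []
    ordered = sorted(bands)
    overlaps, gaps = [], []
    covered_to = ordered[0][1]  # highest value covered so far
    for low, high in ordered[1:]:
        if low <= covered_to:
            # Values that fall into more than one category.
            overlaps.append((low, min(high, covered_to)))
        elif low > covered_to + 1:
            # Values that fall into no category at all.
            gaps.append((covered_to + 1, low - 1))
        covered_to = max(covered_to, high)
    return overlaps, gaps


# Question 1 as printed: Under 20, 18-21, 21-33, 33-50, 50-65, 65 and over.
question_1 = [(0, 19), (18, 21), (21, 33), (33, 50), (50, 65), (65, 120)]
q1_overlaps, q1_gaps = check_bands(question_1)

# Question 3 in whole units: less than 1, 5-10, more than 10.
question_3 = [(0, 0), (5, 10), (11, 99)]
q3_overlaps, q3_gaps = check_bands(question_3)

print(q1_overlaps, q1_gaps)
print(q3_overlaps, q3_gaps)
```

For Question 1 the check reports the ages that could be ticked in two boxes (18, 19, 21, 33, 50 and 65). For Question 3 it reports that someone drinking between 1 and 4 units has no box to tick.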