Social Research Glossary
Citation reference: Harvey, L., 2012-24, Social Research Glossary, Quality Research International, http://www.qualityresearchinternational.com/socialresearch/
Correlation
Correlation measures the extent to which two (or more) variables vary together.
Introduction
Correlation procedures attempt to measure the degree of association between variables: that is, the extent to which changes in the value of a dependent variable Y are matched by changes in an independent variable X, given an underlying relationship between X and Y (known as the regression equation).
This degree of association is measured by a statistic called the correlation coefficient, of which there are various types. Nearly all types of correlation coefficient vary between +1.0 and -1.0. A positive correlation indicates that the variables vary with each other in such a way that high scores on one tend to be associated with high scores on the other and low scores with low scores. Zero correlation means no relationship, while a negative correlation means that high scores on one variable tend to be associated with low scores on the other and vice versa.
A correlation of +1.0 is known as perfect positive correlation, and one of -1.0 is called perfect negative correlation. In both these cases, given the value of one variable, the value of the other can be predicted with absolute certainty. For values between +1 and -1 this is not the case: the closer the value is to zero, the less precise predictions made from one variable to the other become.
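As a quick illustration (not part of the original glossary), the sketch below uses NumPy, with invented values, to show a perfect positive, a perfect negative and a near-zero correlation.

```python
# Invented data illustrating the range of the correlation coefficient.
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
print(np.corrcoef(x,  2 * x + 3)[0, 1])   # +1.0: perfect positive correlation
print(np.corrcoef(x, -4 * x + 1)[0, 1])   # -1.0: perfect negative correlation
print(np.corrcoef(x, np.array([2.0, -1.0, 3.0, 0.5, 1.0]))[0, 1])  # close to zero
```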
The explanation above refers to the bi-variate situation (Y correlated with a single X). Correlation coefficients may also be computed in the multivariate situation where Y is correlated with a number of independent variables, X1, X2, X3 etc. When Y is related to all the independent variables together it is referred to as multiple correlation. When Y is related to one X taking into account the effect of some or all of the other independent variables it is referred to as partial correlation.
Correlation techniques are most appropriate for ratio or interval measurement scale data, where the ‘best’ measure is Pearson’s Product Moment Correlation Coefficient, r. There are also techniques for computing correlation coefficients for ordinal scale data; the ‘best’ measures are Spearman’s Correlation Coefficient and Kendall’s Rank Order Correlation Coefficient. Correlation coefficients cannot be computed for nominal categories, except in the special case of dichotomous data, where Pearson’s or Kendall’s coefficients are suitable.
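The coefficients named above can be computed with standard statistical software. The following is a minimal sketch using SciPy; the variables and values are invented purely for illustration.

```python
# Sketch: Pearson's r, Spearman's rho and Kendall's tau for two invented variables.
import numpy as np
from scipy import stats

x = np.array([2, 4, 5, 7, 9, 10, 12])        # illustrative interval-scale variable
y = np.array([15, 22, 24, 30, 33, 38, 41])   # illustrative interval-scale variable

r, p_r = stats.pearsonr(x, y)        # Pearson's product moment r (interval/ratio data)
rho, p_rho = stats.spearmanr(x, y)   # Spearman's rank correlation (ordinal data)
tau, p_tau = stats.kendalltau(x, y)  # Kendall's rank correlation (ordinal data)

print(f"Pearson r    = {r:.3f}")
print(f"Spearman rho = {rho:.3f}")
print(f"Kendall tau  = {tau:.3f}")
```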
Correlation does not necessarily indicate causation and certainly does not prove the existence of causal relationships.
See DATA ANALYSIS: A BRIEF INTRODUCTION Section 13
Dependent and Independent variables
Associated variables are either dependent or independent.
Correlation measures the relationship between variables. This is usually done with a view to suggesting causal relationships.
The variable to be explained or predicted is known as the dependent variable.
The variables doing the explaining are known as explanatory, causal, or independent variables.
In experimental research, the independent variables are the ones systematically varied by the experimenter, and the dependent variable is the one not under the experimenter’s control but which is expected to be affected by the variables being manipulated.
Covariation
Covariation is a general term meaning the extent to which the scores on two variables go together. Covariance and the various correlation coefficients are indicators of covariation.
Covariance indicates the extent to which values on two variables vary together (i.e. high values on one go with high values on the other and low values with low values).
It is not, however, limited to values between +1 and -1. It takes a value of zero where there is no linear relationship, and takes positive or negative values where the data show evidence of a positive or negative linear trend. The size of the covariance increases or decreases as the strength of the linear relationship increases or decreases.
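A small sketch, again with invented data and assuming NumPy, contrasting the unbounded covariance with the standardised correlation coefficient:

```python
# Covariance is not bounded by +/-1, while the correlation coefficient is.
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = 10 * x + np.array([0.3, -0.2, 0.1, -0.4, 0.2])  # roughly linear in x

cov_xy = np.cov(x, y)[0, 1]      # sample covariance: large, reflecting the scale of y
r_xy = np.corrcoef(x, y)[0, 1]   # correlation: standardised, stays between -1 and +1

print(cov_xy, r_xy)
```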
Concomitant variation is the parallel variation of two or more variables without there necessarily being a direct causal relationship between them.
The coefficient of determination is the square of the correlation coefficient. It provides a measure of the extent to which the independent variable(s) in a regression equation explain the variation in the dependent variable. This applies to bi-variate and multivariate analysis. For example, a correlation coefficient of 0.6 gives a coefficient of determination of 0.6 squared = 0.36; that is, 36% of the variation in Y is explained by X (or the combination of Xs). The remaining 64% is due to random variation (which may ‘hide’ other explanatory variables).
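A minimal arithmetic sketch of the worked figures above:

```python
# Coefficient of determination from a correlation coefficient of 0.6.
r = 0.6
r_squared = r ** 2            # 0.36: 36% of the variation in Y is 'explained'
unexplained = 1 - r_squared   # 0.64: the remaining 64%
print(r_squared, unexplained)
```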
A multiple correlation coefficient is a measure of the relationship of a dependent variable with all the independent variables, i.e., Y with the combined Xs.
It is an indicator of the extent to which the multiple regression equation fits the data. As in the bi-variate case, the multiple correlation coefficient varies between 0 and 1; the nearer to 1, the stronger the degree of association.
The multiple correlation coefficient (usually Pearson’s coefficient, R) allows us to specify the relationship between Y and the associated Xs. The normal procedure is to identify likely factors that have a bearing on Y and (on the basis of available data) proceed in what is called a stepwise analysis. Roughly, this amounts to taking the X with the largest coefficient in the multiple regression equation and computing the correlation between that X and Y; then taking the X with the next highest coefficient and working out the multiple correlation coefficient of Y with both Xs; then taking the next highest X and adding that into the analysis, and so on, at each stage generating a new value of the multiple correlation coefficient.
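The following is a rough sketch of that stepwise idea, assuming NumPy and using an illustrative helper function (multiple_R) and simulated data that are not part of the original glossary. The multiple correlation coefficient is computed as the square root of the proportion of variation in Y explained at each step.

```python
# Sketch: recompute the multiple correlation coefficient R as Xs are added in turn.
import numpy as np

def multiple_R(Y, *Xs):
    """Multiple correlation of Y with the given Xs (square root of R-squared)."""
    X = np.column_stack([np.ones_like(Y)] + list(Xs))   # design matrix with intercept
    coeffs, *_ = np.linalg.lstsq(X, Y, rcond=None)       # least-squares regression
    Y_hat = X @ coeffs
    ss_res = np.sum((Y - Y_hat) ** 2)
    ss_tot = np.sum((Y - Y.mean()) ** 2)
    return np.sqrt(1 - ss_res / ss_tot)

rng = np.random.default_rng(0)
X1 = rng.normal(size=100)
X2 = rng.normal(size=100)
X3 = rng.normal(size=100)
Y = 2 * X1 + 1 * X2 + 0.2 * X3 + rng.normal(size=100)   # simulated dependent variable

# Start with the X carrying the largest coefficient, then add the others in turn.
print(multiple_R(Y, X1))          # correlation of Y with X1 alone
print(multiple_R(Y, X1, X2))      # multiple R of Y with X1 and X2
print(multiple_R(Y, X1, X2, X3))  # multiple R of Y with all three Xs
```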
Partial correlation is a procedure that measures the effects of one X on Y, while controlling for the effects of all the other independent variables (known as the control variables).
Partial correlation usually assumes an underlying linear relationship between all the variables. Once the linear relationship between the independent, the dependent and the control variables is known, it is possible to remove the effect of the control variables. This is done by predicting the values of the independent and dependent variable (separately) from the control variable(s) (on the basis of the correlation between the control variable(s) and X and the control variable(s) and Y); the partial correlation coefficient is then the correlation between what remains of X and Y once these predicted components have been removed.
Partial correlation is a useful tool in clarifying the relationships between three or more variables. While the multiple correlation coefficient gives an overall picture of the relationship between Y and the combined Xs, the partial correlation coefficients provide the basis for examining the relationship between each X and Y when all the influence of the other factors is removed.
For example, income might be seen to be a function of education, age and gender. It would be possible to assess the extent of this relationship (assuming that education can be measured) by computing the multiple correlation coefficient. This would provide an overall picture. The partial correlation coefficients show the relationship of any one of the independent variables with Y, whilst controlling for the effect of the others. So it would be possible to see what effect educational attainment score had on Y whilst controlling for age and gender. (If educational attainment = X1, age = X2 and gender = X3, then this would be the partial correlation of X1 on Y, controlling for X2 and X3).
This procedure is different from just calculating the bi-variate relationship of X1 on Y because the partial coefficient takes into account the effect of X2 and X3.
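A minimal sketch of this procedure, assuming NumPy; the income, education, age and gender labels follow the example above, but the data are simulated and the residuals helper is introduced purely for illustration.

```python
# Sketch: partial correlation by removing the effect of the control variables
# from both X1 and Y (via regression) and correlating what remains.
import numpy as np

def residuals(v, controls):
    """Residuals of v after a least-squares fit on the control variables."""
    C = np.column_stack([np.ones_like(v)] + list(controls))
    coeffs, *_ = np.linalg.lstsq(C, v, rcond=None)
    return v - C @ coeffs

rng = np.random.default_rng(1)
age = rng.uniform(20, 60, size=200)                              # X2
gender = rng.integers(0, 2, size=200).astype(float)              # X3 (dichotomous)
education = 0.2 * age + rng.normal(size=200)                     # X1, partly driven by age
income = 1.5 * education + 0.5 * age + 3 * gender + rng.normal(size=200)  # Y

# Partial correlation of education (X1) with income (Y), controlling for age and gender.
e_x = residuals(education, [age, gender])
e_y = residuals(income, [age, gender])
partial_r = np.corrcoef(e_x, e_y)[0, 1]
print(partial_r)
```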
Partial correlation analysis is useful in multivariate analysis in clarifying relationships. It can indicate spurious relationships by uncovering antecedent or intervening variables.
Colorado State University (1993–2016) defines correlation as:
1) A common statistical analysis, usually abbreviated as r, that measures the degree of relationship between pairs of interval variables in a sample. The range of correlation is from -1.00 to zero to +1.00. 2) A non-cause and effect relationship between two variables.
Elwell's Glossary of Sociology (undated) defines correlation as:
The relationship between two variables in which they vary together--say a correlation between the income of parents and reading ability among primary school children. Statistical correlation can vary from -1 to 1 (a 0 indicates no correlation between the variables). A positive correlation between two variables exists where a high score on one is associated with a high score on the other. A negative correlation is where a high score on one variable is associated with a low score on the other.
Schaefer (2017) defines correlation as:
A relationship between two variables whereby a change in one coincides with a change in the other.
See also
DATA ANALYSIS: A BRIEF INTRODUCTION Section 13 (Downloads as .pdf)
Schaefer, R. T., 2017, 'Glossary' in Sociology: A brief introduction, Fourth Edition, originally c. 2000, McGraw-Hill. Available at http://novellaqalive.mhhe.com/sites/0072435569/student_view0/glossary.html, site dated 2017, accessed 11 June 2017, page 'not found' 1 June 2019.