Social Research Glossary
Citation reference: Harvey, L., 2012-24, Social Research Glossary, Quality Research International, http://www.qualityresearchinternational.com/socialresearch/
This is a dynamic glossary and the author would welcome any e-mail suggestions for additions or amendments.
_________________________________________________________________
Cluster analysis
Cluster analysis is a multivariate analysis technique used to identify one or more sub-groups (or clusters) of items within the larger pool of items included in the analysis.
Items should only be combined into a cluster if they (theoretically) measure approximately the same dimension or factor.
Ideally, clustered items should inter-correlate highly with each other and show low correlation with items outside the cluster.
In constructing clusters, the researcher must avoid simply clustering inter-correlated variables irrespective of what they measure.
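The correlation criterion described above can be checked numerically. The following is a minimal sketch, not a prescribed procedure: the item names (q1 to q4), the response values and the helper functions pearson and mean_correlations are all hypothetical and exist only for illustration. It compares the average correlation among items in a proposed cluster with their average correlation with items outside it.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def mean_correlations(items, cluster):
    """Return (mean r among cluster items, mean r between cluster and other items)."""
    inside = [name for name in items if name in cluster]
    outside = [name for name in items if name not in cluster]
    within = [pearson(items[a], items[b]) for a, b in combinations(inside, 2)]
    between = [pearson(items[a], items[b]) for a in inside for b in outside]
    return sum(within) / len(within), sum(between) / len(between)


# Hypothetical scores of six respondents on four questionnaire items.
items = {
    "q1": [1, 2, 3, 4, 5, 6],
    "q2": [2, 2, 3, 5, 5, 6],
    "q3": [1, 3, 3, 4, 6, 6],
    "q4": [4, 1, 5, 2, 6, 3],
}

# Proposed cluster: q1, q2 and q3; q4 is left outside.
within, between = mean_correlations(items, cluster={"q1", "q2", "q3"})
print(f"mean within-cluster r:  {within:.2f}")
print(f"mean outside-cluster r: {between:.2f}")
```

A high within-cluster value alongside a low outside-cluster value is only the statistical half of the criterion. Whether the items measure the same dimension remains a judgement the researcher has to make.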
According to Statsoft.com (undated):
The term cluster analysis (first used by Tryon, 1939) encompasses a number of different algorithms and methods for grouping objects of similar kind into respective categories. A general question facing researchers in many areas of inquiry is how to organize observed data into meaningful structures, that is, to develop taxonomies. In other words cluster analysis is an exploratory data analysis tool which aims at sorting different objects into groups in a way that the degree of association between two objects is maximal if they belong to the same group and minimal otherwise. Given the above, cluster analysis can be used to discover structures in data without providing an explanation/interpretation. In other words, cluster analysis simply discovers structures in data without explaining why they exist.
We deal with clustering in almost every aspect of daily life. For example, a group of diners sharing the same table in a restaurant may be regarded as a cluster of people. In food stores items of similar nature, such as different types of meat or vegetables are displayed in the same or nearby locations. There is a countless number of examples in which clustering plays an important role. For instance, biologists have to organize the different species of animals before a meaningful description of the differences between animals is possible. According to the modern system employed in biology, man belongs to the primates, the mammals, the amniotes, the vertebrates, and the animals. Note how in this classification, the higher the level of aggregation the less similar are the members in the respective class. Man has more in common with all other primates (e.g., apes) than it does with the more "distant" members of the mammals (e.g., dogs), etc. For a review of the general categories of cluster analysis methods, see Joining (Tree Clustering), Two-way Joining (Block Clustering), and k-Means Clustering. In short, whatever the nature of your business is, sooner or later you will run into a clustering problem of one form or another....
Note that the above discussions refer to clustering algorithms and do not mention anything about statistical significance testing. In fact, cluster analysis is not as much a typical statistical test as it is a "collection" of different algorithms that "put objects into clusters according to well defined similarity rules." The point here is that, unlike many other statistical procedures, cluster analysis methods are mostly used when we do not have any a priori hypotheses, but are still in the exploratory phase of our research. In a sense, cluster analysis finds the "most significant solution possible." Therefore, statistical significance testing is really not appropriate here, even in cases when p-levels are reported (as in k-means clustering).
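The quoted passage names k-means as one category of clustering method. As an illustration of how such an algorithm sorts objects into groups by a similarity rule, here is a minimal sketch of the standard iterative k-means procedure (often called Lloyd's algorithm). It is not Statsoft's implementation: the function name k_means, the data points and the choice of starting centroids are assumptions made for the example.

```python
from math import dist  # Euclidean distance, Python 3.8+


def k_means(points, centroids, max_iter=100):
    """Alternate between assigning points and updating centroids until stable."""
    centroids = [tuple(c) for c in centroids]
    groups = [[] for _ in centroids]
    for _ in range(max_iter):
        # Assignment step: each point joins the group of its nearest centroid.
        groups = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: dist(p, centroids[i]))
            groups[nearest].append(p)
        # Update step: move each centroid to the mean of its group
        # (an empty group keeps its previous centroid).
        new_centroids = [
            tuple(sum(coord) / len(g) for coord in zip(*g)) if g else c
            for g, c in zip(groups, centroids)
        ]
        if new_centroids == centroids:
            break  # assignments can no longer change
        centroids = new_centroids
    return centroids, groups


# Hypothetical two-dimensional observations forming two visible groups.
points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]

# Assumed starting centroids: one observation taken from each region.
centroids, groups = k_means(points, centroids=[points[0], points[3]])
for centre, members in zip(centroids, groups):
    print(centre, members)
```

The procedure returns a grouping whatever the data look like, which is consistent with the quoted point that cluster analysis discovers structure without explaining it and is not a significance test. The number of clusters and the starting centroids are choices made by the analyst.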
The term is also used in trial design. In a cluster randomised controlled trial, people are randomised in groups (clusters) rather than individually. Examples of clusters include schools, neighbourhoods or GP surgeries.
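The difference between randomising clusters and randomising individuals can be shown in a short sketch. Everything here is hypothetical: the school names, the pupil identifiers, the two arm labels and the function randomise_clusters are invented for illustration, and a simple equal split between arms is assumed.

```python
import random


def randomise_clusters(clusters, seed=None):
    """Allocate whole clusters to two arms; returns {cluster name: arm}."""
    rng = random.Random(seed)  # a seed makes the allocation reproducible
    shuffled = list(clusters)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    # First half of the shuffled clusters gets the intervention, the rest are controls.
    return {
        name: ("intervention" if position < half else "control")
        for position, name in enumerate(shuffled)
    }


# Hypothetical clusters: four schools, each with its own pupils.
schools = {
    "School A": ["A1", "A2", "A3"],
    "School B": ["B1", "B2"],
    "School C": ["C1", "C2", "C3"],
    "School D": ["D1", "D2"],
}

allocation = randomise_clusters(schools, seed=1)

# Each pupil inherits the arm of their school; no pupil is randomised individually.
pupil_arm = {
    pupil: allocation[school]
    for school, pupils in schools.items()
    for pupil in pupils
}
print(allocation)
```

Because allocation happens at school level, all pupils in the same school always end up in the same arm.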
See also
Researching the Real World Section 8