





© Lee Harvey 2012–2019

Page updated 25 June, 2019

Citation reference: Harvey, L., 2012–2019, Researching the Real World, available at
All rights belong to author.


A Guide to Methodology

CASE STUDY Twitter analysis

In their study, 'Big data: methodological challenges and approaches for sociological analysis', Tinati et al. (2014) developed a methodological tool for use with Twitter accounts.

Twitter, established in 2006, allows individuals to send 140-character messages (called 'tweets'). These are immediately visible to the sender's 'followers' and to anyone searching the Twitter website. As of 2017, half a billion tweets were sent per day (Internet Live Statistics, 2017). Its attraction for social researchers is that the data are easily accessible and communication networks can be identified. Unlike some other social media websites, Twitter does not require a reciprocal connection between users: when a person tweets, the message goes to all of that person's 'followers' and is also accessible to non-followers. Tweets can be passed on ('retweets'), and 'hashtags' within the message may be used to group discussions around specific topics (e.g. #englandmanager).

The authors write:

Our tool provides a dynamic visualisation of the Twitter information flows and social networks that emerge over time. Its development was driven by the following underlying principles. First, begin with the network. If we are interested in the actors and outcomes that are produced in the ongoing flow of information, we need tools that can explore how these emerge within the network, rather than imposing a priori assumptions about who or what is important, or using sampling frames from beyond the network to make the data manageable. Second, we must capture the dynamic flow of tweets, to explore the network as it grows. Third, we must overcome methodological polarisation between macro and micro analysis: between large-scale metrics—which measure the structures and patterns of Big Data—and analysis of micro-level interactions—the communications of individuals... allowing the combination of technical capabilities with in-depth qualitative research methods. (Tinati et al., 2014, p. 667)

They explain that their tool adopts elements of social network analysis, such as counting nodes (number of users), edges (communication between one user and another), in-degree (number of tweets or retweets directed at an individual user) and out-degree (number of mentions or retweets that user makes of other users). In addition, the tool enables them to:

(1) examine the dynamic properties of Twitter networks, incorporating an adaptable graphical user interface to visualise this; (2) develop associated metrics to measure the flow of information at scale and over time; and 'zoom in' to examine the content of conversations and communications between individuals.... [The] tool filters the data stream following the primary principle: that is, starting with the network itself, drawing on user-generated hashtags. Hashtags are produced to link a tweet to a particular topic, effectively a 'bottom-up' curation of tweets around a particular topic into a single stream of data. Second, the tool uses an algorithmic filtering solution to reduce the volume of data based on the characteristics that individuals display within the network: the number of times they have tweeted, the number of times they have retweeted or been retweeted, their connectivity within the network and the role that they play in the diffusion of information. (Tinati et al., 2014, p. 668)
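The social network measures named above (nodes, edges, in-degree and out-degree) can be illustrated with a minimal Python sketch. This is not the authors' tool: the user names and records are hypothetical, and the sketch assumes that each retweet or mention has already been reduced to a (source user, target user) pair.

```python
from collections import Counter

# Hypothetical records: (source, target) means `source` retweeted or mentioned `target`.
interactions = [
    ("alice", "bob"),
    ("carol", "bob"),
    ("bob", "dave"),
    ("alice", "dave"),
    ("carol", "bob"),
]


def network_metrics(pairs):
    """Return node count, distinct edge count, and in/out-degree counters."""
    nodes = {user for pair in pairs for user in pair}
    edges = set(pairs)  # distinct directed user-to-user links
    in_degree = Counter(target for _, target in pairs)   # interactions received
    out_degree = Counter(source for source, _ in pairs)  # interactions made
    return len(nodes), len(edges), in_degree, out_degree


n_nodes, n_edges, in_degree, out_degree = network_metrics(interactions)
print(n_nodes, n_edges, in_degree["bob"], out_degree["alice"])
```

Here in-degree and out-degree count every interaction, so a user who is retweeted twice by the same person gains two; counting distinct partners instead would be an equally defensible choice.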

Retweets play an important role in this process because they allow the researcher to follow the information flow. Observing individual tweets provides no way of knowing whether they have been read and their content passed on to anyone. Retweets also offer a way to observe 'which information and which actors become important as the network evolves: what the network produces, rather than using the network as a data source to observe actors or tweets selected in advance' (Tinati et al., 2014, p. 669).

They demonstrate the method through an analysis of the #feesprotest Twitter network that drew together tweets around the rise of student fees and a protest that took place in November 2011. There were a total of 12,831 tweets made by 4,737 Twitter users from 8 October to 21 November 2011. They were unevenly spread, with a large increase around the day of the protest that then tailed off. 'Over 54 per cent of the tweets are retweets—passing on others' messages—whilst only 18 per cent of all tweets direct messages to another user, showing a high recirculation of information intended for a general, rather than specific, audience' (Tinati et al., 2014, p. 670).

The research questions include: What information is flowing? Which actors are most widely cited? How well connected are the tweeters? And do these change over time?

The analysis focuses on the retweet network in order to trace the information flow and consequent networks. The tool filtered data from #feesprotest to trace tweets that had been retweeted 100+ times, a threshold chosen to ensure that the level of detail was appropriate to the research questions. They constructed the network for the six days surrounding the protest, and created static network maps of retweets for specific days as well as a dynamic visual demonstration of the growth of the network (see (accessed 31 January 2017)). The pictorial mapping of retweets makes it

immediately obvious that there are only a small number of highly retweeted users (in these data, only 0.26 per cent or 12 individuals were retweeted more than 100 times). These are not necessarily the most prolific tweeters—their average tweet-retweet ratio is 1:12—so they would not have been identified on these grounds alone, but their place in the flow of information is clearly significant. Four of these users were already apparent almost a week before the protest, and by 9am on the morning of the protest, nine of the 12 were already present, showing the emergence of consistent key players who, indeed, only consolidate their role as central nodes in the network over the period. In contrast to previous research that identifies the interesting actors as a way of sampling data, our method means that these key players are derived from the network itself. (Tinati et al., 2014, pp. 672–3)
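The threshold filtering described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' algorithm: the retweet records are invented, the function name `highly_retweeted` is hypothetical, and the threshold is lowered to 2 so the toy data stay readable (the study used 100).

```python
from collections import Counter


def highly_retweeted(retweets, threshold):
    """Return users whose posts were retweeted more than `threshold` times.

    `retweets` is an iterable of (retweeter, original_author) pairs.
    """
    received = Counter(author for _, author in retweets)
    return {user: n for user, n in received.items() if n > threshold}


# Hypothetical retweet records.
retweets = [
    ("u1", "a"), ("u2", "a"), ("u3", "a"),
    ("u1", "b"), ("u4", "b"),
    ("u2", "c"),
]

key_users = highly_retweeted(retweets, threshold=2)
all_users = {user for pair in retweets for user in pair}
share_of_users = 100 * len(key_users) / len(all_users)  # percentage of all users
print(key_users, round(share_of_users, 1))
```

Because the key users are selected by what the network itself produces, no list of 'interesting' accounts has to be supplied in advance, which is the point the authors stress.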

As such, some of the lesser-known figures would not necessarily have been picked up by sampling well-known persons. This inductive construction from the data, the authors imply, provides a better way of constructing and understanding networks than sampling tweets or using key informants.

The analysis showed that the network became less heterogeneous as it grew: the most highly retweeted messages came from known anti-cuts tweeters who, once they had gained a voice, increased their audience and therefore their volume over time. This preferential attachment makes it harder to become popular as the network grows.

Prior to the day of protest itself, four individuals were identified as highly retweeted, and this number quickly rose to double that within five days. However, subsequently, the rate of growth of individuals to become highly retweeted decreased and, instead, the already highly retweeted individuals reinforced their voice within the network, although they were not necessarily adding new tweets…. The flow of information within the network becomes saturated with the tweets of these highly retweeted individuals, overshadowing the unknown users and their tweets. (Tinati et al., 2014, p. 673)
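One simple way to make this kind of concentration visible is to track, day by day, the share of retweets received by the most retweeted authors. The Python sketch below is a rough concentration measure only, not a test of preferential attachment and not the metric used by Tinati et al.; the records, the day numbering and the function name `top_k_share_by_day` are all hypothetical.

```python
from collections import Counter, defaultdict


def top_k_share_by_day(retweets, k):
    """Map each day to the share of that day's retweets received by its k most retweeted authors.

    `retweets` is an iterable of (day, retweeter, original_author) tuples.
    """
    per_day = defaultdict(Counter)
    for day, _retweeter, author in retweets:
        per_day[day][author] += 1

    shares = {}
    for day in sorted(per_day):
        counts = per_day[day]
        top_total = sum(n for _, n in counts.most_common(k))
        shares[day] = top_total / sum(counts.values())
    return shares


# Hypothetical data: day 1 is evenly spread, day 2 is dominated by author "a".
retweets = [
    (1, "u1", "a"), (1, "u2", "b"), (1, "u3", "c"), (1, "u4", "d"),
    (2, "u1", "a"), (2, "u2", "a"), (2, "u3", "a"), (2, "u4", "b"),
]

shares = top_k_share_by_day(retweets, k=1)
print(shares)
```

A share that rises from one day to the next is consistent with the saturation the authors describe, although a real analysis would need to control for the changing size of the network.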

In addition to the temporal pattern in user popularity, the content of the highly circulated information shifts from calls for participation to discussion of police tactics and brutality. Nine of the top 10 most retweeted posts concerned policing and allegations of brutality.

The single most retweeted post from @Potemkin—a user with no apparent political affiliation and a relatively small number of followers (c.600)—begins 'I got told not to post these pictures...' suggesting an appetite amongst retweeters for using Twitter as a mechanism of direct defiance, although the chain dies away within 24 hours. In comparison, the longest chain, also highlighting policing tactics sustains itself over four days, and was posted by a journalist with over 8000 followers. (Tinati et al., 2014, pp. 673–4)

The authors explain that the number of followers of an individual (re)tweeter is important, as any post will show up in the timelines of all those followers: the more followers, the more widely the information is circulated. In addition, any hyperlinks within retweets are distributed widely, giving access to more information than can be provided in the 140 characters of a tweet. If the link is to a different social media format, then the users of that social media page are potential network members. This reinforces the view of Segerberg and Bennett (2011) that Twitter needs to be seen within a broader set of tactics for political mobilization.

The pattern of retweets, as shown by the network mapping, is neither random nor evenly spread across the network. Some users, although not generating much content themselves, play an important role in passing information on: they are the first to retweet and push information on to new audiences, often very swiftly. Analysis of the #feesprotest network shows that one particularly active user, '@politicalweb', was, throughout the lifetime of the network, the first to retweet three of the four most highly retweeted messages, initiating the wider circulation of these original posts. However, this amplification role was selective, with emphasis on the organisation and coordination of the protest. A second important role lies not in being the first to retweet but in retweeting posts from diverse streams of information, connecting discrete networks and aggregating threads of information into a single channel. The result was a complex interconnected network, dominated by a few highly retweeted individuals whose position strengthens over time, narrowing down the information flow, in this case to concentrate on concerns about the policing of the protest.
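The 'first to retweet' role can be picked out from timestamped retweet records. The following Python sketch assumes a hypothetical record layout of (timestamp, post id, retweeting user); the names and data are invented for illustration and do not come from the #feesprotest dataset.

```python
from collections import Counter


def first_retweeters(retweets):
    """Map each original post id to the user who retweeted it earliest.

    `retweets` is an iterable of (timestamp, post_id, retweeter) tuples.
    """
    first = {}
    for timestamp, post_id, retweeter in retweets:
        if post_id not in first or timestamp < first[post_id][0]:
            first[post_id] = (timestamp, retweeter)
    return {post_id: user for post_id, (_, user) in first.items()}


# Hypothetical records; timestamps are simple integers for readability.
retweets = [
    (5, "p1", "x"),
    (3, "p1", "amp"),
    (2, "p2", "amp"),
    (9, "p2", "y"),
    (1, "p3", "z"),
]

firsts = first_retweeters(retweets)
amplifiers = Counter(firsts.values())  # how often each user was first to retweet
print(firsts, amplifiers.most_common(1))
```

Restricting the input to the most highly retweeted posts before counting would mirror the observation that one user was first to retweet three of the four most circulated messages.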

The authors conclude that their method enables them

to 'zoom' from analysis of the macro-structure of the network—where our analysis is based on quantitative algorithmic methods—to the micro-level of individual users and tweets. This allows us to see how information diffuses and flows between users over time, and to explore the networks that emerge as a consequence. Whilst previous research concentrates on content and aims to link this to offline activities, our research shows for the first time how specific pieces of information flow and how the incremental actions of individual users produce social roles and networks inside Twitter. This shows, very clearly, that Twitter is not one thing but many. Twitter is neither a medium for news nor a method of organising but both: its form is contingent produced in the multiple iterations of users. (Tinati et al., 2014, pp. 676–7)

They note that although the #feesprotest hashtag was used extensively, it had a short lifetime, reflecting the dynamic, fluid and changing nature of networks.

Their illustrative study was a demonstration of a different methodological approach, which was not just about technical processes; as they said:

our argument suggests that sociological concepts, theories and methods are critical to Big Data analysis. As the zeitgeist shifts towards 'data driven' research we must be clear that data are not naturally occurring or unmediated but are sociotechnically constructed: produced and represented in the artefacts that have been designed for particular platforms and through the users' adoption and adaption of these. Furthermore, the meaning of these data is not self-evident but requires robust methodologies, nuanced conceptual vocabularies and theoretical frameworks drawn inter alia from sociology. However, the existing sociological repertoire of methods (and perhaps theories) will not be sufficient in this endeavour. (Tinati et al., 2014, p. 678)




Return to Big data (Section 7.4.3)