Data and Analytics

Social Network Analysis Part 1

This is the first of three articles about Social Network Analysis and how we created, classified, justified and used the data available to us to generate analytics.

Our platform allows us to extract a large number of tweets, each attached to a dense cloud of information. To untangle it, we can use network analysis to understand the relationships between users. In this way, we can start to build a picture of the communities and the types of people who make up the political Twittersphere in South Africa.

If we were to plot all the information we have, the result would resemble a Jackson Pollock painting. By restricting the data we looked at, we were able to make sense of it. We did this by focusing on three questions: who the most important users are, how to find those users, and how to define an important user. The metrics we chose to look at are:

  1. Number of followers
  2. Number of distinct tweets made in a certain time period (frequency)
  3. Number of friends
  4. Number of retweets of a tweet attributed to a particular user
  5. Number of mentions and hashtags referring to a user
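The five metrics above can be computed in a single pass over the captured tweets. The sketch below is a minimal illustration under stated assumptions, not our platform's implementation. The `Tweet` record and its fields are hypothetical. Follower and friend counts are taken as the largest value seen in the window. Retweets are excluded from the distinct-tweet count and credited to the original author. A hashtag is treated as referring to a user only when its text equals that user's screen name.

```python
import re
from collections import Counter
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, List, Optional

# Matches @mentions and #hashtags. A hashtag counts as referring to a user
# only when its text equals that user's screen name (an assumption).
HANDLE_RE = re.compile(r"[@#](\w+)")


@dataclass
class Tweet:
    # Hypothetical record shape; the platform's real schema is not described.
    tweet_id: str
    user: str                 # author's screen name
    text: str
    created_at: datetime
    followers_count: int      # author's followers when the tweet was captured
    friends_count: int        # author's friends when the tweet was captured
    retweet_of: Optional[str] = None  # id of the original tweet, if a retweet


def user_metrics(tweets: List[Tweet], start: datetime, end: datetime) -> Dict[str, Dict[str, int]]:
    """Compute the five metrics for every author who tweeted between start and end."""
    window = [t for t in tweets if start <= t.created_at <= end]
    author_of = {t.tweet_id: t.user for t in window}
    by_lower = {t.user.lower(): t.user for t in window}  # screen names are case-insensitive

    followers: Counter = Counter()
    friends: Counter = Counter()
    retweets: Counter = Counter()
    references: Counter = Counter()
    distinct: Dict[str, set] = {u: set() for u in by_lower.values()}

    for t in window:
        # Metrics 1 and 3: keep the largest follower/friend count seen for the author.
        followers[t.user] = max(followers[t.user], t.followers_count)
        friends[t.user] = max(friends[t.user], t.friends_count)
        if t.retweet_of is not None:
            # Metric 4: credit the retweet to the original author, if that tweet is in the window.
            original = author_of.get(t.retweet_of)
            if original is not None:
                retweets[original] += 1
            continue  # retweets are neither distinct tweets nor scanned for mentions
        # Metric 2: distinct original tweets in the window (frequency).
        distinct[t.user].add(t.tweet_id)
        # Metric 5: mentions/hashtags naming another known user, counted once per tweet.
        for handle in {h.lower() for h in HANDLE_RE.findall(t.text)}:
            target = by_lower.get(handle)
            if target is not None and target != t.user:
                references[target] += 1

    return {
        u: {
            "followers": followers[u],
            "frequency": len(distinct[u]),
            "friends": friends[u],
            "retweets": retweets[u],
            "mentions_hashtags": references[u],
        }
        for u in distinct
    }
```

Only authors who tweeted in the window receive metrics. A user who is mentioned but never tweets would therefore need to be added separately.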

These metrics can be separated into two groups: the activity of a user and the influence of a user. A user's influence is largely determined by the number of followers they have, as this indicates how many people will potentially see a tweet and/or retweet it. Such users can be seen as controlling the flow of information, which we can think of in terms of a cascade effect. For example, if a celebrity tweets, the people directly following them are more likely to retweet that tweet, their followers are in turn likely to retweet it, and so on. Conversely, if a celebrity does not retweet a particular tweet, how many people are potentially cut off from seeing it, whether directly or indirectly (through followers of followers)? In our analysis, we do not look at these indirect cascade effects and instead focus on direct interactions between users.

One factor likely to amplify the number of followers is posting frequency. The more frequently a user posts, the more likely people are to see a tweet and potentially retweet it. For an already influential user, this would compound the growth in followers. However, posting frequency does not directly affect a user's influence.

In constructing our network, we focused on the key influential users, the general communities, and the characteristics of an average user. To understand these better, we look at the most active users (those with the highest posting frequency and the largest number of mentions and related hashtags). This reveals potential communities. It also highlights the more influential and important users, because active users mention them in their Twitter conversations. This restricted set of users forms the nodes of our network. To identify relationships between these users, we can look at a number of key attributes, such as who follows whom or who is friends with whom (this builds a network hierarchy).
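Restricting the data to the most active users and then keeping only the follow relationships among them could look roughly like the sketch below. The function names are hypothetical. Ranking by the plain sum of distinct tweets and mentions/hashtags is an illustrative assumption, not the weighting we actually used.

```python
from typing import Dict, Iterable, List, Set, Tuple


def most_active_users(activity: Dict[str, Tuple[int, int]], n: int) -> List[str]:
    """Return the n users with the highest activity score.

    activity maps a screen name to (distinct tweets in the period,
    mentions and hashtags referring to the user). Summing the two is an
    illustrative assumption, not a documented weighting.
    """
    ranked = sorted(
        activity.items(),
        # Highest combined score first; ties broken alphabetically for stability.
        key=lambda item: (-(item[1][0] + item[1][1]), item[0]),
    )
    return [user for user, _ in ranked[:n]]


def follow_edges_within(nodes: Iterable[str], follows: Iterable[Tuple[str, str]]) -> Set[Tuple[str, str]]:
    """Keep only (follower, followed) pairs where both users are in the node set."""
    node_set = set(nodes)
    return {(a, b) for a, b in follows if a in node_set and b in node_set}
```

For example, with `activity = {"a": (10, 2), "b": (1, 30), "c": (3, 3)}`, `most_active_users(activity, 2)` returns `["b", "a"]`. A heavily mentioned user can therefore rank above a frequent poster.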

Focusing on content and conversation between specific users allowed us to generate a usable network diagram. Our network is generated by recording every time a user mentions another user in their tweet text or uses a hashtag containing that user's screen name. This creates a directed graph with 150 nodes and 613 edges. The short time span, from 22 June to 29 June, means we may miss connections and nodes that would have been observed over a longer period. For example, Julius Malema is picked up only a few times, and the dominant entities are linked to the DA. This leaves room for additional analyses after the elections, both to get a fuller picture and to identify any changes to this network. The graphs below provide an overview of our users.
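A directed mention graph of this kind can be sketched with the standard library alone. In the sketch below, each tweet is reduced to a hypothetical `(author, text)` pair. An edge runs from the author to every node-set user named by an `@mention` or by a `#hashtag` equal to their screen name. Self-mentions are ignored, and repeated mentions collapse into one edge. These are all assumptions, since the exact edge rules behind our 150-node, 613-edge graph are not spelled out here.

```python
import re
from typing import Dict, Iterable, Set, Tuple

# @mention or #hashtag; the captured word is compared with screen names.
HANDLE_RE = re.compile(r"[@#](\w+)")


def build_mention_graph(tweets: Iterable[Tuple[str, str]], nodes: Set[str]) -> Dict[str, Set[str]]:
    """Adjacency sets: graph[a] holds every node that author a mentioned or tagged."""
    lookup = {n.lower(): n for n in nodes}  # screen names are case-insensitive
    graph: Dict[str, Set[str]] = {n: set() for n in nodes}
    for author, text in tweets:
        if author not in graph:
            continue  # only authors in the restricted node set contribute edges
        for handle in HANDLE_RE.findall(text):
            target = lookup.get(handle.lower())
            if target is not None and target != author:
                graph[author].add(target)  # a set, so repeats form a single edge
    return graph


def graph_size(graph: Dict[str, Set[str]]) -> Tuple[int, int]:
    """Return (number of nodes, number of directed edges)."""
    return len(graph), sum(len(targets) for targets in graph.values())
```

Note that this sketch counts every user in the restricted set as a node, including isolated ones. A graph built only from observed edges would report fewer nodes. For larger analyses, a library such as NetworkX provides a directed-graph type with the same semantics.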

Sentiment Graph

The graph above shows the maximum follower and friend counts of the relevant Twitter users, recorded from 22 June until 26 July 2016 and sorted in descending order. These figures may differ from those of a month ago. Unless one of the users saw a very dramatic spike in their counts, however, they still give a clear enough picture of the number of followers and friends.
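The aggregation behind such a graph can be sketched as follows, assuming the counts were captured as periodic snapshots of `(screen name, capture date, followers, friends)`. The snapshot shape and the choice to sort by follower count (with screen name as a tiebreak) are assumptions, since the text says only "descending order".

```python
from datetime import date
from typing import Dict, Iterable, List, Tuple

# Hypothetical snapshot shape: (screen name, capture date, followers, friends).
Snapshot = Tuple[str, date, int, int]


def max_counts(snapshots: Iterable[Snapshot], start: date, end: date) -> List[Tuple[str, int, int]]:
    """Largest follower and friend counts per user within [start, end], most-followed first."""
    best: Dict[str, Tuple[int, int]] = {}
    for user, day, followers, friends in snapshots:
        if not (start <= day <= end):
            continue  # ignore snapshots outside the reporting window
        prev_followers, prev_friends = best.get(user, (0, 0))
        # Followers and friends are maximised independently of each other.
        best[user] = (max(prev_followers, followers), max(prev_friends, friends))
    rows = [(user, f, r) for user, (f, r) in best.items()]
    return sorted(rows, key=lambda row: (-row[1], row[0]))
```

Taking the maximum rather than the latest value makes the chart robust to a snapshot captured after a temporary drop. However, it will also preserve any short-lived spike.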

Sentiment Graph

The graph above contrasts the users' friend and follower counts with the number of distinct tweets each user made in the time period. Most of these users are among our most active users.

Looking at the two graphs above, it is interesting to note the different types of users that are prevalent: journalists, news agencies, political parties, party representatives and affiliates, and activists. There are also a few ordinary citizens with many relevant posts, but these users tend to be insignificant.