Social Network Analysis Part 2

This post is a follow-up to Part I of our Social Network Analysis and explores the construction of the network itself.

Sentiment Graph

Sentiment Graph

The above network diagram has too many nodes and edges to interpret in detail, but it is evident that nodes on the periphery have very limited connectivity compared to the nodes in the centre. The users on the outer edges tend to be some of the larger news agencies, whose content is not strictly local and is therefore somewhat diluted.

The edges are weighted by the number of connections between users and then normalized by the maximum number of connections any user has with any other user in the network. This maximum corresponds to the user with the screen name “gumede783”, who had over 97 000 connections over the time period we looked at. This looks like an unexplained outlier that we should ignore, until we look more closely at his content: it is all politically motivated, varied in opinion and subject matter, and posted at irregular intervals, which leads us to conclude that this is not a bot. We believe this is an important character who cannot be discarded. The account is not verified, so we cannot definitively state who this user is.
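
The weighting step can be sketched in a few lines of Python. This is a minimal illustration, not the code used for the analysis: it assumes the normalization divides each pair's connection count by the largest pairwise count in the network, and the user names and interactions are made up.

```python
from collections import Counter


def normalized_edge_weights(interactions):
    """Count connections per (source, target) pair and scale by the largest count."""
    counts = Counter(interactions)      # (source, target) -> raw connection count
    max_count = max(counts.values())    # largest count observed for any pair
    return {pair: n / max_count for pair, n in counts.items()}


# Hypothetical interactions: (user who posted, user mentioned or retweeted)
interactions = [
    ("alice", "bob"), ("alice", "bob"), ("alice", "bob"), ("alice", "bob"),
    ("bob", "carol"), ("bob", "carol"),
    ("carol", "alice"),
]
weights = normalized_edge_weights(interactions)
```

With this scaling every weight falls between 0 and 1, and the busiest pair always gets a weight of 1.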

We focused specifically on influential users and therefore looked at a graph measure called ‘eigenvector centrality’. The premise is that not all connections are equal: connections to other influential people give a person greater influence [1]. The measure therefore accounts for both the number and the quality of the connections each node has. Google’s PageRank, used to rank web pages, is a variant of eigenvector centrality. The colour of each node corresponds to its eigenvector centrality. The purple nodes have zero centrality, meaning that they are not influential in our context, while the black nodes have the most influence. The same key used above is shown below for clarity.

Sentiment Graph
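
To make the idea of eigenvector centrality concrete, here is a small sketch that computes it by power iteration in plain Python. It is an illustration under our own assumptions rather than the tooling used for the diagrams: the graph, the user names and the parameters `max_iter` and `tol` are all hypothetical, and a node's score is taken to be the weighted sum of the scores of the nodes pointing at it.

```python
def eigenvector_centrality(out_edges, max_iter=200, tol=1e-9):
    """Power iteration on a directed graph given as {source: {target: weight}}."""
    nodes = set(out_edges)
    for targets in out_edges.values():
        nodes.update(targets)

    score = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(max_iter):
        # Start from the previous scores (a shift by the identity matrix),
        # which keeps the iteration from oscillating on some graphs.
        new = dict(score)
        for src, targets in out_edges.items():
            for dst, weight in targets.items():
                new[dst] += weight * score[src]
        # Rescale so the scores stay comparable between iterations.
        norm = sum(v * v for v in new.values()) ** 0.5
        new = {n: v / norm for n, v in new.items()}
        if max(abs(new[n] - score[n]) for n in nodes) < tol:
            return new
        score = new
    return score


# Hypothetical network: a, b, c and d mention each other; e only mentions a.
graph = {
    "a": {"b": 1.0, "c": 1.0},
    "b": {"a": 1.0, "c": 1.0},
    "c": {"a": 1.0, "b": 1.0, "d": 1.0},
    "d": {"c": 1.0},
    "e": {"a": 1.0},
}
centrality = eigenvector_centrality(graph)
ranking = sorted(centrality, key=centrality.get, reverse=True)
```

In this toy network `c` ranks first because the other well-connected users all point at it, while `e`, which nobody mentions, ends up with a centrality that is effectively zero. That is the situation of the purple nodes in the diagram.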

We cleaned up the network diagram and removed some of the noise so that we could focus on an analytical view from which to extract the information we were after. The next version of the network diagram uses the same node colouring as discussed, but the size of each node is now proportional to the number of incoming connections it has. Incoming connections mean that people may be retweeting a user’s posts, tweeting to them directly, or mentioning them often in texts or hashtags. The largest nodes are Helen Zille, Zizi Kodwa (ANC spokesperson) and EWNreporter, all of whom have large personas. This can be seen in our updated network diagram below:

Sentiment Graph
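
Sizing nodes by incoming connections amounts to counting how often each user appears as the target of an edge. The sketch below shows one way to do this, assuming every retweet, reply or mention is recorded as a separate (source, target) pair. The `scale` factor and the user names are hypothetical.

```python
from collections import Counter


def node_sizes(edges, scale=10.0):
    """Return a drawing size per user, proportional to incoming connections."""
    in_degree = Counter(target for _source, target in edges)
    nodes = {user for edge in edges for user in edge}
    # Users who are never a target get an in-degree of 0 from the Counter.
    return {user: scale * in_degree[user] for user in nodes}


# Hypothetical edges: (user who posted, user retweeted or mentioned)
edges = [("u1", "u3"), ("u2", "u3"), ("u4", "u3"), ("u3", "u1"), ("u2", "u1")]
sizes = node_sizes(edges)
```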

To clean up the network diagram further and obtain a smaller community, we removed the purple nodes as well as the nodes with low eigenvector centrality. The most important users according to our calculations are as follows:

Sentiment Graph

The above table shows the most influential users within the network according to our methodological approach.
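
The pruning and ranking steps described above can be sketched as follows. The cut-off value is an assumption, since this post does not state the threshold that was used, and the centrality scores and user names are invented for illustration.

```python
def prune_and_rank(edges, centrality, threshold):
    """Drop users at or below the centrality threshold and rank the rest."""
    keep = {user for user, score in centrality.items() if score > threshold}
    # Keep only the edges whose two endpoints both survive the cut.
    kept_edges = [(s, t) for s, t in edges if s in keep and t in keep]
    ranked = sorted(keep, key=lambda user: centrality[user], reverse=True)
    return ranked, kept_edges


# Hypothetical centrality scores and edges
centrality = {"u1": 0.62, "u2": 0.0, "u3": 0.71, "u4": 0.05, "u5": 0.33}
edges = [("u1", "u3"), ("u2", "u3"), ("u3", "u5"), ("u4", "u1"), ("u5", "u1")]
ranked, kept_edges = prune_and_rank(edges, centrality, threshold=0.1)
```

Here `ranked` lists the remaining users from most to least influential, which mirrors the table of top users, and `kept_edges` describes the smaller community that is left to draw.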