Creative Commons License
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

Social Media Analytics with NodeXL

Below you will find a brief tutorial for retrieving, visualizing, and analyzing network graph data from Twitter using the free NodeXL platform provided by the non-profit Social Media Research Foundation (SMRF).

About

Last Updated February 2017
Created by Paul Vieth
University of Oklahoma Libraries

Lingo

Prep Work

Download and install the NodeXL Microsoft Excel plugin

Dipping Your Toe In

The Look and Feel

When you open the NodeXL template, most everything you’re comfortable with about Excel will be there waiting for you. Just to make sure we’re on the same page, your screen should look like this:

NodeXL Screen Layout

You’ll notice some changes. This isn’t a fresh Excel workbook; it has been prepopulated with everything you’ll need to work NodeXL’s magic.

Here’s a good overview of some of the different graphing algorithms (it’s pretty self-explanatory which shape algorithms correspond to which examples below):

Graph Algorithm Example Images

NodeXL functions

Network Graph Introduction to NodeXL

Another neat, and multimodal, way to learn about NodeXL’s functionality is to let NodeXL tell you using NodeXL’s functionality. That sounds meaningless, but will become clear.

Opening that sample workbook automatically opens a new Excel window, creates a new NodeXL template workbook, and populates that workbook with sample (but not random, lorem ipsum-esque) vertices and edges, and their properties.

You’ll get something like this:

NodeXL Functionality Graph

This is, in broad strokes, everything that NodeXL (the central hub of this graph) can do. As for the secondary nodes, those connected directly to the hub, this workshop will go through three of them in detail:

Getting Data

Under the NodeXL tab –> “Data” section –> “Import” dropdown menu, you will see a list of all the potential data sources.

NodeXL Import Options

If your data has already been collected, you can import it into NodeXL as a Pajek, GraphML, or UCINET Full Matrix DL file. We’re not going to get into that today, and that would ignore some of NodeXL’s best functionality anyway.

Notice, from the bottom half of the “Import” dropdown, that data can also be taken from

Well, at least in theory. To import data from most of these sources you need to have NodeXL Pro installed. But NodeXL Basic does allow you to import “From Twitter Search Network,” meaning you can pull up tweets containing keywords or hashtags, and can control the parameters of that data cull. Like this:

NodeXL Import from Twitter Search Network

This window gives us several options.

NodeXL Twitter Advanced Search Operators

NodeXL Basic and Friends Networks

Though they take longer to generate, friendship networks can be very informative, and can facilitate the creation of more snugly clustered groupings (something we’ll get to in the next section).

NodeXL Twitter Authorization

Rendering Graphs

Once you enter your search terms, select your parameters, and twiddle your thumbs while NodeXL communicates with Twitter through its API, your NodeXL worksheet will become populated with vertex identities (of Twitter users) and miscellaneous metadata:

The edges worksheet will also become populated with:

With this degree of detail in the metadata, there’s little limit to what you can do with NodeXL; the main constraint is what Twitter will and will not allow through its REST API.
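To make the idea of an edges worksheet a bit more concrete, here is a minimal Python sketch of one way tweets could be turned into edges: every @mention becomes a directed edge from the tweet’s author to the mentioned user. This is only an illustration, not NodeXL’s actual import code. The `tweets` sample data and the `mention_edges` function are made up for this example.

```python
import re

# Hypothetical sample data: (author, tweet text) pairs returned by a search.
tweets = [
    ("alice", "Great talk by @bob on #dataviz"),
    ("bob", "Thanks @alice and @carol! #dataviz"),
    ("carol", "Reading about #dataviz today"),
]

# An "@" followed by letters, digits, or underscores.
MENTION = re.compile(r"@(\w+)")


def mention_edges(tweets):
    """Return directed (author, mentioned_user) edges, one per @mention."""
    edges = []
    for author, text in tweets:
        for user in MENTION.findall(text):
            edges.append((author, user.lower()))
    return edges


print(mention_edges(tweets))
```

Note that carol’s tweet mentions nobody, so it adds no edge. A real import would presumably keep such users as isolated vertices, but that is an assumption about behavior this sketch does not model.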

To Generate Network Graphs in NodeXL

Play around with this a little bit. There’s lots to be done. The default graphing algorithm is Fruchterman-Reingold. You can change this from the dropdown menu to “Harel-Koren Fast Multiscale” or some shape-driven geometry. Try “Lay[ing] out [the graph] again” and see how the network reconstitutes itself with each iteration of the algorithm (usually in the direction of entropy, but sometimes fruitfully).
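If you’re curious why the graph changes each time you lay it out again, a stripped-down force-directed layout in the spirit of Fruchterman-Reingold may help. The Python sketch below is a simplified teaching version, not NodeXL’s implementation: all vertices repel each other, connected vertices attract, and a cooling “temperature” limits how far anything moves per step. The function name, its parameters, and the cooling rate are assumptions of this sketch, and it skips the step of clamping vertices to a frame that the published algorithm includes.

```python
import math
import random


def fruchterman_reingold(nodes, edges, iterations=50, size=1.0, seed=None):
    """Return {node: (x, y)} from a basic force-directed layout.

    `nodes` is a list of vertex names, `edges` a list of (u, v) pairs.
    """
    rng = random.Random(seed)
    # Start from random positions; this is why each run can look different.
    pos = {n: [rng.uniform(0, size), rng.uniform(0, size)] for n in nodes}
    k = math.sqrt(size * size / len(nodes))  # ideal distance between vertices
    temperature = size / 10  # maximum movement per step

    for _ in range(iterations):
        disp = {n: [0.0, 0.0] for n in nodes}

        # Repulsion: every pair of vertices pushes apart.
        for i, u in enumerate(nodes):
            for v in nodes[i + 1:]:
                dx = pos[u][0] - pos[v][0]
                dy = pos[u][1] - pos[v][1]
                dist = math.hypot(dx, dy) or 1e-9
                force = k * k / dist
                disp[u][0] += dx / dist * force
                disp[u][1] += dy / dist * force
                disp[v][0] -= dx / dist * force
                disp[v][1] -= dy / dist * force

        # Attraction: vertices joined by an edge pull together.
        for u, v in edges:
            dx = pos[u][0] - pos[v][0]
            dy = pos[u][1] - pos[v][1]
            dist = math.hypot(dx, dy) or 1e-9
            force = dist * dist / k
            disp[u][0] -= dx / dist * force
            disp[u][1] -= dy / dist * force
            disp[v][0] += dx / dist * force
            disp[v][1] += dy / dist * force

        # Move each vertex, but never farther than the temperature allows.
        for n in nodes:
            dx, dy = disp[n]
            length = math.hypot(dx, dy) or 1e-9
            step = min(length, temperature)
            pos[n][0] += dx / length * step
            pos[n][1] += dy / length * step

        temperature *= 0.95  # cool down so the layout settles

    return {n: tuple(p) for n, p in pos.items()}


layout = fruchterman_reingold(["a", "b", "c", "d"], [("a", "b"), ("b", "c")])
for name, (x, y) in layout.items():
    print(name, round(x, 3), round(y, 3))
```

Calling the function without a `seed` gives a different picture each time, much like laying the graph out again. Passing the same `seed` twice reproduces the same layout.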

Analyzing Networks

The first step in analyzing networks is to use your eyes. A good place to start is NodeXL’s dynamic filters (“NodeXL” tab –> “Analysis” pane –> the giant filter icon with “dynamic filters” written right underneath it).

NodeXL Dynamic Filters

These filters allow you to selectively view the graph according to parameters of your choosing: when the tweet was sent, the age of the relationship between users, the number of followers, and other attributes. What was a messy jumble of cubist spaghetti-still-life can be narrowed according to your research priorities.
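Conceptually, a filter like this keeps only the vertices whose attribute falls inside a range you pick, and drops any edge that loses an endpoint. Here is a rough Python sketch of that idea, not a description of how NodeXL does it internally. The sample `vertices` and `edges`, the `followers` attribute, and the `filter_graph` function are all hypothetical.

```python
# Hypothetical vertex table: one dict per Twitter user.
vertices = [
    {"name": "alice", "followers": 120},
    {"name": "bob", "followers": 15000},
    {"name": "carol", "followers": 900},
]
edges = [("alice", "bob"), ("bob", "carol"), ("alice", "carol")]


def filter_graph(vertices, edges, attribute, low, high):
    """Keep vertices with low <= attribute <= high, plus edges between them."""
    kept = {v["name"] for v in vertices if low <= v[attribute] <= high}
    kept_vertices = [v for v in vertices if v["name"] in kept]
    kept_edges = [(a, b) for a, b in edges if a in kept and b in kept]
    return kept_vertices, kept_edges


# Show only users with between 100 and 1,000 followers.
visible_vertices, visible_edges = filter_graph(vertices, edges, "followers", 100, 1000)
print([v["name"] for v in visible_vertices])
print(visible_edges)
```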

The next analytic tool is grouping algorithms. Also in the analysis pane of the NodeXL tab, you’ll see a dropdown menu for groups:

NodeXL Groups

As you can see, NodeXL allows you to rearrange your graph according to different parameters (motif, cluster, vertex attribute, connected component). These grouping algorithms befit different priorities, but I find the cluster algorithms best for highlighting nodal relationships inherent to your network. You can then tell the graph (in the “Document Actions” pane) to “Lay Out Again,” and it will rearrange your nodes according to your grouping algorithm, centralizing the kind of relationships you’re interested in, and casting those of a lower priority, or irrelevance, to the periphery of the graph.
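Of the grouping options above, “connected component” is the simplest to reason about: two vertices land in the same group if you can get from one to the other by following edges. The Python sketch below shows that idea with a breadth-first search. It is an illustration under that textbook definition, not NodeXL’s own code, and the function name and sample data are invented for the example.

```python
from collections import deque


def connected_components(nodes, edges):
    """Group nodes so that each group is one connected component."""
    # Build an undirected adjacency map.
    neighbors = {n: set() for n in nodes}
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)

    seen = set()
    groups = []
    for start in nodes:
        if start in seen:
            continue
        # Breadth-first search collects everything reachable from `start`.
        seen.add(start)
        group = []
        queue = deque([start])
        while queue:
            node = queue.popleft()
            group.append(node)
            for other in neighbors[node]:
                if other not in seen:
                    seen.add(other)
                    queue.append(other)
        groups.append(sorted(group))
    return groups


nodes = ["a", "b", "c", "d", "e"]
edges = [("a", "b"), ("b", "c"), ("d", "e")]
print(connected_components(nodes, edges))
```

Cluster algorithms go further than this by splitting a single connected component into densely linked subgroups, which is why they tend to be more revealing on a large Twitter network.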

After you’re done using your brain to analyze the graph, let the computer do its job. NodeXL has built-in computational metrics commensurate with standard graph theoretic techniques.

In the “NodeXL” tab –> “Analysis” pane –> “Graph Metrics” dropdown menu, you will see a list of the measurements NodeXL can take of your graph, along with their descriptions.

NodeXL Graph Metrics List

I can’t go into the usefulness or mathematical bases of all of these metrics, but the descriptions provided by NodeXL suffice for introductory purposes. With the nodes of 2,000 tweets (with NodeXL Basic) or the 18,000 allowed by Twitter, the topology of the graph surpasses the limits of human intuition, and these metrics become necessary and powerful ways to make sense of networks.
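For a taste of what such metrics measure, here is a small Python sketch of two standard graph-theoretic quantities: vertex degree (how many edges touch a vertex) and graph density (the share of possible edges that actually exist). It assumes an undirected graph with no duplicate edges or self-loops, and the `graph_metrics` function and sample data are hypothetical, so treat it as a teaching aid and not a reproduction of NodeXL’s calculations.

```python
def graph_metrics(nodes, edges):
    """Return (degree per vertex, density) for an undirected simple graph."""
    degree = {n: 0 for n in nodes}
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1

    n = len(nodes)
    possible_edges = n * (n - 1) / 2  # every distinct pair of vertices
    density = len(edges) / possible_edges if possible_edges else 0.0
    return degree, density


# A small "star": one hub connected to three other users.
nodes = ["hub", "x", "y", "z"]
edges = [("hub", "x"), ("hub", "y"), ("hub", "z")]
degree, density = graph_metrics(nodes, edges)
print(degree)
print(density)
```

On a four-vertex star the hub stands out immediately by degree. On a graph with thousands of vertices you would not spot it by eye, which is the point of letting the computer do the counting.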

Further Reading

General

Ethics