Tuesday, May 22, 2012

Promoted Accounts on Twitter, the Great Enigma

For a class project (CSS692/ECO895, Social Network Analysis) my group - Kevin May, Echo Keif and I - took on a task almost bigger than we could chew: identifying astroturf on Twitter. It turned out to be more ambitious than we realized, but even starting with a low level of technical sophistication we were able to find some interesting results.

What is astroturf? While most social movements are said to grow from the "grassroots", sometimes wealthy organizations attempt a "cashroots" strategy instead, paying people to spread a pre-chosen message. This has been a problem since the dawn of democracy, but social media has opened up many more opportunities for astroturfing.

The Truthy Project is one attempt to track how online memes spread and to distinguish authentic movements from fabricated ones. However, there still isn't much agreement on what an astroturfing campaign looks like compared to a genuine grassroots movement.

We focused on Twitter for our project. The recently unveiled Promoted Accounts feature, used by Twitter to generate revenue, might uncharitably be described as a tool for astroturfing. Promoted Accounts are put at the top of the "Who To Follow" list shown to each Twitter user, but otherwise not tracked or recorded in a publicly accessible way. Our goal was to identify common characteristics of Promoted Twitter accounts, and thereby develop a profile of what an astroturfer might look like.

Methodology: using a script that interfaced with the Twitter API, we built each network from a seed account, either a Promoted Account or an account listed as "Similar" to a Promoted Account. We then created a graph out of everyone the seed follows, plus everyone each of those friends follows. Given how unselective some people are about whom they follow on Twitter, these graphs got big very quickly! Here's a sample, after trimming out accounts with fewer than 75 connections:

Mitt Romney's core Twitter network.
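The collection and trimming steps can be sketched roughly as follows. This is a minimal sketch, not the script we actually used: `friends_of` is a hypothetical stand-in for the Twitter API lookup of whom an account follows (backed here by a made-up dictionary), a "connection" is assumed to mean a follow edge in either direction, and trimming is done in a single pass.

```python
from collections import defaultdict

def build_follow_graph(seed, friends_of):
    """Collect the two-hop 'follows' graph around a seed account.

    friends_of is a hypothetical callable returning the accounts a user
    follows; in a real run it would wrap Twitter API calls.
    """
    edges = set()
    for friend in friends_of(seed):
        edges.add((seed, friend))          # seed follows friend
        for fof in friends_of(friend):
            edges.add((friend, fof))       # friend follows a friend-of-friend
    return edges

def degrees(edges):
    """Count connections per account, treating follows in either direction alike (assumption)."""
    deg = defaultdict(int)
    for follower, followed in edges:
        deg[follower] += 1
        deg[followed] += 1
    return deg

def trim(edges, min_degree):
    """Single pass: drop every edge touching an account with fewer than min_degree connections."""
    deg = degrees(edges)
    return {(a, b) for a, b in edges if deg[a] >= min_degree and deg[b] >= min_degree}

def pendants(edges):
    """Accounts with exactly one connection."""
    return {node for node, d in degrees(edges).items() if d == 1}

# Hypothetical toy data standing in for API responses.
TOY_FRIENDS = {
    "seed": ["a", "b"],
    "a": ["b", "c"],
    "b": ["a", "c", "d"],
}
graph = build_follow_graph("seed", lambda user: TOY_FRIENDS.get(user, []))
print(len(graph), "edges; pendants:", pendants(graph))
print("after trim(graph, 2):", len(trim(graph, 2)), "edges")
```

Calling `trim(graph, 75)` would correspond to the cut used for the picture above, while `trim(graph, 2)` drops the single-connection "pendant" accounts.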
After collecting 150 such graphs, we started looking at various network metrics. Here's a visual comparison of Promoted versus Similar accounts based on those metrics:

Radial graphs created by Echo.
("Normalized" means rescaling all values into the interval [0,1]. This puts measures with very different magnitudes on a comparable scale; e.g. clique counts often ran into the millions.)
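The post doesn't spell out exactly how values were squeezed into [0,1]. A common choice, assumed here, is min-max scaling, which maps the smallest value to 0, the largest to 1, and everything else linearly in between:

```python
def min_max_normalize(values):
    """Rescale values linearly onto [0, 1] via (x - min) / (max - min).

    Min-max scaling is an assumption; the original normalization method is not stated.
    """
    lo, hi = min(values), max(values)
    if hi == lo:                      # all values equal: no spread to rescale
        return [0.0] * len(values)
    return [(x - lo) / (hi - lo) for x in values]

print(min_max_normalize([2, 4, 6]))   # [0.0, 0.5, 1.0]
```

Because only the relative position within each measure survives, a clique count in the millions and a density near zero end up on the same axis of the radial graph.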



At a glance, Promoted accounts appear to have higher Closeness Centrality (roughly speaking, how few steps it takes to get from the account to everyone else in its network, which is one way of measuring how central it is). Promoted accounts also tend to have fewer followers. This makes sense: Bill Gates or Justin Bieber don't need to pay for promotion, because they already have millions of followers. It tends to be mid-size accounts that are promoted, and this is reflected in the numbers.
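To make the closeness idea concrete, here is a minimal sketch that computes it with a breadth-first search. It assumes follow edges are treated as undirected and uses the (r - 1) / (sum of distances) form, where r counts the reachable accounts; this is one of several variants (some libraries rescale further for disconnected graphs), and the post doesn't say which one our tools used. The tiny network is hypothetical.

```python
from collections import deque

def closeness(adj, source):
    """Closeness centrality of `source`: (r - 1) / sum of shortest-path distances,
    where r counts nodes reachable from source, including itself (assumed variant)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:                       # breadth-first search yields hop distances
        node = queue.popleft()
        for nbr in adj.get(node, ()):
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    total = sum(dist.values())
    return (len(dist) - 1) / total if total else 0.0

# Hypothetical path network a - b - c - d, edges treated as undirected.
path = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c"}}
print(closeness(path, "a"), closeness(path, "b"))  # 0.5 0.75
```

The account in the middle of the path scores higher than the one at the end, because on average it is fewer hops away from everyone else.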

Finally, using multiple linear regression, we tried to see which attributes can predict whether an account is Promoted or not. Here is the output for several different specifications:

| | (1) | (2) | (3) | (4) |
|---|---|---|---|---|
| Following | 2.55E-05 (0.000093) | 0.000449 (0.000388) | 4.04E-05 (0.00018) | 1.02E-03 (5.65E-04) |
| Followers | -3.82E-08** (1.11E-08) | -1.57E-08 (1.81E-08) | -2.81E-08** (1.19E-08) | -1.11E-08 (1.47E-08) |
| Nodes after trim | -1.55E-06 (3.12E-06) | 5.04E-06 (8.38E-06) | 8.21E-06 (9.63E-06) | 1.44E-05 (1.82E-05) |
| Edges after trim | -2.02E-07 (4.38E-07) | -1.60E-06 (1.35E-06) | -1.75E-06 (1.47E-06) | -3.66E-06 (3.04E-06) |
| Pendants | 8.15E-07 (5.35E-07) | -3.12E-07 (1.45E-06) | -1.45E-07 (1.06E-06) | -1.60E-06 (2.31E-06) |
| Network density | -- | -- | -0.06754 (0.197919) | 0.264582 (0.474809) |
| Closeness | -- | -- | -0.29216 (0.688034) | 8.57E-01 (8.93E-01) |
| Cliques | -- | -- | 1.19E-08 (2.60E-08) | 3.48E-08 (3.80E-08) |
| PageRank | -- | -- | 0.372417 (0.591458) | -1.8429 (1.452847) |
| Constant | 0.088058 (0.046366) | 0.267441 (0.258445) | 0.035358 (0.312639) | -0.09244 (0.580219) |
| Similar FE? | NO | YES | NO | YES |
| n | 136 | 136 | 120 | 120 |

(Standard errors, robust to heteroskedasticity, are shown in parentheses. ** marks coefficients significant at the 95% confidence level, i.e. the 5% significance level, or better.) Nodes/edges after trim is the number remaining after removing all accounts with only a single connection ("pendants").
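Since the outcome is a 0/1 "Promoted" indicator, a multiple linear regression like this is a linear probability model, whose errors are heteroskedastic by construction; that is why robust standard errors matter here. The sketch below estimates OLS coefficients with sandwich-style robust standard errors in plain Python. It assumes the HC1 variant (the n/(n-k) small-sample correction that Stata's `robust` option applies); the post doesn't say which software or variant produced the table, the Similar fixed effects are left out, and the data in the example is made up.

```python
def transpose(m):
    return [list(col) for col in zip(*m)]

def matmul(a, b):
    cols = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]

def invert(m):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(m)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("X'X is singular (collinear regressors?)")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def ols_hc1(X, y):
    """OLS coefficients with HC1 heteroskedasticity-robust standard errors (assumed variant).

    X: rows of regressors, each starting with 1.0 for the constant.
    y: 0/1 outcomes (1 = Promoted), so this is a linear probability model.
    """
    n, k = len(X), len(X[0])
    Xt = transpose(X)
    xtx_inv = invert(matmul(Xt, X))
    beta = [row[0] for row in matmul(xtx_inv, matmul(Xt, [[v] for v in y]))]
    resid = [yi - sum(b * x for b, x in zip(beta, row)) for row, yi in zip(X, y)]
    # "Meat" of the sandwich: sum over observations of e_i^2 * x_i x_i'
    meat = [[sum(e * e * row[i] * row[j] for row, e in zip(X, resid))
             for j in range(k)] for i in range(k)]
    cov = matmul(matmul(xtx_inv, meat), xtx_inv)
    scale = n / (n - k)                 # HC1 small-sample correction
    se = [max(scale * cov[i][i], 0.0) ** 0.5 for i in range(k)]  # clip tiny rounding noise
    return beta, se

# Made-up data: constant plus one binary regressor.
X = [[1.0, 0.0], [1.0, 0.0], [1.0, 1.0], [1.0, 1.0]]
y = [0, 1, 1, 1]
print(ols_hc1(X, y))  # ([0.5, 0.5], [0.5, 0.5])
```

With a single binary regressor the coefficients reduce to the group means (0.5 for the first group, and a difference of 0.5), which makes the output easy to check by hand.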


This output isn't very satisfying, because almost none of our measures proved to be statistically significant: only the follower count, in specifications (1) and (3), clears the bar. This may reflect the relatively small sample size, or just low levels of variation in the measures of interest.
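One way to see which entries clear the bar is to divide each coefficient by its robust standard error and compare the ratio with a critical value. The sketch below does this for a few cells copied from the table, assuming a two-sided test at the 5% level with the large-sample critical value 1.96 (with 120 to 136 observations, the exact t critical value is only slightly larger, about 1.98).

```python
# (coefficient, robust standard error) pairs copied from the regression table
ESTIMATES = {
    "Followers, spec (1)": (-3.82e-08, 1.11e-08),
    "Followers, spec (2)": (-1.57e-08, 1.81e-08),
    "Followers, spec (3)": (-2.81e-08, 1.19e-08),
    "Following, spec (4)": (1.02e-03, 5.65e-04),
    "Constant, spec (1)": (0.088058, 0.046366),
}

CRITICAL_VALUE = 1.96  # assumed: two-sided 5% test, large-sample normal approximation

def t_ratio(coef, se):
    return coef / se

def is_significant(coef, se, critical=CRITICAL_VALUE):
    return abs(t_ratio(coef, se)) >= critical

for name, (coef, se) in ESTIMATES.items():
    mark = "**" if is_significant(coef, se) else ""
    print(f"{name:22s} t = {t_ratio(coef, se):6.2f} {mark}")
```

Only the two starred Followers cells clear 1.96 (t of about -3.44 and -2.36). Following in (4) and the constant in (1) come close, at roughly 1.81 and 1.90, but fall short.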

If you're interested in reading the final paper or seeing the script used to collect our data, you can find it here (the majority of the credit for the writing goes to Echo Keif; Kevin and I were mostly involved in the data collection and statistics side).

This project was interesting because, as far as I know, there is still very little information out there about Promoted accounts. Wild stab in the dark this might be, but since it's an early stab in the dark, I think it still represents a contribution. Until Twitter makes information about Promoted accounts available via the API, broader efforts to understand who is promoted and what they gain from it will remain a very rough science.
