Unfashionably Economic: statistics

Thursday, May 24, 2012

Are Twitter Lists another social network?

Twitter has a "List" feature, which allows users to organize their followers, or view tweets from only a select group. Until recently, the number of lists following each Twitter account was visible on that person's homepage, but that changed in the most recent overhaul of the Twitter interface.

Lists are now much less visible in the average Twitter user's experience. This leads me to wonder, do Twitter lists follow the pattern of other social networks? Did the change even make any difference? With NetworkX and some fiddling around on the Twitter API, I was able to answer that question.

Promoted Accounts on Twitter, the Great Enigma

For a class project (CSS692/ECO895, Social Network Analysis) my group - Kevin May, Echo Keif and I - took on a project almost bigger than we could chew: identifying astroturf on Twitter. It turned out to be more ambitious than we realized, but even starting with a low level of technical sophistication we were able to find some interesting results.

What is astroturf? While most social movements are said to grow from the "grassroots", sometimes wealthy organizations will attempt a "cashroots" strategy instead - paying for people to spread a pre-chosen message. This has been a problem since the dawn of democracy, but social media has given many more opportunities for astroturfing.

The Truthy Project is one attempt to track how online memes spread, and distinguish authentic movements from fabricated ones. However, there still isn't much agreement on what an astroturfer looks like, compared to a genuine grassroots movement.

We focused on Twitter for our project. The recently unveiled Promoted Accounts feature, used by Twitter to generate revenue, might uncharitably be described as a tool for astroturfing. Promoted Accounts are put at the top of the "Who To Follow" list shown to each Twitter user, but otherwise not tracked or recorded in a publicly accessible way. Our goal was to identify common characteristics of Promoted Twitter accounts, and thereby develop a profile of what an astroturfer might look like.

Fun with Twitter Metrics

Using tweepy I've been looking at the characteristics of my Twitter following. I found these histograms pretty interesting.

The first shows how many people my followers are following and followed by. (The x-axis is the relevant number, the y-axis shows how many incidences of that number of friends/followers occur).

Followers (Blue) and Following (Green).

Next is the number of status updates posted. Looks like lots of my followers haven't tweeted much at all! It's a dilemma: do I unfollow them for being inactive? But, because the inactives aren't tweeting, they aren't flooding my timeline with stuff I don't want to read, either... Twitter-vanity might make me keep them, just so that whatever bot is running those accounts doesn't unfollow me.

Status Count.

Finally, number of favorite tweets by user. Lots of people don't seem to use the "Favorite" function of Twitter at all. I probably have fewer than 10 tweets I've marked as "Favorite" (it seems like such a commitment). It's nice to be able to tag a link or something worth going back to later, so I'm glad Twitter has this feature... even though it is, apparently, hardly used.

Favorite Tweets.

Then there are a few accounts at the far right of the distribution with lots of favorites. What's going on here?

Generally the distributions resemble a power law, which is not surprising when looking at social networks.
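
One rough way to put a number on "resembles a power law" is to estimate the exponent directly. The sketch below uses the standard maximum-likelihood estimator for a continuous power law on synthetic data. The sample, the exponent of 2.5, and the cutoff `x_min` are all made up for illustration; none of it comes from my actual follower data.

```python
import math
import random

def sample_power_law(n, alpha, x_min, rng):
    """Draw n values from a continuous power law p(x) ~ x**(-alpha), x >= x_min."""
    # Inverse-transform sampling: x = x_min * (1 - u) ** (-1 / (alpha - 1))
    return [x_min * (1.0 - rng.random()) ** (-1.0 / (alpha - 1.0)) for _ in range(n)]

def estimate_alpha(values, x_min):
    """Maximum-likelihood estimate of the exponent, using values >= x_min."""
    tail = [v for v in values if v >= x_min]
    return 1.0 + len(tail) / sum(math.log(v / x_min) for v in tail)

rng = random.Random(42)
# Synthetic stand-in for follower counts (hypothetical, not real Twitter data)
fake_follower_counts = sample_power_law(5000, alpha=2.5, x_min=1.0, rng=rng)
alpha_hat = estimate_alpha(fake_follower_counts, x_min=1.0)
print(f"estimated exponent: {alpha_hat:.2f}")
```

With real counts, the choice of `x_min` matters a lot, and a straight-looking histogram on log axes is suggestive rather than conclusive.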

Twitter metrics will be an ongoing project, so this is just the beginning. If you find this stuff interesting, check back in a few days.

Thursday, November 17, 2011

Statistical Fallacy #176: Ignoring Selection Effects

I stumbled on a post at a credit-related blog. It starts off with the bombastic first line:

"The average consumer is saddled with $29,985 in student loan debt..."

Wow! That's a lot of debt! It's true that the U.S. population has a giant amount of student loan debt -- even more so than the amount of credit card debt. Last year, I wrote about the subject. But, the figure above is pretty high. That statistic is drawn from "262,887 CreditKarma.com user scores." Sounds pretty robust. But, some simple math reveals there's more to the story.

Facts:

Total student loan debt in the U.S. is about $1 trillion (~$1,000,000,000,000).
The U.S. population is 308,745,538. Of that, 24% are under 18, leaving 234,646,609 adult consumers.

Do some division, and you'll find that the average adult consumer has $4,261.73 in student loan debt. That's about $25,000 less than the Credit Karma estimate!

What went wrong? My guess: selection effects. Members of a site specializing in credit advice are not a random sample of the population. People who join are probably concerned about their credit... and people who are concerned about their credit probably have a lot of debt.

Nothing personal against the writers for that site, as it would be an easy mistake to make (and they were very nice, even in response to my snarky comment pointing this out). But still, they should have been more careful. A quick test, multiplying their estimate of average debt by the number of adult consumers, implies that the U.S. would have a total of $7,035,878,570,865 in student loans outstanding, about seven times the real figure. If it were true, that would be about 11% of the entire world GDP owed by American students!
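
The division and the sanity check are easy to reproduce. This short sketch uses only the figures quoted in this post; the variable names are mine.

```python
total_student_debt = 1_000_000_000_000  # about $1 trillion, as quoted above
us_population = 308_745_538
share_under_18 = 0.24
credit_karma_average = 29_985

# Adult consumers: everyone who is not under 18
adults = round(us_population * (1 - share_under_18))

# Average student loan debt per adult, from the national total
average_per_adult = total_student_debt / adults

# Total debt the country would owe if the Credit Karma average were right
implied_total = credit_karma_average * adults

print(f"adults: {adults:,}")
print(f"average per adult: ${average_per_adult:,.2f}")
print(f"implied total: ${implied_total:,}")
print(f"implied / actual: {implied_total / total_student_debt:.1f}x")
```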

The lesson: look out for non-random sampling due to self-selection, or your numbers will be nonsense.
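
To see how self-selection alone can inflate an average, here is a toy simulation. Everything in it is an assumption made up for illustration: the population (80% owe nothing, the rest owe an exponentially distributed amount) and the rule that people with more debt are more likely to join a credit site. It is not a model of Credit Karma's actual users.

```python
import random

rng = random.Random(0)

# Hypothetical population: most adults owe nothing, a minority owe a lot.
population = [0.0 if rng.random() < 0.8 else rng.expovariate(1 / 20_000)
              for _ in range(100_000)]

def joins_site(debt):
    # Assumed behaviour: the more someone owes, the likelier they sign up.
    return rng.random() < min(1.0, 0.01 + debt / 100_000)

members = [d for d in population if joins_site(d)]

population_mean = sum(population) / len(population)
member_mean = sum(members) / len(members)
print(f"population mean:  ${population_mean:,.0f}")
print(f"site-member mean: ${member_mean:,.0f}")
```

The members' average comes out far above the population's, even though nobody's debt changed. Only the sample did.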

Sunday, November 6, 2011

Fun Facts about Microsoft Co.

Why would anyone bother reading shareholder reports? They're dry, long-winded, and functionally outdated by the time of arrival, so there's no way to profit from the information. Reasons for reading would have to include boredom, duress, or idle curiosity. It was the latter which led me to the Microsoft Annual Report for 2011. A few interesting facts pulled from that document:

Microsoft is divided into five segments. The Windows & Windows Live Division gets 75% of its revenue from selling Windows to computer manufacturers, to be pre-installed for end users. The remaining 25% comes from sale of miscellaneous hardware products and online advertising on Windows Live.
In the Windows Division, most growth over the last year was business sales (+11%) while consumer purchases went down (-1%). A substantial part of the drop in consumer PC sales was from netbooks (-32%).
Employee severance expenses were $59 million in 2010 and $330 million in 2009. Why the huge change? Microsoft: "In January 2009, we announced and implemented a resource management program to reduce discretionary operating expenses, employee headcount, and capital expenditures."
Research and Development costs took up 15% of Microsoft's revenue, or $9.0 billion, in 2011. That investment is well-protected -- by 26,000 U.S. and international patents, and another 36,000 pending.
Kinect for Xbox 360 is the fastest-selling consumer electronics device; confirmed by Guinness World Records.
If you'd bought $100 of Microsoft stock in June 2006, five years later it would be worth $122.71 (compare to $115.61 for the S&P Index, or $157.48 for the Nasdaq Computer Index).

What, if anything, does this say about the corporation and its future? Microsoft's product focus is split between entertainment/gaming and business services, while the company's prior breadwinner - bundling software with new PCs - is taking a back seat. As stated in a note from their CEO, Steven Ballmer: "increasingly, we will view ourselves as a devices and services company." It sounds closer to Mattel than the Evil Empire. Regardless, Microsoft's diverse selection of both patents and products provides a foothold to compete against intimidating rivals like Google, Apple, and Salesforce.com.

Wednesday, October 26, 2011

How to judge campus safety?

A few days ago I was emailed a pdf document: the 2011 Annual Security Report for George Mason University. As mandated by the Jeanne Clery Disclosure of Campus Security Policy and Campus Crime Statistics Act (yeah I hadn't heard of it before either) it provides a breakdown of all criminal activity which occurred on campus, by year, and with special columns for "Hate Crimes." The picture I attached has the numbers for Fairfax. This is the most interesting part of the document to me because it contains some raw figures on different offenses committed on the campus I attend. Statistics for the other George Mason campuses (Arlington, Prince William, Loudoun, etc.) are also available but are a lot less edifying, because the columns have just a bunch of zeroes. Coincidentally, Fairfax also happens to be the only campus with attached undergraduate housing -- make of it what you will.

The most exciting table I've seen since breakfast.

This report is obviously intended to increase public awareness about crime rates on campus, allowing potential students and their parents to make an informed decision when comparing different universities. What I wonder is, how does someone look at this report and get any sense of the probability that they themselves will be victimized? This blog post is a rough attempt at answering that question.

Some useful figures to get started with:

Roughly 30,000 students attend GMU, 7,000 of them living on campus.
From the BJS, around 35% of property crimes and 45% of violent crimes are reported to the police.
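
One way these figures could be combined into a rough personal risk estimate is sketched below. The reporting rates and student count are the ones listed above; the reported crime counts are hypothetical placeholders, NOT numbers from the GMU report, and the function name is my own.

```python
def estimated_annual_risk(reported_crimes, reporting_rate, population):
    """Rough per-person chance of being victimized in a year.

    Scales reported crimes up by the reporting rate to guess at the true
    count, then divides by the population at risk. It treats every crime
    as having one distinct victim, which overstates risk if victims repeat.
    """
    estimated_true_crimes = reported_crimes / reporting_rate
    return estimated_true_crimes / population

STUDENTS = 30_000
PROPERTY_REPORTING_RATE = 0.35  # share of property crimes reported (BJS)
VIOLENT_REPORTING_RATE = 0.45   # share of violent crimes reported (BJS)

# Hypothetical reported counts for illustration only
example_reported_property = 70
example_reported_violent = 9

property_risk = estimated_annual_risk(
    example_reported_property, PROPERTY_REPORTING_RATE, STUDENTS)
violent_risk = estimated_annual_risk(
    example_reported_violent, VIOLENT_REPORTING_RATE, STUDENTS)

print(f"property crime: about 1 in {1 / property_risk:,.0f} per year")
print(f"violent crime:  about 1 in {1 / violent_risk:,.0f} per year")
```

Whether 30,000 is the right denominator is a judgment call: commuters spend far less time on campus than the 7,000 residents, so their exposure is presumably lower.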

How to Gain Twitter-Fame for Penny Stock Advice, with no Skill, Knowledge (or Profits) Required.

Along with up-and-coming rappers, Bieber fans, and ad-bots there’s a rash of penny stock advice to be found on Twitter. At first I dismissed it as one of many eccentricities of the platform, but after seeing a few dozen assorted “penny stock” accounts I started to wonder. What could explain these accounts peddling advice on securities that most investors wouldn’t line a litter-box with?

So-called "penny stocks" may range in cost from a few dollars to a fraction of a cent. For example, instead of buying one share of IBM at $164.68, it would be possible to instead purchase 4,450 shares of Double Eagle Gold Holdings (DEGH) at $.037 per share (amusingly, both stocks are currently near their respective peak historical values). DEGH had been running at an average price of about $.003 for most of the last year. If an investor had a crystal ball and could foresee this recent ten-fold run up in price, there would have been a lot of money to be made; therein lies the temptation of penny stocks.

Of course, anyone who actually had that crystal ball and put it to use in the market would be far too rich to bother with running a Twitter account. So why are there hundreds of penny stock tweeters out there? To explain, here is a theory of how ANYONE can appear blessed with penny stock clairvoyance.

The Five-Step Guide to Achieving Twitter-fame with Penny Stock Advice:

Step 1: Pick out 100 penny stocks at random, and buy $10 worth in each of them for a total cost of $1,000 plus brokerage fees (or, if you’re cheap, just consistently follow the prices of 100 penny stocks).

Step 2: Wait. As is normal for inexpensive and highly volatile stocks, the price of some will go up dramatically and others down equally dramatically.

Step 3: Ignore the stocks that go down. Out of the 100, by random chance you’re almost assured to see one go up every now and then. Get on Twitter and brag about how well your picks in the stocks that went up are going.

Step 4: Construct self-promotional statistics to describe how well an investor could have done if they had known exactly when these volatile stocks would move up and down, then tweet about how anyone can generate “POTENTIAL 237% PROFITS!!!” based on your expert advice.

Step 5: Bask in fame and adulation. If you are lucky, people will buy a subscription to your newsletter. Or, if they follow your advice, it will drive up the price of penny stocks you own. Then sell off the penny stocks that went up due to your “wisdom” and leave your followers to eat the losses as the stock shifts back down.
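
Steps 1 through 3 can be sketched as a small simulation. The return model is purely an assumption: each stock's return is an independent log-normal draw with zero expected gain, so there is no skill anywhere in it. The volatility of 0.8 and the seed are arbitrary.

```python
import math
import random

rng = random.Random(7)

N_STOCKS = 100
STAKE = 10.0  # dollars per stock, as in Step 1
SIGMA = 0.8   # assumed volatility over the waiting period

# mu = -sigma^2 / 2 makes the expected gross return exactly 1 (no edge).
MU = -SIGMA ** 2 / 2
returns = [rng.lognormvariate(MU, SIGMA) - 1.0 for _ in range(N_STOCKS)]

# Step 2: wait, then see what the whole portfolio is worth.
portfolio_value = sum(STAKE * (1.0 + r) for r in returns)

# Step 3: ignore the losers and keep only the best performers to brag about.
winners = sorted(returns, reverse=True)[:5]
losers = sum(1 for r in returns if r < 0)

print(f"whole portfolio: ${portfolio_value:,.2f} on ${N_STOCKS * STAKE:,.2f} invested")
print(f"stocks that lost money: {losers} of {N_STOCKS}")
print("top 5 'picks':", ", ".join(f"{r:+.0%}" for r in winners))
```

Even with no edge at all, the best few out of 100 random picks look spectacular, which is all the material a tweet needs.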

I can’t verify that every penny stock tweeter uses this self-serving strategy. However, it’s the only way I can think of to make money off penny stocks, so I’d guess that a large share of those Twitter accounts have something like this in mind.

In the time it took me to write the above, DEGH – which I noticed as a result of a penny stock tweet – has dropped 35%. IBM, on the other hand, changed 0.30% in that hour. In a nutshell, this is why investing in penny stocks is probably not a good idea: you get all the risk of stock market speculation without much stake in any real value (or else why is the stock so cheap?). Markets tend to be efficient and integrate available information into stock prices, so when a stock costs a fraction of a cent, it’s probably because many people rate its investment value somewhere near a lottery ticket.

The DEGH rollercoaster, courtesy of Google Finance. Notice the peak, then sudden drop at the end.

To make matters even worse, even if you successfully buy low and sell high with penny stocks – a difficult proposition, given how quickly the values change – you’ll be eaten alive in brokerage fees. For the example above, even if one used a discount brokerage like Scottrade, each of the 100 purchases would carry a $7 flat fee – making a $1,000 investment cost a total of $1,700. It would take a crystal ball, extraordinary luck, or loads of self-serving information delivered to a mass audience in order to generate enough returns to cover that cost. When you see someone giving investment advice on Twitter, mentally ask which of those three categories you think they fall into.
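
The fee arithmetic, spelled out as a quick sketch. The $7 fee and the 100 purchases of $10 come from the example above. Charging the same flat fee again on each sale is my assumption, not something stated in the post.

```python
N_PURCHASES = 100
STAKE = 10.00
FEE_PER_TRADE = 7.00

invested = N_PURCHASES * STAKE           # money that actually buys stock
buy_fees = N_PURCHASES * FEE_PER_TRADE   # flat fee on every purchase
total_outlay = invested + buy_fees

# Return needed on the stock just to recover the purchase fees
break_even_buy_only = total_outlay / invested - 1

# Assumption: selling costs the same flat fee per position
sell_fees = N_PURCHASES * FEE_PER_TRADE
break_even_round_trip = (total_outlay + sell_fees) / invested - 1

print(f"total outlay: ${total_outlay:,.2f}")
print(f"break-even gain, purchase fees only: {break_even_buy_only:.0%}")
print(f"break-even gain, buying and selling: {break_even_round_trip:.0%}")
```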

Note: for entertainment purposes only. I’m not dispensing investment advice; the stocks named were solely for example purposes, not as endorsement. If you’re reading this and run a penny stock service I’m sure you’re the exception to the above, and love children, flowers, kittens and your advisees all equally and would never pull such a scam on them. I’m just writing about your competition. But I would awfully like to peek at your crystal ball sometime when you get a chance.

Monday, September 13, 2010

Statistical Fallacy #002: Confusing Correlation with Causation. Does a strong handshake really make you live longer?

Even highly educated and intelligent medical researchers aren't immune to statistical errors. A recent study in the British Medical Journal referenced 33 other studies on personal mobility and life expectancy, and compiled their results. According to Reuters,

They found simple measures of physical capability like shaking hands, walking, getting up from a chair and balancing on one leg were related to life span, even after accounting for age, sex and body size.

While the phrase "accounting for age, sex and body size" makes this process sound very objective and scientific, there are obviously a lot of other factors that can play a role in life expectancy. Personal differences, such as leading a more active lifestyle, could cause someone to have both more hand strength and also better health in general which contributes to their longevity.

While statistics saying "the death rate over the period of the studies for people with weak handshakes was 67 percent higher than for people with a firm grip" sound very dramatic, it's hard to say if that relationship is causal; in other words, having a weak grip may signal your lifespan will be short, but will improving your grip really make you live longer? Probably not, which suggests it's far more likely that a common variable - for example, sitting on the couch all day - causes both weak hands and a lower life expectancy.
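
A toy simulation shows how a common cause can produce this kind of correlation without any causal link. The model is entirely assumed: a hidden "activity" variable feeds into both grip strength and lifespan, and grip has no effect on lifespan at all. The coefficients are arbitrary.

```python
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

rng = random.Random(1)
n = 5000

# Assumed model: activity level drives BOTH grip strength and lifespan.
activity = [rng.gauss(0, 1) for _ in range(n)]
grip = [a + rng.gauss(0, 1) for a in activity]
lifespan = [75 + 3 * a + rng.gauss(0, 3) for a in activity]

observed = pearson(grip, lifespan)

# Intervention: everyone trains their grip to a level set independently of
# activity. Lifespan is untouched because grip never entered its formula.
trained_grip = [rng.gauss(2, 1) for _ in range(n)]
after_training = pearson(trained_grip, lifespan)

print(f"correlation as observed:       {observed:.2f}")
print(f"correlation after stress balls: {after_training:.2f}")
```

In this made-up world a study would find that firm grips predict long lives, and yet squeezing a stress ball would add nothing to anyone's lifespan.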

Common sense says that working with a stress ball or doing forearm exercises to develop a crushing handshake probably won't substantially reduce your chance of death from heart disease, cancer, stroke, or the other leading causes of death for adult Americans. However, this is exactly the impression given by the Reuters article title "Want to live longer? Get a grip!" Heaven forbid someone took this seriously and developed gorilla-like forearms only to find out their fitness investment had been in vain.

Tuesday, August 24, 2010

Statistical Fallacy #317: Holding Constant that which Changes. See: 'median household income'.

Alternately,

"Torture numbers, and they'll confess to anything."*

Our inquisitionist of the day hails from Bloomberg. In an article today, Venessa Wong wrote the following:

While many Americans dream of a windfall that will take care of their financial needs for life, the sobering reality is most of us are not getting far: U.S. Census Bureau data show median household income barely changed in the 10 years following 1998 as the price of housing and other goods increased. In consumer price index-adjusted dollars, the median household income in 2008 was $50,303, compared with $51,295 in 1998. [Emphasis added.]

What's wrong with the above? In a fairly common maneuver to paint a doom-and-gloom image of the times, average household size is treated as a constant to compare incomes over time. That just isn't the case.

The problem with using 'median household income' as a measuring stick is that it's actually a function of two other variables: combined income, and number of people per household. The latter aspect is conveniently overlooked by pessimists, who are looking to demonstrate a negative trend over time. When everything is considered, a different picture emerges.
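
To make the household-size point concrete, the sketch below uses only the two income figures from the Bloomberg quote. Household sizes are left as parameters because they are not given in the text above, and dividing a median income by an average household size is itself only a rough approximation.

```python
income_1998 = 51_295  # median household income, CPI-adjusted dollars
income_2008 = 50_303

# Change in income per household over the decade
household_change = income_2008 / income_1998 - 1

def per_person_change(size_1998, size_2008):
    """Change in income per household member, given average household sizes."""
    return (income_2008 / size_2008) / (income_1998 / size_1998) - 1

# How much would households need to shrink for income per person
# to be exactly flat despite the drop in household income?
break_even_shrink = 1 - income_2008 / income_1998

print(f"household income change: {household_change:+.2%}")
print(f"shrinkage in household size that offsets it: {break_even_shrink:.2%}")
```

If average household size fell by more than that break-even amount, income per person went up even as the headline figure went down.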
