DiggStatus: Digging Deep Into Digg Users' Statistics -- Page 5
December 31, 2006
The Experiment
DiggStatus Was Born
Page 2: Submission/Promotion Graph
Do Top Users Control the Front Page?
Details on Top 5,000 Submitters
Page 3: Lame Users
Does the Average User Even Try?
Counting Inactivity
Front Paging Without Friends
Page 4: DiggStatus Usage
Most Queried User Names
Queries Per IP Address
Obligatory Traffic Graph
Page 5: Summary
Conclusions
Most Digg Users Are Pretty Lame
Dollars and Cents
Conclusions
I went into the DiggStatus project with a few expectations. I was fairly certain it would hit the front page, and it did. I was expecting
to end up with tons of data, and I did. I was hoping I would walk away from the data with some interesting discoveries, and -- in my opinion
-- I did that as well. I would love to get my hands on the statistics for the entire Digg user base, but at the moment, that goal is out of
reach (unless I decide to write a script to scrape the entire site, but I don't typically launch DoS-like scripts on my favorite sites).
There are a few conclusions I would like to state outright, even though anyone who read through this entire article has probably reached them already.
Is there any evidence that the Digg algorithm favors Top Users? No. Is there evidence that there are factors contributing to the higher success rate of Top Users? Yes. As the graph on Page 2 clearly illustrates, the promotion ratio of a user's stories is generally inversely proportional to the number of stories he or she submits. However, the 200 users who submit the most stories have significantly more users befriending them, and at that point the declining ratio trend is broken. Users who have more people befriending them have much higher promotion ratios than users who don't. This evidence shows that Top Users have more exposure, but not necessarily more weight. It is also useful to note that the average promotion ratio for the users who submit the most stories is noticeably higher, but not significantly higher.
Let me show you the type of MySQL queries I used for the graph on Page 2. I am using an older version of MySQL that doesn't support subqueries, so this is definitely not the most efficient way of finding this information with MySQL versions 4.1 and higher. Basically, what I did was create a temporary table for each block of 100 users, then select the averages from that set.
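$query1 = "create temporary table tmpTable
select storiessubmitted, promotedstories, befriended
from tblDiggStatus where username != 'kevinrose'
order by id limit $offset, 100";
// $offset is a placeholder name for the start of the current block of 100
// users (0, 100, 200, ...); the variable in the original listing was lost.
// The opening line of $query1 is inferred from $query2 and $query3.
$query2 = "select avg( promotedstories / storiessubmitted ) as ratio,
avg( storiessubmitted ) as submitted, avg( befriended ) as avgbefriended
from tmpTable";
$query3 = "drop table if exists tmpTable";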
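For anyone who would rather do the averaging outside MySQL, the same block-of-100 idea can be sketched in a few lines of Python. This is an illustration, not the code behind the graph: the function name `block_averages` and the tuple layout are assumptions, and users with zero submissions are left out of the ratio average, which mirrors how MySQL's avg() skips the NULL that a division by zero produces.

```python
def block_averages(rows, block_size=100):
    """Average promotion ratio, submissions and befriended count per block.

    rows: list of (storiessubmitted, promotedstories, befriended) tuples,
    already sorted the way the query sorts them (by id).
    """
    results = []
    for start in range(0, len(rows), block_size):
        block = rows[start:start + block_size]
        # Skip zero-submission users for the ratio only, as SQL AVG skips NULL.
        ratios = [promoted / submitted
                  for submitted, promoted, _ in block if submitted > 0]
        results.append({
            "ratio": sum(ratios) / len(ratios) if ratios else None,
            "submitted": sum(r[0] for r in block) / len(block),
            "avgbefriended": sum(r[2] for r in block) / len(block),
        })
    return results


# Made-up sample rows, only to show the shape of the output.
sample = [(10, 5, 3), (20, 5, 1), (0, 0, 0), (4, 1, 2)]
for row in block_averages(sample, block_size=2):
    print(row)
```

Note that this averages the per-user ratios, as the query does, rather than dividing total promotions by total submissions; the two can differ quite a bit when a block mixes heavy and light submitters.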
Another interesting note about the Top Submitters from the graph on Page 2 is that even if their average promotion ratio were lower than the trend would predict, they would still dominate the front page because of the sheer bulk of their submissions. Many average users assume the Top Users can get any article to the front page. This is quite far from the truth. Only 8 of the Top 100 ranked Digg users have a promotion ratio higher than 50% (one of them is 'kevinrose' at 100%). This means that over 90% of the Top 100 Digg users fail at least half the time!
This is the MySQL query I used to find the Top 100 users that have a promotion ratio over 50%:
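-- The SELECT list before promotedstories is reconstructed from the result columns below.
SELECT username, overallranking,
ROUND(promotedstories/storiessubmitted*100, 2) AS ratio,
promotedstories, storiessubmitted FROM tblDiggStatus
WHERE overallranking > 0 AND overallranking < 101 AND (promotedstories/storiessubmitted*100) > 50
ORDER BY overallranking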
This is the result of that query:
+-----------------+----------------+--------+-----------------+------------------+
| username        | overallranking | ratio  | promotedstories | storiessubmitted |
+-----------------+----------------+--------+-----------------+------------------+
| p9s50w5k4gud2c6 |              3 |  50.67 |             752 |             1484 |
| webtech         |             13 |  53.94 |             411 |              762 |
| darkhack        |             16 |  60.99 |             358 |              587 |
| webtickle       |             29 |  70.14 |             242 |              345 |
| kevinrose       |             33 | 100.00 |             191 |              191 |
| sahaskatta      |             53 |  60.10 |             116 |              193 |
| danhuard        |             84 |  58.65 |              61 |              104 |
| frgmstr         |            100 |  82.26 |              51 |               62 |
+-----------------+----------------+--------+-----------------+------------------+
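As a quick sanity check, the ratios and the "over 90%" figure can be recomputed from the counts in the result. The snippet below is only a sketch that uses the numbers shown above; the variable names are made up for illustration.

```python
# (username, promotedstories, storiessubmitted) copied from the query result.
top_over_50 = [
    ("p9s50w5k4gud2c6", 752, 1484),
    ("webtech", 411, 762),
    ("darkhack", 358, 587),
    ("webtickle", 242, 345),
    ("kevinrose", 191, 191),
    ("sahaskatta", 116, 193),
    ("danhuard", 61, 104),
    ("frgmstr", 51, 62),
]

# Recompute each promotion ratio as a percentage.
for name, promoted, submitted in top_over_50:
    ratio = promoted / submitted * 100
    print(f"{name:<16} {ratio:6.2f}")

over_half = len(top_over_50)              # users above 50% in the Top 100
share_failing = (100 - over_half) / 100   # fraction at or below 50%
print(f"{share_failing:.0%} of the Top 100 are at or below 50%")
```

With 8 users above 50%, the remaining 92 of the Top 100 see at least half of their submissions fail to reach the front page.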
Most Digg Users Are Pretty Lame
The purpose of Digg is clear. The content is user-generated. The point of being a Digg user is to contribute and/or to vote. Out of the 10,000+
users in my database, the 2,746 users who have not submitted a single story have dugg a total of almost 1,000,000 stories. There are three main
types of users: those that contribute; those that vote, but don't contribute; and those that neither contribute nor vote.
I feel this query is a reasonable baseline for measuring the number of people who contribute:
Unfortunately, by this measurement, only 1,197 of the 10,691 users (about 11%) are "contributors". Even more unfortunate is the fraction of users who are neither voters nor contributors. This group can be aptly named the "leeches". Here is the query:
This query determines the number of people who digg fewer than an average of one story per day and who also submit an average of zero or one stories per month. The result is 4,773, meaning nearly half (about 45%) of the users in my database are "leeches". Instead of complaining about how Top Users get on the front page so often, people should complain about the staggering number of users who don't contribute anything to Digg. If every Digg user submitted one story per month more than they usually do, the effect would trivialize the debate about Top Users.
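The leech rule described above can be written out as a small predicate. This is a sketch, not the query that produced the 4,773 figure: the function name, its arguments, and the 30-day month are all assumptions made for illustration, since the table columns behind the real query are not shown here.

```python
def is_leech(diggs, submissions, days_registered):
    """True if the user diggs fewer than one story per day on average
    and submits at most one story per month on average."""
    if days_registered <= 0:
        return False  # no history to average over
    diggs_per_day = diggs / days_registered
    # Assumption: a "month" is treated as 30 days.
    submissions_per_month = submissions / (days_registered / 30)
    return diggs_per_day < 1 and submissions_per_month <= 1


# Shares quoted in the text, recomputed from the raw counts.
total_users = 10691
print(f"contributors: {1197 / total_users:.1%}")
print(f"leeches:      {4773 / total_users:.1%}")
```

For example, a hypothetical user registered for a year with 100 diggs and 2 submissions counts as a leech under this rule, while one with 1,000 diggs over the same year does not.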
Dollars and Cents
Well, mostly cents.
I was rather pleased to find out that an advertisement on a front page submission does not provide a significant boost in ad revenue. If I had a new DiggStatus every month for a year, I could pay for my web hosting for that year! Either sploggers are really cheap and get their jollies from earning an unethical $20 here and there, or they try to submit a high volume of splog links hoping they will all add up to a reasonable sum. The next time you complain about a splog on Digg, think first about the wage that individual made creating that page and submitting the link. Even if it was a quick copy and paste job from the original source, they likely profited about as much per minute of their time as a Taco Bell shift manager. If a splogger spends more than a couple of hours researching, writing (cough-copying-cough), and promoting their splog, they would be better off making chalupas for me. At least they would be serving society in some way, right? ;-)
Yes, I did put a text ad on each of the pages of this article, but rest assured that I will be compensated very poorly considering the amount of time I spent on the DiggStatus experiment.
The End. Now Go Digg Something!
Feedback is always welcome: brian@shaler.name
This site is powered by Brian's keyboard.


