
Hello world. It’s been a while since the Web Ecology community last made a peep on the web. Some had been speculating that we had simply up and disappeared, but reports of our demise were greatly exaggerated, as they say.
Thanks to the completely amazing work of our affiliate researchers at Bennington, we’re glad today to announce the public launch of 140Kit, Web Ecology’s very own free-to-use toolkit for exploring and data mining Twitter. It’s the final product of the various provisional tools we’ve used to produce our previous reports on the social phenomena of Twitter, and of lead researcher Devin Gaffney’s own work on high throughput humanities.
So what does it do? Notably:
Best of all, we are making this platform open and free to use for all interested users. This includes opening up an API for queries of all sorts, an honest, open, and editable codebase, and plans already in the works to make the program extensible to allow developers to write their own analytics for the kit, on whatever sort of metrics in whatever programming language (stay tuned for details).
So, get in there and play, people. And let us know if you have any questions: write to tim.hwang@webecologyproject.org or contact@webecologyproject.org.
This paper represents an initial study of ChatRoulette.com, conducted between February 6th and 7th, 2010 by researchers in attendance at Web Ecology Camp III in Brooklyn, NY. We sampled 201 ChatRoulette sessions, noting characteristics such as group size and gender. We also conducted 30 brief interviews with users to inquire about their age, location, and frequency of ChatRoulette use.
Summary
• ChatRoulette represents an example of a probabilistic community: a community shaped by a platform which mediates the encounters between its users by eliminating lasting connections between them.
• After ChatRoulette users become more acquainted with the system (i.e., do not browse solely to explore), we predict a decrease in explicit content, an increase in the consolidation of content genres, and an increase in the formation of celebrity figures.
• Our survey shows that ChatRoulette’s current community continues to consist of males aged 18-24, consistent with Alexa data.
You can download our report here.
One of the tenets of Web Ecology is accessibility to the field through open tools and open data. At the Web Ecology Project, we’re working to get more of our code into a clean, commented, and releasable state. The first tool that we have queued up for release is a Python module allowing easy use of Google Language Tools. It covers language detection and translation, with transliteration in an experimental state (Google has not yet released the API spec for the transliteration portion, so that part was reverse-engineered).
Now for some sample uses of the tool:
>>> from googlelanguage import *
>>> print lang_detect("this is a sentence in English")
{'isReliable': True, 'confidence': 0.31734600000000002, 'language': 'en'}
>>> print lang_translate("comment dit on 'WebEcology' en francais?", dest_lang="en")
{'translatedText': "how it says 'WebEcology' in French?", 'detectedSourceLanguage': 'fr'}
We used it ourselves to detect the language of each tweet in a sample of 1 million tweets from our database, with the following results:
We’ve also found it easy to combine the tool with SQLAlchemy to create metadata tables with linguistic information.
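As a rough illustration of the metadata-table idea, here is a minimal sketch that stores one language row per tweet and then tallies the results. It is not the code we used: it is written for Python 3, it swaps in the standard-library sqlite3 module for SQLAlchemy so that it runs without extra dependencies, and `fake_lang_detect`, the `tweet_language` table, and the sample tweets are all hypothetical stand-ins. The stand-in only mimics the shape of the dictionary that `lang_detect` returns above.

```python
import sqlite3


def fake_lang_detect(text):
    """Hypothetical stand-in for lang_detect; returns the same dict shape."""
    # Toy rule for illustration only; real detection would call the module.
    french_markers = ("bonjour", "merci", "comment")
    is_french = any(word in text.lower() for word in french_markers)
    return {
        "language": "fr" if is_french else "en",
        "confidence": 0.5,
        "isReliable": True,
    }


def build_language_table(conn, tweets, detect=fake_lang_detect):
    """Create a metadata table holding one language row per tweet."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS tweet_language ("
        "tweet_id INTEGER PRIMARY KEY, language TEXT, "
        "confidence REAL, is_reliable INTEGER)"
    )
    rows = []
    for tweet_id, text in tweets:
        result = detect(text)
        rows.append((tweet_id, result["language"], result["confidence"],
                     int(result["isReliable"])))
    conn.executemany(
        "INSERT OR REPLACE INTO tweet_language VALUES (?, ?, ?, ?)", rows
    )
    conn.commit()


def language_counts(conn):
    """Tally tweets per detected language, counting reliable detections only."""
    cursor = conn.execute(
        "SELECT language, COUNT(*) FROM tweet_language "
        "WHERE is_reliable = 1 GROUP BY language"
    )
    return dict(cursor.fetchall())


if __name__ == "__main__":
    sample_tweets = [  # made-up examples, not rows from our database
        (1, "this is a sentence in English"),
        (2, "bonjour tout le monde"),
        (3, "merci beaucoup"),
    ]
    connection = sqlite3.connect(":memory:")
    build_language_table(connection, sample_tweets)
    print(language_counts(connection))
```

Keeping the detector as a parameter means the stand-in could be replaced with the real `lang_detect` without touching the table code, assuming the returned keys stay the same.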
It is our hope that this small, MIT/X11-licensed release will prove useful to some in the Web Ecology community. Until we figure out which platform we’re going to use for open repository hosting, you can download the file here. And if you would like to contribute patches or additions, or if you have any questions, feel free to send them to Jon.Beilin@webecologyproject.org.
I would also like to thank Sam Gilbert for his invaluable contributions, feedback, and support.
Afghan citizens went to the polls on August 20, 2009 after a controversial delay recommended by Afghanistan’s Independent Election Commission to allow ample time to prepare for fair and safe elections. Incumbent President Hamid Karzai was favored to win the election amid a large pool of contending candidates, with the most serious challenge coming from former Foreign Minister Abdullah Abdullah. In pre-election polling, Abdullah gained significant momentum as election day drew nearer and other candidates dropped their campaigns.
In a clear reference to the protests following the June presidential election in Iran, Abdullah’s campaign manager was quoted predicting street violence if Abdullah doesn’t win. Here at the Web Ecology Project, we wondered if Twitter would play as significant a role in reporting the election as it did in Iran. In a country where mobile phone subscriptions add up to an estimated 50% of the population, but internet access was roughly 1.5% at last estimate with the status of network expansion [pdf] unclear, could the available ICT infrastructure and awareness of social media prompted by the “twitter revolution” in Iran enable a similar phenomenon post-August 20?
Using a new methodology based on the content and responses of 12 popular users, we determined measurements of relative influence on Twitter. We examined an ecosystem of 134,654 tweets, 15,866,629 followers, and 899,773 followees, and in response to the 2,143 tweets generated by these 12 users over a 10-day period, we collected 90,130 responses published by other users.
Summary of Findings
An analysis of our methodology and statistics suggests that on Twitter, among various configurable conclusions:

A larger version with more temporal depth is linked at the bottom of this report.
We would also like to thank Jon Beilin, Mac Cowell, and Tim Hwang for their invaluable contributions, feedback, and support.
10 Days of Influence Tracked by Density of Responses (2993.27 KB jpg)
When deciding whether someone is worth following or talking to on Twitter, most of us make a snap judgment based on a user’s follower count, but what does this really tell us?
For our fourth publication, the Web Ecology Project decided to move beyond follower count to find a better way to measure influence on Twitter. Focusing in on a handful of celebrities, news outlets, and social media experts widely perceived to be among the Twitter elite, we looked at the extent to which each of these users can:
The results, taken from 10 days of Twitter activity (August 15th through August 24th), were surprising. Consider how the users we looked at rank by follower count:
When you look at the extent to which any given tweet can spread content or foster conversation, these rankings change significantly:
iJustine, for example, can spread more content than MCHammer, who has over twice as many followers, and, in terms of generating conversation, celebrities like THE_REAL_SHAQ and aplusk dominate news outlets and social media experts alike.
When you look at how much each of these users is able to generate conversation and spread content relative to their follower counts, however, the rankings shift even more dramatically:
Values for MCHammer, aplusk, THE_REAL_SHAQ and CNNbrk plummet, while the social media experts, especially Chris Brogan, become powerful players.
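To make the shift between the two views concrete, here is a minimal sketch of one way to compare raw response counts with responses relative to follower count. It is an assumption about the general shape of such a calculation, not the metric used in our report: the account names and every number are invented, and `responses_per_tweet`, `responses_per_tweet_per_1k_followers`, and `rank` are hypothetical helper names.

```python
def responses_per_tweet(account):
    """Average number of responses each tweet attracts."""
    return account["responses"] / account["tweets"]


def responses_per_tweet_per_1k_followers(account):
    """The same average, scaled by audience size (per 1,000 followers)."""
    return responses_per_tweet(account) / (account["followers"] / 1000)


def rank(accounts, score):
    """Return account names ordered from highest to lowest score."""
    return [a["name"] for a in sorted(accounts, key=score, reverse=True)]


# Invented figures for illustration only; not data from the report.
accounts = [
    {"name": "celebrity", "followers": 2000000, "tweets": 100, "responses": 20000},
    {"name": "news_outlet", "followers": 1500000, "tweets": 200, "responses": 10000},
    {"name": "expert", "followers": 50000, "tweets": 300, "responses": 9000},
]

# Raw view: who draws the most responses per tweet.
print(rank(accounts, responses_per_tweet))
# Relative view: who draws the most responses for the size of their audience.
print(rank(accounts, responses_per_tweet_per_1k_followers))
```

With these made-up numbers, the account with the smallest audience moves from last place in the raw view to first place in the relative view, which is the kind of reversal described above.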
These figures are just a taste of what’s to come. In our full report, we’ll unpack these numbers further and explore the somewhat surprising nuances and types of influence on Twitter.
By Elsa Kim and Sam Gilbert
with Michael J. Edwards and Erhardt Graeff
Michael Jackson’s death created an emotional outpouring of unprecedented magnitude on Twitter. In this report, we examine 1,860,427 tweets about Jackson’s death in order to test various methods of sentiment analysis and gain insights into how people express emotion on Twitter.
Key findings
We would like to thank Jonathan Beilin, Evan Burchard, David Fisher, Tim Hwang, Alex Leavitt, Dharmishta Rood, Max van Kleek, Jue Wang, and Seth Woodworth for their invaluable feedback and support.
Like the web itself, the study of the web is mostly an improvised structure. A group of progressive scholars, swept up by the technological transformation of the past decade, have done their best to keep up with understanding the massive cultural and social effects of our communication infrastructure.
Not surprisingly, the body of research about the web is fatally fragmented as a result. Economists are caught attempting to assert dated models against new motivational frameworks. Journalists attempt to prescribe weak methods to maintain traditional standards around the creation and transfer of information. Marketers and social media experts, still largely divorced from a universe of quantitative and technical research, fail to provide a useful approach. No coherent body of research has emerged focusing on studying the internet as the internet.
This has resulted in fundamental weaknesses in the approach to studying social phenomena online. Relevant approaches are being ignored and opportunities for applying cutting edge research from a number of siloed traditions are going unexplored.
Key Findings