Alex Leavitt – Web Ecology Project

Archiving Internet Subculture: Encyclopedia Dramatica

alexleavitt — Sat, 16 Apr 2011 01:05:49 +0000

EDIT: Pointed out in comments that we wrote 4901 articles instead of actual total of 9401 articles.

The Web Ecology Project is dedicated to the preservation of digital culture and folklore. In a recent talk about the Archive Team, Jason Scott elucidated the usual strategy that companies employ for dealing with digital artifacts, platforms, and communities:

Disenfranchise. Cut off any amount of support or awareness by users of their environment and what they are putting their lives into.

Demean. When a site falls out of favor, act like it’s an electronic ghetto, not worth consideration as a valid entity. Think Friendster, orkut, myspace, geocities and a dozen others. Say their name in the company of people who understand the technical issues, and they snort. For a lot of people, these sites are parties, and the party is over.

Delete. Give a random amount of warning, and I mean, it really is completely arbitrary and made up, and then delete, with no recourse, nobody to ask for a copy, nobody to contact to retrieve your lost data, your husband’s history, your child’s photos. I’ve seen periods as long as a year and as short as 48 hours. There’s nothing, no standardization, no agreed upon procedure for decommissioning these sites. It’s all just being made up as it goes along.

Recently, Encyclopedia Dramatica (ED), a wiki dedicated to the archiving of -chan subculture, celebrity, and the lulz, was removed from its servers with no effort to preserve the information contained within. While it has been replaced with a new wiki, we at the Web Ecology Project remain disheartened that no opportunity to preserve ED was offered and no warning was given.

Luckily, during a recent Web Ecology Camp in mid-February 2011, researchers Seth Woodworth and Alex Leavitt — during a scoping session for a project on Anonymous and Operation Payback — scraped ED and downloaded the textual elements of the wiki. We currently possess .txt files detailing the wiki markup used in the 9401 pages of ED (total at the time of collection), including links and records of images (though we do not possess the actual image files; we also do not have the edit histories, discussion pages, or user pages).

Taking a cue from Archive Team, “we are going to rescue your shit.” For the betterment of culture and research, you can find a link to a .zip that contains all 9401 .txt files, the archive of Encyclopedia Dramatica, below.

http://webecology.net/ED_archive.zip

ChatRoulette

alexleavitt — Mon, 01 Mar 2010 14:00:07 +0000

An Initial Survey

by Alex Leavitt & Tim Hwang

with Patrick Davison, Mike Edwards, Devin Gaffney, Sam Gilbert, Erhardt Graeff, Jennifer Jacobs, Dan Luxemburg, Kunal Patel, Mike Rugnetta, & Karina van Schaardenburg

This paper represents an initial study of ChatRoulette.com, conducted between February 6th and 7th, 2010 by researchers in attendance at Web Ecology Camp III in Brooklyn, NY. We sampled 201 ChatRoulette sessions, noting characteristics such as group size and gender. We also conducted 30 brief interviews with users to inquire about their age, location, and frequency of ChatRoulette use.

Summary

• ChatRoulette represents an example of a probabilistic community: a community shaped by a platform which mediates the encounters between its users by eliminating lasting connections between them.

• After ChatRoulette users become more acquainted with the system (i.e., do not browse solely to explore), we predict a decrease in explicit content, an increase in the consolidation of content genres, and an increase in the formation of celebrity figures.

• Our survey shows that ChatRoulette’s current community continues to consist of males aged 18-24, consistent with Alexa data.

You can download our report here.

The Influentials

Web Ecology Research — Wed, 02 Sep 2009 16:20:09 +0000

New Approaches for Analyzing Influence on Twitter

By Alex Leavitt
with Evan Burchard, David Fisher, & Sam Gilbert

Using a new methodology based on the content and responses of 12 popular users, we determined measurements of relative influence on Twitter. We examined an ecosystem of 134,654 tweets, 15,866,629 followers, and 899,773 followees, and in response to the 2,143 tweets generated by these 12 users over a 10-day period, we collected 90,130 responses published by other users.

Summary of Findings

An analysis of our methodology and statistics suggests that on Twitter, among various configurable conclusions:

mashable is more influential than CNN.
sockington is more influential than MCHammer, while MCHammer is more influential than three major social media analysts (garyvee, Scobleizer, and chrisbrogan).
Celebrities with higher follower totals (e.g., THE_REAL_SHAQ and ijustine) foster conversation more than they provide retweetable content.
News outlets, regardless of follower count, influence large amounts of followers to republish their content to other users.

A larger version with more temporal depth is linked at the bottom of this report.

We would also like to thank Jon Beilin, Mac Cowell, and Tim Hwang for their invaluable contributions, feedback, and support.

The Influentials (pdf)

10 Days of Influence Tracked by Density of Responses (2993.27 KB jpg)

The Folly of Following Followers: Judging Influence on Twitter

As a simple online platform for conversation, Twitter is an ideal ecological system through which we can understand the relationship between users and their environments on the Web. Especially compared to other social networks, Twitter simplifies most of the extraneous features and boils down its environment to people and content. The unusual simplicity of Twitter, though, continues to warp perception of how the relationship between user and platform operates. Many of the popularized studies examining influence on Twitter fail to identify the nuances of social interaction in the system. While attempts have been made (e.g., http://twinfluence.com/about.php), the analyses tend to focus on the connections between users rather than the relationship of users, content, and platform. This report therefore aims to supplement previous investigations of the Twitter environment with more comprehensive data sets to enhance new approaches to understanding the concept of “influence” on social networks.

A focus solely on the connections between users skews an understanding of how influence operates and flows on Twitter. A popular metric of perceived influence on Twitter measures the quantity of a user’s followers. In general, the more followers a user possesses, the more impact he appears to make in the Twitter environment, because he seems more popular (that is, many users follow him). This statement makes sense assuming that Twitter acts as a successful broadcast medium, where a user publishes a tweet and it is read by every follower. However, this view of Twitter as a broadcast medium ignores the potential for users to interact with the content on the platform.

A similar and equally popular metric to measure influence on Twitter relies on the ratio between the number of a user’s followers and the number of other people that the user follows (his audience, or as we designate in this report, followees). This ratio, while better than the former method of counting followers, is still imprecise. Again, a ratio based on audience ignores the ability for a user to interact with content on the platform. However, the ratio of followers to followees does inform a better understanding of how influence can operate in Twitter’s environment.

The ratio of followers to followees may communicate the intended purpose or emergent practices of a user. For example, if the ratio approaches infinity (high follower total versus low followee total), the user account might be described as focusing on the material aspect of Twitter. By material, we mean a compulsion toward moving content to other users in the environment. In another instance, if the ratio approaches 1 (an equal or near-equal amount of followers and followees), the user might be categorized as a conversationalist. The user most likely follows back a majority of his followers, to retain familiarity with more personal conversations. Contrarily, the materialistic user aims to collect followers as contacts to whom the user may push content (who may then share the same content with other users). Finally, if the ratio approaches zero (low follower total versus high followee total), we might categorize the user as a spammer. As an emergent behavior, the stereotypical spammer attempts to collect users with the intent to push content to as many people as possible after achieving a high follower tally. However, most contemporary users can spot the stereotypical behavior of a spammer or bot, resulting in the low follower total on the spammer’s account.
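The three cases above can be sketched as a small classifier. This is only an illustration: the cutoffs `high` and `low` are assumptions of ours (the report describes the limiting cases of the ratio but names no thresholds), and the follower and followee figures are taken from the report's table of 12 users.

```python
from dataclasses import dataclass


@dataclass
class Account:
    username: str
    followers: int
    followees: int


def classify_by_ratio(account, high=10.0, low=0.1):
    """Label an account from its follower-to-followee ratio.

    `high` and `low` are illustrative cutoffs, not values from the report.
    """
    if account.followees == 0:
        # The ratio is unbounded, i.e. the "approaches infinity" case.
        return "materialistic"
    ratio = account.followers / account.followees
    if ratio >= high:
        return "materialistic"      # many followers, few followees
    if ratio <= low:
        return "spammer"            # few followers, many followees
    return "conversationalist"      # ratio near 1


accounts = [
    Account("cnnbrk", 2_712_530, 18),
    Account("chrisbrogan", 94_715, 88_431),
    Account("BarackObama", 2_018_016, 761_851),
]
labels = {a.username: classify_by_ratio(a) for a in accounts}
```

Moving either cutoff changes which accounts fall into the middle band, so the labels should be read as rough user types rather than measurements.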

While the follower-to-followee ratio does not represent an accurate measurement of influence on Twitter, it does help distinguish types of users. Before we apply these types to our understanding of online influence, we must first define influence.

Defining Influence on Twitter

An attempt to define a universal concept of influence on the Web remains difficult, because we must account for the variations of platforms, fluidity of environments, and evolving behaviors of users online. Because each platform is different, this report will rely on a definition of online influence specific to the environment of Twitter. Therefore, we define influence on Twitter as the potential of an action of a user to initiate a further action by another user. The term user is defined by Twitter’s platform. The term action deserves further explanation.

Understanding the term action as it relates to influence on Twitter depends on the fundamental structure of ideas in the environment and how these ideas move. The fundamental unit of content on Twitter is the tweet (a user may type up to 140 characters and publish them to the web interface), so an action on Twitter comprises all interactions of a user and that unit of content (tweet). While we can analyze various types of influential actions (e.g., a view on YouTube or a like on Facebook), this report will primarily focus on actions specific to Twitter. Our analysis of influence on Twitter, then, relies on the understanding of how actions shape behavior on the platform.

Influence as Actions; Actions as Responses

While actions on Twitter comprise both those interactions recognized by the platform as well as unexpected emergent behaviors that become widely used by users, Twitter recognizes two actions intrinsic to the system that can occur: the reply and the retweet.

Reply: @username {content}
Example:
@chrisbrogan Thanks for this. I’m new to twitter and it was really helpful
Digitaltonto (on 2009-08-15 at 00:47:17)

Retweet: RT @username {content}
RT @aplusk great article thank U RT @Morgan_Johnston: this great article on health care by Whole Foods cofounder/CEO
cheerok (on 2009-08-15 at 00:31:10)

The reply and retweet are categorized as actions because they are applied by a user to a piece of content. The reply acts as a response to another user’s tweet using new content, while the retweet operates as a citation or paraphrase of another user’s previous content. While both actions have different purposes, both are meant to move content to other users (albeit in differing ways). If a reply or retweet exists with respect to a given tweet, the actions are evidence for influence that has occurred. A reply occurs because a user is influenced to reply to the content; a retweet occurs because a user is influenced to reproduce the content. Literally, the actions are markers of influence.

Two other actions that appear frequently on Twitter, extrinsic to the system yet popular enough to have become adopted by users, require explanation: the mention and the attribution.

Mention: {content} @username ({content})
Watching @BarackObama speak in Colorado on @CNN
RareAir24 (on 2009-08-15 at 19:08:51)

Attribution: {content} via @username ({content})
Fire at Kuwaiti wedding kills dozens, official media says http://bit.ly/wn95A (via @cnnbrk)
ChilliGaz (on 2009-08-15 at 19:40:18)

Similar to the reply and the retweet, the mention and the attribution are categorized as actions because they too are applied by a user to a piece of content. We have separated the mention and the attribution from the more fundamental reply and retweet because the former two actions are not officially recognized by the Twitter platform. In fact, a mention is similar to a reply, except a mention occurs at some point in the tweet other than at the beginning. Comparably, an attribution is similar to a retweet, except an attribution borrows the symbology of the reply to provide a citation for previously published content. We must also note here that, first, while we distinguish the attribution from the mention, we have calculated them from the same database query. Any measurement in this report of mentions also encapsulates attributions; however, we will distinguish the attribution as separate from the mention later in the paper (by tallying it alongside retweets in certain equations). Second, since mentions theoretically serve the purpose of replies, and attributions the purpose of retweets, we have not expounded upon their use in the explanation of influence in the following paragraphs. However, we can hypothesize that the applications of replies include mentions and the applications of retweets include attributions.
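The four patterns described above can be sketched with regular expressions. This is a minimal illustration, not the query used for this report: the function name, the regular expressions, and the order in which the checks are applied are all assumptions, and the sketch separates attributions from mentions even though the report tallied them from a single database query.

```python
import re

# Patterns follow the templates given above:
#   Reply:       @username {content}
#   Retweet:     RT @username {content}
#   Attribution: {content} via @username
#   Mention:     {content} @username ({content})
REPLY = re.compile(r"^@(\w+)")
RETWEET = re.compile(r"\bRT\s+@(\w+)")
ATTRIBUTION = re.compile(r"\bvia\s+@(\w+)", re.IGNORECASE)
MENTION = re.compile(r"@(\w+)")


def classify_action(text):
    """Return the action type of a tweet, or None if it names no user."""
    text = text.strip()
    if REPLY.match(text):
        return "reply"
    if RETWEET.search(text):
        return "retweet"
    if ATTRIBUTION.search(text):
        return "attribution"
    if MENTION.search(text):
        return "mention"
    return None


examples = [
    "@chrisbrogan Thanks for this. I'm new to twitter and it was really helpful",
    "RT @aplusk great article thank U RT @Morgan_Johnston: this great article",
    "Watching @BarackObama speak in Colorado on @CNN",
    "Fire at Kuwaiti wedding kills dozens, official media says http://bit.ly/wn95A (via @cnnbrk)",
]
kinds = [classify_action(t) for t in examples]
```

Because a tweet can match more than one pattern, the order of the checks decides the label; a different order would be an equally defensible choice.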

Categorizing Actions: Conversation & Content

In the second-to-previous paragraph, we hint at a similar categorization for actions that we previously applied to users. Given two probable types of users, one focused on conversation and another on content, we can map these classifications to actions — replies and retweets, respectively — to explain how the relationship between users of and the actions on a platform shapes influence on Twitter. The purpose of replies assumes that a conversation is the intended goal of the action. In writing a reply, the user has been influenced to respond to a previous unit of content published by another user. Similarly, with a retweet (the objective of which is to push content), the user has been influenced by a previous user’s content to reproduce the content for other users to view. In basic terms, we can see the reply as talking back to the first user and the retweet as passing on content to a third user. However, when assigning values of influence to these types of actions, we do not give preference to one or the other.

Previously, we examined two possible approaches to measuring influence on Twitter: 1) counting the total number of followers a user possesses, and 2) calculating the ratio of a user’s followers to a user’s followees. These two approaches still ignore the relationship between the user, the content, and the platform. The goal of this report is to move beyond these basic assertions about influence by analyzing a comprehensive set of replies, retweets, and other actions on Twitter that act as evidence for the influential potential of users.

Understanding Influence with New Data

For this report, we gathered relevant data from 12 Twitter users for 10 days, between 12:00 am 15 August 2009 and 12:00 am 25 August 2009. We focused on a small number of celebrities, news outlets, and social media analysts widely perceived to be among the more influential users on Twitter. Based on the content and connections of these 12 users, we examined a total of 134,654 tweets, 15,866,629 followers, and 899,773 followees. In response to the 2,143 tweets generated by these 12 users over the 10-day period, we collected 90,130 responses (actions) published by other users (which equates to 87,987 more messages than total original tweets, or an overall average of 42 responses per tweet).

We have listed the 12 users below, categorized into three distinct groups that we feel ultimately represent the user types previously discussed. We have also calculated the total number of tweets published by each user, the total number of each user’s followers, and the total number of users that each of our 12 users follows. These statistics were updated between 28 August 2009 and 30 August 2009, so they may not necessarily reflect the exact number of tweets, followers, and followees present during the 10-day window that our data encompasses.

Celebrities	Username	Tweets	Followers	Followees

Ashton Kutcher	aplusk	3,205	3,407,385	209
Shaquille O’Neal	THE_REAL_SHAQ	2,072	2,092,541	562
Stanley Kirk Burrell	MCHammer	6,016	1,331,797	31,202
Sockington	sockington	5,711	1,089,984	380
Justine Ezarik	ijustine	7,718	605,441	3,039

News Outlets	Username	Tweets	Followers	Followees

CNN Breaking News	cnnbrk	1,096	2,712,530	18
BarackObama.com	BarackObama	330	2,018,016	761,851
Mashable.com	mashable	17,914	1,363,510	1,925
CNN	cnn	11,607	193,625	50

Social Media Analysts	Username	Tweets	Followers	Followees

Gary Vaynerchuk	garyvee	7,532	862,790	9,683
Chris Brogan	chrisbrogan	48,341	94,715	88,431
Robert Scoble	Scobleizer	23,112	94,295	2,423

The above table has been arranged in decreasing order by total followers within each of the three categories of users. These categories reveal certain resemblances to aspects of content user types and conversation user types. Generally, news outlets aim to push content, social media analysts strive to perpetuate conversations, and celebrities tend to do both (dependent on their personal practices and the community that follows them). While there are some anomalies (e.g., BarackObama), most news outlets have a higher follower-to-followee ratio (materialistic) while most analysts have a more equal follower-to-followee ratio (conversationalist). For celebrities, the ratio appears to favor a materialistic purpose on Twitter, but the responses generated by celebrities favor the conversationalist type.

In the graph below, we present a comprehensive diagram of total follower count, to reemphasize the perceived influence that each user projects. Keep in mind that although Robert Scoble (Scobleizer, ranked 12th) appears unimportant compared to Ashton Kutcher (aplusk, ranked 1st), Scoble still retains a high level of perceived influence across the entirety of Twitter, since his total number of followers amounts to over 94,000 (compared to many users that have between 50 and 1,000 followers).

Influence According to Audience Response

Followers, as stated before, cannot account for a reliable measurement of influence on Twitter. Instead, we must take into account the markers of influence — replies, retweets, mentions, and attributions — to inform which user holds more sway over his followers. The graph below measures the percentage of replies, retweets, and mentions per user, based on the total number of responses respective to each user.

Of course, the graph above does not visually portray an accurate instance of influence, because the values are not weighted. Instead, the graph illustrates the relationship between responses by each user’s follower network. Therefore, to further examine the effects that followers have on influence, we present the following two graphs that measure the average number of responses in relation to followers.

In the following diagrams, we have utilized the concepts of content and conversation to create equations for calculating new measurements of influence. We have defined conversation-related responses as the total number of replies added to the total number of mentions (@r+@m), and we have defined content-related responses as the total number of retweets added to the total number of attributions (@RT+@via). The graphs below utilize the equations “content/followers” and “conversation/followers” to illustrate the average number of responses per follower of each of the 12 designated users.
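As a minimal sketch of these two equations, the function below divides a response tally by a follower count. The function and parameter names are our own, the `scale` argument is an optional convenience for reporting per 1,000 followers, and the counts in the usage example are hypothetical rather than drawn from the report's data.

```python
def responses_per_follower(first, second, followers, scale=1):
    """Average responses per `scale` followers.

    Pass (replies, mentions) for the conversation measure (@r+@m)
    or (retweets, attributions) for the content measure (@RT+@via).
    """
    if followers <= 0:
        raise ValueError("followers must be positive")
    return scale * (first + second) / followers


# Hypothetical user: all counts below are made up for illustration.
followers = 100_000
conversation = responses_per_follower(1_200, 300, followers, scale=1000)
content = responses_per_follower(400, 100, followers, scale=1000)
```

With these made-up counts, the hypothetical user receives three times as many conversation-related responses as content-related ones per 1,000 followers.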

The two graphs above present an interesting theory, in that the social media analysts appear to dominate both realms of content and conversation, thanks to their follower network. CNN and Mashable.com also appear high on the list of users that are able to interact well with their followers as well as push content easily to others.

While the above diagrams suggest that a user’s audience impacts how ideas move around said user to a large extent, these graphs do not take into account the tweets created by our 12 users, especially in relation to the responses the tweets generate. Returning to the graph representing the percentage of all responses, this illustration of influence is not entirely accurate because it does not account for the relative amount of content produced. This is especially important since the original tweets are the influencers that inspire replies, retweets, etc. Below, we present the same percentages of responses in a graph that weighs the comparison of responses against the total number of responses of other users.

The weighted graph above illustrates a significantly different measurement of influence than the previous diagram. If we were to state that influence is dictated by how many responses are generated, then we could certainly argue that Mashable.com is more influential than CNN Breaking News — a bold statement, especially when more than twice as many users follow cnnbrk than follow mashable. However, the weighted response statistics above must be compared to the amount of original tweets that inspired response. We have provided these statistics in the graph below:

The relationship between the original tweet and any subsequent responses certainly matters. For example, even though mashable and aplusk boast similar amounts of reactions (with a difference of 1,620 in favor of mashable), mashable originated more than 2.5 times as many original tweets to influence those responses. Therefore, aplusk exerted less effort to achieve near-similar success. Similarly, BarackObama generated more than 3 times as many responses in the ten-day period as did MCHammer; however, MCHammer originated over 8 times as many original tweets, meaning that the much larger effort he exerted was ultimately not as influential as the effort by BarackObama.

We have addressed the problematic relationship of original tweets and responses by averaging the statistics in the graphs below. The graphs utilize the equations “conversation/tweets” ((@r+@m)/tweets) and “content/tweets” ((@RT+@via)/tweets):

The measurement of influence reflected in these graphs most likely approaches the most accurate estimation of influence detailed in this report. To affirm this statement, we must return to our Twitter-specific definition of influence online: the potential of an action of a user to initiate a further action by another user. The two graphs above account for the responses (further actions) in relation to original tweets (actions with potential), while still theoretically accounting for the size of each user’s audience. Still, these graphs do not account for the network of the 12 users’ followers, and as such remain significantly different from the previous graphs depicting average response per 1,000 followers. The optimal situation of maximum influence would account for the most followers possible executing the most actions. However, it is entirely possible that one follower published all of the responses for a given user.
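Written with explicit grouping, the four ratios used in this report can be summarized as follows. The symbols are shorthand introduced here for illustration and do not appear in the original analysis: for a given user, $R$, $M$, $RT$, and $V$ denote the counts of replies, mentions, retweets, and attributions received, $F$ the follower count, and $T$ the number of original tweets.

```latex
\begin{align*}
\text{conversation/followers} &= \frac{R + M}{F}, &
\text{content/followers} &= \frac{RT + V}{F}, \\
\text{conversation/tweets} &= \frac{R + M}{T}, &
\text{content/tweets} &= \frac{RT + V}{T}.
\end{align*}
```

Summed over all 12 users, $(R + M + RT + V)/T = 90{,}130 / 2{,}143 \approx 42.1$, which matches the average of 42 responses per tweet reported earlier.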

What, therefore, do the discrepancies between original tweets and followers tell us about the data? In the previous follower graphs, social media analysts held most of the top ranks. Contrarily, in the tweet graphs, they make up the last three spots in both graphs. On average, the data suggest that social media analysts receive minimal reward for the effort they exert in maintaining a conversation with their followers. For those users that succeed, most news outlets were more successful at having their content pushed to other users. Celebrities, on the other hand, appear to inspire conversational responses with their followers, yet with more success than the analysts.

These graphs suggest many statements based on various relationships of users, data, and platform. However, although the graphs above represent relative influence among the 12 users, by no means do these diagrams suggest that those ranked last are not influential. For the most part, a general user on Twitter tends to depend heavily on perceived influence, whether it be total number of followers or the ratio of followers to followees. This report, though, attempts to move beyond simple assertions of influence to create a better study of influence on Twitter, supported by new approaches and quantitative data.

Future Approaches for Influence Analysis

This report strives to influence other researchers to pursue influence analysis based not solely on followers but also on the relationship between followers and content, and the interaction of both in Twitter’s system. Although we analyze how actions (responses to a user) represent the influence of a user, our study is limited by sample size, time range, and the ability to collect data. For instance, we hope in the future to develop a more complex algorithm that accounts for the combined influence of both followers and responses. We were not able to calculate user growth rate nor measure the number of responses per exact original tweet. Also, given that this report studies influence on Twitter, we cannot account for any external influence with respect to each user in our sample.

Though we admit our limitations, along with this report we are publishing a comprehensive visualization that marks each original tweet and each response (reply, retweet, and mention) along our 10-day timeline. The graph specifically shows density as a factor of influence over time for the 2,143 original tweets and 90,130 responses related to our dozen users. While our graph does not provide labels for tweet, time, etc., we encourage individual exploration of the data presented in the visualization.

The density of data varies considerably per user and per tweet. While we cannot assign each reply, retweet, and mention to a specific original tweet, we can at least determine certain patterns of density per any given tweet. The two excerpts above reflect the difference in density of responses that a certain tweet might generate. By tracking the density of responses over time, we hope to inspire further research into models of influence and web ecology as a whole.

Detecting Sadness in 140 Characters:

Web Ecology Research — Tue, 18 Aug 2009 14:01:11 +0000

Sentiment Analysis and Mourning
Michael Jackson on Twitter

By Elsa Kim and Sam Gilbert
with Michael J. Edwards and Erhardt Graeff

Michael Jackson’s death created an emotional outpouring of unprecedented magnitude on Twitter. In this report, we examine 1,860,427 tweets about Jackson’s death in order to test various methods of sentiment analysis and gain insights into how people express emotion on Twitter.

Key findings

At its peak, the conversation about Michael Jackson’s death on Twitter proceeded at a rate of 78 tweets per second.
Users tweeting about Jackson’s death tend to use far more words associated with negative emotions than are found in ‘everyday’ tweets.
Roughly 3/4 of tweets about Jackson’s death that use the word “sad” actually express sadness, suggesting that sentiment analysis based on word usage is fairly accurate.
That said, there is extensive disagreement between human coders about the emotional content of tweets, even for emotions that we might expect would be clear (like sadness).
Tweets expressing personal, emotional sadness about Jackson’s death showed strong agreement among coders, while commentary on the auxiliary social effects of Jackson’s death showed strong disagreement.
We argue that this pattern in the “understandability” of certain types of communication across Twitter is due to the way the platform structures the expression of its users.

We would like to thank Jonathan Beilin, Evan Burchard, David Fisher, Tim Hwang, Alex Leavitt, Dharmishta Rood, Max van Kleek, Jue Wang, and Seth Woodworth for their invaluable feedback and support.

Detecting Sadness in 140 Characters (pdf), Appendices (pdf)

1. Introduction

On June 25, 2009, news reports announced the death of Michael Jackson, leading to a flood of reactions on Twitter. From 9pm to 10pm EDT alone, there were over 279,000 tweets about Michael Jackson, or roughly 78 tweets per second (see graph above). What can be said about this massive body of tweets? What sorts of emotions did people express about Michael Jackson’s death?

Michael Jackson’s death provided occasion for a large wave of digital mourning, that is, the expression of grief online, usually coordinated via a common method or localized to a particular webpage. The latter type of mourning has become popular practice on social networking sites such as MySpace and Facebook, where the profile of the individual who has died is transformed into a digital memorial onto which friends and family leave last goodbyes and testaments.

After Michael Jackson’s death, common digital mourning practices emerged on a variety of platforms. Testimonials and goodbyes poured into Michael Jackson’s MySpace page, and Facebook saw a similar influx of grievers on Jackson’s main fan page and in newly created groups. The outpouring of tweets about Michael Jackson contains many similar expressions of grief, but as of yet there has been no research about digital mourning on Twitter.

The body of tweets about Michael Jackson’s death also offers an opportunity to explore strategies for sentiment analysis: the process of determining the attitude of a speaker or speakers towards a particular topic in a large corpus of text. Because of its 140-character limit on messages and the social mores of the platform, Twitter offers challenges to the natural language processing and statistics-based techniques typically used to analyze sentiment.

This report represents a step towards understanding digital mourning and analyzing sentiment on Twitter. After describing our data, this report presents the results of an analysis of sentiment words in that data and findings from hand-coding tweets about Michael Jackson. This closer look at tweets about Jackson’s death provides insights into digital mourning practices on Twitter, assesses the validity of our first attempt at sentiment analysis by zeroing in on a word important to that analysis, and gauges the feasibility of doing larger scale sentiment studies in the future.

2. Description of the dataset

For this project, we made use of a dataset of 2,331,066 tweets about celebrity deaths (rumored or actual) collected for reasons that go beyond the scope of this report. These tweets were posted to Twitter between June 24 at 12:37am EDT (the day before Jackson’s death) and July 6 at 6:48pm EDT and were collected from Twitter’s search API using the following search terms:

MJ
Michael Jackson
Jackson
Farrah
Fawcett
Jill Munroe
Micheal (a very common misspelling)
Goldblum
Billy Mays

From this dataset of tweets, we worked with the 1,860,427 tweets that contain “mj” or “michael” or “jackson” for this particular report. Because we do not yet have a reliable mechanism for filtering tweets by language, this set contains a small portion of non-English tweets; these tweets are excluded in the analysis that follows.

We also isolated those 44,383 tweets in this set that contained the word “sad.” In addition to analyzing this set of tweets using the ANEW dataset, described below, we randomly selected 346 tweets for human coding.
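The selection steps described in this section might be sketched as follows. The matching rules are assumptions on our part: the report does not say whether matching was case-insensitive, whether “mj” was matched as a substring, or how the random sample was drawn. All function names and the `seed` parameter are hypothetical.

```python
import random
import re

MJ_TERMS = ("mj", "michael", "jackson")
# Assumption: "sad" is matched as a whole word, ignoring case.
SAD_WORD = re.compile(r"\bsad\b", re.IGNORECASE)


def about_jackson(text):
    """True if the tweet contains any of the three terms (substring match)."""
    lowered = text.lower()
    return any(term in lowered for term in MJ_TERMS)


def mentions_sad(text):
    """True if the tweet contains the word "sad"."""
    return SAD_WORD.search(text) is not None


def sample_for_coding(tweets, k, seed=0):
    """Draw up to k tweets at random from those that pass both filters."""
    rng = random.Random(seed)
    pool = [t for t in tweets if about_jackson(t) and mentions_sad(t)]
    return rng.sample(pool, min(k, len(pool)))


tweets = [
    "So sad about Michael Jackson",
    "RIP MJ",
    "sad day",
    "Saddle up, Jackson",
]
selected = sample_for_coding(tweets, k=346)
```

A substring match on “mj” will also catch unrelated strings that happen to contain those letters, which is one reason to treat the filter as a sketch.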

3. ANEW Analysis

The Affective Norms for English Words (ANEW) dataset contains normative emotional ratings for 1034 English words. Each word in the dataset is associated with a rating of 1–9 along each of three dimensions of emotional affect: valence (pleasure vs. displeasure), arousal (excitement vs. calmness), and dominance (strength vs. weakness) (Bradley & Lang, 1999).

We used this set to conduct sentiment analyses on large sets of tweets by looking at the usage of ANEW words within those tweets. For each analysis, average valence, arousal, and dominance ratings are calculated by determining the frequency of each ANEW word within the set and calculating the average ratings of the ANEW words weighted by this frequency. Similar ANEW analysis has proven useful in other online contexts (Dodds & Danforth, 2009), but has yet to be done with Twitter.
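The frequency-weighted averaging described above can be sketched in Python as follows. This is a minimal illustration, not the code used for the report: the tokenizer, the function names, and the three-word lexicon are assumptions, and apart from the valence of “sad” (1.61, cited later in this report) every rating below is a placeholder rather than a real ANEW value.

```python
import re
from collections import Counter

# Toy stand-in for the ANEW lexicon. Only the valence of "sad" comes
# from the report; all other numbers are placeholders.
TOY_LEXICON = {
    "sad":   {"valence": 1.61, "arousal": 4.0, "dominance": 3.0},
    "love":  {"valence": 9.0,  "arousal": 6.0, "dominance": 7.0},
    "music": {"valence": 8.0,  "arousal": 5.0, "dominance": 6.0},
}


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def anew_averages(tweets, lexicon):
    """Return (hits, averages), weighting each word's rating by its frequency."""
    counts = Counter()
    for tweet in tweets:
        for token in tokenize(tweet):
            if token in lexicon:
                counts[token] += 1
    hits = sum(counts.values())
    if hits == 0:
        return 0, {}
    averages = {
        dim: sum(lexicon[word][dim] * n for word, n in counts.items()) / hits
        for dim in ("valence", "arousal", "dominance")
    }
    return hits, averages


# Invented sample tweets.
hits, averages = anew_averages(["so sad, sad day", "i love his music"], TOY_LEXICON)
```

Each occurrence of a lexicon word counts as one “hit,” so a word used twice pulls the average twice as hard as a word used once.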

In analyzing the set of 1,860,427 tweets about Michael Jackson’s death, we found 849,603 instances of an ANEW word being used, and these 849,603 ‘hits’ had the following average ratings:

Valence: 5.713
Arousal: 5.243
Dominance: 5.175

To give these numbers a point of comparison, we ran the same analysis on two different random samples of 1,860,427 ‘everyday’ tweets, pulled from Twitter’s streaming API between June 8 and June 23, 2009.

Sample 1:
675,137 hits
Valence: 6.350
Arousal: 5.256
Dominance: 5.559

Sample 2:
676,846 hits
Valence: 6.351
Arousal: 5.257
Dominance: 5.60

Given the remarkably similar hit counts and ratings observed between the two random baseline sets, we understand the differences between these baseline tweets and the tweets about Michael Jackson to be significant. In particular, the sizable difference in average affective valence ratings between the sets (~.64) suggests that those users tweeting about Michael Jackson are collectively choosing words in their tweets that express negative emotions, as would be expected from digital mourners.

The goal of sentiment analysis, however, is not to learn what words people on Twitter are using, but to gain insight into how people are feeling. Can we reasonably infer from the low valence score of our set of Michael Jackson tweets that the people who created these tweets are less happy than normal?

4. Human Coding of “Sad” Tweets

To better understand the significance of our ANEW analysis, which applies independent ratings of emotion to the words used in a set of tweets, we decided to zero in on a particular ANEW word, “sad,” to see how it is used.

Within the ANEW dataset, “sad” has a very low valence (1.61), and it appears 53,300 times in our set of Michael Jackson tweets, roughly 16 times more often than it appears in our random samples of tweets. As compared to all Michael Jackson tweets, which had an average valence of 5.713, these “sad” tweets have an average valence of 3.317. Use of the word “sad” appears to be an important reason why the average valence of the Michael Jackson tweets is lower than that of the baseline tweets. By looking at the use of this word, we can better understand what the ANEW analysis method can and cannot tell us about sentiment on Twitter.

We hand-coded a set of 346 “sad” tweets to see if usage of that word within our set of tweets aligns with the valence rating ascribed to “sad” within the ANEW dataset. If people tweeting the word “sad” were indeed expressing sadness, it would suggest that our ANEW analysis is giving us reliable knowledge about the emotional state of the Michael Jackson tweeters.

4.1 Rating Methods

For each of these 346 “sad” tweets, each of our 6 raters determined whether or not the person who had created the tweet was expressing sadness. Raters were told to give each tweet one of four nominal ratings:

“Y” – yes; the person who created this tweet is expressing sadness
“N” – no; the person who created this tweet is not expressing sadness
“M” – mixed; the person who created this tweet expresses sadness as well as another conflicting emotion
“U” – unclear; the tweet in question is spam, is not in English, or is otherwise impossible to interpret with respect to sentiment.

Beyond giving these directions, we did not do any training of our raters; over the course of coding, we did, however, remind raters several times of the criteria mentioned above (for example, several raters needed to be reminded to mark “U” for any tweet they thought was spam).

4.2 Rating Results

Of 346 tweets containing “sad,” raters, while not necessarily agreeing on any given tweet, reported on average that 271.83 (74.68%) tweets expressed sadness, and there were 222 tweets (64.16%) that all 6 raters judged as expressing sadness.

Raters reported on average that 28.33 (7.78%) tweets did not express sadness, 20.67 (5.68%) tweets expressed mixed emotion, and 25.17 (6.91%) tweets were unclear. There were 6 (1.73%) tweets that all raters reported as not expressing sadness, 7 (2.02%) tweets that all reported as unclear, and no tweets that all raters reported as expressing mixed emotions (See graphs for a summary of these results).

As part of the rating process, coders also highlighted certain tweets that they found interesting or difficult to interpret; these tweets are illustrative of the types of disagreements observed across coders. In addition to discussing these tweets below, Appendix A lists some of these tweets, arranged according to decreasing levels of agreement, and Appendix B lists some particularly illustrative tweets sorted by type.

4.3 Measures of Inter-rater Agreement

As the above results suggest, there was far from perfect consensus among raters interpreting tweets. All 6 raters agreed on only 235 (67.92%) tweets, and at least 5/6 raters agreed on 284 (82.08%) tweets.
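The consensus counts above can be reproduced from a table of ratings with a short tally. The sketch below assumes each tweet is stored as a list of six codes; the function names and the four toy rows are invented for illustration and are not the study’s data.

```python
from collections import Counter


def largest_bloc(ratings):
    """Number of raters behind the most common code for one tweet."""
    return Counter(ratings).most_common(1)[0][1]


def consensus_counts(rating_rows, n_raters=6):
    """Count tweets with unanimous and all-but-one agreement."""
    blocs = [largest_bloc(row) for row in rating_rows]
    total = len(blocs)
    unanimous = sum(1 for b in blocs if b == n_raters)
    near = sum(1 for b in blocs if b >= n_raters - 1)
    return {
        "unanimous": unanimous,
        "unanimous_share": unanimous / total,
        "at_least_all_but_one": near,
        "at_least_all_but_one_share": near / total,
    }


# Toy ratings for four tweets (invented).
toy_rows = [
    list("YYYYYY"),
    list("YYYYYN"),
    list("YYNNMU"),
    list("UUUUUU"),
]
summary = consensus_counts(toy_rows)
```

Applied to the reported counts, the same arithmetic gives 235 / 346 ≈ 67.92% and 284 / 346 ≈ 82.08%.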

Given that there was a relatively large group of raters and a number of categories to choose from, this level of consensus might seem acceptable. However, one must take into account the prevalence of “yes” ratings; with such a large majority of tweets falling under one code, one should expect higher levels of consensus (Sim & Wright, 2005).

In order to better measure how reliably our raters interpreted sentiment in tweets, we calculated Fleiss’ kappa (κ), a measure of inter-rater reliability well-suited to our coding procedure (Fleiss, 1971). Like other kappas, this method accounts for random agreement, essentially comparing the amount of agreement seen among coders (defined as an average of every tweet’s P-value, a measure from 0 to 1 of the agreement among each tweet’s ratings) to the agreement one would see in a random distribution of ratings. For this set of ratings, we found a κ of .561; while there are no clear standards for what is considered an acceptable kappa, .7 and above typically suggests strong agreement (Fleiss, 1971).
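For readers who want to see the calculation, here is a minimal Python sketch of Fleiss’ kappa following the standard definition in Fleiss (1971): each tweet’s P-value is the share of rater pairs that agree, and the average P-value is compared with the agreement expected from the overall category proportions. The function name and the toy ratings are assumptions for illustration; this is not the code used for the report.

```python
from collections import Counter

CATEGORIES = ("Y", "N", "M", "U")


def fleiss_kappa(rating_rows, categories=CATEGORIES):
    """Return (kappa, per-tweet P-values) for rows with equal numbers of ratings."""
    n_items = len(rating_rows)
    n_raters = len(rating_rows[0])
    p_values = []
    category_totals = Counter()
    for row in rating_rows:
        counts = Counter(row)
        category_totals.update(counts)
        # Share of agreeing rater pairs for this tweet.
        agreeing = sum(c * c for c in counts.values()) - n_raters
        p_values.append(agreeing / (n_raters * (n_raters - 1)))
    p_bar = sum(p_values) / n_items
    total_ratings = n_items * n_raters
    # Agreement expected by chance, from the overall category proportions.
    p_expected = sum((category_totals[c] / total_ratings) ** 2 for c in categories)
    # Undefined (division by zero) if every rating falls in a single category.
    kappa = (p_bar - p_expected) / (1 - p_expected)
    return kappa, p_values


# Toy ratings for four tweets (invented).
toy_rows = [
    list("YYYYYY"),
    list("YYYYYN"),
    list("NNNNNN"),
    list("YYYNMU"),
]
kappa, p_values = fleiss_kappa(toy_rows)
```

With six raters, a 5-to-1 split yields a P-value of 20/30 ≈ .6667 and a 4-to-2 split yields 14/30 ≈ .4667, which is why only a handful of distinct P-values appear in the discussion that follows.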

4.4 The Shifting Contextual Definition of “Sad”

This low level of agreement between coders suggests that even though there is a sizeable number of tweets that clearly express sadness, there is a lot of difficulty in interpreting emotion on Twitter. When we sorted the tweets by their P-values (which range from 1, representing complete agreement, to 0, representing complete disagreement), we found differences between tweets that had varying levels of agreement. From total agreement to near-complete disagreement, tweets varied in type from expressing personal or objective sadness to offering commentary on the auxiliary societal effects of the death, such as the media frenzy. Generally, it was easier for coders to agree on personal declarations of sadness than on instances where “sad” was used to describe a circumstance tangential to the death.

Those tweets with a P-value of 1 generally expressed sadness. These tweets ranged from the calm, equivocal statements of sadness to the hyper-emotional. A calm, sad example was:

“Michael Jackson’s death is a sad loss…thoughts and prays go out to his family.”

Note that this tweet displays both an emotional reaction and objective reportage of the social situation. A hyper-emotional one read:

“Michael Jackson Died!! whatt??? im saddened…deeply sad :(”

There were also tweets that combined emotion and objective reportage on the events of the tweeter’s life, including:

“Feeding the baby and feeling sad about Michael Jackson! He left is too soon!”

and

“Shocked by Michael Jackson’s death. Such a sad, sad day. Going out for a couple of sales calls, late.”

This combination of life status update and emotional update leads to consensus among the coders, perhaps because the accompanying life status update helps clarify that the tweeter is not being sarcastic.

Those tweets with lower P-values more often include different uses of the word “sad,” suggesting that these other types of tweets are more difficult for coders to reliably interpret. At a P-value of .6667, tweets started to include commentary on the death, often of a moral nature, for example:

“sick of hearing about michael jackson now sad yes end of world no and he was no saint people need to remember that”

At a P-value of .46667, tweets began to express frustration at the media frenzy. According to these users, Jackson’s death was certainly something to be acknowledged and even honored, but it was inappropriate and bothersome for the media to focus on it so heavily. For example:

“@AnnCurry I agree – enough of Michael Jackson. Sad, but . . . others have died, too, but now ignored, thanks to MIchael.”

At a P-value of .4, one sees more instances of personal commentary, that is, observations about the self that are tangentially related to Jackson’s death. Examples include:

“Sadd… i love Michael Jackson…!! rest in peace… my mom better buy me a MJ T-shirt……”

“TMZ.com claims that Michael Jackson is dead, but his Wikipedia page has yet to be updated. How sad is it that I went to Wikipedia?”

In this latter instance, it is difficult to tell whether or not the tweeter was sad about the death of Jackson at all.

As the P-value decreased to .2666, .2, and finally .1333, the tweets included confusing grammar, commentary such as:

“Celebrity triple – Ed McMahon, Farah F and MJ – despite the fame, not one of them died in peace – broke and feuding with family – sad…”

and, of course, the appearance of what seemed to be spam:

“RT @bowlsey @JamieC: Very sad about Michael Jackson. HABITAT – for all your furniture needs – habitat.co.uk.”

At the two levels of highest disagreement, humor was introduced as well. For example:

“Michael Jackson, Billy Mays, and now XHTML 2—so very, very sad…”

The tweets with the least agreement do not report specifically on Michael Jackson’s death. They volleyed back and forth between mourning Farrah Fawcett and Jackson:

“Who’da thunk that today would be the day that Michael Jackson died? It feels fake. I’m SO sad about Farrah Fawcett. Such a surreal day…”

or commented on Jackson’s death as a phenomenon that impacted society:

“Saddened and unsurprised watching the prices change on Michael Jackson CDs in second hand shops.”

5. Discussion and Further Research

At the outset of this study, ANEW analysis revealed a significant difference in the valence values between an average day of tweets and those tweets about Michael Jackson’s death. But these values do not necessarily correspond to a user’s expressed emotions or explain the variation and nuance in human sadness. For this, we turned to human coders, asking them to rate tweets containing the word “sad” as sad, not sad, mixed, or unclear. Coders’ ratings suggest that approximately 75% of tweets express sadness, giving credence to the ANEW analysis.

These results indicate that the ANEW dataset is a promising tool for sentiment analysis on Twitter. Having proven useful in this pioneering analysis, ANEW should now be applied to a variety of different samples of tweets; a larger set of analyses will give us a better sense of ANEW’s strengths and weaknesses and provide a more robust set of referents for any given valence, arousal, or dominance rating.

Hand-coding the emotion in tweets will always provide a more nuanced picture than analysis with ANEW, however, because ANEW measures the presence of individual words instead of considering a word’s context. Unfortunately, comparing coders’ ratings resulted in a kappa value of .561, indicating that our hand-coders did not display a high rate of agreement. An important next step, then, is to attempt new rounds of coding with different parameters in hopes of better understanding what is achievable with such coding. If we are able to improve inter-rater agreement for certain types of analysis, we may be able to perform large-scale human coding projects with tools like Amazon Mechanical Turk. Enough of this coding data could provide the basis for a training corpus with which to automate the process of detecting emotion by its context, instead of simply through individual words as ANEW does.

Developing advanced, pragmatic human or AI coding techniques will facilitate the data-gathering necessary to compare emotional content between platforms, being conscious of the varying constraints of those platforms. After additional studies, we hope to be able to identify which platforms a researcher should first examine in order to gain insight into how particular emotional, social, or psychological phenomena are articulated by different web ecosystems.

Through our hand-coding of tweets, we also developed a typology for tweets that contained the word “sad.” The further a tweet was from describing a personal emotional experience or the objective social experience of Michael Jackson’s death, the more difficult it was for our coders to pinpoint whether there was sadness expressed in the tweet or not.

This more careful analysis of tweets about Michael Jackson’s death paints a complex picture of digital mourning on Twitter. As a loosely organized messaging network, Twitter does not operate as a “memorial” akin to clearly delimited online spaces like Myspace and Facebook; as seen even within tweets that contain the word “sad,” Twitter seems to support a wide spectrum of reactions to Jackson’s death, some of which have little to do with mourning. Given the short-lived nature of data on Twitter (the tweets discussed here are no longer available in Twitter’s search, which only goes back roughly a week), users appear more inclined to report Jackson’s death as a current event and less inclined to memorialize or collectively grieve. Furthermore, Twitter appears to be a far more ‘personal’ medium than other online spaces: tweeters tended to comment on sadness as individuals watching the public reaction instead of commiserating with particular friends or communities.

Appendix A: Sampling of Tweets ordered from most agreement to least

Coders highlighted these tweets as illustrative of the types of disagreements they saw around coding. Tweets are sorted by type in Appendix B, which offers further explanation.

1, Complete agreement: tweets generally sad (statement made from observation, not from stats)

Michael Jackson’s death is a sad loss…thoughts and prays go out to his family.

Emotional + Objective news reportage

Wow. sick to my stomach. Rest in peace, Michael Jackson. So sad. he may’ve been accusof a lot, but he also helped a lot

Hyper-Emotional

Michael Jackson Died!! whatt??? im saddened…deeply sad :(

Hyper-Emotional

Sad, sad day. Still can’t believe Michael Jackson died

Emotional (disbelief)

Feeding the baby and feeling sad about Michael Jackson! He left is too soon!

Emotion + Objective self-reportage

Shocked by Michael Jackson’s death. Such a sad, sad day. Going out for a couple of sales calls, late.

Emotion + Objective self-reportage

is going to listen to 114 michael jackson songs …its a sad day

Emotion + Objective self-reportage

.66667, Some agreement: tweets generally offering commentary, often moral

I’m so sad about about Michael Jackson! I can’t even get on eonline, wtf?!

Emotion + Self-reportage

MJ and Farrah??? What is the world coming to??? Such a sad day in Hollywood!!! RIP to some of the greats :-(

Emotion

This is bad, real bad, Michael Jackson. Now I’m sad, real sad, all the jacksons….

Humor

It’s sad we lost Michael Jackson. But how many others die and we never hear of it? http://ow.ly/g10W

Commentary, media

I wonder if Murray contributed to Michael Jackson’s death through ineptitude. How sad that such a great star used an outcast doctor.

Commentary, MJ’s life

First Michael Jackson then Billy Mays…. what a sad week.

Commentary, possibly Humor

Am I the only not pretending to be sad about Michael Jackson? He was a child fucker…remember?

Commentary, Moral

is watching the rerun of Michael Jackson night on American Idol. Suddenly sad in a completely different way ;-)

Humor, sad used to mean “pitiful”

sick of hearing about michael jackson now sad yes end of world no and he was no saint people need to remember that

Commentary, Moral

HAHAHAHAHAHA MICHAEL JACKSON FINALLY DIED. i know its sad but my god he was a freak tehe that made my week

Commentary/Self-Reportage

OMG Michael Jackson guys! we talking about Michael fucking Jackson!! I am floored!! I mean michael jackson!!!! I’m hella sad!

Hyper-Emotional

.466666, Disagreement: tweets generally ranting

too caught up in wimbledon.. but still saddened by MJ’s passing..

Objective self-reportage + Emotion

@AnnCurry I agree – enough of Michael Jackson. Sad, but . . . others have died, too, but now ignored, thanks to MIchael.

Commentary, media

@JazzyClark For God Sake I Lke Michael Jackson And Everythink Andim sad hes dead but come on enough of the man !!! x :L

Emotion, frustration

i get mj’s death was tragic but does it have to be shown everywhere?

Emotion, frustration

.4, More Disagreement: tweets generally personal commentary

Sadd… i love Michael Jackson…!! rest in peace… my mom better buy me a MJ T-shirt……

Emotion + Personal Commentary

TMZ.com claims that Michael Jackson is dead, but his Wikipedia page has yet to be updated. How sad is it that I went to Wikipedia?

Personal Commentary, note use of sad as “pitiful”

.2, Greater Disagreement: tweets generally commentary

I wish MJ’s legacy wasnt tainted by lies. Its sad.

Commentary, sad means “pitiful”

sadd because michael jackson diess : ( buhh lovess my baybee ohdee tehe111308

Confusing grammar

Celebrity triple – Ed McMahon, Farah F and MJ – despite the fame, not one of them died in peace – broke and feuding with family – sad…

Commentary, sad means “pitiful”

.2666, Severe Disagreement:

i’m sick and tired of hearing about MJ’s death, yes he died, that’s sad. Just leave the man alone already!

Emotion, rant/frustration

RT @bowlsey @JamieC: Very sad about Michael Jackson. HABITAT – for all your furniture needs – habitat.co.uk.

Spam

Michael Jackson, Billy Mays, and now XHTML 2—so very, very sad…

Humor

.133333, Least Agreement: tweets tend to report sadness that is not specifically a response to Michael Jackson’s death

3/4ths of everything on blip.fm right now are Michael Jackson songs. This one = great jam / sadly fitting. ♫ http://blip.fm/~8vuad

Commentary, real-time events

Its sad how farrah has been overshadowed by MJ. She was just as great as him just i n a different career! R.I.P. FARRAH!!

Commentary, media

Who’da thunk that today would be the day that Michael Jackson died? It feels fake. I’m SO sad about Farrah Fawcett. Such a surreal day…

Emotion, multiple

Saddened and unsurprised watching the prices change on Michael Jackson CDs in second hand shops.

Commentary, real-time events

Appendix B: Typology of Tweets with examples

Objective: Reporting Sadness as news, part of updates on tweeter’s life

It is a sad day

too caught up in wimbledon.. but still saddened by MJ’s passing..

Feeding the baby and feeling sad about Michael Jackson! He left is too soon!

Shocked by Michael Jackson’s death. Such a sad, sad day. Going out for a couple of sales calls, late.

Emotion: Simple expression of sadness

I am sadden by MJ’s death…

RIP Michael

Emotion: Personal sadness/extreme sadness

I’m devastated about Michael Jackson.What a sad day!!!

it sunk in.. MJ is gone.. as don lemmon put it, “Michael Jackson’s music is the soundtrack to my childhood”.. my life. i’m sad..

Emotion: Rant, expressing frustration at the media

i’m sick and tired of hearing about MJ’s death, yes he died, that’s sad. Just leave the man alone already!

i get mj’s death was tragic but does it have to be shown everywhere?

Commentary/Editorial: Regret

It’s so sad that he died so young

Commentary/Editorial: Chastising others for what appeared like forgiveness of his “sins”; sad used as “pitiful.”

So Michael Jackson died today . . . like i care.. I am more saddened about Farrah Fawcett’s death then a shiesty child molesters death…

Am I the only not pretending to be sad about Michael Jackson? He was a child fucker…remember?

I hope I’m not offending my friends for not being sad over MJ’s passing. I won’t be sad when OJ dies, either.

Humor: Making light of something about the event

Is watching the rerun of Michael Jackson night on American Idol. Suddenly sad in a completely different way ;-)

Michael Jackson, Billy Mays, and now XHTML 2—so very, very sad…

Sources Cited

Bradley, M.M., & Lang, P.J. (1999). Technical report C-1, Gainesville, FL. The Center for Research in Psychophysiology, University of Florida. Retrieved from http://www.uvm.edu/~pdodds/files/papers/others/1999/bradley1999a.pdf

Dodds, P.S. & Danforth, C.M. (2009). Measuring the Happiness of Large-Scale Written Expression: Songs, Blogs, and Presidents. Journal of Happiness Studies. Retrieved August 16, 2009 from http://www.springerlink.com/content/757723154j4w726k/

Fleiss, J. L. (1971). Measuring nominal scale agreement among many raters. Psychological Bulletin, 76(5), 378–382.

Sim, J., & Wright, C.C. (2005). The Kappa Statistic in Reliability Studies: Use, Interpretation, and Sample Size Requirements. Physical Therapy, 85(3), 257-268. Retrieved from http://www.physicaltherapyjournal.com/cgi/reprint/85/3/257

Reimagining Internet Studies:

Web Ecology Research — Mon, 10 Aug 2009 22:04:49 +0000

A Web Ecology Perspective

Like the web itself, the study of the web is mostly an improvised structure. A group of progressive scholars, swept up by the technological transformation of the past decade, have done their best to keep up with understanding the massive cultural and social effects of our communication infrastructure.

Not surprisingly, the inevitable outcome of this state of affairs is that the body of research about the web is fatally fragmented. Economists are caught attempting to assert dated models against new motivational frameworks. Journalists attempt to prescribe weak methods to maintain traditional standards around the creation and transfer of information. Marketers and social media experts, still largely divorced from a universe of quantitative and technical research, fail to provide a useful approach. No coherent body of research has emerged focusing on studying the internet as the internet.

This has resulted in fundamental weaknesses in the approach to studying social phenomena online. Relevant approaches are being ignored and opportunities for applying cutting edge research from a number of siloed traditions are going unexplored.

Our field poses two simple questions to researchers:

“Where have studies about the web failed?” and,
“How can we do better?”

The emerging field of Web Ecology is an attempt to unify contemporary research and practice under a common focus, set of principles, and general approach to promote new insights and more fruitful forms of exchange in this space. We believe that these lay the groundwork for a more vibrant, more dynamic, and more useful field of research and community of researchers.

Focus

Web Ecology studies the relationship of the nature of data and the behavior of actors on the internet.

By the nature of data, we mean the form and meaning of platforms and content.

Form is the structure and capabilities of a platform or piece of content.
Meaning is what is contained and conveyed by a platform or a piece of content.
A Platform is any system which contains content and dictates behaviors which can be taken within (a platform can be recursive).
Content is a discrete piece of media (that can be acted upon).

Comprising the behavior of actors, we define:

Behavior is a pattern of action, where even a single “action” like clicking a link may represent one step in a much larger set of actions.
And an Actor is an entity that performs, takes, or generates an action.

Principles

Comprehensiveness

Web Ecology seeks an understanding of the expansive interconnected ecology of social systems, and is agnostic with regard to specific networks and services.

Researchers studying the web tend to focus on networks as isolated environments, rather than as platforms which are elements of a larger whole. Web Ecology aims to examine the social dynamics powering these platforms. By identifying parallels between them and documenting interactions, Web Ecology strives to investigate the interrelationships across the expanse of all networks and the dynamics that power their platforms.

Interdependence

Web Ecology is aware of the holistic nature of the internet. It declares that code and users are part of an inseparable aggregate web phenomenon.

Users and code are often seen as separate entities worthy of independent study. Web Ecology views users and code as associated and dependent elements.

Boundedness

Web Ecology emphasizes that the web is constrained by various forces and configurations.

Rather than taking a utopian or deterministic perspective, Web Ecology recognizes that the web is neither limitless nor truly divorceable from various geographic, social, historical, and other realities.

Significance

Web Ecology acknowledges that content on the web retains inherent value.

Social media researchers often discount memes, transient cultural fads, and similar content as nothing more than an amusing distraction. Web Ecology is not only concerned with “major world events” as translated to the web, but acknowledges that patterns, in general, are extremely important. While the day-to-day bustle of online communities may appear at first to be nothing more than noise, Web Ecology contends that a closer examination reveals a valuable picture of how culture lives on the web.

Pragmatism

Web Ecology seeks to develop a methodology for understanding the structure of the web in order to inform a further comprehension of platforms, content, and users.

Among others, the field of social computing has achieved remarkable advances during the past five years in analyzing group behavior online. However, these academic fields have largely opted to focus on the continuing optimization of these methods, rather than the use of them to tackle the questions of actual social behavior online. As a result, the range of applications identified for the existing toolbox is woefully incomplete. Web Ecology intends to move from a narrow focus around the methods of measuring the web to an active effort to report on it, just as broadly as we report on the weather.

Approach

Experimental

Web Ecology takes an experimental approach to research, testing hypotheses and theories to produce reproducible conclusions.

The web ecologist views the internet as a social lab, to be studied with an emphasis on empiricism while encompassing qualitative approaches. Recognizing too that the ecosystem of the web is an ever fluid and lively space, web ecologists strive to establish interactive, live-updating, dynamic metrics on the state of the web when possible.

Empirical

Web Ecology favors the systematic creation and testing of models by employing an empirical, data-driven approach to craft a body of knowledge that defines basic axioms and builds to general principles.

The web ecologist studies community and culture online as its own unique environment. Studies of the web that merely apply methods from other fields of research inevitably promote a fragmented, inconsistent body of knowledge. To that end, Web Ecology works from a set of initial assumptions and principles to define the elements that construct the web and the ways in which they interact to influence social phenomena. We assert that this approach will provide better outcomes in shaping a unique field of research that is optimized for exploring the web.

Accessible

Web Ecology endorses openness with regard to publication.

The aspiration of Web Ecology is to better understand the relationship between the nature of data and the behavior of actors. While this holds the possibility of engineering better, more vibrant communities, the private development of these insights also opens the door to the exploitation of individuals. Taking an ethical stance, web ecologists endorse a position of openness for reporting of research and data.

Diligent

Web Ecology promotes the general documentation of social data and development of archives.

The establishment of specific, dedicated archives benefits the maturation of related scholarship. Rich archives form the groundwork for the work of web ecologists to go forward. Accordingly, Web Ecology stresses efforts to curate data for potential analysis in this and related areas of research.

This statement was developed during Web Ecology Camp, July 24–26, and in subsequent discussions on the future of internet studies throughout the summer of 2009. Contributing scholars (in alphabetical order) were: Jonathan Beilin, Bill Bushey, Patrick Davison, Sam Gilbert, Erhardt Graeff, Tim Hwang, Sawyer Jackson, Elsa Kim, Alex Leavitt, AJ Mazur, Dharmishta Rood, Mike Rugnetta, Frank Tobia, and Seth Woodworth.

Reimagining Internet Studies (pdf)

The Iranian Election on Twitter:

Web Ecology Research — Fri, 26 Jun 2009 17:37:56 +0000

The First Eighteen Days

Key Findings

From 7 June 2009 until the time of publication (26 June 2009), we have recorded 2,024,166 tweets about the election in Iran.
Approximately 480,000 users have contributed to this conversation alone.
59.3% of users tweet just once, and these users contribute 14.1% of the total number.
The top 10% of users in our study account for 65.5% of total tweets.
1 in 4 tweets about Iran is a retweet of another user’s content.
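To make the participation figures above concrete, the following Python sketch shows one way such shares could be computed from a list of tweet authors. It is an assumption-laden illustration rather than the method used in the study: the function name, the rule for picking the “top 10%” (the most active tenth of users, rounded down, with at least one user), and the toy data are all invented here.

```python
from collections import Counter


def participation_summary(tweet_authors):
    """Summarize how unevenly tweets are spread across users."""
    per_user = Counter(tweet_authors)
    total_tweets = len(tweet_authors)
    n_users = len(per_user)
    one_time_users = sum(1 for n in per_user.values() if n == 1)
    ranked = sorted(per_user.values(), reverse=True)
    top_n = max(1, n_users // 10)  # most active tenth of users
    return {
        "share_of_users_tweeting_once": one_time_users / n_users,
        # Each one-time user contributes exactly one tweet.
        "tweet_share_from_one_time_users": one_time_users / total_tweets,
        "tweet_share_from_top_10pct": sum(ranked[:top_n]) / total_tweets,
    }


# Toy data: one user posts 11 tweets, nine others post once each.
authors = ["a"] * 11 + list("bcdefghij")
stats = participation_summary(authors)
```

In this toy case nine of ten users tweet once yet supply under half of the tweets, the same kind of skew the findings above describe.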

The Iranian Election on Twitter (pdf)
