Quantitative Data – Web Ecology Project

The Revolutions Were Tweeted:

Erhardt Graeff — Tue, 04 Oct 2011 11:51:24 +0000

Information Flows During the 2011 Tunisian and Egyptian Revolutions

By Gilad Lotan, Erhardt Graeff, Mike Ananny, Devin Gaffney, Ian Pearce, and danah boyd

Web Ecology goes peer-review! In a new International Journal of Communication article, Web Ecologists Erhardt Graeff, Devin Gaffney, and Ian Pearce collaborated with friends of Web Ecology Gilad Lotan, Mike Ananny, and danah boyd on an analysis of Twitter data from the Arab Spring. Here is the abstract:

This article details the networked production and dissemination of news on Twitter during snapshots of the 2011 Tunisian and Egyptian Revolutions as seen through information flows – sets of near-duplicate tweets – across activists, bloggers, journalists, mainstream media outlets, and other engaged participants. We differentiate between these user types and analyze patterns of sourcing and routing information among them. We describe the symbiotic relationship between media outlets and individuals and the distinct roles particular user types appear to play. Using this analysis, we discuss how Twitter plays a key role in amplifying and spreading timely information across the globe.

You can download and read the full article (open access) in PDF format from the IJOC website: http://ijoc.org/ojs/index.php/ijoc/article/view/1246/613.

Lead author Gilad Lotan also produced an online data navigator to accompany the article: http://www.danah.org/projects/IJOC-ArabSpring/.

Sample Analytics You Can Create On 140Kit

Tim Hwang — Thu, 01 Jul 2010 18:32:06 +0000

User Follower Distribution

Account Creation Timeline

Retweet Networks

Afghanistan and its Election on Twitter: The Macro Picture

Web Ecology Research — Fri, 11 Sep 2009 05:53:54 +0000

Preview of an Upcoming WEP Report

By Erhardt Graeff
with Seth Woodworth

Data Summary

111,741 tweets about Afghanistan and its presidential election posted between August 11, 2009 and September 9, 2009
11,255 tweets on August 20, 2009, the day of the election
29,642 users talked about Afghanistan in our dataset
Top 10% of tweeters contributed 65% of tweets (same as Iran Election)
Number of retweets for a user was not correlated to their tweeting volume (same as Iran Election)
483 hashtags were used at least 3 times
No single, dominant hashtag (differs from Iran Election)
3 most used hashtags: #Afghan09, #Afghanistan, and #AfghanElection

Introduction

Afghan citizens went to the polls on August 20, 2009 after a controversial delay recommended by Afghanistan’s Independent Election Commission to allow ample time to prepare for fair and safe elections. Incumbent President Hamid Karzai was favored to win the election amid a large pool of contending candidates, with the most serious challenge coming from former Foreign Minister of Afghanistan Abdullah Abdullah. In pre-election polling, Abdullah gained significant momentum as election day drew nearer and other candidates dropped their campaigns.

In a clear reference to the protests following the June presidential election in Iran, Abdullah’s campaign manager was quoted predicting street violence if Abdullah did not win. Here at the Web Ecology Project, we wondered if Twitter would play as significant a role in reporting the election as it did in Iran. In a country where mobile phone subscriptions add up to an estimated 50% of the population, but internet access was roughly 1.5% at last estimate with the status of network expansion unclear, could the available ICT infrastructure and awareness of social media prompted by the “twitter revolution” in Iran enable a similar phenomenon post-August 20?

Tweets

We pulled tweets from Twitter containing 44 “English” search terms:

Zabihullah
“NATO Headquarters”
karzai
kandahar
jalalabad
kabul
herat
kunduz
khost
abdullah
abdulah
abdulla
abdula
taliban
taleban

“mullah omar”
mujahideen
mujahid
ghani
bashardost
mazari
khumri
ghazni
eikenberry
afghanelection
sabari
paktya
Haqqani
Dostum
Pajhwok

hazari
afghan09
parwan
paktika
lashkar
“puli khumri”
khowst
charikar
karzay
aliveinafghanistan
aliveinafghan
afpfail
garmser
garmsir

Using these search terms, we have archived 111,741 tweets posted between 11:00pm EDT on August 11 and 11:00pm EDT on September 9, including a complete set of 11,255 tweets from the day of the election. Currently, our dataset is noisy; we are aware of complications with the common name “Abdullah” in its different forms as well as the presence of the Taliban in Pakistan. We believe that about 10,000 tweets are affected by such irregularities while we improve our filter.

(We also compiled a list of Dari and Pashto search terms that correspond with our English search terms. These have yielded only 3,389 tweets over the same time period, 33 of which were also picked up by the English equivalents. More work still needs to be done to prepare this corpus for analysis.)

Using the tools we first developed for our report The Iran Election on Twitter, we have generated a timeline of volume of tweets pulled using the English search terms over the four weeks of data collection; data points are at one-hour increments.

Adjusted for time difference, August 20, election day, saw the most activity on Twitter: over 10% of all tweets across our 28-day dataset. The spike on that day begins between 10:00pm and 11:00pm EDT (UTC-4:00) on August 19, which coincides with the opening of polls in Afghanistan at 7:00am AFT (UTC+4:30), August 20. The second largest spike on the graph occurs on August 15, which was the date the Taliban bombed the NATO Headquarters in Kabul.
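The time-zone arithmetic and the one-hour buckets behind a timeline like this can be sketched in a few lines of Python. This is only an illustration, not the tooling used for the report: the fixed UTC offsets are the ones quoted above, the function name `hourly_volume` is made up for this example, and the sample timestamps are hypothetical.

```python
from collections import Counter
from datetime import datetime, timedelta, timezone

# Fixed offsets as quoted in the post (an assumption; a tz database would also work).
EDT = timezone(timedelta(hours=-4), "EDT")
AFT = timezone(timedelta(hours=4, minutes=30), "AFT")


def hourly_volume(timestamps):
    """Count tweets per one-hour bucket, keyed by the start of the hour in EDT."""
    buckets = Counter()
    for ts in timestamps:
        local = ts.astimezone(EDT)
        buckets[local.replace(minute=0, second=0, microsecond=0)] += 1
    return buckets


# Polls opened at 7:00am AFT on August 20, 2009.
polls_open = datetime(2009, 8, 20, 7, 0, tzinfo=AFT)
polls_open_edt = polls_open.astimezone(EDT)
print(polls_open_edt.strftime("%Y-%m-%d %H:%M %Z"))  # 2009-08-19 22:30 EDT

# Hypothetical tweet timestamps in UTC, used only to show the bucketing.
sample = [
    datetime(2009, 8, 20, 2, 35, tzinfo=timezone.utc),
    datetime(2009, 8, 20, 2, 50, tzinfo=timezone.utc),
    datetime(2009, 8, 20, 3, 5, tzinfo=timezone.utc),
]
for hour, count in sorted(hourly_volume(sample).items()):
    print(hour.strftime("%Y-%m-%d %H:00"), count)
```

The converted opening time, 10:30pm EDT on August 19, falls inside the 10:00pm to 11:00pm bucket where the election-day spike begins.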

Users

Our dataset involves 29,642 users, making an aggregate rate of about 3.77 tweets per user. Using a Lorenz curve [plotted below], we found a pattern of unequal distribution of Twitter activity across the user population that was similar to our findings for the Iran Election.

The steepness of the above curve illustrates that the top 10% of users contributed 65.3% of all tweets, which is almost identical to the distribution found for the Iran Election tweets (the top 10% of users contributed 65.5% of all tweets). Another finding similar to our Iran study is the disconnect between a user’s number of tweets and the number of times they were retweeted. (Our list of retweets accounts only for tweets that contained either upper or lower case forms of “RT”, with any form of punctuation, then followed by any number of spaces and @user.)
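For readers who want to reproduce this kind of measurement on their own data, here is a minimal Python sketch of the two ideas in this section: the share of tweets contributed by the most active users (with the points of a Lorenz curve), and a retweet matcher. The function names, the sample counts, and the exact regular expression are assumptions made for illustration. In particular, the pattern reads "any form of punctuation" as optional punctuation directly after "RT", which may not match the filter actually used.

```python
import re

# "RT" in either case, optional punctuation, any number of spaces, then @user.
# This is one reading of the rule described above, not the original filter.
RT_PATTERN = re.compile(r"\bRT\b[^\w\s@]*\s*@(\w+)", re.IGNORECASE)


def retweeted_users(text):
    """Return the usernames credited by RT markers in a tweet."""
    return RT_PATTERN.findall(text)


def top_share(tweet_counts, fraction=0.10):
    """Share of all tweets contributed by the most active `fraction` of users."""
    ordered = sorted(tweet_counts, reverse=True)
    top_n = max(1, round(len(ordered) * fraction))
    return sum(ordered[:top_n]) / sum(ordered)


def lorenz_points(tweet_counts):
    """Cumulative (share of users, share of tweets), least active users first."""
    ordered = sorted(tweet_counts)
    total = sum(ordered)
    points = [(0.0, 0.0)]
    running = 0
    for i, count in enumerate(ordered, start=1):
        running += count
        points.append((i / len(ordered), running / total))
    return points


# Hypothetical per-user tweet counts, not taken from the dataset.
counts = [50, 20, 10, 5, 5, 3, 3, 2, 1, 1]
print(top_share(counts))  # 0.5
print(retweeted_users("RT @pajhwok: polls open"))  # ['pajhwok']
```

Applied to the real per-user counts, `top_share` would be the function expected to return roughly 0.653 for the Afghanistan dataset.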

The top tweeters on Afghanistan are more heterogeneous in their affiliations than the top retweeted users. A number of high profile news organizations, individual journalists, and official and semi-official military channels comprise the list of top retweeted users. Notable accounts are those of the Pajhwok Afghan News (@pajhwok) and the Alive in Afghanistan project (@aliveinafghan), as well as the latter’s founder Brian Conley (@BaghdadBrian) of Small World News (@smallworldnews). These accounts are the strongest “local” voices offering Afghan perspectives on events. In the same way that individuals with close affiliations in Iran were both prolific and influential sources of information, these represent similar sources for Afghanistan.

Hashtags

One significant difference between Iran and Afghanistan is the lack of a common hashtag like #iranelection. Although multiple top hashtags were often used in tandem, only recently has #Afghan09 shown more dominance. The pie chart below shows the usage of the most common hashtags as a percentage of all tweets containing a hashtag.
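A hashtag breakdown of this kind can be approximated with a short Python sketch. The assumptions here are ours: hashtags are matched with a simple `#\w+` pattern, compared case-insensitively, and counted once per tweet. The function name `hashtag_shares` and the sample tweets are hypothetical.

```python
import re
from collections import Counter

HASHTAG = re.compile(r"#(\w+)")


def hashtag_shares(tweets, min_uses=3):
    """Share of hashtag-bearing tweets that use each sufficiently common hashtag."""
    counts = Counter()
    tagged = 0
    for text in tweets:
        # Count each hashtag once per tweet, ignoring case (an assumption).
        tags = {tag.lower() for tag in HASHTAG.findall(text)}
        if tags:
            tagged += 1
            counts.update(tags)
    return {tag: n / tagged for tag, n in counts.items() if n >= min_uses}


# Hypothetical tweets, used only to exercise the function.
sample = [
    "Polls open #Afghan09 #Afghanistan",
    "#afghan09 update from Kabul",
    "Early results #Afghan09",
    "no tags here",
    "#news only",
]
print(hashtag_shares(sample))  # {'afghan09': 0.75}
```

Because hashtags are often used in tandem, the shares computed this way can add up to more than 100%.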

Out of the 483 different hashtags used at least three times, #Afghan09 was most prominent, first adopted by @aliveinafghan, @smallworldnews, and @BaghdadBrian. #Afghanistan (a non-unique hashtag) had the next highest frequency, followed by #AfghanElection, originally adopted by @pajhwok. Additionally, the extremely generic #news and support our troops hashtag #militarymon were used often. Important to note is that the #Afghan09

Next Steps for this Research

Although no protests have ensued, Twitter activity has been kept alive by regular reports of official vote counts and allegations of voter fraud, sparking a more consistent conversation about the war. The statistics in this post are based on all aggregated tweets captured using English search terms over 4 weeks’ worth of activity. The next step in this project is to analyze the evolution of the conversation and its key players over time. We have broken down the usage of hashtags, the volume of tweets per user, and retweets, per day. We will be studying how the ranks of the top users in terms of output and “influence” have changed since election day. Beyond quantitative analysis, this effort will require a qualitative classification of users to better understand the nature of the user population (Daniel Bennett’s “Who to follow: Twitter for the Afghanistan election” offers a good starting point). We hope to analyze the Dari and Pashto tweets in the same manner.

The need for a more detailed study of the conversations on Twitter is exemplified by Stephen Colbert (@StephenAtHome) and his single tweet on the Afghan election, which received enough retweets to elevate him to the 38th most retweeted user in our dataset:

“despite rumors of voter fraud in afghanistan, it looks like it went smoothly for new afghan president-elect mahmoud ahmadinejad.”

Expect a full report on this soon, as well as a much anticipated follow-up to our Iran Election report after that! Please send us your feedback on the progress of this research and ideas for other ways to analyze and interpret these data.

We would like to thank Jon Beilin, Sam Gilbert, and Javed Rezayee for their continuing contributions, feedback, and support.

The Influentials

Web Ecology Research — Wed, 02 Sep 2009 16:20:09 +0000

New Approaches for Analyzing Influence on Twitter

By Alex Leavitt
with Evan Burchard, David Fisher, & Sam Gilbert

Using a new methodology based on the content and responses of 12 popular users, we determined measurements of relative influence on Twitter. We examined an ecosystem of 134,654 tweets, 15,866,629 followers, and 899,773 followees, and in response to the 2,143 tweets generated by these 12 users over a 10-day period, we collected 90,130 responses published by other users.

Summary of Findings

Our methodology and statistics suggest, among various possible conclusions, that on Twitter:

mashable is more influential than CNN.
sockington is more influential than MCHammer, while MCHammer is more influential than three major social media analysts (garyvee, Scobleizer, and chrisbrogan).
Celebrities with higher follower totals (e.g., THE_REAL_SHAQ and ijustine) foster conversation more than they provide retweetable content.
News outlets, regardless of follower count, influence large numbers of followers to republish their content to other users.

A larger version with more temporal depth is linked at the bottom of this report.

We would also like to thank Jon Beilin, Mac Cowell, and Tim Hwang for their invaluable contributions, feedback, and support.

The Influentials (pdf)

10 Days of Influence Tracked by Density of Responses (2993.27 KB jpg)

The Folly of Following Followers: Judging Influence on Twitter

As a simple online platform for conversation, Twitter is an ideal ecological system through which we can understand the relationship between users and their environments on the Web. Especially compared to other social networks, Twitter simplifies most of the extraneous features and boils down its environment to people and content. The unusual simplicity of Twitter, though, continues to warp perception of how the relationship between user and platform operates. Many of the popularized studies examining influence on Twitter fail to identify the nuances of social interaction in the system. While attempts have been made (e.g., http://twinfluence.com/about.php), the analyses tend to focus on the connections between users rather than the relationship of users, content, and platform. This report therefore aims to supplement previous investigations of the Twitter environment with more comprehensive data sets to enhance new approaches to understanding the concept of “influence” on social networks.

A focus solely on the connections between users skews an understanding of how influence operates and flows on Twitter. A popular metric of perceived influence on Twitter measures the quantity of a user’s followers. In general, the more followers a user possesses, the more impact he appears to make in the Twitter environment, because he seems more popular (namely, that users follow him). This statement makes sense assuming that Twitter acts as a successful broadcast medium, where a user publishes a tweet and it is read by every follower. However, this view of Twitter as a broadcast medium ignores the potential for users to interact with the content on the platform.

A similar and equally popular metric to measure influence on Twitter relies on the ratio between the number of a user’s followers and the number of other people that the user follows (his audience, or as we designate in this report, followees). This ratio, while better than the former method of counting followers, is still imprecise. Again, a ratio based on audience ignores the ability for a user to interact with content on the platform. However, the ratio of followers to followees does inform a better understanding of how influence can operate in Twitter’s environment.

The ratio of followers to followees may communicate the intended purpose or emergent practices of a user. For example, if the ratio approaches infinity (high follower total versus low followee total), the user account might be described as focusing on the material aspect of Twitter. By material, we mean a compulsion toward moving content to other users in the environment. In another instance, if the ratio approaches 1 (an equal or near-equal amount of followers and followees), the user might be categorized as a conversationalist. The user most likely follows back a majority of his followers, to retain familiarity with more personal conversations. Contrarily, the materialistic user aims to collect followers as contacts to whom the user may push content (who may then share the same content with other users). Finally, if the ratio approaches zero (low follower total versus high followee total), we might categorize the user as a spammer. As an emergent behavior, the stereotypical spammer attempts to collect users with the intent to push content to as many people as possible after achieving a high follower tally. However, most contemporary users can spot the stereotypical behavior of a spammer or bot, resulting in the low follower total on the spammer’s account.
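The three rough user types can be expressed as a small Python sketch. The report gives no numeric cut-offs, so the thresholds below (`high=10.0`, `low=0.1`) and the function names are purely illustrative assumptions. The first two sets of figures come from the user table later in this report, and the third account is hypothetical.

```python
def follower_ratio(followers, followees):
    """Followers divided by followees; infinite when the account follows no one."""
    if followees == 0:
        return float("inf")
    return followers / followees


def describe_account(followers, followees, high=10.0, low=0.1):
    """Label an account by its follower-to-followee ratio.

    The `high` and `low` cut-offs are illustrative assumptions only.
    """
    ratio = follower_ratio(followers, followees)
    if ratio >= high:
        return "materialistic"      # ratio grows toward infinity
    if ratio <= low:
        return "possible spammer"   # ratio approaches zero
    return "conversationalist"      # ratio in the neighborhood of 1


print(describe_account(2_712_530, 18))   # cnnbrk figures from the report's table
print(describe_account(94_715, 88_431))  # chrisbrogan figures from the report's table
print(describe_account(40, 2_000))       # hypothetical account
```

With these particular cut-offs, BarackObama (2,018,016 followers, 761,851 followees) would be labeled a conversationalist, which is a reminder that the ratio describes following behavior rather than influence.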

While the follower-to-followee ratio does not represent an accurate measurement of influence on Twitter, the ratio does inform the community about types of users. Before we apply these types to our understanding of online influence, we must first define influence.

Defining Influence on Twitter

An attempt to define a universal concept of influence on the Web remains difficult, because we must account for the variations of platforms, fluidity of environments, and evolving behaviors of users online. Because each platform is different, this report will rely on a definition of online influence specific to the environment of Twitter. Therefore, we define influence on Twitter as the potential of an action of a user to initiate a further action by another user. The term user is defined by Twitter’s platform. The term action deserves further explanation.

Understanding the term action as it relates to influence on Twitter depends on the fundamental structure of ideas in the environment and how these ideas move. The fundamental unit of content on Twitter is the tweet (a user may type up to 140 characters and publish them to the web interface), so an action on Twitter comprises all interactions of a user and that unit of content (tweet). While we can analyze various types of influential actions (e.g., a view on YouTube or a like on Facebook), this report will primarily focus on actions specific to Twitter. Our analysis of influence on Twitter, then, relies on the understanding of how actions shape behavior on the platform.

Influence as Actions; Actions as Responses

While actions on Twitter comprise both those interactions recognized by the platform as well as unexpected emergent behaviors that become widely used by users, Twitter recognizes two actions intrinsic to the system that can occur: the reply and the retweet.

Reply: @username {content}
Example:
@chrisbrogan Thanks for this. I’m new to twitter and it was really helpful
Digitaltonto (on 2009-08-15 at 00:47:17)

Retweet: RT @username {content}
RT @aplusk great article thank U RT @Morgan_Johnston: this great article on health care by Whole Foods cofounder/CEO
cheerok (on 2009-08-15 at 00:31:10)

The reply and retweet are categorized as actions because they are applied by a user to a piece of content. The reply acts as a response to another user’s tweet using new content, while the retweet operates as a citation or paraphrase of another user’s previous content. While both actions have different purposes, both are meant to move content to other users (albeit in differing ways). If a reply or retweet exists with respect to a given tweet, the actions are evidence for influence that has occurred. A reply occurs because a user is influenced to reply to the content; a retweet occurs because a user is influenced to reproduce the content. Literally, the actions are markers of influence.

Two other actions that appear frequently on Twitter, extrinsic to the system yet popular enough to have become adopted by users, require explanation: the mention and the attribution.

Mention: {content} @username ({content})
Watching @BarackObama speak in Colorado on @CNN
RareAir24 (on 2009-08-15 at 19:08:51)

Attribution: {content} via @username ({content})
Fire at Kuwaiti wedding kills dozens, official media says http://bit.ly/wn95A (via @cnnbrk)
ChilliGaz (on 2009-08-15 at 19:40:18)

Similar to the reply and the retweet, the mention and the attribution are categorized as actions because they too are applied by a user to a piece of content. We have separated the mention and the attribution from the more fundamental reply and retweet because the former two actions are not officially recognized by the Twitter platform. In fact, a mention is similar to a reply, except a mention occurs at some point in the tweet other than at the beginning. Comparably, an attribution is similar to a retweet, except an attribution borrows the symbology of the reply to provide a citation for previously published content. We must also note here that, first, while we distinguish the attribution from the mention, we have calculated them from the same database query. Any measurement in this report of mentions also encapsulates attributions; however, we will distinguish the attribution as separate from the mention later in the paper (by tallying it alongside retweets in certain equations). Second, since mentions theoretically serve the purpose of replies, and attributions the purpose of retweets, we have not expounded upon their use in the explanation of influence in the following paragraphs. However, we can hypothesize that the applications of replies include mentions and the applications of retweets include attributions.
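The four action formats can be told apart with a few regular expressions. The following Python sketch is an assumption-laden illustration rather than the database query used for this report: the function name `classify_action`, the order in which the checks run, and the choice to accept an "RT @username" marker anywhere in the tweet are ours. Unlike the query described above, it separates attributions from mentions directly.

```python
import re


def classify_action(text, username):
    """Classify how a tweet responds to `username`, or return None if it does not."""
    handle = rf"@{re.escape(username)}(?!\w)"  # avoid matching @cnn inside @cnnbrk
    flags = re.IGNORECASE
    if re.search(rf"\bRT\s*{handle}", text, flags):
        return "retweet"       # RT @username {content}
    if re.match(rf"\s*{handle}", text, flags):
        return "reply"         # @username {content}
    if re.search(rf"\bvia\s+{handle}", text, flags):
        return "attribution"   # {content} via @username
    if re.search(handle, text, flags):
        return "mention"       # {content} @username ({content})
    return None


# The example tweets quoted in this section.
examples = [
    ("@chrisbrogan Thanks for this. I’m new to twitter and it was really helpful", "chrisbrogan"),
    ("RT @aplusk great article thank U RT @Morgan_Johnston: this great article on health care", "aplusk"),
    ("Watching @BarackObama speak in Colorado on @CNN", "BarackObama"),
    ("Fire at Kuwaiti wedding kills dozens, official media says http://bit.ly/wn95A (via @cnnbrk)", "cnnbrk"),
]
for text, user in examples:
    print(user, classify_action(text, user))
```

Run against the examples, the sketch labels them reply, retweet, mention, and attribution, in that order.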

Categorizing Actions: Conversation & Content

In the second-to-previous paragraph, we hint at a similar categorization for actions that we previously applied to users. Given two probable types of users, one focused on conversation and another on content, we can map these classifications to actions — replies and retweets, respectively — to explain how the relationship between users of and the actions on a platform shapes influence on Twitter. The purpose of replies assumes that a conversation is the intended goal of the action. In writing a reply, the user has been influenced to respond to a previous unit of content published by another user. Similarly, with a retweet (the objective of which is to push content), the user has been influenced by a previous user’s content to reproduce the content for other users to view. In basic terms, we can see the reply as talking back to the first user and the retweet as passing on content to a third user. However, when assigning values of influence to these types of actions, we do not give preference to one or the other.

Previously, we examined two possible approaches to measuring influence on Twitter: 1) counting the total number of followers a user possesses, and 2) calculating the ratio of a user’s followers to a user’s followees. These two approaches still ignore the relationship between the user, the content, and the platform. The goal of this report is to move beyond these basic assertions about influence by analyzing a comprehensive set of replies, retweets, and other actions on Twitter that act as evidence for the influential potential of users.

Understanding Influence with New Data

For this report, we gathered relevant data from 12 Twitter users for 10 days, between 12:00 am 15 August 2009 and 12:00 am 25 August 2009. We focused on a small number of celebrities, news outlets, and social media analysts widely perceived to be among the more influential users on Twitter. Based on the content and connections of these 12 users, we examined a total of 134,654 tweets, 15,866,629 followers, and 899,773 followees. In response to the 2,143 tweets generated by these 12 users over the 10-day period, we collected 90,130 responses (actions) published by other users (which equates to 87,987 more messages than total original tweets, or a total average of 42 responses per tweet).

We have listed the 12 users below, categorized into three distinct groups that we feel ultimately represent the user types previously discussed. We have also calculated the total number of tweets published by each user, the total number of each user’s followers, and the total number of users that each of our 12 users follows. These statistics were updated between 28 August 2009 and 30 August 2009, so they may not necessarily reflect the exact number of tweets, followers, and followees present during the 10-day window that our data encompasses.

Celebrities	Username	Tweets	Followers	Followees

Ashton Kutcher	aplusk	3,205	3,407,385	209
Shaquille O’Neal	THE_REAL_SHAQ	2,072	2,092,541	562
Stanley Kirk Burrell	MCHammer	6,016	1,331,797	31,202
Sockington	sockington	5,711	1,089,984	380
Justine Ezarik	ijustine	7,718	605,441	3,039

News Outlets	Username	Tweets	Followers	Followees

CNN Breaking News	cnnbrk	1,096	2,712,530	18
BarackObama.com	BarackObama	330	2,018,016	761,851
Mashable.com	mashable	17,914	1,363,510	1,925
CNN	cnn	11,607	193,625	50

Social Media Analysts	Username	Tweets	Followers	Followees

Gary Vaynerchuk	garyvee	7,532	862,790	9,683
Chris Brogan	chrisbrogan	48,341	94,715	88,431
Robert Scoble	Scobleizer	23,112	94,295	2,423

The above table has been arranged in decreasing order by total followers, based on the three distinct categories of users. These categories reveal certain resemblances to aspects of content user types and conversation user types. Generally, news outlets aim to push content, social media analysts strive to perpetuate conversations, and celebrities tend to do both (dependent on their personal practices and the community who follow them). While there are some anomalies (e.g., BarackObama), most news outlets have a higher follower-to-followee ratio (materialistic) while most analysts have a more-equal follower-to-followee ratio (conversationalist). For celebrities, the ratio appears to favor a materialistic purpose on Twitter, but the responses generated by celebrities favor the conversationalist type.

In the graph below, we present a comprehensive diagram of total follower count, to reemphasize the perceived influence that each user projects. Keep in mind that although Robert Scoble (Scobleizer, ranked 12th) appears unimportant compared to Ashton Kutcher (aplusk, ranked 1st), Scoble still retains a high level of perceived influence across the entirety of Twitter, since his total number of followers amounts to over 94,000 (compared to many users that have between 50 and 1,000 followers).

Influence According to Audience Response

Followers, as stated before, cannot account for a reliable measurement of influence on Twitter. Instead, we must take into account the markers of influence — replies, retweets, mentions, and attributions — to inform which user holds more sway over his followers. The graph below measures the percentage of replies, retweets, and mentions per user, based on the total number of responses respective to each user.

Of course, the graph above does not visually portray an accurate instance of influence, because the values are not weighted. Instead, the graph illustrates the relationship between responses by each user’s follower network. Therefore, to further examine the effects that followers have on influence, we present the following two graphs that measure the average number of responses in relation to followers.

In the following diagrams, we have utilized the concepts of content and conversation to create equations for calculating new measurements of influence. We have defined conversation-related responses as the total number of replies added to the total number of mentions (@r+@m), and we have defined content-related responses as the total number of retweets added to the total number of attributions (@RT+@via). The graphs below utilize the equations “content/followers” and “conversation/followers” to illustrate the average number of responses per follower of each of the 12 designated users.
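As a rough sketch of the two follower-based equations, the Python below computes conversation (@r+@m) and content (@RT+@via) responses per 1,000 followers. The follower counts come from the table above, but the per-user response counts are hypothetical placeholders, since this report does not list them. The function names are likewise assumptions.

```python
def per_thousand_followers(responses, followers):
    """Average number of responses per 1,000 followers."""
    return 1000 * responses / followers


def follower_rates(user):
    """Conversation and content response rates for one user record."""
    conversation = user["replies"] + user["mentions"]   # @r + @m
    content = user["retweets"] + user["attributions"]   # @RT + @via
    return {
        "conversation": per_thousand_followers(conversation, user["followers"]),
        "content": per_thousand_followers(content, user["followers"]),
    }


# Follower counts are from the table; all response counts are HYPOTHETICAL.
users = {
    "Scobleizer": {"followers": 94_295, "replies": 900, "mentions": 600,
                   "retweets": 400, "attributions": 100},
    "aplusk": {"followers": 3_407_385, "replies": 6_000, "mentions": 4_000,
               "retweets": 2_500, "attributions": 500},
}

rates = {name: follower_rates(data) for name, data in users.items()}
for name, rate in rates.items():
    print(f"{name}: {rate['conversation']:.2f} conversation, "
          f"{rate['content']:.2f} content per 1,000 followers")
```

Dividing by followers rewards smaller accounts: with these made-up counts, the account with far fewer followers scores higher on both measures even though it receives fewer responses in total.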

The two graphs above present an interesting theory, in that the social media analysts appear to dominate both realms of content and conversation, thanks to their follower network. CNN and Mashable.com also appear high on the list of users that are able to interact well with their followers as well as push content easily to others.

While the above diagrams suggest that a user’s audience impacts how ideas move around said user to a large extent, these graphs do not take into account the tweets created by our 12 users, especially in relation to the responses the tweets generate. Returning to the graph representing the percentage of all responses, this illustration of influence is not entirely accurate because it does not account for the relative amount of content produced. This is especially important since the original tweets are the influencers that inspire replies, retweets, etc. Below, we present the same percentages of responses in a graph that weighs the comparison of responses against the total number of responses of other users.

The weighted graph above illustrates a significantly different measurement of influence than the previous diagram. If we were to state that influence is dictated by how many responses are generated, then we could certainly argue that Mashable.com is more influential than CNN Breaking News. This is a bold statement, especially when nearly twice as many users follow cnnbrk as follow mashable. However, the weighted response statistics above must be compared to the number of original tweets that inspired response. We have provided these statistics in the graph below:

The relationship between the original tweet and any subsequent responses certainly matters. For example, even though mashable and aplusk boast similar amounts of reactions (with a difference of 1,620 in favor of mashable), mashable originated more than 2.5 times as many original tweets to influence those responses. Therefore, aplusk exerted less effort to achieve near-similar success. Similarly, BarackObama generated more than 3 times as many responses in the ten-day period as did MCHammer; however, MCHammer originated over 8 times as many original tweets, meaning that the much larger effort he exerted was ultimately not as influential as the effort by BarackObama.

We have addressed the problematic relationship of original tweets and responses by averaging the statistics in the graphs below. The graphs utilize the equations “conversation/tweets” ((@r+@m)/tweets) and “content/tweets” ((@RT+@via)/tweets):
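A short worked example may help here. The sketch below reads both equations as the sum of the two response types divided by the number of original tweets, which is our interpretation of the notation. The overall totals are the ones reported earlier (2,143 original tweets, 90,130 responses). The single-user counts and the function name are hypothetical.

```python
def responses_per_tweet(first, second, tweets):
    """Average of (first + second) responses per original tweet."""
    return (first + second) / tweets


# Overall average across the 12 users, from the totals given in this report.
overall = 90_130 / 2_143
print(round(overall))  # 42

# HYPOTHETICAL user: 300 replies, 100 mentions, 50 original tweets.
grouped = responses_per_tweet(300, 100, 50)  # (@r + @m) / tweets
ungrouped = 300 + 100 / 50                   # @r + (@m / tweets), a different quantity
print(grouped, ungrouped)  # 8.0 302.0
```

The gap between the two results shows why the sum has to be taken before dividing: without the grouping, the reply count passes through undivided and swamps the average.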

The measurement of influence reflected in these graphs most likely approaches the most accurate estimation of influence detailed in this report. To affirm this statement, we must return to our Twitter-specific definition of influence online: the potential of an action of a user to initiate a further action by another user. The two graphs above account for the responses (further actions) in relation to original tweets (actions with potential), while still theoretically accounting for the size of each user’s audience. Still, these graphs do not account for the network of the 12 users’ followers, and as such remain significantly different from the previous graphs depicting average response per 1,000 followers. The optimal situation of maximum influence would account for the most followers possible executing the most actions. However, it is entirely possible that one follower published all of the responses for a given user.

What, therefore, do the discrepancies between original tweets and followers tell us about the data? In the previous follower graphs, social media analysts held most of the top ranks. Contrarily, in the tweet graphs, they make up the last three spots in both graphs. On average, the data suggest that social media analysts receive minimal reward for the effort they exert in maintaining a conversation with their followers. For those users that succeed, most news outlets were more successful at having their content pushed to other users. Celebrities, on the other hand, appear to inspire conversational responses with their followers, yet with more success than the analysts.

These graphs suggest many statements based on various relationships of users, data, and platform. However, although the graphs above represent relative influence among the 12 users, by no means do these diagrams suggest that those ranked last are not influential. For the most part, a general user on Twitter tends to depend heavily on perceived influence, whether it be total number of followers or the ratio of followers to followees. This report, though, attempts to move beyond simple assertions of influence to create a better study of influence on Twitter, supported by new approaches and quantitative data.

Future Approaches for Influence Analysis

This report strives to influence other researchers to pursue influence analysis based not solely on followers but also on the relationship between followers and content, and the interaction of both in Twitter’s system. Although we analyze how actions (responses to a user) represent the influence of a user, our study is limited by sample size, time range, and the ability to collect data. For instance, we hope in the future to develop a more complex algorithm that accounts for the combined influence of both followers and responses. We were not able to calculate user growth rate nor measure the number of responses per exact original tweet. Also, given that this report studies influence on Twitter, we cannot account for any external influence with respect to each user in our sample.

Though we admit our limitations, along with this report we are publishing a comprehensive visualization that marks each original tweet and each response (reply, retweet, and mention) along our 10-day timeline. The graph specifically shows density as a factor of influence over time for the 2,143 original tweets and 90,130 responses related to our dozen users. While our graph does not provide labels for tweet, time, etc., we encourage individual exploration of the data presented in the visualization.

The density of data varies considerably per user and per tweet. While we cannot assign each reply, retweet, and mention to a specific original tweet, we can at least determine certain patterns of density per any given tweet. The two excerpts above reflect the difference in density of responses that a certain tweet might generate. By tracking the density of responses over time, we hope to inspire further research into models of influence and web ecology as a whole.

MC Hammer Can’t Touch Social Media Geeks, Tweeting Cats When it Comes to Influence on Twitter

Web Ecology Research — Mon, 31 Aug 2009 13:46:09 +0000

A Preview of WEP Report #4

When deciding whether someone is worth following or talking to on Twitter, most of us make a snap judgment based on a user’s follower count, but what does this really tell us?

For our fourth publication, the Web Ecology Project decided to move beyond follower count to find a better way to measure influence on Twitter. Focusing in on a handful of celebrities, news outlets, and social media experts widely perceived to be among the Twitter elite, we looked at the extent to which each of these users can:

Spread content through Twitter by generating Retweets and Via’s
Foster conversation by generating @s and Replies

The results, taken from 10 days of Twitter activity (August 15th through August 24th), were surprising. Consider how the users we looked at rank by follower count:

When you look at the extent to which any given tweet can spread content or foster conversation, these rankings change significantly:

iJustine, for example, can spread more content than MCHammer, who has over twice as many followers, and in terms of generating conversation, celebrities like THE_REAL_SHAQ and aplusk dominate news outlets and social media experts alike.

When you look at how much each of these users is able to generate conversation and spread content relative to their follower counts, however, the rankings shift even more dramatically:

Values for MCHammer, aplusk, THE_REAL_SHAQ and CNNbrk plummet, while the social media experts, especially Chris Brogan, become powerful players.

These figures are just a taste of what’s to come. In our full report, we’ll unpack these numbers further and explore the somewhat surprising nuances and types of influence on Twitter.

Detecting Sadness in 140 Characters:

Web Ecology Research — Tue, 18 Aug 2009 14:01:11 +0000

Sentiment Analysis and Mourning
Michael Jackson on Twitter

By Elsa Kim and Sam Gilbert
with Michael J. Edwards and Erhardt Graeff

Michael Jackson’s death created an emotional outpouring of unprecedented magnitude on Twitter. In this report, we examine 1,860,427 tweets about Jackson’s death in order to test various methods of sentiment analysis and gain insights into how people express emotion on Twitter.

Key findings

At its peak, the conversation about Michael Jackson’s death on Twitter proceeded at a rate of 78 tweets per second.
Users tweeting about Jackson’s death tend to use far more words associated with negative emotions than are found in ‘everyday’ tweets.
Roughly 3/4 of tweets about Jackson’s death that use the word “sad” actually express sadness, suggesting that sentiment analysis based on word usage is fairly accurate.
That said, there is extensive disagreement between human coders about the emotional content of tweets, even for emotions that we might expect would be clear (like sadness).
Tweets expressing personal, emotional sadness about Jackson’s death showed strong agreement among coders, while commentary on the auxiliary social effects of Jackson’s death showed strong disagreement.
We argue that this pattern in the “understandability” of certain types of communication across Twitter is due to the way the platform structures the expression of its users.

We would like to thank Jonathan Beilin, Evan Burchard, David Fisher, Tim Hwang, Alex Leavitt, Dharmishta Rood, Max van Kleek, Jue Wang, and Seth Woodworth for their invaluable feedback and support.

Detecting Sadness in 140 Characters (pdf), Appendices (pdf)

1. Introduction

On June 25, 2009, news reports announced the death of Michael Jackson, leading to a flood of reactions on Twitter. From 9pm to 10pm EDT alone, there were over 279,000 tweets about Michael Jackson, or roughly 78 tweets per second (see graph above). What can be said about this massive body of tweets? What sorts of emotions did people express about Michael Jackson’s death?

Michael Jackson’s death provided occasion for a large wave of digital mourning, that is, the expression of grief online, usually coordinated via a common method or localized to a particular webpage. The latter type of mourning has become popular practice on social networking sites such as MySpace and Facebook, where the profile of the individual who has died is transformed into a digital memorial onto which friends and family leave last goodbyes and testaments.

After Michael Jackson’s death, common digital mourning practices emerged on a variety of platforms. Testimonials and goodbyes poured into Michael Jackson’s Myspace page, and Facebook saw a similar influx of grievers on Jackson’s main fan page and in newly created groups. The outpouring of tweets about Michael Jackson contains many similar expressions of grief, but as yet there has been no research about digital mourning on Twitter.

The body of tweets about Michael Jackson’s death also offers an opportunity to explore strategies for sentiment analysis: the process of determining the attitude of a speaker or speakers towards a particular topic in a large corpus of text. Because of its 140-character limit on messages and the social mores of the platform, Twitter offers challenges to the natural language processing and statistics-based techniques typically used to analyze sentiment.

This report represents a step towards understanding digital mourning and analyzing sentiment on Twitter. After describing our data, this report presents the results of an analysis of sentiment words in that data and findings from hand-coding tweets about Michael Jackson. This closer look at tweets about Jackson’s death provides insights into digital mourning practices on Twitter, assesses the validity of our first attempt at sentiment analysis by zeroing in on a word important to that analysis, and gauges the feasibility of doing larger scale sentiment studies in the future.

2. Description of the dataset

For this project, we made use of a dataset of 2,331,066 tweets about celebrity deaths (rumored or actual) collected for reasons that go beyond the scope of this report. These tweets were posted to Twitter between June 24 at 12:37am EDT (the day before Jackson’s death) and July 6 at 6:48pm EDT and were collected from Twitter’s search API using the following search terms:

MJ
Michael Jackson
Jackson
Farrah
Fawcett
Jill Munroe
Micheal (a very common misspelling)
Goldblum
Billy Mays

From this dataset of tweets, we worked with the 1,860,427 tweets that contain “mj” or “michael” or “jackson” for this particular report. Because we do not yet have a reliable mechanism for filtering tweets by language, this set contains a small portion of non-English tweets; these tweets are excluded in the analysis that follows.

We also isolated those 44,383 tweets in this set that contained the word “sad.” In addition to analyzing this set of tweets using the ANEW dataset, described below, we randomly selected 346 tweets for human coding.
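The selection steps described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual pipeline: the function names are hypothetical, and case-insensitive substring matching is an assumption (it would also catch forms such as "saddened," which do appear among the coded tweets).

```python
import random
import re

# Assumption: keywords are matched as case-insensitive substrings.
MJ_PATTERN = re.compile(r"mj|michael|jackson", re.IGNORECASE)
SAD_PATTERN = re.compile(r"sad", re.IGNORECASE)


def select_mj_tweets(tweets):
    """Keep tweets that mention "mj", "michael", or "jackson"."""
    return [t for t in tweets if MJ_PATTERN.search(t)]


def select_sad_tweets(tweets):
    """Keep tweets that contain the string "sad"."""
    return [t for t in tweets if SAD_PATTERN.search(t)]


def sample_for_coding(tweets, k, seed=0):
    """Draw up to k tweets uniformly at random, without replacement."""
    rng = random.Random(seed)  # fixed seed so the sample is repeatable
    return rng.sample(tweets, min(k, len(tweets)))
```

Under these assumptions, calling `sample_for_coding(select_sad_tweets(select_mj_tweets(all_tweets)), 346)` would produce a coding set of the size used in this report.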

3. ANEW Analysis

The Affective Norms for English Words (ANEW) dataset contains normative emotional ratings for 1,034 English words. Each word in the dataset is associated with a rating of 1–9 along each of three dimensions of emotional affect: valence (pleasure vs. displeasure), arousal (excitement vs. calmness), and dominance (strength vs. weakness) (Bradley & Lang, 1999).

We used this set to conduct sentiment analyses on large sets of tweets by looking at the usage of ANEW words within those tweets. For each analysis, average valence, arousal, and dominance ratings are calculated by determining the frequency of each ANEW word within the set and calculating the average ratings of the ANEW words weighted by this frequency. Similar ANEW analysis has proven useful in other online contexts (Dodds & Danforth, 2009), but has yet to be done with Twitter.
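As a rough sketch of the frequency-weighted averaging described above: the function below counts every occurrence of a lexicon word (a "hit") and averages the ratings over all hits. The tokenizer and the function name are our own assumptions, and the `lexicon` argument stands in for the ANEW table, which is not reproduced here.

```python
import re
from collections import Counter


def anew_averages(tweets, lexicon):
    """Return (hits, (valence, arousal, dominance)) for a set of tweets.

    lexicon maps a lowercase word to its (valence, arousal, dominance)
    ratings. Each occurrence of a lexicon word counts as one hit.
    """
    counts = Counter()
    for text in tweets:
        # Assumption: simple lowercase word tokenization.
        for token in re.findall(r"[a-z']+", text.lower()):
            if token in lexicon:
                counts[token] += 1
    hits = sum(counts.values())
    if hits == 0:
        return 0, None
    totals = [0.0, 0.0, 0.0]
    for word, n in counts.items():
        for i, rating in enumerate(lexicon[word]):
            totals[i] += n * rating  # weight each rating by word frequency
    return hits, tuple(t / hits for t in totals)


# Toy lexicon: only the valence of "sad" (1.61) comes from this report;
# every other number is a placeholder, not a real ANEW rating.
toy_lexicon = {"sad": (1.61, 4.0, 3.0), "love": (8.0, 6.0, 7.0)}
hits, averages = anew_averages(["so sad sad", "love you"], toy_lexicon)
```

Because the average is taken over hits rather than over tweets, a word that appears very often (as "sad" does in the Michael Jackson set) pulls the result strongly toward its own rating.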

In analyzing the set of 1,860,427 tweets about Michael Jackson’s death, we found 849,603 instances of an ANEW word being used, and these 849,603 ‘hits’ had the following average ratings:

Valence: 5.713
Arousal: 5.243
Dominance: 5.175

To give these numbers a point of comparison, we ran the same analysis on two different random samples of 1,860,427 ‘everyday’ tweets, pulled from Twitter’s streaming API between June 8 and June 23, 2009.

Sample 1:
675,137 hits
Valence: 6.350
Arousal: 5.256
Dominance: 5.559

Sample 2:
676,846 hits
Valence: 6.351
Arousal: 5.257
Dominance: 5.60

Given the remarkably similar hit counts and ratings observed between the two random baseline sets, we understand the differences between these baseline tweets and the tweets about Michael Jackson to be significant. In particular, the sizable difference in average affective valence ratings between the sets (~.64) suggests that those users tweeting about Michael Jackson are collectively choosing words in their tweets that express negative emotions, as would be expected from digital mourners.

The goal of sentiment analysis, however, is not to learn what words people on Twitter are using, but to gain insight into how people are feeling. Can we reasonably infer from the low valence score of our set of Michael Jackson tweets that the people who created these tweets are less happy than normal?

4. Human Coding of “Sad” Tweets

To better understand the significance of our ANEW analysis, which applies independent ratings of emotion to the words used in a set of tweets, we decided to zero in on a particular ANEW word, “sad,” to see how it is used.

Within the ANEW dataset, “sad” has a very low valence (1.61), and it appears 53,300 times in our set of Michael Jackson tweets, roughly 16 times more often than it appears in our random samples of tweets. As compared to all Michael Jackson tweets, which had an average valence of 5.713, these “sad” tweets have an average valence of 3.317. Use of the word “sad” appears to be an important reason why the average valence of the Michael Jackson tweets is lower than that of the baseline tweets. By looking at the use of this word, we can better understand what the ANEW analysis method can and cannot tell us about sentiment on Twitter.

We hand-coded a set of 346 “sad” tweets to see if usage of that word within our set of tweets aligns with the valence rating ascribed to “sad” within the ANEW dataset. If people tweeting the word “sad” were indeed expressing sadness, it would suggest that our ANEW analysis is giving us reliable knowledge about the emotional state of the Michael Jackson tweeters.

4.1 Rating Methods

For each of these 346 “sad” tweets, each of our 6 raters determined whether or not the person who had created the tweet was expressing sadness. Raters were told to give each tweet one of four nominal ratings:

“Y” – yes; the person who created this tweet is expressing sadness
“N” – no; the person who created this tweet is not expressing sadness
“M” – mixed; the person who created this tweet expresses sadness as well as another conflicting emotion
“U” – unclear; the tweet in question is spam, is not in English, or is otherwise impossible to interpret with respect to sentiment.

Beyond giving these directions, we did not do any training of our raters; over the course of coding, we did, however, remind raters several times of the criteria mentioned above (for example, several raters needed to be reminded to mark “U” if they thought a tweet was spam).

4.2 Rating Results

Of 346 tweets containing “sad,” raters, while not necessarily agreeing on any given tweet, reported on average that 271.83 (74.68%) tweets expressed sadness, and there were 222 tweets (64.16%) that all 6 raters judged as expressing sadness.

Raters reported on average that 28.33 (7.78%) tweets did not express sadness, 20.67 (5.68%) tweets expressed mixed emotion, and 25.17 (6.91%) tweets were unclear. There were 6 (1.73%) tweets that all raters reported as not expressing sadness, 7 (2.02%) tweets that all reported as unclear, and no tweets that all raters reported as expressing mixed emotions (See graphs for a summary of these results).

As part of the rating process, coders also highlighted certain tweets that they found interesting or difficult to interpret; these tweets are illustrative of the types of disagreements observed across coders. In addition to discussing these tweets below, Appendix A lists some of these tweets, arranged according to decreasing levels of agreement, and Appendix B lists some particularly illustrative tweets sorted by type.

4.3 Measures of Inter-rater Agreement

As the above results suggest, there was far from perfect consensus among raters interpreting tweets. All 6 raters agreed on only 235 (67.92%) tweets, and at least 5/6 raters agreed on 284 (82.08%) tweets.
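For readers who want to reproduce this kind of tally on their own coding data, here is a minimal sketch. It assumes the ratings are stored as one list per tweet with one label per rater; the function names are hypothetical.

```python
from collections import Counter

CATEGORIES = ("Y", "N", "M", "U")


def average_counts(ratings):
    """Average number of tweets that a rater assigned to each category.

    ratings: one list per tweet, holding one label per rater.
    """
    n_raters = len(ratings[0])
    totals = Counter(label for row in ratings for label in row)
    return {c: totals[c] / n_raters for c in CATEGORIES}


def tweets_with_agreement(ratings, min_raters):
    """Count tweets where at least min_raters chose the same label."""
    return sum(
        1 for row in ratings
        if Counter(row).most_common(1)[0][1] >= min_raters
    )


# Tiny made-up example with 6 raters and 4 tweets (not the study's data).
example = [list("YYYYYY"), list("YYYYYN"), list("YNMUYN"), list("NNNNNN")]
unanimous = tweets_with_agreement(example, 6)
five_of_six = tweets_with_agreement(example, 5)
```

With the study's ratings in this layout, `tweets_with_agreement(ratings, 6)` and `tweets_with_agreement(ratings, 5)` would correspond to the "all 6" and "at least 5/6" figures above.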

Given that there was a relatively large group of raters and a number of categories to choose from, this level of consensus might seem acceptable. However, one must take into account the prevalence of “yes” ratings; with such a large majority of tweets falling under one code, one should expect higher levels of consensus (Sim & Wright, 2005).

In order to better measure how reliably our raters interpreted sentiment in tweets, we calculated Fleiss’ kappa (κ), a measure of inter-rater reliability well-suited to our coding procedure (Fleiss, 1971). Like other kappas, this method accounts for random agreement, essentially comparing the amount of agreement seen among coders (defined as an average of every tweet’s P-value, a measure from 0 to 1 of the variation in each tweet’s ratings) to the agreement one would see in a random distribution of ratings. For this set of ratings, we found a κ of .561; while there are no clear standards for what is considered an acceptable kappa, .7 and above typically suggests strong agreement (Fleiss, 1971).
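A minimal sketch of the Fleiss' kappa calculation (Fleiss, 1971) is given below. It assumes every tweet was rated by the same number of raters and that the data are stored as one list of labels per tweet; the function name and data layout are ours, not the project's.

```python
from collections import Counter


def fleiss_kappa(ratings, categories=("Y", "N", "M", "U")):
    """Fleiss' kappa for nominal ratings.

    ratings: one list per tweet, each holding one label per rater.
    """
    n_items = len(ratings)
    n_raters = len(ratings[0])

    # table[i][j] = number of raters who gave tweet i the j-th category
    table = []
    for row in ratings:
        counts = Counter(row)
        table.append([counts[c] for c in categories])

    # Per-tweet agreement: share of rater pairs that agree on that tweet.
    p_items = [
        (sum(n * n for n in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in table
    ]
    p_bar = sum(p_items) / n_items  # observed agreement

    # Agreement expected by chance, from overall category proportions.
    p_cats = [
        sum(row[j] for row in table) / (n_items * n_raters)
        for j in range(len(categories))
    ]
    p_e = sum(p * p for p in p_cats)

    if p_e == 1.0:
        return float("nan")  # undefined when only one category is ever used
    return (p_bar - p_e) / (1.0 - p_e)
```

Because chance agreement grows when one category dominates, a set of ratings with a large share of "Y" labels can show high raw consensus and still produce a modest kappa.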

4.4 The Shifting Contextual Definition of “Sad”

This low level of agreement between coders suggests that even though there are a sizeable number of tweets that clearly express sadness, there is a lot of difficulty in interpreting emotion on Twitter. When we sorted the tweets by their P-value (which range from 1, representing complete agreement, to 0, representing complete disagreement), we found differences between tweets that had varying levels of agreement. From total agreement to near-complete disagreement, tweets varied in type from expressing personal or objective sadness to offering commentary on the auxiliary societal effects of the death, such as the media frenzy. Generally, it was easier for coders to agree on personal declarations of sadness than on instances where “sad” was used to describe a circumstance tangential to the death.
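To make the P-value levels used in this section concrete, the sketch below assumes that the P-value is Fleiss' per-item agreement (the share of rater pairs that agree on a tweet). Under that assumption, six raters can only produce a handful of distinct values, and they match the levels discussed here.

```python
def item_agreement(split):
    """Per-tweet agreement for one tweet.

    split: how many raters chose each label, e.g. (5, 1) means five
    raters picked one label and one rater picked another.
    """
    n = sum(split)
    return (sum(k * k for k in split) - n) / (n * (n - 1))


# Vote splits for 6 raters across up to 4 labels (a partial enumeration).
splits = [(6,), (5, 1), (4, 2), (4, 1, 1), (3, 3),
          (3, 2, 1), (2, 2, 2), (3, 1, 1, 1), (2, 2, 1, 1)]
for split in splits:
    print(split, round(item_agreement(split), 4))
# (6,) 1.0
# (5, 1) 0.6667
# (4, 2) 0.4667
# (4, 1, 1) 0.4
# (3, 3) 0.4
# (3, 2, 1) 0.2667
# (2, 2, 2) 0.2
# (3, 1, 1, 1) 0.2
# (2, 2, 1, 1) 0.1333
```

Read this way, a P-value of .6667 means a single dissenting coder, while .1333 means the six coders split two, two, one, and one across all four labels.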

Those tweets with a P-value of 1 generally expressed sadness. These tweets ranged from the calm, equivocal statements of sadness to the hyper-emotional. A calm, sad example was:

“Michael Jackson’s death is a sad loss…thoughts and prays go out to his family.”

Note that this tweet displays both an emotional reaction and objective reportage of the social situation. A hyper-emotional one read:

“Michael Jackson Died!! whatt??? im saddened…deeply sad :(”

There were also tweets that combined emotion and objective reportage on the events of the tweeter’s life, including:

“Feeding the baby and feeling sad about Michael Jackson! He left is too soon!”

and

“Shocked by Michael Jackson’s death. Such a sad, sad day. Going out for a couple of sales calls, late.”

This combination of life status update and emotional update leads to consensus among the coders, perhaps because the accompanying life status update helps clarify that the tweeter is not being sarcastic.

Those tweets with lower P-values more often include different uses of the word “sad”, suggesting that these other types of tweets are more difficult for coders to reliably interpret. At a P-value of .6667, tweets started to include commentary on the death, often of a moral nature, for example:

“sick of hearing about michael jackson now sad yes end of world no and he was no saint people need to remember that”

At a P-value of .46667, tweets began to express frustration at the media frenzy. According to these users, Jackson’s death was certainly something to be acknowledged and even honored, but it was inappropriate and bothersome for the media to focus on it so heavily. For example:

“@AnnCurry I agree – enough of Michael Jackson. Sad, but . . . others have died, too, but now ignored, thanks to MIchael.”

At a P-value of .4, one sees more instances of personal commentary, that is, observations about the self that are tangentially related to Jackson’s death. Examples include:

“Sadd… i love Michael Jackson…!! rest in peace… my mom better buy me a MJ T-shirt……”

“TMZ.com claims that Michael Jackson is dead, but his Wikipedia page has yet to be updated. How sad is it that I went to Wikipedia?”

In this latter instance, it is difficult to tell whether or not the tweeter was sad about the death of Jackson at all.

As the P-value decreased to .2 and .2666 and finally .1333, the tweets included confusing grammar, commentary such as:

“Celebrity triple – Ed McMahon, Farah F and MJ – despite the fame, not one of them died in peace – broke and feuding with family – sad…”

and of course, the appearance of what seemed to be spam:

“RT @bowlsey @JamieC: Very sad about Michael Jackson. HABITAT – for all your furniture needs – habitat.co.uk.”

At the two levels of highest disagreement, humor was introduced as well. For example:

“Michael Jackson, Billy Mays, and now XHTML 2–so very, very sad…”

The tweets with the least agreement do not report specifically on Michael Jackson’s death. They volleyed back and forth between mourning Farrah Fawcett and Jackson:

“Who’da thunk that today would be the day that Michael Jackson died? It feels fake. I’m SO sad about Farrah Fawcett. Such a surreal day…”

or commented on Jackson’s death as a phenomenon that impacted society:

“Saddened and unsurprised watching the prices change on Michael Jackson CDs in second hand shops.”

5. Discussion and Further Research

At the outset of this study, ANEW analysis revealed a significant difference in the valence values between an average day of tweets and those tweets about Michael Jackson’s death. But these values do not necessarily correspond to a user’s expressed emotions or explain the variation and nuance in human sadness. For this, we turned to human coders, asking them to rate tweets containing the word “sad” as sad, not sad, mixed, or unclear. Coders’ ratings suggest that approximately 75% of tweets express sadness, giving credence to the ANEW analysis.

These results indicate that the ANEW dataset is a promising tool for sentiment analysis on Twitter. Having proven useful in this pioneering analysis, ANEW should now be applied to a variety of different samples of tweets; a larger set of analyses will give us a better sense of ANEW’s strengths and weaknesses and provide a more robust set of referents for any given valence, arousal, or dominance rating.

Hand-coding the emotion in tweets will always provide a more nuanced picture than analysis with ANEW, however, because ANEW measures the presence of individual words instead of considering a word’s context. Unfortunately, comparing coders’ ratings resulted in a kappa value of .561, indicating that our hand-coders did not display a high rate of agreement. An important next step, then, is to attempt new rounds of coding with different parameters in hopes of better understanding what is achievable with such coding. If we are able to improve inter-rater agreement for certain types of analysis, we may be able to perform large-scale human coding projects with tools like Amazon Mechanical Turk. Enough of this coding data could provide the basis for a training corpus with which to automate the process of detecting emotion by its context, instead of simply through individual words as ANEW does.

Developing advanced, pragmatic human or AI coding techniques will facilitate the data-gathering necessary to compare emotional content between platforms, being conscious of the varying constraints of those platforms. After additional studies, we hope to be able to identify which platforms a researcher should first examine in order to gain insight into how particular emotional, social, or psychological phenomena are articulated by different web ecosystems.

Through our hand-coding of tweets, we also developed a typology for tweets that contained the word “sad.” The further a tweet was from describing a personal emotional experience or the objective social experience of Michael Jackson’s death, the more difficult it was for our coders to pinpoint whether there was sadness expressed in the tweet or not.

This more careful analysis of tweets about Michael Jackson’s death paints a complex picture of digital mourning on Twitter. As a loosely organized messaging network, Twitter does not operate as a “memorial” akin to clearly delimited online spaces like Myspace and Facebook; as seen even within tweets that contain the word “sad,” Twitter seems to support a wide spectrum of reactions to Jackson’s death, some of which have little to do with mourning. Given the short-lived nature of data on Twitter (the tweets discussed here are no longer available in Twitter’s search, which only goes back roughly a week), users appear more inclined to report Jackson’s death as a current event and less inclined to memorialize or collectively grieve. Furthermore, Twitter appears to be a far more ‘personal’ medium than other online spaces: tweeters tended to comment on sadness as individuals watching the public reaction instead of commiserating with particular friends or communities.

Appendix A: Sampling of Tweets ordered from most agreement to least

Coders highlighted these tweets as illustrative of the types of disagreements they saw around coding. Tweets are sorted by these types in Appendix B; see that appendix for further explanation.

1, Complete agreement : tweets generally sad (statement made from observation, not from stats)

Michael Jackson’s death is a sad loss…thoughts and prays go out to his family.

Emotional + Objective news reportage

Wow. sick to my stomach. Rest in peace, Michael Jackson. So sad. he may’ve been accusof a lot, but he also helped a lot

Hyper-Emotional

Michael Jackson Died!! whatt??? im saddened…deeply sad :(

Hyper-Emotional

Sad, sad day. Still can’t believe Michael Jackson died

Emotional (disbelief)

Feeding the baby and feeling sad about Michael Jackson! He left is too soon!

Emotion + Objective self-reportage

Shocked by Michael Jackson’s death. Such a sad, sad day. Going out for a couple of sales calls, late.

Emotion + Objective self-reportage

is going to listen to 114 michael jackson songs …its a sad day

Emotion + Objective self-reportage

.66667, Some agreement: tweets generally offering commentary, often moral

I’m so sad about about Michael Jackson! I can’t even get on eonline, wtf?!

Emotion + Self-reportage

MJ and Farrah??? What is the world coming to??? Such a sad day in Hollywood!!! RIP to some of the greats :-(

Emotion

This is bad, real bad, Michael Jackson. Now I’m sad, real sad, all the jacksons….

Humor

It’s sad we lost Michael Jackson. But how many others die and we never hear of it? http://ow.ly/g10W

Commentary, media

I wonder if Murray contributed to Michael Jackson’s death through ineptitude. How sad that such a great star used an outcast doctor.

Commentary, MJ’s life

First Michael Jackson then Billy Mays…. what a sad week.

Commentary, possibly Humor

Am I the only not pretending to be sad about Michael Jackson? He was a child fucker…remember?

Commentary, Moral

is watching the rerun of Michael Jackson night on American Idol. Suddenly sad in a completely different way ;-)

Humor, sad used to mean “pitiful”

sick of hearing about michael jackson now sad yes end of world no and he was no saint people need to remember that

Commentary, Moral

HAHAHAHAHAHA MICHAEL JACKSON FINALLY DIED. i know its sad but my god he was a freak tehe that made my week

Commentary/Self-Reportage

OMG Michael Jackson guys! we talking about Michael fucking Jackson!! I am floored!! I mean michael jackson!!!! I’m hella sad!

Hyper-Emotional

.466666, Disagreement: tweets generally ranting

too caught up in wimbledon.. but still saddened by MJ’s passing..

Objective self-reportage + Emotion

@AnnCurry I agree – enough of Michael Jackson. Sad, but . . . others have died, too, but now ignored, thanks to MIchael.

Commentary, media

@JazzyClark For God Sake I Lke Michael Jackson And Everythink Andim sad hes dead but come on enough of the man !!! x :L

Emotion, frustration

i get mj’s death was tragic but does it have to be shown everywhere?

Emotion, frustration

.4, More Disagreement: tweets generally personal commentary

Sadd… i love Michael Jackson…!! rest in peace… my mom better buy me a MJ T-shirt……

Emotion + Personal Commentary

TMZ.com claims that Michael Jackson is dead, but his Wikipedia page has yet to be updated. How sad is it that I went to Wikipedia?

Personal Commentary, note use of sad as “pitiful”

.2, Greater Disagreement: tweets generally commentary

I wish MJ’s legacy wasnt tainted by lies. Its sad.

Commentary, sad means “pitiful”

sadd because michael jackson diess : ( buhh lovess my baybee ohdee tehe111308

Confusing grammar

Celebrity triple – Ed McMahon, Farah F and MJ – despite the fame, not one of them died in peace – broke and feuding with family – sad…

Commentary, sad means “pitiful”

.2666, Severe Disagreement:

i’m sick and tired of hearing about MJ’s death, yes he died, that’s sad. Just leave the man alone already!

Emotion, rant/frustration

RT @bowlsey @JamieC: Very sad about Michael Jackson. HABITAT – for all your furniture needs – habitat.co.uk.

Spam

Michael Jackson, Billy Mays, and now XHTML 2–so very, very sad…

Humor

.133333, Least Agreement: tweets tend to report sadness that is not specifically a response to Michael Jackson’s death

3/4ths of everything on blip.fm right now are Michael Jackson songs. This one = great jam / sadly fitting. ♫ http://blip.fm/~8vuad

Commentary, real-time events

Its sad how farrah has been overshadowed by MJ. She was just as great as him just i n a different career! R.I.P. FARRAH!!

Commentary, media

Who’da thunk that today would be the day that Michael Jackson died? It feels fake. I’m SO sad about Farrah Fawcett. Such a surreal day…

Emotion, multiple

Saddened and unsurprised watching the prices change on Michael Jackson CDs in second hand shops.

Commentary, real-time events

Appendix B: Typology of Tweets with examples

Objective: Reporting Sadness as news, part of updates on tweeter’s life

It is a sad day

too caught up in wimbledon.. but still saddened by MJ’s passing..

Feeding the baby and feeling sad about Michael Jackson! He left is too soon!

Shocked by Michael Jackson’s death. Such a sad, sad day. Going out for a couple of sales calls, late.

Emotion: Simple expression of sadness

I am sadden by MJ’s death…

RIP Michael

Emotion: Personal sadness/extreme sadness

I’m devastated about Michael Jackson.What a sad day!!!

it sunk in.. MJ is gone.. as don lemmon put it, “Michael Jackson’s music is the soundtrack to my childhood”.. my life. i’m sad..

Emotion: Rant, expressing frustration at the media

i’m sick and tired of hearing about MJ’s death, yes he died, that’s sad. Just leave the man alone already!

i get mj’s death was tragic but does it have to be shown everywhere?

Commentary/Editorial: Regret

It’s so sad that he died so young

Commentary/Editorial: Chastising others for what appeared like forgiveness of his “sins”; sad used as “pitiful.”

So Michael Jackson died today . . . like i care.. I am more saddened about Farrah Fawcett’s death then a shiesty child molesters death…

Am I the only not pretending to be sad about Michael Jackson? He was a child fucker…remember?

I hope I’m not offending my friends for not being sad over MJ’s passing. I won’t be sad when OJ dies, either.

Humor: Making light of something about the event

Is watching the rerun of Michael Jackson night on American Idol. Suddenly sad in a completely different way ;-)

Michael Jackson, Billy Mays, and now XHTML 2–so very, very sad…

Sources Cited

Bradley, M.M., & Lang, P.J. (1999). Affective norms for English words (ANEW): Instruction manual and affective ratings. Technical report C-1, Gainesville, FL. The Center for Research in Psychophysiology, University of Florida. Retrieved from http://www.uvm.edu/~pdodds/files/papers/others/1999/bradley1999a.pdf

Dodds, P.S., & Danforth, C.M. (2009). Measuring the Happiness of Large-Scale Written Expression: Songs, Blogs, and Presidents. Journal of Happiness Studies. Retrieved August 16, 2009 from http://www.springerlink.com/content/757723154j4w726k/

Fleiss, J.L. (1971). Measuring nominal scale agreement among many raters. Psychological Bulletin, 76(5), 378–382.

Sim, J., & Wright, C.C. (2005). The Kappa Statistic in Reliability Studies: Use, Interpretation, and Sample Size Requirements. Physical Therapy, 85(3), 257–268. Retrieved from http://www.physicaltherapyjournal.com/cgi/reprint/85/3/257

The Iranian Election on Twitter:

Web Ecology Research — Fri, 26 Jun 2009 17:37:56 +0000

The First Eighteen Days

Key Findings

From 7 June 2009 until the time of publication (26 June 2009), we have recorded 2,024,166 tweets about the election in Iran.
Approximately 480,000 users have contributed to this conversation alone.
59.3% of users tweet just once, and these users contribute 14.1% of the total number.
The top 10% of users in our study account for 65.5% of total tweets.
1 in 4 tweets about Iran is a retweet of another user’s content.
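
Participation figures like the ones above can be derived from a list of per-user tweet counts. The following is a minimal sketch under that assumption; the function name, the result keys, and the rounding used to pick the "top 10%" are ours, not the report's.

```python
def contribution_stats(tweets_per_user, top_fraction=0.10):
    """Summarize how unevenly tweets are spread across users.

    tweets_per_user: one integer per user (that user's tweet count).
    """
    counts = sorted(tweets_per_user, reverse=True)
    total = sum(counts)
    n_users = len(counts)
    single = [c for c in counts if c == 1]
    # Assumption: the "top" group is the rounded fraction, at least 1 user.
    top_n = max(1, round(n_users * top_fraction))
    return {
        "single_tweet_user_share": len(single) / n_users,
        "single_tweet_volume_share": sum(single) / total,
        "top_user_volume_share": sum(counts[:top_n]) / total,
    }


# Made-up counts for ten users (not the study's data).
stats = contribution_stats([1, 1, 1, 1, 1, 1, 2, 3, 4, 35])
```

In this toy example, 60% of users tweet once but contribute only 12% of the volume, while the single most active user accounts for 70% of it.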

The Iranian Election on Twitter (pdf)
