Saturday, November 10, 2012

Mining Tweets of World Cup T20 Match between India and Pakistan: Interesting Insights from Social Network Analysis

I have been absent from this space I call my blog for quite some time now, and not without reason. The past few months have been extremely busy, with lots of traveling (Milan, Venice, Rome, Nijmegen and Copenhagen, all within three to four months) and, of course, the never-ending paper submissions. As I explained in my previous post on online education initiatives, I am also taking online Computer Science courses on Coursera, and this semester I took a very interesting course by Lada Adamic of the University of Michigan (Social Network Analysis). Although I have taught some aspects of Social Network Analysis myself during a summer course at the Faculty of Computer Science at IBA, Karachi, I found this course intriguing, and the way Lada enriched it with cool applications of SNA was simply amazing.

As an optional part of this course the students were to submit a programming project and I thought what better opportunity than this to submit a part of the TweetCric project being undertaken by our research group. It's always good to get some early feedback on your work in order to gain useful, innovative directions and hence, I decided to blog about my Social Network Analysis project. Readers are welcome to suggest any new directions or give their feedback in comments so as to help us in this project. Following is a description of the project for interested readers of my blog:

Social media applications have considerably influenced the lives of millions, and every day a huge number of updates are posted to social networks such as Facebook and Twitter. As of March 2012, more than 400 million tweets were being posted on Twitter each day. The volume of tweets rises significantly during a sporting event, as many sports fans now use social media as part of their viewing experience. Users describe this as an experience full of pleasure and fun, as illustrated by the following Facebook status update posted during the recent World Cup T20 match between India and Pakistan:

"Facebook comments are more interesting than the match. Already more than two pages of comments. Looks like PakInd Vs Facebook"

Interestingly, the huge amount of content produced during sporting events can be used to analyse players' performance, and in light of that analysis sports managers can decide future strategies; the notion of crowd-sourced sports critics can thus be realized in practice. Researchers have already begun to explore using this volume of user-generated content for research problems such as event detection and video annotation for sports summaries [1, 2]. In this work we argue for putting this crowd-sourced content to use for sports strategy analysts and decision-makers. We use social network analysis to highlight significant players during the match, and we analyse why social network analysis methods single out these players.

Social Network Modeling
During the epic match held on September 30th 2012, we gathered tweets using the Twitter Search API. A Python script queried the Search API at half-hour intervals, collecting fresh tweets as the match progressed. In total we collected a sample of 43,450 tweets with the hashtag PakvsInd during the match.
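
For readers who want to try something similar, the half-hourly polling described above could be sketched roughly as follows. This is only a minimal sketch and not our actual script: `fetch` is a hypothetical callable standing in for whatever client performs the search request, and I assume each tweet is a dictionary with an `id` key. Consecutive searches tend to return overlapping results, so the sketch keeps each tweet only once.

```python
import time
from typing import Callable, Dict, Iterable, List


def collect_tweets(
    fetch: Callable[[str], Iterable[dict]],
    query: str,
    rounds: int,
    interval_seconds: float = 1800.0,
    sleep: Callable[[float], None] = time.sleep,
) -> List[dict]:
    """Poll `fetch` repeatedly and keep each tweet once, keyed by its id.

    `fetch` is a hypothetical stand-in for the real search client.
    """
    seen: Dict[str, dict] = {}
    for round_number in range(rounds):
        for tweet in fetch(query):
            # Consecutive searches overlap, so skip ids we already hold.
            seen.setdefault(tweet["id"], tweet)
        if round_number < rounds - 1:
            sleep(interval_seconds)  # half an hour by default
    # Dictionaries preserve insertion order, so tweets come back first-seen first.
    return list(seen.values())
```

Passing `sleep` as a parameter is just a convenience so that the loop can be tried out without actually waiting half an hour between rounds.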

We modeled the social graph of the players and commentators using the text content of the tweets. First, using Wikipedia and ESPN CricInfo as external resources, we compiled a list of players and commentators relevant to the India-Pakistan cricket match. This list was then used to detect tweets containing a mention of any player or commentator; the following list shows some sample tweets:

  1. hafeez goes, 15 from 28 balls.. idiot, wasted his time big time. game over. #pakvsind
  2. like the world cup of pakistani batsmen falling against yuvraj. kamran also departs edging to dhoni. pak 56/4 after 9. #pakvsind
  3. rt @maria_memon: rt @maria_memon: afridi! quit playing games with our hearts....our hearts....#pakvsind
  4. hafeez is the reason for todays batting performance.... after nazir he put all of the team under pressure! #pakvsind
  5. is dhoni trying to piss off pakistanis by bringing in kohli? #pakvsind
We now explain how we formulate the nodes and edges in our social network of players and commentators. Each player/commentator is treated as a node, and an edge is drawn between two players/commentators if they co-occur in a tweet. As an example consider tweet 2 above; according to our model there would be an edge between each pair among yuvraj, kamran and dhoni. In total 8,587 tweets (19.8%) contained a mention of some player or commentator.
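
The node and edge construction described above can be sketched in a few lines of Python. This is a minimal sketch under some assumptions: `NAMES` is only a hypothetical excerpt of the name list, and matching names as whole lowercase words is my simplification for illustration (the exact matching rule is not described in this post).

```python
import re
from collections import Counter
from itertools import combinations

# Hypothetical excerpt of the player/commentator list (assumption).
NAMES = ["hafeez", "yuvraj", "kamran", "dhoni", "afridi", "nazir", "kohli"]


def mentions(tweet, names=NAMES):
    """Return the sorted list of names that appear as whole words in a tweet."""
    words = set(re.findall(r"[a-z]+", tweet.lower()))
    return sorted(words.intersection(names))


def cooccurrence_edges(tweets, names=NAMES):
    """Count, for every pair of names, the tweets that mention both."""
    edges = Counter()
    for tweet in tweets:
        # Sorting the names first gives each undirected pair one canonical key.
        for pair in combinations(mentions(tweet, names), 2):
            edges[pair] += 1
    return edges


# Sample tweets 2 and 4 from the list above.
tweets = [
    "like the world cup of pakistani batsmen falling against yuvraj. "
    "kamran also departs edging to dhoni. pak 56/4 after 9. #pakvsind",
    "hafeez is the reason for todays batting performance.... after nazir "
    "he put all of the team under pressure! #pakvsind",
]
edges = cooccurrence_edges(tweets)
```

Here tweet 2 contributes the three pairs among dhoni, kamran and yuvraj, and tweet 4 contributes the pair hafeez and nazir. The counts can serve as edge weights when the graph is loaded into a tool such as Gephi.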

The following figure shows the visualization obtained from this social network (Gephi was used to generate the graph):

Modeled social network of players/commentators during World Cup T20 India-Pakistan match
As is clear from the figure, there are three communities within this social network. Nodes are sized according to betweenness centrality, and Hafeez is the node with the highest betweenness: he was the captain of the Pakistani team in that match, and according to most of the tweets Pakistan lost because of his poor captaincy, poor field placements and poor batting. The node with the second-highest betweenness, Kohli, was named man of the match and scored the most runs, leading India to a comfortable victory. Hence social network analysis can give important insights into sporting events. Natural language processing as an alternative approach seems to lack the precision and efficiency that social network analysis offers. Our team has long argued for a hybrid approach that uses both natural language processing and social network analysis to address research questions in Information Retrieval and Web Information Systems, given the low scalability and speed of natural language processing alone [3].
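
To make the notion of betweenness centrality concrete: a node scores highly when many shortest paths between other nodes pass through it. We relied on Gephi for the actual numbers; the following is only a minimal sketch of the measure using Brandes' algorithm for an unweighted, undirected graph. The toy edges are invented for illustration and are not taken from our dataset.

```python
from collections import deque


def betweenness(adjacency):
    """Brandes' algorithm for an unweighted, undirected graph.

    `adjacency` maps each node to the set of its neighbours.
    Returns unnormalised betweenness, counting each node pair once.
    """
    score = {node: 0.0 for node in adjacency}
    for source in adjacency:
        # Breadth-first search that counts shortest paths from `source`.
        order = []
        predecessors = {node: [] for node in adjacency}
        paths = {node: 0 for node in adjacency}
        paths[source] = 1
        distance = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            order.append(node)
            for neighbour in adjacency[node]:
                if neighbour not in distance:
                    distance[neighbour] = distance[node] + 1
                    queue.append(neighbour)
                if distance[neighbour] == distance[node] + 1:
                    paths[neighbour] += paths[node]
                    predecessors[neighbour].append(node)
        # Walk back from the farthest nodes, accumulating dependencies.
        dependency = {node: 0.0 for node in adjacency}
        for node in reversed(order):
            for pred in predecessors[node]:
                dependency[pred] += paths[pred] / paths[node] * (1 + dependency[node])
            if node != source:
                score[node] += dependency[node]
    # Each undirected pair was visited from both ends, so halve the totals.
    return {node: value / 2 for node, value in score.items()}


# Toy graph with invented edges (assumption, not the real network).
toy_graph = {
    "hafeez": {"dhoni", "nazir", "afridi"},
    "dhoni": {"hafeez", "kohli"},
    "nazir": {"hafeez"},
    "afridi": {"hafeez"},
    "kohli": {"dhoni"},
}
centrality = betweenness(toy_graph)
```

In this toy graph hafeez lies on the shortest path between five pairs of other nodes and dhoni on three, so they receive the two highest scores, while the leaf nodes score zero.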

We now analyse the communities within this dataset. The community represented in blue mostly comprises Indian players, and it makes sense that they form a separate community. However, the inclusion of Misbah and Ajmal in this community is odd since both are Pakistani players. Further analysis reveals why: Ajmal took the important wicket of Sehwag, which pulled him into that community, and Misbah was mentioned together with Ajmal once, which pulled him there too. The community in dark green mostly represents Pakistani players, with the exception of Dhoni, the Indian team captain; this occurs because Twitterers compared Hafeez's captaincy with Dhoni's, pulling Dhoni into that community. Lastly, the community in aqua green represents players who did not play in the match, with the exception of Afridi, who was pulled into that community by tweets from Pakistani cricket fans suggesting that he be dropped, which placed him alongside those not playing the match.

Lastly, as I mentioned at the beginning of this post, any feedback or idea is welcome. Interested students who want to join this project are requested to contact me personally via email or social networks.

[1] J. Nichols, J. Mahmud, and C. Drews. Summarizing sporting events using twitter. In Proceedings of the 2012 ACM international conference on Intelligent User Interfaces, IUI ’12, pages 189–198, New York, NY, USA, 2012. ACM.
[2] A. Tang and S. Boring. #epicplay: crowd-sourcing sports video highlights. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, CHI ’12, pages 1569–1572, New York, NY, USA, 2012. ACM.
[3] A. Younus, M. Qureshi, F. Asar, M. Azam, M. Saeed, and N. Touheed. What do the average twitterers say: A twitter model for public opinion analysis in the face of major political events. In 2011 International Conference on Advances in Social Networks Analysis and Mining, pages 618–623. IEEE, 2011.

Friday, May 18, 2012

Online Education Initiatives: A Hope for Education in Less Developed Countries

I remember very well my time in the Computer Science Department of Karachi University, when teachers who did not take their classes really annoyed me. Class-fellows would call me crazy for being so nerdy, but I knew this was a valuable period of our lives which would never come back. This was the age at which the mind is ready to absorb all knowledge, and thanks to our messed-up education system (not to forget the loads of politics that pollute it) it was literally being wasted. The dilemma of technological sciences such as Computer Science in less developed countries like Pakistan lies in their being more of a hype than a science. In my part of the world students flock to Computer Science to get good jobs after graduation: of course this is a necessity and one point of a good education, but whether it should be the only goal is the real question we should address.

Back then there was a frustrating time when the Object-Oriented Programming teacher gave us the option of learning OOP concepts with either C# or C++, and unfortunately most of the class went for C# because it was in demand in the job market. At that point I realized how tough a time Computer Science would have in Pakistan, and this remains true to this day. Sadly, not only students but also teachers have promoted the job-oriented study model, leading to a myth that Computer Science is all about sitting at a desk writing code in .NET or PHP (or any other programming language for that matter).

Meeting the well-known scientist Rakesh Agrawal from Microsoft Research confirmed my assertions about the pathetic state of affairs of technological sciences in countries like India and Pakistan. He shared my dissatisfactions and strongly criticized the industry in the less developed countries. Equally sad is the state of affairs at the national universities in South Asia, and the situation is changing at a very slow pace. When Stanford announced its online courses, I saw acquaintances in my social network sharing the news, and the ones most excited about these online courses were undergraduate students from institutes in my country. This, as I see it, is a silver lining amidst the dark clouds, as online education initiatives like Coursera, EdX and Udacity will now grant access to quality education to students from all over the world. This in my opinion is a huge step towards bridging the digital divide, and it is now upon students in the developing world to make the most of this opportunity. Today's connected society gives easy and massive access to knowledge, unlike the situation I had back in my undergraduate days, and I feel students today are far more blessed than students of my time.

What started as an educational initiative by accomplished Stanford Professors Daphne Koller and Andrew Ng has now turned into a global phenomenon, with the best universities contributing to make knowledge open to all. If studying at the world's reputed universities (Stanford University, MIT, Harvard, University of Michigan, University of Pennsylvania etc.) was ever your dream, then there can be no better time to go and get that dream. Some students might take this as an exaggeration, but it comes from me after personally taking two online courses this semester and enjoying them to the maximum. Furthermore, Coursera statistics confirm the value that online education has now added to universities, value they could never have achieved otherwise. As Andrew Ng puts it: "I normally teach 400 students," Ng explained, but last semester he taught 100,000 in an online course on machine learning. "To reach that many students before," he said, "I would have had to teach my normal Stanford class for 250 years."

It is a generally held notion that the academic culture and the styles of teaching in our part of the world are outdated and boring. I can certainly confirm this on account of my experience in Pakistani academic circles for quite some time now. For the most part, higher-education circles in developing regions limit ideas to an academic document on a shelf, quite unlike the way things are done in the top research universities of the world. Students have always wanted to know how the ideas they study in the classroom apply to the real-world problems around them. With world-class Professors offering online courses, there is an opportunity to get many of those questions answered.

Online education as a phenomenon is not new, and for years people in less developed regions have been skeptical of it, but it's quite different with Coursera and other similar initiatives. The revolutionary ideas behind these initiatives are testing, grading, student-to-student help and the awarding of certificates for completing a course. Daphne Koller, a Stanford computer science professor who founded Coursera with Ng, explained in her talk at LinkedIn last week, "It will allow people who lack access to world-class learning - because of financial, geographic or time constraints - to have an opportunity to make a better life for themselves and their families."

So the next time students come to me seeking advice on how to start with research or how to apply to foreign universities, I'd recommend that they take some courses related to their area of interest on Coursera or any such platform. With such initiatives coming from the world's top-class universities, there is hope for a revolution in higher education, one that allows students from all over the world not only to hear top-quality lectures but also to do homework assignments, be graded, receive a certificate for completing the course and use that to get a better job or gain admission to a better school.

Sunday, April 15, 2012

WWW2012 Poster: New Media vs. the Old Media

Today's social-media-savvy age has considerably changed the paradigm of traditional journalism. Interestingly, it has also led to new debates within the journalism and media industry, with supporters of social media calling it a platform for the voice of the masses while opponents call it gibberish and noise. Old-school journalism disregards the significance of an article's popularity on social media on the pretext that “journalism is not about feeding the masses with whatever crap they want to be fed with.”

It turns out that this debate is not as simple as it appears at first glance. What advocates of old-school journalism do not take into account is the age-old phenomenon that the social sciences research community terms “media bias”. A famous paper published in 2004 by the Department of Political Science at UCLA and the Department of Economics at the University of Missouri studies the bias of famous news outlets in the US. Since then there have been various attempts at studying bias in traditional media platforms (such as the New York Times, Fox News, the Washington Post, CBS and the Wall Street Journal), most of them coming from the sciences (social science, political science, Computer Science). From a scientific viewpoint empirical evidence is given the utmost importance, and unfortunately social media circles in Pakistan tend to ignore this angle altogether. This brings into the picture the measurement of bias in various forms of media, which turns out to be a huge research challenge in itself. The solution: social media, with its insights and popularity judgements, can serve as a tool not just for the voice of the masses but also for measuring bias in traditional media, and this is exactly what a team of researchers in IBA's Web Science group has done.

The crucial nature of the media industry makes it all the more essential to have ways and means of verifying its content. This leads to the natural question of how new media, namely social media, can help measure the inevitable biases inherent in traditional media. A few of these questions have been answered by researchers from one of Karachi's most prestigious educational institutes, the Institute of Business Administration, who investigated differences between news appearing on traditional and social media platforms using publicly available data from the famous microblogging site Twitter. Being a part of this team made me delve deeper into various aspects of media both internationally and in Pakistan, my observation being that today's media tend to ignore the crucial role of social media and do not take popular demands into account. With this conclusion, we argue for a paradigm shift in how traditional media platforms perceive the new media landscape; the sooner they embrace this new world, the better for their own survival.

Some technical details of the study warrant an explanation. The Jaccard similarity metric from data mining was used to investigate the differences in named entity coverage between the 16 million tweets posted during the period of the Egyptian uprising (tweet data obtained from the TREC 2011 microblog track) and the New York Times articles about Egypt. The figure below shows our results:

It demonstrates a significantly low value of coverage (Jaccard similarity below 0.5 for all days), thereby indicating the presence of media bias. Moreover, we extend this study to a local level (Pakistani media outlets) on a daily basis for the month of November. The extension uses topic models (specifically standard LDA and Twitter-LDA) to discover similar topics in the two media, followed by a ranking function which computes the popularity of a news item on the two platforms. These ranks are then compared with a manually ranked list, the final result being that the ranks obtained from social media (tweet data) match the human-annotated ranks more closely.
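
For readers unfamiliar with the metric, the Jaccard similarity of two sets is the size of their intersection divided by the size of their union, so it ranges from 0 (nothing shared) to 1 (identical sets). The following is a minimal sketch of the per-day comparison; the entity sets are invented for illustration and are not taken from our data, and returning 0 for two empty sets is simply a convention I chose here.

```python
def jaccard(a, b):
    """Jaccard similarity |A ∩ B| / |A ∪ B| of two collections, treated as sets."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0  # convention chosen here for two empty sets (assumption)
    return len(a & b) / len(a | b)


# Hypothetical named entities for a single day (not the study's data).
tweet_entities = {"egypt", "mubarak", "tahrir", "cairo", "army"}
news_entities = {"egypt", "mubarak", "obama", "clinton"}
similarity = jaccard(tweet_entities, news_entities)
```

With these invented sets two entities are shared out of seven distinct ones, so the similarity is 2/7, well below 0.5. Repeating the computation for each day gives one similarity value per day.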

For those interested, here's the abstract of our paper:
It is often the case that traditional media provide coverage of a news event on the basis of journalists’ viewpoints - a problem termed in the literature as media bias. On the other hand social media have given birth to an alternative paradigm of journalism known as “citizen journalism”. We take advantage of citizen journalism to detect the bias in traditional media and propose a simple model for empirical measurement of media bias.

Note: This is part of a long-term project by the Web Science research group at Institute of Business Administration, Karachi, Pakistan and we welcome interested students to be a part of our project.

The slides for the work can be viewed here and the full 2-page poster paper can be downloaded from here.

Sunday, March 4, 2012

Three cheers for Professor Moon

A few days back, when I read in my Facebook news feed an update from Professor Sue Moon that she is now a tenured Professor at KAIST, I was immensely delighted. This post is a special tribute to Prof. Sue Moon from a student who did not get to spend much time with her, but whatever time I did spend with her played a huge role in my learning path. It all began in Spring 2009 when I took Professor Moon's course on Advance Networking. At first she seemed hard to impress, but then I figured out that it is her way of teaching students. The Advance Networking course she taught us was special; it turned out to be one of the toughest and yet greatest learning experiences of my life. She had designed the course specially, keeping in mind the struggles young researchers have to face. Throughout the semester we were expected to read papers, write a critique of each paper and present some selected papers in class as if they were our own. This activity turned out to be quite hectic, and each student used to dread the day when he/she had to present; one strong reason for that was Professor Moon's fiery questions about the technical aspects of the paper. She used to spend hours polishing our presentation and paper-reading skills, asking us to read papers from a critical angle so as to highlight their strong and weak points. She taught us a skill that is very valuable in the scientific community: captivating the audience when giving a technical talk. This rare skill is seriously lacking even among the best scientists of our community.

The semester ended and we all got back to our busy research life at KAIST, but in the later parts of my Master's degree I realized that her teaching and the way she groomed us in that course were extremely helpful. She literally taught us how to fall in love with research: an ability quite rare even among graduate students in the world's top universities. She keeps these technical how-to talks on her Web page and I have gone through all of them; I would definitely recommend them to all aspiring Computer Science researchers out there.

I particularly want to thank Prof. Moon for all she gave me. Knowledge, in my opinion, is a priceless gift in itself, and I am out of words to express my gratitude to her. Thank you, Professor Moon, for playing a role in my research path; your training has proven to be a great gift for me. Although my own Master's advisor, Professor Kyu-Young Whang, taught me the most during my stay at KAIST (his training has also been invaluable in shaping me as a researcher), Professor Sue Moon is special because she is one of the most outstanding women in Computer Science I have known. This field surely needs more inspiring women like her. I hope to meet her some day in order to thank her in person.