Friday, January 7, 2011

Correlating News and Tweets: "Ins and Outs of News Twitter as a Real-Time News Analysis Service"

The Web has undergone a massive transformation: its read-only nature is steadily giving way to a read-write one. Social networks are one of the driving forces behind this transformation, and hence the Social Web can be seen as a fundamental and growing source of user-generated content (UGC).

The UGC phenomenon has also had a significant impact on Web search, which happens to be my area of research: a study conducted in 2010 puts Facebook ahead of Google in terms of Web site hits. Major search engines such as Google, Yahoo and Bing are now looking at ways to incorporate the Social Web into their search results. The WWW 2010 paper titled "Anatomy of a Large-Scale Social Search Engine" describes the phenomenon in considerable detail, and I recommend it as a must-read for those interested in the field. In fact, the team behind this paper created a social search system, Aardvark, which has since been acquired by Google.

Despite the tremendous importance and attention given to social search, one significant domain within this area has not yet been explored much, and a recent paper by my research group and me attempts to explore it. In this paper we present a system that aims to detect hot news items in real time by taking into account user popularity and temporal features. We present a prototype of the approach built on the popular microblogging service Twitter, along with the results of some initial evaluations.

The proposed system analyzes real-time news using data from Twitter. We give a description of news services, followed by an architecture for assessing news popularity. The architecture is built upon a Web crawling framework and a news parser, followed by the application of natural language processing techniques to the news data, which is finally linked with the Twitter Search API. At the user interface end, we use a simple timeline-based visualization to show the popularity of news over time. Furthermore, data from a popular news service was crawled daily over a period of 10 days and analyzed for correlation with tweets. This analysis reveals interesting results, such as the news bias exhibited by news services. Below is the paper, which can be downloaded as well.
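To make the pipeline a little more concrete, here is a minimal Python sketch of its last two stages: turning a headline into keywords and counting matching tweets per day for a timeline. It is only an illustration under my own assumptions, not the implementation from the paper. The stopword list, the function names, the overlap threshold, and the sample headline and tweets are all hypothetical, and simple stopword filtering stands in for the natural language processing step. The tweets are given as an in-memory list rather than fetched from the Twitter Search API.

```python
import re
from collections import Counter
from datetime import datetime

# Hypothetical stopword list; a real system would rely on a proper NLP toolkit.
STOPWORDS = {"a", "an", "the", "of", "in", "on", "to", "for",
             "and", "as", "is", "at", "by", "with"}

WORD_PATTERN = re.compile(r"[a-z0-9']+")


def extract_keywords(headline):
    """Lowercase the headline, split it into words, and drop stopwords."""
    words = WORD_PATTERN.findall(headline.lower())
    return [w for w in words if w not in STOPWORDS]


def matches(keywords, tweet_text, min_overlap=2):
    """Return True if the tweet shares at least min_overlap keywords (assumed threshold)."""
    tweet_words = set(WORD_PATTERN.findall(tweet_text.lower()))
    return len(set(keywords) & tweet_words) >= min_overlap


def popularity_timeline(headline, tweets):
    """Count matching tweets per calendar day; tweets are (datetime, text) pairs."""
    keywords = extract_keywords(headline)
    counts = Counter(
        ts.date().isoformat() for ts, text in tweets if matches(keywords, text)
    )
    return dict(sorted(counts.items()))


# Made-up sample data, for illustration only.
headline = "City council approves new bridge budget"
tweets = [
    (datetime(2011, 1, 3, 9, 0), "The council finally approves the bridge budget"),
    (datetime(2011, 1, 3, 18, 30), "New bridge for the city? About time"),
    (datetime(2011, 1, 4, 8, 15), "Bridge budget debate drags on at the council"),
    (datetime(2011, 1, 4, 12, 0), "Lunch was great today"),
]

timeline = popularity_timeline(headline, tweets)
print(timeline)  # {'2011-01-03': 2, '2011-01-04': 1}
```

The per-day counts are what a timeline view would plot. Bucketing by hour instead of by day would only require changing the key used in the counter.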

The paper will be published in the proceedings of the workshop "Visual Interfaces to the Social and Semantic Web (VISSW 2011)", co-located with the International Conference on Intelligent User Interfaces (IUI 2011), to be held at Stanford University in February 2011. I am sharing it on my blog at the request of some students who have shown a lot of interest in the field. For further details, questions or feedback, a personal email would be preferred. Students willing to work with our research group in this area may also contact me in person. I will be uploading the slides and talk for this paper soon.