Saturday, August 27, 2011

Visit to Russia: RuSSIR/EDBT Summer School

Although I microblogged constantly on Twitter during my trip to Russia, nothing replaces a detailed blog post when it comes to coverage. I definitely wish to have an archive of details for myself and for students of Information Retrieval (and, of course, related areas) around the world. My husband and colleague Muhammad Atif Qureshi and I visited St. Petersburg, Russia, from 14th August to 20th August, 2011 to attend the prestigious Russian Summer School in Information Retrieval (RuSSIR), which was co-located with the Russian Young Scientists' Conference, where we presented our research work. This year's RuSSIR was quite special as the EDBT summer school was also co-located with it, and as such the breadth and depth of the lectures presented at the school were immense. Here is a brief overview of the lecture sessions that I attended, along with some good news for students in Karachi, Pakistan.

SocM Session: The Social Mining session was conducted by two well-known industry people, Vladimir Gorovoy of Yandex and Yana Volkovich of Barcelona Media Innovation Center. It was highly interactive and hands-on, with a practical recommendation task for which students were provided with a real dataset from Yandex Market. Here is a link for students who wish to try it out: Yandex Market practical task from RuSSIR. The session covered various aspects of mining social media data. It began with a very apt observation borrowed from Google's analytics evangelist Avinash Kaushik: "Social media is the hot thing today, almost every one seems excited to get involved in it but no one actually knows how." The session addressed that "how" with a glimpse into graph mining methods (PageRank, TunkRank and TwitterRank being some examples), models for opinion mining of customer reviews, social media engagement metrics and social innovation platforms for the future. In short, it was an extremely engaging and knowledge-enriched session, particularly helpful for social media analytics students. I learned a lot during the course of this session and am particularly thankful to Dr. Yana Volkovich for some of her wonderful suggestions that will really help me in my own research.
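
For students who have not met the graph mining methods mentioned above, here is a minimal sketch of PageRank by power iteration. It is my own illustration and not code from the session: the `follows` graph, the damping factor of 0.85 and the fixed 50 iterations are all assumptions chosen to keep the example small.

```python
def pagerank(graph, damping=0.85, iterations=50):
    """Approximate PageRank by power iteration.

    graph maps each node to the list of nodes it links to
    (for example, the accounts a user follows).
    """
    nodes = set(graph)
    for targets in graph.values():
        nodes.update(targets)
    n = len(nodes)

    # Start from a uniform distribution over all nodes.
    rank = {node: 1.0 / n for node in nodes}

    for _ in range(iterations):
        # Every node receives the "random jump" share first.
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for node in nodes:
            targets = graph.get(node, [])
            if targets:
                # Split this node's rank evenly among its out-links.
                share = damping * rank[node] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Dangling node: spread its rank evenly over all nodes.
                share = damping * rank[node] / n
                for target in nodes:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical "who follows whom" graph, invented for illustration only.
follows = {
    "alice": ["bob"],
    "bob": ["carol"],
    "carol": ["bob"],
    "dave": ["bob", "carol"],
}
scores = pagerank(follows)
top_user = max(scores, key=scores.get)
```

In this toy graph `bob` collects the most rank because three of the four users point to him. Twitter-specific methods such as TunkRank and TwitterRank adapt the same idea of propagating influence along follower links, but their details are not reproduced here.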

Plenary Session (Knowledge Harvesting from Web Sources): I found this session very informative and full of pointers for new research ideas, although it was a bit away from my own research area. Gerhard Weikum (Research Director at Max-Planck Institute for Informatics (MPII) in Saarbruecken, Germany) presented a comprehensive overview of research methodologies that can turn the Web into a large-scale Knowledge Base. A few examples of such Knowledge Bases are DBpedia, KnowItAll, ReadTheWeb, and YAGO-NAGA, as well as industrial ones such as Freebase and Trueknowledge. The tutorial walked through these knowledge harvesting methodologies with examples such as the unification of WordNet and Wikipedia in YAGO, the identification of a long tail of instances of entity classes by harvesting textual snippets on the Web, and entity search through language model ranking. Overall the session was intense and the slides quite heavy, with lots and lots of natural language processing material, but it was definitely a great learning activity from the point of view of tools to use for your own research.

SentA Session: This session was one of the most exciting ones for me as my own research is centered on Sentiment Analysis. Professor Mike Thelwall, who heads the Statistical Cybermetrics Research Group at the University of Wolverhampton, delivered the talks in this session, and they mostly centered on his research group's sentiment strength detection tool, SentiStrength. We were also taken through a live demonstration of the tool, after which Professor Thelwall explained in detail its various features along with the underlying algorithms and its experimental evaluations. The SentiStrength team has done a pretty good job of managing this tool, and the best thing about it is that the word list marked with each word's positive/negative strength is publicly available for research purposes. During this session students were also introduced to machine learning methods of Sentiment Analysis, with detailed explanations of feature selection, gold standard creation and 10-fold cross validation. To sum up, this session was extremely useful for students wishing to make a career in Sentiment Analysis, and I especially thank Professor Mike for his valuable suggestions on various aspects of the field.
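
Since 10-fold cross validation came up in this session, here is a small sketch of how the folds can be formed. It is only an illustration under my own assumptions: the function name `k_fold_splits` and the corpus size of 25 labelled texts are made up, and the sketch does not shuffle the data, which you would normally do first.

```python
def k_fold_splits(n_items, k=10):
    """Yield (train_indices, test_indices) for each of k folds.

    Every item lands in exactly one test fold; the remaining
    items of that round form the training set.
    """
    if not 2 <= k <= n_items:
        raise ValueError("k must be between 2 and n_items")

    indices = list(range(n_items))
    # Spread the remainder so fold sizes differ by at most one.
    fold_sizes = [n_items // k + (1 if i < n_items % k else 0) for i in range(k)]

    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size


# Hypothetical gold standard of 25 hand-labelled texts.
folds = list(k_fold_splits(25, k=10))
```

A classifier is then trained ten times, each time on the `train` indices and scored on the matching `test` indices, and the ten scores are averaged. This way every hand-labelled text is used for testing exactly once and never in the same round in which it was used for training.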

ColIR Session: This was a short session conducted by Chirag Shah of Rutgers University. It touched on a completely new dimension within the field of Information Retrieval, namely Information Retrieval facilitated through collaboration. According to Professor Chirag Shah, with the emergence of collaborative Web platforms, information retrieval has also moved in this new direction. The traditional view of IR is that it is an individual activity; the Collaborative IR community challenges this notion by describing it as a co-ordinated activity, and they have supported their ideas in both theory and practice. This session covered both the theory and practice behind collaborative IR situations, systems, and evaluation techniques.

TopK Session: This session, presented by the two charming ladies Sihem Amer-Yahia and Julia Stoyanovich, was simply fantastic. We were introduced to a whole new approach to solving some of the toughest problems in social media, and this approach comes from the old, classical database field. The session mainly centered on Top K processing, one of the well-known methods for ranked retrieval within the DB-IR research community, which was presented in a unique manner with a special focus on applying it to search and information discovery on the Social Web. Such applications were discussed from two significant viewpoints: 1) efficiency (minimizing both space and time requirements) and 2) user satisfaction. Both researchers presented a comprehensive overview of their papers published in top Database and Information Retrieval conferences: VLDB, ICWSM, SIGMOD and ACM HT. Their research within the efficiency dimension was based on incorporating upper bounds into classical top-k algorithms (the threshold algorithm and the no-random-access algorithm) in order to minimize time and space complexity. Their research within the user satisfaction dimension presented the fundamental idea of scaling up user studies to thousands of users by leveraging crowd-sourcing platforms such as Amazon Mechanical Turk. Currently I am reading these papers to look for dimensions that can be applied to my own research in Social Media Analytics.
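
To give a flavour of the classical threshold algorithm mentioned above, here is a minimal sketch of the textbook version. It is not the presenters' extended method: the two score lists (`relevance` and `popularity`), the use of a plain sum as the aggregate, and the assumption that every item appears in every list are all mine.

```python
import heapq


def threshold_topk(score_lists, k):
    """Return (top-k items with total scores, depth scanned).

    score_lists is a list of dicts mapping item -> score. Each item is
    assumed to appear in every dict, and the aggregate score is the sum.
    """
    # Sorted access: each list ordered by descending score.
    sorted_lists = [
        sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
        for scores in score_lists
    ]
    totals = {}
    list_length = len(sorted_lists[0])

    for depth in range(list_length):
        threshold = 0.0
        for ranked in sorted_lists:
            item, score = ranked[depth]
            # The threshold sums the scores last seen under sorted access.
            threshold += score
            if item not in totals:
                # Random access: fetch this item's score from every list.
                totals[item] = sum(scores[item] for scores in score_lists)

        best = heapq.nlargest(k, totals.items(), key=lambda kv: kv[1])
        # No unseen item can beat the threshold, so stop once the
        # k-th best seen item already reaches it.
        if len(best) == k and best[-1][1] >= threshold:
            return best, depth + 1

    return heapq.nlargest(k, totals.items(), key=lambda kv: kv[1]), list_length


# Hypothetical scores for five items, invented for illustration only.
relevance = {"a": 0.9, "b": 0.8, "c": 0.3, "d": 0.2, "e": 0.1}
popularity = {"a": 0.8, "b": 0.9, "c": 0.2, "d": 0.3, "e": 0.1}
top_items, depth_scanned = threshold_topk([relevance, popularity], k=2)
```

The point of the early stop is efficiency: in this toy example the two best items are confirmed after reading only the first two positions of each list instead of all five.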

Here is an archive of tweets during my attendance at RuSSIR:

#RuSSIR sessions kick off with interesting presentation on Social Media Mining by @yvolkovich and Vladimir Gorovoy

Not many people know abt. a social network exclusively devoted to travel and hospitality: CouchSurfing


Can an online social network build enough trust to allow strangers to sleep on each others’ couches: Adamic's paper http://bit.ly/prdxTy


"The Web today is the largest knowledge encyclopaedia - we need it to turn it into a comprehensive Database" - Gerhard Weikum at #RuSSIR


In a very interesting talk by Mike Thelwall explaining the working of the famous sentiment analysis tool SentiStrength #RuSSIR


Automatic sentiment analysis has more or less the same accuracy as human sentiment analysis due to complexity of problem - Mike Thelwall


A look into inside of Yandex Market by @vgorovoy in session of Social Media Mining http://twitpic.com/66vfkp


Interesting talks in TopK session at #RuSSIR: essentially about converting social media research problems to traditional database problems


Researcher from Barcelona Media Innovation Center explains the science of social media mining #RuSSIR


Mention of work of KAIST's @sbmoon in #RuSSIR in Social Media mining lecture


Andrey Plakhov explains how entity-oriented search works at Yandex: Russia's search engine that has larger market share than Google Russia


Wonder where this rule came from #RuSSIR #Yandex http://twitpic.com/67e4w1


Sihem Amer-Yahia of Qatar Computing Research Institute continues day 3 of session on TopK Processing for Social Applications


Wonderful graphic by @yvolkovich on visualization of social media conversations during Spain protests #RuSSIR


Take-home from ColIR session: Science is all about collaboration unlike the Humanities #RuSSIR


AlJazeera English tracking information of users who visit the site for improved user experience - Sihem of QCRI at #RuSSIR


SearchTogether by Microsoft Research takes user-mediated Collaborative Information Retrieval one step ahead #RuSSIR


ColIR session: reason behind failure of Google Wave was the difficulty of the system requiring a 60-minute video tutorial #RuSSIR


Take-home of TopK #RuSSIR session: Social Web is full of challenges, our online social experience will be as good as we researchers make it


A week of super-duper learning and knowledge-sharing, intense discussions and lots of research take-aways. Hats off to #RuSSIR team!!

In short, Russia is a wonderful place to visit and St. Petersburg is mind-blowing. Russian people are extremely hospitable and friendly, and what's best about them is their love and passion for Mathematics. All in all, Russia is a great place to visit if you are a Computer Science researcher, as it is full of wonderful Computer Scientists, both established researchers and young science-aspiring students.

Finally, I am glad to announce that the Web Science group at the Institute of Business Administration will conduct an open seminar to educate Pakistani students in some of the above-mentioned topics. Feel free to contact me with any suggestions for the seminar, or any topic you wish to include.