Friday, July 23, 2010

[From SIGIR 2010]: Best Paper on Value of Search Trails in Web Logs

We search for information daily; in fact, it would not be wrong to say that search has become an integral part of our life on the Web. However, searching for a particular piece of information involves a complex range of interactions that vary across users and queries. These interactions have recently been attracting the attention of researchers at Microsoft working on the Bing search engine, and they were the theme of the best paper at SIGIR 2010, "Assessing the Scenic Route: Measuring the Value of Search Trails in Web Logs."

The paper itself is very interesting, and once again it seems that future Information Retrieval research will draw heavily on Human Computer Interaction, as was also obvious from the keynote talk at SIGIR 2010.

What happens when you enter a keyword on Google, Yahoo or Bing? A list of Web pages is returned, ranked by relevance; for a long time that relevance was computed with the much-renowned PageRank, and now variants of PageRank are used for the purpose. Now what do you do with these results? You follow different links one after another and finally settle on the page that you find most satisfying for your query. The entire set of pages followed is referred to as a search trail by White of Microsoft Research and Huang of the University of Washington, and in this research they studied the value that users derive from this entire activity through a log-based analysis.

The researchers collected logs of URL visits from users who opted to provide this data through a widely distributed browser toolbar; the data was collected over a three-month period from March 2009 to May 2009.

Formally, a search trail is defined as a temporally ordered sequence of URLs beginning with a search query and ending with either (1) another query, (2) a period of inactivity of 30 or more minutes or (3) termination of the browser instance or tab; the figure explains this more clearly:



In the figure, the circle represents a query along with its search engine result page, rectangles represent web pages that the user navigates to from the search engine result page, double vertical lines represent backtracking to an earlier state, and a back arrow shows that the user has requested a page seen earlier in the search trail. The figure shows a typical search trail: query Q1 initiates the trail, and the user navigates from the results page to page P2, then to page P3 and from page P3 to page P4. Page P4 does not satisfy the user, so the user returns to page P3 (which is why page P3 has the double vertical lines) and then finally navigates to page P5. In this context, page P2 is the origin page and P5 is the destination page.
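To make the definition concrete, here is a minimal Python sketch of how a log of URL visits could be cut into search trails using the three ending conditions above. This is only my illustration, not the authors' code: the `Event` record, its fields and the function names are all hypothetical, and the sample log simply replays the Q1, P2, P3, P4, P3, P5 example from the figure.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Ending condition (2): 30 or more minutes of inactivity.
INACTIVITY_TIMEOUT = timedelta(minutes=30)


@dataclass
class Event:
    """One logged visit (hypothetical record layout, assumed for this sketch)."""
    timestamp: datetime
    url: str
    is_query: bool = False        # True for a query / search engine result page
    closes_browser: bool = False  # True if the tab or browser was closed after this visit


def split_into_trails(events):
    """Split a time-ordered list of events into search trails."""
    trails, current = [], []
    for event in events:
        if current:
            gap = event.timestamp - current[-1].timestamp
            # A trail ends on a new query, a long pause, or a closed browser/tab.
            if event.is_query or gap >= INACTIVITY_TIMEOUT or current[-1].closes_browser:
                trails.append(current)
                current = []
        # A trail can only begin with a query; other stray visits are skipped.
        if current or event.is_query:
            current.append(event)
    if current:
        trails.append(current)
    return trails


def origin_and_destination(trail):
    """Return (origin URL, destination URL), or None if no page followed the query."""
    pages = trail[1:]  # trail[0] is the query itself
    if not pages:
        return None
    return pages[0].url, pages[-1].url


# Hypothetical log replaying the example from the figure.
start = datetime(2009, 3, 1, 10, 0)
names = ["Q1", "P2", "P3", "P4", "P3", "P5"]
events = [
    Event(start + timedelta(minutes=i), name, is_query=name.startswith("Q"))
    for i, name in enumerate(names)
]
trails = split_into_trails(events)
print([e.url for e in trails[0]])         # ['Q1', 'P2', 'P3', 'P4', 'P3', 'P5']
print(origin_and_destination(trails[0]))  # ('P2', 'P5')
```

Note that the backtrack to P3 stays in the trail as a repeated visit, which matches the temporally ordered definition above.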

Currently, search engines provide only the origin page in their results. This research aims to study the value derived from following links so that in the future search engines may offer more refined results, for example showing full trails directly on the results page, or query-specific and user-specific search results. The findings showed that following search trails provides users with significant additional benefit in terms of coverage, diversity, novelty and utility. There is a lot of value in the trail, and hence in the future we may see recommendation pages in Bing that integrate recommendation systems with the search engine.

Wednesday, July 21, 2010

[From SIGIR 2010]: Keynote on Refactoring the Search Problem

The largest forum for researchers in the Information Retrieval community, "ACM SIGIR", is underway in Geneva, Switzerland; it began yesterday. The best thing about today's social network platforms is that even though you are not at the conference, you are up to date with all the talks, the papers and the new innovative ideas being presented. Thanks to Twitter and the blogosphere, much of what is happening at SIGIR 2010 is coming to me straight and live :)

This year's SIGIR conference has 15 papers from Microsoft Research, which suggests that Microsoft is going to put a lot of effort into IR in the near future and that researchers at Microsoft are certainly working hard to make Bing better and better.

Yesterday's keynote speech at SIGIR was on refactoring the search problem; in it, Gary W. Flake of Microsoft Live Labs described and demonstrated Microsoft Pivot.

From what I can see, Pivot seems to be a cross between aspects of HCI (Human Computer Interaction) and Information Retrieval. Watch this TED talk for a live demo of Pivot:



Pivot's claim is to get rid of the curse of information overload in this information age by making the user's search experience closer to a real search than to simple browsing. This, he said, is achieved by taking raw data and combining it with metadata for faceted navigation. The idea seems promising, but to me it seems largely borrowed from Wolfram Alpha, which has already experimented with this type of search engine and calls it a computational knowledge engine: http://www.wolframalpha.com
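For readers who have not met faceted navigation before, here is a tiny Python sketch of the general idea: each raw item carries metadata fields (facets), the interface shows how many items fall under each facet value, and the user narrows the collection by picking values. This is purely my own illustration under made-up data; the item list, the field names and both functions are hypothetical and say nothing about how Pivot is actually implemented.

```python
from collections import Counter

# Hypothetical collection: raw items, each described by metadata fields (facets).
items = [
    {"title": "Paper A", "venue": "SIGIR", "year": 2010, "topic": "search logs"},
    {"title": "Paper B", "venue": "SIGIR", "year": 2009, "topic": "ranking"},
    {"title": "Paper C", "venue": "CIKM", "year": 2010, "topic": "search logs"},
    {"title": "Paper D", "venue": "CIKM", "year": 2010, "topic": "ranking"},
]


def facet_counts(collection, facet):
    """Count how many items fall under each value of one facet."""
    return Counter(item[facet] for item in collection if facet in item)


def refine(collection, **selected):
    """Keep only the items whose metadata matches every selected facet value."""
    return [
        item for item in collection
        if all(item.get(facet) == value for facet, value in selected.items())
    ]


# Overview first, then narrow down by choosing facet values.
print(facet_counts(items, "venue"))  # Counter({'SIGIR': 2, 'CIKM': 2})
narrowed = refine(items, venue="CIKM", year=2010)
print([item["title"] for item in narrowed])  # ['Paper C', 'Paper D']
```

After each refinement the counts can be recomputed on the narrowed list, which is what lets the user keep drilling down instead of scrolling through everything.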

Some hard challenges in this task involve server-side issues, and there is also an open question: is this style of search a good model for all kinds of searches? Only the future will tell, as Microsoft has plans to integrate Pivot technology with its recently released search engine Bing.

I will be blogging more on some key papers and talks at SIGIR. If you are interested in live updates, follow along on Twitter with the hashtags #sigir2010, #sigir and #sigir10.

Friday, July 2, 2010

Paper Reviews: A Great Learning Experience

Reading a fairly good, published paper is a very different thing from reading an unpublished paper submitted to a conference. Most papers presented at decent conferences are well organized and reasonably written, and those are probably the only papers you have read in classes and for your research. When reviewing as part of a program committee, you get to read quite different papers, and many of them are poorly written in one way or another. I realized this when I had to do my first paper review task, for papers submitted to this year's reputed CIKM conference, as my Professor is part of the program committee for this conference. The paper review task is assigned to all PhD students of our lab as a way of learning how to write a good paper, the premise being that if you can review papers well then you can also write good papers. Luckily, I was the only MS student who got to do such a paper review job, because one of the papers assigned by CIKM was related to my MS thesis topic.

The review process is a very rigorous one, with a whole round of discussions between the students and seniors (PhD and PostDoc students). We read each and every paper carefully, identifying the problem statement in each paper, the related work in the area and the solution proposed to solve the problem. We then identify the strong and weak points of each paper, which is of course the tough part and is the determining criterion for whether to accept or reject the paper.

This whole activity, although time-consuming and cumbersome, offers a lot to learn, especially for students like me, since you are in the shoes of a reviewer who reads the papers you submit to conferences. You are reminded of the do's and don'ts of submitting your own paper, and this is the entire point of the activity. Reading the papers teaches you how to write a good paper by following some essential guidelines and keeping in mind the mistakes you should never make.

Most importantly, the effort that goes into the review process of these credible and prestigious conferences is what the Computer Science community in Pakistan should learn from. These days there is a whole "blogging" boom, with bloggers considering themselves extremely credible, which I also wrote about and criticized a few days back. The people involved in Computer Science back home in Pakistan and in other developing countries need to learn that credibility comes from published, reviewed and innovative work. At the end of the day, Computer Science is a science and has to be treated like one. The technologists might be good in their respective fields, but they lack the expertise needed to make a country prosper in the long run. Scientists are needed to create the technologies of the future, and this is what matters when it comes to the real development of a country.

I wrote this quick post to voice my concerns for the betterment of Computer Science in developing countries. Now it is back to the review job, as writing the review is another tough part of it, one which I have yet to finish.