Tuesday, July 16, 2013

Google Anita Borg 2013 Annual Retreat in Zurich, Switzerland

As a recipient of the Google Anita Borg Memorial Scholarship 2013 for Europe, the Middle East and Africa, I was invited to the annual retreat at Google's offices in Zurich, Switzerland. As readers of my blog know well, I keep a diary of significant research events that I attend, so here goes.

The Google Retreat 2013 was held from 30th June, 2013 to the morning of 3rd July, 2013. The main activities of the retreat were spread over two days (i.e., 1st and 2nd July, 2013), with 30th June reserved for registration and the welcome reception at the hotel, where the scholars and finalists got to know each other through a very interesting game of networking Bingo. The final day consisted of a brief breakfast tram tour of Zurich.

Below is a picture of the networking Bingo card Google gave us. For those unfamiliar with Bingo, it is a game played in the United States and Canada in which a 5x5 grid has to be completed vertically, horizontally or diagonally with numbers as they are drawn. Google's version, however, was not a game of chance but a game of socializing and networking with fellow scholars and finalists; it was great to find that most of them were fans of nerdy TV shows like "The Big Bang Theory" and took "nerd" as a compliment :-)

The retreat officially kicked off with Oliver Heckman, Engineering Director at Google Switzerland, giving an overview of the engineering initiatives at Google Zurich. Many amazing Google products are the result of hard work by engineers in this European office, for example Google Maps, the Google Knowledge Graph and YouTube. Oliver also demoed Google's upcoming Conversational Search, which seems to be a great leap forward for Web search engines.

Next up was a technical talk by Doug Aberdeen, who holds a PhD and whose area of expertise before joining Google was Reinforcement Learning. At Google he works with the Gmail product team on things like spam detection and, more recently, on my personal favorite, "Priority Inbox". His talk was full of valuable insights for those working in Machine Learning, which is why I enjoyed it a lot. Doug's talk differed from traditional machine learning talks in that it considered machine learning from a practical, realistic point of view, i.e., how to approach machine learning when building large-scale products that have to be deployed in the real world. He said that machine learning people may be fascinated by the huge amount of data available to Google engineers, but the fact of the matter is that even Google does not always have ground-truth labels, and this is where the real Machine Learning challenge comes in. A somewhat astonishing fact for me was that 90% of machine learning algorithms at Google are simple parallel logistic regression; however, parallelizing logistic regression at Google scale is far from trivial. Doug's talk was followed by a tech talk on the engineering behind YouTube and how YouTube detects copyright violations; it reminded me of Margaret Gould Stewart's TED Talk on the subject.

Moving on, we entered the Product Design workshop, which was fun and turned out to be a wonderful learning experience, giving interesting insight into product management. We learnt about Google's APM (Associate Product Manager) program, a two-year product management training program designed specifically for those who love managing engineers and coming up with ideas for new products; normally those who are not so good at programming and/or do not enjoy programming enter this line (many of them at undergraduate or graduate level). Mind you, product managers are not above engineers in the hierarchy; they are simply the people who understand what products people need and then work with engineers to build them. At the end of the session we were divided into six groups, and each group had to work on one of four product ideas; my group got the School Diary idea, which we had to chalk out as a product with various features. The following pictures were taken during the product design workshop:

We then moved on to the poster show, where each of us presented our research, and it was wonderful to get feedback from fellow scholars/finalists as well as Google engineers and interns. Many lines of future work came to mind after those interactions. We were then taken on a half-hour tour of the Google Zurich office. The work environment there was fantastic, with lots of isolation pods where programmers/engineers could lie down for a while, think alone (you know, during those tough programming phases when you're badly stuck on a problem) or talk on the phone. The entire office was full of free snacks, coffee, various beverages and ice-cream; there was a Sky Lounge, a Jungle Lounge, a Water Lounge and my personal favorite, the restaurant named Fork() (yes, it is inspired by the fork() system call in Linux). The day's final activity was a talk by Alan Eustace, SVP of Knowledge, straight from Mountain View via video conference. Alan Eustace is the pioneer of the Google Anita Borg program; he told us a bit of the history behind the scholarship and about some of the time he spent with Dr. Anita Borg, along with some funny stories about his daughter and how he explains Computer Science to her.

The second day was even more fun for all of us, as most of it was divided into parallel sessions based on the attendees' year of study and research interests. Following is the list of parallel sessions, with the ones I attended marked "(attended)":

09:00 - 11:00    Parallel session 1: Android coding challenge
09:00 - 11:00    Parallel session 2: UX web design
09:00 - 11:00    Parallel session 3: SRE Workshop
09:00 - 11:00    Parallel session 4: Natural Language Processing and Research at Google (attended)
11:30 - 12:30    Parallel session: Women in Computing (attended)
11:30 - 12:30    Parallel session: Mind the Gap
11:30 - 12:30    Parallel session: Employability Session
14:45 - 16:15    Parallel session: Day in the Life of an Intern (attended)
14:45 - 16:15    Parallel session: Interview workshop
16:45 - 17:45    Career Panels: BSc students
16:45 - 17:45    Career Panels: MSc students
16:45 - 17:45    Career Panels: PhD students 1
16:45 - 17:45    Career Panels: PhD students 2 (attended)

The session on Natural Language Processing and Research at Google was perhaps one of the most awaited and popular ones, with most of the attendees opting for it. During the one-hour Natural Language Processing session, Enrique Alfonseca, who heads the Natural Language Processing division at Google Zurich, gave a talk on his recently accepted ACL 2013 paper, which proposes a headline generation system that can augment Google's Knowledge Graph. The problem is motivated by the observation that news headlines are rarely objective and every news agency reports an event differently. From a computational perspective, such noisy headlines make events hard to detect, which makes augmenting event-based knowledge bases such as the Google Knowledge Graph a significantly challenging problem. The proposed model exploits the relatedness of events in news collections by extracting syntactic patterns through dependency parsing and modelling them with a Noisy-OR Bayesian network. Those interested can read the full paper here. Next up was a panel discussion on Research at Google with David Harper (one of Bruce Croft's PhD graduates). This was a highly interactive panel, with research scientists (who were once renowned academics) giving insights into what it's like to work on real-world products/systems used by millions of users around the world; it turns out to be a whole new experience, with a kind of satisfaction quite different from the joy of getting your research published. I asked two significant questions during this panel, relating to my own plans for a research internship during my PhD and my ambition to remain in academia. At the end of the panel session David Harper mentioned an important resource that gives a very detailed description of how Google approaches research; it is a Communications of the ACM article that can be accessed here.
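The Noisy-OR Bayesian network mentioned in Enrique Alfonseca's talk is a standard way of combining several binary causes: each active cause independently triggers the effect with its own probability, so P(effect) = 1 - (1 - leak) x product over active causes of (1 - p_i). The short sketch below illustrates only this general combination rule; the cause probabilities and the optional leak term are illustrative assumptions, not parameters or code from the paper.

```python
from typing import Sequence


def noisy_or(cause_probs: Sequence[float], active: Sequence[bool], leak: float = 0.0) -> float:
    """P(effect = 1) under a Noisy-OR model.

    cause_probs[i] is the (assumed) probability that cause i alone triggers the effect;
    active[i] says whether cause i is present; leak is the chance the effect occurs
    with no modelled cause present.
    """
    p_no_effect = 1.0 - leak
    for p, on in zip(cause_probs, active):
        if on:
            # Each active cause independently fails to trigger the effect with prob. 1 - p.
            p_no_effect *= 1.0 - p
    return 1.0 - p_no_effect
```

For example, two active causes with probability 0.5 each give 1 - 0.5 x 0.5 = 0.75: evidence accumulates without a full table over every combination of causes, which is why Noisy-OR is popular when there are many possible causes.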

We then attended the Women in Computing panel, which was very interesting for women computer scientists. It mostly centered on how women engineers at Google manage an engineering job in industry while raising kids. Google Zurich has a flexible policy for mothers-to-be: up to 8 months of maternity leave is granted, along with the options of working part-time or from home. Moreover, it is up to each woman how she balances the engineering role with her kids, and it all comes down to priorities. For a woman, kids are always the priority; as one Google engineer very nicely put it, "Engineering work can be done by someone else but only I can be a mother to my child". Another interesting perspective concerned the quality of time spent with your kids: according to one woman engineer at Google, when you know you are always with your kids you take it a bit lightly and the quality of that time suffers, whereas if you are working you know that all the time you spend with your kids has to be quality time. Shifting the focus a bit, I asked (without naming names, of course) about the assertions by some women in CEO positions that very few women hold such roles, and what women engineers at Google thought of that. They replied that it all comes down to a person's priorities; CEO positions don't matter that much as long as you enjoy both your work and your life.

In the panel session on Day in the Life of an Intern, we were told about the work routine in various intern positions at Google. There are basically three intern positions at Google: APM (Associate Product Manager), which involves managing products at Google, thinking of new features, etc.; SWE (Software Engineering), which involves the programming behind Google products; and SRE (Site Reliability Engineering), which concerns site administration to keep Google's sites up and running round the clock. A typical day for an APM intern involves lots of meetings with engineers, discussions about particular product features, a lot of email communication and, among other things, boosting the product team's motivation. A typical day for an SWE intern is spent mostly programming on the assigned tasks, with little or no administrative work. A typical day for an SRE intern involves being on call and rushing to respond when a complaint arrives about the site being down.

The last session I attended was Career Panel: PhD Students 2, which mainly centered on the career options open to PhD students once they finish their PhD. There was a very interesting tension between academia and industry in this panel, with some of the panelists honestly confessing that they miss academia, especially the interaction with students and the joy of getting research published, while also admitting that one of the strongest motivations for moving from academia to industry is money. At a company such as Google things are done differently: there is less freedom to work on things of your choice (compared with academia), and the style of work is product-centric rather than research-centric; you cannot afford to solve a research problem in its entirety, because the product release has a timeline that has to be met. Note that this differs from other Web industry giants like Microsoft and Yahoo!, which both have separate research divisions, whereas Google has merged research scientists with engineers in all of its product teams in pursuit of its ambitious mission "to organize the world's information and make it universally accessible and useful."

Saturday, July 13, 2013

The Journey Towards Becoming a Google Anita Borg Memorial Scholar

Those who know me and have been following me may know that I recently received the Google Anita Borg Memorial Scholarship for Europe, the Middle East and Africa. This is the first time a woman from Pakistan has won this prestigious scholarship since its inception in 2007. Over the past few weeks several people (especially women in Pakistani tech circles) have asked me to share my journey towards this scholarship and the hurdles I had to overcome along the way. So here I am, sharing my story for those who asked.

First and foremost, it would not have been possible without the support of two very important men in my life: my father and my husband. My father played a huge role, because he gave me the best education possible throughout my childhood, building strong foundations for me early on. I firmly believe my husband to be one of the finest programmers in the world, and those who have worked with him can bear testimony to that. My husband has played a huge role in this success, as he is always pushing me to polish my programming skills (and giving me useful advice at every stage of life, be it on technical or any other matters). For a woman to be successful, the support of the male members of her family is very significant, and this is what completes the life of a woman in the family; the media continuously reports negative things, but the reality has been different throughout my life and in the lives of those I know back home in Pakistan. No member of a family functions better when the family is split apart; I would compare a family to a running engine in which each part plays an important role.

Coming back to the story, it all began with the nights I used to spend solving tough mathematical problems during my O-level days. Compared to the Matriculation system, O-levels have a considerably different and tougher Mathematics curriculum (including topics such as Probability and Statistics, Differentiation and Vectors, which Matriculation students normally study at a later stage). More than the curriculum, I remember the role of my teachers, who kept reiterating their pride in me whenever I solved a Mathematics challenge problem (our O-level book had some of those in every activity, and normally I was the only one in class who solved them). The joy of being praised by your Maths teacher for solving a problem that no one else in the class could solve was simply out of this world, and it kept me going until the undergraduate stage, when I had to decide on my major. On account of my love for Applied Mathematics, Computer Science was a natural choice. This new world both amazed and baffled me, for I had no prior experience in programming; but challenges are among the biggest motivators on the path of learning, and even history bears testimony to that: the greater the challenges in one's life, the more one learns in overcoming them.

Right at the beginning of my undergraduate years I came across some highly innovative and selfless people, and together we formed BloX, the first-ever open source student body in our university. Under BloX I imparted useful Linux knowledge to my juniors and helped them get a grip on fundamental Linux concepts. Mind you, I have by now completely discarded anything to do with Windows and am a proud Linux convert; I also attribute a great deal of my success to this wonderful operating system, which always teaches you so much about the world of computing. Many of those who had joined BloX in its initial days later left it; it turned out they had been drawn by the excitement of BloX representing the Department of Computer Science, Karachi University at ITCN Asia 2004. Soon after ITCN Asia 2004, when actual Linux development had to be done, not many wanted to do it, as it was not the "in thing in the market" and could not guarantee a job, which seemed to be the only purpose of Computer Science undergrads in those days (and this remains true to this day); very few cared about the science behind computers. We finally had to dissolve BloX, but the experience left us more motivated and charged; today a smile comes to my face when I think of those fun-filled days. My colleague (who happens to be my husband now) and I kept doing fun things in the world of Computer Science: winning software competitions along the way, developing our own research-based Linux distribution called PAL Linux (which was also distributed to all students of the final-year Parallel Computing course), and finally getting our very own research paper published (it was about redefining images so as to enhance semantic search over them).
All this while, our colleagues started internships/jobs at reputed software houses in Pakistan and had already begun to make money, adding to the peer pressure; however, we kept going despite the odd questions we faced about our careers after the BS (Computer Science). I did join a small, unknown software house, and I remember the criticism of this decision from among my classmates; however, that was only to keep some money coming in, since we needed funds for both our marriage and an MS abroad (by this time we had made up our minds to pursue an advanced degree in Computer Science).

South Korea seemed to be the best choice for both of us, as it offered a tuition fee waiver along with a stipend to cover living expenses, and KAIST happens to be the MIT of Asia. I felt even more passionate about it when Professor Kyu-Young Whang of the Database and Multimedia Laboratory at KAIST agreed to support our application as married students. To many, South Korea was an unusual choice; in their ignorance (underestimating South Korea as a significant entity in the scientific world), everyone seemed to advocate the United States as the ultimate destination for a Master's degree in Computer Science. We knew we had made the right choice, and time bore testimony to that. KAIST turned out to be a life-changing experience, and I can easily say it taught me more than some of my seniors doing an MS in Europe or the United States learnt. Professor Whang is an ACM Fellow and a Computer Science legend in his own right within the Database community; he made us spend hours in the lab (sometimes we worked more than 16 hours a day, and during my Master's thesis defense I spent three days and three nights straight in the lab, with my husband cooking noodles for both of us in the snow on a portable stove). I attribute much of my Computer Science research skill to Professor Whang and his PhD/PostDoc students, who taught us valuable lessons: coming up with a research statement, identifying open issues in the current state of the art within a field, designing solutions to a research problem in Computer Science, programming in the best way possible so as to keep systems scalable and useful for generations to come, and writing papers as clearly as possible while adhering strictly to the scientific method of communicating knowledge. My article on "Programming vs. Coding" was a result of some of Professor Whang's advice during his Database class, and I did mention this article in my Google Anita Borg application.
All this time we maintained links back in Pakistan, and students kept writing to me for career advice; I took the time to answer them and always stayed in touch with my roots back home.

For our respective PhDs, we wanted to explore a different region, and Europe was our choice, with flexible, caring supervisors and excellent research opportunities to come up with our own problem statements. Adding to this is the wonderful experience of working with my current PhD supervisors, Colm O'Riordan and Gabriella Pasi, who always offer enriching research directions from within information retrieval and fuzzy logic; they provided us with what was missing in South Korea, i.e. the opportunity to form research networks around the world and the freedom to pursue the paths we think best for ourselves. Lastly, and most significantly, we still maintain a presence in Pakistan via our own research lab within the Computer Science Department of the Institute of Business Administration, Karachi, Pakistan, an experience that could be characterized as both exciting and frustrating. At times it is really painful to argue for hours with people in academic circles back home about the usefulness of a research lab and why it is essential to conduct scientific research. In countries like Pakistan, universities focus mainly on teaching, as there is insufficient support for research (mainly due to economic problems). I am constantly working to break this culture; from time to time I work with students, assisting them with their theses or final-year projects and motivating them towards novel research ideas in the domain of Web Science. The Web Science and Technology Research lab, despite still being in its infancy, has been successful: last year it was represented at the International Conference on World Wide Web, one of the most prestigious conferences in my field.

To summarize, here are some tips for those who asked:
1) Value people and treat them with respect, as you can learn something from every person you come across. Take the time to reply to emails from those who expect something from you or reach out to you, even if it's a very small matter; it makes a lot of difference at the end of the day.
2) Speak less and do more; there are times when actions mean everything, and you have to stop ranting about things like success and instead take steps to achieve your goals. Remember, procrastination is a human's worst enemy.
3) Don't keep complaining about your circumstances, as they are never easy for anyone. I remember a time when I did not have money to buy a computer table and had to program sitting on the floor; I didn't complain then, and today I own around three computers/laptops in various parts of the world.
4) Communication skills matter a lot, and it is extremely important to market yourself in the best way possible. Everyone has something special; he/she just needs the right way to market it.
5) Love technical stuff, not just gadgets but the science behind things; do not kill your intellectual curiosity by settling for "glittering" gadgets, and instead focus on the innovative ideas stemming from them.
6) Do not pay much heed to critics of your decisions, for they are there to make you firm. This does not mean ignoring meaningful advice from people who matter, but remember that most criticism comes from people who do not know.

To end on a humorous note, I made a funny meme that was sort of my reaction when people acknowledged my achievement in big words. It is not intended to make anyone feel bad; it is just pure humor.

Saturday, November 10, 2012

Mining Tweets of World Cup T20 Match between India and Pakistan: Interesting Insights from Social Network Analysis

I have been absent from this space I call my blog for quite some time now, and not without reason. The past few months have been extremely busy, with lots of traveling (Milan, Venice, Rome, Nijmegen and Copenhagen, all in three to four months' time) and, of course, the never-ending paper submissions. As I explained in my previous post on online education initiatives, I am also taking Computer Science online courses on Coursera, and this semester I took a very interesting course, Social Network Analysis, by Lada Adamic of the University of Michigan. Although I have myself taught some aspects of Social Network Analysis during a summer course at the Faculty of Computer Science at IBA, Karachi, I found this course intriguing, and the way Lada enriched it with cool applications of SNA was simply amazing.

As an optional part of this course, students could submit a programming project, and I thought: what better opportunity to submit part of the TweetCric project being undertaken by our research group? It's always good to get early feedback on your work in order to find useful, innovative directions, and hence I decided to blog about my Social Network Analysis project. Readers are welcome to suggest new directions or give feedback in the comments to help us with this project. Following is a description of the project for interested readers of my blog:

Social media applications have considerably influenced the lives of millions, and every day there is a huge number of updates to social networks such as Facebook and Twitter. As of March 2012, more than 400 million tweets were being posted on Twitter each day. The volume of tweets becomes significantly higher during a sporting event, as many sports fans now use social media as part of their viewing experience. Users describe this as an experience full of pleasure and fun, as in the following Facebook status update posted during the recent World Cup T20 match between India and Pakistan:

"Facebook comments are more interesting than the match. Already more than two pages of comments. Looks like PakInd Vs Facebook"

Interestingly, the huge amount of content produced during sporting events can be used to analyse players' performance; in light of such analysis, sports managers can decide future strategies, so the notion of crowd-sourced sports critics can be realized in practice. Researchers have already begun to explore using this huge volume of user-generated content to solve research problems such as event detection and video annotation for sports summaries [1, 2]. In this work we argue for using this crowd-sourced content to benefit sports strategy analysts and decision-makers. Specifically, we use social network analysis to highlight significant players during the match, along with an analysis of why social network analysis methods detect these players.

Social Network Modeling
The data was obtained using the Twitter Search API during the epic match held on September 30th, 2012. We queried the Search API through a Python script at half-hour intervals, collecting fresh tweets as the match progressed. In total we collected a sample of 43,450 tweets with the hashtag PakvsInd during the match.
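The polling loop described above can be sketched as follows. The original script is not shown in the post, so this is a hypothetical outline: `fetch_batch` stands in for whatever call queried the Twitter Search API (it is not a real library function), tweets are assumed to be dicts with an integer "id", and the `since_id` cursor plus de-duplication by id are assumptions about how overlapping half-hour results could be merged.

```python
import time
from typing import Callable, Dict, List

Tweet = Dict[str, object]  # assumed shape: {"id": int, "text": str, ...}


def collect(fetch_batch: Callable[[int], List[Tweet]], polls: int,
            interval_s: float = 1800.0) -> Dict[int, Tweet]:
    """Poll a (hypothetical) search function repeatedly, keeping each tweet once.

    fetch_batch(since_id) is assumed to return tweets newer than since_id.
    interval_s defaults to half an hour, matching the collection schedule in the post.
    """
    seen: Dict[int, Tweet] = {}
    since_id = 0
    for i in range(polls):
        for tweet in fetch_batch(since_id):
            # Tweets returned by more than one poll are stored only once.
            seen.setdefault(tweet["id"], tweet)
        if seen:
            # The next poll only asks for tweets newer than the newest one seen so far.
            since_id = max(seen)
        if i < polls - 1:
            time.sleep(interval_s)
    return seen
```

Passing the fetch function in as a parameter keeps the sketch independent of any particular Twitter client library and makes it easy to test with a fake source.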

We modeled the social graph of the players and commentators using the text content of the tweets. First, using Wikipedia and ESPN CricInfo as external resources, we compiled a list of players and commentators relevant to the India-Pakistan cricket match. This list was then used to detect tweets mentioning any player or commentator; the following list shows some sample tweets:

  1. hafeez goes, 15 from 28 balls.. idiot, wasted his time big time. game over. #pakvsind
  2. like the world cup of pakistani batsmen falling against yuvraj. kamran also departs edging to dhoni. pak 56/4 after 9. #pakvsind
  3. rt @maria_memon: rt @maria_memon: afridi! quit playing games with our hearts....our hearts....#pakvsind'
  4. hafeez is the reason for todays batting performance.... after nazir he put all of the team under pressure! #pakvsind
  5. is dhoni trying to piss off pakistanis by bringing in kohli? #pakvsind'
We now explain how we formulate the nodes and edges of our social network of players and commentators. Each player/commentator is treated as a node, and an edge is drawn between two players/commentators if they co-occur in a tweet. For example, in tweet 2 above there would be edges between yuvraj, kamran and dhoni according to our model. In total, 8,587 tweets (19.8%) contained a mention of some player or commentator.
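The node-and-edge model described above can be sketched in a few lines of Python. This is a minimal illustration, not the TweetCric code: it assumes each player or commentator is identified by a single lowercase token (e.g. "hafeez"), that a tweet mentions a name when that token appears in it as a whole word, and it counts the number of tweets in which each pair co-occurs as a hypothetical edge weight (the post does not say whether edges were weighted).

```python
import itertools
import re
from collections import Counter
from typing import Iterable, List, Set, Tuple


def mentioned_names(tweet: str, names: Set[str]) -> List[str]:
    """Return, sorted, the names that appear as whole lowercase words in the tweet."""
    tokens = set(re.findall(r"[a-z]+", tweet.lower()))
    return sorted(names & tokens)


def build_cooccurrence_graph(tweets: Iterable[str], names: Set[str]) -> Tuple[Counter, int]:
    """Return (edge -> co-occurrence count, number of tweets mentioning any name)."""
    edges: Counter = Counter()
    mentioning = 0
    for tweet in tweets:
        found = mentioned_names(tweet, names)
        if found:
            mentioning += 1
        # Every pair of names in the same tweet gets an undirected edge (pairs are sorted).
        for a, b in itertools.combinations(found, 2):
            edges[(a, b)] += 1
    return edges, mentioning


# Illustrative name list (an assumption) and the five sample tweets quoted above.
PLAYER_NAMES = {"afridi", "dhoni", "hafeez", "kamran", "kohli", "nazir", "yuvraj"}
SAMPLE_TWEETS = [
    "hafeez goes, 15 from 28 balls.. idiot, wasted his time big time. game over. #pakvsind",
    "like the world cup of pakistani batsmen falling against yuvraj. kamran also departs edging to dhoni. pak 56/4 after 9. #pakvsind",
    "rt @maria_memon: rt @maria_memon: afridi! quit playing games with our hearts....our hearts....#pakvsind'",
    "hafeez is the reason for todays batting performance.... after nazir he put all of the team under pressure! #pakvsind",
    "is dhoni trying to piss off pakistanis by bringing in kohli? #pakvsind'",
]
edges, n_mentioning = build_cooccurrence_graph(SAMPLE_TWEETS, PLAYER_NAMES)
```

On these samples, tweet 2 produces the three edges dhoni-kamran, dhoni-yuvraj and kamran-yuvraj described in the text, while tweets that mention a single name (such as tweet 1) count as mentioning tweets but add no edge.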

The following figure shows the visualization obtained from this social network (Gephi was used to generate the graph):

Modeled social network of players/commentators during World Cup T20 India-Pakistan match
As is clear from the figure, there are three communities within this social network. Nodes are sized according to betweenness centrality, and Hafeez is the node with the highest betweenness: this player was the captain of the Pakistani team in that match, and (as per most of the tweets) Pakistan lost the match due to his poor captaincy, poor field placements and poor batting. The node with the second-highest betweenness, Kohli, was the man of the match and the top scorer, leading India to a comfortable victory. Hence, social network analysis gives important insights into sporting events. Natural language processing, as an alternative approach, seems to lack the precision and efficiency that social network analysis offers. Given the low scalability and speed of Natural Language Processing alone, our team has long argued for a hybrid approach that uses both natural language processing and social network analysis to address various research questions in the fields of Information Retrieval and Web Information Systems [3].
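Betweenness centrality counts how often a node lies on shortest paths between other pairs of nodes, which is why a player who is co-mentioned with many otherwise unconnected players, as Hafeez was, ends up with a large score. The sketch below is the standard Brandes algorithm for an unweighted, undirected graph, returning unnormalized scores; it is an illustration of the measure, and Gephi's own settings (normalization, edge weights) may differ from it.

```python
from collections import defaultdict, deque
from typing import Dict, Hashable, Iterable, Set, Tuple


def from_edges(edges: Iterable[Tuple[Hashable, Hashable]]) -> Dict[Hashable, Set[Hashable]]:
    """Build a symmetric adjacency dict for an undirected graph."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return dict(adj)


def betweenness(adj: Dict[Hashable, Set[Hashable]]) -> Dict[Hashable, float]:
    """Brandes' algorithm: unnormalized betweenness for an unweighted, undirected graph."""
    bc = {v: 0.0 for v in adj}
    for s in adj:
        # BFS from s, counting shortest paths (sigma) and recording predecessors.
        stack = []
        preds = {v: [] for v in adj}
        sigma = {v: 0 for v in adj}
        dist = {v: -1 for v in adj}
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate dependencies in order of non-increasing distance from s.
        delta = {v: 0.0 for v in adj}
        while stack:
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # In an undirected graph every pair is counted once from each endpoint.
    return {v: c / 2 for v, c in bc.items()}
```

In a star in which one node is co-mentioned with three others that never appear together, the hub lies on all three leaf-to-leaf shortest paths and scores 3.0, while the leaves score 0.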

We now analyse the communities within this dataset. The community shown in blue mostly comprises Indian players, and it makes sense that they form a separate community. However, the inclusion of Misbah and Ajmal in this community is odd, since both are Pakistani players. Further analysis reveals why: Ajmal took the important wicket of Sehwag, which placed Ajmal in that community, and Misbah being mentioned once with Ajmal pulled him there too. The community in dark green represents mostly Pakistani players, with the exception of Dhoni, the Indian team captain; this occurs because Twitterers compared Hafeez's captaincy with Dhoni's, pulling Dhoni into that community. Lastly, the community in aqua green represents players who did not play in the match, with the exception of Afridi, who was pulled into that community by tweets from Pakistani cricket fans suggesting that he be dropped and included among those not playing the match.
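The post does not say which community detection method produced the three colours; one common option in Gephi is its Modularity statistic, which uses the Louvain method to search for a partition with high modularity Q = sum over communities c of [L_c / m - (d_c / 2m)^2], where m is the number of edges, L_c the number of edges inside c, and d_c the total degree of c's nodes. Assuming a modularity-based method was used, the sketch below only scores a given partition; it does not search for one. It also hints at why a weakly connected node such as Misbah can be pulled into the community of its only neighbour: placing it there keeps its single edge internal, which raises L_c.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Tuple


def modularity(edges: List[Tuple[Hashable, Hashable]], community: Dict[Hashable, int]) -> float:
    """Newman-Girvan modularity of a partition of an undirected, unweighted graph."""
    m = len(edges)
    internal = defaultdict(int)    # L_c: edges with both endpoints in community c
    degree_sum = defaultdict(int)  # d_c: sum of degrees of the nodes in community c
    for u, v in edges:
        degree_sum[community[u]] += 1
        degree_sum[community[v]] += 1
        if community[u] == community[v]:
            internal[community[u]] += 1
    return sum(internal[c] / m - (degree_sum[c] / (2 * m)) ** 2 for c in degree_sum)
```

For two separate triangles, splitting them into their natural two communities gives Q = 2 x (3/6 - (6/12)^2) = 0.5, while putting all six nodes in one community gives Q = 0.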

Lastly, as I mentioned at the beginning of this post, any feedback or ideas are welcome. Interested students who want to join this project are requested to contact me personally via email or social networks.

[1] J. Nichols, J. Mahmud, and C. Drews. Summarizing sporting events using twitter. In Proceedings of the 2012 ACM international conference on Intelligent User Interfaces, IUI ’12, pages 189–198, New York, NY, USA, 2012. ACM.
[2] A. Tang and S. Boring. #epicplay: crowd-sourcing sports video highlights. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, CHI ’12, pages 1569–1572, New York, NY, USA, 2012. ACM.
[3] A. Younus, M. Qureshi, F. Asar, M. Azam, M. Saeed, and N. Touheed. What do the average twitterers say: A twitter model for public opinion analysis in the face of major political events. In 2011 International Conference on Advances in Social Networks Analysis and Mining, pages 618–623. IEEE, 2011.

Friday, May 18, 2012

Online Education Initiatives: A Hope for Education in Less Developed Countries

I remember my time in the Computer Science Department of Karachi University very well, when teachers who did not take their classes really annoyed me. Classmates would call me crazy for being so nerdy, but I knew this was a valuable period of our lives that would never come back. This was the age when the mind is ready to absorb all knowledge, which, thanks to our messed-up education system (not to forget the politics that pollutes it), was literally being wasted. The dilemma of technological sciences such as Computer Science in less developed countries like Pakistan is that they are more hype than science. In my part of the world students flock to Computer Science to get good jobs after graduation; of course this is a necessity and the point of a good education, but whether it should be the only goal is the real question we should address.

Back then, there was one frustrating moment when our Object-Oriented Programming teacher gave us the option of learning OOP concepts in either C# or C++, and unfortunately most of the class chose C# because it was in demand in the job market. At that point I realized what a tough time Computer Science would have in Pakistan, and this remains true to this day. Sadly, not only students but also teachers have promoted the job-oriented model of study, leading to the myth that Computer Science is all about sitting at a desk writing code in .NET or PHP (or any other programming language, for that matter).

Meeting the well-known scientist Rakesh Agrawal from Microsoft Research confirmed my view of the pathetic state of affairs of the technological sciences in countries like India and Pakistan. He shared my dissatisfaction and strongly criticized the industry in less developed countries. Equally sad is the state of affairs at the national universities in South Asia, and the situation is changing at a very slow pace. When Stanford announced its online courses, I saw acquaintances in my social network sharing the news, and the most excited among them were undergraduate students from institutes in my country. As I see it, this is a silver lining amid the dark clouds, as online education initiatives like Coursera, edX and Udacity will now give students all over the world access to quality education. In my opinion this is a huge step towards bridging the digital divide, and it is now up to students in the developing world to make the most of this opportunity. Today's connected society offers easy and massive access to knowledge, unlike the situation back in my undergraduate days, and I feel students today are far more blessed than students of my time.

What started as an educational initiative by accomplished Stanford Professors Daphne Koller and Andrew Ng has now turned into a global phenomenon, with the best universities contributing to make knowledge open to all. If studying at the world's most reputed universities (Stanford University, MIT, Harvard, University of Michigan, University of Pennsylvania, etc.) was ever your dream, there can be no better time to pursue it. Some students might take this as an exaggeration, but I say it after personally taking two online courses this semester and enjoying them thoroughly. Coursera's statistics also confirm the value that online education now adds for universities, value they could never have achieved otherwise. Andrew Ng illustrates the point: "I normally teach 400 students," Ng explained, but last semester he taught 100,000 in an online course on machine learning. "To reach that many students before," he said, "I would have had to teach my normal Stanford class for 250 years."

It is a generally held notion that the academic culture and styles of teaching in our part of the world are outdated and boring. I can certainly confirm this from my experience in Pakistani academic circles for quite some time now. For the most part, higher-education circles in developing regions leave ideas in academic documents sitting on a shelf, quite unlike the way things are done at the top research universities of the world. Students have always wanted to know how the ideas they study in the classroom apply to the real-world problems around them. With world-class Professors offering online courses, there is an opportunity to get many of those questions answered.

Online education as a phenomenon is not new, and for years people in less developed regions have been skeptical of it, but it's quite different with Coursera and other similar initiatives. The revolutionary ideas behind these initiatives are testing, grading, student-to-student help and certificates for completing a course. Daphne Koller, a Stanford computer science professor who founded Coursera with Ng, explained in her talk at LinkedIn last week, "It will allow people who lack access to world-class learning - because of financial, geographic or time constraints - to have an opportunity to make a better life for themselves and their families."

So the next time students come to me seeking advice on how to start with research or how to apply to foreign universities, I will recommend that they take some courses related to their area of interest on Coursera or a similar platform. With such initiatives coming from the world's top universities, there is hope for a revolution in higher education, allowing students from all over the world not only to hear top-quality lectures but also to do homework assignments, be graded, receive a certificate for completing the course and use it to get a better job or gain admission to a better school.

Sunday, April 15, 2012

WWW2012 Poster: New Media vs. the Old Media

Today's social-media-savvy age has considerably changed the paradigm of traditional journalism. Interestingly, it has also led to new debates within the journalism and media industry, with supporters of social media calling it a platform for the voice of the masses while opponents dismiss it as gibberish and noise. Old-school journalism disregards the significance of an article's social media popularity on the pretext that “journalism is not about feeding the masses with whatever crap they want to be fed with.”

It turns out that this debate is not as simple as it appears on the surface. What old-school journalism advocates do not take into account is the age-old phenomenon termed “media bias” by the social sciences research community. A famous paper published in 2004 by the Department of Political Science at UCLA and the Department of Economics at the University of Missouri studies the bias of well-known news outlets in the US. Since then there have been various attempts at studying bias in traditional media outlets (such as the New York Times, Fox News, the Washington Post, CBS and the Wall Street Journal), most of them coming from the sciences (social science, political science, Computer Science). From a scientific viewpoint empirical evidence is of the utmost importance, and unfortunately social media circles in Pakistan tend to ignore this angle altogether. This brings into the picture the problem of measuring bias in various forms of media, which turns out to be a huge research challenge in itself. The solution: yes, social media, with its insights and popularity judgements, can serve as a tool not just for the voices of the masses but also for measuring bias in traditional media, and this is exactly what a team of researchers in IBA's Web Science group has done.

The crucial nature of the media industry makes it all the more essential to have ways of verifying its content. This leads to the natural question of how new media, namely social media, can help measure the inevitable biases inherent in traditional media. Some of these questions have been answered by researchers from one of Karachi's most prestigious educational institutions, the Institute of Business Administration, who investigated differences between news appearing on traditional and social media platforms using publicly available data from the popular microblogging site Twitter. Being part of this team made me delve deeper into various aspects of media, both internationally and in Pakistan, and my observation is that today's media tend to ignore the crucial role of social media and do not take popular demand into account. With this conclusion, we argue for a paradigm shift in how traditional media platforms perceive the new media landscape: the sooner they embrace this new world, the better for their own survival.

Some technical details of the study warrant explanation. We used the Jaccard similarity metric from data mining to investigate the differences in named entity coverage between the 16 million tweets posted during the Egyptian uprising (tweet data obtained from the TREC 2011 Microblog track) and the New York Times articles about Egypt. The figure below shows our results:

It shows a significantly low level of coverage (Jaccard similarity below 0.5 for all days), which we take as evidence of media bias. Moreover, we extend this study to the local level (Pakistani media outlets) on a daily basis for the month of November. The extension uses topic models (specifically standard LDA and Twitter-LDA) to discover similar topics in the two media, followed by a ranking function that computes the popularity of a news item on each platform. These rankings are then compared with a manually ranked list, and the final result is that the ranks obtained from social media (tweet data) match the human-annotated ranks more closely than those obtained from traditional media.
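For readers who want to try the coverage measurement themselves, here is a minimal Python sketch of per-day Jaccard similarity between two sets of named entities. It assumes the entities have already been extracted from the tweets and the news articles for each day; the function names and the example entity sets are hypothetical illustrations, not the data or code used in our study.

```python
def jaccard_similarity(a: set[str], b: set[str]) -> float:
    """Size of the intersection divided by the size of the union of two sets."""
    union = a | b
    if not union:
        # Both sets empty: treat as no measurable overlap instead of dividing by zero.
        return 0.0
    return len(a & b) / len(union)


def daily_coverage(tweet_entities: dict[str, set[str]],
                   news_entities: dict[str, set[str]]) -> dict[str, float]:
    """Per-day Jaccard similarity, computed only for days present in both sources."""
    common_days = sorted(tweet_entities.keys() & news_entities.keys())
    return {day: jaccard_similarity(tweet_entities[day], news_entities[day])
            for day in common_days}


if __name__ == "__main__":
    # Hypothetical entity sets for a single day (illustrative only, not study data).
    tweets = {"2011-01-28": {"Mubarak", "Tahrir", "Cairo", "ElBaradei"}}
    news = {"2011-01-28": {"Mubarak", "Cairo", "Obama"}}
    print(daily_coverage(tweets, news))  # {'2011-01-28': 0.4}
```

In the toy example two of the five distinct entities are shared, so the similarity for that day is 2/5 = 0.4; a value near 1 would mean both media mention almost the same entities.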

For those interested, here's the abstract of our paper:
It is often the case that traditional media provide coverage of a news event on the basis of journalists’ viewpoints - a problem termed in the literature as media bias. On the other hand social media have given birth to an alternative paradigm of journalism known as “citizen journalism”. We take advantage of citizen journalism to detect the bias in traditional media and propose a simple model for empirical measurement of media bias.

Note: This is part of a long-term project by the Web Science research group at Institute of Business Administration, Karachi, Pakistan and we welcome interested students to be a part of our project.

The slides for the work can be viewed here and the full 2-page poster paper can be downloaded from here.

Sunday, March 4, 2012

Three cheers for Professor Moon

A few days ago, when I read an update in my Facebook news feed from Professor Sue Moon saying that she is now a tenured Professor at KAIST, I was immensely delighted. This post is a special tribute to Prof. Sue Moon from a student who did not get to spend much time with her, but whatever time I did spend with her played a huge role in my learning path. It all began in Spring 2009 when I took Professor Moon's course on Advanced Networking. At first she seemed hard to impress, but then I figured out that this was her way of teaching. The Advanced Networking course she taught us was special: it turned out to be one of the toughest and yet greatest learning experiences of my life. She had designed the course specifically with the struggles of young researchers in mind. Throughout the semester we were expected to read papers, write a critique of each paper and present some of the selected papers in class as if they were our own. This turned out to be quite hectic, and each student used to dread the day when he or she had to present, one strong reason being Professor Moon's fiery questions about the technical aspects of the paper. She used to spend hours polishing our presentation and paper-reading skills, asking us to read papers from a critical angle so as to highlight their strong and weak points. She also taught us a skill that is very valuable in the scientific community: captivating the audience when giving a technical talk, a rare skill that is seriously lacking even among the best scientists of our community.

The semester ended and we all got back to our busy research lives at KAIST, but later in my Master's degree I realized how helpful her teaching, and the way she groomed us in that course, had been. She literally taught us how to fall in love with research: an ability quite rare even among graduate students at the world's top universities. She keeps her technical how-to talks on her Web page and I have gone through all of them; I would definitely recommend them to all aspiring Computer Science researchers out there.

I particularly want to thank Prof. Moon for all she gave me. Knowledge, in my opinion, is a priceless gift in itself, and I am at a loss for words to express my gratitude to her. Thank you, Professor Moon, for playing a role in my research path; your training has proven to be a great gift for me. Although my own Master's advisor, Professor Kyu-Young Whang, taught me the most during my stay at KAIST (his training has also been invaluable in shaping me as a researcher), Professor Sue Moon is special because she is one of the most outstanding women in Computer Science I have known. This field surely needs more inspiring women like her. I hope to meet her some day to thank her in person.

Saturday, August 27, 2011

Visit to Russia: RuSSIR/EDBT Summer School

Although I microblogged constantly on Twitter during my trip to Russia, nothing replaces a detailed blog post when it comes to coverage. I want to keep a detailed archive both for myself and for students of Information Retrieval (and, of course, related areas) around the world. Along with my husband and colleague Muhammad Atif Qureshi, I visited St. Petersburg, Russia from 14th August, 2011 to 20th August, 2011 to attend the prestigious Russian Summer School in Information Retrieval (RuSSIR), which was co-located with the Russian Young Scientists' Conference where we presented our research work. This year's RuSSIR was quite special as the EDBT summer school was also co-located with it, so the breadth and depth of the lectures presented at the school were immense. Here is a brief overview of the lecture sessions that I attended, along with some good news for students in Karachi, Pakistan.

SocM Session: The Social Mining session was conducted by two well-known researchers from industry, Vladimir Gorovoy of Yandex and Yana Volkovich of Barcelona Media Innovation Center. It was highly interactive and included a practical recommendation task for which students were provided with a real dataset from Yandex Market. Here is a link for students who wish to try it out: Yandex Market practical task from RuSSIR. The session covered various aspects of mining social media data. It began with an apt observation borrowed from Google's analytics evangelist Avinash Kaushik: "Social media is the hot thing today, almost everyone seems excited to get involved in it but no one actually knows how." The session addressed that "how" with a glimpse into graph mining methods (PageRank, TunkRank and TwitterRank being some examples), models for mining the opinions in reviews left by customers, social media engagement metrics and social innovation platforms for the future. In short, it was an extremely engaging and knowledge-rich session, particularly helpful for social media analytics students: I learned a lot during this session and am particularly thankful to Dr. Yana Volkovich for some wonderful suggestions that will really help me in my own research.
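PageRank, the first of the graph mining methods mentioned in this session, can be sketched with a short power iteration. This is a minimal textbook version, not code from the session, assuming a damping factor of 0.85 and letting nodes with no out-links spread their rank evenly over all nodes; the toy link graph is hypothetical, and TunkRank and TwitterRank change how influence flows in ways this sketch does not capture.

```python
def pagerank(graph: dict[str, list[str]], damping: float = 0.85,
             tol: float = 1e-10, max_iter: int = 100) -> dict[str, float]:
    """Power-iteration PageRank; graph maps each node to the nodes it links to."""
    nodes = set(graph)
    for targets in graph.values():
        nodes.update(targets)
    if not nodes:
        return {}
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(max_iter):
        # Rank held by dangling nodes (no out-links) is spread evenly over all nodes.
        dangling = sum(rank[node] for node in nodes if not graph.get(node))
        base = (1.0 - damping) / n + damping * dangling / n
        new_rank = {node: base for node in nodes}
        # Each node splits its damped rank equally among the nodes it links to.
        for node, targets in graph.items():
            if targets:
                share = damping * rank[node] / len(targets)
                for target in targets:
                    new_rank[target] += share
        delta = sum(abs(new_rank[node] - rank[node]) for node in nodes)
        rank = new_rank
        if delta < tol:
            break
    return rank


if __name__ == "__main__":
    # Hypothetical toy graph: an edge u -> v means u links to (endorses) v.
    toy = {"alice": ["bob"], "bob": ["carol"], "carol": ["alice", "bob"]}
    for node, score in sorted(pagerank(toy).items(), key=lambda kv: -kv[1]):
        print(f"{node}: {score:.3f}")
```

In the toy graph "bob" ends up ahead of "alice" because he receives all of alice's vote plus half of carol's, while alice only gets the other half of carol's.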

Plenary Session (Knowledge Harvesting from Web Sources): I found this session very informative and full of pointers to new research ideas, although it was a bit removed from my own research area. Gerhard Weikum (Research Director at the Max-Planck Institute for Informatics (MPII) in Saarbruecken, Germany) presented a comprehensive overview of research methodologies that can turn the Web into a large-scale knowledge base; a few examples of such knowledge bases are DBpedia, KnowItAll, ReadTheWeb and YAGO-NAGA, as well as industrial ones such as Freebase and True Knowledge. The tutorial presented research methodologies for knowledge harvesting, with examples of work on the unification of WordNet and Wikipedia in YAGO, identification of the long tail of instances of entity classes by harvesting textual snippets from the Web, and entity search through language-model ranking. Overall the session was intense and the slides were quite heavy, with lots and lots of natural language processing material, but it was definitely a great learning activity from the point of view of tools to use in your own research.

SentA Session: This session was one of the most exciting ones for me, as my own research centers on Sentiment Analysis. Professor Mike Thelwall, who heads the Statistical Cybermetrics Research Group at the University of Wolverhampton, delivered the talks in this session, which mostly centered on his research group's sentiment strength detection tool, SentiStrength. We were taken through a live demonstration of the tool, after which Professor Thelwall explained its various features in detail along with the underlying algorithms and their experimental evaluations. The SentiStrength team has done a pretty good job of maintaining this tool, and the best thing about it is that its word list, with each word marked for positive/negative strength, is publicly available for research purposes. During this session students were also introduced to machine learning methods for Sentiment Analysis, with detailed explanations of feature selection, gold standard creation and 10-fold cross-validation. To sum up, this session was extremely useful for students wishing to make a career in Sentiment Analysis, and I especially thank Professor Mike for his valuable suggestions on various aspects of the field.
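To make the idea of a word list with positive/negative strengths concrete, here is a toy sketch in the spirit of SentiStrength's dual scoring: each text gets a positive score from 1 to 5 and a negative score from -1 to -5, taken from the strongest positive and strongest negative words found. The tiny lexicon below is made up for illustration, and the real tool's published word list and its extra rules (negation, booster words, emoticons and so on) are not reproduced here.

```python
import re

# Hypothetical mini-lexicon (NOT SentiStrength's real list): word -> strength.
# Positive words use 2..5, negative words use -2..-5.
LEXICON = {
    "love": 3, "great": 3, "wonderful": 4, "good": 2,
    "hate": -4, "bad": -2, "awful": -4, "boring": -2,
}


def dual_strength(text: str, lexicon: dict[str, int] = LEXICON) -> tuple[int, int]:
    """Return (positive, negative) strengths; 1 and -1 mean no sentiment found."""
    words = re.findall(r"[a-z']+", text.lower())
    scores = [lexicon[word] for word in words if word in lexicon]
    # The strongest word of each polarity decides that polarity's score.
    positive = max([s for s in scores if s > 0], default=1)
    negative = min([s for s in scores if s < 0], default=-1)
    return positive, negative


if __name__ == "__main__":
    print(dual_strength("I love this wonderful city but the weather is awful"))  # (4, -4)
    print(dual_strength("The meeting is at noon"))  # (1, -1)
```

Keeping two separate scores instead of one net polarity lets a mixed text like the first example register as both strongly positive and strongly negative.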

ColIR Session: This was a short session conducted by Chirag Shah of Rutgers University. It touched on a completely new dimension within the field of Information Retrieval, namely information retrieval facilitated through collaboration. According to Professor Chirag Shah, the emergence of collaborative Web platforms has moved information retrieval in a new direction. The traditional view of IR is that it is an individual activity; the Collaborative IR community challenges this notion by describing it as a coordinated activity, and they have demonstrated their ideas in both theory and practice. This session covered both the theory and practice behind collaborative IR situations, systems and evaluation techniques.

TopK Session: This session, presented by the two charming ladies Sihem Amer-Yahia and Julia Stoyanovich, was simply fantastic. We were introduced to a whole new approach to solving some of the toughest problems in social media, and this approach comes from the old, classical database field. The session mainly centered on Top-K processing, one of the well-known methods for ranked retrieval within the DB-IR research community, presented in a unique manner with a special focus on applying it to search and information discovery on the Social Web. Such applications were discussed from two significant viewpoints: 1) efficiency (minimizing both space and time requirements) and 2) user satisfaction. Both researchers presented a comprehensive overview of their papers published in top Database and Information Retrieval conferences: VLDB, ICWSM, SIGMOD and ACM HT. Their research on the efficiency dimension was based on incorporating upper bounds into classical top-k algorithms (the threshold algorithm and the no-random-access algorithm) in order to minimize time and space complexity. Their research on the user satisfaction dimension presented the fundamental idea of scaling up user studies to thousands of users by leveraging crowdsourcing platforms such as Amazon Mechanical Turk. Currently I am reading these papers to look for ideas that can be applied to my own research in Social Media Analytics.
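For readers new to top-k processing, here is a minimal sketch of the classical threshold algorithm (TA) mentioned above; it is the textbook baseline, not the speakers' upper-bound extensions. It assumes every ranked list gives every object a score, that random access to any object's score is available, and that scores are combined by summation (any monotone function would do); the two example lists are made up.

```python
import heapq


def threshold_algorithm(lists: list[dict[str, float]],
                        k: int) -> tuple[list[tuple[float, str]], int]:
    """Threshold Algorithm (TA) with sum as the monotone aggregation function.

    Each dict maps object id -> score for one attribute. Every object is assumed
    to appear in every list, so random access always succeeds.
    Returns the top-k (score, object) pairs and the number of sorted-access rounds.
    """
    # Sorted access: each attribute list ordered by descending score.
    ranked = [sorted(scores.items(), key=lambda item: item[1], reverse=True)
              for scores in lists]
    top: list[tuple[float, str]] = []  # min-heap holding the current best k
    seen: set[str] = set()
    rounds = 0
    for depth in range(len(ranked[0])):
        rounds += 1
        frontier = []  # score read from each list at this depth
        for column in ranked:
            obj, score = column[depth]
            frontier.append(score)
            if obj in seen:
                continue
            seen.add(obj)
            # Random access: look up the object's score in every list.
            total = sum(scores[obj] for scores in lists)
            if len(top) < k:
                heapq.heappush(top, (total, obj))
            elif total > top[0][0]:
                heapq.heapreplace(top, (total, obj))
        # No unseen object can score more than the sum of the frontier scores.
        threshold = sum(frontier)
        if len(top) == k and top[0][0] >= threshold:
            break
    return sorted(top, reverse=True), rounds


if __name__ == "__main__":
    # Made-up scores for five items on two attributes (e.g. relevance, social).
    relevance = {"a": 9, "b": 8, "c": 3, "d": 2, "e": 1}
    social = {"a": 7, "b": 9, "c": 4, "d": 2, "e": 3}
    print(threshold_algorithm([relevance, social], k=2))  # ([(17, 'b'), (16, 'a')], 2)
```

In the example TA stops after two rounds of sorted access: the second-best total (16) already meets the threshold 8 + 7 = 15, so no unread item could enter the top 2. This early stopping is where the efficiency gain over scoring every item comes from.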

Here is an archive of tweets during my attendance at RuSSIR:

#RuSSIR sessions kick off with interesting presentation on Social Media Mining by @yvolkovich and Vladimir Gorovoy

Not many people know abt. a social network exclusively devoted to travel and hospitality: CouchSurfing

Can an online social network build enough trust to allow strangers to sleep on each others’ couches: Adamic's paper http://bit.ly/prdxTy

"The Web today is the largest knowledge encyclopaedia - we need to turn it into a comprehensive Database" - Gerhard Weikum at #RuSSIR

In a very interesting talk by Mike Thelwall explaining the working of the famous sentiment analysis tool SentiStrength #RuSSIR

Automatic sentiment analysis has more or less the same accuracy as human sentiment analysis due to complexity of problem - Mike Thelwall

A look into inside of Yandex Market by @vgorovoy in session of Social Media Mining http://twitpic.com/66vfkp

Interesting talks in TopK session at #RuSSIR: essentially about converting social media research problems to traditional database problems

Researcher from Barcelona Media Innovation Center explains the science of social media mining #RuSSIR

Mention of work of KAIST's @sbmoon in #RuSSIR in Social Media mining lecture

Andrey Plakhov explains how entity-oriented search works at Yandex: Russia's search engine that has larger market share than Google Russia

Wonder where this rule came from #RuSSIR #Yandex http://twitpic.com/67e4w1

Sihem Amer-Yahia of Qatar Computing Research Institute continues day 3 of session on TopK Processing for Social Applications

Wonderful graphic by @yvolkovich on visualization of social media conversations during Spain protests #RuSSIR

Take-home from ColIR session: Science is all about collaboration unlike the Humanities #RuSSIR

AlJazeera English tracking information of users who visit the site for improved user experience - Sihem of QCRI at #RuSSIR

SearchTogether by Microsoft Research takes user-mediated Collaborative Information Retrieval one step ahead #RuSSIR

ColIR session: reason behind failure of Google Wave was the difficulty of the system requiring a 60-minute video tutorial #RuSSIR

Take-home of TopK #RuSSIR session: Social Web is full of challenges, our online social experience will be as good as we researchers make it

A week of super-duper learning and knowledge-sharing, intense discussions and lots of research take-aways. Hats off to #RuSSIR team!!

In short, Russia is a wonderful place to visit and St. Petersburg is mind-blowing. Russian people are extremely hospitable and friendly, and what's best about them is their love and passion for Mathematics. All in all, Russia is especially rewarding for a Computer Science researcher, as it is full of wonderful Computer Scientists, both established researchers and young, science-aspiring students.

Finally, I am glad to announce that the Web Science group at the Institute of Business Administration will conduct an open seminar to educate Pakistani students in some of the above-mentioned topics. Feel free to contact me if you have any suggestions for the seminar or any topics you wish to include.