Tuesday, September 14, 2010

Experience at TEDxKAIST: Happiness for Science and Science for Happiness

Almost all of us know about TED, which stands for Technology, Entertainment, Design. It is an annual event where the world's leading thinkers and doers are invited to share what they are most passionate about. TED gave birth to an accompanying initiative called TEDx, a program that enables local communities such as schools, businesses, libraries, neighborhoods or just groups of friends to organize, design and host their own independent TED-like events. One such event was organized at KAIST on 11th September, 2010 by a team of KAIST students, wonderfully led by Mark Whiting, a Master's student at the Department of Industrial Design, KAIST. They called it TEDxKAIST, and it turned out to be an energizing event for the hard-working KAIST students.


KAIST is one of Korea's most prestigious science and technology institutions, and with this in mind the theme was well chosen: "Happiness for Science and Science for Happiness". It is a significant one, as KAIST is all about hard-working students actively engaged in scientific contribution and advancement. There is a famous saying among KAIST students that KAIST never sleeps; the KAISTians struggle to survive the seeming paradox of hard work and true happiness. The inspiration for the theme came from this excerpt:

"Of the more highly educated sections of the community, the happiest in the present day are the men of science. Many of the most eminent of them are emotionally simple, and obtain from their work a satisfaction so profound that they can derive pleasure from eating and even marrying." - Bertrand Russell (1930); The Conquest of Happiness


In this blog post I will share my takeaways from TEDxKAIST, along with some key points from the speeches that everyone associated with science should keep in mind in order to become a "happy scientist contributing to a happy world." The following is a brief profile of each of the speakers who spoke at TEDxKAIST and gave their view of a happy scientist:
  • Dr. Young Hae Noh is a Professor at the School of Humanities and Social Sciences at KAIST and has also served as dean of multiple departments at KAIST.
  • Dr. Minhwa Lee is the Business Ombudsman, who serves as a link between the government and small and medium-sized businesses; he is also a Professor at KAIST.
  • Spanish Koffee is a very famous music group in Korea that pursues free distribution of digital music in its mission of "Passion worth Spreading."
  • Dr. Woonseung Yeo is a Professor at the Graduate School of Culture Technology at KAIST. His PhD work at Stanford University includes an introduction to the field of sonification, which means conveying information through audio signals.
  • Sungdong Park is the CEO of Satrec Initiative, the world's leading company in high-performance, cost-effective Earth observation small satellite solutions. He won a Civil Merit Medal, a presidential commendation and an Industrial Service Medal for his contributions to Korean space science and technology.
  • Byungwoo Jang is the CEO of LG OTIS and has served LG for many years. He comes from a family of great scholars of English literature.
  • Dr. Don Norman is a distinguished visiting Professor at KAIST and holds many other significant positions around the world. His work has resulted in a number of influential books including “The Design of Everyday Things” and, most recently, “Living With Complexity.”

Professor Noh began her speech with a definition of success quoted from Benjamin Zander: "Success is not about wealth, fame or power; it's about how many shining eyes I have found." She shared the story of her music classes, a love story but a very different one: a Professor-student love story. She also shared her tips on being a successful Professor: one who brings out her students' talent to the full, who both loves her students and is loved by them, and who kindles passion and enthusiasm in them, something which in my opinion is quite lacking in a majority of today's students. She advocated giving students the freedom to discover their potential and greatness on a journey of their own, while Professors remain keen observers of their students and take joy in discovering what is interesting about each of them.


It was really interesting to observe the scientist's definition of happiness. From a scientist's point of view, happiness is surpassing challenges, overcoming obstacles, and sharing and instilling passion all around, and this view came out most clearly in the talk by Sungdong Park. His story was one of courage: of making the impossible possible despite all hardships and of rising after setbacks. He shared a newspaper cutting which said, "First Park Sung Dong got mad. Then he got even." Before establishing Satrec Initiative he led the development of advanced small satellites at KAIST for 10 years. Then something happened which eventually led him to the success he enjoys today, though the path was not easy: his government lab was shut down. It was a hard time, but he did not lose hope and launched a venture with his old lab's technology. His vision was to make all of Satrec's engineers millionaires, apparently a crazy idea, but with passion and devotion Park made it possible. Today Satrec is the only private company in Korea that is a member of the International Astronautical Federation (IAF), and it is deploying satellite solutions for Dubai, Malaysia, Singapore and Turkey.

Another talk that inspired me a lot, and that voiced things I have always advocated for science and engineering students, was that of Byungwoo Jang. Its main theme was that technology needs art. He spoke about the importance of literature for science and engineering students: without literature any student is incomplete, for literature is a way to imagine yourself in the position of another person. Today people increasingly fail to feel the pain of others, which is making the world an insensitive place, and one way to overcome this is through literature. The LG OTIS CEO highlighted how reading books makes life more meaningful and transforms individuals, noting that many successful people have literature behind them. Thomas Edison is reported to have read 3.5 million pages in his lifetime; think of all the imagination and creativity he derived from those books. Abraham Lincoln had an unfortunate childhood, but his life was transformed completely after he read a biography of President Washington and decided to become a President himself. Reading books and works of literature, which today's science and engineering students neither do nor enjoy much, is a very healthy habit for the mind and can be a new source of creativity and inspiration for tomorrow's scientists, so they must not give it up.

The final talk, by Professor Don Norman, was undoubtedly the highlight of the entire event. What was really surprising was that he did not use any slides; instead he drew everything he wanted to present on a whiteboard. The talk was inspiring indeed, with many lessons for people of science and engineering. It was fundamentally organized around the following:

He first asked the audience who was happy and who was not, and then said that those who answered neither happy nor unhappy had made a smart choice. If you are simply happy, he argued, you are not doing well in your pursuit in life, because on every path happiness comes with a great deal of unhappiness; being successful means not taking the normal way but going through a lot of pain and difficulty. He then distinguished three pairs. Happiness and sadness are just a state, which can be measured, and when on the path to achieving something one should not worry about being happy. Satisfaction and dissatisfaction are a judgment, which no one can measure except the person himself or herself. Optimism and pessimism are points of view, and this is what determines everything. As an example of point of view, he described the fear a person feels when asked to walk on a plank placed in mid-air, as opposed to the absence of fear when asked to walk on the same plank placed on the floor. Points of view, in other words, are driven by a person's emotional system, approach and instincts, and this has to be the driving factor if a scientist is to derive happiness from his science: happiness for both himself and the world.

He related a story about his experience at Apple which shows how a fusion of happiness and anxiety can lead to success in science. His tip was that when thinking about new ideas and embarking on a creative journey one must have fun, relax and be in a comfortable state of mind, but once a decision has been taken on an idea, accomplishment comes through anxiety and a worried state of mind. Lastly, his talk threw light on the paradox of urgent problems vs. important problems: it is the important problems that need to get done first, because what you want to do in life is the important thing and that makes the difference.

This event was a great and memorable experience during my stay at KAIST, and surely the lessons and tips shared there will help me throughout my academic life.

Wednesday, September 1, 2010

Coling 2010 Workshop The People’s Web Meets NLP: Collaboratively Constructed Semantic Resources

Saturday, 28th August, 2010 was a long and eventful day as I attended a workshop co-located with COLING 2010 in Beijing, China. The workshop was organized by the Ubiquitous Knowledge Processing (UKP) Lab in the Department of Computer Science at the Technische Universität Darmstadt, Germany. Its main theme centered on collaboratively constructed semantic resources and their role and influence in today's Natural Language Processing research. This was only the second edition of the workshop, and the community is still small, with new faces appearing this time, yet the way the discussions were carried out seems to offer much promise for this community.

The following diagram gives a pictorial representation of what the workshop was all about:


Within the natural language processing domain there has long existed what we can call a knowledge acquisition bottleneck. Earlier, this bottleneck was addressed through the development of lexical semantic resources by experts, WordNet being a classical example. With the emergence of Web 2.0, however, the scenario has changed completely, and the focus of the NLP community has moved from the classical resources to collaboratively constructed semantic resources (CCSRs), Wikipedia being the most popular example. This is why we now see an increasing number of research publications on such themes in many reputed conferences such as CIKM, WWW, EMNLP etc.

As the diagram above shows, the workshop's focus was on research that uses CCSRs to enhance and further NLP and also, the other way around, on research that uses NLP to improve CCSRs.

The best thing about the workshop was that it welcomed researchers from diverse domains, with some attending this time and the promise of more next time. In fact the invited speech by Professor Tat-Seng Chua of the National University of Singapore was itself from a different area, namely community-based question answering. The papers presented fell into two categories:
  • Those using collaboratively constructed resources as sources of lexical semantic information for NLP purposes such as information retrieval, named entity recognition, or keyword extraction
  • Those using NLP techniques to improve the resources or extract and analyze different types of lexical semantic information from them.
Overall, 8 papers were presented in the workshop, of which 5 were on Wikipedia, 1 on Amazon Mechanical Turk, 1 on translation resources and 1 on the blogosphere.

The paper presentations were followed by an extremely interactive and informative discussion on the theme of the workshop, collaboratively constructed semantic resources. In my opinion every attendee had something to take away from the discussion, and below I am sharing all the questions raised during it along with the thoughts of the attendees. Readers are welcome to chip in with their comments, thoughts and suggestions for the mutual benefit of the whole community.

The first question concerned the scope of the workshop's name, i.e. whether "Collaboratively Constructed Semantic Resources" was too wide or too narrow. The very first suggestion, a useful one from the invited speaker Professor Tat-Seng Chua, was to use the term Community Based Semantic Resources instead. Two other attendees proposed the terms crowdsourcing or wisdom of the crowds, as these are the more popular terms within this area and workshops co-located with other reputed conferences use them. However, one of the workshop chairs, Dr. Torsten Zesch, put forward a valid argument against these two terms. It is hard to agree that Wikipedia counts as a crowdsourced resource, so that term may narrow the scope of the workshop too much. Wisdom of the crowds, on the other hand, is a very widespread term for Web 2.0 tasks and many other conferences use it, but it is too broad a theme given that the workshop centers on using the resources for NLP tasks and using NLP to improve collaboratively constructed semantic resources.

The next question was a very important one as it bridges the NLP research of yesterday and today: what is the relation between expert-made and collaboratively constructed resources; are they complementary or are they different? The workshop chair Professor Dr. Iryna Gurevych explained the question further: for many years the NLP community has relied on classical lexical semantic resources and they have served us well, but with Web 2.0, CCSRs are in widespread use, so do we still need the classical resources, and should we spend our efforts on improving them? Almost all the attendees were in agreement that the quality and correctness issues in CCSRs have to be addressed; Wikipedia, for example, has such issues and there is very little work on addressing them. Compared to classical resources, CCSRs are better in terms of coverage, while the classical resources are better in terms of quality. The need, then, is to bring the quality of classical, expert-made resources into CCSRs, and for this we need to provide incentives that guide the crowds who generate the CCSRs. Mechanical Turk, for example, has a nice way of ensuring quality through monetary incentives, and the research community needs to think of more ways to ensure that high-quality CCSRs are produced.

The third question focused on the various types of CCSRs and their classification: what are the most valuable collaboratively constructed semantic resources, and how can we classify them? The types of CCSRs mentioned by the attendees were Wikipedia, Twitter and social networks, forums and CQA sites, YouTube, Flickr, and the Wiki family, e.g. Wiktionary, Wikiversity, Wikitravel etc. As for the value of any CCSR, the attendees agreed that coverage and the number of people involved in creating the resource are the determining factors. As for the most valuable CCSRs, Wikipedia's importance cannot be denied and there is ample evidence for it. For the future, Twitter may emerge as an extremely valuable resource, as it is a whole wealth of knowledge waiting to be mined. Moreover, Twitter has managed to attract the attention of the research community in a very short span of time, as can be seen from the number of research publications using Twitter as a source of data; reputed publication venues such as WWW, CIKM, SIGKDD, COLING etc. now have many papers on Twitter, and if the NLP community directs its efforts towards this resource properly then it can become a very effective and useful CCSR. As for classification, a well-grounded classification of CCSRs does not exist, and this may itself be a research problem within this area, as classification is multi-dimensional. One significant point raised when comparing Wikipedia and Twitter is that in Wikipedia the users' intention is to build a single, useful resource, whereas on Twitter people share their thoughts and messages separately in 140-character tweets. The nature of each CCSR is therefore different, and it is important to take this into account.

The fourth question concerned the impact CCSRs are having: where do CCSRs have the largest impact, and do they really make a difference? Everyone agreed that the impact of CCSRs is huge: they have eased the knowledge acquisition bottleneck for researchers, and data is now a free resource. Since freely available resources empower people, research effort directed towards CCSRs is worthwhile and will remain beneficial in the future. People start to think in new ways with new resources, and this benefits the whole community. In this connection one of the attendees raised an important point: commercial giants such as search engines may already be using CCSRs for their tasks, and this may also be a significant business secret for them. All this implies that CCSRs have great potential to impact both research and commercial applications, and the community needs to think about more ways to create and improve such resources. One example is "games with a purpose", which give people a leisure incentive rather than a monetary one for creating the resources; Google's Image Labeler is one such application, which Google uses to generate image tags and thereby improve its image search.

The fifth and last question concerned the different research areas that have an interest in CCSRs: which scientific communities have collaboratively constructed semantic resources as their distinct topic, and which fields other than Computational Linguistics/Natural Language Processing/Human Language Technologies should we collaborate with regarding CCSRs? This question has a broad range of answers in my opinion. Some answers discussed during the workshop suggested collaborating with people from the social sciences, as "Social Science meets Computer Science" has recently become a prominent, emerging theme. People from the computer networking domain can also provide useful insights with respect to CCSRs, and so in the future we may see a broad range of researchers gathering to work collaboratively on collaboratively constructed semantic resources.

The workshop was a great experience for me and I look forward to attending it and presenting my work there in the future as well. Readers are invited to drop in any comments or suggestions with respect to collaboratively constructed semantic resources.