Friday, May 28, 2010

Python and C++ Integration Tutorial

Today I am writing for the wonderful Python community out there. The beauty of open source is the huge community and the help its members give each other through blogs, forums, mailing lists and so on. After my struggle with Python and C++ a few days ago, I decided to write this post to help anyone struggling with a similar problem.

Python is a great programming language in which you can do almost anything, as I mentioned in my previous post, but when it comes to raw performance and efficiency C/C++ is very hard to beat. There are times, especially in large-scale systems, when performance cannot be sacrificed and you have to resort to C/C++. But oh no! Most of your source code is already in Python... so what now? Not to worry: another great thing about Python is that you can extend it with C/C++, and that's exactly what I did. But it's easier said than done.

Calling C from Python is relatively easy, but calling C++ is the real headache, so I am sharing how to do it for any programmer who needs it. I had a tough time due to the lack of proper tutorials on the subject, and this also motivated me to write one :)

The sample class that we use here is Employee and we will use its data members and methods in Python. You can download the complete code here.

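In the Employee.cpp file there is a piece of code that does the actual work. The ctypes module used later in this post can only call functions with C linkage, because a C++ compiler mangles function names, so it is necessary to wrap each method in a plain function declared inside an extern "C" block. Here is the code that does it: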

extern "C" {
Employee* Employee_new(){ return new Employee(); }
void Employee_promote(Employee* emp){ emp->promote(); }
void Employee_demote(Employee* emp){ emp->demote(); }
void Employee_hire(Employee* emp){ emp->hire(); }
void Employee_fire(Employee* emp){ emp->fire(); }
void Employee_display(Employee* emp){ emp->display(); }
void Employee_setFirstName(Employee* emp,char* inFirstName){ emp->setFirstName(inFirstName); }
void Employee_setLastName(Employee* emp,char* inLastName){ emp->setLastName(inLastName); }
void Employee_setEmployeeNumber(Employee* emp,int inEmployeeNumber){ emp->setEmployeeNumber(inEmployeeNumber); }
void Employee_setSalary(Employee* emp,int inNewSalary){ emp->setSalary(inNewSalary); }
}


Each wrapper takes the object pointer as its first argument and forwards the call to the matching method. The wrappers here all return void since none of them needs to hand a value back to Python; the constructor wrapper is the exception and must return a pointer of the form ClassName*. The tough part is methods that take arguments, and I had quite a hard time figuring that out. The way to do it is this: within the extern block write the wrapper as before, and after the object pointer list the parameter types in the signature, as shown here: void Employee_setFirstName(Employee* emp,char* inFirstName){ emp->setFirstName(inFirstName); }

Now the library generation part. Here's how to do it with g++:

g++ -c -fPIC Employee.cpp -o Employee.o
g++ -shared -Wl,-soname,libEmployee.so -o libEmployee.so Employee.o

This will generate the shared library libEmployee.so (you can give it any name of your choice), which can then be easily loaded from Python. Here is an example of Python code calling a C++ class, marvelous:

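from ctypes import cdll, c_void_p, c_char_p, c_int

lib = cdll.LoadLibrary('./libEmployee.so')

# ctypes assumes every argument and return value is a C int unless told
# otherwise, which truncates the Employee* pointer on 64-bit systems.
lib.Employee_new.restype = c_void_p
lib.Employee_setFirstName.argtypes = [c_void_p, c_char_p]
lib.Employee_setLastName.argtypes = [c_void_p, c_char_p]
lib.Employee_setEmployeeNumber.argtypes = [c_void_p, c_int]
lib.Employee_setSalary.argtypes = [c_void_p, c_int]
for name in ('promote', 'demote', 'hire', 'fire', 'display'):
    getattr(lib, 'Employee_' + name).argtypes = [c_void_p]

class Employee(object):
    def __init__(self):
        # Opaque pointer to the C++ object created by Employee_new()
        self.obj = lib.Employee_new()

    def EmployeeTest(self):
        # char* parameters need byte strings, not unicode strings
        lib.Employee_setFirstName(self.obj, b"Marni")
        lib.Employee_setLastName(self.obj, b"Kleper")
        lib.Employee_setEmployeeNumber(self.obj, 71)
        lib.Employee_setSalary(self.obj, 50000)
        lib.Employee_promote(self.obj)
        lib.Employee_promote(self.obj)
        lib.Employee_hire(self.obj)
        lib.Employee_display(self.obj)

emp = Employee()
emp.EmployeeTest()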
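
If you want to try the ctypes mechanics without building libEmployee.so first, here is a minimal sketch that uses the C standard library as a stand-in. It assumes a POSIX system (Linux or macOS); strlen and strchr are ordinary libc functions, chosen only for illustration and not part of the Employee example. The point is the same as above: tell ctypes the argument and return types of each function before calling it.

```python
import ctypes
import ctypes.util

# Stand-in for libEmployee.so: the C standard library.
# find_library may return None on some systems; CDLL(None) then exposes
# the symbols already loaded into the running process (POSIX only).
libc = ctypes.CDLL(ctypes.util.find_library("c"))

# size_t strlen(const char *s);
libc.strlen.argtypes = [ctypes.c_char_p]
libc.strlen.restype = ctypes.c_size_t

# char *strchr(const char *s, int c);
# The return value is a pointer, so it must be declared explicitly.
libc.strchr.argtypes = [ctypes.c_char_p, ctypes.c_int]
libc.strchr.restype = ctypes.c_char_p

name = b"Marni"                      # bytes are passed as char*
length = libc.strlen(name)
tail = libc.strchr(name, ord("r"))   # ctypes copies the result back as bytes

print(length, tail)
```

Without the restype line, ctypes would treat the returned pointer as a plain C int, which is the same trap that Employee_new() falls into on 64-bit machines.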


So ctypes does the trick and makes it quite simple. There are also alternatives such as Boost.Python and SWIG, which I have not yet explored but will in the near future, as I have to work with Python and C++ in the research for my MS thesis. So stay tuned for more.

Friday, May 14, 2010

The Python Experience

Today I will write an introductory post on Python. A few days back a student said to me, "Python must be hard", and she is the main reason I came up with this post.

In one phrase, I would say that Python is the best of both worlds: it delivers much of the power of traditional compiled languages like C and C++ together with the ease of use and simplicity of interpreted scripting languages like Perl and Tcl. In the world of Python, imagination literally becomes the limit for the programmer.

Few people know that Python is used for major tasks by organizations like Google, Yahoo!, NASA, Red Hat, Pixar, Disney and DreamWorks. In fact, what we see today as Yahoo! Mail began as the Rocketmail Web-based email service, which was designed in Python. Many universities are planning to use Python to teach introductory programming so that students can focus on problem-solving skills instead of being bogged down by the difficulty of the language, and some, including MIT, have already started to do so.

So here, for the students back in Pakistan, are the basic features of Python that make it so appealing and powerful:

  • Of course it is a high-level language. Its beauty lies in its higher-level data structures, such as lists and dictionaries, which reduce development time considerably.
  • Python has support for object-oriented programming; in fact it is an object-oriented language all the way down to its core.
  • With Python the code is compact, and that's also the beauty of it. Python is often compared to batch or Unix shell scripting languages, but with those there is little code reusability and you are confined to small projects. In fact, even small projects may lead to large and unwieldy scripts. Not so with Python, where you can grow your code from project to project, add other new or existing Python elements, and reuse code at your whim.
  • Python's portability is also part of what makes it one of the most widely used programming languages today; it can be found on a wide variety of systems. Python is written in C, and thanks to C's portability it is available on practically every platform that has an ANSI C compiler.
  • Python is extremely easy to learn and students can grasp it very quickly, so I would certainly recommend it to the many students reading this post. If you need any sort of help, feel free to contact me.
  • Python code is very easy to read, so much so that even a reader who has never seen a single line of Python will begin to understand the code almost immediately.
  • Python code is extremely easy to maintain and if you review a piece of code you wrote some months back you will be able to grasp it in no time.
  • Python is robust to errors. Python provides "safe and sane" exits on errors, and when your Python program crashes, the interpreter dumps out a "stack trace" full of useful information such as why your program crashed and where in the code (file name, line number, function call, etc.) the error took place. These errors are known as exceptions. Python even gives you the ability to monitor for errors and take an evasive course of action if such an error does occur during runtime. These exception handlers can take steps such as defusing the problem, redirecting program flow, performing cleanup or maintenance measures, shutting down the application gracefully, or just ignoring it. In any case, the debugging part of the development cycle is reduced considerably.
  • Now comes the one great thing I simply love about Python: numerous external libraries have already been developed for Python, so whatever your application is, someone may have traveled down that road before. All you need to do is "plug-and-play". There are Python modules and packages that can do practically anything, from natural language processing in NLTK to scientific computing and everything you can imagine. If you cannot find what you need in Python's standard library, chances are high that there is a third-party module or package that can do the job.
  • Python has its own memory manager. What makes C and C++ so burdensome is that memory management is the responsibility of the developer: the programmer has to take care of the dirty work of allocating and freeing memory no matter what, but with Python this headache is gone.
  • Python is classified as an interpreted language. Traditionally, purely interpreted languages are almost always slower than compiled languages because execution does not take place in the system's native binary language. In reality, however, Python, like Java, is byte-compiled, i.e. the source is translated into an intermediate form closer to machine language. This improves Python's performance while allowing it to retain the advantages of interpreted languages.
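
For the curious, the byte-compilation mentioned in the last point can be observed from inside Python itself. The following is only a small sketch: add_bonus is a made-up function used as something to compile, and the exact instructions printed by dis differ from one Python version to another.

```python
import dis

def add_bonus(salary, bonus=1000):
    """Hypothetical helper, used only to have something to compile."""
    return salary + bonus

# Every function carries a code object that holds its compiled bytecode.
bytecode = add_bonus.__code__.co_code
print(type(bytecode), len(bytecode))

# dis prints a human-readable listing of those bytecode instructions.
dis.dis(add_bonus)

# compile() performs the same step for source text held in a string,
# and exec() then runs the resulting code object.
code_obj = compile("result = 40 + 2", "<example>", "exec")
namespace = {}
exec(code_obj, namespace)
print(namespace["result"])
```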
So dear students, I would certainly recommend that all of you give Python a go, as in the long run it will really benefit you.
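
As a first taste, here is a small sketch of the exception handling described in the list above. The function names parse_salary and safe_parse are invented for this illustration; the try / except / else / finally structure is standard Python.

```python
def parse_salary(text):
    """Hypothetical helper: turn user input into a non-negative salary."""
    value = int(text)  # raises ValueError for input such as "abc"
    if value < 0:
        raise ValueError("salary cannot be negative: %d" % value)
    return value

def safe_parse(text, default=0):
    """Return (salary, events), falling back to default on bad input."""
    events = []
    try:
        salary = parse_salary(text)
    except ValueError as err:
        # The handler defuses the problem instead of crashing the program.
        events.append("recovered: %s" % err)
        salary = default
    else:
        events.append("parsed ok")
    finally:
        # Runs on both paths, which makes it the place for cleanup work.
        events.append("cleanup")
    return salary, events

print(safe_parse("50000"))
print(safe_parse("abc"))
print(safe_parse("-5"))
```

Calling parse_salary("abc") directly, without the try block, is the quickest way to see the stack trace the interpreter prints for an unhandled exception.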

Tuesday, May 11, 2010

Is Search Really Dead??

Gone are the days when I had to go through newspaper sites or search engines to find the result of a late-night match I could not watch the previous night... today all I do is log in to my Facebook account and there it is: the latest news right before me. So where are we heading?

This post is a little glimpse into the future of information retrieval. Yes, today I have decided to throw some light on my research area, not from a heavily technical standpoint but from an interesting angle that opens a whole new area of research. The concept is becoming so important that a special panel was devoted to it at this year's WWW 2010 conference in Raleigh, USA. The WWW Conference is the world's most renowned platform for Web researchers, and the panel's theme was "Search is dead." The panel comprised the following people:

Andrei Broder – Fellow and VP, Search & Computational Advertising, Yahoo! Research.

Marti Hearst – Professor, School of Information, University of California-Berkeley.

Barney Pell – Partner, Search Strategist for Bing, Microsoft.

Andrew Tomkins – Director of Engineering at Google Research.

Prabhakar Raghavan – (Co-organizer and Moderator) Head, Yahoo! Labs.

Elizabeth Churchill – (Co-organizer) Principal Research Scientist, Yahoo! Research.

If I have to sum up the discussion in one line it is "Search as we traditionally know it is already dead!!!!"

User expectations of search engines are already changing: the traditional routine of typing in a query and being offered 10 blue links in response now amounts to a failure. Completeness is the key here. Users want search engines to incorporate all the incredible richness that is available these days, as there are now far more diverse data sources and presentations than links to web pages. So with user needs getting "weird" and competition in the search business getting "fierce", we are at the doorstep of yet another information retrieval (IR) revolution after PageRank.

Well, it boils down to an important question: who is a key player in all this? Any guesses? Yes, social networks like Twitter, Facebook, MySpace and Orkut. Mark Zuckerberg's statement at Facebook's F8 Conference, "We are building a Web where the default is social", throws a lot of light on this entire phenomenon, and in particular on the new Facebook Platform, crucial parts of which are the Open Graph and Social Plugins. Already, as statistics say, Facebook has surpassed Google in Internet traffic, and this may be the beginning of the new revolution.

So what's your say on it? Is search really dead? I will reserve my own thoughts on this for some time :) and would love to hear from my readers.

In the end a short video to throw some more light into this social media revolution: