Tuesday, October 2, 2012

Pig Vs MapReduce

Tried a few examples on Hadoop MapReduce. After some initial hiccups, the Hadoop setup on my local box turned out to be pretty hassle free. Since Hadoop is written in Java, my Java background definitely helped; understanding the whole business of running MapReduce jobs was a breeze.
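For context, this is roughly what a basic MapReduce job looks like in Java. It is the standard word-count sketch against the Hadoop API, not the exact code I ran, and the class names are just placeholders for illustration:

```java
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  // Mapper: emits (word, 1) for every token in the input line
  public static class TokenizerMapper
      extends Mapper<Object, Text, Text, IntWritable> {

    private final static IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    protected void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, ONE);
      }
    }
  }

  // Reducer: sums the counts for each word
  public static class IntSumReducer
      extends Reducer<Text, IntWritable, Text, IntWritable> {

    private final IntWritable result = new IntWritable();

    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = new Job(conf, "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class);
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```

Even for something this simple you have to compile, build a jar, and submit it to the cluster for every change, which is the pain point Pig tries to remove.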

Later I was reading about Pig Latin. It is a high-level scripting language for analyzing large data sets. Since it is built on top of MapReduce, out of curiosity I tried the same examples with Pig that I had tried earlier with MapReduce. These are my observations:

  • Pig is good for modelling and prototyping. You can develop iteratively because it is easy to change the script and run it again; there is no need to compile and package code for every change.
  • Pig is definitely slower than equivalent hand-written MapReduce jobs.
  • There is not much documentation on how to optimize a Pig script. A user may end up writing the script in such a way that it spawns an unnecessarily large number of MapReduce jobs.
  • If you are a programmer, you would probably like the finer control over optimization that MapReduce gives you. I personally prefer the approach where I have a better understanding of how things work under the hood.
  • Pig may still involve compiling and packaging code if you are using custom functions (UDFs). In a real problem, a user may end up writing a lot of custom functions, which may make Pig development almost as complex as MapReduce (a minimal UDF sketch follows this list).
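On that last point, a Pig user-defined function is itself a small Java class that extends Pig's EvalFunc. Here is a minimal sketch modelled on the example in the Pig documentation; the class name ToUpper is just my own placeholder:

```java
import java.io.IOException;

import org.apache.pig.EvalFunc;
import org.apache.pig.data.Tuple;

// A simple Pig UDF that upper-cases its string argument.
// Like any UDF, it has to be compiled and packaged into a jar
// before a Pig script can REGISTER it and call it.
public class ToUpper extends EvalFunc<String> {

  @Override
  public String exec(Tuple input) throws IOException {
    // Return null for empty or null input rows
    if (input == null || input.size() == 0 || input.get(0) == null) {
      return null;
    }
    try {
      return ((String) input.get(0)).toUpperCase();
    } catch (Exception e) {
      throw new IOException("Caught exception processing input row", e);
    }
  }
}
```

The jar then has to be REGISTERed in the Pig script before the function can be used, so the compile/package cycle you escaped with plain Pig Latin creeps right back in once your logic needs custom functions.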