Pig is a platform for analyzing large data sets. Its language is called Pig Latin, and Pig Latin scripts are compiled into MapReduce jobs that run on a suitable platform such as Hadoop.
Dataset description: http://www2.informatik.uni-freiburg.de/~cziegler/BX/
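
To give a feel for what a Pig Latin script looks like, here is a minimal sketch that counts books per publication year. It rests on several assumptions that do not come from the text above: the books file from the data set is stored as `BX-Books.csv` (a hypothetical path), fields are separated by semicolons, the header row and the quotes around values have already been stripped, and the first five columns are ISBN, title, author, year, and publisher. The relation names are illustrative.

```pig
-- Load the books file (hypothetical path; assumes a cleaned, semicolon-delimited file).
-- Only the first five columns are named here; this schema is an assumption.
books = LOAD 'BX-Books.csv' USING PigStorage(';')
        AS (isbn:chararray, title:chararray, author:chararray,
            year:int, publisher:chararray);

-- Keep rows with a usable publication year (values that fail the int cast become null).
valid = FILTER books BY year IS NOT NULL AND year > 0;

-- Group by year and count the books in each group.
by_year = GROUP valid BY year;
counts  = FOREACH by_year GENERATE group AS year, COUNT(valid) AS num_books;

-- DUMP triggers execution; on Hadoop in MapReduce mode, Pig compiles the steps above into MapReduce jobs.
DUMP counts;
```

Nothing runs until `DUMP` (or `STORE`) is reached. The earlier statements only describe the data flow, which Pig then plans and executes as a whole.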