cluster-lab: the beginnings

I am starting to work with the data science cluster newly set up here at YellowPages. Being a complete neophyte at these installation adventures, I will regularly post here the things I am prone to forget, hopefully in a structured way.

Here is the address for Cloudera Manager, already installed: http://10.32.0.32:7180/cmf/home

The cluster had CDH 5.0.3 installed; since we want to use Spark SQL, which ships with CDH 5.1+ (see this message from Databricks or this notification from Cloudera), the first thing I did was upgrade CDH. This was easy… once I discovered how: basically, use the parcels in Cloudera Manager (look for the notifications in your status bar, then install and deploy the parcel for CDH 5.1). If you don’t have Cloudera Manager, you can do it by hand, this way.

Now, I have to confess I don’t know how to find out, from inside Cloudera Manager, the version of each client installed on the data nodes. For example, I wanted to know which version of Spark was deployed, and I couldn’t find how… I am sure it is possible, though! Anyway, what I did was log on to one of the data nodes (look for their address here), run spark-shell, and look for the version information. I saw a 1.0, so we are in business (since the alpha of Spark SQL comes with Spark 1.0). Just to be sure, I ran a Spark SQL example, and everything went like clockwork. Good stuff.
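
For when I need to repeat this check on several nodes, here is a minimal sketch in Python of the "is this Spark recent enough for Spark SQL?" test. It is only an illustration: the function names are my own, the banner line is a made-up sample (I did not copy the exact spark-shell output), and the 1.0 threshold is simply the version mentioned above.

```python
import re

# Assumption taken from the text above: Spark SQL (alpha) first ships with Spark 1.0.
MIN_SPARK_SQL = (1, 0)


def parse_version(text):
    """Return the first dotted version number found in text as a tuple of ints."""
    match = re.search(r"(\d+(?:\.\d+)+)", text)
    if match is None:
        raise ValueError("no version number found in: %r" % text)
    return tuple(int(part) for part in match.group(1).split("."))


def has_spark_sql(banner):
    """True if the major.minor version found in the banner is at least 1.0."""
    return parse_version(banner)[:2] >= MIN_SPARK_SQL


# Hypothetical banner line; paste whatever spark-shell actually prints.
banner = "Spark version 1.0.0"
print(has_spark_sql(banner))
```

The comparison is done on tuples of integers rather than on strings, so that something like 1.10 would not sort below 1.9.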
