December 17, 2014 Simon Raper

Beautiful Hue


Hue is a godsend when you want to do something on Hadoop (run a Hive query, a Pig script, a Spark job, etc.) and you are not feeling very command line.

As with all things on AWS, there seem to be many routes in, but here’s my recipe for getting it up and running. (I’m assuming you have an AWS account. If not, get set up by following the instructions here.)

I use the AWS command line interface (easy install instructions are here) to get a cluster with Hue on it up and running.

At the command line I then launch a cluster:

aws emr create-cluster --name "My Cluster Name" \
  --ami-version 3.3 \
  --log-uri s3://my.bucket/logs/ \
  --applications Name=Hive Name=Hue Name=Pig \
  --ec2-attributes KeyName=mykeypair \
  --instance-type m3.xlarge \
  --instance-count 3
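The create-cluster call prints a cluster ID, and you will need the master node's public DNS name later on. As a rough sketch (not part of my original recipe), the two helper functions below wrap `aws emr describe-cluster` to read the cluster state and the master DNS name. The function names and the `j-EXAMPLE` ID are made up for illustration, and I'm assuming your CLI is already configured with credentials and a region.

```shell
# Hypothetical helpers; the names cluster_state and master_dns are illustrative only.

# Print the current state of a cluster (e.g. STARTING, WAITING, TERMINATED).
cluster_state() {
  aws emr describe-cluster --cluster-id "$1" \
    --query 'Cluster.Status.State' --output text
}

# Print the master node's public DNS name for a cluster.
master_dns() {
  aws emr describe-cluster --cluster-id "$1" \
    --query 'Cluster.MasterPublicDnsName' --output text
}

# Usage (replace j-EXAMPLE with the ID returned by create-cluster):
#   cluster_state j-EXAMPLE
#   master_dns j-EXAMPLE
```

The DNS name may come back empty until the cluster has finished provisioning, so it is worth checking the state first.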

Next, log in to the AWS console and go to EMR > click on your cluster > view cluster details > enable web connection, then follow the instructions there.

Note that the instructions for configuring the proxy management tool are incomplete. Go here for the complete instructions.
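For orientation, the web connection step on EMR usually comes down to opening an SSH tunnel with dynamic port forwarding to the master node and pointing the browser's proxy tool at it. The sketch below only builds and prints such an SSH command so you can inspect it before running it. The function name, the key file path, the `hadoop` user and port 8157 are assumptions based on the usual EMR setup, so defer to the console instructions if they differ.

```shell
# Hypothetical helper: print (not run) an SSH command that opens a SOCKS
# tunnel to the master node via dynamic port forwarding.
#   $1 = path to your private key file (assumed to match your key pair)
#   $2 = master public DNS name
#   $3 = local port for the proxy (assumed default: 8157)
tunnel_cmd() {
  key_file="$1"
  dns="$2"
  port="${3:-8157}"
  echo "ssh -i ${key_file} -N -D ${port} hadoop@${dns}"
}

# Example with a made-up hostname:
tunnel_cmd "$HOME/mykeypair.pem" "ec2-203-0-113-10.compute-1.amazonaws.com"
```

Once the printed command is running in a terminal, the proxy management tool needs to point at the same local port.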

You should then, as it says, be able to enter master-public-dns-name:8888 into your browser and log in to Hue. Don’t be a fool like me and forget to actually substitute in your master-public-dns-name, which can be found in your cluster details!
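To guard against exactly that mistake, here is a small sketch of a helper that builds the Hue address from a DNS name and refuses the literal placeholder. The function name and the example hostname are invented for illustration. Port 8888 is the one mentioned above.

```shell
# Hypothetical helper: build the Hue URL from the master public DNS name.
hue_url() {
  dns="$1"
  # Reject an empty value or the un-substituted placeholder text.
  if [ -z "$dns" ] || [ "$dns" = "master-public-dns-name" ]; then
    echo "error: substitute your cluster's real master public DNS name" >&2
    return 1
  fi
  echo "http://${dns}:8888"
}

# Example with a made-up hostname:
hue_url "ec2-203-0-113-10.compute-1.amazonaws.com"
```

Paste the printed address into the browser that is configured to use the proxy.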

Here’s a video tutorial I created that shows how to get Hue up and running on AWS.


About the Author

Simon Raper

I am an RSS accredited statistician with over 15 years’ experience working in data mining and analytics and many more in coding and software development. My specialities include machine learning, time series forecasting, Bayesian modelling, market simulation and data visualisation. I am the founder of Coppelia, an analytics startup that uses agile methods to bring machine learning and other cutting edge statistical techniques to businesses that are looking to extract value from their data. My current interests are in scalable machine learning (Mahout, Spark, Hadoop), interactive visualisations (D3 and similar) and applying the methods of agile software development to analytics. I have worked for Channel 4, Mindshare, News International, Credit Suisse and AOL. I am co-author with Mark Bulling of Drunks and Lampposts, a blog on computational statistics, machine learning, data visualisation, R, Python and cloud computing. It has had over 310K visits and appeared in the online editions of The New York Times and The New Yorker. I am a regular speaker at conferences and events.

Machine Learning and Analytics based in London, UK