Today I want to review a really interesting video course from Packt: Building Hadoop Clusters.
I’ve always been really interested in distributed systems, and this course proved to be a good starting point (you can find it at http://bit.ly/1kNiHSr ).
As usual, you can find the teaser video on YouTube.
The course can be divided into 2 main parts: the first (quite long) section deals with how to prepare 3 test machines on Amazon AWS.
This topic is a long one, but I have to admit that server setup is a critical task if you want to build the cluster properly.
You will have to size the servers, manage a shared network between them, and set up proper security groups and SSH key sharing to allow password-less communication across nodes.
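To give an idea of what the password-less part means in practice, here is a minimal Python sketch of my own (it is not taken from the course). It builds `/etc/hosts` style entries for the three nodes and an `ssh` command that fails instead of prompting for a password. The node names, private IPs and the `root` user are all made-up placeholders, so adapt them to your own instances.

```python
import shlex
import subprocess

# Hypothetical node names and private IPs; replace with your own instances.
NODES = {
    "hadoop-node1": "10.0.0.11",
    "hadoop-node2": "10.0.0.12",
    "hadoop-node3": "10.0.0.13",
}


def hosts_lines(nodes):
    """Return /etc/hosts style lines ("ip hostname"), sorted by hostname."""
    return [f"{ip} {name}" for name, ip in sorted(nodes.items())]


def ssh_check_command(host, user="root"):
    """Build an ssh command that fails instead of prompting for a password."""
    return [
        "ssh",
        "-o", "BatchMode=yes",      # never ask for a password or passphrase
        "-o", "ConnectTimeout=5",   # give up quickly on unreachable nodes
        f"{user}@{host}",
        "true",                     # no-op remote command
    ]


def can_login_without_password(host, user="root"):
    """Return True when key-based login to the host works (needs ssh installed)."""
    result = subprocess.run(ssh_check_command(host, user), capture_output=True)
    return result.returncode == 0


if __name__ == "__main__":
    # Only print what would be used; nothing is executed against real hosts.
    print("\n".join(hosts_lines(NODES)))
    for name in sorted(NODES):
        print(shlex.join(ssh_check_command(name)))
```

Once the keys are shared, calling `can_login_without_password("hadoop-node2")` from the first node should return `True`; if it returns `False`, the key exchange or the security group rules probably need another look.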
After going through all these steps (6 chapters long, phew!) you can start dealing with the real Hadoop configuration.
Installation is performed through Ambari (so only RedHat / CentOS for now; take care when you bring up your nodes if you are an Ubuntu addict 😉 ), a really useful web interface to centralize all your node management.
Setup steps are illustrated in a really straightforward way, even if (IMHO) some more detail on WHY you are configuring particular services across nodes would have been really appreciated (mainly when dealing with something more complex than a basic 3-node cluster).
After setup and a short history / architecture introduction, a full Ambari interface overview is shown, just before the final chapters, where you will start dealing with file upload/download and some more admin tasks through the Hadoop User Experience (Hue) web interface.
To recap, this is a really interesting video if you want to get started with Hadoop hands-on.
It will allow you to quickly prepare a mini-cluster and start playing with the Hadoop Distributed File System (HDFS).
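The course does the file upload/download through the web interface, but the same kind of operation can be done from a shell on one of the nodes. The following is a rough Python sketch of my own, assuming the standard `hdfs` client is on the PATH; the helper names and the paths in the example are hypothetical.

```python
import subprocess

# Map friendly action names to the matching `hdfs dfs` flags.
HDFS_FLAGS = {"upload": "-put", "download": "-get", "list": "-ls"}


def hdfs_command(action, *paths):
    """Build an `hdfs dfs` command line for a few basic file operations."""
    if action not in HDFS_FLAGS:
        raise ValueError(f"unsupported action: {action}")
    return ["hdfs", "dfs", HDFS_FLAGS[action], *paths]


def run_hdfs(action, *paths):
    """Run the command on a cluster node and return its standard output."""
    result = subprocess.run(
        hdfs_command(action, *paths),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if hdfs reports a failure
    )
    return result.stdout


if __name__ == "__main__":
    # Hypothetical paths; only the command lines are printed here.
    print(" ".join(hdfs_command("upload", "data.csv", "/user/demo/data.csv")))
    print(" ".join(hdfs_command("list", "/user/demo")))
    print(" ".join(hdfs_command("download", "/user/demo/data.csv", "copy.csv")))
```

On a real node you would call `run_hdfs("list", "/user/demo")` instead of just printing the command.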
Probably once everything is in place, you will want a more in-depth tutorial focusing on HDFS management and MapReduce (just mentioned in this course; probably another one will come shortly, I hope).
My only concern is how much space has been dedicated to AWS as the base IaaS (it’s a small tutorial-in-the-tutorial) when maybe not everyone can afford such a solution just to evaluate Hadoop. Maybe some Vagrant scripts would have been worthwhile, providing an easier-to-build, portable Hadoop cluster.
Hey, this could be a good idea for a next post, couldn’t it? 😉