Pre-requisites and Skills needed to learn Apache Hadoop

Skills needed to learn Hadoop

CORE JAVA

Advanced Java expertise is an added advantage for professionals who want to learn Hadoop, but it is not a prerequisite. Anyone genuinely interested in pursuing a career in big data and Hadoop can get started while spending a few hours learning basic Java concepts on the side. Hadoop lets developers write map and reduce functions in their preferred language, such as Python, Perl, C, or Ruby, through the Streaming API, which reads from standard input and writes to standard output. Apart from this, Hadoop offers high-level abstraction tools like Pig and Hive that do not require familiarity with Java.
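For example, a word-count job written against the Streaming API needs nothing more than a mapper and a reducer that talk to standard input and output. The sketch below uses Python; the file names mapper.py and reducer.py, the sample command, and the location of the streaming jar are illustrative and depend on your own setup.

    #!/usr/bin/env python3
    # mapper.py -- emits "word<TAB>1" for every word read from standard input.
    import sys

    for line in sys.stdin:
        for word in line.strip().split():
            print(word + "\t1")

    #!/usr/bin/env python3
    # reducer.py -- sums the counts for each word. Hadoop Streaming sorts the
    # mapper output by key, so all counts for a given word arrive together.
    import sys

    current_word, current_count = None, 0
    for line in sys.stdin:
        word, count = line.rstrip("\n").split("\t", 1)
        if word == current_word:
            current_count += int(count)
        else:
            if current_word is not None:
                print(current_word + "\t" + str(current_count))
            current_word, current_count = word, int(count)
    if current_word is not None:
        print(current_word + "\t" + str(current_count))

Because both scripts only read standard input and write standard output, they can be tested locally with an ordinary pipeline such as cat sample.txt | python3 mapper.py | sort | python3 reducer.py, and then submitted to the cluster through the hadoop-streaming jar that ships with your installation (for example with -files mapper.py,reducer.py -mapper "python3 mapper.py" -reducer "python3 reducer.py" plus input and output paths of your choosing).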

LINUX

Hadoop needs to be set up on a Linux-based operating system, preferably Ubuntu [1]. The preferred way of installing and managing Hadoop clusters is through the Linux command line, so professionals exploring opportunities in Hadoop need some basic Linux knowledge to set it up. Since Hadoop runs on Linux, you should be comfortable with basic command-line navigation, and some shell scripting skills will go a long way. The most frequently used HDFS shell commands are listed below, followed by a short example session.

1) hadoop fs -put : Uploads a file from the local file system to HDFS. Multiple files can be uploaded in one command by separating the file names with spaces.

2) hadoop fs -get : Downloads a file from HDFS to the local file system. Multiple files can be downloaded in one command by separating the file names with spaces.

3) hadoop fs -cat : Displays the contents of a file.

4) hadoop fs -mv : Moves files from a source path to a destination path within HDFS.

5) hadoop fs -rm : Removes a file or directory in HDFS. Plain -rm only removes files; to remove a directory and its contents, add the -r option (hadoop fs -rm -r).

6) hadoop fs -copyFromLocal : Copies files from the local file system to HDFS; similar to -put, except that the source must be on the local file system.

7) hadoop fs -du : Displays the size of a file, or the sizes of the files and directories under a given path.

8) hadoop fs -ls : Lists the files and directories in a given path.

9) hadoop fs -mkdir : Creates a directory in HDFS.

10) hadoop fs -head : Displays the first kilobyte of a file.
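Putting a few of these commands together, a short example session might look like the sketch below (sales.csv and logs.txt are hypothetical local files, and /user/hadoop/demo is a placeholder HDFS path; adjust the names to your own environment):

    hadoop fs -mkdir -p /user/hadoop/demo
    hadoop fs -put sales.csv logs.txt /user/hadoop/demo
    hadoop fs -ls /user/hadoop/demo
    hadoop fs -head /user/hadoop/demo/sales.csv
    hadoop fs -du /user/hadoop/demo
    hadoop fs -mv /user/hadoop/demo/logs.txt /user/hadoop/demo/logs_archived.txt
    hadoop fs -get /user/hadoop/demo/sales.csv /tmp/sales.csv
    hadoop fs -rm -r /user/hadoop/demo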

Hardware Requirements to Learn Hadoop

1) Intel Core 2 Duo/Quad/Hex/Octa or higher-end 64-bit processor PC or laptop (minimum operating frequency of 2.5 GHz)

2) Hard disk capacity of 1-4 TB.

3) 64-512 GB RAM

4) 10 Gigabit Ethernet or Bonded Gigabit Ethernet

Development tools available for Hadoop:

Hadoop development tools are still evolving. Here are a few:

  • Karmasphere IDE : an IDE tuned for Hadoop development
  • Eclipse and other Java IDEs : useful when writing Java MapReduce code
  • Command-line editors like Vim : no matter which editor you use, you will be editing a lot of files and scripts, so familiarity with a CLI editor is essential.