This article explains how to set up and configure a single-node Hadoop installation (Hadoop Single Node Setup). We can also simulate a multi-node Hadoop environment on a single standalone server with a pseudo-distributed Hadoop installation, which we will explain in detail in the next article of this series.
A single-node setup lets us quickly perform simple operations using Hadoop MapReduce and the Hadoop Distributed File System (HDFS). If you are new to Hadoop, please read our first article, Apache Hadoop – A Promise Of The Century, for the basic Hadoop concepts.
Step 1 : Create a user for Hadoop Application
password hadoop
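The exact commands for this step are not shown above, so the following is only a minimal sketch of one way to do it. It assumes a Linux system that provides useradd and passwd; the helper name create_hadoop_user is made up for illustration.

```shell
# Create a login account with a home directory, then set its password.
# Needs root privileges; assumes the useradd and passwd tools are available.
# create_hadoop_user is a hypothetical helper name, not part of Hadoop.
create_hadoop_user() {
    local user="${1:-hadoop}"
    # Do nothing if the account is already there.
    if id "$user" >/dev/null 2>&1; then
        echo "user $user already exists"
        return 0
    fi
    # passwd prompts interactively for the new password.
    useradd -m "$user" && passwd "$user"
}

# Usage (as root):
#   create_hadoop_user hadoop
```

Checking for the account first makes the helper safe to run more than once.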
Step 2 : Download Hadoop from the Apache site: http://hadoop.apache.org/core/releases.html. I downloaded the stable Hadoop release (1.0.4) by logging in to the node as the hadoop user and running wget:
wget http://apache.techartifact.com/mirror/hadoop/common/stable/hadoop-1.0.4.tar.gz
Step 3 : Before proceeding with the configuration, make sure Java 1.6 or a higher version is installed on your system. You can check with java -version, which prints output like this:
java version "1.6.0_31"
Java(TM) SE Runtime Environment (build 1.6.0_31-b04)
Java HotSpot(TM) Client VM (build 20.6-b01, mixed mode, sharing)
Otherwise, download Java, preferably from Sun, and install it as the root user.
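The version check can also be scripted. The sketch below is one possible approach, not part of the Hadoop distribution: java_version_ok is a made-up helper that compares a version string against 1.6, and the call underneath feeds it whatever java -version reports on your machine.

```shell
# Succeeds when a version string such as "1.6.0_31" or "11.0.2" is 1.6 or newer.
# java_version_ok is a hypothetical helper name used for illustration.
java_version_ok() {
    local version="$1" major minor
    major="${version%%.*}"        # text before the first dot
    minor="${version#*.}"         # text after the first dot ...
    minor="${minor%%.*}"          # ... cut at the next dot
    # Drop any non-numeric suffix (for example "17-ea" becomes "17").
    major="${major%%[!0-9]*}"
    minor="${minor%%[!0-9]*}"
    [ "${major:-0}" -gt 1 ] || { [ "${major:-0}" -eq 1 ] && [ "${minor:-0}" -ge 6 ]; }
}

if command -v java >/dev/null 2>&1; then
    # java -version prints to stderr; the version is the first quoted string.
    installed="$(java -version 2>&1 | awk -F'"' '/version/ {print $2; exit}')"
    if java_version_ok "$installed"; then
        echo "Java $installed is new enough for Hadoop 1.0.4"
    else
        echo "Java $installed is older than 1.6; install a newer JDK"
    fi
else
    echo "java not found on PATH; install a JDK first"
fi
```

Only the first two components of the version are compared, which is enough to separate 1.5 from 1.6 and later.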
Step 4 : Unpack the Hadoop package downloaded in Step 2 as the hadoop user
This creates a folder named hadoop-1.0.4.
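The unpack command itself is not shown above; a minimal sketch using tar could look like the following. It assumes the tarball from Step 2 sits in the current directory, and unpack_hadoop is a made-up helper name.

```shell
# Unpack a gzip-compressed Hadoop release tarball into the current directory.
# unpack_hadoop is a hypothetical helper name used for illustration.
unpack_hadoop() {
    local tarball="$1"
    if [ ! -f "$tarball" ]; then
        echo "$tarball not found" >&2
        return 1
    fi
    # -x extract, -z gunzip, -f read from the named file
    tar -xzf "$tarball"
}

# Run as the hadoop user from the directory that holds the download.
unpack_hadoop hadoop-1.0.4.tar.gz || echo "Skipped: fetch the tarball with wget first." >&2
```

Listing the new folder with ls -l should then show contents similar to the listing that follows.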
[hadoop@hadoop1 hadoop-1.0.4]$ ls -l
total 7528
drwxr-xr-x. 2 hadoop hadoop 4096 Nov 24 06:30 bin
-rw-rw-r--. 1 hadoop hadoop 119875 Oct 2 22:17 build.xml
drwxr-xr-x. 4 hadoop hadoop 4096 Oct 2 22:17 c++
-rw-rw-r--. 1 hadoop hadoop 446999 Oct 2 22:17 CHANGES.txt
drwxr-xr-x. 2 hadoop hadoop 4096 Nov 24 06:59 conf
drwxr-xr-x. 10 hadoop hadoop 4096 Oct 2 22:17 contrib
drwxr-xr-x. 7 hadoop hadoop 4096 Nov 24 06:30 docs
-rw-rw-r--. 1 hadoop hadoop 6840 Oct 2 22:17 hadoop-ant-1.0.4.jar
-rw-rw-r--. 1 hadoop hadoop 410 Oct 2 22:17 hadoop-client-1.0.4.jar
-rw-rw-r--. 1 hadoop hadoop 3928530 Oct 2 22:17 hadoop-core-1.0.4.jar
-rw-rw-r--. 1 hadoop hadoop 142452 Oct 2 22:17 hadoop-examples-1.0.4.jar
-rw-rw-r--. 1 hadoop hadoop 413 Oct 2 22:17 hadoop-minicluster-1.0.4.jar
-rw-rw-r--. 1 hadoop hadoop 2656646 Oct 2 22:17 hadoop-test-1.0.4.jar
-rw-rw-r--. 1 hadoop hadoop 287807 Oct 2 22:17 hadoop-tools-1.0.4.jar
drwxr-xr-x. 2 hadoop hadoop 4096 Nov 24 06:30 ivy
-rw-rw-r--. 1 hadoop hadoop 10525 Oct 2 22:17 ivy.xml
drwxr-xr-x. 5 hadoop hadoop 4096 Nov 24 06:30 lib
drwxr-xr-x. 2 hadoop hadoop 4096 Nov 24 06:30 libexec
-rw-rw-r--. 1 hadoop hadoop 13366 Oct 2 22:17 LICENSE.txt
drwxrwxr-x. 4 hadoop hadoop 4096 Nov 24 09:41 logs
-rw-rw-r--. 1 hadoop hadoop 101 Oct 2 22:17 NOTICE.txt
drwxrwxr-x. 3 hadoop hadoop 4096 Nov 24 10:08 output
-rw-rw-r--. 1 hadoop hadoop 1366 Oct 2 22:17 README.txt
drwxr-xr-x. 2 hadoop hadoop 4096 Nov 24 06:30 sbin
drwxr-xr-x. 3 hadoop hadoop 4096 Oct 2 22:17 share
drwxr-xr-x. 16 hadoop hadoop 4096 Nov 24 06:30 src
drwxr-xr-x. 9 hadoop hadoop 4096 Oct 2 22:17 webapps
[hadoop@hadoop1 hadoop-1.0.4]$
Step 5 : Now edit /home/hadoop/hadoop-1.0.4/conf/hadoop-env.sh so that JAVA_HOME points to the location of the Java installation on your system.
# The only required environment variable is JAVA_HOME. All others are
# set JAVA_HOME in this file, so that it is correctly defined on
export JAVA_HOME=/usr/java/jdk1.6.0_31
[hadoop@hadoop1 ~]$
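If you prefer not to open an editor, the same change can be scripted. The sketch below is one possible approach: set_java_home is a made-up helper, it relies on GNU sed's -i flag (standard on Linux), and the JDK path is the one used in this article, so adjust it to your system.

```shell
# Point JAVA_HOME in a hadoop-env.sh file at the given JDK directory.
# set_java_home is a hypothetical helper; it assumes GNU sed (for -i).
set_java_home() {
    local env_file="$1" java_home="$2"
    if grep -q '^[# ]*export JAVA_HOME=' "$env_file"; then
        # Rewrite the existing (possibly commented-out) export line.
        sed -i "s|^[# ]*export JAVA_HOME=.*|export JAVA_HOME=$java_home|" "$env_file"
    else
        # No export line yet: append one.
        echo "export JAVA_HOME=$java_home" >> "$env_file"
    fi
}

env_file=/home/hadoop/hadoop-1.0.4/conf/hadoop-env.sh
if [ -f "$env_file" ]; then
    set_java_home "$env_file" /usr/java/jdk1.6.0_31
    grep JAVA_HOME "$env_file"   # confirm the change
fi
```

The pattern matches only lines that start with export JAVA_HOME= (optionally commented out), so the explanatory comments in the file are left untouched.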
Step 6 : Now test your Hadoop environment
Make a directory called input
mkdir input
Copy all the .xml files from the conf folder to the input folder
Run the Hadoop sample program that finds and displays every match of a given regular expression. The output is written to the output directory.
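The commands for these two actions follow the Apache single-node guide cited at the end of this article, which runs the bundled grep example with the regular expression dfs[a-z.]+. The wrapper function run_grep_example and the install path are assumptions added here for illustration.

```shell
# Copy the config files into input/ and run the bundled grep example.
# The body runs in a subshell so the caller's working directory is unchanged.
# run_grep_example is a hypothetical wrapper name.
run_grep_example() (
    cd "$1" || exit 1
    mkdir -p input
    cp conf/*.xml input || exit 1
    # Hadoop refuses to start the job if the output directory already exists.
    bin/hadoop jar hadoop-examples-1.0.4.jar grep input output 'dfs[a-z.]+'
)

# Assumed install location; adjust to wherever you unpacked Hadoop.
hadoop_dir=/home/hadoop/hadoop-1.0.4
if [ -x "$hadoop_dir/bin/hadoop" ]; then
    run_grep_example "$hadoop_dir"
else
    echo "Hadoop not found in $hadoop_dir; skipping the example run"
fi
```

If you rerun the example, remove the old output directory first.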
You will get output like the following:
12/11/25 21:43:58 INFO util.NativeCodeLoader: Loaded the native-hadoop library
12/11/25 21:43:58 WARN snappy.LoadSnappy: Snappy native library not loaded
12/11/25 21:43:58 INFO mapred.FileInputFormat: Total input paths to process : 7
12/11/25 21:43:59 INFO mapred.JobClient: Running job: job_local_0001
12/11/25 21:43:59 INFO util.ProcessTree: setsid exited with exit code 0
12/11/25 21:43:59 INFO mapred.Task: Using ResourceCalculatorPlugin : org.apache.hadoop.util.LinuxResourceCalculatorPlugin@a1d1f4
12/11/25 21:43:59 INFO mapred.MapTask: numReduceTasks: 1
12/11/25 21:43:59 INFO mapred.MapTask: io.sort.mb = 100
12/11/25 21:44:00 INFO mapred.JobClient: map 0% reduce 0%
12/11/25 21:44:00 INFO mapred.MapTask: data buffer = 79691776/99614720
12/11/25 21:44:00 INFO mapred.MapTask: record buffer = 262144/327680
12/11/25 21:44:00 INFO mapred.MapTask: Starting flush of map output
12/11/25 21:44:00 INFO mapred.Task: Task:attempt_local_0001_m_000000_0 is done. And is in the process of commiting
12/11/25 21:44:02 INFO mapred.LocalJobRunner: file:/home/hadoop/hadoop-1.0.4/input/capacity-scheduler.xml:0+7457
…………output truncated………………
This creates an output directory with the following contents:
total 4
-rwxrwxrwx. 1 hadoop hadoop 11 Nov 25 21:44 part-00000
-rwxrwxrwx. 1 hadoop hadoop 0 Nov 25 21:44 _SUCCESS
Step 7 : We got one match for the given regular expression
1 dfsadmin
[hadoop@hadoop1 hadoop-1.0.4]$
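To see what the job computed, the same count can be reproduced without Hadoop using standard Unix tools. This is only a rough sketch: it assumes the regular expression was dfs[a-z.]+ as in the Apache guide cited below, and the sample XML file is made up to stand in for the copied configuration files.

```shell
# Hypothetical sample input, standing in for the copied conf/*.xml files.
workdir="$(mktemp -d)"
mkdir "$workdir/input"
cat > "$workdir/input/sample.xml" <<'EOF'
<property>
  <name>security.refresh.policy.protocol.acl</name>
  <description>ACL used by the dfsadmin command.</description>
</property>
EOF

# Extract every match of the regex (-o), without file names (-h),
# then count how often each distinct match occurs.
matches="$(grep -ohE 'dfs[a-z.]+' "$workdir"/input/*.xml | sort | uniq -c | awk '{print $1, $2}')"
echo "$matches"   # prints: 1 dfsadmin
```

Pointing the grep pipeline at your real input directory instead of the sample file lets you compare its result with the contents of output/part-00000.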
We will continue learning Hadoop in later articles. Thanks!
Source of the Topic : http://hadoop.apache.org/docs/r1.0.4/single_node_setup.html
