Download VMware Workstation
Download Ubuntu Desktop 12.04 (AMD64)
Create the VM with a 400 GB disk, 2 processors, and a “bridged” network
Launch a Terminal via Dash Home
$ sudo apt-get update
Wait for Update Manager to auto-launch; Click “Install Updates”
Reboot
Launch Terminal
$ sudo nano /etc/sudoers
In the guest terminal, copy is Ctrl-Shift-C and paste is Ctrl-Shift-V.
Append to the end of /etc/sudoers:
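The post doesn't show the exact entry; a typical (and very permissive) line granting passwordless sudo would look like the following, with the username being an assumption:
# assumed entry -- substitute your own username
myusername ALL=(ALL) NOPASSWD: ALL
As a comment below points out, a dedicated hadoop user and group is a safer choice for anything beyond a local throwaway VM.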
SSH Instructions
$ ssh-keygen -t rsa -P ""
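The keypair alone doesn't let Hadoop's scripts ssh to localhost without a password. Assuming the default key location, authorize the new key and log in once so localhost becomes a known host (the same approach a commenter below describes):
$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
$ ssh localhost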
Install Oracle Java
$ sudo add-apt-repository ppa:webupd8team/java
$ sudo apt-get update
$ sudo apt-get install oracle-java7-installer
$ java -version
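If the install worked, the version banner should identify the Oracle JVM; roughly the following, though the exact update and build numbers will vary:
java version "1.7.0_xx"
Java(TM) SE Runtime Environment (build ...)
Java HotSpot(TM) 64-Bit Server VM (build ..., mixed mode)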
Install Hadoop
$ wget -c http://mirror.metrocast.net/apache/hadoop/common/hadoop-1.0.3/hadoop-1.0.3-bin.tar.gz
$ tar -zxvf hadoop-1.0.3-bin.tar.gz
$ nano .bashrc
Append to .bashrc:
export HADOOP_HOME=/home/myusername/hadoop-1.0.3
Close the Terminal and launch a new one to pick up the new environment variable.
$ exit
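Alternatively, reload the file in the current shell instead of exiting:
$ source ~/.bashrc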
Set JAVA_HOME in hadoop-env.sh
$ cd hadoop-1.0.3/conf
$ nano hadoop-env.sh
Append next to the commented-out JAVA_HOME line:
export JAVA_HOME=/usr/lib/jvm/java-7-oracle
Create hdfs target directories
$ mkdir ~/hdfs
$ mkdir ~/hdfs/name
$ mkdir ~/hdfs/data
$ mkdir ~/hdfs/tmp
$ sudo chmod -R 755 ~/hdfs/
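Equivalently, in a single command (-p creates the parent directory; the brace expansion is a bash feature):
$ mkdir -p ~/hdfs/{name,data,tmp}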
Modify the config files as described in: http://cloudfront.blogspot.com/2012/07/how-to-configure-hadoop.html
$ sudo nano ~/hadoop-1.0.3/conf/core-site.xml
$ sudo nano ~/hadoop-1.0.3/conf/hdfs-site.xml
$ sudo nano ~/hadoop-1.0.3/conf/mapred-site.xml
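The linked post has the full walkthrough; as a sketch, a minimal single-node setup consistent with the directories created above would look like this. The localhost host/port values are the conventional Hadoop 1.x defaults rather than anything taken from this post, and "myusername" is a placeholder, so adjust both to your machine.
core-site.xml:
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
  <property>
    <!-- base for temporary files; points at the tmp dir created above -->
    <name>hadoop.tmp.dir</name>
    <value>/home/myusername/hdfs/tmp</value>
  </property>
</configuration>
hdfs-site.xml:
<configuration>
  <property>
    <!-- namenode metadata -->
    <name>dfs.name.dir</name>
    <value>/home/myusername/hdfs/name</value>
  </property>
  <property>
    <!-- datanode blocks -->
    <name>dfs.data.dir</name>
    <value>/home/myusername/hdfs/data</value>
  </property>
  <property>
    <!-- single node, so one replica -->
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
mapred-site.xml:
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:9001</value>
  </property>
</configuration>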
Format the namenode and start hadoop services
$ ~/hadoop-1.0.3/bin/hadoop namenode -format
$ ~/hadoop-1.0.3/bin/start-all.sh
Confirm services are started
$ jps
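On a healthy single-node install, jps should list all five daemons plus itself: NameNode, DataNode, SecondaryNameNode, JobTracker, and TaskTracker. If any are missing, check the logs under ~/hadoop-1.0.3/logs.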
Hadoop status: http://localhost:50070/ (the default NameNode web UI)
Map Reduce status: http://localhost:50030/ (the default JobTracker web UI)
Install Hive (see https://cwiki.apache.org/confluence/display/Hive/GettingStarted#GettingStarted-InstallationandConfiguration)
$ wget -c http://apache.claz.org/hive/hive-0.9.0/hive-0.9.0-bin.tar.gz
$ tar -xzvf hive-0.9.0-bin.tar.gz
Add these lines to ~/.bashrc and restart your terminal (HADOOP_HOME was already added earlier; keep only one copy):
export HADOOP_HOME=/home/myusername/hadoop-1.0.3
export HIVE_HOME=/home/myusername/hive-0.9.0-bin
export PATH=$HIVE_HOME/bin:$PATH
export PATH=$HADOOP_HOME/bin:$PATH
Create hive directories within hdfs and set permissions for table creation
$ hadoop fs -mkdir /user/hive/warehouse
$ hadoop fs -mkdir /tmp
$ hadoop fs -chmod g+w /user/hive/warehouse
$ hadoop fs -chmod g+w /tmp
Launch hive and create sample tables
$ hive
hive> CREATE TABLE shakespeare (freq INT, word STRING) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' STORED AS TEXTFILE;
hive> CREATE TABLE kjv (freq INT, word STRING) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' STORED AS TEXTFILE;
hive> exit;
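Before exiting, you can confirm the tables exist (standard Hive commands, run from the hive> prompt):
hive> show tables;
hive> describe shakespeare;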
Download sample data from Cloudera
$ wget -O shakespeare.tar.gz https://github.com/cloudera/cloudera-training/blob/master/data/shakespeare.tar.gz?raw=true
$ wget -O bible.tar.gz https://github.com/cloudera/cloudera-training/blob/master/data/bible.tar.gz?raw=true
$ tar -zvxf bible.tar.gz
$ tar -zvxf shakespeare.tar.gz
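Judging by the paths used below, the archives unpack to ~/input and ~/bible; a quick sanity check:
$ ls ~/input ~/bible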
Put the Shakespeare sample data into hdfs
$ hadoop fs -mkdir shakespeare-input
$ hadoop fs -put ~/input/all-shakespeare /user/myusername/shakespeare-input
$ hadoop fs -ls shakespeare-input
Run the “grep” example against the hdfs directory “shakespeare-input” and place the results in “shakespeare_freq”. The example counts how often each string matching the regex occurs and emits tab-separated count/word pairs, exactly the layout the Hive tables above expect.
$ hadoop jar ~/hadoop-1.0.3/hadoop-examples-1.0.3.jar grep shakespeare-input shakespeare_freq '\w+'
$ hadoop fs -ls shakespeare_freq
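To peek at the first few result lines (the glob assumes the usual part-NNNNN output naming):
$ hadoop fs -cat 'shakespeare_freq/part-*' | head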
Put the bible sample data into hdfs
$ hadoop fs -mkdir bible-input
$ hadoop fs -put ~/bible/all-bible /user/myusername/bible-input
$ hadoop fs -ls bible-input
Run the “grep” example against the hdfs directory “bible-input” and place the results in “bible_freq”
$ hadoop jar ~/hadoop-1.0.3/hadoop-examples-1.0.3.jar grep bible-input bible_freq '\w+'
$ hadoop fs -ls bible_freq
Clean up the logs (otherwise the _logs directories would be swept into the Hive tables when the data is loaded):
$ hadoop fs -rmr bible_freq/_logs
$ hadoop fs -rmr shakespeare_freq/_logs
Open Hive
$ hive
hive> load data inpath "shakespeare_freq" into table shakespeare;
hive> select * from shakespeare limit 10;
hive> select * from shakespeare where freq > 20 sort by freq asc limit 10;
hive> select freq, count(1) as f2 from shakespeare group by freq sort by f2 desc limit 10;
hive> explain select freq, count(1) as f2 from shakespeare group by freq sort by f2 desc limit 10;
hive> load data inpath "bible_freq" into table kjv;
hive> create table merged (word string, shake_f int, kjv_f int);
hive> insert overwrite table merged select s.word, s.freq, k.freq from shakespeare s join kjv k on (s.word = k.word) where s.freq >= 1 and k.freq >= 1;
hive> select * from merged limit 20;
hive> select word, shake_f, kjv_f, (shake_f + kjv_f) as ss from merged sort by ss desc limit 20;
Now you know; and knowing is half the battle
Sample output of that final query:
Total MapReduce CPU Time Spent: 6 seconds 140 msec
OK
the 25848 62394 88242
and 19671 38985 58656
of 16700 34654 51354
I 23031 8854 31885
to 18038 13526 31564
in 10797 12445 23242
a 14170 8057 22227
that 8869 12603 21472
And 7800 12846 20646
is 8882 6884 15766
my 11297 4135 15432
you 12702 2720 15422
he 5720 9672 15392
his 6817 8385 15202
not 8409 6591 15000
be 6773 6913 13686
for 6309 7270 13579
with 7284 6057 13341
it 7178 5917 13095
shall 3293 9764 13057
Time taken: 67.711 seconds
Comments
Q: How much RAM?
A: 1024M. My machine only has 4G, but it ran fine, and I was able to run three VMs at 1024 without noticeable lag.
Comment: Will try to replicate your steps, but I plan on doing a couple of things differently. Instead of modifying the sudoers file as proposed, I will create a separate group and a dedicated Hadoop user, generate a passwordless RSA keypair (as you also do), add it to .ssh/authorized_keys, and do an SSH login to localhost so that the server is a known host.
Reply: Thanks for taking a look! The sudoers technique is probably overly permissive, but since it is running on a VM on my local machine, I thought it was OK. Your alterations are likely better for a production-like setup or a multi-node cluster.
Comment: I am getting an error while formatting the namenode:
Re-format filesystem in /mnt/data/hivedata/hdfs/name ? (Y or N) y
Format aborted in /mnt/data/hivedata/hdfs/name
(The re-format prompt is case-sensitive: a lowercase "y" aborts, so answer with an uppercase "Y".)
Comment: Thank you for the wonderful tutorial; it all pretty much worked for me, right to the end. Thanks!