Wednesday, August 15, 2012

Hadoop 1.0.3 and Hive on Ubuntu 12.04


Download VMware Workstation

Download Ubuntu Desktop 12.04 AMD64

Create the VM image with a 400 GB disk, 2 processors, and a “bridged” network
Launch a Terminal via Dash Home
$ sudo apt-get update

Wait for Update Manager to auto-launch; Click “Install Updates”
Reboot

Launch Terminal
$ sudo nano /etc/sudoers

Copy from Guest is Ctrl-Shift-C
Paste to Guest is Ctrl-Shift-V
Append to the end of /etc/sudoers (myusername is the account created during install)
myusername ALL=(ALL) NOPASSWD: ALL

Set up passwordless SSH
$ ssh-keygen -t rsa -P ""
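
For a single-node setup, the generated public key also needs to be authorized for localhost so the Hadoop start scripts can log in without a password. A minimal sketch, assuming the default key location:

$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
$ ssh localhost
$ exit

Answer “yes” at the host-key prompt the first time; after that the login should not ask for a password.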

Install Oracle Java
$ sudo add-apt-repository ppa:webupd8team/java
$ sudo apt-get update
$ sudo apt-get install oracle-java7-installer
$ java -version

Install Hadoop

$ wget -c http://mirror.metrocast.net/apache/hadoop/common/hadoop-1.0.3/hadoop-1.0.3-bin.tar.gz
$ tar -zxvf hadoop-1.0.3-bin.tar.gz
$ nano .bashrc

Append to .bashrc
export HADOOP_HOME=/home/myusername/hadoop-1.0.3

Close the Terminal and launch a new one to pick up the new environment variable
$ exit
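
Alternatively, re-read the file in place rather than opening a new terminal:

$ source ~/.bashrc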

Set JAVA_HOME in hadoop env
$ cd hadoop-1.0.3/conf
$ nano hadoop-env.sh

Add this line next to the commented-out JAVA_HOME entry
export JAVA_HOME=/usr/lib/jvm/java-7-oracle

Create hdfs target directories
$ mkdir ~/hdfs
$ mkdir ~/hdfs/name
$ mkdir ~/hdfs/data
$ mkdir ~/hdfs/tmp
$ sudo chmod -R 755 ~/hdfs/

Modify the config files as described in: http://cloudfront.blogspot.com/2012/07/how-to-configure-hadoop.html
$ sudo nano ~/hadoop-1.0.3/conf/core-site.xml 
$ sudo nano ~/hadoop-1.0.3/conf/hdfs-site.xml
$ sudo nano ~/hadoop-1.0.3/conf/mapred-site.xml
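
For reference, a minimal pseudo-distributed configuration that matches the directories created above looks roughly like the following sketch (the linked post is the authoritative walkthrough; these are the Hadoop 1.x property names, and the paths assume the home directory used earlier):

core-site.xml:
<configuration>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/home/myusername/hdfs/tmp</value>
  </property>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>

hdfs-site.xml:
<configuration>
  <property>
    <name>dfs.name.dir</name>
    <value>/home/myusername/hdfs/name</value>
  </property>
  <property>
    <name>dfs.data.dir</name>
    <value>/home/myusername/hdfs/data</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>

mapred-site.xml:
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:9001</value>
  </property>
</configuration>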

Format the namenode and start hadoop services
$ ~/hadoop-1.0.3/bin/hadoop namenode -format
$ ~/hadoop-1.0.3/bin/start-all.sh

Confirm services are started
$ jps
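
If everything came up, all five Hadoop daemons should appear in the listing along with Jps itself (process IDs omitted here; yours will differ):

NameNode
DataNode
SecondaryNameNode
JobTracker
TaskTracker
Jps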

Hadoop (NameNode) status: http://localhost:50070/

MapReduce (JobTracker) status: http://localhost:50030/

Install Hive

$ wget -c http://apache.claz.org/hive/hive-0.9.0/hive-0.9.0-bin.tar.gz
$ tar -xzvf hive-0.9.0-bin.tar.gz

Add these lines to ~/.bashrc and restart your terminal
export HADOOP_HOME=/home/myusername/hadoop-1.0.3
export HIVE_HOME=/home/myusername/hive-0.9.0-bin
export PATH=$HIVE_HOME/bin:$PATH
export PATH=$HADOOP_HOME/bin:$PATH
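
With the new terminal open, both installs should now resolve from the PATH:

$ hadoop version
$ which hive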

Create hive directories within hdfs and set permissions for table creation
$ hadoop fs -mkdir       /user/hive/warehouse
$ hadoop fs -mkdir       /tmp
$ hadoop fs -chmod g+w   /user/hive/warehouse
$ hadoop fs -chmod g+w   /tmp
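
A quick listing confirms the warehouse directory was created:

$ hadoop fs -ls /user/hive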

Launch hive and create sample tables
$ hive
hive> CREATE TABLE shakespeare (freq INT, word STRING) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' STORED AS TEXTFILE;
hive> CREATE TABLE kjv (freq INT, word STRING) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' STORED AS TEXTFILE;
hive> exit;
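
To double-check the definitions before exiting, SHOW TABLES and DESCRIBE work at the same prompt:

hive> SHOW TABLES;
hive> DESCRIBE shakespeare;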

Download sample data from Cloudera
$ wget -O shakespeare.tar.gz https://github.com/cloudera/cloudera-training/blob/master/data/shakespeare.tar.gz?raw=true
$ wget -O bible.tar.gz https://github.com/cloudera/cloudera-training/blob/master/data/bible.tar.gz?raw=true
$ tar -zvxf bible.tar.gz
$ tar -zvxf shakespeare.tar.gz
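
If the archives were downloaded and unpacked in the home directory, they produce ~/input and ~/bible, which is where the -put commands below look for the data:

$ ls ~/input ~/bible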

Put the Shakespeare sample data into hdfs
$ hadoop fs -mkdir shakespeare-input
$ hadoop fs -put ~/input/all-shakespeare /user/myusername/shakespeare-input
$ hadoop fs -ls shakespeare-input

Run the “grep” sample against the hdfs directory “shakespeare-input” and place results in “shakespeare_freq”
$ hadoop jar ~/hadoop-1.0.3/hadoop-examples-1.0.3.jar grep shakespeare-input shakespeare_freq '\w+'
$ hadoop fs -ls shakespeare_freq
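
Each line of the grep output is a count, a tab, and the matching word, which is why the Hive tables above declare freq before word. You can peek at the first few lines with something like this (the part file name assumes the default single reducer):

$ hadoop fs -cat shakespeare_freq/part-00000 | head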

Put the bible sample data into hdfs
$ hadoop fs -mkdir bible-input
$ hadoop fs -put ~/bible/all-bible /user/myusername/bible-input
$ hadoop fs -ls bible-input

Run the “grep” sample against the hdfs directory “bible-input” and place results in “bible_freq”
$ hadoop jar ~/hadoop-1.0.3/hadoop-examples-1.0.3.jar grep bible-input bible_freq '\w+'
$ hadoop fs -ls bible_freq

Clean up the job logs so they are not loaded into the Hive tables along with the data
$ hadoop fs -rmr bible_freq/_logs
$ hadoop fs -rmr shakespeare_freq/_logs

Open Hive
$ hive
hive> load data inpath "shakespeare_freq" into table shakespeare;
hive> select * from shakespeare limit 10;
hive> select * from shakespeare where freq > 20 sort by freq asc limit 10;
hive> select freq, count(1) as f2 from shakespeare group by freq sort by f2 desc limit 10;
hive> explain select freq, count(1) as f2 from shakespeare group by freq sort by f2 desc limit 10;
hive> load data inpath "bible_freq" into table kjv;
hive> create table merged (word string, shake_f int, kjv_f int);
hive> insert overwrite table merged select s.word, s.freq, k.freq from shakespeare s join kjv k on (s.word = k.word) where s.freq >= 1 and k.freq >= 1;
hive> select * from merged limit 20;
hive> select word, shake_f, kjv_f, (shake_f + kjv_f) as ss from merged sort by ss desc limit 20;

Now you know; and knowing is half the battle
Total MapReduce CPU Time Spent: 6 seconds 140 msec
OK
the   25848 62394 88242
and   19671 38985 58656
of    16700 34654 51354
I     23031 8854  31885
to    18038 13526 31564
in    10797 12445 23242
a     14170 8057  22227
that  8869  12603 21472
And   7800  12846 20646
is    8882  6884  15766
my    11297 4135  15432
you   12702 2720  15422
he    5720  9672  15392
his   6817  8385  15202
not   8409  6591  15000
be    6773  6913  13686
for   6309  7270  13579
with  7284  6057  13341
it    7178  5917  13095
shall 3293  9764  13057
Time taken: 67.711 seconds

Comments:

  1. 1024M... My machine only has 4G but it ran fine and I was able to run three VMs at 1024 without noticeable lag.

  2. Will try to replicate your steps, but I plan on doing a couple of things differently. Instead of modifying the sudoers file as you have proposed, I will create a separate group and dedicated Hadoop user, generate a passwordless RSA keypair (as you also do), add it to the .ssh/authorized_keys, and do an SSH login to localhost so that the server is a known host.

  3. Thanks for taking a look! The "sudoers" technique is probably overly permissive, but since it is running on a VM on my local machine, I thought it was OK. Your alterations are likely better for a production-like setup or a multi-node cluster.

  4. I am getting an error while formatting the namenode:

    Re-format filesystem in /mnt/data/hivedata/hdfs/name ? (Y or N) y
    Format aborted in /mnt/data/hivedata/hdfs/name

  5. Thank you for the wonderful tutorial; it all pretty much worked for me through to the end. Thanks.
