Hive Installation

Hive

Apache Hive is a data warehouse infrastructure built on top of Hadoop that provides data summarization, query, and analysis. It supports analysis of large datasets stored in Hadoop's HDFS and in compatible file systems such as the Amazon S3 filesystem. Hive provides an SQL-like language called HiveQL with schema on read, and transparently converts queries to MapReduce, Apache Tez, or Spark jobs. All three execution engines can run on Hadoop YARN. To accelerate queries, Hive provides indexes, including bitmap indexes.

Hive
Java
Hadoop
Downloading Hive
Installing Hive
Derby
Metastore

Java

Java must be installed on your Linux system. To verify that it is, run:

java -version

This should return something like:

openjdk version "1.8.0_77"
OpenJDK Runtime Environment (build 1.8.0_77-b03)
OpenJDK 64-Bit Server VM (build 25.77-b03, mixed mode)
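The version check above can also be scripted, which is handy when several machines need the same prerequisite. The following is a minimal sketch, not part of the original walkthrough: the helper name java_major is made up for illustration, and it assumes the version string looks like the samples shown ("1.8.0_77" for the legacy scheme, "11.0.2" for newer releases).

```shell
#!/bin/sh
# java_major is a hypothetical helper: it extracts the major Java version
# from a version string such as "1.8.0_77" (major 8) or "11.0.2" (major 11).
java_major() {
  v="$1"
  case "$v" in
    1.*) v="${v#1.}" ;;   # drop the legacy "1." prefix
  esac
  echo "${v%%[._-]*}"     # keep everything before the first '.', '_' or '-'
}

# java prints its version banner to stderr, hence the 2>&1.
if command -v java >/dev/null 2>&1; then
  raw="$(java -version 2>&1 | head -n 1 | sed -E 's/.*"([^"]+)".*/\1/')"
  echo "Java major version: $(java_major "$raw")"
else
  echo "java not found on PATH"
fi
```

The function only parses text, so it can be tried against any version string without touching the installed JVM.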

This output confirms that Java is installed. Next, we need to know where the Java installation is located. To find it, use:

whereis java

This should return something like:

java: /usr/bin/java /usr/lib/java /etc/java /usr/share/java /usr/share/man/man1/java.1.gz

Note: whereis locates the binary, source, and manual page files for the specified names. The supplied names are first stripped of leading pathname components and any (single) trailing extension of the form ".ext", for example, ".c". Prefixes of "s." resulting from use of source code control are also handled. whereis then attempts to locate the desired program in a list of standard Linux locations.

Open your .bashrc file and make sure it contains the following lines, so that Java is configured correctly:

export JAVA_HOME="/usr/lib/java"
export PATH="$PATH:$JAVA_HOME/bin"
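Because whereis can list several candidate paths, it may help to resolve the java binary that is actually on the PATH and derive a JAVA_HOME candidate from it. This is a sketch under assumptions: java_home_from_binary is a hypothetical helper, readlink -f is the GNU coreutils form, and the result is only a suggestion to compare against the value in .bashrc.

```shell
#!/bin/sh
# java_home_from_binary is a hypothetical helper: given the resolved path of
# the java executable, it strips the trailing /bin/java component.
java_home_from_binary() {
  echo "${1%/bin/java}"
}

if command -v java >/dev/null 2>&1; then
  # Follow symbolic links (for example those managed by alternatives).
  real="$(readlink -f "$(command -v java)")"
  echo "Suggested JAVA_HOME: $(java_home_from_binary "$real")"
fi
```

If the suggested directory differs from the JAVA_HOME exported above, the symbolic links checked in the next step are a likely place to look.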

Check the alternatives symbolic links to ensure they are established correctly.

alternatives --display java
alternatives --display javac
alternatives --display jar

If the links are missing or point to the wrong installation, create and select them as root (the # prompt). The general syntax follows each group of commands.

# alternatives --install /usr/bin/java java /usr/local/java/bin/java 2
# alternatives --install /usr/bin/javac javac /usr/local/java/bin/javac 2
# alternatives --install /usr/bin/jar jar /usr/local/java/bin/jar 2
alternatives [options] --install link name path priority

# alternatives --set java /usr/local/java/bin/java
# alternatives --set javac /usr/local/java/bin/javac
# alternatives --set jar /usr/local/java/bin/jar
alternatives [options] --set name path

alternatives [options] --display name

Hadoop

Hadoop must be installed on your system before installing Hive. To verify that it is installed, use:

hadoop version

This should return something like:

Hadoop 2.2.0.2.1.0.0-92
Subversion git@github.com:hortonworks/hadoop.git -r e544b3bc07472c6ca2c56a59c36e2536370eafdf
Compiled by jenkins on 2013-12-13T03:04Z
Compiled with protoc 2.5.0
From source with checksum 1c6398d7af826956aa7866b088eba7d
This command was run using /usr/lib/hadoop/hadoop-common-2.2.0.2.1.0.0-92.jar

It appears that Hadoop is installed and configured correctly.
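The Hadoop check can be scripted in the same spirit. The sketch below assumes the first line of the output has the form "Hadoop <version>", as in the sample above; hadoop_version_from_output is a hypothetical helper name.

```shell
#!/bin/sh
# hadoop_version_from_output is a hypothetical helper: it reads `hadoop version`
# output on stdin and prints the version token from the first line,
# e.g. "Hadoop 2.2.0.2.1.0.0-92" -> "2.2.0.2.1.0.0-92".
hadoop_version_from_output() {
  awk 'NR == 1 && $1 == "Hadoop" { print $2 }'
}

if command -v hadoop >/dev/null 2>&1; then
  hadoop version 2>/dev/null | hadoop_version_from_output
else
  echo "hadoop not found on PATH" >&2
fi
```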

Downloading Hive

Go to your Downloads directory.

cd ~/Downloads

You can get the tarball in one of two ways. The first is to use the wget command:

wget http://www-us.apache.org/dist/hive/stable/apache-hive-1.2.1-bin.tar.gz

This will download the tar file to your /home/hduser/Downloads directory.

Hive can also be downloaded from http://www-us.apache.org/dist/hive/stable/. For this example, click on apache-hive-1.2.1-bin.tar.gz to initiate a download. In most cases, the file will be located in the Downloads directory. To verify, use the following command:

cd ~/Downloads
ls -la

This should return something like:

total 90664
drwxr-xr-x.  2 hduser hduser       41 Apr 18 09:05 .
drwx------. 14 hduser hduser     4096 Apr 18 07:59 ..
-rw-rw-r--.  1 hduser hduser 92834839 Apr 18 09:03 apache-hive-1.2.1-bin.tar.gz
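Seeing the file in the listing does not prove the download is intact. A common extra step is to compare the file's digest with the one published alongside the release. The sketch below assumes a SHA-256 digest is available; some older releases publish MD5 or SHA-1 instead, in which case md5sum or sha1sum would take the place of sha256sum. The helper name sha256_matches and the EXPECTED placeholder are illustrative only.

```shell
#!/bin/sh
# sha256_matches is a hypothetical helper: it succeeds if the SHA-256 digest
# of file $1 equals the expected hex digest $2 (compared case-insensitively).
sha256_matches() {
  actual="$(sha256sum "$1" | awk '{ print $1 }')"
  expected="$(printf '%s' "$2" | tr 'A-F' 'a-f')"
  [ "$actual" = "$expected" ]
}

# Hypothetical usage; EXPECTED must be copied from the official download page:
# EXPECTED="<published digest>"
# sha256_matches ~/Downloads/apache-hive-1.2.1-bin.tar.gz "$EXPECTED" && echo "checksum OK"
```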

Installing Hive

It is time to install Hive on our system. Use the following commands to extract the Hive archive and confirm that the extracted directory exists:

cd ~/Downloads
tar -zxvf apache-hive-1.2.1-bin.tar.gz
ls -la

This should return something like:

total 90668
drwxr-xr-x.  3 hduser hduser       69 Apr 18 09:10 .
drwx------. 14 hduser hduser     4096 Apr 18 07:59 ..
drwxrwxr-x.  8 hduser hduser     4096 Apr 18 09:10 apache-hive-1.2.1-bin
-rw-rw-r--.  1 hduser hduser 92834839 Apr 18 09:03 apache-hive-1.2.1-bin.tar.gz
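Before extracting an archive, it can be useful to see which top-level entries it will create, so nothing unexpected lands in the current directory. This is a small sketch with a hypothetical helper, tar_top_level; the archive path is the one used in this walkthrough.

```shell
#!/bin/sh
# tar_top_level is a hypothetical helper: it lists the distinct top-level
# entries of a gzip-compressed tar archive without extracting it.
tar_top_level() {
  tar -tzf "$1" | cut -d/ -f1 | sort -u
}

archive="$HOME/Downloads/apache-hive-1.2.1-bin.tar.gz"
if [ -f "$archive" ]; then
  tar_top_level "$archive"
fi
```

For the archive above, the listing should contain a single entry matching the directory seen in the ls output.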

Move the Hive files to the /usr/local/hive directory.

sudo mv /home/hduser/Downloads/apache-hive-1.2.1-bin /usr/local/hive
ls -la /usr/local/hive

This should return something like:

total 476
drwxrwxr-x.  8 hduser hduser   4096 Apr 18 09:10 .
drwxr-xr-x. 13 root   root     4096 Apr 18 09:23 ..
drwxrwxr-x.  3 hduser hduser   4096 Apr 18 09:10 bin
drwxrwxr-x.  2 hduser hduser   4096 Apr 18 09:10 conf
drwxrwxr-x.  4 hduser hduser     32 Apr 18 09:10 examples
drwxrwxr-x.  7 hduser hduser     63 Apr 18 09:10 hcatalog
drwxrwxr-x.  4 hduser hduser   4096 Apr 18 09:10 lib
-rw-rw-r--.  1 hduser hduser  24754 Apr 29  2015 LICENSE
-rw-rw-r--.  1 hduser hduser    397 Jun 19  2015 NOTICE
-rw-rw-r--.  1 hduser hduser   4366 Jun 19  2015 README.txt
-rw-rw-r--.  1 hduser hduser 421129 Jun 19  2015 RELEASE_NOTES.txt
drwxrwxr-x.  3 hduser hduser     22 Apr 18 09:10 scripts

Some modifications are required in the .bashrc file.

vi ~/.bashrc

Add the following lines to the file:

# Hive Home
export HIVE_HOME=/usr/local/hive
export PATH=$PATH:$HIVE_HOME/bin

Use the following command to reload the ~/.bashrc file in the current shell so the new variables take effect:

source ~/.bashrc
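When these steps are repeated, editing .bashrc by hand can leave duplicate export lines. A minimal sketch of an idempotent alternative follows; append_once is a hypothetical helper, not a standard command, and the usage lines are commented out so that nothing is modified unless you opt in.

```shell
#!/bin/sh
# append_once is a hypothetical helper: it appends a line to a file only if
# that exact line is not already present.
append_once() {
  file="$1"
  line="$2"
  touch "$file"
  # -x matches whole lines, -F treats the pattern as a fixed string.
  grep -qxF -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
}

# Hypothetical usage against the real file:
# append_once ~/.bashrc 'export HIVE_HOME=/usr/local/hive'
# append_once ~/.bashrc 'export PATH=$PATH:$HIVE_HOME/bin'
```

The single quotes in the usage lines keep $PATH and $HIVE_HOME unexpanded, so they are written literally and evaluated each time the shell starts.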

Create a temporary Hive directory by using the command:

mkdir /tmp/hive

Ensure that Hadoop is up and running. This can be done by using the command:

jps

If the Hadoop daemons do not appear in the jps output, Hadoop is not running and must be started. On my system this is done with a custom script:

./hadoopstartup.sh
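Reading the jps output by eye works, but the check can also be automated. The sketch below assumes a single-node Hadoop 2 (YARN) setup where the NameNode, DataNode, ResourceManager, and NodeManager daemons are expected; adjust the list for your cluster. missing_daemons is a hypothetical helper.

```shell
#!/bin/sh
# missing_daemons is a hypothetical helper: it reads `jps` output on stdin
# ("<pid> <name>" per line) and prints each expected daemon that is absent.
missing_daemons() {
  output="$(cat)"
  for name in "$@"; do
    printf '%s\n' "$output" \
      | awk -v n="$name" '$2 == n { found = 1 } END { exit !found }' \
      || echo "$name"
  done
}

if command -v jps >/dev/null 2>&1; then
  # Prints nothing when every expected daemon is running.
  jps | missing_daemons NameNode DataNode ResourceManager NodeManager
fi
```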

Start Hive by using the command:

hive

This should return something like:

hive>

To view the Hive properties, use the following command:

hive> set -v;

This should return something like: