Hive Installation
Hive
Apache Hive is a data warehouse infrastructure built on top of Hadoop that provides data summarization, query, and analysis. It supports analysis of large datasets stored in Hadoop's HDFS and in compatible file systems such as Amazon S3. It provides an SQL-like language called HiveQL with schema on read, and it transparently converts queries to MapReduce, Apache Tez, and Spark jobs. All three execution engines can run on Hadoop YARN. To accelerate queries, it provides indexes, including bitmap indexes.
Java
Java must be installed on your Linux system. To verify that it is installed, use the following command:
java -version
This should return something like:
openjdk version "1.8.0_77"
OpenJDK Runtime Environment (build 1.8.0_77-b03)
OpenJDK 64-Bit Server VM (build 25.77-b03, mixed mode)
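A script can make the same check without anyone reading the output. The sketch below assumes the version string has the quoted form shown above; java_major is a hypothetical helper name, not part of any Java tooling. Note that java -version writes to standard error, so the output is redirected before it is parsed.

```shell
# Hypothetical helper: reduce a Java version string to its major version.
# Legacy strings such as 1.8.0_77 carry the major version in the second field;
# newer strings such as 11.0.2 carry it in the first.
java_major() {
    case "$1" in
        1.*) printf '%s\n' "$1" | cut -d. -f2 ;;
        *)   printf '%s\n' "$1" | cut -d. -f1 ;;
    esac
}

if command -v java >/dev/null 2>&1; then
    # java -version prints to stderr, so merge it into stdout first.
    version="$(java -version 2>&1 | awk -F'"' '/version/ {print $2; exit}')"
    echo "Java major version: $(java_major "$version")"
else
    echo "java was not found on the PATH" >&2
fi
```

For the output shown above, the helper would be called as java_major 1.8.0_77.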
Java is installed. Next, we need to know where the Java installation is located. To find it, use:
whereis java
This should return something like:
java: /usr/bin/java /usr/lib/java /etc/java /usr/share/java /usr/share/man/man1/java.1.gz
Note: whereis locates the source, binary, and manual sections for the specified files. The supplied names are first stripped of leading pathname components and any (single) trailing extension of the form ".ext", for example, ".c". Prefixes of "s." resulting from the use of source code control are also handled. whereis then attempts to locate the desired program in a list of standard Linux places.
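Because /usr/bin/java is usually a symbolic link, the whereis output alone does not show where the JDK actually lives. A minimal sketch for following the link, assuming GNU readlink is available (as on most Linux systems); java_home_from_binary is a hypothetical helper name:

```shell
# Hypothetical helper: follow symbolic links from a java binary and
# print the directory two levels up (the part before /bin/java).
java_home_from_binary() {
    real_path="$(readlink -f "$1")" || return 1
    dirname "$(dirname "$real_path")"
}

if command -v java >/dev/null 2>&1; then
    echo "JAVA_HOME candidate: $(java_home_from_binary "$(command -v java)")"
else
    echo "java was not found on the PATH" >&2
fi
```

The printed directory is only a candidate; whether it is the right value for JAVA_HOME depends on how Java was packaged on the system.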
Open your .bashrc file to ensure that Java is configured correctly.
export JAVA_HOME="/usr/lib/java"
export PATH="$PATH:$JAVA_HOME/bin"
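It is worth confirming that the directory named by JAVA_HOME really contains a Java launcher. The sketch below is one way to do that; check_java_home is a hypothetical helper, and /usr/lib/java is simply the value used in this article.

```shell
# Hypothetical helper: succeed only if DIR/bin/java exists and is executable.
check_java_home() {
    [ -n "$1" ] && [ -x "$1/bin/java" ]
}

# Fall back to the path used in this article when JAVA_HOME is unset.
candidate="${JAVA_HOME:-/usr/lib/java}"
if check_java_home "$candidate"; then
    echo "JAVA_HOME looks valid: $candidate"
else
    echo "No executable found at $candidate/bin/java" >&2
fi
```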
Check the alternatives symbolic links to ensure they are established correctly.
alternatives --display java
alternatives --display javac
alternatives --display jar
If a link is missing or points to the wrong installation, install and select the correct one. The commands below are shown with the root prompt (#):
# alternatives --install /usr/bin/java java /usr/local/java/bin/java 2
# alternatives --install /usr/bin/javac javac /usr/local/java/bin/javac 2
# alternatives --install /usr/bin/jar jar /usr/local/java/bin/jar 2
# alternatives --set java /usr/local/java/bin/java
# alternatives --set javac /usr/local/java/bin/javac
# alternatives --set jar /usr/local/java/bin/jar
The general syntax of these commands is:
alternatives [options] --install link name path priority
alternatives [options] --set name path
alternatives [options] --display name
Hadoop
Hadoop must be installed on your system before installing Hive. To verify that it is installed, use:
hadoop version
This should return something like:
Hadoop 2.2.0.2.1.0.0-92
Subversion [email protected]:hortonworks/hadoop.git -r e544b3bc07472c6ca2c56a59c36e2536370eafdf
Compiled by jenkins on 2013-12-13T03:04Z
Compiled with protoc 2.5.0
From source with checksum 1c6398d7af826956aa7866b088eba7d
This command was run using /usr/lib/hadoop/hadoop-common-2.2.0.2.1.0.0-92.jar
It appears that Hadoop is installed and configured correctly.
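The same check can be scripted. The sketch below assumes the first line of the output has the form "Hadoop <version>", as in the sample above; hadoop_version_from_output is a hypothetical helper name.

```shell
# Hypothetical helper: read `hadoop version` output on stdin and print the
# version from the first line. Fails if the first line does not start
# with the word "Hadoop".
hadoop_version_from_output() {
    awk 'NR == 1 && $1 == "Hadoop" { print $2; found = 1 } END { exit !found }'
}

if command -v hadoop >/dev/null 2>&1; then
    hadoop version 2>/dev/null | hadoop_version_from_output
else
    echo "hadoop was not found on the PATH" >&2
fi
```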
Downloading Hive
Go to your Downloads directory.
cd ~/Downloads
You can get the tar file in one of two ways. The first is to use the wget command:
wget http://www-us.apache.org/dist/hive/stable/apache-hive-1.2.1-bin.tar.gz
This will download the tar file to your /home/hduser/Downloads directory.
The second is to download Hive with a browser from http://www-us.apache.org/dist/hive/stable/. For this example, click on apache-hive-1.2.1-bin.tar.gz to initiate a download. In most cases, the file will be saved in the Downloads directory. To verify, use the following commands:
cd ~/Downloads
ls -la
This should return something like:
total 90664
drwxr-xr-x.  2 hduser hduser       41 Apr 18 09:05 .
drwx------. 14 hduser hduser     4096 Apr 18 07:59 ..
-rw-rw-r--.  1 hduser hduser 92834839 Apr 18 09:03 apache-hive-1.2.1-bin.tar.gz
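A matching file name and size do not prove that the download is intact. Apache release archives are normally accompanied by checksum files, and a check like the sketch below compares a local file against a published value. It assumes a SHA-256 digest is available for the release (older releases may publish a different algorithm). verify_sha256 and the EXPECTED_SHA256 variable are hypothetical names, and the expected digest must be supplied by the reader; none is given here.

```shell
# Hypothetical helper: succeed only if FILE's SHA-256 digest equals EXPECTED.
verify_sha256() {
    actual="$(sha256sum "$1" 2>/dev/null | awk '{print $1}')"
    [ -n "$actual" ] && [ "$actual" = "$2" ]
}

archive="$HOME/Downloads/apache-hive-1.2.1-bin.tar.gz"
expected="${EXPECTED_SHA256:-}"   # assumption: supplied by the reader
if [ -f "$archive" ] && [ -n "$expected" ]; then
    if verify_sha256 "$archive" "$expected"; then
        echo "Checksum matches"
    else
        echo "Checksum mismatch" >&2
    fi
fi
```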
Installing Hive
It is time to install Hive on our system. Use the following commands to extract the Hive archive and list the result:
cd ~/Downloads
tar -zxvf apache-hive-1.2.1-bin.tar.gz
ls -la
This should return something like:
total 90668
drwxr-xr-x.  3 hduser hduser       69 Apr 18 09:10 .
drwx------. 14 hduser hduser     4096 Apr 18 07:59 ..
drwxrwxr-x.  8 hduser hduser     4096 Apr 18 09:10 apache-hive-1.2.1-bin
-rw-rw-r--.  1 hduser hduser 92834839 Apr 18 09:03 apache-hive-1.2.1-bin.tar.gz
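The archive can also be inspected without unpacking it, which shows the top-level directory that extraction creates. The sketch below lists the archive members and reduces them to their first path component; archive_top_dirs is a hypothetical helper name.

```shell
# Hypothetical helper: print the distinct top-level entries of a .tar.gz archive.
archive_top_dirs() {
    tar -tzf "$1" | sed -e 's|^\./||' -e '/^$/d' | cut -d/ -f1 | sort -u
}

archive="$HOME/Downloads/apache-hive-1.2.1-bin.tar.gz"
if [ -f "$archive" ]; then
    archive_top_dirs "$archive"
fi
```

A single line of output means the archive unpacks into one directory rather than scattering files into the current one.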
Move the Hive files to the /usr/local/hive directory.
sudo mv /home/hduser/Downloads/apache-hive-1.2.1-bin /usr/local/hive
ls -la /usr/local/hive
This should return something like:
total 476
drwxrwxr-x.  8 hduser hduser   4096 Apr 18 09:10 .
drwxr-xr-x. 13 root   root     4096 Apr 18 09:23 ..
drwxrwxr-x.  3 hduser hduser   4096 Apr 18 09:10 bin
drwxrwxr-x.  2 hduser hduser   4096 Apr 18 09:10 conf
drwxrwxr-x.  4 hduser hduser     32 Apr 18 09:10 examples
drwxrwxr-x.  7 hduser hduser     63 Apr 18 09:10 hcatalog
drwxrwxr-x.  4 hduser hduser   4096 Apr 18 09:10 lib
-rw-rw-r--.  1 hduser hduser  24754 Apr 29  2015 LICENSE
-rw-rw-r--.  1 hduser hduser    397 Jun 19  2015 NOTICE
-rw-rw-r--.  1 hduser hduser   4366 Jun 19  2015 README.txt
-rw-rw-r--.  1 hduser hduser 421129 Jun 19  2015 RELEASE_NOTES.txt
drwxrwxr-x.  3 hduser hduser     22 Apr 18 09:10 scripts
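A quick scripted check can confirm that the move produced the expected layout. The sketch below looks only for the bin, conf, and lib directories that appear in the listing; missing_hive_dirs is a hypothetical helper name.

```shell
# Hypothetical helper: print each expected subdirectory that is missing
# from DIR, and fail if any is missing.
missing_hive_dirs() {
    status=0
    for sub in bin conf lib; do
        if [ ! -d "$1/$sub" ]; then
            echo "$sub"
            status=1
        fi
    done
    return "$status"
}

if missing_hive_dirs /usr/local/hive >/dev/null; then
    echo "Hive layout looks complete"
else
    echo "Hive layout is incomplete or /usr/local/hive does not exist" >&2
fi
```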
Some modifications are required in the .bashrc file.
vi ~/.bashrc
Add the following lines to the file:
# Hive Home
export HIVE_HOME=/usr/local/hive
export PATH=$PATH:$HIVE_HOME/bin
Use the following command to reload the ~/.bashrc file in the current shell.
source ~/.bashrc
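To confirm that the change took effect, check that the Hive bin directory is now an entry in PATH. A minimal sketch follows; path_contains is a hypothetical helper, and /usr/local/hive is the fallback only because it is the location used in this article.

```shell
# Hypothetical helper: succeed if DIR is one of the entries in PATH.
# Wrapping both sides in colons avoids matching a mere prefix of an entry.
path_contains() {
    case ":$PATH:" in
        *":$1:"*) return 0 ;;
        *)        return 1 ;;
    esac
}

hive_bin="${HIVE_HOME:-/usr/local/hive}/bin"
if path_contains "$hive_bin"; then
    echo "$hive_bin is on the PATH"
else
    echo "$hive_bin is not on the PATH; re-check ~/.bashrc" >&2
fi
```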
Create a temporary Hive directory by using the command:
mkdir /tmp/hive
Ensure that Hadoop is up and running. This can be done by using the command:
jps
If Hadoop is not running, start it. On my system, this is:
./hadoopstartup.sh
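Reading the jps listing by eye works, but a script can flag which daemons are absent. The sketch below assumes a single-node Hadoop 2 setup in which NameNode, DataNode, ResourceManager, and NodeManager should all be running; adjust the list for other layouts. missing_daemons is a hypothetical helper name.

```shell
# Hypothetical helper: read jps output ("<pid> <name>" per line) on stdin
# and print each expected daemon that is not listed.
# Assumption: a single-node Hadoop 2 setup running these four daemons.
missing_daemons() {
    awk '
        { seen[$2] = 1 }
        END {
            n = split("NameNode DataNode ResourceManager NodeManager", want, " ")
            for (i = 1; i <= n; i++)
                if (!(want[i] in seen)) print want[i]
        }
    '
}

if command -v jps >/dev/null 2>&1; then
    jps | missing_daemons
else
    echo "jps was not found on the PATH" >&2
fi
```

Names are compared exactly, so a running SecondaryNameNode is not mistaken for the NameNode. No output means every expected daemon was found.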
Start Hive by using the command:
hive
This should return something like:
hive>
To view the Hive properties, use the following command:
hive> set -v;
This should return something like: