reosoftproductions.com
RODNEY AND ARLYN'S WEB SITE
Pig

Pig Installation

These instructions are a modified form of the Pig instructions found on the Apache website.

Mandatory Components

Unix and Windows users need the following components installed and configured on the server. Hadoop and Java are required; Python and Ant are optional.

Hadoop (Required)

Hadoop 0.23.X, 1.X or 2.X - Pig can run with different versions of Hadoop if HADOOP_HOME points to the directory where Hadoop is installed. If HADOOP_HOME is not set, Pig runs with the embedded version, which is currently Hadoop 1.0.4.

Java (Required)

Java 1.7 - Set JAVA_HOME to the root of your Java installation.

Python (Optional)

Python 2.7 - Required only for streaming Python UDFs.

Ant (Optional)

Ant 1.8 - For builds.
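
Before downloading Pig, it can help to confirm that the environment variables mentioned above are in place. The following is a minimal sketch, not part of the Apache instructions; the function name check_home_var is hypothetical, and it only checks that each variable is set and points at an existing directory.

```shell
# Report whether an environment variable is set and points at a directory.
# check_home_var is a hypothetical helper, not part of Pig or Hadoop.
check_home_var() {
    var_name="$1"
    # Read the value of the variable whose name was passed in.
    eval "var_value=\${$var_name:-}"
    if [ -z "$var_value" ]; then
        echo "$var_name is not set"
        return 1
    fi
    if [ ! -d "$var_value" ]; then
        echo "$var_name points to a missing directory: $var_value"
        return 1
    fi
    echo "$var_name=$var_value"
}

# JAVA_HOME is required; HADOOP_HOME may be unset if the embedded Hadoop is acceptable.
check_home_var JAVA_HOME || true
check_home_var HADOOP_HOME || true
```

A "not set" result for HADOOP_HOME is not necessarily a problem, since Pig falls back to its embedded Hadoop version in that case.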

Download Pig

Using your browser, go to the Apache Pig distribution website. This can be found at http://www-us.apache.org/dist/pig/. Locate a file called pig-0.nn.nn.tar.gz.

Use the wget command to download the file. The following commands download version 0.15.0:

cd ~/Downloads
wget http://www-us.apache.org/dist/pig/pig-0.15.0/pig-0.15.0.tar.gz

This downloads the tar file to the Downloads directory of the current user (/home/hduser/Downloads in this example). Verify the download by listing the directory:

ls -la

This should return something like:

total 218628
drwxr-xr-x.  2 hduser hduser        41 Apr 18 09:05 .
drwx------. 14 hduser hduser      4096 Apr 18 07:59 ..
-rw-rw-r--.  1 hduser hduser 120917625 Jun  5  2015 pig-0.15.0.tar.gz

Extract the archive, then list the directory again:

tar -zxvf pig-0.15.0.tar.gz
ls -la

This should return something like:

total 218628
drwxr-xr-x.  2 hduser hduser        41 Apr 18 09:05 .
drwx------. 14 hduser hduser      4096 Apr 18 07:59 ..
drwxr-xr-x. 16 hduser hduser      4096 Jun  1  2015 pig-0.15.0
-rw-rw-r--.  1 hduser hduser 120917625 Jun  5  2015 pig-0.15.0.tar.gz
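
A directory listing only shows that a file exists, not that the archive is intact. As a hedged sketch, the extraction step can be guarded by first asking tar to read the whole archive listing. The function name safe_extract is hypothetical, and this is a readability check only, not a checksum or signature verification.

```shell
# Extract a .tar.gz only if tar can read its full listing first.
# safe_extract is a hypothetical helper name used for illustration.
safe_extract() {
    archive="$1"
    dest_dir="$2"
    # tar -tzf lists the archive and fails on a missing or truncated file.
    if ! tar -tzf "$archive" >/dev/null 2>&1; then
        echo "Archive is missing or unreadable: $archive" >&2
        return 1
    fi
    mkdir -p "$dest_dir"
    tar -xzf "$archive" -C "$dest_dir"
}

# Example (commented out so nothing is extracted by accident):
# safe_extract ~/Downloads/pig-0.15.0.tar.gz ~/Downloads
```

If the download was interrupted, the listing step fails and nothing is extracted, which avoids a partially unpacked pig-0.15.0 directory.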

Move the Pig files to the /usr/local/pig directory.

sudo mv /home/hduser/Downloads/pig-0.15.0 /usr/local/pig
ls -la /usr/local/pig

This should return something like:

total 8512
drwxr-xr-x. 16 hduser hduser    4096 Jun  1  2015 .
drwxr-xr-x. 14 root   root      4096 Apr 19 20:58 ..
drwxr-xr-x.  2 hduser hduser      43 Apr 19 20:48 bin
-rw-rw-r--.  1 hduser hduser   89447 Jun  1  2015 build.xml
-rw-rw-r--.  1 hduser hduser  190118 Jun  1  2015 CHANGES.txt
drwxr-xr-x.  2 hduser hduser      87 Apr 19 20:48 conf
drwxr-xr-x.  3 hduser hduser      40 Apr 19 20:48 contrib
drwxr-xr-x.  6 hduser hduser    4096 Apr 19 20:48 docs
drwxr-xr-x.  2 hduser hduser    4096 Apr 19 20:48 ivy
-rw-rw-r--.  1 hduser hduser   26978 Jun  1  2015 ivy.xml
drwxr-xr-x.  2 hduser hduser      82 Apr 19 20:48 legacy
drwxr-xr-x.  5 hduser hduser    4096 Apr 19 20:48 lib
drwxr-xr-x.  3 hduser hduser      18 Jun  1  2015 lib-src
drwxr-xr-x.  2 hduser hduser    4096 Apr 19 20:48 license
-rw-rw-r--.  1 hduser hduser   11358 Jun  1  2015 LICENSE.txt
-rw-rw-r--.  1 hduser hduser    2125 Jun  1  2015 NOTICE.txt
-rw-rw-r--.  1 hduser hduser 4021860 Jun  1  2015 pig-0.15.0-core-h1.jar
-rw-rw-r--.  1 hduser hduser 4321305 Jun  1  2015 pig-0.15.0-core-h2.jar
-rw-rw-r--.  1 hduser hduser    1341 Jun  1  2015 README.txt
-rw-rw-r--.  1 hduser hduser    1847 Jun  1  2015 RELEASE_NOTES.txt
drwxr-xr-x.  2 hduser hduser       6 Jun  1  2015 scripts
drwxr-xr-x.  4 hduser hduser      27 Jun  1  2015 shims
drwxr-xr-x.  8 hduser hduser    4096 Apr 19 20:48 src
drwxr-xr-x.  9 hduser hduser    4096 Apr 19 20:48 test
drwxr-xr-x.  5 hduser hduser      57 Apr 19 20:48 tutorial
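
One pitfall with the move step: if /usr/local/pig already exists as a directory, mv places pig-0.15.0 inside it instead of renaming it. The sketch below guards against that. The function name move_if_absent is hypothetical, and the example call assumes a shell with write access to /usr/local.

```shell
# Move src to dest only when dest does not exist yet, so mv cannot nest
# the source directory inside an existing destination directory.
# move_if_absent is a hypothetical helper name used for illustration.
move_if_absent() {
    src="$1"
    dest="$2"
    if [ -e "$dest" ]; then
        echo "Destination already exists, not moving: $dest" >&2
        return 1
    fi
    mv "$src" "$dest"
}

# Example (needs write access to /usr/local, for instance from a root shell):
# move_if_absent /home/hduser/Downloads/pig-0.15.0 /usr/local/pig
```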

Some modifications are required in the .bashrc file.

vi ~/.bashrc

Add the following lines to the file:

# Pig Home
export PIG_HOME=/usr/local/pig
export PATH=$PATH:$PIG_HOME/bin

Reload the ~/.bashrc file so that the changes take effect in the current shell:

source ~/.bashrc
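
If these steps are repeated, editing by hand can leave duplicate export lines in ~/.bashrc. As an alternative to vi, the following sketch appends each line only when it is not already present. The function name append_once is hypothetical; the two export lines are the ones shown above.

```shell
# Append a line to a file only if that exact line is not already present.
# append_once is a hypothetical helper name used for illustration.
append_once() {
    line="$1"
    file="$2"
    touch "$file"
    # -x matches whole lines, -F treats the pattern as a fixed string.
    grep -qxF -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
}

# Example (commented out so ~/.bashrc is not changed by accident):
# append_once 'export PIG_HOME=/usr/local/pig' ~/.bashrc
# append_once 'export PATH=$PATH:$PIG_HOME/bin' ~/.bashrc
```

The single quotes in the example keep $PATH and $PIG_HOME unexpanded, so they are written to the file literally and evaluated each time the file is sourced.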

Test the Pig installation with this simple command:

pig -help

This should return something like:

Apache Pig version 0.15.0 (r1682971) 
compiled Jun 01 2015, 11:44:35

USAGE: Pig [options] [-] : Run interactively in grunt shell.
       Pig [options] -e[xecute] cmd [cmd ...] : Run cmd(s).
       Pig [options] [-f[ile]] file : Run cmds found in file.
  options include:
    -4, -log4jconf - Log4j configuration file, overrides log conf
    -b, -brief - Brief logging (no timestamps)
    -c, -check - Syntax check
    -d, -debug - Debug level, INFO is default
    -e, -execute - Commands to execute (within quotes)
    -f, -file - Path to the script to execute
    -g, -embedded - ScriptEngine classname or keyword for the ScriptEngine
    -h, -help - Display this message. You can specify topic to get help for 
         that topic.   properties is the only topic currently supported: -h 
         properties.
    -i, -version - Display version information
    -l, -logfile - Path to client side log file; default is current working 
         directory.
    -m, -param_file - Path to the parameter file
    -p, -param - Key value pair of the form param=val
    -r, -dryrun - Produces script with substituted parameters. Script is not executed.
    -t, -optimizer_off - Turn optimizations off. The following values are supported:
            ConstantCalculator - Calculate constants at compile time
            SplitFilter - Split filter conditions
            PushUpFilter - Filter as early as possible
            MergeFilter - Merge filter conditions
            PushDownForeachFlatten - Join or explode as late as possible
            LimitOptimizer - Limit as early as possible
            ColumnMapKeyPrune - Remove unused data
            AddForEach - Add ForEach to remove unneeded columns
            MergeForEach - Merge adjacent ForEach
            GroupByConstParallelSetter - Force parallel 1 for "group all" 
            statement
            PartitionFilterOptimizer - Pushdown partition filter conditions to 
            loader
            implementing LoadMetaData
            PredicatePushdownOptimizer - Pushdown filter predicates to loader 
            implementing LoadPredicatePushDown
            All - Disable all optimizations
        All optimizations listed here are enabled by default. Optimization 
            values are case insensitive.
    -v, -verbose - Print all error messages to screen
    -w, -warning - Turn warning logging on; also turns warning aggregation off
    -x, -exectype - Set execution mode: local|mapreduce|tez, default is mapreduce.
    -F, -stop_on_failure - Aborts execution on the first failed job; default is off
    -M, -no_multiquery - Turn multiquery optimization off; default is on
    -N, -no_fetch - Turn fetch optimization off; default is on
    -P, -propertyFile - Path to property file
    -printCmdDebug - Overrides anything else and prints the actual command 
                     used to run Pig, including any environment variables that
                     are set by the pig command.
16/04/19 21:02:55 INFO pig.Main: Pig script completed in 61 milliseconds (61 ms)

Pig is now installed.
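
If pig -help reports "command not found" instead of the output above, the PATH change has probably not taken effect. The following sketch checks for that case. The function name check_pig is hypothetical; it relies only on the shell's command -v and on the -version option listed in the help output.

```shell
# Confirm that the pig launcher is on PATH and report its version.
# check_pig is a hypothetical helper name used for illustration.
check_pig() {
    if ! command -v pig >/dev/null 2>&1; then
        echo "pig not found on PATH; check PIG_HOME and PATH in ~/.bashrc" >&2
        return 1
    fi
    echo "pig found at: $(command -v pig)"
    pig -version
}

check_pig || true
```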