Pig Installation
These instructions are a modified form of the Pig instructions found on the Apache website.
Mandatory Components
Unix and Windows users must have several components installed and configured on the server.
Hadoop (Required)
Hadoop 0.23.X, 1.X or 2.X - Pig can run with different versions of Hadoop if HADOOP_HOME points to the directory where Hadoop is installed. If HADOOP_HOME is not set, Pig runs with the embedded version, which is currently Hadoop 1.0.4.
Java (Required)
Java 1.7 - Set JAVA_HOME to the root of your Java installation.
Python (Optional)
Python 2.7 - Needed only for streaming Python UDFs.
Ant (Optional)
Ant 1.8 - For builds.
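Before downloading Pig, it can help to confirm which of the components above are visible to the shell. The following is a minimal sketch, not part of the original instructions: the helper name have_tool is hypothetical, and the command names java, python and ant are assumptions about how those components are installed on the server.

```shell
# Sketch: report which of the components listed above are visible to the shell.

# Return 0 if the named command is on PATH, non-zero otherwise.
have_tool() {
    command -v "$1" >/dev/null 2>&1
}

# JAVA_HOME is required; HADOOP_HOME is optional because Pig falls back
# to its embedded Hadoop when the variable is not set.
if [ -n "${JAVA_HOME:-}" ]; then
    echo "JAVA_HOME is set to $JAVA_HOME"
else
    echo "JAVA_HOME is not set (required)"
fi
if [ -n "${HADOOP_HOME:-}" ]; then
    echo "HADOOP_HOME is set to $HADOOP_HOME"
else
    echo "HADOOP_HOME is not set (Pig will use its embedded Hadoop)"
fi

# The command names java, python and ant are assumed; adjust for the local system.
for tool in java python ant; do
    if have_tool "$tool"; then
        echo "$tool: found"
    else
        echo "$tool: not found"
    fi
done
```

The script only reports what it finds and changes nothing, so it can be run repeatedly while the components are being installed.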
Download Pig
Using your browser, go to the Apache Pig distribution website at http://www-us.apache.org/dist/pig/ and locate a file called pig-0.nn.nn.tar.gz.
Use the wget command to download the file. The following is for the 0.15 version:
cd ~/Downloads
wget http://www-us.apache.org/dist/pig/pig-0.15.0/pig-0.15.0.tar.gz
This will download the tar file to your /home/hduser/Downloads directory. Verify the download.
ls -la
This should return something like:
total 218628
drwxr-xr-x. 2 hduser hduser 41 Apr 18 09:05 .
drwx------. 14 hduser hduser 4096 Apr 18 07:59 ..
-rw-rw-r--. 1 hduser hduser 120917625 Jun 5 2015 pig-0.15.0.tar.gz
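A directory listing shows that the file exists, but not that it downloaded completely. As an optional extra check, the archive can be read from start to end without extracting it. This is a sketch only: the function name archive_ok is hypothetical, and the archive path is assumed from the download step.

```shell
# Return 0 if the given file is a readable gzip-compressed tar archive.
# tar -t lists the contents without extracting; the listing is discarded.
archive_ok() {
    tar -tzf "$1" >/dev/null 2>&1
}

# Path assumed from the download step; adjust if the file is elsewhere.
archive="$HOME/Downloads/pig-0.15.0.tar.gz"
if archive_ok "$archive"; then
    echo "archive looks intact: $archive"
else
    echo "archive missing or unreadable: $archive"
fi
```

A truncated or corrupted download makes tar fail part-way through the listing, so the second message is a hint to download the file again.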
Extract the archive and list the directory again.
tar -zxvf pig-0.15.0.tar.gz
ls -la
This should return something like:
total 218628
drwxr-xr-x. 2 hduser hduser 41 Apr 18 09:05 .
drwx------. 14 hduser hduser 4096 Apr 18 07:59 ..
drwxr-xr-x. 16 hduser hduser 4096 Jun 1 2015 pig-0.15.0
-rw-rw-r--. 1 hduser hduser 120917625 Jun 5 2015 pig-0.15.0.tar.gz
Move the Pig files to the /usr/local/pig directory.
sudo mv /home/hduser/Downloads/pig-0.15.0 /usr/local/pig
ls -la /usr/local/pig
This should return something like:
total 8512
drwxr-xr-x. 16 hduser hduser 4096 Jun 1 2015 .
drwxr-xr-x. 14 root root 4096 Apr 19 20:58 ..
drwxr-xr-x. 2 hduser hduser 43 Apr 19 20:48 bin
-rw-rw-r--. 1 hduser hduser 89447 Jun 1 2015 build.xml
-rw-rw-r--. 1 hduser hduser 190118 Jun 1 2015 CHANGES.txt
drwxr-xr-x. 2 hduser hduser 87 Apr 19 20:48 conf
drwxr-xr-x. 3 hduser hduser 40 Apr 19 20:48 contrib
drwxr-xr-x. 6 hduser hduser 4096 Apr 19 20:48 docs
drwxr-xr-x. 2 hduser hduser 4096 Apr 19 20:48 ivy
-rw-rw-r--. 1 hduser hduser 26978 Jun 1 2015 ivy.xml
drwxr-xr-x. 2 hduser hduser 82 Apr 19 20:48 legacy
drwxr-xr-x. 5 hduser hduser 4096 Apr 19 20:48 lib
drwxr-xr-x. 3 hduser hduser 18 Jun 1 2015 lib-src
drwxr-xr-x. 2 hduser hduser 4096 Apr 19 20:48 license
-rw-rw-r--. 1 hduser hduser 11358 Jun 1 2015 LICENSE.txt
-rw-rw-r--. 1 hduser hduser 2125 Jun 1 2015 NOTICE.txt
-rw-rw-r--. 1 hduser hduser 4021860 Jun 1 2015 pig-0.15.0-core-h1.jar
-rw-rw-r--. 1 hduser hduser 4321305 Jun 1 2015 pig-0.15.0-core-h2.jar
-rw-rw-r--. 1 hduser hduser 1341 Jun 1 2015 README.txt
-rw-rw-r--. 1 hduser hduser 1847 Jun 1 2015 RELEASE_NOTES.txt
drwxr-xr-x. 2 hduser hduser 6 Jun 1 2015 scripts
drwxr-xr-x. 4 hduser hduser 27 Jun 1 2015 shims
drwxr-xr-x. 8 hduser hduser 4096 Apr 19 20:48 src
drwxr-xr-x. 9 hduser hduser 4096 Apr 19 20:48 test
drwxr-xr-x. 5 hduser hduser 57 Apr 19 20:48 tutorial
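Instead of comparing the listing by eye, the move can be checked with a short script. The sketch below assumes that the bin, conf and lib subdirectories seen in the listing are enough to recognise a Pig directory; the function name pig_layout_ok is hypothetical.

```shell
# Return 0 if the directory contains the bin, conf and lib subdirectories
# that appear in the listing above, non-zero otherwise.
pig_layout_ok() {
    for sub in bin conf lib; do
        [ -d "$1/$sub" ] || return 1
    done
    return 0
}

# Target directory taken from the mv command above.
pig_dir=/usr/local/pig
if pig_layout_ok "$pig_dir"; then
    echo "$pig_dir looks like a Pig installation"
else
    echo "$pig_dir is missing bin, conf or lib"
fi
```

If the check fails, a likely cause is that /usr/local/pig already existed before the move, in which case mv places the files in /usr/local/pig/pig-0.15.0 instead.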
Some modifications are required in the .bashrc file.
vi ~/.bashrc
Add the following lines to the file:
# Pig Home
export PIG_HOME=/usr/local/pig
export PATH=$PATH:$PIG_HOME/bin
Use the following command to execute the ~/.bashrc file.
source ~/.bashrc
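The .bashrc change can also be scripted rather than made in an editor, which is useful when the installation is repeated on several machines. The following sketch appends the same three lines only when the file does not already define PIG_HOME, so running it twice does not duplicate the entries. The function name add_pig_env is hypothetical, and the grep pattern is a simple assumption about how an existing entry would look.

```shell
# Append the Pig environment block to a shell startup file, unless the file
# already defines PIG_HOME. The target file is passed as the first argument.
add_pig_env() {
    rc_file=$1
    if grep -q 'PIG_HOME=' "$rc_file" 2>/dev/null; then
        echo "PIG_HOME already present in $rc_file"
        return 0
    fi
    # The quoted heredoc delimiter keeps $PATH and $PIG_HOME unexpanded,
    # so they are written to the file literally.
    cat >> "$rc_file" <<'EOF'
# Pig Home
export PIG_HOME=/usr/local/pig
export PATH=$PATH:$PIG_HOME/bin
EOF
    echo "Pig settings appended to $rc_file"
}

# Example call (not run here): add_pig_env "$HOME/.bashrc"
```

After calling the function on ~/.bashrc, the file still has to be sourced, or a new shell opened, before the pig command is found.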
Test the Pig installation with this simple command:
pig -help
This should return something like:
Apache Pig version 0.15.0 (r1682971)
compiled Jun 01 2015, 11:44:35
USAGE: Pig [options] [-] : Run interactively in grunt shell.
       Pig [options] -e[xecute] cmd [cmd ...] : Run cmd(s).
       Pig [options] [-f[ile]] file : Run cmds found in file.
  options include:
    -4, -log4jconf - Log4j configuration file, overrides log conf
    -b, -brief - Brief logging (no timestamps)
    -c, -check - Syntax check
    -d, -debug - Debug level, INFO is default
    -e, -execute - Commands to execute (within quotes)
    -f, -file - Path to the script to execute
    -g, -embedded - ScriptEngine classname or keyword for the ScriptEngine
    -h, -help - Display this message. You can specify topic to get help for that topic.
        properties is the only topic currently supported: -h properties.
    -i, -version - Display version information
    -l, -logfile - Path to client side log file; default is current working directory.
    -m, -param_file - Path to the parameter file
    -p, -param - Key value pair of the form param=val
    -r, -dryrun - Produces script with substituted parameters. Script is not executed.
    -t, -optimizer_off - Turn optimizations off. The following values are supported:
            ConstantCalculator - Calculate constants at compile time
            SplitFilter - Split filter conditions
            PushUpFilter - Filter as early as possible
            MergeFilter - Merge filter conditions
            PushDownForeachFlatten - Join or explode as late as possible
            LimitOptimizer - Limit as early as possible
            ColumnMapKeyPrune - Remove unused data
            AddForEach - Add ForEach to remove unneeded columns
            MergeForEach - Merge adjacent ForEach
            GroupByConstParallelSetter - Force parallel 1 for "group all" statement
            PartitionFilterOptimizer - Pushdown partition filter conditions to loader implementing LoadMetaData
            PredicatePushdownOptimizer - Pushdown filter predicates to loader implementing LoadPredicatePushDown
            All - Disable all optimizations
        All optimizations listed here are enabled by default. Optimization values are case insensitive.
    -v, -verbose - Print all error messages to screen
    -w, -warning - Turn warning logging on; also turns warning aggregation off
    -x, -exectype - Set execution mode: local|mapreduce|tez, default is mapreduce.
    -F, -stop_on_failure - Aborts execution on the first failed job; default is off
    -M, -no_multiquery - Turn multiquery optimization off; default is on
    -N, -no_fetch - Turn fetch optimization off; default is on
    -P, -propertyFile - Path to property file
    -printCmdDebug - Overrides anything else and prints the actual command used to run Pig, including
                     any environment variables that are set by the pig command.
16/04/19 21:02:55 INFO pig.Main: Pig script completed in 61 milliseconds (61 ms)
Pig is now installed.
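For a quick check that can be repeated later, for example after editing .bashrc or opening a new shell, the test can be wrapped in a small function. This sketch relies only on the -version option listed in the help text above; the function name pig_version is hypothetical.

```shell
# Print the installed Pig version, or explain why it cannot be determined.
# Returns non-zero when the pig command is not on PATH.
pig_version() {
    if ! command -v pig >/dev/null 2>&1; then
        echo "pig not found on PATH; check PIG_HOME and PATH in ~/.bashrc"
        return 1
    fi
    pig -version
}

pig_version || echo "smoke test failed"
```

With a working installation this should report the same version as the help output. According to the -x option in that output, pig -x local then starts Pig in local mode, which is a convenient next step when no Hadoop cluster is configured yet.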