Pig Logging
Logging
Useful Links
- Apache Logging Services
- log4j 1.2.17
- log4j 2.0
Apache Pig uses the Apache log4j framework to capture log messages. Several versions of log4j are available. To determine the version used by my installation, enter the command:
find $PIG_HOME -name 'log4j*jar'
This should return something similar to this:
hduser> find $PIG_HOME -name 'log4j*jar'
/usr/local/pig/lib/hadoop1-runtime/log4j-1.2.16.jar
This indicates that Pig is using log4j V1.2. There are newer versions of log4j available, but the one we are using is fine.
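As a small sketch of reading the version off the jar name, the helper below strips the directory, the log4j- prefix, and the .jar suffix from a path. The function name log4j_version is hypothetical and only for illustration; it assumes the jar follows the log4j-VERSION.jar naming shown above.

```shell
# Hypothetical helper: print the version embedded in a log4j jar file name.
# Assumes the file is named log4j-VERSION.jar, as in the output above.
log4j_version() {
    name=${1##*/}          # drop the directory part
    name=${name#log4j-}    # drop the "log4j-" prefix
    printf '%s\n' "${name%.jar}"   # drop the ".jar" suffix
}

log4j_version /usr/local/pig/lib/hadoop1-runtime/log4j-1.2.16.jar   # prints 1.2.16
```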
The logging properties are controlled through several files. The first file is the pig.properties file. It is a good idea to back up the .properties file before we attempt to edit it. Use the following commands:
cp $PIG_HOME/conf/pig.properties $PIG_HOME/conf/pig.properties.bak_$(date +"%Y%m%d")
vi $PIG_HOME/conf/pig.properties
Look for a line that may look like this:
# log4jconf=./conf/log4j.properties
Since there is a # in the first column, the line is 'disabled'. To enable it, remove the #. If the line is already enabled, then leave the line alone. Save and exit the file.
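The same edit can be sketched without an interactive editor. The helper below is an illustration only: the name enable_log4jconf is hypothetical, and it prints the edited text to stdout rather than changing the file, so the result can be reviewed first.

```shell
# Hypothetical helper: uncomment the log4jconf line of a properties file.
# Writes the result to stdout; the input file is left untouched.
enable_log4jconf() {
    # Remove a leading "#" (and any spaces after it) only on the log4jconf line.
    sed 's/^#[[:space:]]*\(log4jconf=\)/\1/' "$1"
}
```

For example, enable_log4jconf $PIG_HOME/conf/pig.properties > /tmp/pig.properties.new writes a candidate file that can be compared with the original before replacing it. Lines that are already enabled, and all other commented lines, pass through unchanged.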
From the $PIG_HOME/conf directory, check to see if the log4j.properties file exists. This is done by using the command:
ls -la log4j.properties*
The output may look like this:
hduser> ls -la log4j.properties*
-rw-rw-r--. 1 hduser hduser 1136 Jun 1 2015 log4j.properties.template
The file log4j.properties is missing, so copy it from the template.
cp log4j.properties.template log4j.properties
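The check and the copy can be combined so that an existing log4j.properties is never overwritten. This is a minimal sketch; the function name ensure_log4j_properties and its directory argument are hypothetical.

```shell
# Hypothetical helper: create log4j.properties from the template
# only when it does not exist yet in the given directory.
ensure_log4j_properties() {
    dir=$1
    if [ ! -f "$dir/log4j.properties" ]; then
        cp "$dir/log4j.properties.template" "$dir/log4j.properties"
    fi
}
```

It would be called as ensure_log4j_properties $PIG_HOME/conf, assuming the template lives in that directory.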
Edit the log4j.properties file with your favorite editor. This is what the default file contains:
# ***** Set root logger level to DEBUG and its only appender to A.
log4j.logger.org.apache.pig=info, A
# ***** A is set to be a ConsoleAppender.
log4j.appender.A=org.apache.log4j.ConsoleAppender
# ***** A uses PatternLayout.
log4j.appender.A.layout=org.apache.log4j.PatternLayout
log4j.appender.A.layout.ConversionPattern=%-4r [%t] %-5p %c %x - %m%n
Change the logger line as shown here and save. For information on log4j.appender.A.layout.ConversionPattern, see the PatternLayout section below.
# ***** Set root logger level to ERROR and its only appender to A.
log4j.logger.org.apache.pig=ERROR, A
Run your Pig script with the -4 option, which tells Pig which log4j configuration file to use. For example:
pig -x local -4 $PIG_HOME/conf/log4j.properties Recipe001.pig
When Recipe001 was run this way, the output dropped by 67 lines (from 415 lines to 348 lines).
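One way to take such a measurement is to count every line a command writes to stdout and stderr. The helper below is a sketch under that assumption; the name count_output_lines is hypothetical, and the count includes the script's own output as well as log messages.

```shell
# Hypothetical helper: run a command and print how many lines it
# writes to stdout and stderr combined.
count_output_lines() {
    "$@" 2>&1 | wc -l | tr -d ' '   # tr strips the padding some wc versions add
}
```

Running count_output_lines pig -x local Recipe001.pig and then count_output_lines pig -x local -4 $PIG_HOME/conf/log4j.properties Recipe001.pig gives the two numbers to compare.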
Hadoop
Hadoop also has a log4j.properties file. The majority of the messages that are generated by Pig come from Hadoop. Since Hadoop does not know about Pig, it does not use Pig's log settings. Therefore, the Hadoop log4j.properties configuration file can also be changed. On my system, it was located in the /etc/hadoop/conf.empty/ directory. The name of the file is also log4j.properties. It would be a good idea to make a copy of the file before attempting to make any changes.
cp /etc/hadoop/conf.empty/log4j.properties /etc/hadoop/conf.empty/log4j.properties.bak_$(date +"%Y%m%d")
Open the log4j.properties file with your favorite text editor. Look for the section where the console is defined. Change the relevant lines, for example:
# Define some default values that can be overridden by system properties
hadoop.root.logger=INFO,console
log4j.appender.console.layout.ConversionPattern=%5p | %d | %F | %L | %m%n
log4j v2.x PatternLayout
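If you want to generate your logging information in a particular format based on a pattern, then you can use a PatternLayout to format your logging information. In log4j 1.2 the class is org.apache.log4j.PatternLayout; in log4j 2.x it is org.apache.logging.log4j.core.layout.PatternLayout. Some of the conversion characters listed below, such as enc and equals, exist only in log4j 2.x.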
Pattern Conversion Characters
The following table explains the characters used to build patterns:
Conversion Character | Meaning |
---|---|
c{precision} logger{precision} | Outputs the name of the logger that published the logging event. The logger conversion specifier can optionally be followed by a precision specifier, which consists of a decimal integer, or a pattern starting with a decimal integer. If a precision specifier is given and it is an integer value, then only the corresponding number of right-most components of the logger name will be printed. If the precision contains other non-integer characters, then the name will be abbreviated based on the pattern. If the precision integer is less than one, the right-most token will still be printed in full. By default the logger name is printed in full. |
C{precision} class{precision} | Outputs the fully qualified class name of the caller issuing the logging request. This conversion specifier can optionally be followed by a precision specifier, which follows the same rules as the logger name converter. Generating the class name of the caller is an expensive operation and may impact performance. Use with caution. |
d{pattern} date{pattern} | Outputs the date of the logging event. The date conversion specifier may be followed by a set of braces containing a date and time pattern string. |
enc{pattern} encode{pattern} | Encodes special characters such as '\n' and HTML characters to help prevent log forging and some XSS attacks that could occur when displaying logs in a web browser. Any time user-provided data is logged, this can provide a safeguard. |
equals{pattern}{test}{substitution} equalsIgnoreCase{pattern}{test}{substitution} | Replaces occurrences of 'test', a string, with its replacement 'substitution' in the string resulting from evaluation of the pattern. For example, "%equals{[%marker]}{[]}{}" will replace '[]' strings produced by events without markers with an empty string. The pattern can be arbitrarily complex and in particular can contain multiple conversion keywords. |
F file | Outputs the file name where the logging request was issued. Generating the file information is an expensive operation and may impact performance. Use with caution. |
l | Used to output location information of the caller which generated the logging event. |
L | Used to output the line number from where the logging request was issued. |
m | Used to output the application supplied message associated with the logging event. |
M | Used to output the method name where the logging request was issued. |
n | Outputs the platform dependent line separator character or characters. |
p or level, optionally followed by {level=label, level=label, ...}, {length=n}, or {lowerCase=true} / {lowerCase=false} | Outputs the level of the logging event. You provide a level name map in the form "level=value, level=value" where level is the name of the Level and value is the value that should be displayed instead of the name of the Level. |
r relative | Outputs the number of milliseconds elapsed since the JVM was started until the creation of the logging event. |
t thread | Outputs the name of the thread that generated the logging event. |
x | Outputs the Thread Context Stack (also known as the Nested Diagnostic Context or NDC) associated with the thread that generated the logging event. |
X | The X conversion character is followed by the key for the MDC. For example, X{clientIP} will print the information stored in the MDC against the key clientIP. |
% | The sequence %% outputs a single percent sign. |
For more information, see the PatternLayout documentation on the Apache Logging Services site.
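To make the integer precision rule for c{precision} concrete, the sketch below mimics it on a dotted logger name. It only imitates the behavior described in the table and is not log4j code; the function name logger_precision is hypothetical.

```shell
# Hypothetical helper imitating an integer precision such as %c{2}:
# keep only the right-most N dot-separated components of a logger name.
logger_precision() {
    printf '%s\n' "$1" | awk -F. -v n="$2" '{
        if (n < 1) n = 1              # the right-most token is always kept
        start = NF - n + 1
        if (start < 1) start = 1      # fewer components than N: print them all
        out = $start
        for (i = start + 1; i <= NF; i++) out = out "." $i
        print out
    }'
}

logger_precision org.apache.pig.Main 2   # prints pig.Main
```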
Format Modifiers
By default, the relevant information is output as is. However, with the aid of format modifiers, it is possible to change the minimum field width, the maximum field width, and the justification. The following table covers various modifier scenarios:
Format Modifier | Left Justify | Minimum Width | Maximum Width | Comment |
---|---|---|---|---|
%20c | false | 20 | none | Left pad with spaces if the category name is less than 20 characters long. |
%-20c | true | 20 | none | Right pad with spaces if the category name is less than 20 characters long. |
%.30c | NA | none | 30 | Truncate from the beginning if the category name is longer than 30 characters. |
%20.30c | false | 20 | 30 | Left pad with spaces if the category name is shorter than 20 characters. However, if the category name is longer than 30 characters, then truncate from the beginning. |
%-20.30c | true | 20 | 30 | Right pad with spaces if the category name is shorter than 20 characters. However, if category name is longer than 30 characters, then truncate from the beginning. |
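The padding and truncation rules in the table can be imitated with a few lines of shell. This is a sketch of the rules only, not log4j itself; the function name format_field and its argument order are hypothetical, and a width of 0 stands for "none".

```shell
# Hypothetical helper imitating log4j format modifiers on a string.
# Usage: format_field TEXT MIN MAX JUSTIFY
#   MIN, MAX: field widths, 0 meaning "none"; JUSTIFY: "left" or "right".
format_field() {
    text=$1; min=$2; max=$3; justify=$4
    # Truncate from the beginning while the text exceeds the maximum width.
    if [ "$max" -gt 0 ]; then
        while [ "${#text}" -gt "$max" ]; do
            text=${text#?}
        done
    fi
    # Pad with spaces up to the minimum width, on the side the justification implies.
    width=
    [ "$min" -gt 0 ] && width=$min
    if [ "$justify" = left ]; then
        printf "%-${width}s\n" "$text"   # left justify: pad on the right
    else
        printf "%${width}s\n" "$text"    # right justify: pad on the left
    fi
}
```

For instance, format_field org.apache.pig.Main 0 8 right corresponds to %.8c and keeps only the last eight characters, pig.Main, while format_field INFO 5 0 left corresponds to %-5p and pads INFO with one trailing space.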