reosoftproductions.com
RODNEY AND ARLYN'S WEB SITE
Pig

Pig Logging

Apache Pig uses the Apache log4j framework to capture log messages. Several versions of log4j exist. To determine which version your Pig installation uses, enter the command:

find $PIG_HOME -name 'log4j*jar'

This should return something similar to this:

hduser> find $PIG_HOME -name 'log4j*jar'
/usr/local/pig/lib/hadoop1-runtime/log4j-1.2.16.jar

This indicates that Pig is using log4j version 1.2 (here, 1.2.16). Newer versions of log4j are available, but the one we are using is fine.
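If you want the version number on its own, for example in a setup script, it can be read from the jar's file name. The following is a minimal sketch; log4j_version is a hypothetical helper name, and it assumes the jar follows the log4j-<version>.jar naming shown above.

```shell
#!/bin/sh
# Hypothetical helper: derive the log4j version from a jar path,
# assuming the file is named log4j-<version>.jar.
log4j_version() {
    jar_name=$(basename "$1" .jar)        # e.g. log4j-1.2.16
    printf '%s\n' "${jar_name#log4j-}"    # strip the "log4j-" prefix
}

# Path taken from the sample output above.
log4j_version /usr/local/pig/lib/hadoop1-runtime/log4j-1.2.16.jar
```

With the sample path this prints 1.2.16. The output of the find command can be passed to the function one path at a time.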

The logging properties are controlled through several files. The first is pig.properties. It is a good idea to back up this file before editing it. The following commands create a dated backup and then open the file in vi:

cp $PIG_HOME/conf/pig.properties $PIG_HOME/conf/pig.properties.bak_$(date +"%Y%m%d")
vi $PIG_HOME/conf/pig.properties

Look for a line that may look like this:

# log4jconf=./conf/log4j.properties

Since there is a # in the first column, the line is 'disabled'. To enable it, remove the #. If the line is already enabled, then leave the line alone. Save and exit the file.
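The same change can be made without opening an editor. This is a sketch only: enable_log4jconf is a hypothetical helper, not part of Pig, and the demonstration runs against a scratch file with a placeholder second line rather than the real pig.properties.

```shell
#!/bin/sh
# Hypothetical helper: uncomment the log4jconf line in a pig.properties-style file.
enable_log4jconf() {
    props_file=$1
    tmp_file=$(mktemp)
    # Remove a leading "#" (and any spaces after it) only on the log4jconf line.
    sed 's/^#[[:space:]]*\(log4jconf=\)/\1/' "$props_file" > "$tmp_file" &&
        cat "$tmp_file" > "$props_file"   # write back in place, keeping file permissions
    rm -f "$tmp_file"
}

# Demonstration on a scratch file; the second line is a made-up placeholder.
demo_file=$(mktemp)
printf '%s\n' '# log4jconf=./conf/log4j.properties' '# some.other.setting=1' > "$demo_file"
enable_log4jconf "$demo_file"
cat "$demo_file"
```

Only the log4jconf line loses its #; other commented lines are left as they were. A line that is already enabled does not match the pattern and is not touched.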

Check whether the log4j.properties file exists in the $PIG_HOME/conf directory. From that directory, run:

ls -la log4j.properties*

The output may look like this:

hduser> ls -la log4j.properties*
-rw-rw-r--. 1 hduser hduser 1136 Jun  1  2015 log4j.properties.template

In this case log4j.properties is missing, so create it by copying the template:

cp log4j.properties.template log4j.properties
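The check and the copy can be combined so that an existing log4j.properties is never overwritten. A minimal sketch follows; ensure_log4j_properties is a hypothetical helper, and the demonstration uses a scratch directory instead of $PIG_HOME/conf.

```shell
#!/bin/sh
# Hypothetical helper: create log4j.properties from the template only if it is missing.
ensure_log4j_properties() {
    conf_dir=$1
    if [ -f "$conf_dir/log4j.properties" ]; then
        echo "log4j.properties already exists, leaving it alone"
    else
        cp "$conf_dir/log4j.properties.template" "$conf_dir/log4j.properties"
        echo "created log4j.properties from the template"
    fi
}

# Demonstration in a scratch directory with a one-line stand-in template.
demo_dir=$(mktemp -d)
echo 'log4j.logger.org.apache.pig=info, A' > "$demo_dir/log4j.properties.template"
ensure_log4j_properties "$demo_dir"   # first call copies the template
ensure_log4j_properties "$demo_dir"   # second call finds the file and does nothing
```

On a real installation the call would be ensure_log4j_properties "$PIG_HOME/conf".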

Edit the log4j.properties file with your favorite editor. This is what the default file contains:

# ***** Set root logger level to DEBUG and its only appender to A.
log4j.logger.org.apache.pig=info, A

# ***** A is set to be a ConsoleAppender.
log4j.appender.A=org.apache.log4j.ConsoleAppender
# ***** A uses PatternLayout.
log4j.appender.A.layout=org.apache.log4j.PatternLayout
log4j.appender.A.layout.ConversionPattern=%-4r [%t] %-5p %c %x - %m%n

Change the logger line as follows and save the file:

# ***** Set root logger level to ERROR and its only appender to A.
log4j.logger.org.apache.pig=ERROR, A

For information on log4j.appender.A.layout.ConversionPattern, see the PatternLayout section below.
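The level change can also be scripted, which is handy when switching between INFO and ERROR repeatedly. This is a sketch under the assumption that the file contains a single log4j.logger.org.apache.pig line in the form shown above; set_pig_log_level is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: replace the level on the org.apache.pig logger line,
# keeping the appender list (", A") unchanged.
set_pig_log_level() {
    props_file=$1
    level=$2
    tmp_file=$(mktemp)
    sed "s/^\(log4j\.logger\.org\.apache\.pig=\)[^,]*/\1$level/" "$props_file" > "$tmp_file" &&
        cat "$tmp_file" > "$props_file"   # write back in place
    rm -f "$tmp_file"
}

# Demonstration on a scratch copy of the default logger line.
demo_file=$(mktemp)
echo 'log4j.logger.org.apache.pig=info, A' > "$demo_file"
set_pig_log_level "$demo_file" ERROR
cat "$demo_file"
```

For the scratch file this prints log4j.logger.org.apache.pig=ERROR, A.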

Run your Pig script with the -4 (-log4jconf) option, which tells Pig which log4j configuration file to use. For example:

pig -x local -4 $PIG_HOME/conf/log4j.properties Recipe001.pig

When Recipe001 was run with this setting, the output shrank by 67 lines, from 415 lines to 348.
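One way to take this kind of measurement is to capture the output of each run in a file and compare line counts. The sketch below assumes the two logs were captured by redirecting Pig's output, as shown in the comments; line_delta is a hypothetical helper, and generated stand-in files with the line counts quoted above replace the real logs so the example runs without Pig.

```shell
#!/bin/sh
# Hypothetical helper: print how many fewer lines the second log has than the first.
line_delta() {
    before=$(wc -l < "$1")
    after=$(wc -l < "$2")
    echo $((before - after))
}

# Real logs would be captured along these lines (assumed invocations):
#   pig -x local Recipe001.pig > before.log 2>&1
#   pig -x local -4 $PIG_HOME/conf/log4j.properties Recipe001.pig > after.log 2>&1
# Stand-in files with 415 and 348 lines are used here instead.
work_dir=$(mktemp -d)
awk 'BEGIN { for (i = 1; i <= 415; i++) print "line " i }' > "$work_dir/before.log"
awk 'BEGIN { for (i = 1; i <= 348; i++) print "line " i }' > "$work_dir/after.log"
line_delta "$work_dir/before.log" "$work_dir/after.log"
```

With the stand-in files this prints 67.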

Hadoop

Hadoop also has a log4j.properties file. Most of the messages that appear when running Pig come from Hadoop. Because Hadoop does not know about Pig, it does not use Pig's log settings, so Hadoop's own log4j.properties can be changed as well. On my system, the file was located in the /etc/hadoop/conf.empty/ directory. Make a copy of it before attempting any changes:

cp /etc/hadoop/conf.empty/log4j.properties /etc/hadoop/conf.empty/log4j.properties.bak_$(date +"%Y%m%d")

Open log4j.properties with your favorite text editor and look for the section where the console is defined. Change the settings there as needed; the lines of interest look like this:

# Define some default values that can be overridden by system properties
hadoop.root.logger=INFO,console

log4j.appender.console.layout.ConversionPattern=%5p | %d | %F | %L | %m%n
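To get a feel for the line shape that the console pattern %5p | %d | %F | %L | %m%n produces, the following sketch imitates it with printf. It does not call log4j. render_console_line is a hypothetical helper, and the timestamp, file name, line number, and message are invented sample values. The timestamp is written in the yyyy-MM-dd HH:mm:ss,SSS form that log4j 1.2 uses for a bare %d.

```shell
#!/bin/sh
# Imitates the layout "%5p | %d | %F | %L | %m%n" with printf.
# All argument values below are invented samples, not real Pig or Hadoop output.
render_console_line() {
    level=$1
    timestamp=$2
    file=$3
    line=$4
    message=$5
    # %5p right-justifies the level in a field at least 5 characters wide.
    printf '%5s | %s | %s | %s | %s\n' "$level" "$timestamp" "$file" "$line" "$message"
}

render_console_line WARN '2015-06-01 12:00:00,000' Example.java 42 'sample message'
render_console_line ERROR '2015-06-01 12:00:01,250' Example.java 57 'another sample'
```

The four-letter level WARN is padded with one leading space, while ERROR fills the field exactly, so the separators line up.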

log4j v2.x PatternLayout

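If you want to generate your logging information in a particular format based on a pattern, use a PatternLayout. In log4j 1.2, the version Pig uses above, the class is org.apache.log4j.PatternLayout. The table below follows the log4j 2.x documentation, so some entries, such as enc and equals, are log4j 2.x features and are not available in the log4j 1.2 PatternLayout.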

Pattern Conversion Characters

The following table explains the characters used to build patterns:

Conversion Character
    Meaning

c{precision}, logger{precision}
    Outputs the name of the logger that published the logging event. The logger conversion specifier can optionally be followed by a precision specifier, which consists of a decimal integer or a pattern starting with a decimal integer.

    If a precision specifier is given and it is an integer value, then only the corresponding number of right-most components of the logger name will be printed. If the precision contains other non-integer characters, then the name will be abbreviated based on the pattern. If the precision integer is less than one, the right-most token will still be printed in full. By default the logger name is printed in full.

C{precision}, class{precision}
    Outputs the fully qualified class name of the caller issuing the logging request. This conversion specifier can optionally be followed by a precision specifier that follows the same rules as the logger name converter.

    Generating the class name of the caller is an expensive operation and may impact performance. Use with caution.

d{pattern}, date{pattern}
    Outputs the date of the logging event. The date conversion specifier may be followed by a set of braces containing a date and time pattern string.

enc{pattern}, encode{pattern}
    Encodes special characters such as '\n' and HTML characters to help prevent log forging and some XSS attacks that could occur when displaying logs in a web browser. Anytime user-provided data is logged, this can provide a safeguard.

equals{pattern}{test}{substitution}, equalsIgnoreCase{pattern}{test}{substitution}
    Replaces occurrences of 'test', a string, with its replacement 'substitution' in the string resulting from evaluation of the pattern. For example, "%equals{[%marker]}{[]}{}" will replace '[]' strings produced by events without markers with an empty string.

    The pattern can be arbitrarily complex and in particular can contain multiple conversion keywords.

F, file
    Outputs the file name where the logging request was issued.

    Generating the file information is an expensive operation and may impact performance. Use with caution.

l
    Outputs location information of the caller that generated the logging event.

L
    Outputs the line number from where the logging request was issued.

m
    Outputs the application-supplied message associated with the logging event.

M
    Outputs the method name where the logging request was issued.

n
    Outputs the platform-dependent line separator character or characters.

p|level{level=label, level=label, ...}
p|level{length=n}
p|level{lowerCase=true|false}
    Outputs the level of the logging event. You provide a level name map in the form "level=value, level=value", where level is the name of the Level and value is the value that should be displayed instead of the name of the Level.

r, relative
    Outputs the number of milliseconds elapsed since the JVM was started until the creation of the logging event.

t, thread
    Outputs the name of the thread that generated the logging event.

x
    Outputs the Thread Context Stack (also known as the Nested Diagnostic Context or NDC) associated with the thread that generated the logging event.

X
    The X conversion character is followed by the key for the MDC. For example, X{clientIP} will print the information stored in the MDC against the key clientIP.

%
    The sequence %% outputs a single percent sign.

For more information, see the Apache log4j PatternLayout documentation.

Format Modifiers

By default, the relevant information is output as is. However, with the aid of format modifiers, it is possible to change the minimum field width, the maximum field width, and the justification. The following table covers various modifier scenarios:

| Format Modifier | Left Justify | Minimum Width | Maximum Width | Comment |
|---|---|---|---|---|
| %20c | false | 20 | none | Left pad with spaces if the category name is less than 20 characters long. |
| %-20c | true | 20 | none | Right pad with spaces if the category name is less than 20 characters long. |
| %.30c | NA | none | 30 | Truncate from the beginning if the category name is longer than 30 characters. |
| %20.30c | false | 20 | 30 | Left pad with spaces if the category name is shorter than 20 characters. However, if the category name is longer than 30 characters, then truncate from the beginning. |
| %-20.30c | true | 20 | 30 | Right pad with spaces if the category name is shorter than 20 characters. However, if the category name is longer than 30 characters, then truncate from the beginning. |
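The padding and truncation rules in the table can be tried out without log4j. The following sketch emulates them in the shell; apply_modifier is a hypothetical helper, and the sample names are only illustrations. Note that truncation keeps the end of the value, which differs from printf's own precision, which keeps the start.

```shell
#!/bin/sh
# Hypothetical helper emulating log4j format modifiers.
# Usage: apply_modifier MIN_WIDTH MAX_WIDTH LEFT_JUSTIFY VALUE
# MIN_WIDTH and MAX_WIDTH are integers or the word "none"; LEFT_JUSTIFY is true or false.
apply_modifier() {
    min_width=$1
    max_width=$2
    left_justify=$3
    value=$4
    # Truncate from the beginning: keep only the last max_width characters.
    if [ "$max_width" != none ] && [ "${#value}" -gt "$max_width" ]; then
        value=$(printf '%s' "$value" |
            awk -v n="$max_width" '{ print substr($0, length($0) - n + 1) }')
    fi
    if [ "$min_width" = none ]; then
        printf '%s' "$value"
    elif [ "$left_justify" = true ]; then
        printf "%-${min_width}s" "$value"   # right pad with spaces
    else
        printf "%${min_width}s" "$value"    # left pad with spaces
    fi
}

# Brackets make the padding visible.
printf '[%s]\n' "$(apply_modifier 20 none false pig.Main)"               # like %20c
printf '[%s]\n' "$(apply_modifier 20 none true pig.Main)"                # like %-20c
printf '[%s]\n' "$(apply_modifier none 8 false org.apache.pig.Main)"     # like %.8c
```

The third call keeps only the last eight characters of the name, pig.Main, which mirrors how a maximum width drops characters from the front.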