reosoftproductions.com
RODNEY AND ARLYN'S WEB SITE
Pig

Pig Recipes

Recipe 002


Goal

Copy a file from one location to another within Hadoop (HDFS).
Source File: /user/hduser/dvd/KeyValuePair.txt
Target Directory: /user/hduser/dvd/out/

After the script runs, listing the target directory shows a single output file:

hduser> hdfs dfs -ls /user/hduser/dvd/out
Found 1 items
-rw-r--r--   1 hduser supergroup   95981130 2016-04-20 23:15 /user/hduser/dvd/out/part-m-00000

The following is the script file.

/*****************************************************************************/
/* Recipe002.pig                                                             */
/* Pig Execution Mode:  mapreduce                                            */
/* Pig Batch Execution:  pig -x mapreduce Recipe002.pig                      */
/*                       pig Recipe002.pig                                   */
/* Source HDFS File:  /user/hduser/dvd/KeyValuePair.txt                      */
/* Target HDFS File:  /user/hduser/dvd/out/part-m-00000                      */
/*                                                                           */
/* The input file uses commas (,) to delimit the fields, so the delimiter    */
/* must be passed explicitly as PigStorage(',').  Without an argument,       */
/* PigStorage assumes the fields are tab-delimited.                          */
/*                                                                           */
/* If the target (output) directory already exists, the job fails with the   */
/* error "Output directory hdfs://hdcentos:9000/user/hduser/dvd/out already  */
/* exists".  To remove the output directory before executing this script,    */
/* use:                                                                      */
/*   hdfs dfs -rm -r /user/hduser/dvd/out                                    */
/*****************************************************************************/
/* Date     Initials Description                                             */
/* -------- -------- ------------------------------------------------------- */
/* 20160420 Reo      Initial.                                                */
/*****************************************************************************/

/*****************************************************************************/
/* The following SET command will suppress the creation of the _SUCCESS file */
/* in the output directory.                                                  */
/*****************************************************************************/
SET mapreduce.fileoutputcommitter.marksuccessfuljobs false;

/*****************************************************************************/
/* Read in the data using a comma (,) as the delimiter.                      */
/*****************************************************************************/
DVDData = LOAD '/user/hduser/dvd/KeyValuePair.txt' 
  USING PigStorage(',')
  AS
  (
    DVDName:chararray,
    AttributeName:chararray,
    AttributeValue:chararray
  );
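
/*****************************************************************************/
/* STORE the data that was just read in.  Like LOAD, STORE uses PigStorage   */
/* with a tab delimiter by default, so PigStorage(',') is specified here to  */
/* keep the copy comma-delimited like the source file.                       */
/*****************************************************************************/
STORE DVDData INTO '/user/hduser/dvd/out'
  USING PigStorage(',');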
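If the recipe is rerun often, the manual "hdfs dfs -rm -r" step from the header can be moved into the script itself. The following is a minimal sketch of such a variant, not part of the original recipe. It assumes that discarding any earlier output in /user/hduser/dvd/out is acceptable (the delete is immediate and not reversible) and that Pig's "fs" shell command, which passes its arguments to the Hadoop file system shell, is available in batch scripts. The "-f" flag keeps the delete from failing when the directory does not exist yet. The STORE also names PigStorage(',') so the copy keeps the source file's comma delimiter instead of Pig's default tab.

```pig
/* Sketch: remove any previous output so the script can be rerun.      */
/* -f suppresses the error when the directory does not exist yet.      */
fs -rm -r -f /user/hduser/dvd/out;

/* Suppress the _SUCCESS marker file, as in the original recipe.       */
SET mapreduce.fileoutputcommitter.marksuccessfuljobs false;

/* Read the comma-delimited source file.                               */
DVDData = LOAD '/user/hduser/dvd/KeyValuePair.txt'
  USING PigStorage(',')
  AS (DVDName:chararray, AttributeName:chararray, AttributeValue:chararray);

/* Write comma-delimited output so the copy matches the source format. */
STORE DVDData INTO '/user/hduser/dvd/out'
  USING PigStorage(',');
```

It runs the same way as the original, for example with "pig -x mapreduce" followed by the script name.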