Pig Recipes
Recipe 002
Goal
Copy a file from one location to another within Hadoop (HDFS).
Source File: /user/hduser/dvd/KeyValuePair.txt
Target Directory: /user/hduser/dvd/out/ (Pig writes the data as part-m-00000 inside this directory)
After the script runs, the target directory contains a single part file:
hduser> hdfs dfs -ls /user/hduser/dvd/out
Found 1 items
-rw-r--r--   1 hduser supergroup   95981130 2016-04-20 23:15 /user/hduser/dvd/out/part-m-00000
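The listing shows that a part file was written, but not that every record arrived. The recipe's STORE statement has no USING clause, so Pig writes the output with PigStorage's default tab delimiter and the part file is not byte-identical to the comma-delimited source. A rough check is therefore to compare record (line) counts rather than checksums. The sketch below assumes the paths from this recipe and newline-terminated records; the function names count_lines and verify_copy are hypothetical.

```shell
#!/bin/sh
# Sketch only: compares line counts between the source file and the Pig output.
# Paths come from the recipe; function names are hypothetical.
SRC="/user/hduser/dvd/KeyValuePair.txt"
OUT_DIR="/user/hduser/dvd/out"

# Count the lines in one or more HDFS paths.
# The glob is quoted so that HDFS, not the local shell, expands it.
count_lines() {
    hdfs dfs -cat "$@" | wc -l | tr -d ' '
}

# Succeeds (exit status 0) only when both counts are equal.
verify_copy() {
    src_count=$(count_lines "$SRC")
    out_count=$(count_lines "$OUT_DIR/part-*")
    echo "source records: $src_count, output records: $out_count"
    [ "$src_count" -eq "$out_count" ]
}
```

Calling verify_copy after the Pig job finishes prints both counts and returns a non-zero status if they differ.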
The following is the script file.
/*****************************************************************************/
/* Recipe002.pig                                                             */
/* Pig Execution Mode:  mapreduce                                            */
/* Pig Batch Execution: pig -x mapreduce Recipe002.pig                       */
/*                      pig Recipe002.pig                                    */
/* Source HDFS File:    /user/hduser/dvd/KeyValuePair.txt                    */
/* Target HDFS File:    /user/hduser/dvd/out/part-m-00000                    */
/*                                                                           */
/* The input file uses commas (,) to delimit the fields.  PigStorage must be */
/* specified because PigStorage assumes a tab-delimited format if it is not  */
/* specified.                                                                */
/*                                                                           */
/* If the target (output) directory already exists (i.e. Output directory    */
/* hdfs://hdcentos:9000/user/hduser/dvd/out already exists), the process     */
/* will fail.  To remove the output directory before executing this script,  */
/* use:                                                                      */
/*     hdfs dfs -rm -r /user/hduser/dvd/out                                  */
/*****************************************************************************/
/* Date     Initials Description                                             */
/* -------- -------- ------------------------------------------------------- */
/* 20160420 Reo      Initial.                                                */
/*****************************************************************************/

/*****************************************************************************/
/* The following SET command will suppress the creation of the _SUCCESS file */
/* in the output directory.                                                  */
/*****************************************************************************/
SET mapreduce.fileoutputcommitter.marksuccessfuljobs false;

/*****************************************************************************/
/* Read in the data using a comma (,) as the delimiter.                      */
/*****************************************************************************/
DVDData = LOAD '/user/hduser/dvd/KeyValuePair.txt' USING PigStorage(',')
          AS (
               DVDName:chararray,
               AttributeName:chararray,
               AttributeValue:chararray
             );

/*****************************************************************************/
/* Time to STORE the data that was just read in.                             */
/*****************************************************************************/
STORE DVDData INTO '/user/hduser/dvd/out';
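The script's header comment notes that the run fails when the output directory already exists. One way to make reruns repeatable is a small wrapper that clears the directory first. This is a minimal sketch, assuming the script is saved as Recipe002.pig in the current directory and that hdfs and pig are on the PATH; the function name run_recipe is hypothetical.

```shell
#!/bin/sh
# Sketch only: paths and script name are taken from the recipe;
# the function name run_recipe is hypothetical.
SRC="/user/hduser/dvd/KeyValuePair.txt"
OUT_DIR="/user/hduser/dvd/out"
PIG_SCRIPT="Recipe002.pig"

run_recipe() {
    # Stop early if the source file is missing (-test -e: path exists).
    if ! hdfs dfs -test -e "$SRC"; then
        echo "source not found: $SRC" >&2
        return 1
    fi

    # Remove a leftover output directory so STORE does not fail
    # (-test -d: path exists and is a directory).
    if hdfs dfs -test -d "$OUT_DIR"; then
        hdfs dfs -rm -r "$OUT_DIR" || return 1
    fi

    # Run the script in mapreduce mode, then list what it wrote.
    pig -x mapreduce "$PIG_SCRIPT" || return 1
    hdfs dfs -ls "$OUT_DIR"
}
```

Invoking run_recipe performs the cleanup, the Pig run, and the listing in one step. Note that the cleanup deletes the previous output, so it should only be used when that output is disposable.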