Pig Split command


============SPLIT==================

Emp1 = LOAD '/user/cloudera/N_EMP1.txt' USING PigStorage(',') as ( sno:int , name:chararray , role:chararray , salary:chararray , company:chararray , exp:int);
Emp2 = LOAD '/user/cloudera/N_EMP2.txt' USING PigStorage(',') as ( sno:int , name:chararray , role:chararray , salary:int , company:chararray , exp:int);

The SPLIT operator is used to split a relation into two or more relations.

SPLIT Emp1 into X if (exp > 3), Y if (exp < 3);
dump X;

==========CROSS========
c = CROSS Emp1,Emp2;
 dump c;

Join:
TXN = LOAD '/user/cloudera/txns.txt' USING PigStorage(',')  as (txnno:INT, txndate:CHARARRAY, custno:INT, amount:INT, category:CHARARRAY, product:CHARARRAY, city:CHARARRAY, state:CHARARRAY, spendby:CHARARRAY);
CUTS = LOAD '/user/cloudera/custs' USING PigStorage(',')  as (cusid:int, firstname: CHARARRAY, lastname: CHARARRAY, age:int,profession:CHARARRAY);

J = Join TXN by custno, CUTS by cusid;
L = Limit J 20;
DUMP L;

·       Left Outer Joins:
A left outer join means records from the left side willbe included even when they do not have a match on the right side.
LJ= JOIN F by custno LEFT OUTER, F1 by cusid;
LJ1= LIMIT LJ 15;
DUMP LJ1;

·       Right Outer Join:
A right outer joins means records from the right side will be included even when they do nothave a match on the left side.
RJ= JOIN F by custno RIGHT OUTER, F1 by cusid;
RJ1= LIMIT RJ 15;
DUMP RJ1;
·       Full Outer Join:
A full outer join means records from both sides are taken even when they do not have matches.
FJ = JOIN F by custno FULL OUTER , F1 by cusid;
DUMP FJ;
·       Self Join:
Self joins are supported, though the data must be loaded twice.
trans = LOAD '/user/cloudera/txns.txt' AS (txnno:chararray,txndate:chararray,custno:int,amount:chararray,category:chararray,product:chararray,city:chararray,state:chararray,spendby:chararray);
trans1 = LOAD '/user/cloudera/txns.txt' AS (txnno:chararray,txndate:chararray,custno:int,amount:chararray,category:chararray,product:chararray,city:chararray,state:chararray,spendby:chararray);
SF = JOIN trans by custno, trans1 by custno;
SF1= LIMIT SF 15;
DUMP SF1;

Comments

Popular posts from this blog

Hadoop

Problem Statement Of Real Estate Use Cases

Problem Statement Of Bank Marketing analysis