Saturday 15 January 2011

apache pig - Avoiding prefixes in a multi-relation join in Pig


I am trying to do a star-schema type of join in Pig, and below is my code. When I join multiple relations with different columns, I have to prefix the name of the previous join every time to make it work. I am sure there must be a better way, but I have not been able to find it by googling; any pointers would be very useful.

That is, prefixing a column like "H864::H86::hs_8_d::hs_8_desc" is what I want to avoid.

    hs_8 = LOAD 'hs_8_distinct' USING PigStorage('^') AS (hs_8:chararray, hs_8_desc:chararray);
    hs_8_d = FOREACH hs_8 GENERATE
        SUBSTRING(hs_8,0,2) AS hs_2,
        SUBSTRING(hs_8,0,4) AS hs_4,
        SUBSTRING(hs_8,0,6) AS hs_6,
        hs_8, hs_8_desc;

    hs_6_d = LOAD 'hs_6_distinct' USING PigStorage('^') AS (hs_6:chararray, hs_6_desc:chararray);
    hs_4_d = LOAD 'hs_4_distinct' USING PigStorage('^') AS (hs_4:chararray, hs_4_desc:chararray);
    hs_2_d = LOAD 'hs_2_distinct' USING PigStorage('^') AS (hs_2:chararray, hs_2_desc:chararray);

    H86 = JOIN hs_8_d BY hs_6, hs_6_d BY hs_6 USING 'replicated';
    H864 = JOIN H86 BY hs_8_d::hs_4, hs_4_d BY hs_4 USING 'replicated';
    H8642 = JOIN H864 BY H86::hs_8_d::hs_2, hs_2_d BY hs_2 USING 'replicated';

    hs_dim = FOREACH H8642 GENERATE
        hs_2_d::hs_2, hs_2_d::hs_2_desc,
        H864::hs_4_d::hs_4, H864::hs_4_d::hs_4_desc,
        H864::H86::hs_6_d::hs_6, H864::H86::hs_6_d::hs_6_desc,
        H864::H86::hs_8_d::hs_8, H864::H86::hs_8_d::hs_8_desc;

You can simplify the aliases by adding an additional FOREACH to each join. According to the plan, this does not add an additional MR job to the pipeline: both the original script and this version are executed as 4 map-only jobs.

For example:

    H86 = FOREACH (JOIN hs_8_d BY hs_6, hs_6_d BY hs_6 USING 'replicated') GENERATE
        hs_8_d::hs_2 AS x1, hs_8_d::hs_4 AS x2, hs_8_d::hs_6 AS x3,
        hs_8_d::hs_8 AS x4, hs_8_d::hs_8_desc AS x5,
        hs_6_d::hs_6 AS x6, hs_6_d::hs_6_desc AS x7;

    H864 = FOREACH (JOIN H86 BY x2, hs_4_d BY hs_4 USING 'replicated') GENERATE
        H86::x1 AS y1, H86::x2 AS y2, H86::x3 AS y3, H86::x4 AS y4,
        H86::x5 AS y5, H86::x6 AS y6, H86::x7 AS y7,
        hs_4_d::hs_4 AS y8, hs_4_d::hs_4_desc AS y9;

    H8642 = FOREACH (JOIN H864 BY y1, hs_2_d BY hs_2 USING 'replicated') GENERATE
        H864::y1 AS z1, H864::y2 AS z2, H864::y3 AS z3, H864::y4 AS z4,
        H864::y5 AS z5, H864::y6 AS z6, H864::y7 AS z7,
        H864::y8 AS z8, H864::y9 AS z9,
        hs_2_d::hs_2 AS z10, hs_2_d::hs_2_desc AS z11;

    hs_dim = FOREACH H8642 GENERATE z10, z11, z8, z9, z6, z7, z4, z5;

If you have a bag of tuples, it can be useful.
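To check the field names and the job count yourself, Pig's diagnostic operators can be run against the final relation. This is only a minimal sketch: it assumes the relations from the script above are already defined in the same Grunt session or script, and the number of jobs reported may differ with your Pig version and settings.

```pig
-- Assumes hs_dim has been defined as in the script above.

-- Print the schema of the final relation, which shows whether the
-- fields still carry the long H864::H86::... prefixes.
DESCRIBE hs_dim;

-- Print the logical, physical and MapReduce plans; the MapReduce
-- plan shows how many jobs the script compiles to.
EXPLAIN hs_dim;
```

Running both against the original script and the renamed version is one way to compare the two approaches side by side.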
