I have a set of integer values that I want to group into a number of bins.
Example: say I have 1000 points between 1 and 1000, and I want to put them into 20 bins.
Is there any way to group them into bins/arrays?
Also, I will not know ahead of time how wide the range will be, so I cannot hardcode any specific values.
If you have the minimum and maximum, you can divide the range by the number of bins. For example:

    -- foo.pig
    ids = load '$INPUT' as (id: int);
    -- integer arithmetic: maps each id to a bin number in 0 .. BIN_COUNT - 1
    ids_with_key = foreach ids generate (id - $MIN) * $BIN_COUNT / ($MAX - $MIN + 1) as bin_id, id;
    group_by_id = group ids_with_key by bin_id;
    bin_id = foreach group_by_id generate group, flatten(ids_with_key.id);
    dump bin_id;

Then you can use the following command to run it:

    pig -f foo.pig -p MIN=1 -p MAX=1000 -p BIN_COUNT=20 -p INPUT=your_input_path

The idea behind the script is that we divide the range [MIN, MAX] by BIN_COUNT to get the size of every bin: (MAX - MIN + 1) / BIN_COUNT, which we call BIN_SIZE. Then we map each id to its bin number, (id - MIN) / BIN_SIZE, and group by that number. The script multiplies by BIN_COUNT before dividing, so integer division does not truncate BIN_SIZE first.
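To check the bin arithmetic without a Pig cluster, here is a minimal Python sketch of the same formula. It is not part of the original answer: the function names `bin_id` and `group_into_bins` are hypothetical, and taking the minimum and maximum from the data itself is an assumption made here for the case where the range is not known ahead of time. In the Pig script those values are still passed in as parameters.

```python
def bin_id(value, min_value, max_value, bin_count):
    """Map an integer in [min_value, max_value] to a bin number in 0 .. bin_count - 1."""
    span = max_value - min_value + 1  # how many distinct integers the range holds
    # Multiply before dividing, as the Pig script does. For value >= min_value,
    # Python's floor division gives the same result as truncating integer division.
    return (value - min_value) * bin_count // span


def group_into_bins(values, bin_count):
    """Group values by bin number, taking the range from the data (an assumption)."""
    lo, hi = min(values), max(values)
    bins = {}
    for v in values:
        bins.setdefault(bin_id(v, lo, hi, bin_count), []).append(v)
    return bins


points = list(range(1, 1001))       # the 1000 points from the question
bins = group_into_bins(points, 20)  # 20 bins, as in the example command
print(len(bins))                    # number of non-empty bins
print(bins[0][0], bins[0][-1])      # first and last value in bin 0
```

With the question's data (1 to 1000, 20 bins) each bin holds 50 consecutive values, so bin 0 covers 1 to 50 and bin 19 covers 951 to 1000.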