Wednesday 15 August 2012

bash - Sum numbers from one column when string in second column repeats


I am trying to figure out a way to sum the numbers in the first column for as long as the string in the second column repeats.

My file looks like this:

  0.35 Scer|chrIX|ref|NC_001141.1|
  0.21 Scer|chrIX|ref|NC_001141.1|
  0.40 Scer|chrIX|ref|NC_001141.1|
  0.27 Scer|chrIX|ref|NC_001141.1|
  0.26 Scer|chrIX|ref|NC_001141.1|
  0.20 Scer|chrIX|ref|NC_001141.1|
  1.22 Scer|chrI|ref|NC_001133.7|
  0.08 Scer|chrI|ref|NC_001133.7|
  0.55 Scer|chrVIII|ref|NC_001140.5|
  0.07 Scer|chrVIII|ref|NC_001140.5|
  0.17 Scer|chrVIII|ref|NC_001140.5|

I would like an output file that contains each distinct string from the second column, followed by the sum of the first-column values for that string:

  Scer|chrIX|ref|NC_001141.1|
  1.69
  Scer|chrI|ref|NC_001133.7|
  1.30
  Scer|chrVIII|ref|NC_001140.5|
  0.79

I think it is possible with awk, but I could not come up with the correct answer, nor could I find one in the forums.

Thank you very much in advance

With awk:

  awk '{a[$NF]+=$1} END {for (x in a) printf "%s\n%4.2f\n", x, a[x]}' file

Output with your sample data:
  $ awk '{a[$NF]+=$1} END {for (x in a) printf "%s\n%4.2f\n", x, a[x]}' file
  Scer|chrVIII|ref|NC_001140.5|
  0.79
  Scer|chrIX|ref|NC_001141.1|
  1.69
  Scer|chrI|ref|NC_001133.7|
  1.30
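The next sketch is a commented version of the same associative-array idea, with one change that is not part of the answer above. An extra `order` array (a name chosen here for illustration) records each key the first time it appears, so the sums are printed in input order even when identical keys are not on adjacent lines. The helper names `sum_by_key` and `sample_data` are hypothetical. The sample lines assume that each input line is a number followed by a single pipe-delimited string with no spaces.

```shell
#!/bin/sh
# sum_by_key (hypothetical name): reads "number key" lines on stdin and prints
# each key followed by the sum of its numbers, in first-appearance order.
sum_by_key() {
  awk '
    !($NF in sum) { order[++n] = $NF }   # first time this key is seen: remember its position
    { sum[$NF] += $1 }                   # add column 1 to the running total for this key
    END {
      for (i = 1; i <= n; i++)           # walk keys in the order they first appeared
        printf "%s\n%4.2f\n", order[i], sum[order[i]]
    }
  '
}

# sample_data (hypothetical name): the sample input from the question.
sample_data() {
  cat <<'EOF'
0.35 Scer|chrIX|ref|NC_001141.1|
0.21 Scer|chrIX|ref|NC_001141.1|
0.40 Scer|chrIX|ref|NC_001141.1|
0.27 Scer|chrIX|ref|NC_001141.1|
0.26 Scer|chrIX|ref|NC_001141.1|
0.20 Scer|chrIX|ref|NC_001141.1|
1.22 Scer|chrI|ref|NC_001133.7|
0.08 Scer|chrI|ref|NC_001133.7|
0.55 Scer|chrVIII|ref|NC_001140.5|
0.07 Scer|chrVIII|ref|NC_001140.5|
0.17 Scer|chrVIII|ref|NC_001140.5|
EOF
}

sample_data | sum_by_key
```

The test `!($NF in sum)` only checks membership and does not create an element. Each key is therefore appended to `order` exactly once, just before the first `sum[$NF] += $1` creates it.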

The order in which `for (x in a)` visits the keys is unspecified. If the output is required in input order, and identical strings are always on adjacent lines:
  awk 'seen==$2 {cnt+=$1; next} flag {printf "%s\n%4.2f\n", seen, cnt; flag=0} {seen=$2; cnt=$1; flag=1} END {printf "%s\n%4.2f\n", seen, cnt}' file
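The in-order command above can be spread over several lines with comments, as in the sketch below. The function name `sum_adjacent` and the keys `a` and `b` are hypothetical. The example input deliberately repeats `a` after a `b` line. This shows the assumption the approach relies on: only adjacent lines with the same key are merged, so a key that reappears later starts a new group.

```shell
#!/bin/sh
# sum_adjacent (hypothetical name): the in-order awk answer, one rule per line.
# Assumes lines with the same key in column 2 are adjacent.
sum_adjacent() {
  awk '
    seen == $2 { cnt += $1; next }                      # same key as the previous line: keep adding
    flag { printf "%s\n%4.2f\n", seen, cnt; flag = 0 }  # flag is unset only before the first line, so this prints the finished group
    { seen = $2; cnt = $1; flag = 1 }                   # start a new group with the current line
    END { printf "%s\n%4.2f\n", seen, cnt }             # print the last group
  '
}

# Hypothetical input: "a" reappears after "b", so it is reported twice.
printf '%s\n' '1.0 a' '2.0 a' '0.5 b' '3.0 a' | sum_adjacent
```

If the input is not already grouped, sorting it on the second column first (for example with `sort -k2,2 file`) makes identical keys adjacent, at the cost of losing the original line order.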
