I am trying to figure out a way to sum the numbers in the first column for all rows where the second column repeats.
My file looks like this:
    0.35 Scer|ChrIX|Ref|NC_001141.1|
    0.21 Scer|ChrIX|Ref|NC_001141.1|
    0.40 Scer|ChrIX|Ref|NC_001141.1|
    0.27 Scer|ChrIX|Ref|NC_001141.1|
    0.26 Scer|ChrIX|Ref|NC_001141.1|
    0.20 Scer|ChrIX|Ref|NC_001141.1|
    1.22 Scer|ChrI|Ref|NC_001133.7|
    0.08 Scer|ChrI|Ref|NC_001133.7|
    0.55 Scer|ChrVIII|Ref|NC_001140.5|
    0.07 Scer|ChrVIII|Ref|NC_001140.5|
    0.17 Scer|ChrVIII|Ref|NC_001140.5|

And I would like an output file that contains the name from the second column, followed by the sum of the first-column values for that particular string:

    Scer|ChrIX|Ref|NC_001141.1|
    1.69
    Scer|ChrI|Ref|NC_001133.7|
    1.30
    Scer|ChrVIII|Ref|NC_001140.5|
    0.79

I think it is possible with awk, but I could not come up with the correct answer, nor did I find one in the forums. Thank you very much in advance.
With awk:

    awk '{a[$NF]+=$1} END{for (x in a) printf "%s\n%4.2f\n", x, a[x]}' file

Output with your sample data:

    $ awk '{a[$NF]+=$1} END{for (x in a) printf "%s\n%4.2f\n", x, a[x]}' file
    Scer|ChrVIII|Ref|NC_001140.5|
    0.79
    Scer|ChrIX|Ref|NC_001141.1|
    1.69
    Scer|ChrI|Ref|NC_001133.7|
    1.30

The order in which `for (x in a)` visits the keys is unspecified, which is why the groups above do not come out in input order. If the output is required in order (this assumes rows with the same second column are adjacent in the file):

    awk 'seen==$2{cnt+=$1; next} flag{printf "%s\n%4.2f\n", seen, cnt; flag=0} {seen=$2; cnt=$1; flag=1} END{printf "%s\n%4.2f\n", seen, cnt}' file
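If repeated names are not guaranteed to sit on adjacent lines, the two approaches can be combined: keep the sums in an array, and separately record the order in which each name first appears. The following is only a sketch under that assumption; the function name `sum_by_key` and the array names `sum` and `order` are made up for illustration and are not part of the answer above.

```shell
# Hypothetical helper: sum column 1 per distinct value of column 2,
# printing the groups in the order they first appear in the input.
sum_by_key() {
    awk '
        !($2 in sum) { order[++n] = $2 }   # record the first appearance of each name
        { sum[$2] += $1 }                   # accumulate column 1 per name
        END {
            for (i = 1; i <= n; i++)
                printf "%s\n%4.2f\n", order[i], sum[order[i]]
        }
    ' "$@"
}
```

It would be called as `sum_by_key file`. Compared with the `seen`/`flag` version, this keeps every distinct name in memory, which should be fine unless the file has a very large number of distinct names.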