Sunday 15 January 2012

Iterate awk function for every unique field in column


I have written an awk script to analyze my tabular data. In it I calculate a p-value and a log2 odds ratio.
This is an example of a data table I have.

 Label   Value1  Value2
 Label1  9       6
 Label1  7       6
 Label1  1       6
 Label2  5       7
 Label2  3       7
 Label2  8       7

For each label (Label1/Label2), I count how often Value1 > Value2 and divide that count by the total number of rows for the label; this gives me a p-value. In addition, I compute the log2 ratio of Value2 to the mean of Value1.
This is my awk script.

  awk '{a[$1] = $1}; ($2 >= $3) {c++}; {sum += $2} END {print c/NR, log($3/(sum/NR))/log(2), a[$1]}'

The result I get is:

 0.666667 0.0824622 Label1  

Column 1 is the p-value, column 2 is the log2 odds ratio, and column 3 is the label.
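As a quick check on these numbers, the arithmetic for Label1 can be reproduced by hand. This is only a sketch: the function name `label1_check` is made up for illustration, and the constants are read straight off the sample table (two of the three Label1 rows have Value1 >= Value2, the Value1 entries sum to 17, and Value2 is 6).

```shell
# Hypothetical helper: recompute the Label1 figures from the sample table.
label1_check() {
    awk 'BEGIN {
        c = 2; n = 3        # rows with Value1 >= Value2, and total Label1 rows
        sum = 9 + 7 + 1     # sum of Value1 over the Label1 rows
        v = 6               # Value2 for Label1
        # ratio of hits, then log2(Value2 / mean(Value1))
        printf "%.6f %.6f\n", c / n, log(v / (sum / n)) / log(2)
    }'
}

label1_check    # prints: 0.666667 0.082462
```

The second figure is log2(6 / (17/3)), which agrees with the 0.0824622 reported above.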

The problem is that I do not know how to apply this calculation to both labels; I only get a result for the first one.

My question is: how do I iterate this kind of awk function over each unique field in column 1 (Label1/Label2)?

I assume there are two lines before the first line of data, so I compare NR with 3. The program saves the previous label name ($1), and only when it changes ($1 != label) does it calculate and print. The other condition (NR >= 3) accumulates the data for that label on each line processed.

  awk '
  NR == 3 { label = $1 }            # remember the first label
  NR >= 3 && $1 != label {          # label changed: report the previous group
      printf "%.6f %.6f %s\n", c / l, log(v / (sum / l)) / log(2), label
      c = l = sum = 0
      label = $1
  }
  NR >= 3 {                         # accumulate counts for the current group
      if ($2 >= $3) c++
      l++
      sum += $2
      v = $3
  }
  END { printf "%.6f %.6f %s\n", c / l, log(v / (sum / l)) / log(2), label }
  ' info

This yields:

  0.666667 0.082462 Label1
  0.333333 0.392317 Label2
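The script above relies on all rows for a label being adjacent. If that cannot be guaranteed, one possible alternative is to key every counter by label in awk's associative arrays and print everything in the END block. The following is a sketch under stated assumptions, not the method from the answer: the function name `per_label_stats` and the `skip` variable are hypothetical, two lines are skipped before the data as assumed above, and Value2 is taken to be constant per label (the last value seen is used). Because the order of a `for (k in array)` loop is unspecified in awk, the output is piped through `sort`.

```shell
# Hypothetical helper: per-label statistics in one pass, any row order.
per_label_stats() {
    awk -v skip=2 '
    NR > skip {
        n[$1]++                     # rows seen for this label
        sum[$1] += $2               # running sum of Value1
        if ($2 >= $3) c[$1]++       # rows where Value1 >= Value2
        v[$1] = $3                  # last Value2 seen for this label
    }
    END {
        for (k in n)
            printf "%.6f %.6f %s\n", c[k] / n[k], log(v[k] / (sum[k] / n[k])) / log(2), k
    }' "$@" | sort -k3,3
}

# Sample data fed on stdin, with a header and a blank line before the rows.
printf '%s\n' 'Label Value1 Value2' '' \
    'Label1 9 6' 'Label1 7 6' 'Label1 1 6' \
    'Label2 5 7' 'Label2 3 7' 'Label2 8 7' | per_label_stats
```

On the sample table this should print the same two lines as the script above. The trade-off is that nothing is printed until the whole input has been read, and the arrays hold one entry per distinct label.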
