Sunday, 15 January 2012

r - lawstat runs.test valid for small samples?


This is both an R programming question and a statistics question. From my experiments, it seems that the runs.test function in the R package lawstat gives very strange results for small samples. Can anyone confirm, refute and/or explain? My reasoning is as follows.

My test data count the patents issued to a firm in a technology class in each of 15 years.

  testpats <- c(2,1,2,0,1,4,1,1,2,4,2,6,1,3,3)

Running runs.test() on testpats with plotting enabled first produces the following plot (I am not allowed to post images, so I am improvising):

  BBBBABBBAAABAA

According to the documentation, "The observations which are less than the sample median are represented by letter "A", and the observations which are greater or equal to the sample median are represented by letter "B"."

The median of testpats is 2. Therefore, if the documentation were correct, the plot should look like this:

  BABAABAABBBBABB
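The expected labelling can be checked directly from the data. This is a minimal sketch in base R, assuming the documented rule is applied literally (below the median gives "A", otherwise "B"). The names `med` and `labels` are illustrative and are not taken from lawstat.

```r
testpats <- c(2, 1, 2, 0, 1, 4, 1, 1, 2, 4, 2, 6, 1, 3, 3)

med <- median(testpats)  # sample median of the 15 yearly counts

# Apply the documented rule: "A" below the median, "B" at or above it
labels <- ifelse(testpats < med, "A", "B")
paste(labels, collapse = "")
```

Comparing this string with the letters on the plot shows at which positions the function disagrees with its own documentation.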

Obviously this is very different, so I do not know what runs.test is using for the "sample median".

Second, the test statistic given by the output

  Runs Test - Positive Correlated
  data: testpats
  Standardized Runs Statistic = -0.4877, p-value = 0.3129

does not match what I get when I calculate it by hand:

  mymid <- median(testpats)
  runsdata <- ifelse(testpats >= mymid, 1, -1)
  n1 <- length(which(runsdata > 0))  # number of values above or equal to the median
  n2 <- length(which(runsdata < 0))  # number of values below the median
  sR2 <- (2*n1*n2*(2*n1*n2 - n1 - n2)) / ((n1 + n2)^2 * (n1 + n2 - 1))  # variance of the number of runs
  Rbar <- (2*n1*n2)/(n1 + n2) + 1  # expected number of runs
  R <- 9  # number of runs observed -- how do I automate this?
  Z <- (R - Rbar)/sR2  # runs test statistic
  Z

  [1] 0.2508961   
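On automating the run count: one possible approach, sketched here in base R, is to let `rle()` collapse consecutive equal values and count the resulting runs. The hand calculation divides by `sR2`, which by its formula is a variance, whereas the usual z-statistic divides by its square root, so the sketch reports both. The names `above`, `varR`, `Z_var` and `Z_sd` are illustrative.

```r
testpats <- c(2, 1, 2, 0, 1, 4, 1, 1, 2, 4, 2, 6, 1, 3, 3)

# Dichotomise at the median, as in the hand calculation
above <- testpats >= median(testpats)
n1 <- sum(above)    # observations at or above the median
n2 <- sum(!above)   # observations below the median

# rle() collapses consecutive equal values, so the number of runs
# is the number of run lengths it returns
R <- length(rle(above)$lengths)

Rbar <- 2 * n1 * n2 / (n1 + n2) + 1   # expected number of runs
varR <- (2 * n1 * n2 * (2 * n1 * n2 - n1 - n2)) /
  ((n1 + n2)^2 * (n1 + n2 - 1))       # variance of the number of runs

Z_var <- (R - Rbar) / varR         # divides by the variance, as in the hand calculation
Z_sd  <- (R - Rbar) / sqrt(varR)   # divides by the standard deviation
c(R = R, Z_var = Z_var, Z_sd = Z_sd)
```

With this data the automated count agrees with the hard-coded R of 9. Neither version of the statistic comes out negative, so the scaling alone does not explain the -0.4877 reported by runs.test.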

Alternatively, I can use the small-sample version of the test. The small-sample method uses only the number of observations above and below the median and the number of runs.

n1 = 5; n2 = 6; R = 9

The one-sided p-value should be 0.976.
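The small-sample figure can be reproduced from the exact distribution of the number of runs. This sketch uses the standard combinatorial formulas for two kinds of symbols; `druns` is a hypothetical helper written for this illustration and is not part of lawstat.

```r
# Exact probability of observing exactly r runs, given n1 and n2
# (hypothetical helper, not a lawstat function)
druns <- function(r, n1, n2) {
  k <- r %/% 2
  total <- choose(n1 + n2, n1)  # number of distinct arrangements
  if (r %% 2 == 0) {
    # even r: k runs of each kind, starting with either kind
    2 * choose(n1 - 1, k - 1) * choose(n2 - 1, k - 1) / total
  } else {
    # odd r: one kind has k + 1 runs, the other has k
    (choose(n1 - 1, k) * choose(n2 - 1, k - 1) +
       choose(n1 - 1, k - 1) * choose(n2 - 1, k)) / total
  }
}

n1 <- 5; n2 <- 6; R <- 9
# Lower-tail probability P(runs <= R); the smallest possible number of runs is 2
p_lower <- sum(sapply(2:R, druns, n1 = n1, n2 = n2))
round(p_lower, 3)
```

For n1 = 5, n2 = 6 and R = 9 the lower-tail sum is 451/462, which rounds to the 0.976 quoted above.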

Again, this is not even close to the number produced by runs.test().

So, what gives? Am I using runs.test() completely wrongly? I have tried using the function after converting the data to up/down indicators (such as 1/-1), and I still get strange results.

I stumbled on the same problem while modeling this in Excel and comparing it with the output of my StatGraphics software. I finally found my "solution" in the StatGraphics documentation. I have written it in R notation (I do not use R at the moment, but I think this is correct):

  Z <- (R - 0.5 - Rbar)/sR2

I still do not know exactly why 0.5 has to be subtracted (or added in some cases), but I think it has something to do with one-sided versus two-sided tests: with +0.5 or -0.5 the test is one-sided (upper or lower), and without the extra term I think it is two-sided.

I do not know whether this is right, but it worked for me.

Edit (from the software documentation): To calculate the probability of seeing at least k runs, use -0.5. To calculate the probability of seeing k runs or fewer, use +0.5.

Edit 2: The + or - 0.5 is a continuity correction. You can observe only 3 or 4 runs, nothing in between. If you calculate the probability of 3 or fewer (as the probability of 3.5 or less) and the probability of 4 or more (as the probability of more than 3.5), the two probabilities sum to 1.
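A short sketch of the correction in R, assuming the usual normal approximation with the standard deviation taken as the square root of the variance formula. The counts n1 = 9 and n2 = 6 are what the question's hand calculation yields for testpats, and the variable names are illustrative.

```r
# Counts from the question's hand calculation: 9 values at or above the median, 6 below
n1 <- 9; n2 <- 6; R <- 9

mu <- 2 * n1 * n2 / (n1 + n2) + 1              # expected number of runs
sigma <- sqrt((2 * n1 * n2 * (2 * n1 * n2 - n1 - n2)) /
                ((n1 + n2)^2 * (n1 + n2 - 1)))  # standard deviation of the number of runs

# P(runs <= R): add 0.5 before standardising
p_at_most <- pnorm((R + 0.5 - mu) / sigma)
# P(runs >= R): subtract 0.5 before standardising
p_at_least <- pnorm((R - 0.5 - mu) / sigma, lower.tail = FALSE)

# Complementary events share the same cut point (3.5), so they sum to 1
p_le_3 <- pnorm((3 + 0.5 - mu) / sigma)
p_ge_4 <- pnorm((4 - 0.5 - mu) / sigma, lower.tail = FALSE)
c(p_at_most = p_at_most, p_at_least = p_at_least, total = p_le_3 + p_ge_4)
```

The direction of the 0.5 shift therefore depends on which tail is being computed, not on whether the test is one-sided or two-sided.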
