Saturday 15 September 2012

Variable selection in R using kNN


I have a data frame of 72 objects with 592 variables and a factor class variable (593 variables in total, so the data frame is 72 x 593). I am looking for a way to select the optimum value of k using the receiver operating characteristic (ROC), and to select 7 variables (including the class variable). I want to use these seven variables for graphical model analysis, but I do not want to select them at random. I want my selection to be statistically sound.

I would like my result to look something like this:

Variables V23, V120, V230, V333, V496, V585, V593 were selected on the basis of the highest ROC value.

I want to classify and select the "best" predictor variables with high accuracy so that I can use them for graphical modeling.

I have tried using the caret package, but I do not know how to manipulate it to select the high-accuracy variables (columns), which can then be used for other analysis.

Thanks, guys. I'm sure someone will understand what I mean.

Thank you.

kutex.

I would do something like this:

  library(pROC)

  #' Select the top N variables by ROC analysis
  #' @param response the name of the response (class) variable
  #' @param predictors the names of the candidate predictor variables
  #' @param data the data frame containing the response and predictor columns
  #' @param n the number of variables to select
  select.top.N.ROC <- function(response, predictors, data, n) {
    n <- min(n, length(predictors))
    # AUC of each predictor, taken on its own, against the response
    aucs <- sapply(predictors, function(predictor) {
      auc(data[[response]], data[[predictor]])
    })
    return(predictors[order(aucs, decreasing = TRUE)][1:n])
  }

  # "Class" is a placeholder: use the name of your factor class column
  top.variables <- select.top.N.ROC("Class", paste("V", 1:593, sep = ""), myDataFrame, 7)
  cat(paste("Variables", paste(top.variables, collapse = ", "),
            "were selected on the basis of the highest ROC value."))

Note that with any kind of univariate feature selection method, you could end up selecting 7 fully correlated variables that give you no additional information, in which case choosing V23 alone would be enough. For multivariate datasets, you should consider using a multivariate feature selection method instead.
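
To make the redundancy problem concrete, here is a minimal base-R sketch on a small made-up data set. Everything in it is an assumption for illustration: the toy columns x1 to x4, the helper names auc_rank and pick_top, and the 0.9 correlation cutoff. The AUC is computed from the rank-sum statistic so that the example does not need pROC, and the correlation filter is only one simple way to skip redundant variables, not a full multivariate selection method.

```r
# Toy data (invented): x2 is an exact linear copy of x1, x3 carries
# weaker but different information, x4 is uninformative.
cls <- factor(rep(c("a", "b"), each = 6))
x1 <- c(1, 2, 3, 4, 5, 7,   6, 8, 9, 10, 11, 12)
x2 <- 2 * x1 + 1
x3 <- c(2, 5, 1, 8, 3, 6,   7, 4, 10, 9, 12, 11)
x4 <- c(1, 12, 3, 10, 5, 8,  2, 11, 4, 9, 6, 7)
dat <- data.frame(cls, x1, x2, x3, x4)

# AUC from the Mann-Whitney rank-sum statistic (two-class response assumed).
# Values below 0.5 are flipped so the direction of the effect does not matter.
auc_rank <- function(response, x) {
  pos <- response == levels(response)[2]
  n_pos <- sum(pos)
  n_neg <- sum(!pos)
  a <- (sum(rank(x)[pos]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
  max(a, 1 - a)
}

# Walk down the AUC ranking, skipping any variable whose absolute
# correlation with an already kept variable exceeds max_cor.
pick_top <- function(ranked, data, n, max_cor = 0.9) {
  kept <- character(0)
  for (p in ranked) {
    if (length(kept) >= n) break
    redundant <- any(vapply(kept, function(k) {
      abs(cor(data[[p]], data[[k]])) > max_cor
    }, logical(1)))
    if (!redundant) kept <- c(kept, p)
  }
  kept
}

predictors <- c("x1", "x2", "x3", "x4")
aucs <- sapply(predictors, function(p) auc_rank(dat$cls, dat[[p]]))
ranked <- predictors[order(aucs, decreasing = TRUE)]

top_univariate <- ranked[1:2]               # the duplicated pair x1 and x2
top_filtered <- pick_top(ranked, dat, 2)    # one of the pair, plus x3
```

The plain top-2 ranking spends both slots on the same information, while the filtered version keeps one member of the duplicated pair and moves on to x3. The cutoff is a judgment call and would need tuning on real data.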
