Friday 15 May 2015

How to use random forests in R for classification to decide whether the value of a column is less than or greater than a value N?


I have used random forests in R for classification where the target column takes a few discrete values (for example 0 or 1). For example, for the iris dataset, we can use a random forest to classify the data by species:

  myRF <- randomForest(Species ~ ., data = iris, importance = TRUE, proximity = TRUE)

This makes sense because Species can only take certain values. But what if Species could take values from 1 to 100, and I wanted to classify the data into two categories: rows whose value is greater than 50 and rows whose value is less than 50?

Of course, I could add another column whose value is 1 or 0 depending on Species, and then classify on that new column instead of Species. But is there a way to tell the random forest directly that we want two categories: one where Species is less than 50 and one where it is greater than 50 (assuming these new imaginary values for Species)?
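
The extra-column approach described above can be sketched as follows. This is only an illustration under stated assumptions: the data frame `toy`, the predictors `x1` and `x2`, and the numeric column `score` (standing in for the imaginary 1 to 100 values) are all hypothetical, and rows with a value of exactly 50 are placed in the lower group because the question does not say where they belong.

```r
library(randomForest)

set.seed(42)  # reproducible toy data and forest

# Hypothetical data: `score` stands in for a numeric column ranging from 1 to 100.
n <- 200
toy <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
toy$score <- pmin(100, pmax(1, round(50 + 20 * toy$x1 + rnorm(n, sd = 5))))

# Derive a two-level factor. randomForest() fits a classification forest
# only when the response is a factor.
toy$above50 <- factor(toy$score > 50,
                      levels = c(FALSE, TRUE),
                      labels = c("le50", "gt50"))

# `score` is left out of the predictors; otherwise the forest could read
# the answer straight from it.
rf_col <- randomForest(above50 ~ x1 + x2, data = toy, importance = TRUE)

print(rf_col$type)
print(rf_col$confusion)
```

Converting the logical comparison to a factor matters here: without it the response is not a factor and the forest would be fitted as a regression rather than a classification.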

Thanks

You can put the condition directly on the left-hand side of the formula:

  myRF <- randomForest(Species < 50 ~ ., ...)

which

  1. is actually no different from defining a new variable that contains Species < 50, but avoids modifying your dataset;

  2. only makes sense if Species is a continuous variable rather than a categorical one (i.e., if numbering the species in this way is meaningful).
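
A minimal sketch of the inline variant, again on hypothetical data (`toy`, `x1`, `x2`, and `score` are made up for illustration). Two details here are additions rather than part of the answer: the comparison is wrapped in `factor()` so that the forest is fitted as a classifier, and the predictors are listed explicitly instead of using `.`, so the sketch does not depend on how `.` is expanded.

```r
library(randomForest)

set.seed(1)  # reproducible toy data and forest

# Hypothetical data with a numeric column `score` between 1 and 100.
n <- 200
toy <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
toy$score <- pmin(100, pmax(1, round(50 + 20 * toy$x1 + rnorm(n, sd = 5))))

# The threshold is computed inside the formula, so `toy` gains no new column.
rf_inline <- randomForest(factor(score < 50) ~ x1 + x2, data = toy)

print(rf_inline$type)
print(names(toy))  # still only x1, x2, score
```

The class labels of the fitted forest are then "FALSE" and "TRUE", the two levels produced by `factor()` on a logical vector.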

    In the more general case, where you want to predict whether a factor takes one of a subset of its values, you can use

      randomForest(Species %in% c("level1", "level2", ...) ~ ., ...)
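
As an illustration of the subset-of-levels case, the sketch below uses the built-in `iris` data and asks whether a flower belongs to either of two species. The choice of "setosa" and "versicolor" is arbitrary, and wrapping the test in `factor()` and naming the predictors explicitly are assumptions made here so that the result is a classification forest.

```r
library(randomForest)

set.seed(7)  # reproducible forest

# Response: TRUE if the species is one of the two chosen levels, FALSE otherwise.
rf_subset <- randomForest(
  factor(Species %in% c("setosa", "versicolor")) ~
    Sepal.Length + Sepal.Width + Petal.Length + Petal.Width,
  data = iris
)

print(rf_subset$type)
print(rf_subset$confusion)
```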
