Aryan PrajapatKnowledge Contributor
How to select features for machine learning in R?
How to select features for machine learning in R?
Sign Up to our social questions and Answers Engine to ask questions, answer people’s questions, and connect with other people.
Login to our social questions & Answers Engine to ask questions answer people’s questions & connect with other people.
Lost your password? Please enter your email address. You will receive a link and will create a new password via email.
Please briefly explain why you feel this question should be reported.
Please briefly explain why you feel this answer should be reported.
Please briefly explain why you feel this user should be reported.
Questions | Answers | Discussions | Knowledge sharing | Communities & more.
Let’s consider three different approaches and how to implement them in the caret package.
By detecting and removing highly correlated features from the dataset.
We need to create a correlation matrix of all the features and then identify the highly correlated ones, usually those with a correlation coefficient greater than 0.75:
corr_matrix <- cor(features)
highly_correlated <- findCorrelation(corr_matrix, cutoff=0.75)
print(highly_correlated)
By ranking the data frame features by their importance.
We need to create a training scheme to control the parameters for train, use it to build a selected model, and then estimate the variable importance for that model:
control <- trainControl(method="repeatedcv", number=10, repeats=5)
model <- train(response_variable~., data=df, method="lvq", preProcess="scale", trControl=control)
importance <- varImp(model)
print(importance)
By automatically selecting the optimal features.
One of the most popular methods provided by caret for automatically selecting the optimal features is a backward selection algorithm called Recursive Feature Elimination (RFE).
We need to compute the control using a selected resampling method and a predefined list of functions, apply the RFE algorithm passing to it the features, the target variable, the number of features to retain, and the control, and then extract the selected predictors:
control <- rfeControl(functions=caretFuncs, method="cv", number=10)
results <- rfe(features, target_variable, sizes=c(1:8), rfeControl=control)
print(predictors(results))