Aryan Prajapat, Knowledge Contributor
List and define the various approaches to estimating model accuracy in R.
Below are several approaches and how to implement them with the caret package in R.
Data splitting: the dataset is split into a training set and a test set. The first is used to fit the model; the second is used to evaluate its performance on unseen data. This approach works particularly well with large datasets. To implement data splitting in R, use the createDataPartition() function and set the p parameter to the proportion of data that goes to training.
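A minimal sketch of data splitting with caret, using the built-in iris dataset and an rpart decision tree purely as examples (any dataset and model method would work the same way):

```r
library(caret)

set.seed(123)  # for a reproducible split

# p = 0.8 sends 80% of rows to training; list = FALSE returns a plain index vector
train_idx <- createDataPartition(iris$Species, p = 0.8, list = FALSE)
training <- iris[train_idx, ]
testing  <- iris[-train_idx, ]

# Fit on the training set, then estimate accuracy on the held-out test set
model <- train(Species ~ ., data = training, method = "rpart")
predictions <- predict(model, newdata = testing)
confusionMatrix(predictions, testing$Species)
```

createDataPartition() samples within each class of the outcome, so the class proportions in the training and test sets stay close to those of the full dataset.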
Bootstrap resampling: random samples are drawn from the dataset with replacement, and the model is fit and evaluated on each sample. These resampling iterations are repeated many times. To implement bootstrap resampling in R, set the method parameter of the trainControl() function to "boot" when defining the training control of the model.
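A sketch of bootstrap resampling via trainControl(); the iris data, the rpart model, and 100 iterations are illustrative choices, not requirements:

```r
library(caret)

# method = "boot" draws samples with replacement; number sets how many resamples
ctrl <- trainControl(method = "boot", number = 100)

set.seed(123)
model <- train(Species ~ ., data = iris, method = "rpart", trControl = ctrl)
print(model)  # reports accuracy averaged over the bootstrap resamples
```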
Cross-validation methods
k-fold cross-validation: the dataset is split into k subsets. The model is trained on k-1 subsets and tested on the remaining one. The process is repeated so that each subset serves once as the test set, and the final model accuracy is the average over the k runs.
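In caret this is method = "cv" in trainControl(). A sketch with 10 folds (a common default; dataset and model are again just examples):

```r
library(caret)

# method = "cv" performs k-fold cross-validation; number is k
ctrl <- trainControl(method = "cv", number = 10)

set.seed(123)
model <- train(Species ~ ., data = iris, method = "rpart", trControl = ctrl)
print(model)  # accuracy averaged over the 10 folds
```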
Repeated k-fold cross-validation: the principle is the same as for k-fold cross-validation, except that the split into k subsets is performed more than once. The model accuracy is estimated for each repetition, and the final accuracy is the average of the accuracy values across all repetitions.
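This corresponds to method = "repeatedcv" in trainControl(), where repeats sets how many times the k-fold procedure is rerun. A sketch with 10 folds repeated 3 times (illustrative values):

```r
library(caret)

# 10-fold cross-validation, repeated 3 times with different fold assignments
ctrl <- trainControl(method = "repeatedcv", number = 10, repeats = 3)

set.seed(123)
model <- train(Species ~ ., data = iris, method = "rpart", trControl = ctrl)
print(model)  # accuracy averaged over all 30 resamples
```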
Leave-one-out cross-validation (LOOCV): one observation is set aside and the model is trained on all the remaining observations. The process is repeated once for every observation in the dataset, and the final accuracy is averaged over all of these runs.
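LOOCV is available in caret as method = "LOOCV". A sketch (note that LOOCV refits the model once per row, so it can be slow on large datasets):

```r
library(caret)

# Each of the n observations is held out once; the model is refit n times
ctrl <- trainControl(method = "LOOCV")

set.seed(123)
model <- train(Species ~ ., data = iris, method = "rpart", trControl = ctrl)
print(model)  # accuracy averaged over all leave-one-out fits
```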