Provide a brief description and examples of each of the following methods of clustering:
Partitioning methods.
Hierarchical methods.
Density-based methods.
Grid-based methods.
Load the soybean diagnosis data set in Weka (found in Weka-3.6/data/soybean.arff), then perform the following:
Build a decision tree by selecting J48 as the classifier and 10-fold cross-validation as the test option. Then fill out the following table:
Correctly Classified Instances | |
Incorrectly Classified Instances | |
Kappa statistic | |
Mean absolute error | |
Root mean squared error | |
Relative absolute error | |
Root relative squared error | |
Total Number of Instances | |
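The 10-fold cross-validation requested above can be illustrated with a minimal sketch in plain Python. It is not Weka's implementation: the function name `k_fold_indices`, the seed, and the instance count in the usage note are hypothetical, and Weka additionally stratifies the folds by class, which this sketch omits.

```python
import random

def k_fold_indices(n_instances, k=10, seed=1):
    """Yield (train_indices, test_indices) once for each of the k folds."""
    indices = list(range(n_instances))
    random.Random(seed).shuffle(indices)
    # Fold i takes every k-th shuffled index starting at position i,
    # so fold sizes differ by at most one instance.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        # Training set = all instances that are not in the held-out fold.
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test
```

For example, `list(k_fold_indices(25, k=10))` produces ten train/test pairs in which every instance is used for testing exactly once. The statistics in the table are computed over the predictions collected from all ten test folds.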
Build a Naïve Bayes classifier, again using 10-fold cross-validation. Then fill out the following table:
Correctly Classified Instances | |
Incorrectly Classified Instances | |
Kappa statistic | |
Mean absolute error | |
Root mean squared error | |
Relative absolute error | |
Root relative squared error | |
Total Number of Instances | |
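As an aid to reading the first three rows of these tables, the sketch below shows how the fraction of correctly classified instances and the Kappa statistic follow from a confusion matrix. The function name `accuracy_and_kappa` and the 2x2 matrix in the usage note are hypothetical illustrations, not output from the soybean data.

```python
def accuracy_and_kappa(confusion):
    """confusion[i][j] = count of instances of actual class i predicted as class j."""
    n_classes = len(confusion)
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[i][i] for i in range(n_classes))
    observed = correct / total  # fraction of correctly classified instances
    # Agreement expected by chance: for each class, the product of its
    # actual share (row total) and its predicted share (column total).
    expected = sum(
        sum(confusion[i]) * sum(row[i] for row in confusion)
        for i in range(n_classes)
    ) / (total * total)
    # Kappa rescales accuracy so that 0 means chance-level agreement
    # and 1 means perfect agreement (undefined when expected == 1).
    kappa = (observed - expected) / (1 - expected)
    return observed, kappa
```

With the made-up matrix `[[40, 10], [5, 45]]`, 85 of 100 instances are correct, chance agreement is 0.5, and Kappa is (0.85 - 0.5) / (1 - 0.5) = 0.7.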
Compare the results of the previous two parts (a and b). Which algorithm gives the better result, and why?
Constructing a classifier and evaluating its accuracy on a dataset requires partitioning the labeled data into a training set and a test set. Explain three main methods used for such partitioning.
Explain why cross-validation is used in both supervised learning (classification) and unsupervised learning (clustering).