Topic: Data Mining Assignment
In this assignment, you will perform predictive analytics. You are given an sqlite3 database file (Assignment2022.sqlite) which contains a total of 5500 samples across two tables.
Table “train” contains 5000 samples that have already been categorized into three classes.
You are asked to predict the class labels of the 500 samples in the “test” table.
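Before any cleaning, it can help to confirm what the database actually contains. The following is a minimal sketch that uses only Python's standard sqlite3 module to list the row count and the declared columns of each table. The helper name summarize_tables is hypothetical; the table names are the ones given above.

```python
import sqlite3
from contextlib import closing


def summarize_tables(db_path, tables=("train", "test")):
    """Return {table: {"rows": int, "columns": [(name, declared_type), ...]}}."""
    summary = {}
    with closing(sqlite3.connect(db_path)) as conn:
        for table in tables:
            # Table names come from a fixed tuple, not from user input,
            # so placing them in the SQL text is acceptable in this sketch.
            rows = conn.execute(f'SELECT COUNT(*) FROM "{table}"').fetchone()[0]
            # PRAGMA table_info yields one row per column:
            # (cid, name, type, notnull, dflt_value, pk).
            info = conn.execute(f'PRAGMA table_info("{table}")').fetchall()
            summary[table] = {
                "rows": rows,
                "columns": [(r[1], r[2]) for r in info],
            }
    return summary


# Example call (assumes the file is in the working directory):
# print(summarize_tables("Assignment2022.sqlite"))
```

If the file matches the description, the call should report 5000 rows for “train” and 500 rows for “test”; any mismatch is worth investigating before going further.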
Tasks
In this first task, you will examine all data attributes and identify issues present in the data. For each of the issues that you have identified, choose and perform necessary actions to address it.
Apply these actions to both the training and test data at the same time. At the end of this phase, you will have two data sets: one for training and one for the final testing task. Below is a list of data preparation issues that you need to address:
Identify and remove irrelevant attributes.
Detect and handle missing entries.
Detect and handle duplicates (both instances and attributes).
Select suitable data types for attributes.
Perform data transformation (such as scaling/standardization) if needed.
Perform other data preparation operations (optional; bonus marks will be awarded for novel ideas).
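The items above could be chained roughly as in the sketch below, which assumes both tables have been loaded into pandas DataFrames. Several choices here are assumptions for illustration, not requirements of the assignment: the label column name "class" is hypothetical, constant columns stand in for "irrelevant" attributes, median imputation and z-score standardization are only one option each, the statistics are taken from the training data alone, and duplicate rows are removed only from the training data on the assumption that every test sample still needs a prediction.

```python
import pandas as pd


def prepare(train, test, label_col="class"):
    """Apply the same preparation steps to both tables; return (train, test)."""
    train, test = train.copy(), test.copy()
    features = [c for c in train.columns if c != label_col]

    # Irrelevant attributes (here: columns with at most one distinct value)
    # and duplicate attributes (columns repeating an earlier column).
    constant = [c for c in features if train[c].nunique(dropna=True) <= 1]
    dup_mask = train[features].T.duplicated()
    duplicated = list(dup_mask[dup_mask].index)
    drop = sorted(set(constant) | set(duplicated))
    train = train.drop(columns=drop)
    test = test.drop(columns=drop, errors="ignore")
    features = [c for c in features if c not in drop]

    # Duplicate instances: removed from the training data only.
    train = train.drop_duplicates().reset_index(drop=True)

    # Data types: treat a column as numeric only if the conversion
    # creates no new missing values in the training data.
    numeric = []
    for c in features:
        converted = pd.to_numeric(train[c], errors="coerce")
        if converted.isna().sum() == train[c].isna().sum():
            train[c] = converted
            test[c] = pd.to_numeric(test[c], errors="coerce")
            numeric.append(c)

    for c in numeric:
        # Missing entries: fill with the training median.
        median = train[c].median()
        train[c] = train[c].fillna(median)
        test[c] = test[c].fillna(median)
        # Standardization: z-score using training mean and std.
        mean, std = train[c].mean(), train[c].std(ddof=0)
        std = std if std > 0 else 1.0
        train[c] = (train[c] - mean) / std
        test[c] = (test[c] - mean) / std

    return train, test
```

Non-numeric columns are left untouched by this sketch; they would need their own handling (for example, an encoding step) depending on what the data actually contains.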
For each of the above issues, your report should:
Describe the relevant issue in your own words and explain why it is important to address it. Your explanation must consider the classification task that you will undertake subsequently.
Demonstrate clearly that such an issue exists in the data with suitable illustration/evidence.
Clearly state and explain your choice of action to address such an issue.
Demonstrate convincingly that your action has addressed the issue satisfactorily.
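One way to produce evidence for the last three points is to compute the same summary before and after each action and compare the two. The sketch below is a hypothetical helper (the name quality_report and the chosen measures are assumptions, not part of the assignment) that works on any pandas DataFrame.

```python
import pandas as pd


def quality_report(df):
    """Return simple counts that can be compared before and after cleaning."""
    return {
        "rows": len(df),
        # Total number of missing cells across all columns.
        "missing_cells": int(df.isna().sum().sum()),
        # Rows that exactly repeat an earlier row.
        "duplicate_rows": int(df.duplicated().sum()),
        # Columns with at most one distinct non-missing value.
        "constant_columns": [
            c for c in df.columns if df[c].nunique(dropna=True) <= 1
        ],
        # Current data type of each column, as text.
        "dtypes": df.dtypes.astype(str).to_dict(),
    }
```

Running the helper on the raw table and again on the prepared table gives two dictionaries; a drop in missing_cells or duplicate_rows to zero is the kind of before-and-after evidence the report asks for, and plots or tables can be built from the same numbers.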