Discuss whether bivariate or multivariate analysis is most suitable for your predictive model.

Machine learning application: predictions and interpretations

The aim of this coursework is for you to apply your knowledge of Machine Learning and Predictive Analytics by working creatively on a dataset from a real-world application: to define a learning problem, discuss data attributes, evaluate suitable learning algorithm(s) analytically or through your own implementation, and present your findings and conclusions. You will express this as a 2000-word report.

Scenario

This is your chance to design and/or evaluate a ‘predictive model’ of your own choice for a real-world application. The application and data can be of your choice, but a wide range of recommended datasets for machine learning problems is also available in the UCI Machine Learning Repository (Most Popular Data Sets, hits since 2007), and challenges, datasets and analytics contributions are available on Kaggle. You can also check the course’s Blackboard page for further datasets.

For this coursework, your design or choice of a machine learning solution (or predictive model) and your approach to evaluating it are key; you can, but do not need to, implement a model, write code or collect data yourself.

You should identify a real problem or need, frame a solution, and carry out an analytical evaluation of your choice of learning algorithm for your predictive model.

Your report (in the form of a discussion paper) should cover the following elements:

Discuss a machine learning problem given your chosen application; identify the problem, the requirements for a predictive model and its impact.

Describe and analyse a dataset and its characteristics: size, representation and attributes.

Discuss whether bivariate or multivariate analysis is most suitable for your predictive model.

Choose or apply one or more learning algorithms and identify the category of each: supervised, unsupervised or semi-supervised.

Analytically or experimentally evaluate your choice of machine learning solution, including its suitability and cost, and apply an error evaluation metric to justify your choice, e.g., classification accuracy for classification problems, or MSE and/or R^2 (R squared) for regression models.

Choose a learning algorithm which you think is less suitable for your predictive model and justify your reasons for rejecting it.
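
The error evaluation metrics named in the elements above (classification accuracy, MSE and R^2) could be computed along the following lines. This is only a minimal sketch in plain Python, not a required implementation: the function names and the small sample values are hypothetical and chosen purely for illustration, and the standard definitions are assumed (accuracy as the fraction of correct predictions, MSE as the mean of squared errors, R^2 as one minus the ratio of residual to total sum of squares).

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that exactly match the true labels."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def mse(y_true, y_pred):
    """Mean squared error: average of the squared prediction errors."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all true values are equal")
    return 1 - ss_res / ss_tot


if __name__ == "__main__":
    # Hypothetical classification labels (illustrative only).
    labels_true = ["spam", "ham", "spam", "ham"]
    labels_pred = ["spam", "ham", "ham", "ham"]
    print("accuracy:", accuracy(labels_true, labels_pred))

    # Hypothetical regression targets (illustrative only).
    values_true = [3.0, 5.0, 7.0, 9.0]
    values_pred = [2.5, 5.0, 7.5, 9.0]
    print("MSE:", mse(values_true, values_pred))
    print("R^2:", r_squared(values_true, values_pred))
```

In a report, figures like these would normally be computed on held-out test data rather than on the data used to fit the model, so that they reflect predictive performance rather than fit.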

Datasets can be found here: https://www.kaggle.com/datasets