This assignment assesses your skills for data exploration and classification of a simple dataset using Matlab.


You will choose a dataset, describe its characteristics with the help of diagrams, and build a classification model.

You will need to submit the written part (the report from Tasks 2 and 4) and the practical part (the models from Task 3 as .mat files).

Task 1

Choose one dataset from the UCI Machine Learning Repository (link) and confirm its suitability with your TA by email. Once your TA has confirmed it, familiarise yourself with the dataset. [5 marks]

Task 2

Describe your chosen dataset (400-500 words). [25 marks] What is the data about? How many features (attributes), instances, and classes does it have, and what are their data types? What are the maximum, minimum, and average values of the continuous numerical features? Using Matlab's plotting functions, illustrate the features of your dataset with meaningful boxplots, histograms, and grouped scatter plots (remember, these plots let you analyse the distribution of individual features as well as the relationships between them). Explain what you can learn about the dataset from the diagrams.
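The summary statistics and plots asked for in Task 2 could be produced along the following lines. This is only a sketch under stated assumptions: Fisher's iris data (the fisheriris sample that ships with the Statistics and Machine Learning Toolbox) stands in for whichever dataset you chose, and the names featureNames and summary are illustrative, not part of the brief.

```matlab
% Assumption: fisheriris is a stand-in for your own UCI dataset.
% Requires the Statistics and Machine Learning Toolbox.
load fisheriris            % meas: 150x4 numeric matrix, species: 150x1 cell array
featureNames = {'SepalLength', 'SepalWidth', 'PetalLength', 'PetalWidth'};

% Number of instances, features, and classes
[nInstances, nFeatures] = size(meas);
classNames = unique(species);
nClasses = numel(classNames);
fprintf('%d instances, %d features, %d classes\n', nInstances, nFeatures, nClasses);

% Minimum, maximum, and mean of each continuous feature (one row per feature)
summary = table(min(meas)', max(meas)', mean(meas)', ...
    'VariableNames', {'Min', 'Max', 'Mean'}, 'RowNames', featureNames);
disp(summary)

% Boxplot: distribution of one feature within each class
figure;
boxplot(meas(:, 3), species);
ylabel('Petal length (cm)');
title('Petal length by class');

% Histogram: overall distribution of one feature
figure;
histogram(meas(:, 1));
xlabel('Sepal length (cm)');
ylabel('Count');

% Grouped scatter plot: relationship between two features, coloured by class
figure;
gscatter(meas(:, 3), meas(:, 4), species);
xlabel('Petal length (cm)');
ylabel('Petal width (cm)');
```

Which features to plot is a judgment call; the report should show the plots that reveal something about class separation or unusual values in your own data.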

Task 3

Using Matlab, build three different classification models for your dataset and evaluate them. [40 marks] For this task, you will build a Decision Tree, a Naïve Bayes model, and a k-Nearest-Neighbour model using the relevant Matlab functions. Use a 60-40 percent split to train and test the performance of the models. Also save the models as .mat files and submit them through Blackboard. Use the diary function to save all the Matlab commands that you used for this task, and attach this log to the end of your report.
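One possible shape for the Task 3 workflow is sketched below. It rests on several assumptions: the fisheriris sample data again stands in for your dataset, the split is read as 60% training and 40% testing, k = 5 is an arbitrary choice for the nearest-neighbour model, and all file names (task3_commands.txt, treeModel.mat, and so on) are hypothetical.

```matlab
% Assumption: fisheriris is a stand-in for your own UCI dataset.
% Requires the Statistics and Machine Learning Toolbox.
load fisheriris                      % meas (features), species (class labels)
rng(1);                              % fix the seed so the split is reproducible

diary('task3_commands.txt');         % start logging Command Window text (hypothetical file name)

% 60-40 hold-out split: 0.4 is the fraction reserved for testing
cv = cvpartition(species, 'HoldOut', 0.4);
Xtrain = meas(training(cv), :);
ytrain = species(training(cv));
Xtest  = meas(test(cv), :);
ytest  = species(test(cv));

% Train the three classifiers on the training portion only
treeModel = fitctree(Xtrain, ytrain);                    % decision tree
nbModel   = fitcnb(Xtrain, ytrain);                      % naive Bayes
knnModel  = fitcknn(Xtrain, ytrain, 'NumNeighbors', 5);  % k-NN, k = 5 is an assumption

% Test-set accuracy: share of test instances whose predicted label matches
treeAcc = mean(strcmp(predict(treeModel, Xtest), ytest));
nbAcc   = mean(strcmp(predict(nbModel,   Xtest), ytest));
knnAcc  = mean(strcmp(predict(knnModel,  Xtest), ytest));
fprintf('Tree: %.3f  Naive Bayes: %.3f  k-NN: %.3f\n', treeAcc, nbAcc, knnAcc);

% Save each trained model to its own .mat file (hypothetical file names)
save('treeModel.mat', 'treeModel');
save('nbModel.mat',   'nbModel');
save('knnModel.mat',  'knnModel');

diary off                            % stop logging
```

Note that diary records what appears in the Command Window, so commands run from a script file may not be echoed into the log unless they are typed interactively or echoing is switched on.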

Task 4

Describe and analyse your classification results (300-400 words). [25 marks] Which models performed better, which performed worse, and why? To answer these questions, you need to evaluate the performance of your models. You are required to compute the corresponding confusion matrices and their associated recall, precision, and accuracy metrics (refer to the lecture slides and see here for more info). There will be 5 marks for the presentation of the assignment, including spelling and grammar, layout and formatting, and readability of figures. Good luck!
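The confusion matrix and the three metrics can be computed roughly as follows. This is a minimal sketch on made-up toy labels ('a', 'b', 'c'), chosen only so the arithmetic is easy to check by hand; they are not results from any real model. In practice ytest and ypred would be the true test labels and the output of predict for one of your models.

```matlab
% Toy labels for illustration only (assumption: not real results).
% Requires the Statistics and Machine Learning Toolbox for confusionmat.
ytest = {'a'; 'a'; 'a'; 'b'; 'b'; 'c'; 'c'; 'c'; 'c'; 'b'};   % true classes
ypred = {'a'; 'a'; 'b'; 'b'; 'b'; 'c'; 'c'; 'a'; 'c'; 'b'};   % predicted classes

% Rows of C are true classes, columns are predicted classes
classOrder = {'a'; 'b'; 'c'};
C = confusionmat(ytest, ypred, 'Order', classOrder);

% Accuracy: correctly classified instances over all instances
accuracy = sum(diag(C)) / sum(C(:));

% Recall per class: TP / (TP + FN), i.e. diagonal over row sums
recall = diag(C) ./ sum(C, 2);

% Precision per class: TP / (TP + FP), i.e. diagonal over column sums
precision = diag(C) ./ sum(C, 1)';

disp(C)
disp(table(recall, precision, 'RowNames', classOrder))
fprintf('Accuracy: %.2f\n', accuracy);
```

Reporting recall and precision per class, rather than accuracy alone, makes it easier to explain why one model does worse: the confusion matrix shows which classes it mixes up.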



List the mistakes you encountered while doing this assignment (to show that the student did the work himself rather than getting help from experts).


Do not use citations or references; the work must be done entirely by the student.
