One of the main objectives of this module is to help you gain hands-on experience in
communicating insightful and impactful findings to stakeholders. In this coursework, you will
use the tools and techniques you learned throughout this module to train a few machine learning
models on a data set that you feel passionate about, select the technique that best suits your
needs, and communicate insights you found from your modeling exercise.
After going through some guided steps, you will have insights that either explain or predict your
outcome variable. As a main deliverable, you will submit a report that helps you focus on
highlighting your analytical skills and thought process.
You are expected to leverage a wide variety of tools such as Jupyter Notebook, Python and the
relevant machine learning libraries (Keras, TensorFlow, PyTorch, etc.), but your report should
focus on presenting findings, insights and next steps. Before you begin, you will need to choose a
data set that you feel passionate about. This can be a data set similar to the data you have
available at work or data you have always wanted to analyse. For some people this will be
sports data sets, while others prefer to focus on data from a datathon or from a Data for Good
initiative. Data for Good, inspired by DataKind.org, brings together leading data scientists with
high-impact social organizations through a comprehensive, collaborative approach that leads to
shared insights, greater understanding, and positive action through “data in the service of
humanity”. Below are links to five data sets:
1. Fortune 500. URL: https://data.world/aurielle/fortune-500-2017
2. AT&T stock price data. URL: https://www.kaggle.com/konstantinparfenov/att-sbc-stock-price
3. COVID-19 variants. URL: https://www.kaggle.com/gpreda/covid19-variants
4. Leukemia gene expression. URL: https://www.kaggle.com/brunogrisci/leukemia-gene
5. Stock exchange data. URL: https://www.kaggle.com/mattiuzc/stock-exchange-data
Once you have selected a data set, you will produce the deliverables listed below:
A. Main objective of the analysis that specifies whether your model will be focused on
prediction or interpretation.
B. Brief description of the data set you chose and a summary of its attributes.
C. Brief summary of data exploration and actions taken for data cleaning and feature
engineering.
D. Summary of training two machine learning models. For regression, choose from
multiple linear regression, polynomial regression, LASSO regression and ridge
regression. For classification, choose from the multilayer perceptron, the convolutional
neural network and variants of these two architectures, such as ResMLP and GoogLeNet.
For clustering, choose from K-nearest neighbour, K-means, hierarchical clustering and
DBSCAN.
E. A paragraph explaining which of your models you recommend as a final model that best
fits your needs in terms of accuracy and explainability.
F. Summary of key findings and insights, which walks your reader through the main drivers
of your model and the insights about your data derived from your models.
G. Suggestions for next steps in analysing this data, which may include revisiting this
model and adding specific data features to achieve a better explanation or a better
prediction.
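As an illustration of deliverables D and E, the following is a minimal sketch of how two of the listed regression models (multiple linear regression and ridge regression) might be trained and compared on held-out data. It is not a required template. The synthetic data, the penalty strength `alpha=10.0`, the 80/20 split and all function names are assumptions made for this example; in your own work you would load your chosen data set and would probably rely on a library such as scikit-learn rather than the closed-form NumPy solution used here.

```python
import numpy as np


def make_synthetic_data(n_samples=200, noise=0.5, seed=0):
    """Hypothetical stand-in for a real data set: 5 features, 2 of them irrelevant."""
    rng = np.random.default_rng(seed)
    true_coef = np.array([3.0, -2.0, 0.0, 0.0, 1.0])
    X = rng.normal(size=(n_samples, true_coef.size))
    y = X @ true_coef + rng.normal(scale=noise, size=n_samples)
    return X, y


def fit_ridge(X, y, alpha):
    """Closed-form ridge regression with an unpenalised intercept.

    With alpha=0 this reduces to ordinary multiple linear regression.
    """
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    gram = Xc.T @ Xc + alpha * np.eye(X.shape[1])
    coef = np.linalg.solve(gram, Xc.T @ yc)
    intercept = y_mean - x_mean @ coef
    return coef, intercept


def rmse(X, y, coef, intercept):
    """Root mean squared error of the linear model on (X, y)."""
    residuals = y - (X @ coef + intercept)
    return float(np.sqrt(np.mean(residuals ** 2)))


def compare_models(alpha=10.0, seed=0):
    """Train both models on 80% of the data and score them on the remaining 20%."""
    X, y = make_synthetic_data(seed=seed)
    split = int(0.8 * len(y))
    X_train, X_test = X[:split], X[split:]
    y_train, y_test = y[:split], y[split:]

    results = {}
    candidates = [("multiple linear regression", 0.0), ("ridge regression", alpha)]
    for name, penalty in candidates:
        coef, intercept = fit_ridge(X_train, y_train, penalty)
        results[name] = {
            "coef": coef,
            "test_rmse": rmse(X_test, y_test, coef, intercept),
        }
    return results


if __name__ == "__main__":
    for name, res in compare_models().items():
        print(f"{name}: test RMSE = {res['test_rmse']:.3f}")
        print("  coefficients:", np.round(res["coef"], 2))
```

The test error speaks to accuracy, while the fitted coefficients speak to explainability: each one indicates how strongly a feature drives the outcome. A comparison of this kind can support the recommendation in deliverable E and the discussion of main drivers in deliverable F.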
Please submit a PDF file containing deliverables A to G. You should include the visuals from
your code output, but this report is intended as a summary of your findings, not as a code
review.
You can submit your code as a Python notebook (.ipynb file) or as a printout in the appendix of
your main PDF report.