Create a data science-related notebook, a standalone application, web application or any other kind of
artefact with which you apply machine learning and data mining/analysis techniques to a chosen real-world
Some ideas for possible projects:
1. Current events.
2. Time-series analysis, time-series forecasting.
3. Recommender engine: create an application that makes recommendations based on user preferences.
4. Fitness data: analysis of your own or a group’s Fitbit data.
5. Twitter: sentiment analysis, text classification, semantic analysis, network visualisation, geospatial
visualisation, data storage etc.
6. Facebook: network visualisation, geospatial visualisation, network analysis, natural language
processing, data storage etc.
7. Data journalism: data visualisation – implementation of interactive graphs (web enabled),
8. A live Kaggle competition dataset: https://www.kaggle.com/competitions (see notes below).
9. Web app that performs some data-related service.
10. Process mining.
11. …or something entirely different.
Topics NOT to cover:
1. Currency markets, Bitcoin, stock market prices
2. Closed Kaggle competition datasets
3. Previously researched topics for which there are existing notebooks
4. Definitely NO to the TITANIC dataset
5. COVID (this topic is oversubscribed)
This is a recommendation, not a requirement: be as original as you can with your data sources. Some
datasets are very popular and have come up repeatedly in assignments over the years. Unfortunately,
because they are popular, many online sources publish scripts for those datasets. In
many cases, related assignment submissions involve some form of plagiarism. While the internet is a big
place, we have seen a lot of these scripts before, and copied work is easy to catch. Unless you are going to do something
genuinely novel with a well-used data source (you will know it is well-used if you can easily find Python
kernels for it), avoid these data sources. The safest bet is a dataset that is integrated from multiple disparate
WARNING ABOUT CHOOSING A KAGGLE DATASET
Discuss this with the lecturer first. A high standard is set when marking Kaggle-related submissions. If you
use a Kaggle dataset, we recommend you do not look at related Kaggle kernels as there can be a temptation
to copy what you see. Copying without attribution is plagiarism, which could lead to zero marks for this
assignment. Be aware that markers are familiar with Kaggle kernels, in part due to marking assignments for
other papers and cohorts. We will also be looking through related kernels prior to marking.
You are encouraged to use Python; however, this is not an absolute prerequisite for all parts of your project.
If you choose to build a GUI-based application, Python does have libraries that facilitate this; however, you
can use Qt or technologies like .NET, which allow you to call your Python methods that implement the logic in
In previous years, some students have created web-based applications with front-end and back-end
components that both serve webpages and perform data-science-related tasks. If you have web
development skills, then you are encouraged to pursue this. It is sufficient that your application run on
We will conduct presentations both live and over MS Teams. Each person in the group will need to present.
The presentations will be short and to the point. We would like you to aim for a presentation using only a
handful of PowerPoint slides, lasting up to 5 minutes, or an application demo lasting up to 10 minutes. Make
your presentation interesting. Don’t focus on technical details. Consider your audience to be tech-savvy
executives. Focus instead on the story that you are trying to tell and sell to the audience/decision makers. The
presentations will be marked in part by your peers.
Make sure you do these four things:
1. Submit all your code, including experimental code, as a mixture of .py and Notebook files as appropriate for
each project. Each project should submit at least one Notebook that contains all the key findings and
2. Submit a separate document (or include this at the top of a notebook) that details what each team
member contributed to the assignment. Not all contributors will be awarded the same mark. Each
team member must submit their own account of how each team member contributed.
3. Each member of the class will be marked individually.
4. Watch and mark others’ presentations.
Marks will be awarded for different components of the project using the following rubric: