The starting repository contains the following files:
main.py: This file contains code to verify your answers. You must not edit anything
in this file.
assignment1.py: This file contains one function for each task in the assignment. You
should fill in the relevant function to complete the task. You may choose to create
additional functions to segment your code, but all the code you write must be
contained in this file.
data/data.json: This file contains details of recent soccer matches in the English
Premier League, which you will need in order to complete your assignment.
data/football: This folder contains a number of news articles about soccer matches
in the English Premier League. You will need to load these files in order to complete
your assignment.
Your Tasks (Total 20 marks)
Task 1 Loading and interpreting a JSON file (1 mark)
Write a function task1() that loads the data/data.json file into Python. Your function
should return a list of team codes, sorted in alphabetical order by team code.
You can test your implementation with the following command: python main.py task1
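The loading and sorting step can be sketched as follows. This is a minimal illustration, not the expected solution: the key name teams_codes and the helper sorted_team_codes are assumptions made for the example, so check the actual layout of data/data.json before relying on them.

```python
import json


def sorted_team_codes(json_text, key="teams_codes"):
    """Parse JSON text and return the list stored under `key`, sorted A-Z.

    The key name "teams_codes" is an assumption for illustration only.
    """
    data = json.loads(json_text)
    return sorted(data[key])


# Hypothetical sample standing in for the contents of data/data.json.
sample = '{"teams_codes": ["LIV", "ARS", "CHE"]}'
print(sorted_team_codes(sample))  # ['ARS', 'CHE', 'LIV']
```

To read from disk instead of a string, open the file and pass the file object to json.load.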
Task 2 Data Aggregation (2 marks)
Write a function task2() that uses the information contained in the clubs objects to work out
how many goals were scored by and against each team in total throughout the season. Your
function should output this information to a csv file called task2.csv. Your csv file should
contain the following headings: team code, goals scored by team, goals scored against team.
Each row in the file should contain the details for one team, sorted in alphabetical order by
team code.
You can test your implementation with the following command: python main.py task2
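One possible shape for the aggregation and csv output is sketched below. The field names club_code, goals_scored and goals_conceded are assumptions about the clubs objects, and write_goal_totals is a hypothetical helper; the headings are copied from the task description as written.

```python
import csv
import io

HEADINGS = ["team code", "goals scored by team", "goals scored against team"]


def write_goal_totals(clubs, out_file):
    """Sum goals per team code and write one csv row per team, sorted by code.

    `clubs` is a list of dicts; the keys used here are assumed, not confirmed.
    """
    totals = {}
    for club in clubs:
        scored, conceded = totals.get(club["club_code"], (0, 0))
        totals[club["club_code"]] = (
            scored + club["goals_scored"],
            conceded + club["goals_conceded"],
        )

    writer = csv.writer(out_file)
    writer.writerow(HEADINGS)
    for code in sorted(totals):
        writer.writerow([code, *totals[code]])


# Hypothetical records; an in-memory buffer stands in for task2.csv.
buffer = io.StringIO()
write_goal_totals(
    [
        {"club_code": "LIV", "goals_scored": 5, "goals_conceded": 2},
        {"club_code": "ARS", "goals_scored": 3, "goals_conceded": 4},
    ],
    buffer,
)
print(buffer.getvalue())
```

When writing to a real file, opening it with open("task2.csv", "w", newline="") avoids blank lines between rows on some platforms.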
Task 3 Regular Expressions (2 marks)
In addition to the information contained in the data.json file, we also have a number of
news articles written about soccer matches. Each article is located in a separate text file in
the data/football folder. For this task we will assume that each article is written about a
match. Write a function task3() to extract the largest match score identified in the article.
Add the number of goals scored by each side together to produce the total number of goals
scored in the match.
For example, if the largest match score mentioned in an article is 14-6, your program
should calculate 20 as the total number of goals. For this task we define the largest match
score as the one with the highest total number of goals, so a score of 14-6 is considered larger
than a score of 16-2.
If a suitable score cannot be found in the article, your function should return 0 as the
total number of goals for that article. You will need to use regular expressions to accomplish
this task.
Your function should produce a csv file containing the filename and the total number of
goals for each article. Your csv file should contain two columns, filename and total goals.
Each row in the file should contain the details for one article, sorted in ascending alphabetic
order by filename. Save this file as task3.csv.
You can test your implementation with the following command: python main.py task3
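The score extraction can be illustrated with a short regular-expression sketch. What counts as a "suitable score" is an assumption here: one or two digits on each side of a hyphen, not directly adjacent to other digits (so a season label such as 2019-20 is skipped). The names SCORE_PATTERN and largest_total_goals are hypothetical.

```python
import re

# Assumed score format: 1-2 digits, a hyphen, 1-2 digits, with no extra
# digits touching either side.
SCORE_PATTERN = re.compile(r"(?<!\d)(\d{1,2})-(\d{1,2})(?!\d)")


def largest_total_goals(text):
    """Return the highest combined goal count of any score in `text`, or 0."""
    totals = [int(home) + int(away) for home, away in SCORE_PATTERN.findall(text)]
    return max(totals, default=0)


print(largest_total_goals("It was 16-2 at the break and finished 14-6."))  # 20
print(largest_total_goals("No score is reported in this article."))        # 0
```

Taking the maximum of the summed pairs, rather than of the individual numbers, matches the definition above in which 14-6 is larger than 16-2.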
Task 4 Visualising Scores (1 mark)
We now wish to understand whether there are outliers present in the number of goals we
calculated in Task 3. Write a function task4() that produces a boxplot showing the
distribution of values for total goals. Any values more than 1.5 interquartile ranges above Q3
should be identified as outliers on the plot. This boxplot should be saved as task4.png.
For all tasks involving visualisations, you should ensure that your plots contain a title and
labels for all relevant axes.
You can test your implementation with the following command: python main.py task4
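The outlier rule and the plot can be sketched as below. This is an illustration under assumptions: upper_outliers and save_boxplot are hypothetical helpers, the quartiles use the "inclusive" (linear interpolation) method, and the plotting function assumes Matplotlib is installed. Matplotlib's boxplot draws whiskers at 1.5 interquartile ranges by default; whis=1.5 is passed explicitly to make the rule visible.

```python
import statistics


def upper_outliers(values):
    """Return the values lying more than 1.5 * IQR above Q3."""
    q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    upper_fence = q3 + 1.5 * (q3 - q1)
    return [v for v in values if v > upper_fence]


def save_boxplot(values, path="task4.png"):
    """Save a titled, labelled boxplot of `values` (requires Matplotlib)."""
    import matplotlib
    matplotlib.use("Agg")  # render to a file without needing a display
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    ax.boxplot(values, whis=1.5)
    ax.set_title("Distribution of total goals per article")
    ax.set_xlabel("Articles")
    ax.set_ylabel("Total goals")
    fig.savefig(path)
    plt.close(fig)


# Hypothetical totals: Q1 = 1.75, Q3 = 3.25, so the upper fence is 5.5.
print(upper_outliers([0, 1, 2, 2, 3, 3, 4, 20]))  # [20]
```

Different quartile conventions can shift the fence slightly, so the points flagged by a plotting library may differ from a hand calculation on small samples.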
Task 5 Extracting information from text data (2 marks)
We now wish to understand how often each club is mentioned by the media. The data.json
file also contained a list of club names. Write a function task5() that searches through each
of the news articles for mentions of each club and counts the articles for which each club is
mentioned at least once. Your function should produce a csv file containing the club name
and number of mentions for each club. Your csv file should contain the following column
headings: club name and number of mentions. Save this file as task5.csv. Each row in
the file should contain the details for one team, sorted in ascending alphabetic order by club
name. Your function should also produce a bar chart conveying this information, saved as
You can test your implementation with the following command: python main.py task5
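The counting step can be sketched as follows. It assumes a mention is a case-sensitive substring match on the club name and that the articles have already been read into a dictionary of filename to text; both choices, and the helper name count_article_mentions, are assumptions for illustration.

```python
def count_article_mentions(club_names, articles):
    """Count, for each club, the articles that mention it at least once.

    `articles` maps filename -> article text. Repeated mentions within one
    article count once. Results are keyed in ascending order of club name.
    """
    counts = {}
    for name in sorted(club_names):
        counts[name] = sum(1 for text in articles.values() if name in text)
    return counts


# Hypothetical club names and article texts.
articles = {
    "001.txt": "Arsenal beat Chelsea. Arsenal were dominant.",
    "002.txt": "Chelsea drew at home.",
}
print(count_article_mentions(["Chelsea", "Arsenal"], articles))
# {'Arsenal': 1, 'Chelsea': 2}
```

The resulting dictionary can be written out row by row with csv.writer and passed to a bar chart in the same order.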