
User-centric Systems for Data Science Assignment 1

This assignment uses a Python program to implement SQL keywords.


1. Data schema

The data for this assignment consist of two CSV files: Friends and Ratings. Friends
contains friendship relationships as tuples of the form UserID1 UserID2, denoting two users who are
friends. A user can have one or more friends, and the friendship relationship is symmetric: if A is a
friend of B, then B is also a friend of A, and both tuples (A B and B A) are present in the file. Ratings
contains user ratings as tuples of the form UserID MovieID Rating. For example, the tuple 12 3 4
means that “the user with ID 12 gave 4 stars to the movie with ID 3”.

Hint #1: You can use Python’s CSV reader to parse input files.

Hint #2: Consider encapsulating your Python tuples in ATuple objects (see code skeleton).
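
The two hints above can be sketched as follows. This is only an illustration: the ATuple class here is a hypothetical stand-in for the one in the code skeleton (whose real interface is not shown in this handout), and the space delimiter, the integer columns, and the absence of a header row are assumptions based on the example tuple 12 3 4.

```python
import csv
import io


class ATuple:
    """Hypothetical stand-in for the skeleton's ATuple: wraps one input row."""

    def __init__(self, values):
        self.tuple = tuple(values)

    def __repr__(self):
        return f"ATuple{self.tuple}"


def read_tuples(file_obj, delimiter=" "):
    """Parse each non-empty line into an ATuple of integers.

    Assumes every field is an integer and there is no header row.
    """
    for row in csv.reader(file_obj, delimiter=delimiter):
        if not row:
            continue  # skip blank lines
        yield ATuple(int(value) for value in row)


# In-memory sample standing in for the Ratings file (UserID MovieID Rating).
sample = io.StringIO("12 3 4\n7 3 5\n")
ratings = list(read_tuples(sample))
print(ratings)  # [ATuple(12, 3, 4), ATuple(7, 3, 5)]
```

For a real file, the same function could be called on an open file handle, passing a different delimiter if the files turn out to be comma-separated.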

2. TASK I: Implement ‘likeness’ prediction query (credits: 50/100)

The first task is to implement a query that predicts how much a user will like a particular movie. The
prediction in this case is based on friends’ ratings: the ‘likeness’ of a movie M for a user A is equal to the
average rating for M as given by A’s friends. In SQL, this query can be expressed as follows:

SELECT AVG(R.Rating)
FROM Friends AS F, Ratings AS R
WHERE F.UID2 = R.UID
  AND F.UID1 = 'A' AND R.MID = 'M'

Friends and Ratings are both relations (tables): Friends stores friendship relationships whereas
Ratings stores user ratings (cf. Section 1 “Data Schema”).

The above query requires implementing five operators:

1. Scan: An operator that scans an input file (table) and returns its contents (tuples).
2. Project: An operator that implements a SQL projection.
3. Filter: An operator that filters individual tuples based on a user-defined predicate. This is
equivalent to a selection predicate in SQL.
4. Join: A binary operator that applies a relational equality join.
5. AVG: An aggregation function that returns the average of the input values (e.g. ratings).
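
One possible way to picture the five operators is as small Python generators chained into a plan. The sketch below is not the code skeleton's design: the function names, the positional column indices, and the in-memory sample tables are all assumptions made for illustration, and the filter is applied after the join purely for readability.

```python
def scan(table):
    """Scan: yield every tuple of a table (an in-memory list here, not a file)."""
    yield from table


def filter_rows(rows, predicate):
    """Filter: keep only the tuples for which the predicate holds."""
    return (row for row in rows if predicate(row))


def project(rows, columns):
    """Project: keep only the listed column positions of each tuple."""
    return (tuple(row[c] for c in columns) for row in rows)


def join(left, right, left_col, right_col):
    """Join: equality join implemented as a hash join (build left, probe right)."""
    index = {}
    for l_row in left:
        index.setdefault(l_row[left_col], []).append(l_row)
    for r_row in right:
        for l_row in index.get(r_row[right_col], ()):
            yield l_row + r_row


def avg(rows, column=0):
    """AVG: average of one column, or None for empty input."""
    total = count = 0
    for row in rows:
        total += row[column]
        count += 1
    return total / count if count else None


# Made-up sample data: Friends(UID1, UID2) and Ratings(UID, MID, Rating).
friends = [(1, 2), (2, 1), (1, 3), (3, 1)]
ratings = [(2, 10, 4), (3, 10, 2), (3, 11, 5), (1, 10, 1)]
A, M = 1, 10

# Joined tuples have the layout (UID1, UID2, UID, MID, Rating).
joined = join(scan(friends), scan(ratings), left_col=1, right_col=0)
matching = filter_rows(joined, lambda r: r[0] == A and r[3] == M)
likeness = avg(project(matching, [4]))
print(likeness)  # 3.0
```

User 1's friends are users 2 and 3, who rated movie 10 with 4 and 2 stars, so the predicted likeness is 3.0.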

Hint #1: You might find it useful to draw your query plan in the form of a tree, as discussed in Lecture 1.
A node in the plan is an operator that corresponds to a Python class in the code skeleton we provide.

Hint #2: We highly recommend that you employ some of the standard optimization techniques we
discussed in Lecture 1, e.g., pushing the selection (filter) down to the leaves of the query plan. Your
solution will be graded solely on correctness; however, we should be able to run tests with your
code in a reasonable amount of time.
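
To see why pushing the filter down helps, the sketch below compares two plans on made-up data: one that joins first and filters afterwards, and one that filters each input before the join. The helper name equi_join, the sample tables, and the column positions are assumptions for illustration only.

```python
def equi_join(left, right, left_col, right_col):
    """Hash-based equality join that returns a list of concatenated tuples."""
    index = {}
    for l_row in left:
        index.setdefault(l_row[left_col], []).append(l_row)
    return [l_row + r_row
            for r_row in right
            for l_row in index.get(r_row[right_col], ())]


# Made-up sample data: Friends(UID1, UID2) and Ratings(UID, MID, Rating).
friends = [(1, 2), (2, 1), (1, 3), (3, 1)]
ratings = [(2, 10, 4), (3, 10, 2), (3, 11, 5), (1, 10, 1)]
A, M = 1, 10

# Plan 1: join everything, then filter.
# Joined tuples have the layout (UID1, UID2, UID, MID, Rating).
joined = equi_join(friends, ratings, 1, 0)
late = [row for row in joined if row[0] == A and row[3] == M]

# Plan 2: filter at the leaves, then join the much smaller inputs.
friends_of_a = [row for row in friends if row[0] == A]
ratings_of_m = [row for row in ratings if row[1] == M]
early = equi_join(friends_of_a, ratings_of_m, 1, 0)

assert sorted(late) == sorted(early)  # both plans return the same tuples
print(len(joined), len(early))  # 5 2
```

Both plans produce the same result, but the first materializes 5 joined tuples while the second produces only the 2 that are needed. On realistic file sizes that gap grows quickly.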

3. TASK II: Implement recommendation query (credits: 20/100)

The second task is to implement a query that recommends a movie to a user: the movie with the highest
‘likeness’ value as computed by the query of TASK I. In SQL, this query can be expressed as follows:

SELECT T.MID
FROM ( SELECT R.MID, AVG(R.Rating) AS score
       FROM Friends AS F, Ratings AS R
       WHERE F.UID2 = R.UID
         AND F.UID1 = 'A'
       GROUP BY R.MID
       ORDER BY score DESC
       LIMIT 1 ) AS T
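
As a rough reference for what this query computes, the following sketch evaluates it directly with plain Python data structures rather than with operators. The function name recommend and the sample tables are made up, and the sketch assumes Friends contains no duplicate rows. When several movies tie for the best score it returns whichever it encounters first, much as LIMIT 1 leaves ties unspecified.

```python
from collections import defaultdict


def recommend(user, friends, ratings):
    """Return the MID with the highest average rating among the user's friends."""
    friend_ids = {uid2 for uid1, uid2 in friends if uid1 == user}

    totals = defaultdict(lambda: [0, 0])  # MID -> [sum of ratings, count]
    for uid, mid, rating in ratings:
        if uid in friend_ids:
            totals[mid][0] += rating
            totals[mid][1] += 1

    if not totals:
        return None  # the user has no friends, or the friends rated nothing
    return max(totals, key=lambda mid: totals[mid][0] / totals[mid][1])


# Made-up sample data: Friends(UID1, UID2) and Ratings(UID, MID, Rating).
friends = [(1, 2), (2, 1), (1, 3), (3, 1)]
ratings = [(2, 10, 4), (3, 10, 2), (3, 11, 5), (1, 10, 1)]

print(recommend(1, friends, ratings))  # 11
```

Here movie 10 averages 3.0 among user 1's friends and movie 11 averages 5.0, so movie 11 is recommended. A function like this can serve as a check against the output of an operator-based plan.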

The recommendation query requires implementing three additional operators:

1. GroupBy: An operator that groups tuples based on a ‘key’ attribute (e.g. movie id).

2. OrderBy: An operator that sorts tuples based on a value (e.g. sort candidate movies in descending
order of their average rating as given by A’s friends).

3. Limit: An operator that returns the first k tuples in its input (k=1 in the query above). An OrderBy
followed by a Limit is also known as a top-k operator.

Hint: You will need a custom comparator to sort tuples. Have a look at cmp_to_key() in
functools.
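
The three additional operators, together with a comparator passed through cmp_to_key(), might look roughly like the sketch below. The function names, the (MID, Rating) input layout, and the tie-breaking rule (smaller movie ID first) are assumptions for illustration. The assignment text does not specify how ties should be resolved.

```python
from functools import cmp_to_key
from itertools import islice


def group_by_avg(rows, key_col, value_col):
    """GroupBy + AVG: return (key, average value) pairs, one per group."""
    sums = {}
    for row in rows:
        entry = sums.setdefault(row[key_col], [0, 0])  # [sum, count]
        entry[0] += row[value_col]
        entry[1] += 1
    return [(key, total / count) for key, (total, count) in sums.items()]


def by_score_desc(a, b):
    """Comparator: higher score first; on ties, smaller MID first (assumed rule)."""
    if a[1] != b[1]:
        return -1 if a[1] > b[1] else 1
    return (a[0] > b[0]) - (a[0] < b[0])


def order_by(rows, comparator):
    """OrderBy: sort tuples using an old-style two-argument comparator."""
    return sorted(rows, key=cmp_to_key(comparator))


def limit(rows, k):
    """Limit: return the first k tuples of the input."""
    return list(islice(rows, k))


# Made-up (MID, Rating) tuples, as if already joined and filtered for user A.
rows = [(10, 4), (10, 2), (11, 5), (12, 5)]
scores = group_by_avg(rows, key_col=0, value_col=1)
top = limit(order_by(scores, by_score_desc), 1)
print(top)  # [(11, 5.0)]
```

Movies 11 and 12 both average 5.0, so the assumed tie-break decides the winner. For a simple descending sort like this one, a key function would also work, but a comparator keeps multi-column ordering rules explicit.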

