In this homework, you will try to recommend new collaborations to researchers
of the Machine Learning community. Our approach will follow the guidelines of
collaborative filtering: “If your past behavior/preferences were similar
to some other user’s, your future behavior may be as well.” As an
example, imagine you like the Rolling Stones, the Beatles and Jimi Hendrix. It
turns out that most people who like these artists are also fans of Eric
Clapton. So if you listen to Eric Clapton’s music, it is very likely that you
will like it as well.
In this assignment you will implement a collaborative filtering recommendation
system for suggesting new collaborations to Machine Learning researchers.
A network as a graph: A graph or network represents relationships among
different entities (users of a social network, researchers, products, etc.). Those
entities are represented as nodes and the relationships between them (friends
on Facebook, co-authors of a research paper, products purchased together) as
edges. When there is an edge between two nodes, x and y, we say that y is a
neighbor (or friend) of x. Since the graphs we consider are undirected, x is
then also a neighbor of y.
Representing a graph in Python: NetworkX is a widely used Python library for
representing graphs. You can read its documentation for more information on
how to use it.
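As a small illustration of the graph vocabulary above, the following sketch
builds a toy undirected co-authorship graph with NetworkX and queries a node's
neighbors. The author names are made up for the example and do not come from
the assignment data.

```python
import networkx as nx

# Build a small undirected co-authorship graph (author names are made up).
G = nx.Graph()
G.add_edge("Alice", "Bob")    # adding an edge also adds its endpoints as nodes
G.add_edge("Alice", "Carol")
G.add_edge("Bob", "Dave")

# The neighbors of a node are the nodes it shares an edge with.
alice_neighbors = set(G.neighbors("Alice"))

# The graph is undirected, so an edge can be queried in either direction.
bob_alice_linked = G.has_edge("Bob", "Alice")

print(sorted(alice_neighbors))
print(bob_alice_linked)
print(G.number_of_nodes(), G.number_of_edges())
```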
2 Recommend new collaborations – The ML Community case
To recommend new collaborations and test the efficiency of the methods
used, you are given two files (you can find them on Piazza):
• “old edges.txt”: In this file, every line contains the names of two
researchers who co-authored a paper in one of the top Machine Learning
conferences (NeurIPS, ICLR, ICML) between 2010 and 2016.
• “new edges.txt”: In this file, every line contains the names of two
researchers (from those appearing in the file above) who formed a new
(previously non-existent) collaboration in either 2017 or 2018.
With the first file in hand, you will answer the following question:
“For author X, list some non-collaborators in order, starting with the best
collaborator recommendation and ending with the worst.” A non-collaborator is
an author who is not X and is not a collaborator of X. Depending on the
recommendation algorithm you choose, the list may include all
non-collaborators or only some of them.
Then, using the second file, which contains the actual new collaborations
formed in 2017 and 2018, you will test the efficiency of these algorithms.
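The notion of a non-collaborator, and of an ordered recommendation list, can be
sketched as follows. The handout does not fix a ranking algorithm at this
point, so the score used here (the number of shared co-authors, with ties
broken alphabetically) is only an assumption chosen for illustration. The
function names `non_collaborators` and `rank_by_common_neighbors` and the toy
graph are hypothetical.

```python
import networkx as nx


def non_collaborators(G, x):
    """Return every author who is not x and has no edge to x."""
    return set(nx.non_neighbors(G, x))


def rank_by_common_neighbors(G, x):
    """Rank non-collaborators of x, best first.

    Assumed score: number of co-authors shared with x. Candidates with no
    shared co-author are dropped; ties are broken alphabetically.
    """
    scores = {}
    for y in non_collaborators(G, x):
        shared = len(list(nx.common_neighbors(G, x, y)))
        if shared > 0:
            scores[y] = shared
    return sorted(scores, key=lambda y: (-scores[y], y))


# Toy graph with made-up author names.
G = nx.Graph()
G.add_edges_from([
    ("Alice", "Bob"), ("Alice", "Carol"),
    ("Bob", "Dave"), ("Carol", "Dave"),
    ("Bob", "Eve"), ("Frank", "Grace"),
])

candidates = non_collaborators(G, "Alice")
ranking = rank_by_common_neighbors(G, "Alice")
print(sorted(candidates))
print(ranking)
```

Here the ranking contains only some of the non-collaborators, because authors
who share no co-author with X receive no score under this assumed method.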
a) [3 pts.] Write a function that reads the file “old edges.txt” and creates a
graph using NetworkX. (This is a tab-separated values (TSV) file; you may
use packages such as Pandas to read it.)
b) [3 pts.] Write a function that reads the file “new edges.txt” and for each
author, keeps track of the new collaborations that author formed during
2017 and 2018.
In 2017 and 2018, there were 1,757 new edges formed between existing authors.
For the next tasks, pick (and recommend new collaborations for) those
authors who formed at least 10 new connections in 2017 and 2018. In the
remainder of the assignment, when we talk about author X, we refer to one of
those authors.
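One possible way to organize the file reading and author selection described
above is sketched below. It assumes that each line holds exactly two
tab-separated names and that there is no header row; the handout confirms
neither, so check the actual files. The function names (`read_edge_pairs`,
`build_graph`, `new_collaborations`, `authors_with_min_new`) are hypothetical,
and the sketch uses Python's standard `csv` module, although Pandas would work
just as well. The sample data is invented, and the threshold is lowered to 2 so
that the toy example selects someone.

```python
import csv
import io
from collections import defaultdict

import networkx as nx


def read_edge_pairs(handle):
    """Parse tab-separated lines into (author, author) pairs.

    Assumes two names per line and no header row (an assumption,
    not something the handout states).
    """
    reader = csv.reader(handle, delimiter="\t")
    return [(row[0].strip(), row[1].strip()) for row in reader if len(row) >= 2]


def build_graph(pairs):
    """Create an undirected co-authorship graph from author pairs."""
    G = nx.Graph()
    G.add_edges_from(pairs)
    return G


def new_collaborations(pairs):
    """Map each author to the set of new collaborators they gained."""
    new = defaultdict(set)
    for a, b in pairs:
        new[a].add(b)  # record the edge in both directions,
        new[b].add(a)  # since collaborations are undirected
    return new


def authors_with_min_new(new, min_new=10):
    """Return the authors with at least min_new distinct new collaborators."""
    return sorted(a for a, partners in new.items() if len(partners) >= min_new)


# Toy stand-ins for the two files (invented names).
old_file = io.StringIO("Alice\tBob\nBob\tCarol\nCarol\tDave\n")
new_file = io.StringIO("Alice\tCarol\nAlice\tDave\nBob\tDave\n")

G = build_graph(read_edge_pairs(old_file))
new = new_collaborations(read_edge_pairs(new_file))
selected = authors_with_min_new(new, min_new=2)
print(selected)
```

With the real files, the `io.StringIO` objects would be replaced by file
handles opened with `open(path, encoding="utf-8", newline="")`, and `min_new`
would be left at its default of 10.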