The key objective of this assignment is to learn how to train and evaluate a non-trivial
machine learning model. More specifically, the task is called “learning-to-rank”. You
will be given a set of training features that contain the relevance labels 0 (not relevant),
1 (partially relevant) and 2 (relevant), for a large set of query-document pairs. Your
task is to research this problem, find a suitable solution, train a model, and produce a
result file from a training set that will be scored using standard evaluation measures in
Information Retrieval (IR).
If you are unfamiliar with IR, you might want to look through the book Learning to
Rank for Information Retrieval by Tie-Yan Liu, which is available online in the RMIT
library and also on Canvas.
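To give a feel for how graded relevance labels (0, 1, 2) can be turned into a score, here is a minimal sketch of NDCG@k. This specification does not say which measures will be used for marking, so NDCG is only an assumed example of a standard IR measure, and the function names dcg_at_k and ndcg_at_k are hypothetical. The input is the list of true labels for one query, in the order your model ranked the documents.

```python
import math


def dcg_at_k(labels, k):
    """Discounted cumulative gain over the first k labels of a ranked list.

    Uses the exponential gain (2^label - 1) and a log2 rank discount.
    """
    return sum(
        (2 ** rel - 1) / math.log2(rank + 2)  # rank is 0-based, so +2
        for rank, rel in enumerate(labels[:k])
    )


def ndcg_at_k(labels, k):
    """DCG of the given ranking divided by the DCG of the ideal ranking."""
    ideal = dcg_at_k(sorted(labels, reverse=True), k)
    # A query with no relevant documents has an ideal DCG of 0;
    # returning 0.0 here is one common convention (an assumption).
    return dcg_at_k(labels, k) / ideal if ideal > 0 else 0.0


# Labels of the documents for one query, in model-ranked order.
perfect = ndcg_at_k([2, 1, 0], k=3)   # already in ideal order
reversed_ = ndcg_at_k([0, 1, 2], k=3)  # worst possible order
print(perfect, reversed_)
```

A ranking that places the most relevant documents first scores 1.0, and any other ordering of the same labels scores lower. In practice the score is averaged over all queries.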
The following files are provided:
• A2.pdf : This specification file.
• train.tsv : A large file of labelled query-document pairs suitable for training.
• test.tsv : The holdout set that you will use to create a runfile.
• documents.tsv: A three-field file containing the document id, original HTML, and
clean text parse of each document.
• query.tsv: A two-field file containing the query id and the query text for each query.
The A2.pdf file is on Canvas. The train and test files are in a zip file called
A2data.zip, which you can download using the URL below. The documents.tsv and
query.tsv files can be found in the optional download below called extradata.zip.
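As a rough illustration of reading the optional files, the sketch below parses a tab-separated file with the standard csv module. It assumes the files have no header row and no quoting, which this specification does not confirm, so check the real files first. The helper read_tsv and the field names query_id and query_text are hypothetical labels chosen for the example.

```python
import csv
import io

# HTML fields can be very long; raise the per-field size limit
# above the csv module's default.
csv.field_size_limit(10 ** 9)


def read_tsv(handle, field_names):
    """Read tab-separated rows into a list of dicts keyed by field_names.

    Assumes no header row and no quoting (both are assumptions).
    """
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    return [dict(zip(field_names, row)) for row in reader if row]


# Tiny in-memory stand-in for query.tsv (made-up rows, not real data).
sample = io.StringIO("q1\thello world\nq2\tlearning to rank\n")
queries = read_tsv(sample, ["query_id", "query_text"])
print(queries[0]["query_text"])
```

For a real file, pass an open handle instead, for example open("query.tsv", encoding="utf-8", newline=""). The same helper would cover documents.tsv with three field names.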
The Features Provided
Rules of the game
You are allowed to use any Python library you like to solve the problem. There is
a wealth of tools to choose from, including pandas, numpy, and scikit-learn for the
basic processing, and multiple libraries designed specifically for Learning to Rank. I
will let you find these on your own. It should be easy to find several that will work,
and you can try several to determine which works best. We need you to
ensure your environment is reproducible, so the correct way to do this is to
create an Anaconda environment for a specific version of Python (I strongly suggest it
be 3.8), install any packages you need using pip (not Anaconda), and then generate a
requirements.txt file to include with your submission. So, something like:
conda create -n SXXXXXX python=3.8
conda activate SXXXXXX
pip install pandas numpy scikit-learn
pip freeze > requirements.txt
This will create a new environment that you can start in Anaconda using “conda
activate SXXXXXX” and exit from using “conda deactivate”.