1 Summary of the coursework
In this coursework, you will analyse a human-centric dataset and develop a fair machine learning ecosystem to detect, reduce and eventually mitigate, in various ways, the different types of bias that exist in the final outcome of the algorithm.
This coursework involves two parts: (1) a 3-page project report and (2) a code implementation, discussed in the following two sections respectively. The deadline for submission is May 5th, 2022. You will submit all the deliverables in a single compressed file (preferably in .zip format). If you have any queries regarding this coursework, please do not hesitate to contact me: email@example.com
2 Project Outline
In this project, you are responsible for designing an automated system to help the HR department of the ACME organisation. This system shortlists applicants for interview and proposes a decision on whether or not a specific candidate should be given an offer. Typically, this system will be a classifier trained on the historic data available from previous decisions of the hiring committees in the organisation. As the ML engineer, you are given the dataset, and you are responsible for designing a non-biased system (unlike the one Amazon designed in 2016, which led to a publicity scandal and which Amazon eventually decided to discontinue).
Please note that the implementation and outcome of each task should be clearly separated in your submission (in both the project report and the source code). You will submit your implementation in a Jupyter notebook (ipynb format), so clearly segment your notebook and make sure it is in a presentable format. The code should also be sufficiently commented and must be your own implementation. Note: the implementation of the method should be yours. You cannot use any “fair AI” Python package in your implementation, e.g. IBM AIF360 or similar packages introduced in the class. You can, however, compare your results with such systems offline to make sure your implementation works correctly.
You should submit the answers to the questions posed here in a separate PDF file in your submission. The style of the analysis should be technical rather than verbose, and it should be understandable by someone with a good knowledge of bias mitigation techniques. Be concise and straight to the point. Make sure your answers to these questions do not exceed 3 A4 pages, including the citations.
2.1 Task 1: Dataset Analysis [40 Marks]
Download the dataset from the Dataset folder on Blackboard. The description of the dataset can also be found in the same folder. Read the relevant documentation and answer the following tasks in your project report document (in the document, clearly specify the answers for each task). Include the implementation tasks in your Jupyter notebook. The data set that we use is recruitment.xls. The applicant data set includes the following information within nine variables:
- ApplicantCode (applicant code).
- Gender (1 = male or 2 = female).
- BAMEyn (Black, Asian or Minority Ethnic: 1 = yes or 2 = no).
- ShortlistedNY (0 = rejected or 1 = shortlisted).
- Interviewed (0 = not interviewed or 1 = interviewed).
- FemaleONpanel (1 = male only panel or 2 = female member on panel).
- OfferNY (1 = made an offer or 0 = not offered).
- AcceptNY (1 = Accepted or 0 = declined).
- JoinYN (1 = joined or 0 = not joined).
As the variable and value labels indicate, the data set records the gender (‘Gender’ – variable 2) of each person who sent in an application for the graduate job, as well as whether or not they were Black, Asian or Minority Ethnic (‘BAMEyn’ – variable 3). Importantly, the ‘ShortlistedNY’ variable indicates whether, after an initial review of their application, they were considered to be an appropriate candidate for interview (in other words, considered potentially employable). The ‘Interviewed’ variable indicates whether they were interviewed or not, and the ‘FemaleONPanel’ variable indicates whether a female interviewer was included on the interview panel. A key variable here is whether the applicant was offered a job or not (‘OfferNY’), and the ‘AcceptNY’ variable indicates whether they accepted the offer. Finally, the ‘JoinYN’ variable indicates whether the applicant joined the organization.
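Because every variable is stored as a numeric code, it can help to decode the values before counting anything. The following is only a minimal sketch of that step, using the value labels listed above. The names `VALUE_LABELS`, `crosstab` and `toy_rows` are hypothetical, and the rows are invented for illustration; they are not taken from recruitment.xls.

```python
from collections import Counter

# Value labels copied from the variable list above (subset of the columns).
VALUE_LABELS = {
    "Gender": {1: "male", 2: "female"},
    "BAMEyn": {1: "yes", 2: "no"},
    "ShortlistedNY": {0: "rejected", 1: "shortlisted"},
}


def crosstab(rows, group_col, outcome_col):
    """Count rows for every (group label, outcome label) pair."""
    counts = Counter()
    for row in rows:
        group = VALUE_LABELS[group_col][row[group_col]]
        outcome = VALUE_LABELS[outcome_col][row[outcome_col]]
        counts[(group, outcome)] += 1
    return counts


# Made-up rows for illustration only; NOT real applicant data.
toy_rows = [
    {"Gender": 1, "BAMEyn": 2, "ShortlistedNY": 1},
    {"Gender": 1, "BAMEyn": 1, "ShortlistedNY": 0},
    {"Gender": 2, "BAMEyn": 2, "ShortlistedNY": 1},
    {"Gender": 2, "BAMEyn": 1, "ShortlistedNY": 0},
    {"Gender": 2, "BAMEyn": 2, "ShortlistedNY": 0},
]

gender_counts = crosstab(toy_rows, "Gender", "ShortlistedNY")
for key, n in sorted(gender_counts.items()):
    print(key, n)
```

In a notebook, the same counting would normally be applied to the rows read from the spreadsheet rather than to a hand-written list.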
- What is the sensitive attribute in the dataset? Clarify which group is privileged and which group is unprivileged. How did you determine it? Describe your answers briefly. (5 Marks)
- What are the statistics relevant to the privileged and unprivileged groups? (e.g. average, standard deviation, variance, and more). Clearly demonstrate the above numbers in a table in your project report. (5 Marks)
- Demonstrate the statistical disparity between the privileged and unprivileged groups in two scenarios: (1) being invited to the interview and (2) being offered a job. Show the disparity analysis results in a separate table for each combination of privileged and unprivileged groups (i.e. the privileged group vs. only one of the unprivileged groups in each table). (10 Marks)
- Statistically prove that the dataset is biased towards the privileged group in the shortlisting process (i.e. determine the hypothesis for the scenario, calculate the p-value and conclude whether you accept the hypothesis or not). Explain the procedure in your project document. (10 Marks)
- Implement the above proof in your source code (i.e. compute the p-value and determine the hypothesis conclusion accordingly). Note: It is fine to use an extra statistical package in your source code, but clearly explain its usage in your project report. (10 Marks)
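To make the last three tasks concrete, here is one possible shape for the calculation, offered as a sketch rather than a model answer. It assumes the disparity is measured by the statistical parity difference and the disparate impact ratio, and that the hypothesis is checked with a one-sided pooled two-proportion z-test; the brief does not prescribe a test, so a chi-squared test or similar may be equally acceptable. The function names and the counts in the example are hypothetical and are not taken from the dataset.

```python
import math


def disparity(sel_priv, n_priv, sel_unpriv, n_unpriv):
    """Return (statistical parity difference, disparate impact ratio)."""
    p_priv = sel_priv / n_priv        # selection rate, privileged group
    p_unpriv = sel_unpriv / n_unpriv  # selection rate, unprivileged group
    return p_unpriv - p_priv, p_unpriv / p_priv


def two_proportion_z_test(sel_priv, n_priv, sel_unpriv, n_unpriv):
    """One-sided pooled z-test.

    H0: both groups have the same selection rate.
    H1: the privileged group has a higher selection rate.
    Returns (z statistic, p-value).
    """
    p_priv = sel_priv / n_priv
    p_unpriv = sel_unpriv / n_unpriv
    pooled = (sel_priv + sel_unpriv) / (n_priv + n_unpriv)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_priv + 1 / n_unpriv))
    z = (p_priv - p_unpriv) / se
    # Upper-tail probability P(Z >= z) of the standard normal distribution.
    p_value = 0.5 * math.erfc(z / math.sqrt(2))
    return z, p_value


# Made-up counts for illustration only: 60 of 100 privileged applicants
# shortlisted versus 40 of 100 unprivileged applicants.
spd, di_ratio = disparity(60, 100, 40, 100)
z, p_value = two_proportion_z_test(60, 100, 40, 100)

ALPHA = 0.05  # assumed significance level
print(f"SPD = {spd:.3f}, DI ratio = {di_ratio:.3f}")
print(f"z = {z:.3f}, p-value = {p_value:.5f}")
print("reject H0" if p_value < ALPHA else "fail to reject H0")
```

The sketch uses only the standard library, so it stays within the rule against “fair AI” packages. If a statistical package is used instead, its usage should be explained in the report as the note above requires.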