Question 1: NZ Health survey data – Wrangling, reshaping, functions and plotting (35 marks)
This question relates to the “health_survey.csv” dataset, which you can download from the Stream site. This dataset is
from the NZ Health survey. You can learn more about this dataset and what the column names and labels represent
a) Importing data:
• Read in the health data and save it to a dataframe object. There is an encoding argument that can be set
when using read_csv. You may need to set this to ‘latin’ to avoid the “‘utf-8’ codec can’t decode…” error (see
the Pandas documentation for further information).
• Remove the first unnamed column and the seven ‘p.value’ columns.
• Change all ‘percent’ column names to their associated ‘Year’ values (e.g. the name ‘percent.16’ changes
to ‘2016’) and change the column name ‘short.description’ to ‘description’.
• Display the unique labels in the ‘description’ column so that you can inspect them.
• Save a new dataframe object (with an appropriate name) in a new memory location (that is, an independent copy
rather than a view of the original), containing all the rows that meet all of the following criteria:
• That match six of the ‘description’ labels of your choosing. For instance, if you were interested in
knowing about ‘Physically active’, ‘Anxiety disorder’, ‘Daily smokers’, ‘Diabetes’, ‘Healthy weight’, and
‘Self-rated health – very good’, then your dataframe would contain only rows that matched these
• That also match the ‘Total’ label in the Group column
• That also match the ‘adult’ label in the population column
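The steps in part (a) can be sketched in pandas as follows. This is a minimal illustration on a small made-up table, not the real file: the rows, the values, the ‘p.value.16’-style column names and the variable names are all assumptions, and only two ‘description’ labels are used where the question asks for six. With the real data you would pass the path to “health_survey.csv” instead of the in-memory bytes.

```python
import io
import pandas as pd

# Hypothetical stand-in for health_survey.csv (made-up rows and values).
# The superscript 2 is encoded as a single Latin-1 byte, which is not valid UTF-8.
raw = (
    ",population,short.description,Group,percent.16,p.value.16,percent.17,p.value.17\n"
    "0,adult,Daily smokers,Total,10.0,0.1,11.0,0.2\n"
    "1,adult,Daily smokers,Men,20.0,0.1,21.0,0.2\n"
    "2,child,Asthma,Total,30.0,0.1,31.0,0.2\n"
    "3,adult,Diabetes,Total,40.0,0.1,41.0,0.2\n"
    "4,adult,Obese (BMI 30+ kg/m²),Total,50.0,0.1,51.0,0.2\n"
).encode("latin-1")

# Read the data with an explicit encoding ('latin1' is the Latin-1 codec).
df = pd.read_csv(io.BytesIO(raw), encoding="latin1")

# Drop the first (unnamed) column and every column whose name starts with 'p.value'.
p_cols = [c for c in df.columns if c.startswith("p.value")]
df = df.drop(columns=[df.columns[0]] + p_cols)

# Rename 'percent.16' -> '2016' etc. (assumes a two-digit year suffix),
# and 'short.description' -> 'description'.
rename_map = {c: "20" + c.split(".")[-1] for c in df.columns if c.startswith("percent")}
rename_map["short.description"] = "description"
df = df.rename(columns=rename_map)

# Inspect the available labels.
print(df["description"].unique())

# Combine the three conditions with & so that every row must satisfy all of them.
wanted = ["Daily smokers", "Diabetes"]  # hypothetical choice; the question asks for six
mask = (
    df["description"].isin(wanted)
    & (df["Group"] == "Total")
    & (df["population"] == "adult")
)

# .copy() gives an independent dataframe rather than a view of df.
adults_total = df.loc[mask].copy()
print(adults_total)
```

Using `&` between the boolean Series is what makes this an “all conditions” filter; `|` would give the either/or behaviour that the question rules out.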
Note: this is not an ‘either/or’ filter. All rows in your dataframe must meet all of the above conditions. Only a
maximum of half marks will be awarded if you choose the exact same ‘description’ labels as given in the