# R Assignment | Practical Assessment 1: Data Wrangling

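## This R assignment involves finding open data on the web, importing it into R, reflecting on the data types, formats, and structures in the data set, inspecting the data with R functions, and creating a report using the R Markdown template.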

Overview

This assessment allows you to apply the data preprocessing knowledge and skills learned in Modules 1-3. You will locate open data from the web, import it into R, reflect upon the data types, formats, and structures in your data set, and inspect the data using R functions. You will create a report using the R Markdown template to explain the steps you took to perform these tasks.

Assessment criteria and weighting

Please see the marking rubric for the assessment criteria and their weighting.

Course learning outcomes

This assessment is linked to the following course learning outcomes:

• Select, perform, and justify data validation processes for raw datasets to satisfy quality requirements.
• Apply and evaluate the best practice standards of Tidy Data Principles.

Assessment Instructions

This assessment requires you to locate open data from the web, import it into R, and reflect upon the data types, formats, and structures in your data set. You are expected to submit a report on your data exploration. Use the given R Markdown template to create the report.

Step 1. Locate an open source of data from the web. This can be tabular or spreadsheet data (e.g., .txt, .csv, .xls, .xlsx files) or a data set from other statistical software (e.g., SPSS, SAS, or Stata data files), or you can scrape HTML table data.

As a minimum, the data set should include:

• One numeric variable.
• One qualitative (categorical) variable.

There is no limit on the number of observations or variables. Keep in mind, however, that a very large data set will take longer to read into R.

Step 2. Read/import the data into R, then save it as a data frame. You can use Base R functions or the readr, xlsx, readxl, foreign, or rvest packages for this purpose. In this step, you must provide the R code with its output (e.g., the head of the data set) and explain everything that you do to import/read/scrape the data set.
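A minimal sketch of the import step using Base R is shown here. It is only an illustration: the CSV is a small hypothetical file written to a temporary location so the example runs on its own, and the names `csv_path` and `cities`, the column names, and the values are all made up. In an actual report, the path would be replaced by the file or URL of the chosen open data set.

```r
# Hypothetical stand-in for a downloaded open data file (values are made up).
csv_path <- tempfile(fileext = ".csv")
writeLines(c(
  "city,population,region",
  "Alpha,120000,North",
  "Beta,85000,South",
  "Gamma,430000,North"
), csv_path)

# read.csv() returns a data frame; stringsAsFactors = FALSE keeps text
# columns as character so type conversions can be done deliberately later.
cities <- read.csv(csv_path, stringsAsFactors = FALSE)

# Show the first rows as evidence that the import worked.
head(cities)
```

Other readers follow the same pattern, for example `readr::read_csv()` for delimited text or `readxl::read_excel()` for spreadsheets, provided those packages are installed.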

Step 3. Provide a clear description of the data and its source (i.e., the URL of the website), along with a description of each variable.

Step 4. Inspect the dataset and variables using R functions. You should:

• Check the dimensions of the data frame.
• Check the column names in the data frame and rename them if required.
• Summarise the types of variables by checking the data types (e.g., character, numeric, integer, factor, and logical) of the variables in the data set. If a variable is not stored as the correct data type, apply the proper type conversion.
• Check the levels of factor variables and rename/rearrange them if required.
• Provide the R code with its output and explain everything that you do in this step.
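The inspection checks listed above can be sketched as follows. The data frame `survey`, its column names, and the new level labels are hypothetical and exist only to make the example self-contained. One column is deliberately stored as text so that a type conversion is needed.

```r
# Hypothetical data frame standing in for the imported data set.
survey <- data.frame(
  ID = 1:4,
  Age.Group = c("18-29", "30-44", "18-29", "45+"),
  score = c("3.5", "4.0", "2.5", "5.0"),  # numbers stored as text on purpose
  stringsAsFactors = FALSE
)

dim(survey)        # number of rows and columns
colnames(survey)   # current column names

# Rename the columns to a consistent style.
colnames(survey) <- c("id", "age_group", "score")

# Check the data type of each variable.
sapply(survey, class)

# Convert variables whose stored type does not match their meaning.
survey$score <- as.numeric(survey$score)
survey$age_group <- factor(survey$age_group)

# Check the factor levels, then rename them (order follows the old levels).
levels(survey$age_group)
levels(survey$age_group) <- c("young", "middle", "older")

str(survey)
```

`str()` at the end gives a compact view of dimensions, types, and levels in one output, which is convenient to include in the report next to the explanation of each conversion.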

Step 5. Subset the data frame to its first 10 observations, keeping all variables, then convert the subset to a matrix. Check the structure of that matrix (i.e., check whether the matrix is character, numeric, integer, factor, or logical) and explain in a few words why you ended up with that structure. Provide the R code with its output and explain everything that you do in this step.
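As an illustration of this step, the sketch below uses a hypothetical mixed-type data frame called `plants` (all names and values are made up). A matrix is an atomic structure that can hold only one data type, so `as.matrix()` coerces every column to the most general type present. With a character column in the subset, the whole matrix becomes character.

```r
# Hypothetical data frame with mixed types (12 rows, so a 10-row subset is real).
plants <- data.frame(
  height = seq(10.5, 21.5, by = 1),
  count = 1:12,
  species = rep(c("a", "b", "c"), times = 4),
  stringsAsFactors = FALSE
)

# First 10 observations, all variables.
first10 <- plants[1:10, ]

# Convert to a matrix and inspect its structure and storage type.
m <- as.matrix(first10)
str(m)
typeof(m)

# For comparison: with only the numeric columns, the matrix stays numeric.
m_num <- as.matrix(first10[, c("height", "count")])
typeof(m_num)
```

The comparison with `m_num` is optional, but it makes the coercion rule visible: integer and double columns combine into a double matrix, while adding any character column turns everything into character.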

Step 6. You will create a new data frame from scratch. Note that this step is independent of the data set that you used in the previous steps. In this step you should:

• Create a data frame from scratch with 2 variables and 10 observations. Your data frame must contain one integer variable and one ordinal variable. Make sure that you convert the ordinal variable to a factor and order its levels properly. Show the structure of your variables and the levels of the ordinal variable.
• Create another numeric vector and use cbind() to add this vector to your data frame.

After this step, you should have 3 variables in the data frame.

• Provide the R code with its output and explain everything that you do in this step.
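One possible shape for this step is sketched below. The variable names (`visits`, `satisfaction`, `wait_minutes`, `clinic`) and all values are hypothetical and chosen only for illustration.

```r
# Integer variable: the L suffix stores the values as integers, not doubles.
visits <- c(3L, 1L, 4L, 2L, 5L, 2L, 1L, 3L, 4L, 2L)

# Ordinal variable: an ordered factor with the level order given explicitly.
satisfaction <- factor(
  c("low", "high", "medium", "low", "high",
    "medium", "medium", "low", "high", "medium"),
  levels = c("low", "medium", "high"),
  ordered = TRUE
)

# Data frame with 2 variables and 10 observations.
clinic <- data.frame(visits, satisfaction)
str(clinic)
levels(clinic$satisfaction)

# A further numeric vector, added as a third column with cbind().
wait_minutes <- c(12.5, 8.0, 20.0, 15.5, 5.0, 9.5, 11.0, 18.0, 7.5, 13.0)
clinic <- cbind(clinic, wait_minutes)
str(clinic)
```

Passing `levels` explicitly matters here: without it, `factor()` would sort the labels alphabetically (high, low, medium), which is not the intended ordinal order.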

Important Note:

• You must provide the R code with its output and explain everything that you do in each step. Failure to do so will result in a reduction in your mark. Check the report sections below and the marking rubric for more information.
