Data
The objective of this application is to predict customer behaviour in order to reduce churn. Identifying the
determinants of churn and predicting it will allow the company to develop more focused
customer retention programs. Each row represents a customer; each column contains one of the
customer’s attributes. The data set includes information about:
• Customers who left within the last month – the column is called Churn
• Services that each customer has signed up for – phone, multiple lines, internet, online security,
online backup, device protection, tech support, and streaming TV and movies
• Customer account information – how long they’ve been a customer, contract, payment method,
paperless billing, monthly charges, and total charges
• Demographic information about customers – gender, age range, and whether they have partners and
dependents
More specifically, the dataset includes:
Two numerical columns:
1. MonthlyCharges: The amount charged to the customer monthly
2. TotalCharges: The total amount charged to the customer
Eighteen categorical columns:
1. CustomerID: Customer ID unique for each customer
2. gender: Whether the customer is male or female
3. SeniorCitizen: Whether the customer is a senior citizen or not (1, 0)
4. Partner: Whether the customer has a partner or not (Yes, No)
5. Dependents: Whether the customer has dependents or not (Yes, No)
6. Tenure: Number of months the customer has stayed with the company
7. PhoneService: Whether the customer has a phone service or not (Yes, No)
8. MultipleLines: Whether the customer has multiple lines or not (Yes, No, No phone service)
9. InternetService: Customer’s internet service provider (DSL, Fiber optic, No)
10. OnlineSecurity: Whether the customer has online security or not (Yes, No, No internet service)
11. OnlineBackup: Whether the customer has an online backup or not (Yes, No, No internet service)
12. DeviceProtection: Whether the customer has device protection or not (Yes, No, No internet
service)
13. TechSupport: Whether the customer has tech support or not (Yes, No, No internet service)
14. StreamingTV: Whether the customer has streaming TV or not (Yes, No, No internet service)
15. StreamingMovies: Whether the customer has streaming movies or not (Yes, No, No internet
service)
16. Contract: The contract term of the customer (Month-to-month, One year, Two years)
17. PaperlessBilling: Whether the customer has paperless billing or not (Yes, No)
18. PaymentMethod: The customer’s payment method (Electronic check, Mailed check, Bank
transfer (automatic), Credit card (automatic))
Data understanding
Using the full dataset,
1. Load the data into RStudio and familiarize yourself with the variables and their meaning. Check the
variable types (e.g., factor, numeric, etc.) and adapt them if necessary.
CustomerID should not be part of the analysis.
Tenure should be categorical, cf. the description above. One can discretize it into a few categories
to avoid too many levels. Alternatively, one can treat it as numeric.
2. Visualize the missing values. Consider deleting the rows that contain them from the dataset.
3. Change the “No internet service” to “No” for the following columns: “OnlineSecurity”,
“OnlineBackup”, “DeviceProtection”, “TechSupport”, “StreamingTV”, “StreamingMovies”.
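library(plyr)  # provides mapvalues()
# Select the six columns by name (positions 10:15 in the original column order),
# so the loop still works after CustomerID has been dropped.
cols <- c("OnlineSecurity", "OnlineBackup", "DeviceProtection",
          "TechSupport", "StreamingTV", "StreamingMovies")
for (col in cols) {
  # Recode "No internet service" as "No" and store the result as a factor
  data[[col]] <- as.factor(mapvalues(data[[col]],
                                     from = c("No internet service"),
                                     to = c("No")))
}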
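
Steps 1 and 2 can be sketched in the same spirit. This is only an illustration under stated assumptions: toy is a hypothetical stand-in for data with a few invented rows (the real file and the call that reads it are not given here), and the Tenure cut points are an arbitrary choice, not something the assignment prescribes.

```r
# Hypothetical toy rows standing in for the churn data (values are invented).
toy <- data.frame(
  CustomerID     = c("A1", "A2", "A3", "A4", "A5"),
  SeniorCitizen  = c(0, 1, 0, 0, 1),
  Tenure         = c(1, 14, 30, 60, 72),
  MonthlyCharges = c(29.85, 56.95, 53.85, 42.30, 70.70),
  TotalCharges   = c(29.85, 797.30, NA, 2538.00, 5090.40),
  Churn          = c("No", "No", "Yes", "No", "Yes"),
  stringsAsFactors = FALSE
)

# Step 1: inspect the variable types, then adapt them.
str(toy)
toy$CustomerID    <- NULL                    # identifier, not a predictor
toy$SeniorCitizen <- factor(toy$SeniorCitizen)
toy$Churn         <- factor(toy$Churn)

# Discretize Tenure into a few bands (assumed cut points).
# include.lowest = TRUE keeps a tenure of 0 months in the first band.
toy$Tenure <- cut(toy$Tenure,
                  breaks = c(0, 12, 24, 48, Inf),
                  labels = c("0-12", "13-24", "25-48", "49+"),
                  include.lowest = TRUE)

# Step 2: count missing values per column, plot them, keep complete rows.
na_counts <- colSums(is.na(toy))
print(na_counts)
barplot(na_counts, las = 2, main = "Missing values per column")
toy_clean <- na.omit(toy)
```

Treating Tenure as numeric instead simply means skipping the cut() call. na.omit() drops every row that has at least one missing value, so it is worth checking na_counts first to see how many rows would be lost.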
