这个作业是用Python完成时间序列数据练习和动量和均值回归

FE520 Assignment 5

Spring 2020

Submission Requirement:

For all the problems in this assignment you need to design and use Python 3, output

and present the results in nicely format.

Please submit a written report (pdf), where you detail your results and copy your

code into an Appendix. You are required to submit a single python file and a brief

report. Your grade will be evaluated by combination of report and code.

You are strongly encouraged to write comment for your code, because it is a convention to have your code documented all the time.

Python script must be a ‘.py’ script, Jupyter notebook ‘.ipynb is not allowed.

Do NOT copy and paste from others, all homework will be firstly checked by plagiarism detection tool.

Note: This assignment contain 4 questions, 130 points + 20 points Bonus = 150

Points in total. We will set the assignment total points = 120 points, thus, you will get

extra 10 bonus points automatically.

1 Time Series Data Practice(30 pts)

Recall what we mentioned in the class, we have two types of data splitting for training

and testing data: out of sample and out of time. It is proper to use Out of Time splitting

method for time series dataset. Writing a function to spilt ”Energy” dataset into training

and testing data.

Parameter input:

StartYear: int (default value = 2012), EndYear: int (default value = None).

Ontput:

Train, Test (Data type: Array(Numpy) )

If EndYear is None, we will only choose all data with ”Data Date” == StartYear

as Test data, all other data as Train data. By default, all company Data Date within

2012 will be selected as Testing data. If EndYear is NOT None, we will choose all data

with ”Data Date” == StartYear to EndYear as Test data, all other data as Train data,

For example, StartYear = 2010, EndYear = 2013, all data in 2010, 2011, 2012, 2013

will be selected as Testing data .

All return should be array from column ”Accumulated Other Comprehensive

Income (Loss)” to column ”Selling, General and Administrative Expenses”.

1

2 Momentum and Mean Reversion(30 pts)

Momentum and mean reversion are common trading strategy. For the purpose of this

homework, a stock exhibits momentum is defined as an asset whose price returns are

more likely to go up(down) on day t if the return went up(down) on day t-1. In other

words, the stock exhibits a positive auto-correlation. Mean reversion is the opposite of

momentum. Stocks are more likely to go up(down) on day t if that stock went down(up)

on day t-1.

You are provided below with a simulated dataset of series of stock returns. These

returns have been generated with a predetermined average momentum during one period and a predetermined average mean reversion in another period.

Please use dataset provided to answer the questions below. In order to do so, you

will need to clean the dataset. It comes with a number of flaws commonly seen in

dataset we receive.

Question:

1. In what month did the returns shift from exhibiting mean reversion to exhibiting

momentum, or from momentum to mean reversion. Please output the last month

that momentum(mean reversion) shift to mean reversion(momentum).

2. During the time period when these stock returns had momentum property, what

was the average momentum? Please note this is a single number, average cross

over all stock returns.

3. During the time period when these stock returns had mean reversion property,

what was the average mean reversion?Please note this is a single number, average

cross over all stock returns.

This is an interview question in industry, I have provided all information and hints in

the question and in the zoom video. This question aims to practice your ability to clean

the data using the technique we covered in the pandas lecture (especially in time series).

Please try to think about how to solve this question before looking into the hints.

Hints: The dataset provided is multiple stock returns, but questions are asked for

a single number of month that returns from one state to another state. Thus, what you

need to do is find a way to aggregate multiple stock into on market index, and observe

the index to find the answer.

If this is not enough, I have provided more detail Q & A in Discussion – Question

about Assignment 5.

3 Clustering & Classification (40pt)

1. Use sklearn.cluster.KMeans to do clustering on the given data set points.csv.

There are 5 clusters in this data set. Draw a scatter plot for the data and use color

to indicate their clusters.

2. Regard the clusters given by your KMeans model as the ground truth labels, randomly split the data set into training data (80%) and testing data (20%). Create a

2

linear SVM classifier and train it on training data set. Use the confusion matrix

to evaluate its performance on testing data set.

3. Regard the data set labels.csv as the ground truth labels, repeat the second

question. Compare their performance, discuss what do you observe, and how

would explain it.

4. (Bonus 10pt) Use tensorflow.keras API to create a fully connected neural network model, repeat the second question. Draw a plot to show how loss

changes when the step of training increases.

4 Regression (30pt)

1. In this question, we are going to use the diabetes data set. Use sklearn.datasets.load diabetes()

to load the data and labels.

2. Randomly split the data into training set (80%) and testing set (20%).

3. Create a linear regression model using sklearn, and fit training data. Evaluate

your model using test data. Give all the coefficient and R-squared score.

4. Use 10-fold cross validation to fit and validate your linear regression models on

the whole data set. Print the scores for each validation.

5. (Bonus 3pt) Use sklearn to create RandomForestRegressor model, and fit the

training data into it.

6. (Bonus 7pt) Use Grid Search to find the optimal hyper-parameters (max depth:{None,

7, 4} and min samples split: {2, 10, 20}) for RandomForestRegressor.

3

**程序辅导定制C/C++/JAVA/安卓/PYTHON/留学生/PHP/APP开发/MATLAB**

本网站支持 Alipay WeChatPay PayPal等支付方式

**E-mail:** vipdue@outlook.com **微信号:**vipnxx

如果您使用手机请先保存二维码，微信识别。如果用电脑，直接掏出手机果断扫描。