# Python Tutoring | Predicting Price Direction

1. Predicting Price Direction [34 marks]

Unpredictability of short-term asset returns is a subject of asset pricing research: efficient markets produce near-Normal daily returns with low correlation to past values. That limits the application of autoregression on lagged returns. However, progress is possible, and in this assignment you will make direction predictions using any 2 out of 4 types of Classifier of your choice.

Predict the sign of the next daily move. You are welcome to modify the task to predict over longer periods, eg, a 5-day move. Certain classifiers (SVM, ANN) are better suited to such longer-term predictions.

The assignment limits the task to binomial prediction of asset price movement: positive or negative return, labelled {−1, 1}. For some classifiers, particularly neural networks or if using bagging/boosting, re-label as {0, 1}.

Start with lagged log-returns r_{t−1}, r_{t−2}, … as your features. Use ADDITIONAL simple variations around price P_t from Table 1. More complex indicators (eg, RSI, Stochastic K, MACD, CCI, Acc/Distrib) are beyond the scope of the assignment.

Study Design:

• Choose 2 equities with a base of comparison (eg, same industry), or 2 broad market indexes, or 2 Fama-French factors. FF factors are good candidate series to try prediction of monthly returns.
• For classifiers other than ANNs, use 7 or fewer lagged values. You can also introduce the past 5D return or 5D Momentum P_t − P_{t−5} as a feature.
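The feature set described above can be sketched as follows. This is a minimal illustration, not the assignment's template: the function name `build_features`, the toy price list, and the choice to label a zero return as a down move are all assumptions. At the close of day t the features are the most recent log-returns and the 5D momentum P_t − P_{t−5}; the target is the sign of the next day's log-return.

```python
import numpy as np

def build_features(prices, n_lags=3, mom_window=5):
    """Build lagged log-return and momentum features with {-1, 1} labels.

    Hypothetical helper for illustration. Row for day t holds
    r_t, r_{t-1}, ..., r_{t-n_lags+1} and P_t - P_{t-mom_window};
    the label is the sign of r_{t+1} (zero returns count as down).
    """
    p = np.asarray(prices, dtype=float)
    r = np.full(len(p), np.nan)
    r[1:] = np.diff(np.log(p))              # r[t] = log(P_t / P_{t-1})

    start = max(n_lags, mom_window)         # first day with every feature available
    rows, labels = [], []
    for t in range(start, len(p) - 1):      # stop one day early: r[t+1] is the target
        lags = [r[t - k] for k in range(n_lags)]
        momentum = p[t] - p[t - mom_window]
        rows.append(lags + [momentum])
        labels.append(1 if r[t + 1] > 0 else -1)
    return np.array(rows), np.array(labels)

# Toy prices, made up purely to show the shapes involved.
prices = [100, 101, 100, 102, 103, 101, 104, 103]
X, y = build_features(prices, n_lags=2, mom_window=5)
y01 = (y + 1) // 2                          # re-label {-1, 1} as {0, 1}
print(X.shape, y, y01)
```

Every feature in a row uses information available no later than day t, so the label for day t+1 is never leaked into its own predictors.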

Classifier A.1 Logistic Classifier and Bayesian Classifier

1. a) Make sure to implement penalised versions of logistic regression. Apply and discuss the difference between L1 and L2 cost functions and their impact on the regression coefficients. b) Demonstrate the use of sklearn.model_selection for reshuffled samples and k-fold cross-validation.
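One possible way to compare the two penalties and run reshuffled k-fold cross-validation is sketched below. The data are synthetic (an assumption made so the block runs on its own), with signal placed in the first feature only, and the value C=0.02 is an arbitrary choice of strong regularisation. In scikit-learn, `C` is the inverse of the regularisation strength.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, cross_val_score

rng = np.random.default_rng(0)
n_obs, n_feat = 400, 5
X = rng.standard_normal((n_obs, n_feat))    # synthetic stand-in for lagged returns
# Assumption for illustration: only the first feature carries signal.
y = (2.0 * X[:, 0] + rng.standard_normal(n_obs) > 0).astype(int)

# Same (strong) penalty strength for both models; only the norm differs.
l1_model = LogisticRegression(penalty="l1", C=0.02, solver="liblinear").fit(X, y)
l2_model = LogisticRegression(penalty="l2", C=0.02, solver="liblinear").fit(X, y)

# L1 can set coefficients exactly to zero; L2 shrinks them towards zero.
n_zero_l1 = int(np.sum(l1_model.coef_ == 0))
n_zero_l2 = int(np.sum(l2_model.coef_ == 0))
print("L1 coefficients:", l1_model.coef_.ravel(), "zeros:", n_zero_l1)
print("L2 coefficients:", l2_model.coef_.ravel(), "zeros:", n_zero_l2)

# Reshuffled 5-fold cross-validation of out-of-sample accuracy.
cv = KFold(n_splits=5, shuffle=True, random_state=0)
l1_scores = cross_val_score(l1_model, X, y, cv=cv)
l2_scores = cross_val_score(l2_model, X, y, cv=cv)
print("L1 CV accuracy:", l1_scores.mean(), "L2 CV accuracy:", l2_scores.mean())
```

Reshuffling mixes past and future observations across folds; `TimeSeriesSplit` from the same module keeps each training fold strictly earlier than its test fold and may be worth comparing against.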

Classifier A.2 Support Vector Machines

1. a) Consider soft vs. hard margin, present both in mathematical notation and consider the impact on your 2D relationships. b) Specifically consider Momentum Feature vs Return_{t−1} and provide a 2D visualisation (up/down points in different colours). While support vectors are difficult to present, use SVC.support_vectors_ and prepare interpretable visualisations. c) No need to vary the type of kernel.
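For reference, the standard textbook statement of the two margin problems is given below, assuming labels y_i ∈ {−1, 1}, feature vectors x_i, and N observations; the symbols w, b, ξ_i and C are the usual notation rather than anything defined in this brief.

```latex
% Hard margin: every point must lie on the correct side of the margin.
\min_{w,\,b}\ \tfrac{1}{2}\lVert w\rVert^{2}
\quad\text{s.t.}\quad y_i\,(w^{\top}x_i + b)\ \ge\ 1,
\qquad i = 1,\dots,N

% Soft margin: slack variables \xi_i allow violations at a cost C per unit.
\min_{w,\,b,\,\xi}\ \tfrac{1}{2}\lVert w\rVert^{2} + C\sum_{i=1}^{N}\xi_i
\quad\text{s.t.}\quad y_i\,(w^{\top}x_i + b)\ \ge\ 1 - \xi_i,
\quad \xi_i \ge 0,
\qquad i = 1,\dots,N
```

A large C penalises slack heavily and approaches the hard-margin solution; a small C tolerates more misclassified or in-margin points, which is usually the relevant case for noisy return data that is not linearly separable.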

Classifier A.3 Decision Tree Regressor (or Boosted Random Forest)

1. a) Visualise the decision tree Regressor (note the limitations of graphviz) and discuss whether the splits are sensible choices. Each split gives a percentage sorted into one class (up) versus another (down). b) Report hyperparameters: minimum number of samples to split, minimum number in a leaf, and maximum depth. c) A decision tree Classifier builds a very elaborate tree that achieves a perfect in-sample fit (likely to be suited to non-time series data); it is critical to test the prediction on a holdout sample.
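The contrast between an unrestricted tree and one constrained by the three hyperparameters can be sketched as below. The data are synthetic, the feature names are hypothetical, and the hyperparameter values are arbitrary starting points rather than recommendations. `export_text` is used here as a plain-text alternative to a graphviz rendering.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

rng = np.random.default_rng(1)
X = rng.standard_normal((500, 3))           # synthetic stand-in for lagged returns
y = (0.5 * X[:, 0] + rng.standard_normal(500) > 0).astype(int)  # noisy up/down labels

# Chronological holdout: the last 25% of observations, no shuffling.
split = 375
X_train, X_test = X[:split], X[split:]
y_train, y_test = y[:split], y[split:]

# Default settings let the tree grow until every leaf is pure.
deep = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)

# Illustrative constraints on depth, split size and leaf size.
pruned = DecisionTreeClassifier(
    max_depth=3, min_samples_split=40, min_samples_leaf=20, random_state=0
).fit(X_train, y_train)

for name, tree in [("unrestricted", deep), ("restricted", pruned)]:
    print(name,
          "in-sample:", tree.score(X_train, y_train),
          "holdout:", tree.score(X_test, y_test))

# Text rendering of the restricted tree's splits.
print(export_text(pruned, feature_names=["ret_lag1", "ret_lag2", "ret_lag3"]))
```

The unrestricted tree reproduces its training labels exactly, so only the holdout score says anything about predictive value.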

Classifier A.4 Artificial Neural Network

If one believes the data carries autoregressive structure, a recurrent neural network model can be a successful alternative to time series regression. a) Attempt to use an LSTM classifier with the features given in Table 1. LSTM can come out as one of the best-predicting models when fed financial ratios/volatility estimators/advanced technical indicators, but those features are beyond the scope. b) Dealing with an arbitrary sequence length is the major characteristic of LSTM. Attempt prediction of the 5D or 10D return for equity, or 1W, 1M for an FF factor, but for robust estimation use more than 5−7 years of data for equity.
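Recurrent layers typically expect input shaped as (samples, timesteps, features). The sketch below shows one way to cut a return series into such windows with a 5D-ahead direction label; it covers data preparation only and does not build the network itself. The helper name `make_sequences`, the window length of 10 and the random return series are assumptions for illustration.

```python
import numpy as np

def make_sequences(returns, window=10, horizon=5):
    """Slice a 1-D return series into overlapping windows.

    Hypothetical helper. X has shape (n_samples, window, 1); y is 1 when the
    cumulative log-return over the next `horizon` days is positive, else 0.
    """
    r = np.asarray(returns, dtype=float)
    X, y = [], []
    # The window covers days t-window .. t-1; the target covers t .. t+horizon-1.
    for t in range(window, len(r) - horizon + 1):
        X.append(r[t - window:t])
        y.append(int(r[t:t + horizon].sum() > 0))
    X = np.array(X)[..., np.newaxis]        # add a trailing feature axis
    return X, np.array(y)

rng = np.random.default_rng(3)
returns = rng.normal(0.0, 0.01, size=30)    # synthetic daily log-returns
X_seq, y_seq = make_sequences(returns, window=10, horizon=5)
print(X_seq.shape, y_seq.shape)
```

Log-returns add across days, so summing the next five daily values gives the 5D log-return whose sign is being predicted. Consecutive targets overlap by four days, which makes neighbouring labels strongly dependent.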

1. Prediction Quality and Bias (each chosen classifier)

Task B.1 Investigate the prediction quality using a confusion matrix (precision/recall statistics) and the area under the ROC curve; these are available for all classifiers if the prediction is binomial. In particular, check the quality of predicting the down movements (negative sign of return).
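To make the down-move statistics concrete, the sketch below computes precision and recall from first principles, treating either label as the class of interest. The function name `class_stats` and the toy label vectors are made up for illustration; in practice `sklearn.metrics` provides `confusion_matrix`, `precision_score`, `recall_score` and `roc_auc_score` for the same purpose.

```python
def class_stats(y_true, y_pred, label):
    """Precision and recall with `label` treated as the class of interest."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)  # correctly flagged
    fp = sum(1 for t, p in pairs if t != label and p == label)  # false alarms
    fn = sum(1 for t, p in pairs if t == label and p != label)  # missed cases
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Toy realised and predicted directions (1 = up, -1 = down).
y_true = [1, 1, -1, -1, 1, -1, -1, 1]
y_pred = [1, -1, -1, 1, 1, 1, -1, 1]

down_precision, down_recall = class_stats(y_true, y_pred, label=-1)
up_precision, up_recall = class_stats(y_true, y_pred, label=1)
print("down:", down_precision, down_recall)
print("up:  ", up_precision, up_recall)
```

In this toy sample only half of the actual down days are caught (recall 0.5) while three quarters of the up days are, which is the kind of asymmetry the task asks you to look for.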

Task B.2 Improve your use of the classifier by changing features or hyperparameters, for example with sklearn.model_selection.GridSearchCV. Alternatively, introduce bagging/boosting and discuss the impact on prediction quality. A new boosted model deals with the mistakes of the previous models; a common use is AdaBoost with decision trees as weak learners. In particular, describe the steps taken to reduce misclassified negative returns. Present a comparison BEFORE and AFTER your improvements.
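A hyperparameter search that targets the down class directly might look like the sketch below. The data are synthetic, the grid values are arbitrary, and scoring by recall on the down label (0) is one possible reading of "reduce misclassified negative returns", not the only one.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import make_scorer, recall_score
from sklearn.model_selection import GridSearchCV, TimeSeriesSplit

rng = np.random.default_rng(2)
X = rng.standard_normal((300, 4))           # synthetic stand-in for lagged returns
y = (X[:, 0] + rng.standard_normal(300) > 0).astype(int)   # 1 = up, 0 = down

# Score each candidate by recall on the down class rather than overall accuracy.
down_recall = make_scorer(recall_score, pos_label=0)

param_grid = {"C": [0.01, 0.1, 1.0, 10.0]}  # inverse regularisation strength
search = GridSearchCV(
    LogisticRegression(),
    param_grid,
    scoring=down_recall,
    cv=TimeSeriesSplit(n_splits=5),         # each test fold follows its training fold
)
search.fit(X, y)

print("best C:", search.best_params_["C"])
print("mean down-class recall:", search.best_score_)
```

Fitting the same model with its default settings and recording the same score gives the BEFORE figure for the comparison; `search.best_estimator_` gives the AFTER.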

Task B.3 Develop a scheme that utilises the transition probabilities from the predict_proba() method. Provide separate scatter plots for the probabilities of up and down moves, using colour codes for correctly/incorrectly realised predictions. Devise a P&L that relies on fractional betting and the edge p − (1 − p) = 2p − 1, where the probability of a move p is above a threshold of 75%-90%. Discuss the risk of over-relying on transition probabilities for poorly predicted negative returns.
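One possible reading of the betting rule is sketched below: trade only when the predicted probability of a move clears the threshold, and size the position by the edge 2p − 1. The function name `fractional_pnl`, the symmetric treatment of short positions and the toy inputs are assumptions; `p_up` would come from the column of `predict_proba()` that corresponds to the up label in the fitted model's `classes_`.

```python
def fractional_pnl(p_up, realised_returns, threshold=0.75):
    """Daily P&L from edge-sized bets (hypothetical scheme for illustration).

    Long with fraction 2p - 1 when p = P(up) >= threshold, short with the
    same rule applied to P(down) = 1 - P(up), otherwise stay flat.
    """
    pnl = []
    for p, r in zip(p_up, realised_returns):
        if p >= threshold:
            position = 2 * p - 1                # long, sized by the edge
        elif 1 - p >= threshold:
            position = -(2 * (1 - p) - 1)       # short, sized by the edge
        else:
            position = 0.0                      # no bet below the threshold
        pnl.append(position * r)
    return pnl

# Toy probabilities and realised returns, made up for the example.
p_up = [0.80, 0.55, 0.10, 0.95]
realised = [0.01, -0.02, -0.03, -0.01]

daily_pnl = fractional_pnl(p_up, realised, threshold=0.75)
total_pnl = sum(daily_pnl)
print(daily_pnl, total_pnl)
```

The last day in the toy data shows the risk the task asks you to discuss: the largest position (p = 0.95) is taken on a day the prediction is wrong, so a confident but poorly calibrated probability produces the largest loss.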

Work on these tasks can be appended to each classifier use case.

Instructions

Work on ALL tasks in the format required. Recite the mathematical underpinnings for each chosen Classifier. Code must be submitted and must produce the computational output. Full mathematical workings are required for Interest Rates Modeling questions.

Format and Coding: Submit ONE .pdf report file and ONE .zip file with data and code, file name starting with your LASTNAME. It is advantageous to merge all your workings in one PDF file.

• Implementation is best done in Python using sklearn. For those starting with Python, price direction prediction MODIFIED.ipynb is provided as a template to start the work.
• It is acceptable to implement classification in R/Matlab, but tutor’s support might be limited. Matlab use should not devolve to exploration with Classification Learner App only.
• It is possible to have a limited implementation in Excel (eg, logistic regression); however, that risks falling below the passing mark (60%) because the other kinds of Classifiers and the prediction quality analysis would be missing.

Report Content and Analytical Quality:

• If printing out a Python Notebook as your report, please ensure it comes across as an analytical report with a) headers to separate sections, b) clarity on which sections address Questions A.1 – A.4 and B.1 – B.3, and c) no large tables of output (show the head/tail/selected sample).
• It is not expected that you will have particularly good accuracy in predicting short-term returns from past returns; prediction analysis and a clear explanation of improvement steps are what matter.
• Within each kind of Classifier, you might like to present
• Focus on explaining the underpinnings and tuning of Classifiers. Your implementation should include tuning of parameters that are specific to each Classifier, eg, regularisation strength for Logistic, margin softness for SVM. It is good practice to save some data as a holdout sample (not used in estimation) on which to test your fitted models.
