Python Tutoring | Predicting Price Direction

This assignment uses Python to predict the direction of stock prices.

  1. Predicting Price Direction [34 marks]

Unpredictability of short-term asset returns is a subject of asset pricing research: efficient markets produce near-Normal daily returns with low correlation to past values. That limits the application of autoregression on lagged returns. However, progress is possible, and in this assignment you will make direction predictions using any 2 out of the 4 types of Classifier, of your choice.

Predict the sign of the next daily move. You are welcome to modify the task to predict over longer periods, eg, a 5-day move. Certain classifiers (SVM, ANN) are more suited to making such longer-term predictions.

The assignment limits the task to binomial prediction of asset price movement: positive or negative return, labelled {−1, 1}. For some classifiers, particularly neural networks or if using bagging/boosting, re-label as {0, 1}.

Start with lagged log-returns r_{t−1}, r_{t−2}, … as your features. Use ADDITIONAL simple variations around price P_t from Table 1. More complex indicators (eg, RSI, Stochastic K, MACD, CCI, Acc/Distrib) are beyond the scope of the assignment.
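
The feature and label construction described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the assignment's template: the price series is synthetic, the helper name `make_features` and the column names are hypothetical, and zero returns are counted as down moves. Row t holds the sign of r_t as the label and only information available at t−1 as features.

```python
import numpy as np
import pandas as pd


def make_features(prices: pd.Series, n_lags: int = 5) -> pd.DataFrame:
    """Lagged log-returns, a lagged 5D momentum and direction labels (hypothetical helper)."""
    r = np.log(prices).diff()                          # r_t = ln(P_t / P_{t-1})
    data = pd.DataFrame({"ret": r})
    for k in range(1, n_lags + 1):
        data[f"ret_lag{k}"] = r.shift(k)               # r_{t-k}, known before day t
    # 5D momentum observed one day earlier: P_{t-1} - P_{t-6}
    data["mom_5d"] = (prices - prices.shift(5)).shift(1)
    data = data.dropna().copy()
    data["y"] = np.where(data["ret"] > 0, 1, -1)       # assumption: zero return counts as down
    data["y01"] = (data["y"] + 1) // 2                 # re-label {-1, 1} as {0, 1}
    return data.drop(columns="ret")


# Synthetic geometric random walk standing in for real price data
rng = np.random.default_rng(0)
prices = pd.Series(100 * np.exp(np.cumsum(rng.normal(0, 0.01, 500))))
features = make_features(prices, n_lags=5)
print(features.head())
```

Shifting the momentum by one day keeps the feature strictly in the past relative to the label, which avoids look-ahead bias.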

 

Study Design:

  • Choose 2 equities with a base of comparison (eg, same industry), or 2 broad market indexes, or 2 Fama-French factors. FF factors are good candidate series to try prediction of monthly returns.
  • For classifiers other than ANNs, use 7 or fewer lagged values. You can also introduce the past 5D return or 5D Momentum P_t − P_{t−5} as a feature.

 

Classifier A.1 Logistic Classifier and Bayesian Classifier

  1. a) Make sure to implement penalised versions of logistic regression and discuss the impact on coefficients. Apply and discuss the difference between L1 and L2 cost functions and the impact each has on regression coefficients. b) Demonstrate the use of sklearn.model_selection for reshuffled samples and k-fold cross-validation.
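
A minimal sketch of the L1 versus L2 comparison with reshuffled k-fold cross-validation is given below. The data are synthetic (only the first column carries signal), and the penalty strength `C=0.02` is an arbitrary illustrative choice rather than a recommended value.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(1)
X = rng.normal(size=(600, 5))                     # stand-in for 5 lagged returns
y = (0.8 * X[:, 0] + rng.normal(size=600) > 0).astype(int)

models = {
    "l1": LogisticRegression(penalty="l1", C=0.02, solver="liblinear"),
    "l2": LogisticRegression(penalty="l2", C=0.02, solver="liblinear"),
}
cv = KFold(n_splits=5, shuffle=True, random_state=0)   # reshuffled k-fold

coefs, scores = {}, {}
for name, clf in models.items():
    pipe = make_pipeline(StandardScaler(), clf)   # penalties need comparable feature scales
    scores[name] = cross_val_score(pipe, X, y, cv=cv, scoring="accuracy").mean()
    coefs[name] = pipe.fit(X, y)[-1].coef_.ravel()
    print(name, "CV accuracy:", round(scores[name], 3),
          "coefficients:", np.round(coefs[name], 3))
```

With a strong penalty, L1 tends to set the coefficients of uninformative lags exactly to zero, while L2 shrinks them towards zero without eliminating them. Note that shuffled folds mix past and future observations, so for time series a chronological split is worth comparing against.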

Classifier A.2 Support Vector Machines

  1. a) Consider soft vs. hard margin, present in mathematical notation and consider the impact on your 2D relationships. b) Specifically consider Momentum Feature vs Return t − 1 and provide a 2D visualisation (up/down points in different colours). While support vectors are difficult to present, use the support_vectors_ attribute of sklearn.svm.SVC and prepare interpretable visualisations. c) No need to vary the type of kernel.
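
In the soft-margin formulation the classifier minimises ½‖w‖² + C Σ ξ_i, so a small C tolerates more margin violations and a very large C approaches a hard margin. The sketch below, on synthetic two-feature data standing in for Return t − 1 and Momentum, shows one way to inspect that effect through `support_vectors_`. The values of C are illustrative assumptions.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(2)
X = rng.normal(size=(300, 2))                     # columns: [return t-1, 5D momentum]
noise = rng.normal(scale=0.8, size=300)
y = np.where(X[:, 0] + 0.5 * X[:, 1] + noise > 0, 1, -1)
X = StandardScaler().fit_transform(X)

n_support = {}
for C in (0.01, 1.0, 100.0):
    svc = SVC(kernel="linear", C=C).fit(X, y)
    n_support[C] = len(svc.support_vectors_)      # points on or inside the margin
    print(f"C={C}: {n_support[C]} support vectors, w={svc.coef_.ravel().round(3)}")
```

For the 2D visualisation, the rows of `svc.support_vectors_` can be overlaid (for example as hollow markers) on a scatter plot of the two features coloured by up/down label.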

Classifier A.3 Decision Tree Regressor (or Boosted Random Forest)

  1. a) Visualise the decision tree Regressor (note limitations of graphviz) and discuss if splits are sensible choices. Split gives a percentage sorted into one class (up) versus another (down). b) Report hyperparameters: min number to split, minimum number in leaf, and maximum depth. c) Decision tree Classifier builds a very elaborate tree that achieves perfect in-sample fit (likely to be suited to non-time series data); it is critical to test the prediction on a holdout sample.
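
The contrast between in-sample fit and holdout performance can be sketched as below. The data are synthetic, the feature names are hypothetical, and the hyperparameter values are arbitrary examples of the three settings named above (`min_samples_split`, `min_samples_leaf`, `max_depth`).

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

rng = np.random.default_rng(3)
X = rng.normal(size=(800, 3))
y = (0.5 * X[:, 0] + rng.normal(size=800) > 0).astype(int)

split = 600                                       # chronological split, no shuffling
X_tr, X_te, y_tr, y_te = X[:split], X[split:], y[:split], y[split:]

deep = DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr)
pruned = DecisionTreeClassifier(
    max_depth=3, min_samples_split=40, min_samples_leaf=20, random_state=0
).fit(X_tr, y_tr)

for name, tree in (("unrestricted", deep), ("restricted", pruned)):
    print(name, "in-sample:", round(tree.score(X_tr, y_tr), 3),
          "holdout:", round(tree.score(X_te, y_te), 3))

# Text rendering of the splits, an alternative when graphviz is unavailable
print(export_text(pruned, feature_names=["ret_lag1", "ret_lag2", "mom_5d"]))
```

The unrestricted tree memorises the training sample, so only the holdout score says anything about predictive ability.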

Classifier A.4 Artificial Neural Network

If one believes the data carries autoregressive structure, a recurrent neural network model can be a successful alternative to time series regression. a) Attempt to use an LSTM classifier with the features given in Table 1. LSTM can come out as one of the best-predicting models from financial ratios/volatility estimators/adv technical indicators, but those features are beyond the scope. b) Dealing with an arbitrary length of sequence is the major characteristic of LSTM. Attempt prediction of 5D or 10D return for equity, or 1W, 1M for an FF factor, but for robust estimation use more than 5−7 years of data for equity.
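
Recurrent layers usually expect input shaped as (samples, timesteps, features). The sketch below shows only that data preparation step, using NumPy, and leaves the network itself to whichever deep learning library is chosen. The helper name `make_sequences` is hypothetical, and it assumes `target[t]` is already aligned as the label to be predicted from information up to and including row t.

```python
import numpy as np


def make_sequences(features: np.ndarray, target: np.ndarray, window: int):
    """Stack rolling windows into a (samples, timesteps, features) array."""
    n = len(features)
    # Window i covers rows i-window .. i-1; its label is the target of the last row
    X_seq = np.stack([features[i - window:i] for i in range(window, n + 1)])
    y_seq = target[window - 1:]
    return X_seq, y_seq


# Toy example: 10 observations of 2 features, windows of 3 timesteps
features = np.arange(20).reshape(10, 2)
target = np.arange(10)
X_seq, y_seq = make_sequences(features, target, window=3)
print(X_seq.shape, y_seq.shape)
```

Each sample then contains the last `window` rows of features, so no observation later than the label's own row enters the input.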

 

  1. Prediction Quality and Bias (each chosen classifier)

Task B.1 Investigate the prediction quality using confusion matrix (precision/recall statistics) and area under ROC curve – these are possible for all classifiers if prediction is binomial. Particularly check the quality of predicting the down movements (negative sign of return).
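
A minimal sketch of these diagnostics with `sklearn.metrics` follows, with particular attention to the down class. The data and the choice of logistic regression are placeholders; labels are assumed to be 1 for up and 0 for down.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report, confusion_matrix, roc_auc_score

rng = np.random.default_rng(4)
X = rng.normal(size=(1000, 4))
y = (0.6 * X[:, 0] + rng.normal(size=1000) > 0).astype(int)   # 1 = up, 0 = down
X_tr, X_te, y_tr, y_te = X[:750], X[750:], y[:750], y[750:]

clf = LogisticRegression().fit(X_tr, y_tr)
y_hat = clf.predict(X_te)
p_up = clf.predict_proba(X_te)[:, 1]

cm = confusion_matrix(y_te, y_hat, labels=[0, 1])   # rows = actual, columns = predicted
tn, fp, fn, tp = cm.ravel()
recall_down = tn / (tn + fp)        # share of actual down days predicted as down
precision_down = tn / (tn + fn)     # share of predicted down days that were down
auc = roc_auc_score(y_te, p_up)

print(cm)
print(classification_report(y_te, y_hat, target_names=["down", "up"]))
print("down recall:", round(recall_down, 3), "AUC:", round(auc, 3))
```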

Task B.2 Improve your use of the classifier by changing features or hyperparameters, for example with sklearn.model_selection.GridSearchCV. Alternatively, introduce bagging/boosting and discuss the impact on prediction quality. A new boosted model deals with mistakes of the previous models; a common use is AdaBoost with decision trees as weak learners. In particular, describe the steps taken to reduce misclassified negative returns. Present a comparison BEFORE and AFTER your improvements.
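
One possible way to combine these ideas is a grid search over AdaBoost settings that is scored on recall of the down class. The sketch below uses synthetic data, a deliberately tiny grid, and `TimeSeriesSplit` as an assumed choice of chronological cross-validation; none of these values come from the assignment.

```python
import numpy as np
from sklearn.ensemble import AdaBoostClassifier
from sklearn.metrics import make_scorer, recall_score
from sklearn.model_selection import GridSearchCV, TimeSeriesSplit

rng = np.random.default_rng(5)
X = rng.normal(size=(600, 4))
y = (0.6 * X[:, 0] + rng.normal(size=600) > 0).astype(int)   # 1 = up, 0 = down

# Score candidates on how many actual down moves they catch
down_recall = make_scorer(recall_score, pos_label=0)

search = GridSearchCV(
    AdaBoostClassifier(random_state=0),           # default weak learner: depth-1 tree
    param_grid={"n_estimators": [25, 100], "learning_rate": [0.1, 1.0]},
    scoring=down_recall,
    cv=TimeSeriesSplit(n_splits=4),               # each fold trains on the past only
)
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 3))
```

Optimising down-class recall alone can push the model towards predicting down too often, so the BEFORE and AFTER comparison should still report precision and AUC.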

Task B.3 Develop a scheme that utilises transition probabilities from the predict_proba() method. Provide separate scatter plots for the probabilities of up and down moves, using colour codes for correctly/incorrectly realised predictions. Devise a P&L that relies on fractional betting and the edge p − (1 − p) = 2p − 1, where the probability of the move p is above a threshold of 75%-90%. Discuss over-relying on transition probabilities for poorly predicted negative returns.
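
The betting rule can be sketched as below. This is one interpretation under stated assumptions: the stake equals the edge 2p − 1 as a fraction of capital, bets are placed in the predicted direction only when p reaches the threshold, and transaction costs are ignored. The function name `fractional_pnl` and the example numbers are hypothetical.

```python
import numpy as np


def fractional_pnl(p_up, realised_ret, threshold=0.75):
    """Per-period P&L when the stake equals the edge 2p - 1 above a threshold."""
    p_up = np.asarray(p_up, dtype=float)
    realised_ret = np.asarray(realised_ret, dtype=float)
    p = np.maximum(p_up, 1.0 - p_up)                  # probability of the predicted move
    direction = np.where(p_up >= 0.5, 1.0, -1.0)      # long if up is more likely, else short
    stake = np.where(p >= threshold, 2.0 * p - 1.0, 0.0)   # no bet below the threshold
    return direction * stake * realised_ret


# Toy inputs: predicted probability of an up move and the realised return
p_up = [0.9, 0.6, 0.2, 0.1]
realised = [0.01, 0.02, -0.03, 0.01]
pnl = fractional_pnl(p_up, realised, threshold=0.75)
print(pnl, pnl.sum())
```

In the last toy observation the model assigns 90% to a down move that does not materialise, so the largest stake produces a loss. This is the over-reliance risk the task asks to be discussed.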

Work on these tasks can be appended to each classifier use case.

 

 

Instructions

Work on ALL tasks in the format required. Recite mathematical underpinnings for each chosen Classifier. Code must be submitted and be producing the computational output. Full mathematical workings required for Interest Rates Modeling questions.

Format and Coding: Submit ONE .pdf report file and ONE .zip file with data and code, file name starting with your LASTNAME. It is advantageous to merge all your workings in one PDF file.

  • Implementation is best done in Python using sklearn. For those starting with Python, price direction prediction MODIFIED.ipynb is provided as a template to start the work.
  • It is acceptable to implement classification in R/Matlab, but tutor’s support might be limited. Matlab use should not devolve to exploration with Classification Learner App only.
  • It is possible to have a limited implementation in Excel (eg, logistic regression); however, that risks falling below the passing mark (60%) because the other kinds of Classifiers and the prediction quality analysis would be missing.

Report Content and Analytical Quality:

  • If printing out a Python Notebook as your report, please ensure it comes across as an analytical report with a) headers to separate sections, b) clarity about which sections address Questions A.1 – A.4 and B.1 – B.3, and c) no large tables of output (show the head/tail/selected sample).
  • It is not expected that you will have particularly good accuracy in predicting short-term returns from past returns, but prediction analysis and clear explanation of improvement steps is what matters.
  • Within each kind of Classifier, focus on explaining its underpinnings and tuning. Your implementation should include tuning of the parameters that are specific to each Classifier, eg, regularisation strength for Logistic, margin softness for SVM. It is good practice to save some data as a holdout sample (not used in estimation) on which to test your fitted models.
