Your overall task will be to implement a numerical algorithm for solving stochastic control problems using policy iteration combined with the “deep Galerkin method” for solving a linear PDE.
This will be broken into steps which a) give you credit for completion and b) allow you to check your progress.
- Create Python code for solving the LQR problem in 2 spatial and 2 control dimensions, see Section 1. We will need this to check our policy iteration algorithm produces correct answers in the end.
- Train a neural network for the optimal Markovian control from the previous step (supervised learning), see Section 2, and train a different neural network for the value function using the value functions from the previous step. This will verify that the networks are “rich enough” and can provide good approximations.
- Implement the “deep Galerkin method” for solving a linear PDE, see Section 3.
- Implement the policy iteration, see Section 4.
You will be expected to submit a PDF report on what you did on Learn by the deadline. In the PDF provide a link to an online git repository where your code can be found. It should contain a README.md telling me how to run the code to reproduce any output (graphic / table) you include in the PDF. I will use the code from the latest commit made prior to the deadline.
Use numpy, scipy, matplotlib and torch and all the functions they contain, but do not use other libraries. You will be working in groups of three, and the submitted PDF and git repo README.md should state your names, student numbers and individual contribution fractions. You need to work together on each of the tasks at least to the extent that you can use code created in one task for the next one. There are 22 marks in total on this assignment but it is marked out of 20. Think of the extra two marks as a bonus.
Please note that a submission that doesn’t demonstrate why things work (e.g. test examples, convergence plots etc.) will get very few marks even if you have code that correctly solves the exercises.
1 Linear quadratic regulator
dXs = [HXs + Mαs] ds + σdWs , s ∈ [t, T] , Xt = x . (1)

Our aim is to minimize

Jα(t, x) := Et,x [ ∫_t^T (Xs⊤CXs + αs⊤Dαs) ds + XT⊤RXT ] ,
where C ≥ 0, R ≥ 0 and D > 0 are given and deterministic, and we will assume all matrices are 2 × 2.
The value function is v(t, x) := infα Jα(t, x). We know how to solve the Bellman PDE to obtain that
v(t, x) = x⊤S(t)x + ∫_t^T tr(σσ⊤S(r)) dr ,
where S is the solution of the Riccati ODE:

S′(r) = −2H⊤S(r) + S(r)MD−1M⊤S(r) − C , r ∈ [t, T] , S(T) = R .
Note that the solution takes values in the space of 2 × 2 matrices. The optimal Markov control is a(t, x) = −D−1M⊤S(t)x .
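The Riccati ODE above can be solved numerically by integrating backwards from the terminal condition S(T) = R. A minimal sketch using scipy's solve_ivp; the specific matrices and horizon below are illustrative assumptions, not values given in the assignment:

```python
import numpy as np
from scipy.integrate import solve_ivp

# Illustrative problem data (assumed for the example, not prescribed by the text)
H = np.array([[0.5, 0.5], [0.0, 0.5]])
M = np.array([[1.0, 1.0], [0.0, 1.0]])
C = np.array([[1.0, 0.1], [0.1, 1.0]])
D = np.array([[1.0, 0.1], [0.1, 1.0]])
R = np.eye(2)
T = 1.0
Dinv = np.linalg.inv(D)

def riccati_rhs(r, s_flat):
    # RHS of S'(r) = -2 H^T S(r) + S(r) M D^{-1} M^T S(r) - C, on S flattened to length 4
    S = s_flat.reshape(2, 2)
    dS = -2.0 * H.T @ S + S @ M @ Dinv @ M.T @ S - C
    return dS.ravel()

# solve_ivp accepts a decreasing t_span, so integrate from the terminal time T back to 0
sol = solve_ivp(riccati_rhs, (T, 0.0), R.ravel(), dense_output=True,
                rtol=1e-8, atol=1e-10)
S0 = sol.sol(0.0).reshape(2, 2)  # S at time 0; sol.sol(r) evaluates S on any grid point
```

The dense output `sol.sol` then serves as the interpolant S(·) needed for the value function and the optimal control.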
Exercise 1.1 (Solving LQR via the Riccati ODE, 1 mark).
Write a class which:
i) Can be initialised with the matrices specifying the LQR problem and T > 0.
ii) Has a method which will solve (approximate) the associated Riccati ODE on a time grid which is an input (numpy array or torch tensor).
iii) Has a method that, given a torch tensor of dimension batch size × 1 × 2, will return a torch tensor of dimension batch size × 1 with entries being the control problem value v(t, x) for each t, x in the batch (for x two-dimensional).
iv) Has a method that, given a torch tensor of dimension batch size × 1 × 2, will return a torch tensor of dimension batch size × 2 with entries being the Markov control function for each t, x in the batch (for x two-dimensional).
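A possible shape for such a class is sketched below. The class and method names are my own, and I assume the times are passed as a separate 1-D tensor alongside the batch × 1 × 2 state tensor, which is one reading of the specification:

```python
import numpy as np
import torch
from scipy.integrate import solve_ivp

class LQRSolver:
    """Sketch of the Exercise 1.1 class; all names are illustrative assumptions."""

    def __init__(self, H, M, sigma, C, D, R, T):
        self.H, self.M, self.sigma = H, M, sigma
        self.C, self.D, self.R, self.T = C, D, R, T
        self.Dinv = np.linalg.inv(D)

    def solve_riccati(self, time_grid):
        # Integrate S'(r) = -2 H^T S + S M D^{-1} M^T S - C backwards from S(T) = R
        grid = np.asarray(time_grid, dtype=float)

        def rhs(r, s_flat):
            S = s_flat.reshape(2, 2)
            return (-2.0 * self.H.T @ S
                    + S @ self.M @ self.Dinv @ self.M.T @ S - self.C).ravel()

        sol = solve_ivp(rhs, (self.T, grid.min()), self.R.ravel(),
                        dense_output=True, rtol=1e-8, atol=1e-10)
        self._S = lambda r: sol.sol(r).reshape(2, 2)   # interpolant r -> S(r)
        return np.stack([self._S(t) for t in grid])    # (len(grid), 2, 2)

    def value(self, t, x):
        # t: (batch,) times; x: (batch, 1, 2) states -> (batch, 1) values v(t, x)
        out = []
        for ti, xi in zip(t.tolist(), x.reshape(-1, 2).tolist()):
            xv, S = np.array(xi), self._S(ti)
            # trapezoid rule for the integral of tr(sigma sigma^T S(r)) over [ti, T]
            rs = np.linspace(ti, self.T, 200)
            tr = np.array([np.trace(self.sigma @ self.sigma.T @ self._S(r)) for r in rs])
            integral = float(np.sum(0.5 * (tr[1:] + tr[:-1]) * np.diff(rs)))
            out.append(float(xv @ S @ xv) + integral)
        return torch.tensor(out).unsqueeze(1)

    def control(self, t, x):
        # a(t, x) = -D^{-1} M^T S(t) x, returned as a (batch, 2) tensor
        A = [(-self.Dinv @ self.M.T @ self._S(ti)) @ np.array(xi)
             for ti, xi in zip(t.tolist(), x.reshape(-1, 2).tolist())]
        return torch.tensor(np.stack(A))
```

The loop over the batch is deliberately simple; for large batches one would vectorise with batched matrix multiplications instead.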
Exercise 1.2 (LQR MC checks, 6 marks). Run a Monte Carlo simulation of the system with the optimal solution you have obtained and ensure that you’re converging to the optimal value function you obtained in Exercise 1.1. In particular:
- Define how you’ll measure the error and justify your choice.
- With the number of Monte Carlo samples large (e.g. 10^5), vary the number of time steps in your simulation, taking 1, 10, 50, 100, 500, 1000, 5000, and plot the error as a log-log plot.
- With the number of time steps large (e.g. 5000), vary the number of Monte Carlo samples, taking 10, 50, 10^2, 5·10^2, 10^3, 5·10^3, 10^4, 5·10^4, 10^5, and plot the error as a log-log plot.
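The log-log plots can be produced with matplotlib. A minimal sketch, with placeholder error values (the `1/n` decay below is purely illustrative; substitute the errors measured in your simulation):

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend so the script runs without a display
import matplotlib.pyplot as plt

# Placeholder errors: replace with your measured |MC estimate - v(t, x)| values
n_steps = np.array([1, 10, 50, 100, 500, 1000, 5000])
errors = 1.0 / n_steps  # illustrative values only, assuming roughly order-1 decay

fig, ax = plt.subplots()
ax.loglog(n_steps, errors, "o-", label="measured error")
ax.loglog(n_steps, 1.0 / n_steps, "--", label="slope -1 reference")
ax.set_xlabel("number of time steps")
ax.set_ylabel("error")
ax.legend()
fig.savefig("timestep_error.png")
```

Plotting a reference line of known slope next to the measured errors makes the convergence rate visible at a glance.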
Hint: Let a = a(t, x) denote the optimal control from Exercise 1.1. Once you’ve plugged this into (1) you get

dXs = [HXs + Ma(s, Xs)] ds + σdWs , s ∈ [t, T] , Xt = x . (2)

To do a time discretisation fix N ∈ N (number of time steps); let τ := T/N be the time step. Assume from now on that you’d only want to start the SDE at a time tk = kτ for some k = 0, 1, . . . , N. You have a choice of (at least) two schemes since you know a(t, x) = −D−1M⊤S(t)x.
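One of the possible schemes is the explicit Euler scheme for (2) with the optimal feedback plugged in. A minimal sketch; the matrices, the horizon and the placeholder S(t) ≡ R are illustrative assumptions, and in practice you would substitute the Riccati interpolant from Exercise 1.1:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative problem data (assumed for the example, not prescribed by the text)
H = np.array([[0.5, 0.5], [0.0, 0.5]])
M = np.eye(2)
sigma = 0.1 * np.eye(2)
C, D, R = np.eye(2), np.eye(2), np.eye(2)
Dinv = np.linalg.inv(D)
T, N, n_mc = 1.0, 500, 10_000
tau = T / N
S_of = lambda r: R  # placeholder: plug in the Riccati solution interpolant here

x0 = np.array([1.0, 1.0])
X = np.tile(x0, (n_mc, 1))            # (n_mc, 2): all paths start at x0 at t = 0
cost = np.zeros(n_mc)

for k in range(N):
    t_k = k * tau
    S = S_of(t_k)
    a = -X @ (Dinv @ M.T @ S).T       # feedback a(t_k, X) applied row-wise
    # accumulate the running cost X^T C X + a^T D a over [t_k, t_k + tau)
    cost += tau * (np.einsum('bi,ij,bj->b', X, C, X)
                   + np.einsum('bi,ij,bj->b', a, D, a))
    dW = rng.normal(scale=np.sqrt(tau), size=(n_mc, 2))
    X = X + tau * (X @ H.T + a @ M.T) + dW @ sigma.T  # Euler step of (2)

cost += np.einsum('bi,ij,bj->b', X, R, X)  # terminal cost X_T^T R X_T
mc_estimate = cost.mean()                  # Monte Carlo estimate of J(0, x0)
```

The second scheme the hint alludes to comes from noting that, with a(t, x) = −D−1M⊤S(t)x, the controlled SDE (2) is linear, so the drift matrix H − MD−1M⊤S(tk) can be exponentiated exactly on each step instead of using Euler.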
2 Supervised learning, checking the NNs are good enough
For our purposes a neural network is a parametric function that depends on an input, say x ∈ Rd, and parameters, say θ ∈ Rp, takes values in, say, Rd′, and we’ll write ϕ = ϕ(x; θ).
An example of a one-hidden-layer neural network is
ϕ(x; θ) = ϕ(x; α(1) , α(2) , β(1) , β(2) ) = α(1) ψ(α(2) x + β(2) ) + β(1) ,
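In torch this one-hidden-layer architecture is two Linear layers with an activation in between. A minimal sketch, where the activation ψ = tanh, the hidden width, the input/output sizes and the toy supervised target are all assumed choices for illustration:

```python
import torch
from torch import nn

# phi(x; theta) = alpha^(1) psi(alpha^(2) x + beta^(2)) + beta^(1)
d, d_out, hidden = 3, 2, 64          # e.g. input (t, x1, x2) -> 2-d output (assumed)
phi = nn.Sequential(
    nn.Linear(d, hidden),            # alpha^(2) x + beta^(2)
    nn.Tanh(),                       # psi applied componentwise
    nn.Linear(hidden, d_out),        # alpha^(1) ( ... ) + beta^(1)
)

# Toy supervised-learning loop: fit phi to a known linear target by least squares
target = lambda z: torch.stack([z[:, 1] + z[:, 2], z[:, 1] - z[:, 2]], dim=1)
opt = torch.optim.Adam(phi.parameters(), lr=1e-3)
for _ in range(500):
    xb = torch.randn(256, d)
    loss = nn.functional.mse_loss(phi(xb), target(xb))
    opt.zero_grad()
    loss.backward()
    opt.step()
```

For the Section 2 exercises the toy target would be replaced by the optimal control and the value function computed in Section 1, sampled on batches of (t, x) points.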