# COMP3670: Introduction to Machine Learning

Note: For the purposes of this assignment, we let lowercase p denote probability density functions (pdf’s), and upper case P denote probabilities. If a random variable Z is characterized by a probability density function p, we have that

P(a ≤ Z ≤ b) = ∫_a^b p(z) dz

You should show your derivations, but you may use a computer algebra system (CAS) to assist with integration or differentiation.

Question 1 Bayesian Inference  (40 credits)

Let X be a random variable representing the outcome of a biased coin, with possible outcomes X = {0, 1}, x ∈ X. The bias of the coin is itself controlled by a random variable Θ, with outcomes θ ∈ Θ, where

Θ = {θ ∈ ℝ : 0 ≤ θ ≤ 1}

The two random variables are related by the following conditional probability distribution function of X given Θ.

p(X = 1 | Θ = θ) = θ

p(X = 0 | Θ = θ) = 1 − θ

We can use p(X = 1 | θ) as a shorthand for p(X = 1 | Θ = θ).

We wish to learn what θ is, based on experiments by flipping the coin. Before we flip the coin, we choose as our prior distribution

p(θ) = 30θ²(1 − θ)²

which, when plotted, looks like this:

*(figure: plot of the prior p(θ), a symmetric bump on [0, 1] peaked at θ = 1/2)*

a) (3 credits) Verify that p(θ) = 30θ²(1 − θ)² is a valid probability density function on [0, 1] (i.e. that it is non-negative everywhere and that it is normalised).
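As a sanity check for part (a), both conditions can be verified numerically. The sketch below uses only the standard library; the grid size `n` is arbitrary. (Analytically, ∫₀¹ θ²(1 − θ)² dθ = 1/3 − 1/2 + 1/5 = 1/30, so the factor 30 normalises the density.)

```python
# Sketch: check numerically that p(θ) = 30 θ² (1 − θ)² is a valid pdf
# on [0, 1]: non-negative everywhere, and integrating to 1.

def prior(theta):
    return 30 * theta**2 * (1 - theta)**2

n = 100_000
h = 1.0 / n

# Composite midpoint rule on a fine grid.
total = sum(prior((i + 0.5) * h) for i in range(n)) * h

assert all(prior(i * h) >= 0 for i in range(n + 1))  # non-negativity
print(f"integral ≈ {total:.6f}")  # prints a value very close to 1
```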

We flip the coin a number of times. After each coin flip, we update the probability distribution for θ to reflect our new belief about the distribution on θ, based on evidence.

Suppose we flip the coin four times, and obtain the sequence of coin flips x1:4 = 0101. For its two subsequences 01 and 0101, denoted by x1:2 and x1:4 (and for the case before any coins are flipped), complete the following questions.

b) (15 credits) Compute the posterior probability density functions of θ after observing the two subsequences x1:2 and x1:4, respectively.
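A numerical sketch of the update in part (b). The prior 30θ²(1 − θ)² is the Beta(3, 3) density, so by conjugacy the posteriors are also Beta densities, but the grid-based update below assumes nothing beyond Bayes' rule (the grid size and variable names are illustrative):

```python
# Sketch: numerical Bayesian update on a grid for the coin bias θ.
# Each flip x multiplies the current density by the likelihood
# θ (if x == 1) or 1 − θ (if x == 0), then renormalises.

def update(density, flips, grid):
    h = grid[1] - grid[0]
    post = list(density)
    for x in flips:
        post = [p * (t if x == 1 else 1 - t) for p, t in zip(post, grid)]
        z = sum(post) * h                # normalising constant
        post = [p / z for p in post]
    return post

n = 10_000
grid = [(i + 0.5) / n for i in range(n)]          # midpoints of [0, 1]
prior_vals = [30 * t**2 * (1 - t)**2 for t in grid]

post_12 = update(prior_vals, [0, 1], grid)        # after x1:2 = 01
post_14 = update(prior_vals, [0, 1, 0, 1], grid)  # after x1:4 = 0101
```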

c) (3 credits) Compute the expected value µ of θ before any evidence, as well as after observing the two subsequences x1:2 and x1:4, respectively.

d) (3 credits) Compute the variance σ² of θ before any evidence, as well as after observing the two subsequences x1:2 and x1:4, respectively.

e) (5 credits) Compute the maximum a posteriori estimate θ_MAP of θ before any evidence, as well as after observing the two subsequences x1:2 and x1:4, respectively.
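For parts (c)–(e), note that the prior and both posteriors are Beta densities (Beta(3, 3), Beta(4, 4), Beta(5, 5)), so the standard Beta(a, b) formulas for the mean a/(a+b), variance ab/((a+b)²(a+b+1)), and mode (a−1)/(a+b−2) produce the whole table. A sketch (the function name and labels are ours):

```python
# Sketch: closed-form summary statistics of a Beta(a, b) density,
# the form the prior and both posteriors take in this question.

def beta_stats(a, b):
    mean = a / (a + b)
    var = a * b / ((a + b) ** 2 * (a + b + 1))
    mode = (a - 1) / (a + b - 2)         # MAP estimate, valid for a, b > 1
    return mean, var, mode

for a, b, label in [(3, 3, "prior"), (4, 4, "after 01"), (5, 5, "after 0101")]:
    mean, var, mode = beta_stats(a, b)
    print(f"{label}: mu = {mean:.4f}, sigma^2 = {var:.4f}, MAP = {mode:.4f}")
```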

Present your results in a table like the one shown below.

f) (5 credits) Plot each of the probability densities p(θ), p(θ | x1:2 = 01), p(θ | x1:4 = 0101) over the interval 0 ≤ θ ≤ 1 on the same graph to compare them.
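A sketch for part (f). The closed forms below use the Beta(4, 4) and Beta(5, 5) normalising constants (140 and 630); matplotlib availability is an assumption, so the plotting step is guarded and the evaluation itself is pure Python.

```python
# Sketch: evaluate the three densities on a grid and plot them together.

thetas = [i / 1000 for i in range(1001)]
prior_pdf = [30 * t**2 * (1 - t)**2 for t in thetas]    # p(θ)
post_01   = [140 * t**3 * (1 - t)**3 for t in thetas]   # p(θ | x1:2 = 01)
post_0101 = [630 * t**4 * (1 - t)**4 for t in thetas]   # p(θ | x1:4 = 0101)

try:
    import matplotlib
    matplotlib.use("Agg")               # non-interactive backend
    import matplotlib.pyplot as plt
    for ys, label in [(prior_pdf, "p(θ)"),
                      (post_01, "p(θ | x1:2 = 01)"),
                      (post_0101, "p(θ | x1:4 = 0101)")]:
        plt.plot(thetas, ys, label=label)
    plt.xlabel("θ")
    plt.ylabel("density")
    plt.legend()
    plt.savefig("posteriors.png")
except ImportError:
    pass  # plotting is optional; the tabulated densities remain usable
```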

g) (6 credits) What behaviour would you expect of the posterior distribution p(θ | x1:n) if we updated on a very long sequence of alternating coin flips x1:n = 01010101…?

What would you expect µ, σ², and θ_MAP to look like for large n?

Sketch/draw an estimate of what p(θ|x1:n) would approximately look like against the other distributions.
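A numerical sketch of the large-n intuition behind part (g): for an alternating sequence of even length n (equal counts of 0s and 1s), Beta conjugacy gives the posterior Beta(3 + n/2, 3 + n/2), so the mean stays at 0.5 while the variance shrinks roughly like 1/n.

```python
# Sketch: posterior variance after n alternating flips, using the
# symmetric Beta(a, a) posterior with a = 3 + n/2.

def alternating_posterior_var(n):
    a = 3 + n // 2
    return a * a / ((2 * a) ** 2 * (2 * a + 1))   # Beta(a, a) variance

for n in (4, 40, 400, 4000):
    print(f"n = {n:5d}: mu = 0.5, sigma^2 = {alternating_posterior_var(n):.6f}")
```

The printed variances shrink towards 0, matching the expected concentration of p(θ | x1:n) around θ = 1/2.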

Question 2 Bayesian Inference on Imperfect Information  (50 credits)

We have a Bayesian agent running on a computer, trying to learn what the parameter θ could be in the coin-flip problem, based on observations through a noisy camera. The noisy camera takes a photo of each coin flip and reports back whether the result was a 0 or a 1. Unfortunately, the side of the coin with a "1" on it is very shiny, and the reflected light causes the camera to sometimes report back the wrong result. The probability that the camera returns the correct answer is parameterised by ϕ ∈ [0, 1]. Letting X denote the true outcome of the coin, and X̂ denote what the camera reported back, we can draw the relationship between X and X̂ as shown.

*(figure: diagram relating X and X̂ through the camera's correctness probability ϕ)*

We would now like to investigate what posterior distributions are obtained, as a function of the parameter ϕ. Let x̂1:n be a sequence of coin flips as observed by the camera.

a) (5 credits) Briefly comment on how the camera behaves for ϕ = 1, ϕ = 0.5, and ϕ = 0. How would you expect this to change how the agent updates its prior to a posterior on θ, given an observation of X̂? (No equations required.)

b) (10 credits) Compute p(X̂ = x | θ) for all x ∈ {0, 1}.
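Part (b) is an instance of the law of total probability: p(X̂ = x̂ | θ) = Σ_x p(X̂ = x̂ | X = x) p(X = x | θ), summing the camera's error model against the coin model. A sketch of that marginalisation (the function names are ours):

```python
# Sketch: marginalising out the true coin outcome X to get the
# likelihood of the camera's report X̂. phi is the probability the
# camera reports the true outcome.

def p_x_given_theta(x, theta):
    return theta if x == 1 else 1 - theta          # coin model

def p_obs_given_x(x_hat, x, phi):
    return phi if x_hat == x else 1 - phi          # camera error model

def p_obs_given_theta(x_hat, theta, phi):
    return sum(p_obs_given_x(x_hat, x, phi) * p_x_given_theta(x, theta)
               for x in (0, 1))
```

Sanity checks worth noting: with ϕ = 1 the camera is perfect, so p(X̂ = 1 | θ) reduces to θ; with ϕ = 0.5 the report carries no information about θ at all.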

c) (15 credits) The coin is flipped, and the camera reports seeing a zero (i.e. X̂ = 0). Given an arbitrary prior p(θ), compute the posterior p(θ | X̂ = 0). What does p(θ | X̂ = 0) simplify to when ϕ = 1? When ϕ = 1/2? When ϕ = 0? Explain your observations.
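The special cases in part (c) can be explored numerically. The sketch below works with any prior tabulated on a grid and uses the mixture likelihood p(X̂ = 0 | θ) = ϕ(1 − θ) + (1 − ϕ)θ; with ϕ = 0.5 that likelihood is constant in θ, so the posterior should come back equal to the prior. (Grid resolution and names are illustrative.)

```python
# Sketch: posterior over a grid of θ values after the camera reports
# X̂ = 0, for an arbitrary tabulated prior.

def posterior_given_obs0(prior_vals, grid, phi):
    h = grid[1] - grid[0]
    lik = [phi * (1 - t) + (1 - phi) * t for t in grid]  # p(X̂ = 0 | θ)
    unnorm = [l * p for l, p in zip(lik, prior_vals)]
    z = sum(unnorm) * h                                  # evidence p(X̂ = 0)
    return [u / z for u in unnorm]
```

For example, calling it with ϕ = 1 recovers the noiseless update (posterior proportional to (1 − θ) times the prior), while ϕ = 0 weights by θ instead, as if the camera's report had been inverted.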

d) (10 credits) Compute p(θ | X̂ = 0) for the same choice of prior p(θ) = 30θ²(1 − θ)² as before. Simplify your expression.

e) (10 credits) Plot p(θ | X̂ = 0) as a function of θ, for all ϕ ∈ {0, 0.25, 0.5, 0.75, 1} on the same graph to compare them. Comment on how the shape of the distribution changes with ϕ. Explain your observations.
