# Natural Language Processing Assignment 2


1. Recall that attention can be viewed as an operation on a query $q \in \mathbb{R}^d$, a set of key
vectors $\{k_1, \dots, k_i, \dots, k_n\}$, $k_i \in \mathbb{R}^d$, and a set of value vectors
$\{v_1, \dots, v_i, \dots, v_n\}$, $v_i \in \mathbb{R}^d$.

a) Please write down the equations for the attention weights $a_i$ and the output
$c \in \mathbb{R}^d$, the correspondingly weighted average of the value vectors.
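One common way to write these equations is sketched below. It assumes unscaled dot-product scores followed by a softmax; the lecture's convention may differ, for example by dividing the scores by sqrt(d).

```latex
a_i = \frac{\exp(k_i^\top q)}{\sum_{j=1}^{n} \exp(k_j^\top q)},
\qquad
c = \sum_{i=1}^{n} a_i v_i
```

Under this formulation the weights are positive and sum to 1, so the output is a convex combination of the value vectors.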

b) Describe what properties of the inputs to the attention operation would result
in the output $c$ being approximately equal to $v_j$ for some $j \in \{1, \dots, n\}$.

c) Consider a set of key vectors $\{k_1, \dots, k_i, \dots, k_n\}$, $k_i \in \mathbb{R}^d$, where $k_i \perp k_j$ for all
$i \neq j$ and $\|k_i\| = 1$. Let $v_a, v_b \in \{v_1, \dots, v_n\}$ and $k_a, k_b \in \{k_1, \dots, k_n\}$. Give an
expression for a query vector $q$ such that the output $c$ is approximately $\frac{1}{2}(v_a + v_b)$.
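A candidate answer to part c) can be checked numerically. The sketch below assumes unscaled dot-product attention with a softmax, uses the standard basis as the orthonormal keys, and tries the query q = beta * (k_a + k_b) with a large scalar beta. The helper name `single_query_attention`, the value of `beta`, and the random values are illustrative choices, not part of the assignment.

```python
import numpy as np

def single_query_attention(q, keys, values):
    """Attention for one query.

    q:      (d,)   query vector
    keys:   (n, d) one key per row
    values: (n, d) one value per row
    Returns the weights (n,) and the output c (d,).
    """
    scores = keys @ q                      # dot product of q with every key
    scores = scores - scores.max()         # shift for numerical stability
    weights = np.exp(scores) / np.exp(scores).sum()
    return weights, weights @ values

d = 4
keys = np.eye(d)                           # orthonormal keys (assumption: standard basis)
rng = np.random.default_rng(0)
values = rng.normal(size=(d, d))           # arbitrary value vectors

a, b = 0, 2                                # indices of the two values to average
beta = 50.0                                # large scale pushes the softmax toward k_a and k_b
q = beta * (keys[a] + keys[b])

weights, c = single_query_attention(q, keys, values)
target = 0.5 * (values[a] + values[b])
print(np.allclose(c, target))
```

Because the keys are orthonormal, the scores for k_a and k_b both equal beta and every other score is 0, so the softmax splits almost all of its mass evenly between positions a and b.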

2. Perplexity.

a) You are given a training set of 100 numbers that consists of 10 each of digits
0-9. Now we see the following test set: {0, 1, 2, 3, 4, 5, 6, 7, 8, 9}. What is
the unigram perplexity of the test set?

b) You are given a training set of 100 numbers that consists of 91 zeros and 1
each of the other digits 1-9. Now we see the following test set: {0, 0, 0, 0, 0,
6, 0, 0, 0, 0}. What is the unigram perplexity of this test set? Please first use
your intuition to describe whether the perplexity should go up or down
compared with the result in a). Then, calculate the number to see if it aligns
with your intuition.
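Both perplexities can be checked with a few lines of Python. This is a minimal sketch that assumes maximum-likelihood unigram probabilities with no smoothing, so every test token must appear in the training set. The function name `unigram_perplexity` is an illustrative choice.

```python
import math
from collections import Counter

def unigram_perplexity(train, test):
    """Perplexity of `test` under an unsmoothed unigram model fit on `train`."""
    counts = Counter(train)
    total = len(train)
    # Sum of log-probabilities of the test tokens (MLE estimate: count / total).
    log_prob = sum(math.log(counts[token] / total) for token in test)
    # Perplexity is the exponential of the negative average log-probability.
    return math.exp(-log_prob / len(test))

# Part a): 10 copies of each digit 0-9.
train_a = [digit for digit in range(10) for _ in range(10)]
test_a = list(range(10))

# Part b): 91 zeros and one each of the digits 1-9.
train_b = [0] * 91 + list(range(1, 10))
test_b = [0, 0, 0, 0, 0, 6, 0, 0, 0, 0]

pp_a = unigram_perplexity(train_a, test_a)
pp_b = unigram_perplexity(train_b, test_b)
print(pp_a, pp_b)
```

In part a) every test token has probability 0.1, so the perplexity is 10. In part b) nine of the ten test tokens have probability 0.91, which pulls the perplexity down to roughly 1.73.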

3. In the lecture, we discussed attention, the Transformer, and one
popular pretrained model, BERT. Here we will implement some of their
details in Python.

For every function or class, please state the shape and meaning of each
input and output clearly in comments. Please add comments where
necessary to make your code easy to understand.

In addition, all inputs and outputs in the problems below are batched, with
the batch size equal to batch_size. The other dimensions of the variables should be
chosen sensibly by students.

a) The core equations of attention are part of Problem 1. Here you will
implement them in Python. Please define a function called attention that
takes four inputs (numpy.array): q, k, v, and an attention mask.

The function should first run a sanity check on the dimensions of all input
matrices and raise an error if the check fails. Only if all the
dimensions are valid should it calculate the attention weights and the final output.
The default value for the attention mask is None. Below is a sketch of the
function.

    def attention(q, k, v, attn_mask=None):
        # Sanity check on dimensions of q, k, v, attn_mask
        # Calculate attention weight and final output
        return attn_weight, outputs
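One possible way to fill in this sketch with NumPy is shown below. The shapes, the boolean mask convention (True means "may attend"), and the scaling by the square root of the key dimension are assumptions made for illustration; the assignment leaves these choices to the student, and Problem 1 may intend unscaled scores.

```python
import numpy as np

def attention(q, k, v, attn_mask=None):
    """Batched scaled dot-product attention (illustrative sketch).

    q:         (batch_size, n_q, d_k) queries
    k:         (batch_size, n_k, d_k) keys
    v:         (batch_size, n_k, d_v) values
    attn_mask: optional boolean (batch_size, n_q, n_k); True = may attend
    Returns attn_weight (batch_size, n_q, n_k) and outputs (batch_size, n_q, d_v).
    """
    q, k, v = (np.asarray(x, dtype=float) for x in (q, k, v))

    # Sanity check on dimensions of q, k, v, attn_mask
    if q.ndim != 3 or k.ndim != 3 or v.ndim != 3:
        raise ValueError("q, k, v must be 3-D: (batch_size, length, dim)")
    if not (q.shape[0] == k.shape[0] == v.shape[0]):
        raise ValueError("q, k, v must share the same batch size")
    if q.shape[2] != k.shape[2]:
        raise ValueError("q and k must have the same feature dimension")
    if k.shape[1] != v.shape[1]:
        raise ValueError("k and v must have the same number of positions")

    # Scores: (batch_size, n_q, n_k), scaled by sqrt(d_k)
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(q.shape[-1])

    if attn_mask is not None:
        attn_mask = np.asarray(attn_mask, dtype=bool)
        if attn_mask.shape != scores.shape:
            raise ValueError("attn_mask must have shape (batch_size, n_q, n_k)")
        # Masked positions get a very negative score, so their weight becomes 0
        scores = np.where(attn_mask, scores, -1e9)

    # Softmax over the key axis, shifted for numerical stability
    scores = scores - scores.max(axis=-1, keepdims=True)
    exp_scores = np.exp(scores)
    attn_weight = exp_scores / exp_scores.sum(axis=-1, keepdims=True)

    outputs = attn_weight @ v
    return attn_weight, outputs

# Usage: batch of 2, 3 queries, 5 keys, with the last key masked out
rng = np.random.default_rng(0)
q = rng.normal(size=(2, 3, 8))
k = rng.normal(size=(2, 5, 8))
v = rng.normal(size=(2, 5, 4))
attn_mask = np.ones((2, 3, 5), dtype=bool)
attn_mask[:, :, -1] = False
attn_weight, outputs = attention(q, k, v, attn_mask)
print(attn_weight.shape, outputs.shape)
```

Raising `ValueError` before any computation keeps the shape checks separate from the arithmetic, which matches the order the task describes.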

b) A Transformer encoder block consists of multi-head attention, layer
normalization, and a feed-forward network. We follow the usual construction of
the feed-forward network, which is a combination of two stacked linear
layers with dropout. All modules mentioned above have corresponding
implementations in torch.nn. Please construct a Transformer encoder
block in PyTorch. Below is a sketch of the class.

    class TransformerEncoderBlock(Module):
        def __init__(self):  # Please fill in all the related parameters
            # construct self-attention, normalization, feed-forward network
            # remember to save layers and parameters to self
            pass

        def forward(self, source, ...):
            # implement the forward process using defined layers in __init__
            pass

(Please construct this block in your own code, not using any of the PyTorch
implementations of the Transformer structure.)
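A minimal sketch of such a block is given below, built only from `nn.MultiheadAttention`, `nn.LayerNorm`, `nn.Linear`, and `nn.Dropout`. The hyperparameter names and defaults (`d_model`, `num_heads`, `d_ff`, `dropout`), the ReLU activation, the post-layer-norm residual layout, and the `key_padding_mask` argument are assumptions chosen for illustration; the assignment does not specify them.

```python
import torch
from torch import nn

class TransformerEncoderBlock(nn.Module):
    def __init__(self, d_model=64, num_heads=4, d_ff=256, dropout=0.1):
        """d_model: feature size; num_heads: attention heads;
        d_ff: hidden size of the feed-forward network; dropout: drop rate."""
        super().__init__()
        # Multi-head self-attention over (batch_size, seq_len, d_model) inputs
        self.self_attn = nn.MultiheadAttention(
            d_model, num_heads, dropout=dropout, batch_first=True
        )
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)
        # Feed-forward network: two stacked linear layers with dropout
        self.ffn = nn.Sequential(
            nn.Linear(d_model, d_ff),
            nn.ReLU(),
            nn.Dropout(dropout),
            nn.Linear(d_ff, d_model),
        )
        self.dropout = nn.Dropout(dropout)

    def forward(self, source, key_padding_mask=None):
        """source: (batch_size, seq_len, d_model)
        key_padding_mask: optional bool (batch_size, seq_len); True = padding
        Returns a tensor of shape (batch_size, seq_len, d_model)."""
        attn_out, _ = self.self_attn(
            source, source, source, key_padding_mask=key_padding_mask
        )
        # Residual connection followed by layer normalization (post-LN)
        x = self.norm1(source + self.dropout(attn_out))
        x = self.norm2(x + self.dropout(self.ffn(x)))
        return x

# Usage: batch of 2 sequences of length 5
block = TransformerEncoderBlock()
block.eval()                      # disable dropout for a deterministic pass
source = torch.randn(2, 5, 64)
with torch.no_grad():
    encoded = block(source)
print(encoded.shape)
```

The block keeps the input shape, so several of them can be stacked to form a full encoder.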