Compute Resources for Homeworks
You can complete the course homeworks on your personal computer, on other compute resources you
have access to, or on one of the GHC machines that have been assigned for this course
(ghc50.ghc.andrew.cmu.edu – ghc69.ghc.andrew.cmu.edu). The GHC machines (ghc50 – ghc69) are Red
Hat Linux machines with 8-core i7-9700 CPUs, 16 GB of RAM, and a GeForce RTX 2080 GPU. You can log
into a GHC machine as shown below:
$ ssh <andrewID>@ghc<machine-id>.ghc.andrew.cmu.edu
Password: <enter-your-andrew-password>
Note that one or more of the GHC machines could be offline at any time. If you are unable to log into a
specific machine, try one of the other machines in the cluster. You can also use the “w” command to see how
many other students are using a particular machine, for example:
$ ssh -t <andrewID>@ghc<machine-id>.ghc.andrew.cmu.edu w
12:50:08 up 6:03, 1 user, load average: 0.01, 0.04, 0.05
USER TTY FROM LOGIN@ IDLE JCPU PCPU WHAT
<andrewID> pts/2 <machineID> 12:50 0.00s 0.05s 0.00s w
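To compare machines before picking one, the header line printed by w can be parsed programmatically. The following is a minimal Python sketch, assuming the header has the format shown above. The helper names ghc_hosts and parse_w_header are hypothetical and are not part of the handout.

```python
import re


def ghc_hosts(first=50, last=69):
    """Hostnames of the GHC machines assigned to the course (ghc50 to ghc69)."""
    return [f"ghc{i}.ghc.andrew.cmu.edu" for i in range(first, last + 1)]


def parse_w_header(header):
    """Extract the user count and 1-minute load average from the first line of `w`."""
    users = re.search(r"(\d+)\s+users?", header)
    load = re.search(r"load average:\s*([\d.]+)", header)
    if users is None or load is None:
        raise ValueError(f"unrecognised w header: {header!r}")
    return int(users.group(1)), float(load.group(1))


# Header line copied from the sample output above.
header = "12:50:08 up 6:03, 1 user, load average: 0.01, 0.04, 0.05"
print(parse_w_header(header))  # (1, 0.01)
print(len(ghc_hosts()))        # 20
```

A machine with few logged-in users and a low load average is usually the better choice for a long training run.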
For students who are new to a Linux programming environment, here are some commonly used commands:
Sequence Models with Attention
Speech recognition can be regarded as a sequence transduction task with speech feature frames as inputs
and language tokens as outputs. Sequence-to-sequence models have an encoder-decoder architecture, where the encoder
obtains hidden representations from the input audio features, and the decoder produces language tokens in
an autoregressive fashion. Attention can be used to obtain alignments between the encoded outputs and
the decoder outputs at every decoding time step. In this assignment, you will implement an attention-based
approach for speech recognition.
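To make the role of attention concrete, here is a minimal sketch of a single attention step in plain Python. It assumes dot-product scoring, which is simpler than the location-based attention used in the handout code, and the function names softmax and attend are illustrative only. In the assignment itself the same computation would be done on PyTorch tensors.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(decoder_state, encoder_outputs):
    """Return attention weights over encoder frames and the resulting context vector."""
    # One score per encoder frame: dot product with the current decoder state.
    scores = [sum(q * h for q, h in zip(decoder_state, frame))
              for frame in encoder_outputs]
    weights = softmax(scores)
    dim = len(encoder_outputs[0])
    # Context vector: weighted sum of the encoder frames.
    context = [sum(w * frame[d] for w, frame in zip(weights, encoder_outputs))
               for d in range(dim)]
    return weights, context


# Three encoder frames of dimension 2 and one decoder state (toy values).
encoder_outputs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
decoder_state = [2.0, 0.0]
weights, context = attend(decoder_state, encoder_outputs)
print(weights)
print(context)
```

The weights sum to one and can be read as a soft alignment: at this decoding step, the first and third frames score equally and receive more weight than the second.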
The primary reference paper we recommend for this assignment is Listen, Attend and Spell (LAS). The code
for this assignment will be written using PyTorch, and we recommend the following resources to students
who are new to the toolkit: the PyTorch Documentation, the PyTorch Tutorials, and in particular, the tutorial for
Machine Translation using Sequence Models with Attention.
Handout Structure and Coding Instructions
The data is in the asr_data directory of the handout, which contains three directories: train,
dev and test. Data that is input to the model and dataloader is stored in JSON format. Each JSON file
contains utterance dictionaries, and each utterance has input and output keys, which contain the path to
Kaldi ark files and the character-level tokenized speech transcript, respectively. The JSON file for the unknown
test set only contains the input speech features and has no transcription.
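As an illustration of the description above, the sketch below reads a small JSON string shaped like an utterance dictionary. The utterance IDs, the ark paths, the flat layout, and the space-separated character tokens are all assumptions made for this example. The real files in asr_data may nest these fields differently, so inspect one before relying on this.

```python
import json

# Hypothetical example data; not copied from the handout.
example = """
{
  "utt_0001": {"input": "feats.ark:13", "output": "h e l l o"},
  "utt_0002": {"input": "feats.ark:2048"}
}
"""


def summarize(utterances):
    """Return (utterance id, feature path, token list or None) for each utterance."""
    rows = []
    for utt_id, utt in sorted(utterances.items()):
        transcript = utt.get("output")  # missing for the unlabeled test set
        tokens = transcript.split() if transcript is not None else None
        rows.append((utt_id, utt["input"], tokens))
    return rows


rows = summarize(json.loads(example))
for row in rows:
    print(row)
```

Using utt.get("output") lets the same loading code handle the train and dev sets as well as the test set, which has no transcripts.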
The code template has the following structure:
1. conf- The directory has configurations in YAML format for training and decoding
(a) base.yaml- The baseline configuration that can help you get the necessary scores for this assignment
(a) encoder.py- Has the Neural Network Modules for the basic RNNLayer, pBLSTM, and Listener,
which is the LAS Encoder. You need to complete the forward method for the pBLSTM and
(b) decoder.py- Has the Neural Network Speller Module which is the LAS Decoder. It contains the
forward and greedy_decode methods that you need to fill. Optionally, to get better performance,
you could also implement the beam_decode method in this file.
(c) las_model.py- Has the Neural Network Module Wrapper for SpeechLAS, the sequence model with
(d) attention.py- Has Neural Network Modules that implement Location Based Attention
3. train.py- Interface for training the ASR Models. Has all the training options and default values listed.
4. decode.py – Interface for decoding the trained ASR Model. This will generate the decoded_test.txt
file that you need to submit to Gradescope.
5. utils.py- Contains loss, accuracy, and WER utilities, and padding utilities for the neural network code.
6. trainer.py- Contains the Trainer object that performs ASR Training
7. requirements.txt- Contains the pip installable Python environment for this assignment
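For orientation on the pBLSTM in encoder.py: in the LAS paper, each pyramidal layer concatenates pairs of consecutive frames from the layer below, which halves the sequence length. The sketch below shows only that time-reduction step on plain Python lists. It is not the handout's forward method, the name pyramid_reduce is hypothetical, and dropping an odd trailing frame is one possible convention (padding is another).

```python
def pyramid_reduce(frames):
    """Concatenate each pair of consecutive frames, halving the sequence length.

    frames is a list of T feature vectors (lists). If T is odd, the last
    frame is dropped in this sketch.
    """
    usable = len(frames) - len(frames) % 2
    return [frames[t] + frames[t + 1] for t in range(0, usable, 2)]


# Five frames of dimension 2 become two frames of dimension 4.
frames = [[1, 2], [3, 4], [5, 6], [7, 8], [9, 10]]
reduced = pyramid_reduce(frames)
print(reduced)  # [[1, 2, 3, 4], [5, 6, 7, 8]]
```

On a batched PyTorch tensor the same step is typically written as a reshape from (batch, T, D) to (batch, T // 2, 2 * D) after trimming T to an even length, with the sequence lengths halved to match.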