Select Connection: INPUT[inlineListSuggester(optionQuery(#permanent_note), optionQuery(#literature_note), optionQuery(#fleeting_note)):connections]

Table of Contents

Sequence models have been motivated by the analysis of sequential data such as text sentences, time series, and other discrete sequence data. These models are specifically designed to handle sequential information, while Convolutional Neural Networks are better suited to processing spatial information.

Basic Model

Consider the example of translating from French to English. A first RNN takes care of encoding the input and outputs a vector, which is then fed to a second RNN acting as the decoder.

Components:

  • Encoding network
  • Decoding network

This basic model could also work for Image Captioning:

  • learn an encoding with a convolutional network
  • attach an RNN to generate the output, which is a sequence of words
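
A minimal numpy sketch of this encoder-decoder wiring (all dimensions, weights, and the shared-cell simplification are illustrative assumptions, not the course's actual architecture):

```python
import numpy as np

def rnn_step(x, h, Wx, Wh, b):
    """One vanilla RNN step combining input x with the previous hidden state h."""
    return np.tanh(Wx @ x + Wh @ h + b)

rng = np.random.default_rng(0)
d_in, d_h, vocab = 4, 8, 10          # illustrative sizes
Wx = rng.normal(size=(d_h, d_in))
Wh = rng.normal(size=(d_h, d_h))
b = np.zeros(d_h)
Wy = rng.normal(size=(vocab, d_h))

# Encoder: read the whole source sentence, keep only the final hidden state.
source = [rng.normal(size=d_in) for _ in range(5)]   # 5 embedded source words
h = np.zeros(d_h)
for x in source:
    h = rnn_step(x, h, Wx, Wh, b)

# Decoder: a second RNN seeded with the encoding, emitting one word per step.
s = h                                # the encoding vector initializes the decoder
for t in range(3):
    s = rnn_step(np.zeros(d_in), s, Wx, Wh, b)       # feeding back words omitted
    logits = Wy @ s
    probs = np.exp(logits) / np.exp(logits).sum()    # softmax over the vocabulary
    print(f"step {t}: most likely word id = {probs.argmax()}")
```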

In the next section we will explore how to get not a random output (as in this basic model), but the most likely one.

Picking the Most Likely Sentence

The basic model described above is very similar to a Language Model, with the only difference that the input is a vector produced by another network. For this reason we will call the "machine translation" model a Conditional Language Model, which computes:

$$P(y^{<1>}, \dots, y^{<T_y>} \mid x)$$

What we want now is

$$\arg\max_{y^{<1>}, \dots, y^{<T_y>}} P(y^{<1>}, \dots, y^{<T_y>} \mid x)$$

To find it we will use an algorithm called Beam Search.

A greedy approach (always picking the single most likely next word) will not work: it returns suboptimal sentences.

It has a hyperparameter $B$ (Beam Width), which regulates how many candidates we keep. At the first step, the algorithm takes the $B$ most likely first words; at each following step, it extends each of the $B$ kept candidates with the most likely next words and again keeps only the best $B$ sequences, continuing like this for every position.

For the decoder, at each position $t$ it will output:

$$P(y^{<t>} \mid x, y^{<1>}, \dots, y^{<t-1>})$$
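
A compact sketch of the search itself; `next_word_probs(x, prefix)` is a hypothetical stand-in for the decoder's per-position distribution above:

```python
import math

def beam_search(next_word_probs, x, B=3, max_len=10, eos="<eos>"):
    """Keep the B highest-scoring partial sentences at every position."""
    beams = [([], 0.0)]                        # (prefix, sum of log-probabilities)
    for _ in range(max_len):
        candidates = []
        for prefix, score in beams:
            if prefix and prefix[-1] == eos:   # finished sentences pass through
                candidates.append((prefix, score))
                continue
            for word, p in next_word_probs(x, prefix).items():
                candidates.append((prefix + [word], score + math.log(p)))
        # keep only the B most likely candidates overall
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:B]
    return beams[0][0]
```

With $B = 1$ this degenerates to exactly the greedy approach criticized above.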

Some refinements:

Length Normalization

Instead of maximizing the product:

$$\arg\max_{y} \prod_{t=1}^{T_y} P(y^{<t>} \mid x, y^{<1>}, \dots, y^{<t-1>})$$

which could give very small numbers and lead to errors due to numerical rounding of floating point numbers, in practice the log scale is used:

$$\arg\max_{y} \sum_{t=1}^{T_y} \log P(y^{<t>} \mid x, y^{<1>}, \dots, y^{<t-1>})$$

Also, to avoid longer sequences getting very small probabilities, we multiply by the factor $\frac{1}{T_y^{\alpha}}$, usually with $\alpha = 0.7$.
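
As a sketch, the length-normalized objective over a list of per-position log-probabilities (the function name is ours):

```python
def normalized_score(log_probs, alpha=0.7):
    """Sum of log P(y<t> | x, y<1>, ..., y<t-1>), divided by Ty**alpha."""
    Ty = len(log_probs)
    return sum(log_probs) / (Ty ** alpha)
```

Beam search would then rank its finished candidates by this score instead of the raw sum.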

Discussion

Beam width $B$:

  • large $B$: better result, slower
  • small $B$: worse result, faster
  • usually $B$ is around 10; $B = 100$ is considered very large

Beam search runs faster than exact search algorithms, but it is not guaranteed to find the exact maximum.

Error Analysis

human translation: $y^{*}$, algorithm output: $\hat{y}$. Compare $P(y^{*} \mid x)$ with $P(\hat{y} \mid x)$.

Cases:

  1. $P(y^{*} \mid x) > P(\hat{y} \mid x)$: beam search failed to find the more likely sentence, so beam search is at fault
  2. $P(y^{*} \mid x) \le P(\hat{y} \mid x)$: $y^{*}$ is a better translation than $\hat{y}$, yet the model scores it lower, so the RNN model is at fault

For each dev set example, do this comparison and record which component is at fault. If you find Beam Search is at fault most of the time, increase the beam width; otherwise, do a deeper analysis to see whether to apply regularization and other ML techniques.
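
The bookkeeping can be as simple as the following sketch, assuming a hypothetical `sentence_log_prob(model, x, y)` helper that scores a complete sentence under the RNN:

```python
def attribute_errors(examples, model, sentence_log_prob):
    """Count how often beam search vs. the RNN model is at fault on the dev set."""
    faults = {"beam_search": 0, "rnn_model": 0}
    for x, y_star, y_hat in examples:      # input, human translation, model output
        p_star = sentence_log_prob(model, x, y_star)
        p_hat = sentence_log_prob(model, x, y_hat)
        if p_star > p_hat:
            faults["beam_search"] += 1     # search missed a higher-scoring sentence
        else:
            faults["rnn_model"] += 1       # model prefers the worse translation
    return faults
```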

Bleu Score

When you have multiple great answers, how do you measure accuracy?

BLEU (BiLingual Evaluation Understudy) can be a substitute for having humans evaluate each output.

As an example on bigrams (pairs of words), consider the following references:

Reference 1: The cat is on the mat
Reference 2: There is a cat on the mat

and the algorithm output:

Output: The cat the cat on the mat

| bigram  | count | count_clip |
| ------- | ----- | ---------- |
| the cat | 2     | 1          |
| cat the | 1     | 0          |
| cat on  | 1     | 1          |
| on the  | 1     | 1          |
| the mat | 1     | 1          |
so the precision is $p_2 = \frac{\sum count_{clip}}{\sum count} = \frac{4}{6}$, where:

  • $\sum count$: total number of bigrams in the output
  • $count_{clip}$: the count clipped at the maximum number of times the bigram appears in any of the references
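
The clipped counting is easy to verify in code; this sketch reproduces the $4/6$ from the table above:

```python
from collections import Counter

def ngrams(words, n):
    """All contiguous n-grams of a token list."""
    return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]

output = "the cat the cat on the mat".split()
refs = ["the cat is on the mat".split(), "there is a cat on the mat".split()]

counts = Counter(ngrams(output, 2))
clipped = {g: min(c, max(Counter(ngrams(r, 2))[g] for r in refs))
           for g, c in counts.items()}
print(sum(clipped.values()), "/", sum(counts.values()))  # -> 4 / 6
```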

On unigrams:

$$p_1 = \frac{\sum_{\text{unigram} \in \hat{y}} count_{clip}(\text{unigram})}{\sum_{\text{unigram} \in \hat{y}} count(\text{unigram})}$$

More generally, on n-grams:

$$p_n = \frac{\sum_{\text{n-gram} \in \hat{y}} count_{clip}(\text{n-gram})}{\sum_{\text{n-gram} \in \hat{y}} count(\text{n-gram})}$$

We can also define the Combined BLEU score on multiple n-grams:

$$BP \cdot \exp\left(\frac{1}{4} \sum_{n=1}^{4} p_n\right)$$

$BP$ is the brevity penalty, defined as:

$$BP = \begin{cases} 1 & \text{if MT\_output\_length} > \text{reference\_output\_length} \\ \exp\left(1 - \frac{\text{reference\_output\_length}}{\text{MT\_output\_length}}\right) & \text{otherwise} \end{cases}$$

which penalizes shorter translations.
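
Putting the pieces together in a self-contained sketch (this follows the combined formula above; note that the original BLEU paper puts $\log p_n$ inside the exponential, and the choice of reference length for $BP$ here is one common convention):

```python
import math
from collections import Counter

def ngrams(words, n):
    return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]

def precision(output, refs, n):
    """Modified (clipped) n-gram precision p_n."""
    counts = Counter(ngrams(output, n))
    clipped = sum(min(c, max(Counter(ngrams(r, n))[g] for r in refs))
                  for g, c in counts.items())
    return clipped / max(sum(counts.values()), 1)

def combined_bleu(output, refs, N=4):
    """BP * exp((1/N) * sum of p_n), using the closest reference length for BP."""
    ref_len = min((abs(len(r) - len(output)), len(r)) for r in refs)[1]
    bp = 1.0 if len(output) > ref_len else math.exp(1 - ref_len / len(output))
    return bp * math.exp(sum(precision(output, refs, n)
                             for n in range(1, N + 1)) / N)
```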

Attention Model

Usually the model performs poorly both on very short sentences and on sentences that are too long. The intuition is that a human translator translates a small part, then reads the next part, translates it, and goes on until done.

The model computes a set of attention weights $\alpha^{<t, t'>}$, which say how much attention the output word at position $t$ should pay to the input word at position $t'$. This repeats for each word of the input.

The context will be the weighted sum:

$$c^{<t>} = \sum_{t'=1}^{T_x} \alpha^{<t, t'>} a^{<t'>}$$

with the constraint that $\sum_{t'} \alpha^{<t, t'>} = 1$.

Computing attention

The scores $e^{<t, t'>}$ can be computed with a small Neural Network with inputs $s^{<t-1>}$ (the previous decoder state) and $a^{<t'>}$ (the encoder activation); a softmax then turns them into the attention weights:

$$\alpha^{<t, t'>} = \frac{\exp(e^{<t, t'>})}{\sum_{t''=1}^{T_x} \exp(e^{<t, t''>})}$$
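
A numpy sketch of one decoder step, with a one-hidden-layer network standing in for the small Neural Network (its weights and sizes are illustrative assumptions):

```python
import numpy as np

def attention_step(a, s_prev, W, v):
    """Compute alpha<t,t'> over all encoder activations and the context c<t>.

    a:      (Tx, d_a) encoder activations a<t'>
    s_prev: (d_s,)    previous decoder state s<t-1>
    """
    Tx = a.shape[0]
    # e<t,t'>: scalar score from a tiny network on the pair [s<t-1>; a<t'>]
    e = np.array([v @ np.tanh(W @ np.concatenate([s_prev, a[tp]]))
                  for tp in range(Tx)])
    alpha = np.exp(e) / np.exp(e).sum()   # softmax: the weights sum to 1
    context = alpha @ a                   # weighted sum of the activations
    return alpha, context

rng = np.random.default_rng(1)
Tx, d_a, d_s, d_e = 6, 8, 8, 16
a = rng.normal(size=(Tx, d_a))
s_prev = rng.normal(size=d_s)
W = rng.normal(size=(d_e, d_s + d_a))
v = rng.normal(size=d_e)
alpha, c = attention_step(a, s_prev, W, v)
print(alpha.sum())  # sums to 1 (up to float rounding)
```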

One downside is the quadratic running time of the algorithm: $T_x \times T_y$ attention weights have to be computed.

Examples in Audio Data

Speech Recognition