with autoregressive models and reinforcement learning

I have a poster to prepare

I 2pv* a Uoin8r to prTpa@e

POS Tagging

Protein folding

Spelling correction

Structured Prediction

Betty Fabre

Ph.D. student

Tanguy Urvoy

Orange Labs

Damien Lolive

Jonathan Chevelu

HPC Summer School, July 2019

Encoder-Decoder / Autoregressive models

Encoder

Decoder

Input features  x

Structured output

y = [START, y1, y2, .., yn, STOP]

Fixed length representation of x

ref : [2014] Sequence to Sequence Learning with Neural Networks, Ilya Sutskever, Oriol Vinyals, Quoc V. Le

Encoder-Decoder / Autoregressive models

Structured output y

Decoder

START

ŷ1

ŷ2

...

STOP

Model

Model

Model
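The unrolled decoder corresponds to the chain-rule factorization of the output distribution. A sketch in the slides' notation, with the conditioning on x written explicitly (the product form is the standard autoregressive factorization; treating STOP as position n+1 is an assumption that follows the definition of y given earlier):

```latex
p(y \mid x) = \prod_{t=1}^{n+1} p(y_t \mid y_1, \dots, y_{t-1}, x),
\qquad y_{n+1} = \text{STOP}
```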

Training vs. Decoding

START

ŷ1

Model

Training : likelihood optimization & teacher forcing

START

ŷ1

ŷ2

...

STOP

Model

Model

Model

Training vs. Decoding

START

ŷ1

Model

Training : likelihood optimization & teacher forcing

START

ŷ1

$\max\, p(y_t \mid y_1, y_2, \dots, y_{t-1})$

...

STOP

Model

Model

Model

y1

yt
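Teacher forcing can be sketched as follows: at each step the model is conditioned on the gold prefix y1..yt-1 and the loss is the negative log-likelihood of the gold token yt. This is a minimal illustration, not the authors' implementation: the toy bigram "model", the three-token vocabulary, and the names `next_token_probs` and `teacher_forced_nll` are all assumptions made for the example.

```python
import math

VOCAB = ["a", "b", "STOP"]  # hypothetical toy vocabulary


def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def next_token_probs(prefix, weights):
    """Toy stand-in for the model: scores depend only on the previous token."""
    return softmax(weights[prefix[-1]])


def teacher_forced_nll(target, weights):
    """Negative log-likelihood of `target`, conditioning on the gold prefix."""
    prefix = ["START"]
    nll = 0.0
    for gold in target:
        probs = next_token_probs(prefix, weights)
        nll -= math.log(probs[VOCAB.index(gold)])
        prefix.append(gold)  # teacher forcing: feed the gold token, not the prediction
    return nll


# Untrained (uniform) toy model: every token has probability 1/3.
weights = {tok: [0.0, 0.0, 0.0] for tok in ["START", "a", "b"]}
loss = teacher_forced_nll(["a", "b", "STOP"], weights)
```

Minimizing this loss is the same as maximizing p(yt | y1, .., yt-1) at every position; the model never sees its own predictions during training.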

Training vs. Decoding

START

ŷ1

Model

Decoding : autoregressive 

START

ŷ1

$\hat{y}_t = \arg\max_{y'} p(y' \mid \hat{y}_1, \dots, \hat{y}_{t-1})$

...

Model

Model

→ decoding is local while evaluation is global

→ exposure bias
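Greedy autoregressive decoding can be sketched as a loop that picks the locally best token and feeds it back as input. The toy score table, vocabulary, and function names below are hypothetical and serve only to illustrate the argmax rule.

```python
VOCAB = ["a", "b", "STOP"]  # hypothetical toy vocabulary


def next_token_scores(prefix, weights):
    """Toy stand-in for the model: scores depend only on the previous token."""
    return weights[prefix[-1]]


def greedy_decode(weights, max_len=10):
    """Pick the highest-scoring token at each step and feed it back."""
    prefix = ["START"]
    while len(prefix) - 1 < max_len:
        scores = next_token_scores(prefix, weights)
        best = VOCAB[max(range(len(VOCAB)), key=lambda i: scores[i])]
        prefix.append(best)  # the model's own prediction becomes the next input
        if best == "STOP":
            break
    return prefix[1:]


# Illustrative scores only; argmax over scores equals argmax over softmax probabilities.
weights = {
    "START": [2.0, 1.0, 0.0],
    "a": [0.0, 2.0, 1.0],
    "b": [0.0, 0.0, 3.0],
}
decoded = greedy_decode(weights)
```

Each choice is made locally, one token at a time, while the quality of the output is judged on the whole sequence. The prefix here consists of the model's own predictions rather than gold tokens, which is the train/test mismatch referred to as exposure bias.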

Reinforcement Learning Approaches

REINFORCE Algorithm

training with exposure bias

training with expectation
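A minimal sketch of a REINFORCE update, assuming a toy bigram policy and a hypothetical position-match reward (neither comes from the slides): a sequence is sampled from the model itself, scored as a whole, and the log-probability of each sampled token is pushed up in proportion to that sequence-level reward.

```python
import math
import random

VOCAB = ["a", "b", "STOP"]  # hypothetical toy vocabulary


def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def sample_sequence(weights, rng, max_len=5):
    """Sample a sequence from the model, feeding back its own outputs."""
    prev, sample, steps = "START", [], []
    for _ in range(max_len):
        probs = softmax(weights[prev])
        idx = rng.choices(range(len(VOCAB)), weights=probs)[0]
        steps.append((prev, idx, probs))
        sample.append(VOCAB[idx])
        if VOCAB[idx] == "STOP":
            break
        prev = VOCAB[idx]
    return sample, steps


def reward(sample, reference):
    """Hypothetical sequence-level reward: fraction of matching positions."""
    matches = sum(s == r for s, r in zip(sample, reference))
    return matches / max(len(sample), len(reference))


def reinforce_step(weights, reference, rng, lr=0.5):
    """One-sample gradient ascent step on the expected reward."""
    sample, steps = sample_sequence(weights, rng)
    r = reward(sample, reference)
    for prev, idx, probs in steps:
        for i in range(len(VOCAB)):
            # d log softmax(scores)[idx] / d scores[i] = 1[i == idx] - probs[i]
            grad_logp = (1.0 if i == idx else 0.0) - probs[i]
            weights[prev][i] += lr * r * grad_logp
    return sample, r


rng = random.Random(0)
weights = {tok: [0.0, 0.0, 0.0] for tok in ["START", "a", "b"]}
reference = ["a", "b", "STOP"]
rewards = [reinforce_step(weights, reference, rng)[1] for _ in range(200)]
```

Because training samples come from the model's own distribution, the training and decoding conditions match. The price is a high-variance gradient estimate, since one scalar reward is shared by every token of the sample.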

Reinforcement Learning Approaches

Actor-Critic Algorithm
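A simplified actor-critic sketch, assuming a tabular critic over a toy state (the previous token) and a one-step temporal-difference target. It illustrates the general idea of replacing the raw sequence reward with a learned value estimate; it is not the architecture of Bahdanau et al., and all names and the reward function are hypothetical.

```python
import math
import random

VOCAB = ["a", "b", "STOP"]      # hypothetical toy vocabulary
STATES = ["START", "a", "b"]    # toy state: the previous token


def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def reward(sample, reference):
    """Hypothetical sequence-level reward: fraction of matching positions."""
    matches = sum(s == r for s, r in zip(sample, reference))
    return matches / max(len(sample), len(reference))


def actor_critic_episode(actor, critic, reference, rng,
                         lr_actor=0.5, lr_critic=0.2, max_len=5):
    # Roll out one sequence from the actor (the sequence model).
    prev, sample, steps = "START", [], []
    for _ in range(max_len):
        probs = softmax(actor[prev])
        idx = rng.choices(range(len(VOCAB)), weights=probs)[0]
        steps.append((prev, idx, probs))
        sample.append(VOCAB[idx])
        if VOCAB[idx] == "STOP":
            break
        prev = VOCAB[idx]
    final_reward = reward(sample, reference)
    for t, (state, idx, probs) in enumerate(steps):
        is_last = t == len(steps) - 1
        # TD target: the final reward at the last step, else the critic's
        # estimate for the next state (no discounting, reward only at the end).
        target = final_reward if is_last else critic[steps[t + 1][0]]
        advantage = target - critic[state]
        for i in range(len(VOCAB)):
            grad_logp = (1.0 if i == idx else 0.0) - probs[i]
            actor[state][i] += lr_actor * advantage * grad_logp
        critic[state] += lr_critic * advantage  # move the estimate toward the target
    return sample, final_reward


rng = random.Random(0)
actor = {s: [0.0, 0.0, 0.0] for s in STATES}
critic = {s: 0.0 for s in STATES}
reference = ["a", "b", "STOP"]
history = [actor_critic_episode(actor, critic, reference, rng) for _ in range(200)]
```

Compared with plain REINFORCE, each token receives its own learning signal (the advantage) instead of the shared sequence reward, which typically lowers the variance of the update at the cost of bias from an imperfect critic.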

ref: 

- [1992] Simple statistical gradient-following algorithms for connectionist reinforcement learning, Ronald J. Williams

- [2015] Sequence Level Training with Recurrent Neural Networks, Marc'Aurelio Ranzato, Sumit Chopra, Michael Auli, Wojciech Zaremba

- [2017] An Actor-Critic Algorithm for Sequence Prediction, Dzmitry Bahdanau, Philemon Brakel, Kelvin Xu, Anirudh Goyal, Ryan Lowe, Joelle Pineau, Aaron Courville, Yoshua Bengio
