Attention Is All You Need
Offers a new, simple network architecture, the Transformer, based solely on attention mechanisms and dispensing with recurrence and convolutions entirely.
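The paper's core operation is scaled dot-product attention, Attention(Q, K, V) = softmax(Q Kᵀ / sqrt(d_k)) V. Below is a minimal NumPy sketch of that formula; the function name, array shapes, and the toy example are illustrative assumptions, not the paper's reference implementation.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    scores = Q @ K.swapaxes(-2, -1) / np.sqrt(d_k)    # query-key similarities
    scores -= scores.max(axis=-1, keepdims=True)      # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)    # softmax over keys
    return weights @ V                                # weighted sum of values

# Toy example: 4 positions, model dimension 8.
rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((4, 8)) for _ in range(3))
out = scaled_dot_product_attention(Q, K, V)
print(out.shape)  # (4, 8)
```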
Transformers are Sample-Efficient World Models
Offers a new, more sample-efficient approach to deep reinforcement learning: an agent (IRIS) that learns a world model composed of a discrete autoencoder and an autoregressive Transformer, and trains its policy purely in imagination inside that model, giving strong results on the Atari 100k benchmark.
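As a rough illustration of the learning-in-imagination idea, the sketch below rolls out a toy policy inside a stand-in world model, so evaluating the policy consumes no real environment steps. Every component here (the latent dynamics, the reward, and the linear policy) is an assumed toy stand-in, not the authors' IRIS architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

def world_model_step(state, action):
    # Stand-in for a learned world model: predicts next latent state and reward.
    next_state = np.tanh(state + 0.1 * action)
    reward = float(-np.mean(next_state ** 2))
    return next_state, reward

def policy(state, theta):
    # Toy linear policy acting directly on the latent state.
    return np.tanh(theta @ state)

def imagined_return(theta, horizon=20):
    # Imagined rollout: the agent interacts only with the world model,
    # which is where the sample efficiency comes from.
    state = rng.standard_normal(8)
    total = 0.0
    for _ in range(horizon):
        action = policy(state, theta)
        state, reward = world_model_step(state, action)
        total += reward
    return total

theta = rng.standard_normal((8, 8))
print("imagined return:", imagined_return(theta))
```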
Understanding intermediate layers using linear classifier probes
Offers a new way to inspect neural network models by proposing linear classifier probes: linear classifiers trained on the features at every layer of a frozen model, whose accuracy measures how suitable (linearly separable) those features are for classification.
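A minimal sketch of the probing idea, assuming a toy random "network" and synthetic data in place of a real trained model: fit a linear classifier on each layer's activations (leaving the network itself untouched) and compare held-out accuracies to see which layer's features are most linearly separable.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.standard_normal((500, 16))             # synthetic inputs
y = (X[:, 0] + X[:, 1] > 0).astype(int)        # synthetic binary labels

# Stand-in for a frozen, pretrained network: two ReLU layers with fixed weights.
W1 = rng.standard_normal((16, 32))
W2 = rng.standard_normal((32, 32))
layer1 = np.maximum(X @ W1, 0)                 # activations after layer 1
layer2 = np.maximum(layer1 @ W2, 0)            # activations after layer 2

idx = rng.permutation(len(X))
train, test = idx[:400], idx[400:]

# Probe each representation: higher held-out accuracy means the features
# at that depth are more linearly separable for the classification task.
for name, feats in [("input", X), ("layer1", layer1), ("layer2", layer2)]:
    probe = LogisticRegression(max_iter=1000).fit(feats[train], y[train])
    print(name, probe.score(feats[test], y[test]))
```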