Give me a graduate-level summary of the paper "Attention Is All You Need", and highlight how the Transformer model inherently supports parallelization compared to RNNs.
