AU
Give me a graduate-level summary of the paper "Attention Is All You Need", and highlight how the Transformer model inherently supports parallelization compared to RNNs.
3 months ago