AMD Interview Question

How does the self-attention layer work in Transformers?
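
One possible way to ground an answer is a minimal sketch of single-head scaled dot-product self-attention, softmax(Q K^T / sqrt(d_k)) V, where Q, K, and V are all linear projections of the same input sequence. The NumPy code below is an illustration under stated assumptions, not a production implementation: the names `self_attention`, `w_q`, `w_k`, `w_v` and the toy dimensions are made up for the example, and it leaves out batching, masking, multiple heads, and the output projection that a full Transformer layer would normally include.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row-wise max for numerical stability before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(x, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention (illustrative sketch).

    x:   (seq_len, d_model) input token embeddings
    w_q, w_k, w_v: (d_model, d_k) projection matrices (assumed names)
    """
    q = x @ w_q  # queries: what each token is looking for
    k = x @ w_k  # keys: what each token offers for matching
    v = x @ w_v  # values: the content that gets mixed together
    d_k = q.shape[-1]
    # Similarity of every query with every key, scaled by sqrt(d_k).
    scores = q @ k.T / np.sqrt(d_k)      # (seq_len, seq_len)
    # Each row becomes a probability distribution over all positions.
    weights = softmax(scores, axis=-1)   # (seq_len, seq_len)
    # Each output token is a weighted average of the value vectors.
    return weights @ v, weights

# Toy example with random data (dimensions chosen arbitrarily).
rng = np.random.default_rng(0)
seq_len, d_model, d_k = 4, 8, 8
x = rng.normal(size=(seq_len, d_model))
w_q = rng.normal(size=(d_model, d_k))
w_k = rng.normal(size=(d_model, d_k))
w_v = rng.normal(size=(d_model, d_k))

out, weights = self_attention(x, w_q, w_k, w_v)
print(out.shape)      # one output vector per input token
print(weights.shape)  # one attention row per token
```

The point to draw out in an interview is that every token attends to every other token in the same sequence, so each row of `weights` sums to 1 and each output row is a context-dependent mix of the value vectors.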