Transformers use self-attention to relate every token in a sequence to every other token in parallel, rather than strictly one step at a time as recurrent models do. That design made them especially effective for large-scale language modeling and many downstream NLP tasks.
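To make the parallelism concrete, here is a minimal NumPy sketch of single-head scaled dot-product self-attention. The function name, matrix names, and shapes are illustrative assumptions, not anything from the text; the point is only that every token's output is computed in one batched pass over the whole sequence.

```python
import numpy as np

def scaled_dot_product_attention(X, Wq, Wk, Wv):
    """Single-head self-attention over an entire sequence at once.

    X:          (seq_len, d_model) token embeddings
    Wq, Wk, Wv: (d_model, d_head) projection matrices (hypothetical names)
    """
    Q = X @ Wq                                  # queries for all tokens, together
    K = X @ Wk                                  # keys
    V = X @ Wv                                  # values
    d_head = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_head)          # (seq_len, seq_len) pairwise scores
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over keys per token
    return weights @ V                          # each output mixes all token values

# Toy usage: 4 tokens, 8-dim embeddings and head (sizes chosen arbitrarily)
rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))
Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))
out = scaled_dot_product_attention(X, Wq, Wk, Wv)
print(out.shape)  # (4, 8): all four token outputs produced in one pass
```

Note that nothing in the computation iterates token by token: the score matrix captures all pairwise relationships at once, which is what distinguishes this from a step-by-step recurrent update.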
Much of the current machine learning tooling around chatbots, retrieval, summarization, and fine-tuning builds on transformer-based models.
