A corpus is the pool of examples your system works over. In language tasks, that usually means a body of text such as documents, search queries, support tickets, or product descriptions.
In practical examples, the corpus is usually the collection that gets indexed or embedded before retrieval. Once you name that collection clearly, the rest of the pipeline becomes much easier to explain.
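
To make the idea concrete, here is a minimal sketch of a corpus being indexed before retrieval. Everything in it is an assumption for illustration: the three support tickets are made up, and `tokenize`, `build_index`, and `search` are hypothetical helper names, not part of any library. It uses a plain keyword index rather than embeddings, just to keep the example dependency-free.

```python
from collections import defaultdict

# Hypothetical corpus: a small, clearly named collection of support tickets.
corpus = [
    "Password reset email never arrives",
    "Cannot update billing address",
    "Reset link in email has expired",
]


def tokenize(text):
    """Lowercase and split on whitespace (deliberately simplistic)."""
    return text.lower().split()


def build_index(docs):
    """Map each token to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in enumerate(docs):
        for token in tokenize(text):
            index[token].add(doc_id)
    return index


def search(index, query):
    """Return document ids ranked by how many distinct query tokens they match."""
    scores = defaultdict(int)
    for token in set(tokenize(query)):
        for doc_id in index.get(token, ()):
            scores[doc_id] += 1
    # Highest score first; ties broken by document id for stable output.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


index = build_index(corpus)
results = search(index, "reset email")
for doc_id in results:
    print(doc_id, corpus[doc_id])
```

Note that the corpus is defined once, up front, and both indexing and search refer back to it by document id. Swapping the keyword index for an embedding model would change `build_index` and `search`, but the corpus itself would stay the same.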
