Glossary Entry

Corpus

A structured collection of text or other examples that you analyze, search, or use to train and evaluate a model.

Tags: Data, NLP

Also called: text corpus, dataset corpus

Seed source: Google ML Glossary

A corpus is the pool of examples your system works over. In language tasks, that usually means a body of text such as documents, search queries, support tickets, or product descriptions.

In practical blog examples, the corpus is usually the collection that gets indexed or embedded before retrieval. Naming that collection clearly makes the rest of the pipeline much easier to explain.
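As a rough illustration of "the corpus is the thing being indexed," here is a minimal sketch that treats a few support tickets as the corpus and builds a simple inverted index over them. The ticket IDs, the ticket text, and the helper names `build_index` and `search` are all hypothetical, chosen only for this example. A real pipeline would likely use a proper tokenizer, and might use embeddings rather than exact token matching.

```python
from collections import defaultdict

# Hypothetical corpus: a small, clearly named collection of support tickets.
corpus = {
    "ticket-1": "Cannot reset my password",
    "ticket-2": "Password reset email never arrives",
    "ticket-3": "How do I export my invoices",
}


def build_index(docs):
    """Map each lowercase token to the set of document IDs containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in text.lower().split():
            index[token].add(doc_id)
    return index


def search(index, query):
    """Return sorted IDs of documents that contain every query token."""
    tokens = query.lower().split()
    if not tokens:
        return []
    # Intersect the posting sets; an unknown token yields an empty set.
    matches = set.intersection(*(index.get(t, set()) for t in tokens))
    return sorted(matches)


index = build_index(corpus)
print(search(index, "password reset"))  # ['ticket-1', 'ticket-2']
```

Here the retrieval step only ever sees `index`, which is derived entirely from `corpus`. Swapping in a different corpus changes what can be found without changing the search code.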