Definition

The process of breaking text into smaller units, known as tokens, which serve as the basic units of analysis in natural language processing tasks such as parsing, indexing, and searching.
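
A minimal sketch in Python, assuming only the standard library; the regular expression below is an illustrative word-and-punctuation splitter, not any particular tokenizer's rules:

    import re

    def tokenize(text):
        # Match either a run of word characters (a word token)
        # or a single non-space, non-word character (punctuation).
        return re.findall(r"\w+|[^\w\s]", text)

    print(tokenize("Tokens drive parsing, indexing, and searching."))
    # ['Tokens', 'drive', 'parsing', ',', 'indexing', ',', 'and', 'searching', '.']

Real tokenizers vary in how they handle contractions, hyphens, and multilingual text; this sketch only shows the basic idea of splitting text into word and punctuation units.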