A scanner generator builds lexical analyzers from token definitions. Each token definition takes the form name:regex.
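For instance, a pair of definitions like these (the names are purely illustrative) would describe integers and identifiers:

```
number:[0-9]+
identifier:[a-zA-Z_][a-zA-Z0-9_]*
```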
Lexical analyzers perform lexical analysis. Tautologies are tautologies. Less circularly: a lexer breaks a string into tokens, those tokens get passed to a parser, and tokenization is the first major step of compilation.
Define tokens. Press button. Receive lexer.
Deep, right? Play around with the example.
This tool automatically generates a non-deterministic finite automaton (NFA) for each given regex. It then attaches them all to a common start state via ε-edges (aka λ-edges). The subset construction algorithm is run on that composite NFA, yielding a deterministic finite automaton (DFA). The DFA maps cleanly to a set of tables that drive a lexer. A generic driver uses those tables to separate and classify tokens in a string.
(No DFA minimization occurs.)
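For the curious, here's a minimal sketch of the subset construction in Ruby. It assumes a made-up NFA encoding (nested Hashes, with nil as the symbol for ε-edges); the generator's real representation may differ, and accepting-state bookkeeping is omitted for brevity.

```ruby
require 'set'

# All NFA states reachable from `states` by following only epsilon-edges.
def eps_closure(nfa, states)
  closure = Set.new(states)
  stack = states.to_a
  until stack.empty?
    s = stack.pop
    (nfa.fetch(s, {})[nil] || []).each do |t|
      stack << t if closure.add?(t)  # add? returns nil if already present
    end
  end
  closure
end

# Each DFA state is a Set of NFA states; discover them breadth-first.
def subset_construction(nfa, start, alphabet)
  start_set = eps_closure(nfa, [start])
  dfa = {}  # Set-of-NFA-states => { symbol => Set-of-NFA-states }
  work = [start_set]
  until work.empty?
    current = work.pop
    next if dfa.key?(current)
    dfa[current] = {}
    alphabet.each do |sym|
      moved = current.flat_map { |s| nfa.fetch(s, {})[sym] || [] }
      next if moved.empty?
      target = eps_closure(nfa, moved)
      dfa[current][sym] = target
      work << target
    end
  end
  [start_set, dfa]
end
```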
The ball of code you see on this page contains tables and a generic scanner written in Ruby. Prefer Python? Strip the @ signs off the table declarations and port the driver. Prefer C++? So manly!
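The driver itself doesn't amount to much. A bare-bones Ruby sketch, assuming hypothetical TRANSITIONS ([state, char] => state) and ACCEPTING (state => token name) tables; it uses maximal munch, backing up to the last accepting state it saw whenever the DFA gets stuck:

```ruby
def scan(input)
  tokens = []
  pos = 0
  while pos < input.length
    state = :start
    last_accept = nil  # [token name, end position] of longest match so far
    i = pos
    # Consume characters until the DFA has no transition left.
    while i < input.length && (nxt = TRANSITIONS[[state, input[i]]])
      state = nxt
      i += 1
      last_accept = [ACCEPTING[state], i] if ACCEPTING[state]
    end
    raise "stuck at #{pos}: #{input[pos].inspect}" unless last_accept
    name, end_pos = last_accept
    tokens << [name, input[pos...end_pos]]
    pos = end_pos
  end
  tokens
end
```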