this video demos a PII detector I built and explains the high level architecture
GitHub repo link:
github.com/Arman-mi/LLM-PII-detector
Edit** on my part:
The NER model does not tokenize based on the labels, it tokenizes the input into subword tokens, then turns them into embeddings, passes the embeddings through a transformer to build contextual representations, then the model performs token level classification based on the labels