Person Name
Identifies given names, surnames, and full names in free-form text using lexicons built from public government name registries. Covers approximately 3.3 million given names and 164,000 surnames across the US, Canada, Mexico, and Argentina.
Endpoint: POST /v1/pii/detect/person
Validation & Confidence
Detection is a lexicon lookup against deduplicated government name datasets. A full name is a given name followed by a surname in sequence.
| Match | Confidence | Severity |
|---|---|---|
| Full name pair | 1.0 | high |
| Single unambiguous name | 1.0 | high |
| Name overlapping a common English word | 0.2 | medium |
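As a rough illustration of the lookup and scoring rules above, the sketch below pairs a given name with a following surname and downgrades names that overlap common words. The lexicons, the common-word list, the `surname` match type, and the function name `detect_person_sketch` are all toy assumptions for this example. It is not the service's implementation, and it omits middle-initial handling.

```python
import re

# Toy lexicons for illustration only (assumed); the real ones come from registries.
GIVEN_NAMES = {"john", "jane", "will", "grace"}
SURNAMES = {"doe", "ramirez"}
COMMON_WORDS = {"will", "do"}  # hypothetical common-word list; "grace" is exempt


def detect_person_sketch(text):
    # Keep punctuation as separate tokens so it sits between name tokens
    # and blocks full-name pairing.
    tokens = re.findall(r"[A-Za-z]+|[^\sA-Za-z]", text)
    detections = []
    i = 0
    while i < len(tokens):
        tok = tokens[i].lower()
        nxt = tokens[i + 1].lower() if i + 1 < len(tokens) else None
        if tok in GIVEN_NAMES and nxt in SURNAMES:
            # Given name directly followed by a surname: full-name pair.
            detections.append({
                "value": f"{tokens[i]} {tokens[i + 1]}",
                "match_type": "full_name",
                "confidence": 1.0,
                "severity": "high",
            })
            i += 2
            continue
        if tok in GIVEN_NAMES or tok in SURNAMES:
            common = tok in COMMON_WORDS
            detections.append({
                "value": tokens[i],
                # "surname" as a match type is an assumption of this sketch.
                "match_type": "given_name" if tok in GIVEN_NAMES else "surname",
                "confidence": 0.2 if common else 1.0,
                "severity": "medium" if common else "high",
                "common_word": common,
            })
        i += 1
    return detections


for d in detect_person_sketch("Contact John Doe or just ask for Jane"):
    print(d["value"], d["match_type"], d["confidence"])
```

Treating punctuation as its own token is a deliberate choice here: it means a colon or period between two name tokens yields two single-name detections rather than one pair.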
Example
```python
from instructeer.guards import PIIGuard

# Placeholder key shown for brevity. In real code, prefer the environment
# (requires `import os`):
# api_key = os.environ["INSTRUCTEER_API_KEY"]
pii = PIIGuard(api_key="rg_your_key")
result = pii.detect_person("Contact John Doe or just ask for Jane")
```
Response
```json
[
  { "entity_type": "PERSON_NAME", "value": "John Doe", "severity": "high", "confidence": 1.0,
    "extra": { "match_type": "full_name" } },
  { "entity_type": "PERSON_NAME", "value": "Jane", "severity": "high", "confidence": 1.0,
    "extra": { "match_type": "given_name" } }
]
```
Reference
| Dataset | Type | License |
|---|---|---|
| SSA Baby Names | US given names 1880–present | CC0 |
| US Census Surnames | US surnames 2010 | Public Domain |
| Ontario / Quebec / Alberta | Canada given names | OGL / CC BY 4.0 |
| Mexico RENAPO | Given names + surnames | CC0 |
| Argentina Historic Names | Given names 1922–2015 | CC BY 4.0 |
Notes
- Common word flag: Names that overlap common English words (e.g. "Do", "Will") score 0.2 with common_word: true in extra. Names primarily used as names ("Grace", "Victor") are exceptions and retain 1.0.
- Recall: ~60–70% for typical US/UK/CA user bases. Lower for names from regions not in the datasets.
- Middle initials: A middle initial between a given name and a surname is handled, so "Zane S. Ramirez" fires as a single full_name detection.
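Because common-word matches arrive with a low score and a `common_word` flag, a caller may want to separate them from confident detections before redacting. The sketch below assumes detections are plain dicts shaped like the example response. The 0.5 threshold, the helper name `split_by_confidence`, and the `match_type` shown for "Will" are assumptions for illustration.

```python
def split_by_confidence(detections, threshold=0.5):
    """Split detections into (confident, needs_review) lists.

    The threshold is an assumed cut-off, not a value defined by the service.
    """
    confident, needs_review = [], []
    for d in detections:
        is_common = d.get("extra", {}).get("common_word", False)
        if d["confidence"] >= threshold and not is_common:
            confident.append(d)
        else:
            needs_review.append(d)
    return confident, needs_review


# Sample data modelled on the documented response shape.
sample = [
    {"entity_type": "PERSON_NAME", "value": "John Doe", "severity": "high",
     "confidence": 1.0, "extra": {"match_type": "full_name"}},
    {"entity_type": "PERSON_NAME", "value": "Will", "severity": "medium",
     "confidence": 0.2, "extra": {"match_type": "given_name", "common_word": True}},
]

confident, needs_review = split_by_confidence(sample)
print([d["value"] for d in confident])
print([d["value"] for d in needs_review])
```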
Known Gaps
| Gap | Detail |
|---|---|
| Transliterated variants | "Sergei", "Sergey", "Serghei" may not all be present |
| Non-English naming conventions | Common word filter is US English and may penalize valid names from other traditions |
| Names separated by punctuation | A colon or period between tokens prevents full-name pairing |
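One possible client-side mitigation for the transliteration gap is to keep a small local list of variants and merge any hits with the service's detections. This is only a sketch: `EXTRA_NAMES`, the `local_supplement` match type, and both helper functions are hypothetical and not part of the API.

```python
import re

# Hypothetical, caller-maintained list of variants the lexicons may lack.
EXTRA_NAMES = ["Sergei", "Sergey", "Serghei"]


def find_extra_names(text, names=EXTRA_NAMES):
    # Whole-word, case-insensitive match against the local list.
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(n) for n in names) + r")\b",
        re.IGNORECASE,
    )
    return [
        {"entity_type": "PERSON_NAME", "value": m.group(0),
         "extra": {"match_type": "local_supplement"}}  # assumed label
        for m in pattern.finditer(text)
    ]


def merge_detections(service_detections, extra):
    # Skip local hits already covered by a token of a service detection.
    covered = {
        tok.lower()
        for d in service_detections
        for tok in d["value"].split()
    }
    return service_detections + [
        d for d in extra if d["value"].lower() not in covered
    ]


text = "Ask Serghei or Sergey Ivanov"
service = [{"entity_type": "PERSON_NAME", "value": "Sergey Ivanov",
            "extra": {"match_type": "full_name"}}]
merged = merge_detections(service, find_extra_names(text))
print([d["value"] for d in merged])
```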