Person Name

Identifies given names, surnames, and full names in free-form text using lexicons built from public government name registries. Covers approximately 3.3 million given names and 164,000 surnames across the US, Canada, Mexico, and Argentina.

Endpoint: POST /v1/pii/detect/person

Validation & Confidence

Detection is a lexicon lookup against deduplicated government name datasets. A full name is a given name followed directly by a surname, in that order.
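The lookup-and-pair rule can be sketched roughly as follows. This is an illustration only, not PIIGuard's implementation: the two tiny lexicons, the whitespace tokenizer, the function name find_names, and the "surname" match type are all assumptions made for the example (the documented response only shows "full_name" and "given_name").

```python
# Hypothetical toy lexicons; the real service draws on government registries.
GIVEN_NAMES = {"john", "jane"}
SURNAMES = {"doe"}
PUNCT = ".,;:!?"


def find_names(text):
    """Return (value, match_type) tuples found by a naive lexicon lookup."""
    tokens = text.split()
    results = []
    i = 0
    while i < len(tokens):
        word = tokens[i].strip(PUNCT)
        key = word.lower()
        # Pair only when no punctuation trails the current token, so a colon
        # or period between two tokens blocks full-name pairing.
        can_pair = tokens[i] == tokens[i].rstrip(PUNCT) and i + 1 < len(tokens)
        if key in GIVEN_NAMES and can_pair:
            nxt = tokens[i + 1].strip(PUNCT)
            if nxt.lower() in SURNAMES:
                results.append((f"{word} {nxt}", "full_name"))
                i += 2
                continue
        if key in GIVEN_NAMES:
            results.append((word, "given_name"))
        elif key in SURNAMES:
            results.append((word, "surname"))
        i += 1
    return results
```

Calling find_names("Contact John Doe or just ask for Jane") on this toy lexicon yields one full_name match and one given_name match, mirroring the shape of the documented response.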

Match                                 Confidence  Severity
Full name pair                        1.0         high
Single unambiguous name               1.0         high
Name overlapping common English word  0.2         medium
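To make the scoring rules concrete, here is a minimal sketch of how they could be expressed as a function. The word sets and the function name score are hypothetical, and the assumption that a full-name pair keeps 1.0 even when one of its tokens is a common word is an inference from the rules above, not something the service documents.

```python
# Hypothetical word lists for illustration only.
COMMON_WORDS = {"do", "will", "grace", "victor"}
# Words used primarily as names keep full confidence (see Notes).
NAME_EXCEPTIONS = {"grace", "victor"}


def score(value, match_type):
    """Return (confidence, severity) following the documented scoring rules."""
    if match_type == "full_name":
        # Assumption: a given name + surname pair always scores 1.0.
        return 1.0, "high"
    word = value.lower()
    if word in COMMON_WORDS and word not in NAME_EXCEPTIONS:
        return 0.2, "medium"
    return 1.0, "high"
```

For example, score("Will", "given_name") falls into the common-word branch, while score("Grace", "given_name") is treated as an exception and keeps full confidence.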

Example

from instructeer.guards import PIIGuard

# In production, load the key from the environment instead of hard-coding it:
# api_key = os.environ["INSTRUCTEER_API_KEY"]
pii = PIIGuard(api_key="rg_your_key")
result = pii.detect_person("Contact John Doe or just ask for Jane")

Response:

[
  { "entity_type": "PERSON_NAME", "value": "John Doe", "severity": "high", "confidence": 1.0,
    "extra": { "match_type": "full_name" } },
  { "entity_type": "PERSON_NAME", "value": "Jane",     "severity": "high", "confidence": 1.0,
    "extra": { "match_type": "given_name" } }
]
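
A typical next step is to mask the detected values. The sketch below assumes the detections arrive as a list of dictionaries shaped like the response above; the function name redact and the min_confidence threshold of 0.5 are hypothetical choices, not part of the API.

```python
def redact(text, detections, min_confidence=0.5):
    """Replace each sufficiently confident detection with its entity type."""
    # Replace longer values first so "John Doe" is masked before "John".
    ordered = sorted(detections, key=lambda d: len(d["value"]), reverse=True)
    for d in ordered:
        if d["confidence"] >= min_confidence:
            text = text.replace(d["value"], f"[{d['entity_type']}]")
    return text


detections = [
    {"entity_type": "PERSON_NAME", "value": "John Doe", "severity": "high",
     "confidence": 1.0, "extra": {"match_type": "full_name"}},
    {"entity_type": "PERSON_NAME", "value": "Jane", "severity": "high",
     "confidence": 1.0, "extra": {"match_type": "given_name"}},
]
masked = redact("Contact John Doe or just ask for Jane", detections)
```

Note that str.replace matches substrings, so a value such as "Jane" would also be masked inside "Janet". A production version would likely need word-boundary or offset-based replacement.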

Reference

Dataset                     Type                         License
SSA Baby Names              US given names 1880–present  CC0
US Census Surnames          US surnames 2010             Public Domain
Ontario / Quebec / Alberta  Canada given names           OGL / CC BY 4.0
Mexico RENAPO               Given names + surnames       CC0
Argentina Historic Names    Given names 1922–2015        CC BY 4.0

Notes

  • Common word flag: Names that overlap common English words (e.g. "Do", "Will") score 0.2 with common_word: true in extra. Names primarily used as names ("Grace", "Victor") are exceptions and retain 1.0.
  • Recall: ~60–70% for typical US/UK/CA user bases. Lower for names from regions not in the datasets.
  • Middle initials: A middle initial between the given name and surname is handled, so "Zane S. Ramirez" fires as a single full_name detection.
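
The middle-initial behavior can be pictured with a small token-based sketch. It is only an approximation under stated assumptions: the lexicons, the function name full_names, and the rule that a middle initial is a single capital letter followed by a period are all hypothetical, since the service's actual matching logic is not documented here.

```python
import re

# Hypothetical toy lexicons for illustration only.
GIVEN_NAMES = {"zane"}
SURNAMES = {"ramirez"}


def full_names(text):
    """Find given + surname pairs, allowing one middle initial in between."""
    tokens = text.split()
    found = []
    i = 0
    while i < len(tokens):
        if tokens[i].lower() in GIVEN_NAMES:
            j = i + 1
            # Skip a single middle initial such as "S." if present.
            if j < len(tokens) and re.fullmatch(r"[A-Z]\.", tokens[j]):
                j += 1
            if j < len(tokens) and tokens[j].lower() in SURNAMES:
                found.append(" ".join(tokens[i:j + 1]))
                i = j + 1
                continue
        i += 1
    return found
```

With this sketch, both "Zane S. Ramirez" and "Zane Ramirez" come back as a single full-name string rather than two separate matches.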

Known Gaps

  • Transliterated variants: "Sergei", "Sergey", "Serghei" may not all be present.
  • Non-English naming conventions: The common word filter is US English and may penalize valid names from other traditions.
  • Names separated by punctuation: A colon or period between tokens prevents full-name pairing.