Keep it Secret, Keep it Safe: Announcing the PII Data Anonymizer

Audrey Boguchwal

March 26, 2020


Samasource is excited to launch the PII Data Anonymizer as part of our platform for video training data. This technology obscures sensitive, personally identifying information (PII) in training data.


In light of new laws like GDPR and CCPA, it’s important for companies building AI and ML technologies to carefully manage data that contains PII. Obscuring PII helps Samasource and our customers protect privacy.

Samasource’s PII Data Anonymizer helps make more data available to train AI by keeping personally identifying information safe across a variety of data sources: people in camera images from retail spaces and public places, street-level images of people and license plates captured by vehicles, smart city applications on public transit, and more.

Applications for anonymization include autonomous transportation, detailed customer demographics, customer attributes like clothing and emotion, people counting, and security.

This deep learning pre-annotation technology allows Samasource to obscure faces and vehicle license plates that appear in data without the need for any human intervention. That means that private information remains private and is never seen by another person.

When Samasource receives customer data, it can be run through our anonymizer service before any labeling occurs. The service automatically detects faces and license plates and obscures them by blurring, so they are not recognizable.

Alternatively, the service can replace faces and license plates with realistic computer-generated avatars. This AI-generated content produces training data that looks like real-world data, which matters when people and vehicles are the primary objects of interest for the algorithm.
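To make the blurring option concrete, here is a minimal sketch of the "detect, then blur" step in plain Python. It is not Samasource's implementation: it assumes some detector (not shown, and hypothetical here) has already returned bounding boxes for faces and plates, and it applies a simple box blur (each pixel becomes the mean of its neighborhood) to a grayscale image stored as a list of rows. The names `anonymize`, `box_blur_region`, and the `(x0, y0, x1, y1)` box format are illustrative assumptions.

```python
from typing import List, Tuple

Image = List[List[int]]           # grayscale pixels, row-major (assumed format)
Box = Tuple[int, int, int, int]   # (x0, y0, x1, y1), end-exclusive (assumed format)


def box_blur_region(img: Image, box: Box, radius: int = 2) -> None:
    """Box-blur the pixels inside `box`, modifying `img` in place."""
    x0, y0, x1, y1 = box
    h, w = len(img), len(img[0])
    # Read from a snapshot so freshly blurred pixels don't feed into their neighbors.
    src = [row[:] for row in img]
    for y in range(max(0, y0), min(h, y1)):
        for x in range(max(0, x0), min(w, x1)):
            total = count = 0
            # Average the (2*radius+1)^2 neighborhood, clipped at the image border.
            for ny in range(max(0, y - radius), min(h, y + radius + 1)):
                for nx in range(max(0, x - radius), min(w, x + radius + 1)):
                    total += src[ny][nx]
                    count += 1
            img[y][x] = total // count


def anonymize(img: Image, detections: List[Box], radius: int = 2) -> Image:
    """Blur every detected box (e.g. from a hypothetical face/plate detector).

    Returns a new image; the input image is left unchanged.
    """
    out = [row[:] for row in img]
    for box in detections:
        box_blur_region(out, box, radius)
    return out


# Usage: a single bright pixel stands in for an identifying feature.
frame = [[0] * 6 for _ in range(6)]
frame[2][2] = 255
blurred = anonymize(frame, [(1, 1, 4, 4)], radius=1)
print(blurred[2][2])  # 28
```

In a production pipeline the blur would typically run on full-color video frames with a much larger radius (or a stronger filter) so that faces and plates cannot be recovered; the small radius here only keeps the example easy to check by hand.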

Unlike manual blurring, Samasource’s PII Data Anonymizer runs without a human examining the data, which helps keep PII private. It is built on deep learning and runs within our technology platform, ensuring that customer data never leaves Samasource’s secure cloud environment.

From pilots to multi-year projects, Samasource securely trains and validates computer vision and NLP models. We work on use cases ranging from e-commerce to autonomous transportation, manufacturing, navigation, retail, AR/VR, and biotech. If your goal is to quickly build smarter AI, contact our team to discuss your training data needs.


Audrey Boguchwal

Currently a Senior Product Manager at Samasource, Audrey guides cross-functional teams to create thoughtful product solutions. She has guided teams of designers and engineers at HUGE Inc. and NBCUniversal, and monitored user analytics at the Wall Street Journal. With a BA in history from Harvard, an MA in anthropology from Columbia, and an MBA from UNC Chapel Hill KFBS, Audrey is passionate about using technology and data analytics to facilitate social impact and environmental solutions.