Better Algorithms, Better Lives: Reducing Poverty Through Training Data

Audrey Boguchwal

May 24, 2017

4 minute read

Last time, we explained how Samasource uses the Hub, our online software, to connect low-income people to digital work. Now we’ll spotlight one of our clients to show how our image annotation work can be used in machine learning to train image recognition algorithms.

Markable came to Samasource for help training their state-of-the-art image recognition technology that identifies fashion products in photos and videos. With Markable’s tech in place, viewers can click on clothing they see while watching TV shows and Markable will generate matching product results. Check out Markable’s demo to see how it works.

 

To create image recognition technology, engineers develop a machine learning program that learns to identify objects of interest from training data. For Markable, the training data is photographs of fashion products, clearly labeled so that when the algorithm encounters an unknown product, it can infer what it is based on the examples it was trained on. This process is called “supervised learning” because the algorithm is given examples structured specifically for training. A high-quality training data set for a vision algorithm consists of tens of thousands of images in which objects are outlined and labeled according to the desired classification. The accuracy of the training data matters because an algorithm trained on inconsistently labeled data cannot learn reliable patterns and will struggle to identify objects.
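To make the idea of supervised learning concrete, here is a minimal Python sketch. It is not Markable's algorithm: it uses a simple nearest-centroid classifier, and the feature numbers and the labels "heel" and "boot" are invented for illustration. A real vision model would learn its features from the annotated images themselves.

```python
from collections import defaultdict
from math import dist

# Hypothetical labeled training data: (feature vector, label).
# The numbers are made up; they stand in for whatever a vision
# model would extract from an annotated image.
TRAINING_DATA = [
    ((9.0, 1.0), "heel"),
    ((8.0, 2.0), "heel"),
    ((1.0, 9.0), "boot"),
    ((2.0, 8.0), "boot"),
]


def train(examples):
    """Average the feature vectors of each label (nearest-centroid training)."""
    grouped = defaultdict(list)
    for features, label in examples:
        grouped[label].append(features)
    return {
        label: tuple(sum(axis) / len(vectors) for axis in zip(*vectors))
        for label, vectors in grouped.items()
    }


def predict(centroids, features):
    """Return the label whose centroid is closest to the new example."""
    return min(centroids, key=lambda label: dist(centroids[label], features))


centroids = train(TRAINING_DATA)
prediction = predict(centroids, (8.5, 1.5))  # an unlabeled, unseen example
print(prediction)
```

If some of the "heel" examples were mislabeled as "boot", the two averages would drift toward each other and predictions would become less reliable, which is why labeling consistency matters.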

How do all those training data images get labeled accurately and quickly enough to get Markable’s tech to customers? That’s where Samasource comes in. As part of our image annotation service, we have a trained, scalable workforce and the right tools to label thousands of images for training data, efficiently and accurately. We balance stringent quality requirements with on-time deliveries to ensure clients like Markable can stick to their project timelines.

 


 

Samasource worked closely with Markable to understand the project’s annotation requirements and refine them for ambiguous images. Markable provided Samasource with source images, and we then trained a dedicated team of workers to draw boxes around the objects of interest and label them. For Markable, Samasource workers learned to identify every minor visible detail, such as the heel length of a shoe, to ensure exceptional data quality. Samasource’s on-site quality analysts inspect a sample of annotated images daily and ask workers to redo any images that don’t meet client standards.
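A sample-based quality check like the one described above could be sketched as follows. This is an illustration under stated assumptions, not Samasource's actual process: the function names, the 0.9 threshold, and the use of intersection over union (IoU) to compare a worker's box with a reviewer's box are all hypothetical choices.

```python
import random


def iou(box_a, box_b):
    """Intersection over union of two (x_min, y_min, x_max, y_max) boxes."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    inter_w = max(0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union else 0.0


def sample_for_review(image_ids, sample_size, seed=None):
    """Pick a random subset of the day's annotated images for inspection."""
    rng = random.Random(seed)
    return rng.sample(image_ids, min(sample_size, len(image_ids)))


def needs_redo(worker_box, reviewer_box, min_iou=0.9):
    """Flag an annotation when it overlaps too little with the reviewer's box.

    The 0.9 threshold is an assumed value, not a client standard.
    """
    return iou(worker_box, reviewer_box) < min_iou


todays_images = [f"img_{n:04d}" for n in range(500)]  # hypothetical image IDs
review_queue = sample_for_review(todays_images, sample_size=25, seed=7)
```

In practice a reviewer would also check that the label itself is correct; overlap only measures how tightly the box fits.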

Here’s what image annotation work looks like in the Hub. Agents draw bounding boxes (or tight outlines) around each object and then use a menu to add labels. In the for-presentation-only image below, a worker has labeled the shirt and pants and is labeling the boots with a multi-level menu. Samasource can set up label data entry to meet a variety of project needs: text entry, selection from a dropdown, or searching for a label in a list, to name a few.

[Image: a worker labeling clothing items with bounding boxes and a multi-level menu in the Hub]
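For readers wondering what the output of this kind of work might look like as data, here is a small Python sketch. The schema is hypothetical and is not the Hub's actual export format: the field names, pixel coordinates, file name, and label categories are assumptions chosen to mirror the shirt, pants, and boots example.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class Annotation:
    """One bounding box plus the label chosen from a multi-level menu."""
    x: int            # left edge of the box, in pixels
    y: int            # top edge of the box, in pixels
    width: int
    height: int
    label_path: list  # menu choices from broadest to most specific


# Hypothetical annotations for a single image.
annotations = [
    Annotation(120, 40, 200, 180, ["tops", "shirt"]),
    Annotation(130, 220, 180, 260, ["bottoms", "pants"]),
    Annotation(140, 480, 160, 90, ["shoes", "boots"]),
]

record = {
    "image": "example.jpg",  # placeholder file name
    "annotations": [asdict(a) for a in annotations],
}

encoded = json.dumps(record, indent=2)
print(encoded)
```

Storing the label as a path keeps the menu hierarchy intact, so a client could train on broad categories such as "shoes" or on specific ones such as "boots" from the same data.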

With Samasource’s workers, Markable has been able to improve their algorithm and better serve their customers. They wrote to us: “With the help of Samasource's annotation, we were able to surpass previous state-of-the-art accuracy on the largest open-source fashion e-commerce dataset. We had a great experience in working with Samasource and we will continue working with them in future.”

Samasource’s client relationships are a win-win: clients provide our workers with life-changing job opportunities and workers help clients achieve their business goals, even on the trickiest projects.

In our third and final post, we’ll show you how Samasource’s workers take on a web research project.

Audrey Boguchwal

Currently a Senior Product Manager at Samasource, Audrey guides cross-functional teams to create thoughtful product solutions. She has guided teams of designers and engineers at HUGE Inc. and NBCUniversal, and monitored user analytics at the Wall Street Journal. With a BA in history from Harvard, an MA in anthropology from Columbia and an MBA from UNC Chapel Hill KFBS, Audrey is passionate about using technology and data analytics to facilitate social impact and environmental solutions.