June 6, 2019
Last week at the Autonomous Vehicle Technology Expo, Kirk Boydston, Training Data Specialist at Samasource, asked attendees whether they needed more data to bring their machine learning models even close to 100% accuracy.
The answer was a unanimous yes, and the best practices Kirk shared emphasized the importance of a solid training data strategy for moving toward level 4 autonomous driving.
Here are five considerations to move your machine learning model down the training data continuum.
The feedback gained from a proof-of-concept (POC) model provides the necessary insight to estimate how much data is needed to achieve level 4 maturity. Don’t wait until you have the “right” amount of data, or an all-encompassing data content scope.
Let the results of your POC drive the need for more data, and refine, expand and improve your dataset as previously unknown use cases and edge cases surface.
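One way to turn POC feedback into a data estimate is to fit a learning curve. The sketch below is an illustration, not Samasource's method: it assumes validation error falls roughly as a power law in the number of labeled images (error ≈ a * n**(-b)), fits that curve on a log-log scale from a few POC runs, and inverts it for a target error. The function names are hypothetical, and extrapolating far beyond the dataset sizes you actually measured is risky, so treat the result as a rough planning number.

```python
import math
from typing import Sequence, Tuple


def fit_power_law(sizes: Sequence[int], errors: Sequence[float]) -> Tuple[float, float]:
    """Fit error ~= a * n**(-b) by least squares on log(n) vs log(error).

    Assumption: the error curve is well described by a power law.
    Returns (a, b).
    """
    if len(set(sizes)) < 2:
        raise ValueError("need at least two distinct dataset sizes")
    xs = [math.log(n) for n in sizes]
    ys = [math.log(e) for e in errors]
    k = len(xs)
    mean_x = sum(xs) / k
    mean_y = sum(ys) / k
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx                      # slope of the log-log line, equals -b
    intercept = mean_y - slope * mean_x    # intercept, equals log(a)
    return math.exp(intercept), -slope


def images_needed(a: float, b: float, target_error: float) -> int:
    """Invert error = a * n**(-b) to estimate n for a target error."""
    return math.ceil((a / target_error) ** (1.0 / b))


# Hypothetical POC measurements: (training-set size, validation error).
poc_sizes = [1000, 2000, 4000, 8000]
poc_errors = [0.063, 0.045, 0.032, 0.022]
a, b = fit_power_law(poc_sizes, poc_errors)
print(f"rough estimate: {images_needed(a, b, 0.01)} images for 1% error")
```

If new POC rounds land well off the fitted curve, that is itself useful feedback: the power-law assumption may not hold for your data, or a new class of edge cases has surfaced.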
Edge cases should not be an afterthought but a key component of your training data strategy. The same object may go by many different names. For example, is it a scooter? A small moped? Or perhaps a mini motorbike?
Define clear classification rules for objects whose class is subjective or highly specific, using objective criteria to decide which class each one belongs to. These training data rules will help you cover edge cases effectively.
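Rules like these can be written down as an ordered checklist that every annotator applies the same way. The sketch below is purely illustrative: the attributes (seat, standing deck, wheel diameter), the 40 cm threshold and the class names are hypothetical, not real labeling guidelines. The first rule that matches decides the class, and anything no rule covers is routed back for guideline review rather than guessed.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


# Hypothetical attributes an annotator can judge from the image.
@dataclass(frozen=True)
class TwoWheeler:
    has_seat: bool
    has_standing_deck: bool
    wheel_diameter_cm: float


# Hypothetical threshold, for illustration only.
SMALL_WHEEL_CM = 40.0

# Ordered rules: the first predicate that returns True decides the class.
RULES: List[Tuple[str, Callable[[TwoWheeler], bool]]] = [
    ("scooter", lambda o: o.has_standing_deck and not o.has_seat),
    ("moped", lambda o: o.has_seat and o.wheel_diameter_cm < SMALL_WHEEL_CM),
    ("motorbike", lambda o: o.has_seat),
]


def classify(obs: TwoWheeler) -> str:
    """Apply the rules in order; unmatched objects go back for review."""
    for label, predicate in RULES:
        if predicate(obs):
            return label
    # No rule covers this object: flag it so the guidelines can be extended.
    return "needs_review"


print(classify(TwoWheeler(has_seat=True, has_standing_deck=False, wheel_diameter_cm=30.0)))
```

Keeping the rules in one ordered list makes it easy to add a new rule when an unfamiliar edge case shows up, and the "needs_review" bucket shows which cases the guidelines do not yet cover.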
Our directly managed workforce has annotated over one million images on SamaHub, and we’ve helped partners like General Motors and Volkswagen achieve high-quality data at scale with our training data expertise.
The bottom line is that your actions have an impact. Not only will an ethically minded training data strategy lead to higher-quality results, but you’ll also be making a difference in communities near and far.
In addition to reducing precision, training on biased data can create ethical, legal and safety problems. For example, if the training data is biased toward pedestrians with lighter skin, the model may identify pedestrians with darker skin less accurately.
You will almost always have some elements overrepresented and others underrepresented. Thoughtfully testing your model for bias before, after and throughout production will help move it toward maturity.
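A simple starting point for bias testing is to compare accuracy across groups on a labeled evaluation set. The minimal sketch below assumes each prediction has already been tagged with a group label (for example, a skin-tone category) and marked correct or incorrect; the function names and the gap threshold are hypothetical choices for illustration, not a complete fairness audit.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def accuracy_by_group(records: Iterable[Tuple[str, bool]]) -> Dict[str, float]:
    """Compute accuracy per group from (group, is_correct) pairs."""
    totals: Dict[str, int] = defaultdict(int)
    correct: Dict[str, int] = defaultdict(int)
    for group, is_correct in records:
        totals[group] += 1
        correct[group] += int(is_correct)
    return {group: correct[group] / totals[group] for group in totals}


def flag_gaps(per_group: Dict[str, float], max_gap: float) -> List[str]:
    """Return groups whose accuracy trails the best group by more than max_gap."""
    best = max(per_group.values())
    return sorted(g for g, acc in per_group.items() if best - acc > max_gap)


# Hypothetical evaluation results for two pedestrian groups.
results = [("group_a", True)] * 9 + [("group_a", False)] + \
          [("group_b", True)] * 7 + [("group_b", False)] * 3
per_group = accuracy_by_group(results)
print(per_group, flag_gaps(per_group, max_gap=0.05))
```

Running the same check on every new model version makes a shrinking (or growing) gap visible over time, and a flagged group points to where more representative data is needed.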
Model training is never done. The world around you changes every day: new cars, new fashions and other changes require newly sourced and labeled training data. According to the McKinsey Global Institute, one in three AI systems requires a model refresh at least monthly, and sometimes daily. Where your model is weak, treat the failure like a bug that needs to be fixed, and keep evolving your model.
Achieving 100% accuracy can feel a lot like approaching the speed of light. While 100% accuracy isn’t necessary for every algorithm (chatbots, for example), level 4 maturity is the goal automotive OEMs are striving for.
Kirk’s presentation, “Warp ‘Driving’ Approaching AI’s Speed of Light,” urged attendees to be relentless in seeking out weak scenarios to train with new data.
His key takeaway was that training and validation for machine learning may require different precision thresholds and data volumes depending on the model’s stage of maturity. If you want to get your AV machine learning model toward 100% accuracy, success will come from being quality-focused and iterative.
From self-driving cars to smart hardware, Samasource fuels AI. Founded over a decade ago, we’re experts in image, video and sensor data annotation and validation for machine learning algorithms in industries including automotive, navigation, AR/VR, biotech, agriculture, manufacturing, and e-commerce. Our staff are driven by a mission to expand opportunity for low-income people through the digital economy, and our social business model has helped over 50,000 people lift themselves out of poverty.