December 19, 2019
McKinsey Global Institute reported that among the top 5 limitations to adopting AI, two common challenges are labeling training data and obtaining datasets.
In this interview, Frédéric Ratle, Head of Artificial Intelligence at Samasource, answers frequently asked questions about AI and machine learning.
Ratle also shares AI trends you can expect to see in 2020, how Samasource helps enterprise organizations overcome training data challenges, and his thoughts on how AI is shaping the world around us.
(00:04) Hello, and welcome to the Ask Me Anything series by Samasource, where we interview subject matter experts working in artificial intelligence. I'm your host, Sharon L. Hadden, an AI enthusiast and content marketing manager at Samasource.
Frédéric Ratle is the Head of Artificial Intelligence at Samasource. He brings 15+ years of R&D experience in machine learning, AI, NLP and computer vision to his role at the company. Frédéric holds a PhD in machine learning, has published numerous research papers and has experience bringing products to market across multiple industries, including healthcare, automotive, consumer electronics and retail.
(00:56) Hi Frédéric. Thanks for joining me today. A little bit about my background: I worked at NVIDIA for a number of years, and it was my first intro to machine learning, deep learning, and artificial intelligence. The number one question we were always asked was, what's the difference between machine learning and artificial intelligence?
(01:21) It's a really good question. Well, machine learning specifically is concerned with algorithms that can efficiently learn from data. For example, building a classifier that can learn to distinguish, let's say, lion images from tiger images based on a set of labeled images is a typical machine learning problem. But it's a subset of AI, because AI also includes approaches that aren't data-driven, which I refer to as symbolic approaches. For example, rule-based systems, or knowledge-based or ontology-based systems.
Those are AI, but they're not specifically machine learning. That said, the distinction has become a little bit blurred, so we tend to call everything AI. Another difference is that AI implies some kind of goal of mimicking human intelligence, while machine learning is clearly in engineering territory and really aimed at building data-driven decision-making systems.
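To make the distinction above concrete, here is a minimal sketch in Python. It is an illustration only, not something discussed in the interview: the two numeric features (`stripe_score`, `mane_score`), the toy data, and the nearest-centroid method are all assumptions chosen for brevity. The first function is a hand-written rule (a symbolic approach), while the second pair of functions learns its decision from labeled examples (a machine learning approach).

```python
from statistics import mean

# Hypothetical toy features per image: (stripe_score, mane_score), each in [0, 1].
# Real systems would learn from pixels; these numbers are made up for illustration.
LABELED = [
    ((0.9, 0.1), "tiger"),
    ((0.8, 0.2), "tiger"),
    ((0.1, 0.9), "lion"),
    ((0.2, 0.7), "lion"),
]

def rule_based(features):
    """Symbolic approach: a hand-written rule, no training data involved."""
    stripe_score, _mane_score = features
    return "tiger" if stripe_score > 0.5 else "lion"

def train_centroids(examples):
    """Machine learning approach: estimate one mean point per label from data."""
    by_label = {}
    for feats, label in examples:
        by_label.setdefault(label, []).append(feats)
    return {
        label: tuple(mean(dim) for dim in zip(*points))
        for label, points in by_label.items()
    }

def predict(centroids, features):
    """Assign the label whose centroid is closest (squared Euclidean distance)."""
    def dist(centroid):
        return sum((a - b) ** 2 for a, b in zip(centroid, features))
    return min(centroids, key=lambda label: dist(centroids[label]))

centroids = train_centroids(LABELED)
print(rule_based((0.85, 0.15)), predict(centroids, (0.15, 0.8)))
```

Changing the behavior of `rule_based` means editing the rule by hand, whereas changing the behavior of `predict` means supplying different labeled examples.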
(02:18) That's a great way to put it, Frédéric, and where does deep learning fit into the picture of all of this?
(02:26) Right, so deep learning is a part of machine learning. It's a class of models in machine learning, so it's also concerned with algorithms that can learn from data. But the difference from other kinds of models is that it's specifically concerned with so-called deep architectures. These are models that stack multiple layers of representation, which is believed to let the model learn more meaningful features than traditional shallow models such as support vector machines.
It's a field that has existed for a while actually. People have been trying to use deep neural networks since the 80s, I think, but it only started gaining some traction about 10 years ago, mostly because researchers have found mathematical tricks to optimize those models in an end-to-end way. And also thanks to the increasing availability of computational power, so really cloud computing.
(03:26) Anytime I'm doing research around the history of AI, it's remarkable to see how long we've been using it, but have only just arrived at being able to maximize the technology. Let's talk a little bit about AI versus general artificial intelligence.
(03:46) What we describe as general artificial intelligence is sometimes also called strong AI. It's the idea that a machine could be trained and would then be able to learn any task that a human can learn.
It's more of an academic notion really, and I believe we're very far from that because we lack a lot of scientific knowledge about many mechanisms that underlie human intelligence and reasoning. The other kind of AI that you mentioned is the one that's been most successful, in my view. And people in academia typically refer to it as narrow AI.
It's the ability of machines to mimic a very specific cognitive ability that's normally associated with humans, like speech for example, in a way that is useful to us. This is mostly on the engineering side, and most AI work in industry falls into that category. That being said, strong versus narrow AI, or AI versus general AI, is really more of an academic debate in my view.
There's an article that was written last year by Professor Michael Jordan from Berkeley, which I really liked, where he presented a framework to categorize different kinds of AI work in a way that is meaningful. I really encourage our listeners to take a look at it. It's called "Artificial Intelligence—The Revolution Hasn't Happened Yet." He outlines three categories. The first is what he calls human-imitative AI, which is mostly an academic field of work where people try to build an intelligence that somewhat resembles that of humans.
The second category is what he calls intelligence augmentation. This is really an engineering domain that aims at augmenting human intelligence with things like web search and machine translation, and these are things that really changed the way we interact with information.
And the third category is called intelligent infrastructure. So everything around internet of things, sensors—these are basically systems that capture information about the world and try to make intelligent decisions based on that.
(06:08) That really sounds like a crash course on AI, the way you've described it. I would love to know if you have any examples of technology that isn't artificial intelligence. When you were describing the difference between machine learning and AI, you talked a little bit about rule-based systems, and I think often in movies and media, anything smart is depicted as AI. Could you share a few examples of technology that isn't AI? Maybe some things that are even commonly mistaken for AI?
(06:49) Actually, I don't really want to single out a particular field because I think in every field there is room for research and development around something that is AI. I do want to point out that many fields of research and many fields of engineering are now called AI because as you say, the media talks a lot about it.
Even practitioners in those domains are starting to talk about AI, but those domains often have a history of their own and, most importantly, they have challenges and goals of their own. Think, for example, of fields like data mining, operations research, control theory, and even some parts of statistics.
I don't think it's necessarily useful to call them AI in that sense, because we can easily lose sight of what the goal of those disciplines is. For example, if you look at operations research, its goal really is to solve some very specific optimization problems in business. But if you call it AI, it kind of blurs the notion of what the goal of that field is.
(07:50) Well, thanks. Thanks for laying that out, Frédéric. I'd love to talk more about your work at Samasource, specific to SamaHub, our training data annotation platform.
(08:02) Humans are much better than machines at recognizing and judging complex situations and use cases. SamaHub is a human-in-the-loop labeling platform where our customers can upload their data and receive annotations. We really want to make this platform more intuitive and more helpful to AI practitioners, in terms of tools that are available to slice and dice data sets, for example, to sample data sets for labeling. Because sometimes you have huge videos, but it's not exactly clear if you need to label the whole of it or whether you need to apply some smart way of just picking the frames that are important.
We support many types of annotation formats within that platform, but there are many features that are in the making, in my team, that will soon make their way into production.
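As an illustration of the frame-sampling idea mentioned above, here is a minimal Python sketch. It does not describe how SamaHub works; the function names, the flat-list frame representation, and the `threshold` value are all assumptions made for this example. The sketch keeps a frame for labeling only when it differs enough from the last frame that was kept.

```python
def mean_abs_diff(frame_a, frame_b):
    """Average absolute pixel difference between two equally sized frames."""
    return sum(abs(a - b) for a, b in zip(frame_a, frame_b)) / len(frame_a)

def select_key_frames(frames, threshold=10.0):
    """Return indices of frames worth labeling (hypothetical heuristic).

    The first frame is always kept. Each later frame is kept only if it
    differs from the most recently kept frame by more than `threshold`.
    Frames are assumed to be flat lists of grayscale pixel values.
    """
    if not frames:
        return []
    kept = [0]
    for index in range(1, len(frames)):
        if mean_abs_diff(frames[index], frames[kept[-1]]) > threshold:
            kept.append(index)
    return kept

# Toy "video": five tiny 4-pixel frames with two abrupt scene changes.
video = [
    [0, 0, 0, 0],
    [1, 0, 0, 1],
    [50, 50, 50, 50],
    [52, 49, 50, 51],
    [200, 200, 200, 200],
]
print(select_key_frames(video))  # prints [0, 2, 4]
```

Near-duplicate frames are skipped, so annotators would see three frames instead of five in this toy case. A production system would likely use a more robust similarity measure than raw pixel differences.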
(08:57) Thanks for really pointing out that human-to-machine interaction. I think it often gets lost that humans help train AI. So thanks so much for pointing that out. Of the features that you're working on within SamaHub, how do you see that technology benefiting our customers at Samasource?
(09:21) So things like object detection and image segmentation in computer vision should really be an integral part of a data labeling solution. I think what's really at the top of our team's list, in terms of priority, is better annotation quality and better efficiency. That's really what we're looking to achieve. I believe customers can benefit from even better quality of our in-house labeling and also from our ability to take on larger sets of data in a manner that is efficient and scalable. It's about being able to scale while also preserving the quality.
(10:06) In terms of limitations around AI, I read so much from McKinsey Global Institute, and I think it was last year that they shared a report on the top five limitations to adopting AI. Two of those common challenges are labeling training data and obtaining datasets. How can Samasource help with this?
(10:28) Of course, Samasource, as a training data provider, can help with this aspect. More profoundly, as machine learning and AI gain importance in technology, and also generally in our lives, so will the importance of those data sets and, more specifically, how those data sets are gathered, if you will.
So I think that beyond the core labeling services, providing expertise on data acquisition and data labeling not only in technical terms, but also from a social and ethical perspective is really essential and will be increasingly important as society evolves, and also as regulation evolves.
(11:13) Well, in your opinion, Frédéric, how is AI shaping the world around us?
(11:20) Oh wow. That's a really good and open question. I have a few ideas about it. Of course, this is only a subset of all of the ways that I think the world will be shaped. But I think it's shaping the world in many ways, and whether those ways are positive or negative really depends on how we make use of those advances.
So first, from an engineering perspective, I think it's really pushing the boundary of many fields that we thought were more the realm of humans. For example, if we think of speech recognition, machine translation, and also computer vision, advances in AI have brought those systems very close to human capability.
Maybe 10 or 15 years ago, that technology really wasn't there yet, and using these systems still felt cumbersome. So in that sense, I think it's changing the way we interact with technology and the way we interact with information in general. From a social perspective, I think it's pushing a lot of automation and decision making, in ways that can be both good and bad.
I think it can be very good because machines are much better than humans at making very large scale inferences and taking a lot of factors into account when making a decision, leading to, you know, high quality decision systems. It can also be very bad, because if we give too much control to these systems without any human oversight, it can be dangerous. And the nature of the data used to train these models will ultimately determine their performance.
Also, if you look at a system that's working in a given context and that's been trained in a given context, it may have unpredictable behavior if we slightly change the context. So that can also be very dangerous.
And finally, I think it's of course changing the dynamics of the job market. The impact that will have on society is, I guess, in the realm of politics more than technology, but you know, how do we make the greatest number of people benefit from the advances that are provided by AI?
(13:40) Frédéric, 2020 is just around the corner and lots of reports are coming out on the state of AI. Are there any trends that you want to call out that we could expect to see in 2020 regarding AI?
(13:58) There are many exciting fields right now that are progressing very, very quickly in AI, but I'd like to call out three things.
I think in the last one or two years there's been a lot of progress in natural language processing. I think what happened in vision, you know, around 2015, 2016 is happening now in NLP because the accuracy of models is increasing quickly. So I think we can expect some more progress in that area in general.
Of course human communication is not only based on words. I think to reach a really human level of understanding of semantics, we'll have to integrate other modalities like nonverbal communication, visual cues, etc. But there's still room for progress, just based on text.
The second thing I find really interesting is causality, the study of cause and effect between different covariates. Explainability, the ability to explain why a model is making the prediction it's making, will also keep growing as a topic of interest. It's actually growing very fast right now. On another level, I think another really interesting area is low-resource machine learning.
So models that can function with energy constraints or size constraints. As we grow more conscious of the energy and environmental impact of machine learning (there have been a couple of articles on this recently in various newspapers), models that require less computing will become more popular.
This is overall a pretty positive trend, I think. Not only because it's more environmentally friendly, but also because I think it puts less of an advantage on very large players in the industry that have unlimited access to computing power.
(16:01) For sure. So, are we talking at all about AI at the edge?
(16:08) Yeah, we're seeing that a lot. On a lot of devices you can see that you have models that you run on the platform, and I think that's something that will only grow because there will also be more regulations pertaining to data exchange.
(16:24) For sure. For sure. Well, is there anything else that stands out to you regarding the state of AI or any challenges with AI adoption?
(16:34) I think there are a couple of things, but just to call out two of them: you mentioned earlier the data labeling aspect that is difficult for organizations, but I think it goes deeper than that.
I think even the raw data really is an issue, because very often in various organizations, depending on their level of technical savviness, the raw data, if available at all, is distributed across different departments and in various formats, you know, Excel sheets, SQL databases, etc. So it's very difficult to actually take that data and even just do anything with it. I think that's a very big challenge that organizations really need to tackle.
The second aspect I wanted to call out is the increasing debate on the use of data in many applications where massive data is collected from users. I think it's a very welcome debate, because we're all aware that regulation has somewhat been lagging behind in that respect, and it has led to a number of problematic situations. But I think it will be very interesting to see how the technologies will be impacted by all of the new regulations with respect to data ownership and privacy.
(17:50) It's been incredibly eye opening talking with you today, and I think my last question for you is just, what do you love most about working in AI?
(18:02) Most definitely I think it's the ability to constantly work on new problems, and also the ability to really work with really smart and talented people all the time. I also enjoy really shaping a whole new engineering discipline. I think that's very exciting.
This interview is the second installment of a new audio blogging series titled, "Ask Me Anything," where Samasource interviews subject matter experts working in artificial intelligence.
Sharon is the Content Marketing Manager at Samasource, where she's responsible for telling the story behind the company's impact sourcing mission and human-powered training data solutions. Sharon holds an MS in Integrated Marketing Communications and is passionate about helping social enterprises transform abstract concepts into results-driven marketing.