Join in as Cindy Moehring talks with Wilson Pang about designing responsible AI systems from conception to deployment in a world where technology moves incredibly fast. With an expansive background at CTrip, eBay, and IBM, Pang is now the Chief Technology Officer at Appen, a leading technology platform that powers other companies' AI globally. Wilson is also the co-author of Real World AI, with Alyssa Simpson Rochwerger.
Learn more about the Business Integrity Leadership Initiative by visiting our website at https://walton.uark.edu/business-integrity/
Hi, everyone. I'm Cindy Moehring, the founder and Executive Chair of the Business Integrity Leadership Initiative at the Sam M. Walton College of Business, and this is TheBIS, the Business Integrity School podcast. Here we talk about applying ethics, integrity, and courageous leadership in business, education, and, most importantly, your life today. I've had nearly 30 years of real-world experience as a senior executive, so if you're looking for practical tips from a business pro who's been there, then this is the podcast for you. Welcome. Let's get started.
Cindy Moehring: Hi, everybody, and welcome back to another episode of TheBIS, the Business Integrity School. I'm Cindy Moehring, the founder and executive chair, and we have a very special guest with us today. Remember, the topic for this season is being responsibly tech savvy; it's all about tech ethics. And we have a CTO with us today, Wilson Pang. Hi, Wilson.
Wilson Pang: Hey Cindy. Thank you for having me here.
Cindy Moehring: You bet. Let me tell you a little bit about Wilson. Not only is he a CTO with quite an illustrious background, he is also the co-author of our book of the semester, Real World AI. Wilson is the Chief Technology Officer at a company known as Appen, which has over 20 years of experience providing high-quality training data, with a leading technology platform, managed services, and a global team. They help other companies power their AI globally. Wilson himself has over 19 years of experience in software engineering and data science. Before he joined Appen, he was the Chief Data Officer at CTrip, the second largest online travel agency in the world. He was also a Senior Director of Engineering at eBay and a Tech Solution Architect at IBM. Lots of really great experience, Wilson, that I think probably prepared you for the CTO role you have today. So congratulations, and thank you for being here with us.
Wilson Pang: Thank you, Cindy. I really appreciate you having me here to share a different perspective and talk to all these future business leaders.
Cindy Moehring: Yeah, I agree. Why don't you, if you don't mind... I'd love the audience to get to know the guests a little bit at the beginning. I mean, I can read the bio, but that doesn't really let them get to know you. So can you tell us a little bit about your personal journey to where you are today, as CTO of Appen? How did you end up there? How did you know that working in the AI space was something you even wanted to do?
Wilson Pang: Sure, sure. I've been in the tech industry for more than 20 years now, and the last 12 to 13 years have been all about data and machine learning. I'm the CTO of Appen now, and Cindy, you've already shared that the company provides high-quality training data to support other companies building AI. That gives me the opportunity to observe all kinds of different AI applications, not just for one company, but for many companies and many industries.
Cindy Moehring: Yeah, that's a bird's-eye view of a really important topic, when you think about it. I mean, you're seeing AI for a bunch of different companies across all different industries.
Wilson Pang: Yes, yeah. Before that, when I was with CTrip, I was basically doing all kinds of AI for the travel industry: really a deep dive into one industry, seeing how AI and data can help there. And going back further, my journey at eBay had multiple parts. I basically started in payments, then moved to search science, which was the first time I got exposed to AI and machine learning. AI was not even a buzzword at that time; that's probably 12 to 13 years ago, right? Then I also got the opportunity to lead some horizontal efforts, building the data solutions that supported the whole company across different functions, like marketing, finance, and product. When I look back at my career, I find myself super lucky for a few things. One is that I got to work on real AI applications a long time ago, like search science. AI was not hot at that time, but I was lucky enough to go really deep and see how AI works in one domain. Then I was also very lucky to be able to lead those horizontal efforts supporting applications from different functions within one company. And now at Appen, I can see, like you said Cindy, the broader view of how AI is working across different industries. Those gave me a lot of different perspectives. I saw a lot of successes and failures, either through my firsthand experience or through just observing how others do AI.
Cindy Moehring: Yeah.
Wilson Pang: That gave me a very deep vertical view of how AI works, as well as a broader view of how AI is used in different places. I think those have been super helpful for my career.
Cindy Moehring: Yeah, I would bet. So, you are an engineer and data scientist, and I just have to ask you, to start out, since we're talking about responsible tech and tech ethics: is it natural, do you think, for engineers and data scientists to think holistically about AI in this way, to think about the ethics side of it? Or is that something that's kind of a learned skill, like learning to ride a bike?
Wilson Pang: Yeah, that's a great question, Cindy. It's actually not at all natural for data scientists to think holistically. Here's why. If you look at how data scientists are trained in college, they basically learn the math behind the AI models: they learn how to tune the parameters of a model, how to really make the model work, all the technology and theory behind the AI.
Cindy Moehring: Got it.
Wilson Pang: No one was really talking about ethics at that time. Then, out of college, they move into industry and start working on real-world AI problems, right? There, with their peers, they spend a lot of time on data, and they now care about more than just precision, recall, and all the theoretical metrics: what's the user conversion rate? What's the click-through rate? What's the user engagement? They're trying to use AI to solve a real business problem and drive business growth. That's the whole focus. Only in the last few years have more and more people started to realize the potential damage AI can bring if it's not done in an ethical way, and to ask how to bring all those different perspectives in. Even now, it's still a pretty recent topic, and it's hard to get it right.
Cindy Moehring: I think you're right.
Wilson Pang: Yeah. How to really measure AI ethics, how to bring different perspectives into the AI team, how to get the data right to remove the bias, how to treat fairly the folks who are affected by the AI, how to protect user privacy: those are all the facets we need to consider. But we are still at the earliest stage. The good news is that more and more people care about AI ethics, and we are all working together to solve the problem. Cindy, your program is just another great example, right? I'm super excited to see you bring tech ethics and those perspectives to our future business leaders.
Cindy Moehring: Yeah, yeah. Well, as I've discovered, and as came out in the book Real World AI, which was written, so interestingly, by you as an engineer together with a non-technical person, Alyssa, who's a product manager: bringing those two perspectives together produced what I would call a plain-English guide for really anyone, whether they're a data scientist, a software engineer, a product manager, or, lo and behold, in marketing. Everybody has something to learn, I think, from the approach you took in that book. So let me ask you: what do you think is the best way for data scientists and engineers in particular to hone their skills in this area? Does it come through practice, or is it more training that needs to be done?
Wilson Pang: I think the number one thing is really to raise awareness, and programs like yours and books all help with that. AI today is penetrating almost every industry, and it's impacting almost every piece of our lives and society, right? It's really hard for me to think of any area where there's no AI. So basically, a lot of human decisions are now being replaced by AI decisions. If ethics is not an important factor to consider there, the damage can be huge. Once the awareness is there, I think data scientists and engineers can be equipped with the different methodologies and tools to measure AI ethics and to help them build responsible AI the correct way. And of course, I think you hit a very important point: training is probably just one small part. They have to practice it in their day-to-day work, again and again.
Cindy Moehring: Yeah, and you're right, because it changes all the time, and there are new issues, new questions that need to be asked. And, you know, it would seem to me that sometimes, unfortunately, we all learn from the mistakes that others make. I think particularly in the tech field there's this desire to move quickly so you can stay ahead of the competition a bit, which may have caused some of these deeper questions, not "can we" but "should we," to not be brought to the forefront quite as much. What do you think are some of the main risk areas for the deployment of AI, where maybe we did have to learn through some bad examples? What comes to mind for you?
Wilson Pang: Oh, there are quite a few risk areas; let me walk through a few. The first one is really AI potentially using users' private data. For example, facial recognition, right? This is a very classic AI example: to train a facial recognition AI model, you have to use users' picture data. But when you train the model, do you have the rights to that data? Do you have the users' consent? Those questions all have to be answered before you can deploy a facial recognition model. So the first category is really AI models that use private data. The second category is AI in highly regulated domains. There are different laws in different countries. For example, in the US, education, public accommodation (like hotels), housing, and employment are areas protected by law; you cannot discriminate there based on gender, age, religion, et cetera. Those are protected human rights. And now you can see a lot of new AI models making decisions in those domains, and we have to make sure those AI models don't discriminate. So that's the second category. Last but not least, there's also a lot of AI built on uncontrolled data. You have probably heard of the so-called large language models, like GPT-3, which is super powerful. You can use the model to write an article, and you can't tell whether it's from AI or from a person; you can use the model to find answers for you from the internet; you can even use the model to write a program to build a website for you. Super powerful. How is that kind of model trained? They use almost all the text from the internet they can access. Meanwhile, you can imagine that all that text from the internet carries a lot of bias too, right?
Cindy Moehring: Right, written by humans. And we all have implicit biases. Yeah.
Wilson Pang: Yeah. And those biases get learned into the model. So you have to be aware that with uncontrolled data there is bias, and think about how you can remove it. Being aware of the bias your AI can pick up, and putting measurements in place in those areas, is super important.
Cindy Moehring: Yeah, it is. So let's now get down to brass tacks, if you will, and get really practical now that we've identified some of those risk areas. I'm just going to ask you: where do you think the responsible AI journey should begin for a company? Obviously it begins with design, but let's talk about that for a minute. What should the design phase really look like? Who should be part of the team, and all of that? How does a company get started?
Wilson Pang: Yeah, that's a great question. I think a lot of people have the wrong impression that an AI team consists only of data scientists.
Cindy Moehring: Yeah, I know.
Wilson Pang: Like PhDs with deep knowledge of machine learning. But that's not true. In reality, I think the team needs people with different perspectives. Normally you need people who really understand the business problem; AI is only useful as a way to solve a problem there, right? That can be a product manager, or an SME in the area. Then you need the data scientists, who can model the business problem as an AI problem. Then you need data engineers to help you process the data, and other software engineers to help you deploy the model to production and build a service on top of it. All those people, especially the product manager, the SME, and the data scientists, should really keep AI ethics in mind; otherwise that will cause problems, right? And meanwhile, for the risk areas we discussed earlier, if those areas are involved when you build the AI model, then normally, I think, the legal team should also be part of the discussion in the design phase.
Cindy Moehring: They should be, right? And probably HR, if you're talking about any of those kinds of risk areas involving personal information about employees. So we talked about the diversity of the team in terms of different perspectives; I think obviously diversity in terms of the individuals around the table would also be important. But what kinds of questions, specifically, do you think should be asked in the design phase by the team to make sure that responsible and ethical AI is top of mind?
Wilson Pang: Cindy, I really like the way you're thinking here. Asking the right questions is always the first, super important step toward getting the result you want, right? Here are some questions I always ask when we touch any AI product. Who are the potential users of this AI product? Does the AI product perform the same way for different groups of users? Do we have a good way to measure not only performance, but also fairness? And for certain errors, do we have a plan if it doesn't perform as expected? Do we have a safety net, or a backup plan? Can we switch the AI off, or use a different approach? I think those are the key questions to ask in the design phase.
Cindy Moehring: Okay. All right. So it sounds like a lot of effort needs to be put into design. I'm not sure all companies have always spent enough time there, because there's this rush, as we all know, to get things done quickly. But if you ask the right questions at the beginning, in the right way, hopefully you can get it right in design. And then after design, I think, would come modeling, so you essentially get a prototype put together. But what happens in that phase? We know it requires data, and we know that with the data that gets fed into the model, we need to again consider ethics. But we've already talked about the fact that we all have implicit bias, right? And you mentioned the bias is going to be in the data itself. So it just seems like a conundrum you can't resolve. How do you build a model, and ask the right questions, to ensure that the model you're building isn't biased, or at least isn't exacerbating bad biases? How do you make it better?
Wilson Pang: Mm hmm. You're absolutely right: getting the data right is the most important step for any real-world AI model. We all know "garbage in, garbage out," right? And likewise, if the data has bias, the model has bias. There are two major categories of questions to ask when it comes to data. The first group is really about the data itself. Do you have the right data? Do you have fair representation of the different classes in the data? Let me give you a real example to bring this to life. Say I want to build a very simple AI model to classify tweets: is this tweet positive or negative? Simple model. I get 1,000 tweets as my training data, then I look at the data: 900 are from male users, 100 are from female users. Clearly you don't have enough representation of female tweets, right? You need to fix that. So let's fix that: now I have 500 from male users and 500 from female users. Then I get people to label those tweets positive or negative. For the 500 from male users, say 400 are positive and 100 are negative; clearly the representation of negatives is not enough. It's imbalanced. Those are examples where you can see the data is wrong, and wrong data creates problems for your model and introduces bias and fairness issues. So for the first group, to get the data right, you need to look at class imbalance, label imbalance, and all of that. That's important. The second group is really about how you gather the data and how you use the data, not the data itself. You need to ask: how was the data collected? Do I have consent from the users who provided the data? Does it hurt their privacy? And also, if you look at the AI ecosystem, it's not just data scientists and vendors, right? There's also a group of people who are labeling the data and collecting the data. Those people normally don't get paid as highly as data scientists. How can we make sure those people are also paid fairly? How can we make sure we also take care of those people's well-being? So that's the second group: the way you collect data and the way you use data. There's a lot to consider to make it ethical.
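To make that imbalance audit concrete, here is a minimal Python sketch of the two checks Wilson walks through, using made-up data that mirrors his numbers (the field names and counts are hypothetical, not from the episode or the book):

```python
from collections import Counter

# Hypothetical labeled tweets: (author_group, sentiment_label).
# Deliberately skewed to mirror the numbers in the example.
records = (
    [("male", "positive")] * 400 + [("male", "negative")] * 100
    + [("female", "positive")] * 300 + [("female", "negative")] * 200
)

# Check 1: class balance across groups (the 900-vs-100 problem).
group_counts = Counter(group for group, _ in records)
print("tweets per group:", dict(group_counts))

# Check 2: label balance within each group (the 400-vs-100 problem).
for group in group_counts:
    label_counts = Counter(label for g, label in records if g == group)
    print(f"labels within {group}:", dict(label_counts))
```

A real pipeline would run checks like these before training and block the run, or trigger more data collection, when any count falls badly out of balance.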
Cindy Moehring: So, one of the things you mentioned earlier was measurement. And I would imagine that once you design it and model it, you then want to do some measurement of your prototype, of your model, before you just put it into production, and that probably continues in the monitoring of the model. But how do you measure it? How do you figure out if you are measuring not just performance? How do you measure the ethical aspect of an AI model?
Wilson Pang: Yeah, I think this is essential for people to understand in order to build responsible AI. Because to me, there's no such thing as separate ethical metrics. The ethical metrics are actually the performance metrics, cut by different dimensions.
Cindy Moehring: That's interesting. Yeah.
Wilson Pang: Let me give you an example. Let's say I have a voice recognition AI model. How is the model measured? There's a typical metric called word error rate. Essentially, it asks how many words were recognized incorrectly, compared with the total. If I speak one sentence of 10 words and the engine recognizes two of them wrong, then the word error rate is 20%. That's a performance metric, right? But how do you measure the ethical part of a voice recognition engine? You probably want to measure the word error rate across different age groups, for people with different accents, people with different dialects, et cetera. There was a report published, maybe a year ago, looking at all the major voice recognition engines. Their word error rate is very low for, let's say, standard English. But for people with accents, the word error rate is pretty high. That's how you really use the performance metric: measured along different dimensions, against different groups of people. That's the way you can detect potential bias or issues.
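As a rough illustration of "the same metric, cut by dimension," here is a minimal Python sketch that computes word error rate per speaker group; the two-sentence evaluation set and the group names are invented for the example:

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance over reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # Classic dynamic-programming edit distance over words.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + sub)  # substitution
    return d[len(ref)][len(hyp)] / max(len(ref), 1)

# Hypothetical evaluation set: (speaker_group, reference, engine_output).
eval_set = [
    ("group_a", "turn on the kitchen lights", "turn on the kitchen lights"),
    ("group_b", "turn on the kitchen lights", "turn on a kitten light"),
]

by_group: dict[str, list[float]] = {}
for group, ref, hyp in eval_set:
    by_group.setdefault(group, []).append(word_error_rate(ref, hyp))

for group, rates in by_group.items():
    print(f"{group}: mean WER = {sum(rates) / len(rates):.0%}")
```

The overall metric is unchanged; the potential bias only becomes visible once the same number is reported per group.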
Cindy Moehring: Oh, wow. So you really do have to think beyond just the math behind the model you want to build. You have to look at the output and then consider all these other questions, hopefully before you put it into production. That sounds like a whole lot of cross-functional teamwork to me, going back to the design phase and the team that needs to be put together. And based on the first question I asked you, whether it's natural for engineers and data scientists to think this way, and the answer being, well, no, it really isn't, I would imagine that on such a team you've probably, in your experience, had to deal with some, call them non-adopters: people who just don't understand, aren't aware, or maybe don't really care about responsible and ethical AI. So if you're faced with a team member like that, what's your advice to others about how to bring them on board, get them to come along, or deal with that situation?
Wilson Pang: Yeah, at least in my personal experience, those cases are really rare. Most AI biases are introduced unintentionally. Once people know there's a potential bias, and the damage they can cause by not considering the AI ethics part, they will really invest to fix it, right? That's the majority of cases. I think the key is really to increase the awareness, and also to give people the tools to measure it.
Cindy Moehring: Got it, got it. And if you're leading a team like that and you sense that maybe somebody just isn't really getting it, maybe it's just a matter of making sure that you or another team member is raising the right questions, so the person can learn more by doing and see that asking these kinds of questions is a normal part of the process, something that has to be done before you can roll it out. To your point, that probably wasn't happening 10 years ago, right? But then we had some very, you know, famous mistakes, whether it was the Amazon hiring tool that got it wrong, or facial recognition, and then people sort of stepped back and thought about it a little bit. So let's talk about deployment. We've talked about design, and then you have to build your model, but at some point you do have to get to deployment, and usually that's pretty quick in today's day and age. Are you really done after you design it and put it into deployment, or is there more that has to be done after you deploy it? What I'm thinking about here in particular, I'll just give you an example, is the Apple Card that got designed, and arguably it was giving men more credit than women, even though their backgrounds could be, you know, 100% the same and equal. And from what I understand, unfortunately, when some of the calls started coming in to the call center, they weren't prepared for those customer service questions, and their answer was, "We don't know. That's the model we built. The AI decided it." So how important are explainability and transparency in this process, once you roll it out?
Wilson Pang: It really depends on the use case. A lot of times AI works like a black box, right? It works, but you don't know how it works. That's okay for certain use cases. Let's say you build an AI model to predict advertisement click-through rates; if people click more, you don't really care how the model works. But if you build a model to help doctors diagnose a disease, or, back to your example, to approve a credit card application or not, you have to be able to explain how the AI works: why someone gets approved, why someone doesn't get approved, right? Otherwise people will lose trust. I can share a good example of how I learned this the hard way. That was probably 12 or 13 years ago, when I was at eBay leading a small search science team. We built a lot of machine learning models to rank products and help people find the product they want. We built the model, the user conversion increased, and we were super happy. But then we got a phone call from our customer success team: a lot of sellers were asking them, "Why did my product used to rank within the top 10 on the search results page, and now it's on the second page, the third page?" That was a hard problem for us. We knew the model worked; we didn't know why it ranked things the way it did. Then came a few months of effort to build a tool to explain how the model works, to show people the ranking factors: maybe you need to get the title right, the picture should not be blurry, all those kinds of deciding factors. Once we showed those, it helped the customer success team help the sellers make better listings. And it sure gave me some gray hairs at the time.
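The eBay tool itself isn't public, so as a toy illustration of the idea only: a ranking score can be decomposed into per-factor contributions, so a seller can see exactly what is dragging a listing down. The factor names and weights below are invented for the sketch:

```python
# Hypothetical ranking factors and learned weights for a linear scorer.
WEIGHTS = {
    "title_quality": 2.0,
    "picture_sharpness": 1.5,
    "shipping_speed": 1.0,
    "dispute_rate": -3.0,  # buyer disputes pull the score down
}

def explain_score(listing: dict) -> None:
    """Print each factor's contribution to the listing's rank score."""
    total = 0.0
    for factor, weight in WEIGHTS.items():
        contribution = weight * listing.get(factor, 0.0)
        total += contribution
        print(f"{factor:>17}: {contribution:+.2f}")
    print(f"{'total score':>17}: {total:+.2f}")

# A listing with blurry photos: the breakdown makes the cause visible.
explain_score({
    "title_quality": 0.9,
    "picture_sharpness": 0.2,
    "shipping_speed": 0.7,
    "dispute_rate": 0.1,
})
```

Real ranking models are rarely linear, but the same principle (report contributions, not just the final score) is what turns "the AI decided" into an answer a seller can act on.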
Cindy Moehring:yeah, but but it was also going through that that exercise I would imagine also helped to show that it wasn't bias decision making, it was actually very valid factors like with a blurry picture, or you didn't have a good description. So you know, giving that kind of information when you're able to explain it and be transparent. Probably helped a lot of sellers really gain their trust back in eBay. But if your answer just simply would have been? I don't know. If you wouldn't have gone to you know, go figure it out. So you could explain it to him. I think that would have exacerbated when a lot of people feel about AI still today, which is distrust, right? And how do you think transparency helped in that situation? So let's just stay with the example that you provided, once you were able to be transparent and explain it to the sellers? How did that have that help? Did it? Did it turn the tide? Did it cause them to have trust to get an eBay and continue to work with the company?
Wilson Pang: Actually, it went even further than that. Once you have that transparency, you not only help people build trust in the AI model, you also really encourage the right behaviors. Back to the eBay example: the sellers now knew they needed high-quality pictures, they knew they needed to make sure the description and the product title were accurate, and they knew their shipping performance mattered, and that any buyer disputes would also cost them, ranking their items lower than other people's. So it in turn encouraged a lot of good behaviors. They not only trusted the AI model, but also improved their pictures, provided better service, and shipped faster, which, going back around, gave the buyers a much better experience. So this kind of transparency helps not only trust, but many other things.
Cindy Moehring: Yeah, yeah, I think you're probably right. Okay, so this is going to lead us to a really hard question. How do you square everything we just talked about, which takes time, and you've got to work cross-functionally, and, you know, any time you have a group of people it can slow things down, some would say, with this need to create minimum viable products and the need for speed? Get that technology out fast. Those two seem to be kind of incongruent ideas. So how do you square all of that?
Wilson Pang: Personally, I'm a strong believer in the MVP concept: you always ship a minimum viable product to the market quickly. I think that's the only way you can collect real user feedback, instead of spending a lot of time just guessing what users might like; in reality, user reactions can be very different from what you think. But you're right, that also brings an AI fairness problem if you just ship the product and move fast without considering the ethical part. If people read our book, right at the beginning of Real World AI we share an example from the development of IBM Watson's computer vision API. The team moved really fast: they built the model and shipped it out without even understanding the training data, what the tags were, what the different classes for those images were. That created a problem: for a picture of a person in a wheelchair, the model returned a pretty offensive tag, and that caused a big PR problem. That's an example of shipping a product fast without considering the AI ethics part, and the problems that follow. So back to MVPs and how to square this. To me, the key is the definition of "viable." It usually boils down to the performance side, right? You measure the performance. But should you also consider AI bias in your definition of "viable"? The answer is yes, especially when you build AI in the high-risk areas we discussed earlier. Please move fast, but don't move without considering AI bias.
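One way to read that advice is to fold a fairness bar into the same "viable" gate that already checks accuracy before an MVP ships. This sketch, with metrics and thresholds invented for illustration (they're not from the book), would hold a release whose per-group accuracies diverge too far:

```python
# Hypothetical per-group evaluation results for an MVP model.
metrics = {
    "group_a": {"accuracy": 0.91},
    "group_b": {"accuracy": 0.82},
}

MIN_ACCURACY = 0.80   # classic performance bar
MAX_GROUP_GAP = 0.05  # fairness bar: groups must perform similarly

def viable(results: dict) -> bool:
    """An MVP is 'viable' only if it clears both bars."""
    accuracies = [m["accuracy"] for m in results.values()]
    performance_ok = min(accuracies) >= MIN_ACCURACY
    fairness_ok = (max(accuracies) - min(accuracies)) <= MAX_GROUP_GAP
    return performance_ok and fairness_ok

# Here the nine-point gap between groups fails the fairness bar.
print("ship it" if viable(metrics) else "hold the release")
```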
Cindy Moehring:that makes a ton of sense. So back to measuring performance at a minimally viable level, right back to measuring it on different dimensions. It it all, all you have to do is change your mindset about what is considered viable, and kind of open your mind to viable does it mean just getting over the, you know, the the click through rate or the golden to True Performance methods, metrics, but you think about performance metrics and viability for those differently. Right. Yeah, and more holistically, that's a really good way to think about it and what isn't an either or, it actually is all together and is just one, I love it. Okay, well, son, I got to ask you, I know you're a CTO and you are steeped in this area. But you must have some places you go to for inspiration to learn more to continue to keep your skills relevant. And I love to leave the audience with recommendations. So where do you go? Or what would you recommend to the audience? If they want to deepen their knowledge of this area more beyond your book? Would it be? Is there a good podcast series? Is there a good documentary you could recommend another book? Anything? What do you what do you what where do you go for inspiration on this?
Wilson Pang: Yeah, AI ethics is a big topic, right? We could probably continue this kind of discussion for another full day. If the audience wants to learn more about this topic and the recent developments in the field, there are a few tech blogs I highly recommend, all from the bigger players. Google published their AI principles; if you search "Google AI principles," you can find their blog, and there's a lot of deep consideration of how to get AI right from all the different perspectives. There are similar things on the Microsoft AI blog; if you search "responsible AI Microsoft," you can find it. And the last one, which I really like a lot, is the AI blog from Facebook. They have their five pillars of responsible AI. They not only tell you why this is important and how to measure it; they even give you tools and open-source software to help you really make it real in your project. If people are interested, they can learn a ton from those three blogs.
Cindy Moehring: Those are great, and they're obviously going to be very practical: they come from the business world, from some of the big tech companies that are trying to iterate and get better themselves. So those are really great recommendations. Wilson, thank you. You've been very, very gracious with your time and your wisdom and everything you've been able to share with us, authentically, about your own journey, and even some mistakes along the way that you and others were able to learn from. I can't thank you enough for spending this time with us. I love the book Real World AI; it is a fabulous, practical read on how companies can implement AI effectively. So thank you for writing the book, and thank you for being the guest here today.
Wilson Pang: Thank you, thank you. It's really great to be here. And thank you for your awesome program helping future leaders understand the AI ethics part.
Cindy Moehring: Yeah, you're welcome. We're on this journey together. All right. Thank you.
Wilson Pang: Thank you, Cindy. Bye.