In this second half of the interview, Wendy Ku shares her experience speaking at the Women in Data Science Conference. She also explains her definition of “fairness in AI,” why diversity is important to the field of analytics, and what she enjoys about her work.
Don’t miss part one of Ku’s Q&A, where she relates her experience with Tech’s MSA program and how it prepared her for her role as a senior data scientist at Getty Images.
On your LinkedIn profile, you say that you’re “passionate about fairness in AI.” Could you explain what that means to you?
It’s about thoughtfully choosing the applications we use machine learning for and being mindful—at every step of the model’s development—that this is human data, and there are human biases going into it.
And that’s one reason why diversity, in terms of the people working on these models, is so important?
Yes, exactly. Think about it this way: All these machine-learning solutions are trained on data that real people created, so if that data carries any historical bias, for example, it goes into the model. Our role, as we’re working on these models, is to do our best to ensure training is fair and to limit how much bias goes into it.
A model can be theoretically great, but ultimately people are using it. It’s important to remember that whatever we decide the model outputs will be, it’s going to be for people to use and not just a metric.
How do you approach this on a daily basis at work?
My team has certain goals and commitments that are part of how we work on these models. One of those is making sure we consider D&I [diversity and inclusion] impact from the very beginning. At every step, we think about ways we can measure and reduce different types of bias. So even when we’re cleaning data, we’re conscious of how sampling bias and popularity bias could creep in; we’re not waiting to consider it after the model is already trained. That said, in the statistical sense, some bias simply comes from user preferences, so we try to define which biases are specifically harmful given the ML application, and manage these from the outset.
What do you love about this work? What challenges you?
I learn so much on a day-to-day basis, because the industry changes so quickly. BERT [Bidirectional Encoder Representations from Transformers], an attention-based language model, came out in 2018. When I was at Tech, NLP was all about BERT and how it beat human-level performance. But now it’s all about ChatGPT and even bigger models with trillions of parameters. My field of NLP and computer vision is a segment of data science that evolves especially quickly. As companies publish and open-source their methodologies, there’s a lot we can do to build on top of current techniques.
The work is intellectually intriguing, and having the customer impact is great, because machine learning is usually so abstract.
You were a speaker for the Women in Data Science (WiDS) Stanford 2023 Conference. What was that experience like for you?
WiDS is one of my favorite conferences—it’s a mix of people in industry and academia, and the community is so supportive. It was a great audience, which made getting up on stage easier. My presentation was also live streamed, so my parents were watching, and my coworkers were commenting!
It was a little emotional for me as well, to give that talk. Three years before that, when I attended WiDS for the first time, I was job searching and couldn’t get any interviews for an internship, much less a full-time job. All my MSA friends had internships plus full-time roles lined up, and I didn’t know what I was doing wrong. But just a few years later, there I was, working in my chosen field of computer vision, and up on the stage at WiDS.
The best part of the experience was the younger people who reached out to me. There are people in this industry I personally look up to, and then I had these undergraduate students come and talk to me about my work and about how to be confident in what they’re doing.
Your WiDS presentation can be seen on YouTube, but could you please give our readers a brief overview of what you talked about?
In my talk, I walked through the process of designing a machine learning application, from ideation through model training to evaluation. Using Getty Images’ Similar Images feature as an example, I shared how building ML solutions in industry is different from building them in academia.
Last question: What advice do you have for people who are interested in a data analytics career?
Stay curious and adaptable, and be prepared to be a quick learner. Those are the best skills, because in data science, so much growing happens on the job, regardless of how much you learn in school. The willingness to learn quickly is important because the field is moving so fast.
Wendy Ku’s WiDS talk on YouTube