AI-Powered Hearing Aids: Transforming Lives Through Tech
Andrew Song of Whisper AI discusses pioneering AI in hearing aids, improving auditory experiences, tackling industry challenges, and personalized healthcare advances.
Season 2 Ep. 14 Andrew Song of Whisper AI on building better hearing aids with AI

Speaker 1: The five senses, touch, sight, smell, taste, and hearing. They are critical to how we perceive the world around us. Without them, we are lost. And as we grow older, they tend to weaken, particularly when it comes to sight and hearing. Today's guest, Andrew Song, is the co-founder and CEO of Whisper AI, and he's on a mission to help give people great hearing, regardless of their age. A mathematical and computer science graduate from the University of Waterloo, Andrew is pioneering the practical application of modern artificial intelligence in hearing aids. Welcome to the show, Andrew. So great to have you here with us.

Speaker 2: Thanks for having me. I'm really excited to be here.

Speaker 1: Yeah, so nice to have you on. I've actually known about your company for several years now. I remember in the early days, one of your co-founders, Dwight Crowe, would actually stop by OpenAI sometimes, and we would chat about what y'all were building. And it's, I mean, of course, it's been quite the journey since because that's three, four years ago. But let's go even further back. Where did you grow up?

Speaker 2: I grew up near Toronto in Canada, in a little steel town called Hamilton. I think Hamilton, when I was growing up, I like to tell Americans, it's like the Pittsburgh of Canada. A lot of steel industry, and then, you know, maybe not so much anymore.

Speaker 1: And how, growing up there, did you get excited about technology and from there, AI, hearing aids? What was that like?

Speaker 2: You know, growing up where I grew up is almost hard to think about from where I am now, because I get to live in the Bay Area, in Silicon Valley, and I get to see a lot of new technology. Where I grew up, that wasn't really a reality for me. I was very lucky at a young age to have two influences that steered me in this direction. One of them is my mother, who worked a lot with computers and has a degree in physics. Because of what she worked on, we were able to have various computers at home that I was able to tinker with. I have all these memories, at a very young age, of our classroom getting a computer and me being the IT person, because I was the only person who had ever really worked with a computer in any substantive way, right? Beyond playing a game of Solitaire, I was the only one who had ever typed commands into MS-DOS, as someone in grade three. I think as a child that gave me a lot of confidence, an area of expertise insofar as any eight-year-old has expertise, and I wanted to learn more. A few years later, one of the big influences for me was actually Bill Gates's book The Road Ahead. I encourage a lot of people to read it, remembering that it was written in the mid-to-late 90s, and in that book Bill Gates talks a lot about what the future will look like. You can imagine, as an 8, 9, 10, 11, 12-year-old reading this, it really sounds like a science fiction novel. That was really motivating for me; that was a vision of the future I was excited about. But as I grew older, one of the things that most impressed me about that book, and one of the reasons I revisit it, was how accurate it is. It's one thing to make predictions about the future over a coffee or a beer with your friends and then say, five years later, oh, I was right, cherry-picking the ones you got right. It's really another thing to write a book saying, okay, in 10 years this is what's going to happen, here's how we're going to communicate, here's how live video streaming is going to work, and then wake up one day as a university graduate and find that like 90% of the book worked out. Bill Gates has a unique view of the world, and you can see that. So that was really inspiring.

Speaker 1: After graduation, you worked for a while at Facebook? That's right, yeah. How did you make the jump from your position at Facebook to starting your own company, and specifically to building hearing aids? What inspired you to do that?

Speaker 2: One of the things I really discovered about myself at Facebook is that I loved working on products that help people communicate or be with other people, be more social. I think that's one of the things that connects a lot of the different work I did at Facebook with hearing aids. But for me, and for my co-founders, for Dwight, as you mentioned, and our other co-founder, Shlomo, we each had our own individual stories about how hearing loss affected our families and their quality of life. One of the first things anybody who experiences hearing loss realizes, either for themselves or through a family member or friend, someone they love, is that hearing loss is actually, in some sense, very little about hearing, funnily enough. It's of course about the sound and the decline in the hearing system and all those hearing functions, but there's a reason hearing loss is connected to a higher feeling of loneliness, a higher feeling of stress, and a higher risk of dementia. It's because of those interactions that you're not able to have when you have hearing loss, and I saw that through my own grandfather. He's a really big inspiration for me; I think about him a lot. And I think what's really great about Whisper as a company is that when you talk to individual members of our team, everybody sort of has their person in mind. For some employees, it's themselves. For some, maybe it's a grandparent or a parent. For some, it's a friend they knew growing up. That connection is, for me, where I really knew that this was a great opportunity to build a better product and ultimately a great business.

Speaker 1: Yeah, I've definitely noticed it myself. If somebody has a hard time hearing, one-on-one conversations often still kind of work, because you both work at it, but in group conversations especially, they have a really hard time actively participating or even keeping track of what's going on, unless the group is really carefully paying attention to it, and sometimes people won't know. I mean, it's complicated.

Speaker 2: Yes. One of the first stories that I still hold on to, if you don't mind me sharing, I really remember from the early days of the company. We were interviewing lots of people who had experienced hearing loss, people who used hearing aids, people who didn't, just trying to understand what that experience was like, and a woman shared this story with us about how hearing loss was affecting her. She was maybe in her mid-50s. She had colorful, dyed hair, very colorful, and I remember she loved having purple hearing aids. She said the biggest impact before she got hearing aids, the moment she knew she had to do something, was when she was out at dinner with two or three of her girlfriends, and somebody told a joke, and she didn't laugh, because she couldn't hear the joke, right? And the other friend laughed, and you can imagine that awkward situation, how awkward and isolating that must make you feel. But then for her it actually went a step beyond, because the friend who told the joke didn't connect her not laughing to a hearing loss issue. She read it as a personal judgment. So that friend actually became a little bit upset at her. You can imagine, if you tell lots of jokes and one friend's not laughing, you're going to get a little bit annoyed at them. So then she found out that this friend was upset through the other people she was eating with, and now she's already embarrassed enough, her friend is upset at her, she has to go talk to her friend, and she has to talk about this hearing loss, this medical condition that she's trying not to admit she has and trying to hide. We all, especially in the modern era, have different stresses that we deal with. Imagine that on top of everything else you're doing to live life. I think that's an incredibly challenging kind of human situation to live with. And it's one I think a lot about for why our work is important.

Speaker 1: Now, you and your co-founders noticed that the hearing aids available at the time weren't good enough to give people the full experience of proper hearing. Why did you think it was possible to change that? I mean, presumably there's a big industry that's supposedly trying to make the best possible hearing aids, with a lot of money going into those efforts. Why did you think, hey, actually, we can do something different here?

Speaker 2: What's funny is that our starting point was maybe a little bit more humble. Maybe we didn't think we could, but there were a few glimmers that it was worth looking into. And that's where deep learning and neural networks and AI, whatever the term of the day is for this body of work that's taken over the world, come in. That's where you started to see, first in research and then, as we developed it more and more, in reality, what was possible. What really motivated us in 2016 was the research, the basic science, the basic academics around how neural network models, machine learning models, could be used to improve hearing. Some of that research was published. But when you talked to industry insiders, the people responsible for making technology decisions in hearing aids, the conversation would sort of be: well, that's all very cute, very cool. We have a far-out 20-year research arm that's looking at that technology, but that can never work in a hearing aid today. Sometimes they would talk about the amount of processing capability that was needed, and there's no way to put more in, you're not going to strap a laptop to somebody's head, so that's never going to work. Other people just had a lot of skepticism about the fundamental algorithms. We got into this interesting discussion once about how the height of the sound source might affect the models, things like that. OK, I'm new to this, that's interesting for us to think about and test. But at some point, when enough people tell you things aren't possible even though there's a good fundamental result underneath, you start to get a little bit skeptical. And I think that skepticism eventually turned the tide for us. We said, well, this is important, there's a reason why people need this, and there's actually a really big opportunity here. And then you go figure out whether you can make it possible.

Speaker 1: How does a regular hearing aid work? What is it? What does it do? What is the device that people put in their ears? And then from there, can you help us understand how the Whisper hearing aids work?

Speaker 2: Yeah, a regular hearing aid, not to oversimplify too much, but in many ways it's kind of like an equalizer, which some people might be familiar with. For those who aren't familiar with an equalizer, one of the ways I like to explain it is: if you used Winamp or Windows Media Player growing up, which you have to be a certain age to have done, and I understand maybe not all of the listeners have, go take a look at Winamp on Google. A hearing aid is basically a compressor, so it makes things louder but tries to keep loud things from getting too loud, plus all of the equalizer dials you saw on Winamp. On Winamp, you could say you were listening to classical music and it would change the dials a little bit, or you could change the dials yourself. There's that type of process inside a hearing aid. More advanced ones have somewhat more sophisticated noise reduction or maybe directional microphone systems. But it's fundamentally a low-power, low-compute system running very constrained signal processing algorithms. That's really what's important. And the reason that's the case is that we want the devices to be small. People want hearing aids to be discreet. You want a long battery life, because you need to wear them all day; if you think about wearing AirPods for five hours, that feels great if you're listening to music, but it doesn't even come close for a hearing aid. And you need very, very low latency, because you're getting a direct path of sound from somebody talking to you, you're seeing their mouth move, and you're getting amplification all at the same time. So the latency constraint is a lot more challenging than, say, Bluetooth headphones while watching a video, where there's no direct path of sound. So you're really constrained to those classical signal processing algorithms.

Speaker 1: So, Andrew, if I can play this back to you to make sure I understand. So a hearing aid, the starting point of a hearing aid is actually microphones. There's microphones built in there. Is that right? Yes.

Speaker 2: There's usually one or two microphones per hearing aid.

Speaker 1: And then there's a little speaker in there. So you're still hearing with the hearing aid; it's just that the sound has been transformed. It's first recorded, then passed on and transformed, with different amplification for different frequencies. Is that right? That's right.

Speaker 2: And those amplifications can change depending on the person's specific hearing loss through a process called fitting. And that's where the audiologist, a doctor might come in.

Speaker 1: Got it. So a doctor would fit it to how your ear works, which frequencies you're better at hearing and worse at hearing. And then the electronics would pre-amplify to make up for that.

Speaker 2: Exactly. Bump it up a little bit at this frequency, ease it off a little bit at that frequency, similar to what Winamp was doing to your rock music, except with a microphone and a speaker rather than recorded music, of course.
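
To make the equalizer-plus-compressor picture concrete, here is a minimal sketch of per-band amplification with a crude output ceiling, in the spirit of a fitting prescription. It is an illustration only, not Whisper's pipeline; the band edges, the half-the-loss gain rule, and the hard ceiling are assumptions made up for the example.

```python
import numpy as np

def fit_gains_db(band_edges_hz, audiogram_db):
    """Toy 'fitting': amplify each band by half the measured loss (illustrative rule only)."""
    return {edge: loss / 2.0 for edge, loss in zip(band_edges_hz, audiogram_db)}

def basic_hearing_aid(frame, sample_rate, gains_db, ceiling=0.5):
    """Apply per-band gain in the frequency domain, then a crude output ceiling
    (a real compressor is smoother, but the intent is the same: loud stays not-too-loud)."""
    spectrum = np.fft.rfft(frame)
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sample_rate)
    edges = sorted(gains_db)                             # lower edge of each band, in Hz
    for lo, hi in zip(edges, edges[1:] + [sample_rate / 2]):
        in_band = (freqs >= lo) & (freqs < hi)
        spectrum[in_band] *= 10 ** (gains_db[lo] / 20.0)  # dB boost for this band
    out = np.fft.irfft(spectrum, n=len(frame))
    return np.clip(out, -ceiling, ceiling)

# Example: a 10 ms frame at 16 kHz, boosting high frequencies more (typical of age-related loss).
sr = 16000
frame = 0.1 * np.random.randn(160)
gains = fit_gains_db([0, 500, 2000, 4000], audiogram_db=[10, 20, 40, 60])
processed = basic_hearing_aid(frame, sr, gains)
```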

Speaker 1: So that seems like a fairly basic model; it's not too hard to understand what's going on. But now you're saying that's also not enough. Why is that not enough?

Speaker 2: The first, most important reason is that the people who wear hearing aids say it's not enough. If you look at some of the biggest challenges with hearing aids, it's following complex conversations with multiple speakers in a noisy environment. And the reason the simplified model can't really help in that situation is that you don't really know what to amplify. If you're forced to make simple decisions, like make every sound between 500 hertz and, let's say, two kilohertz louder because that's where speech tends to live, you don't just make the speech louder; you're also going to make the coffee grinder louder in that basic model. No matter how you draw that box, you're going to be really constrained. And as a result, when we're talking at a restaurant and there's someone else talking in the background, not only is it going to be harder for me to hear you, it's actually going to add to my distress and my dissatisfaction. That's what you constantly hear from people who use hearing aids.

Speaker 1: So they're hearing all these things amplified that they don't want to hear amplified. They just want to hear a specific person speak, or a specific set of people, not everything else. And naive amplification amplifies independently of who is speaking. So, I mean, I can think of ways to get around this, but I'm curious, how is the Whisper hearing system getting around it?

Speaker 2: By using AI, of course. That's the simple answer, but I think there are two problems we really solved at a technical level, and then we can talk about modeling and all the fun stuff around AI. Going back to when we talked to those industry experts, there were two things we wanted to be able to do. The first is that we wanted to use deep learning models to make an advance on this problem, and you can look at a lot of offline experiments showing that potential. So we wanted to enable deep learning models. The state of the art in model compression has come a long way, but certainly in 2016 it was not very good, and so we thought, OK, we need more memory just to store this thing; it's a lot of numbers, a lot of weights sitting there, right? The second problem is that you need a lot more processing power. The kind of simple, constrained system hearing aids were built around was never conceptualized with something like a deep learning model in mind. And you need to be able to add that computation power in a way that doesn't create a latency issue. So that's the problem we were focused on, and the way we solve it is with a little pocket device called the Whisper Brain. When you look at the Whisper hearing system, there are the traditional earpieces, which look exactly like what you picture when you think of a hearing aid. By themselves, those are great, premium hearing aids with all the traditional technology you would want. But when the Whisper Brain is nearby, in your pocket or on a table somewhere, it adds that superpower engine to your hearing aid, with the models and all those capabilities, through a special wireless protocol we designed so that it doesn't create a latency issue. It gives you the power of strapping a laptop to your head without the downsides of literally strapping a laptop to your head.

Speaker 1: Now, back in 2016, when you started, getting deep learning slash neural net models into a hearing aid would have been completely impossible. Even a laptop was a stretch at the time, I would say, and still today a lot of it is run on bigger machines than laptops. But here you have a special pod that runs these models. So I'm really curious, what's in this pod? Is there somehow an NVIDIA GPU in there? What do you put in there?

Speaker 2: What we looked for is a kind of specialized mobile processor that would be good at handling multiply-accumulates, doing a lot of multiply-accumulates in parallel. For those of your listeners who haven't seen it, the Whisper Brain is palm-sized, maybe a little smaller than a deck of cards, somewhere between the size of a car key and a deck of cards. What I like to tell people is that inside there are really three things. There's a radio and an antenna so that the wireless communication can work. There's a huge processor, maybe the equivalent of a mid-grade 2017 Android phone processor, in order to run all of this deep learning. And there's a battery; most of it is actually battery, to power all of this and keep it alive, because you need all-day battery life to do all this stuff. So from that point of view it's actually quite simple, but how you integrate that into the hearing aid is where we had to spend a lot of time to get it right.

Speaker 1: Now, this pod that you carry around or sits in your pocket, that's largely battery, apparently, volume wise. But of course, the magic is in the processing and the low latency radio connection. It's using deep learning, as you've alluded to. Now, you can't just put some deep learning into a pod and assume it works. You need to somehow train it, right? You need to have some notion of that. Somehow, there is some input-output pattern that you think, you know, you can leverage to get an effectively better amplifier system than just the frequency amplification and deamplification, right? And so what goes into that? What are you leveraging in terms of data to learn from?

Speaker 2: I would say we're still on this journey. Anyone who works deeply in AI knows that data is the lifeblood of any AI problem, and we believed that very early on. When we started, very naively, we used what every researcher used, which was the publicly available research data sets. That got us maybe a few months of progress. But one of the things we learned is that those data sets are oversimplified, like a lot of research data sets. The primary one a lot of people use in this problem space is based on the Wall Street Journal; there's a data set called WSJ0-2mix that a lot of folks use. Those were voices recorded in a quiet, anechoic environment, no echo, no reverberation, laid on top of each other. And the real world is just a lot more complicated than that. So very quickly we were building models that showed great performance but that we knew wouldn't extend into the problem space we had. And so we had to be a little bit more creative in how we approached data and training.
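
One common way past oversimplified corpora like WSJ0-2mix is to synthesize your own training mixtures: overlay target speech with recorded noise at varied signal-to-noise ratios and add some reverberation. The sketch below illustrates that general idea; it is not Whisper's data pipeline, and the stand-in signals, the synthetic reverb model, and the SNR range are assumptions made up for illustration.

```python
import numpy as np

def add_reverb(signal, sample_rate, rt60_s=0.3):
    """Crude synthetic reverb: convolve with an exponentially decaying noise tail."""
    n = int(rt60_s * sample_rate)
    impulse = np.random.randn(n) * np.exp(-6.9 * np.arange(n) / n)  # ~60 dB decay over rt60
    wet = np.convolve(signal, impulse)[: len(signal)]
    return wet / (np.max(np.abs(wet)) + 1e-9)

def make_training_mixture(speech, noise, sample_rate, snr_db):
    """Overlay speech and noise at a chosen SNR; return (noisy input, clean target)."""
    noise = np.resize(noise, len(speech))               # loop/trim noise to match length
    speech_power = np.mean(speech ** 2) + 1e-12
    noise_power = np.mean(noise ** 2) + 1e-12
    scale = np.sqrt(speech_power / (noise_power * 10 ** (snr_db / 10.0)))
    mixture = add_reverb(speech + scale * noise, sample_rate)
    return mixture, speech                               # a model learns mixture -> clean speech

# Placeholder signals; in practice these would come from recorded speech and noise corpora.
sr = 16000
speech = np.sin(2 * np.pi * 220 * np.arange(sr) / sr)    # stand-in for a speech clip
noise = np.random.randn(sr)                              # stand-in for cafe noise
noisy, target = make_training_mixture(speech, noise, sr, snr_db=np.random.uniform(0, 15))
```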

Speaker 1: So I'm trying to imagine these data sets. I mean, I'm familiar with vision data sets and speech recognition data sets, but I imagine you're not doing speech recognition here because you don't want to transcribe into text. You want to do something a little different. What exactly do you want to do? What's the input like and what's the desired output?

Speaker 2: Yeah, so the input is usually audio. Audio can take many different forms: in some models it's waveforms, and in some it's features of the audio, like the FFT of the audio. What's great is that when you take the FFT of audio, you almost have a vision problem. It's a vision problem where you can't see the whole image, and you have to be very careful if you want to use it in a real-time system, but you get a little closer to a vision problem, which is actually pretty nice. So that's the input side. The Wall Street Journal mix, for example, is literally people reading the Wall Street Journal. It's a very old data set. So you get different voices, men, women, older, younger, different inflections, and they read sentences like "the Fed cut interest rates by five basis points in 2019." That's what you're listening to all day. And then you mix those on top of each other. For a lot of the early models we did, you take the Fourier transform of those and pass the magnitude data in as features to your model. What you get out of it can be very different, and that's where maybe we've innovated the most over time. In the most basic construct of the model, what you're really trying to get out is almost an image mask. If your input is an FFT of audio, what you're trying to get out is an image mask highlighting where the important areas of sound are, where you want to focus your amplification and where you maybe don't want to amplify as much. And you can steer that model, steer that problem, based on what data you give it, in whatever way makes sense. Our data sets tend to be very speech-focused, because most people, when they're in a noisy place, want to hear the person talking to them, right? But that's not always true, and you can imagine exceptions: if you're bird watching, suddenly you care more about birds than humans, right? All of these different things. I'm very excited about the longer range of this problem and how much work there is to do. But that's the basic setup of the problem.
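
The mask idea described above can be sketched briefly: take the short-time Fourier transform of the mixture, have a network predict a 0-to-1 mask over the same time-frequency grid, and multiply before resynthesis. This is a generic illustration of mask-based enhancement, not Whisper's model; the use of PyTorch, the layer sizes, and the untrained toy network are assumptions, and the training loop against clean targets is omitted.

```python
import torch

def stft_spec(audio, n_fft=512, hop=128):
    """Complex spectrogram plus its magnitude, the 'image' the model sees."""
    window = torch.hann_window(n_fft)
    spec = torch.stft(audio, n_fft, hop_length=hop, window=window, return_complex=True)
    return spec, spec.abs()

class MaskNet(torch.nn.Module):
    """Tiny per-frame mask estimator (illustrative; real models are far larger and causal)."""
    def __init__(self, n_bins):
        super().__init__()
        self.net = torch.nn.Sequential(
            torch.nn.Linear(n_bins, 256), torch.nn.ReLU(),
            torch.nn.Linear(256, n_bins), torch.nn.Sigmoid(),  # mask values in [0, 1]
        )

    def forward(self, mag_frames):            # mag_frames: (frames, n_bins)
        return self.net(mag_frames)

mixture = torch.randn(16000)                  # 1 s of stand-in audio at 16 kHz
spec, mag = stft_spec(mixture)                # spec: (n_bins, frames), complex
model = MaskNet(n_bins=mag.shape[0])
mask = model(mag.T)                           # one mask value per time-frequency cell
enhanced_spec = spec * mask.T                 # keep speech-dominated cells, suppress the rest
enhanced = torch.istft(enhanced_spec, n_fft=512, hop_length=128,
                       window=torch.hann_window(512), length=len(mixture))
```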

Speaker 1: Got it. And so the way I'm interpreting this is the data set is very complex because multiple voices are overlaid and other sounds and voices could be overlaid. And then out you want something clean. You want to hear just Andrew or just Peter or something like that. Is it fair to say that then the neural net will learn to split these sounds effectively into each speaker, but then the person listening, do they have some kind of clicker and they say, OK, they click through and then they lock on to a specific speaker or how does that work and how does that then work if there's multiple people that alternate speaking? I'm curious.

Speaker 2: Yeah, so the way our models are set up, given our problem space, is that we really want to focus on the primary speaker, and primary is ultimately defined by what the person wants to listen to. Usually, if you take hearing loss out of it and just think about how humans work, that tends to be the loudest person. That's actually a good proxy in a lot of situations. If we're talking, you're closer to me, so at the level of my ears you're louder than all of the background noise. And if we're at Starbucks and someone yells out a coffee order, "order for David," they yell really loudly. Maybe I don't want to hear that, but that speech is designed to be heard, right? If you have good hearing, you're going to hear it, your mind is momentarily going to focus on it, and then you come back to the conversation. Now, there are a lot more features you can add, but for simplifying purposes you can think about it like that. And because the user of our product has hearing loss, we still need to make adjustments based on their specific hearing loss, based on the frequencies, based on a lot of these other factors, with those core algorithms in the earpieces that have nothing to do with the AI. So there's a post-processing step that all of the audio goes through; you can almost think of the model as outputting features that go into a larger audio system that's being adjusted. I think one of the things that's really unique about our problem, and that makes it a little more complicated than just thinking about noise reduction or clicking through voices, is that people actually want some level of background noise. They just don't want it to drown out the speaker, right? Background noise is really important because it tells you where you are. If you're in a Starbucks and you don't hear any of the background noise, just a person speaking, your brain is not happy about that situation; you get a headache, you start to feel nauseous. And I know that because we've tried that situation before. It's very nauseating. Background noise also has a safety component, too. If there's a siren going off, that's background noise in some sense, but you want to know it's there. So often we're not just trying to remove noise or click through different speakers; we're trying to balance all of that given the other hearing loss factors somebody has. That makes the problem maybe even more challenging than the research scope that researchers tend to look at.
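
A hedged sketch of that post-processing idea: keep the model's speech estimate, mix back a controlled amount of the remaining background so the scene stays audible, then apply the person's per-band prescription gains on top. The blend ratio, band edges, and gain values here are made up for illustration and are not Whisper's actual parameters.

```python
import numpy as np

def rebalance(speech_est, mixture, keep_background=0.2):
    """Keep the estimated speech plus a controlled amount of everything else.
    keep_background=0 would strip all ambience (disorienting); 1 would change nothing."""
    background_est = mixture - speech_est
    return speech_est + keep_background * background_est

def apply_prescription(audio, sample_rate, gains_db):
    """Apply per-band fitting gains after the AI stage (same idea as the basic
    equalizer model earlier: band lower edges in Hz mapped to dB boosts)."""
    spectrum = np.fft.rfft(audio)
    freqs = np.fft.rfftfreq(len(audio), d=1.0 / sample_rate)
    edges = sorted(gains_db)
    for lo, hi in zip(edges, edges[1:] + [sample_rate / 2]):
        spectrum[(freqs >= lo) & (freqs < hi)] *= 10 ** (gains_db[lo] / 20.0)
    return np.fft.irfft(spectrum, n=len(audio))

# Toy signals standing in for one processed audio frame.
sr = 16000
mixture = 0.1 * np.random.randn(1024)        # what the microphones picked up
speech_est = 0.7 * mixture                   # stand-in for the model's speech estimate
out = apply_prescription(rebalance(speech_est, mixture), sr,
                         gains_db={0: 5, 1000: 15, 4000: 30})
```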

Speaker 1: Now, as a startup, you start from nothing. You start out and you got to start building, right? And I mean, how do you get started? How do you start iterating with people who try it out? I mean, how do you measure progress, their experiences and tie that back into your own work?

Speaker 2: Yeah, that's something we've changed a lot over time as we've matured as a company. To give you a picture of what it was like early on: we were a very small company, mostly building offline models, because we didn't have hearing aid hardware yet; there was a whole parallel track of development working on that. So we did a lot of offline evaluation. Eventually we got sophisticated enough that we could record the output of a premium hearing aid in the same situation. We could go to a cafe or a park in San Francisco, where we are, record some of that ambient environment, have a conversation, then bring that recording back, play it through the hearing aid, play it through our models, and do offline evaluation, sometimes with expert listeners, sometimes with non-expert listeners. There are these crowd labeling platforms we would try, or sometimes we'd do it ourselves, and we'd compute something maybe as simple as a PESQ score, which is a metric used in VoIP, or just have people rate the clips on a scale of one to five.
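
For that kind of offline evaluation, PESQ compares a processed clip against a clean reference and returns a quality score, and one-to-five listener ratings can be averaged into a mean opinion score. A small sketch, assuming the open-source `pesq` Python package and synthetic stand-in audio rather than the real cafe recordings:

```python
import numpy as np
from pesq import pesq            # open-source PESQ implementation (pip install pesq); assumed here

def pesq_score(reference, processed, sample_rate=16000):
    """Score a processed clip against its clean reference; PESQ spans roughly 1 (bad) to 4.5 (great)."""
    return pesq(sample_rate, reference, processed, 'wb')   # 'wb' = wideband mode at 16 kHz

def mean_opinion_score(ratings):
    """Average 1-to-5 listener ratings for one condition (e.g. 'premium aid' vs 'our model')."""
    return float(np.mean(ratings))

# Toy example with synthetic audio; in practice these are recordings from the cafe/park sessions.
sr = 16000
clean = np.sin(2 * np.pi * 440 * np.arange(sr) / sr).astype(np.float32)
degraded = clean + 0.05 * np.random.randn(sr).astype(np.float32)
print(pesq_score(clean, degraded, sr))
print(mean_opinion_score([4, 5, 3, 4, 4]))
```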

Speaker 1: That reminds me of some recent work that is maybe not directly applicable yet, but there's some work on learning from human feedback. So rather than pure supervised learning, a human rates the quality of the output, for example in OpenAI's work on text summarization, where a human would rate the quality of the summarization. It seems like some of that could at some point make its way into hearing aids too, where a human would rate different processing modalities based on different neural network variants, and over time you'd figure out an even better objective for that specific user, for what they want to hear.

Speaker 2: Yes. And that's one of the exciting areas we're starting to explore now. We've come a long way since that initial step. Now we have expert listeners in-house. We have a research department that works with people who use hearing aids, including ours, and we're able to try different scenarios on the device or offline in a sound booth. Ultimately, I think that's where a lot of the future of hearing is: being able to find an individual's path through their auditory system, and ultimately program the hearing aid and develop the models so that they support the needs of that individual person. Because how the brain processes sound is also very different at the individual level. And that's really the goal of the hearing aid: to give the brain the type of information it needs, in a way that's easy to process given the compromised auditory system, so that you can get all of the information about the world, just like you and I might.

Speaker 1: Now, I'm kind of curious about when people have hearing aids or have trouble hearing. What you've described is, on the one end, basically equalizers slash amplifiers, and on the other end, the Whisper hearing system, which does something quite complex: locating different sources in the sound and amplifying them differently, giving them different emphasis and so forth. Now, in the human brain, when I hear something, the sound comes in and then I have to do all that processing myself. It makes me wonder, just from a medical point of view, when people have trouble hearing, to what extent is it a physical thing versus a processing thing? Is that well understood? Of course, the two are intertwined; processing is physical.

Speaker 2: In 10 years, we'll look back and we'll realize it wasn't well understood. Maybe that's the right way to say it. Hearing loss certainly has a very physical component. Hearing is literally micro hairs on your cochlea moving back and forth and responding to different frequencies. And hearing loss is often that physiological process being disrupted. But of course, that disruption can have downstream effects on that sound processing. And so, I think a real vision of a hearing aid is not just about sound and adjusting these sources and adjusting these scopes, but almost being a counteracting force to this disruptive physiological issue so that the processing in your brain can still do what it needs to do. Because ultimately, the brain is probably the most powerful computer, in some regards, the most powerful computer that we all have.

Speaker 1: And you want to keep leveraging the brain as much as possible, of course. Now, I'm curious. From my perspective, certainly, I'm always leaning towards AI being the likely solution, because it seems like more data, more compute, and over time things will only get better. But at the same time, as you're building this product, I wonder to what extent there are also other things you spend a lot of time on. I mean, I could imagine you might have thought about microphone arrays, where maybe by turning your head you're zoning in on a specific sound somewhere, thanks to the array of microphones. Maybe that's too crazy, maybe that's not possible with sound. But I'm curious, on the electronics side, aside from the AI, what are some things you thought about hard?

Speaker 2: Yeah, I think everything related to the sound system, the audio system of the product, is super, super important. Down to, I'll give you a problem related to sound that probably no one has thought about on this podcast before, which is microphone sealing. When you have a microphone, it has to get attached to a circuit board; it's an electronic microphone. And it has to be sealed, because if it's not sealed, sound can seep in through various areas and you won't get the actual signal properly. There's a whole factory process behind this. And then you have to ask yourself, how do you even test that seal? How do you know the seal is correct? That has a big impact on the results of the whole sound system, but also on the machine learning. Eventually, each hearing aid is an input source coming into this model, and imagine there's a random bias being added by each hearing aid in a way you can't control, which is manufacturing, right? You want to minimize that bias as much as possible. You want your models to be robust to it, but at some point you just need better inputs. That's a problem we spent a very, very long time thinking about in our office, prototyping, and then testing a lot in our manufacturing, in our factory, because we knew it had such a big effect on the AI. Of course, how we design our microphone arrays is really, really important, because, like you said, it can steer where the sound is coming from. It again has a really big impact on what the inputs look like: whether they're pointing forward, whether they're pointing sideways, whether one is pointing forward and one sideways, all of these things. So those are some of the more hardware-oriented problems. The other big problem we spent a lot of time figuring out is the integration of all of this. It's one thing to get a processor, put a battery on it, and run some models in a loop, and say, OK, that's great. But you need those results to get back to the earpieces and process the sound in under six milliseconds round trip from when the sound first hits the microphone. Six milliseconds to do all of that.

Speaker 1: Yeah.

Speaker 2: And just to give a benchmark for many folks, the latency to send data over a Bluetooth link in one direction is, at the fastest, maybe around 10 milliseconds. So just sending the data forward, if you were to use Bluetooth as your protocol, never mind getting it back, which you have to do at some point somehow, already busts your latency requirement. That's where a lot of the work goes, into the design of the communication system and into the hardware design, because you need to find low-latency ways of getting data from point A to point B within the chips. And then you need it to be really robust, so you have to focus a lot on antennas and those types of things. There are like five PhDs of work there that we had to ask friends about, call in favors, and understand in order to make this work.
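
The latency argument is easy to check with back-of-the-envelope arithmetic: at roughly 10 milliseconds one way, standard Bluetooth blows the 6-millisecond round-trip budget before any computation even runs. A trivial sketch, where the custom-link and inference numbers are illustrative placeholders rather than Whisper's real figures:

```python
# End-to-end budget from microphone to ear, per the conversation.
BUDGET_MS = 6.0

def round_trip_ms(link_one_way_ms, inference_ms):
    """Earpiece -> brain -> earpiece, plus time to run the model in between."""
    return 2 * link_one_way_ms + inference_ms

# Standard Bluetooth at its fastest (~10 ms one way) busts the budget before inference even runs.
print(round_trip_ms(link_one_way_ms=10.0, inference_ms=0.0))             # 20.0 ms > 6 ms budget

# A custom low-latency link also has to leave room for compute (placeholder numbers).
print(round_trip_ms(link_one_way_ms=1.5, inference_ms=2.0) <= BUDGET_MS)  # True
```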

Speaker 1: Wow. Now, what if somebody wants to try out the Whisper hearing system? What should they do?

Speaker 2: Well, it's very easy. They can go to our website. We work with hearing doctors, typically audiologists, as they're called, around the country. There's a little form they can fill out on our website, and we'll get in contact with them, refer them, and help them schedule an appointment with their nearest doctor. The reason we do that is, first, we want to make sure they have a really good baseline understanding of their hearing, because again, a hearing aid is not a one-size-fits-all device; it's programmed and personalized to an individual person's needs. And then that doctor is going to be able to explain how to use the product. As you use it, you may notice certain aspects of the sound that can be adjusted for you as you get used to it and adapt to it, and that's what the doctor is there for. We have doctors all over the country now, and our website is whisper.ai.

Speaker 1: Now, how affordable is it?

Speaker 2: Our hearing aids are a little bit unique in that we offer them on a monthly plan. One of the reasons we do that is that we actually deliver software upgrades to the hearing aids; call them new model updates, new model architectures and new weights. We can do that because we get a lot of data, so over the year or so since this hearing aid launched, we've improved the models and the sound system maybe four or five times already in production. Our hearing aid right now is $139 a month on a three-year term. That makes it a little more affordable for people, so they're not spending thousands of dollars up front, which is what a typical hearing aid would ask of folks.

Speaker 1: And is it typically covered through health insurance?

Speaker 2: There are certain health insurance plans that do cover it. For example, if you're a veteran who gets care through the Veterans Administration, there's often hearing aid coverage there. There are certain employers who have special plans that cover hearing aids. Unfortunately, hearing aid coverage is not as broad as one would hope, given the impact hearing has on so many other facets of life. Even reducing the risk of dementia alone would be worth it. But that's an area I'm personally really passionate about, and I hope to see some change over time.

Speaker 1: Yeah, it seems like some advocacy is needed there. Maybe a foundation that people can donate to, to help other people acquire the right hearing aids for themselves.

Speaker 2: For sure. Yeah, it has such a big impact on people's lives and on their other health outcomes, too. Imagine you're an older person with hearing loss and no hearing aid: when your doctor tells you what you're supposed to do about your health, on some basic level you can't even hear that. So the human argument may be easy to understand, but I think the economic argument will also be easy to reveal, and one of my hopes is that Whisper can help reveal that argument and make some progress there as well.

Speaker 1: Talking about, let's say, older people looking for care, but also older people often looking to talk with their grandchildren on video calls and so forth. I'm curious, with video calls, maybe things are a little bit easier and you can achieve even higher quality. Is that right? Because you know that the sound is coming from that other device.

Speaker 2: It's sort of a mixed bag, I think, for people. Certainly, when you know where the sound is coming from, you can do a lot more to target it, up to just wearing headphones and making them really loud. That's not a great long-term solution, but if you're on a video call, it gets you a certain way. But I think the impact of COVID in general has probably been pretty challenging for people with hearing loss, just because of masks alone. I've gone through the experience where I'm talking with someone and they're speaking at a regular volume, the sound reaching me is at the normal level, but I can't lip-read them, so it's actually harder for me to understand what they're saying. And that's maybe only a small picture of what it's like to have hearing loss. So masks have been a real challenge for people with hearing loss.

Speaker 1: Is there a favorite customer or patient story that you're willing to share?

Speaker 2: We get so much great feedback. I think it's one of the most motivating and best parts about working on this problem and working at Whisper. The one that stays with me, that I'll share: I visited a clinic one day, and an older gentleman and his son had come in to try some hearing aids. This was for the father, and he had tried lots of hearing aids over the years and ultimately never bought any of them, because he didn't feel like they were helping. And hearing aids are not an inexpensive purchase. I happened to be there that day, and the doctor told him a little bit about Whisper and used the word AI, all of these kinds of tech things. And this patient was sort of like, okay, I don't care about AI, I don't care about any of that, nothing you're saying matters, can I just try the thing? And he tried them, and the reaction was instant; he noticed the difference right away. His reaction was one thing, but the reaction of the son who brought him is what I really remember, because the son had never seen his father be able to engage in that way. His father was used to operating in a certain context, in a certain world. Now he could hear, and he'd think, oh, someone's talking over there, that's interesting. For the rest of us, we knew that person was talking over there, and his father had just never been paying attention to it. All of a sudden his father was more engaged, was following things around, was talking at a more normal volume, because you can also hear yourself a little bit better; that's another impact it has. And his son just started crying in the room in that instant. That's, I think, a very powerful thing, where you can really see how it's going to make a meaningful impact on somebody's life.

Speaker 1: What a beautiful story. Now, Andrew, I must ask: early on, you mentioned your grandfather as some of your inspiration. Does he wear Whisper hearing aids?

Speaker 2: He does, though he doesn't actually wear the final version. It's a funny story. My grandfather lives in China, where I was originally born, and the last time I visited China, I was actually there to visit manufacturing. I had some pre-production hearing aids and gave him a set. They're not nearly as good as what we have now, funnily enough, but he has them and he does use them. Then, as your listeners probably know, it hasn't been so easy to visit China in the past 18 months, so since we've launched I actually haven't been able to go and see him. I'd love to do that, and it's one of the things I hope to do soon, so I can give him a pair of the final version.

Speaker 1: Now, we talked a lot about hearing, of course. Really, really important, and it's what you spend most of your time on. But I've got to imagine you're also thinking more broadly at times. I'm curious, what are some other places in health care where you see very big impact potential for AI?

Speaker 2: Yeah, health care is a huge field, and I certainly don't claim to be an expert in any one part of it. But I always get more excited about the very practical impacts AI can have, the areas where you don't even necessarily know it's there, where you don't have to market it as AI; it's just bringing some good to the world, let's say. Certainly, I think the intersection of computer vision and health care is a big area for bringing robustness to the health care system. For example, if you're looking at an X-ray or an MRI or some sort of scan and trying to diagnose something, just having AI support a doctor's decision or improve the decision-making of a medical team, I think that's amazing work. It has a huge, measurable effect on the quality of life of so many people today, and it reduces errors and all the other things our health care system is really concerned about. You'll notice I didn't describe that as a vision where you send pictures and the AI is your doctor and decides all of your health care outcomes. Certainly, with enough data, I think we can get a lot closer to that vision, but that vision sometimes ignores all of the important steps along the way that make real, meaningful change in people's day-to-day lives. I also think there are a lot of health care problems like hearing, where personalization and an individual's care journey are going to be very different from the average person's care journey. A mathematical way to say that is: given people with the same set of inputs, the correct output has high variance. Hearing is one of those things. If you have two people with the same measured hearing loss, the actual hearing aid settings that are best for each of them are all over the map. The prescription is just a starting point, not an ending point, unlike glasses, say, where the prescription is the ending point. For those problems, AI is very well suited to help optimize and map that journey through a very complex space, with human cognition and human preferences in the mix, all of these things, like you were mentioning earlier. I think that's going to be a big area where, as a health care system, we'll be able to move away from saying, well, we averaged 900 data points, so when you have this problem, we know that on average this is the best result, so we'll recommend it. We can actually take a lot of those individual factors into account and give you a more personalized result. And I think that's going to support health care and health care outcomes in a way that wasn't possible before.

Speaker 1: Well, that'll be a beautiful future once we get there. Yes. So, Andrew, thanks so much for joining us. This was a really great conversation. Thank you.

Speaker 2: Yeah, I had a lot of fun. Thanks for having me, and I can't wait to listen to it when it's all done. I really appreciate you inviting me on.
