So, yeah. It turns out that, just like in real life, saying people’s names is hard for Alexa. Last names and uncommon first names are particularly difficult. There is a way to specify the pronunciation of expected slot values using Speech Synthesis Markup Language (SSML), which could potentially be helpful here.
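For what it’s worth, here’s roughly what that looks like. This is a minimal sketch, the IPA rendering of Lizelle is my best guess, and note that SSML shapes how Alexa speaks a name back to you rather than how it hears one:

```python
# A minimal sketch: an SSML <phoneme> tag spelling out how Alexa should
# pronounce an uncommon name. The IPA string is a guess, and this only
# affects Alexa's spoken output, not its speech recognition.
name_ssml = (
    "<speak>"
    "Starting a tweeker with "
    '<phoneme alphabet="ipa" ph="lɪˈzɛl">Lizelle</phoneme>.'
    "</speak>"
)
```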
Since Alexa isn’t great at taking unexpected speech and turning it into written text, saying a first name that wasn’t in its list of U.S. first names often caused Alexa to either respond with something completely wrong or just refuse to do anything at all. For example, Lizelle was interpreted as Mozelle. I remedied this by extending the first name slot types with values for our uncommon first names. There was a similar issue with the tweeker topic slot, so I researched and tested ways to trick Alexa into accepting and transcribing unexpected words. The suggested approach involved filling custom slots with thousands of nonsense words and phrases. Despite what the Internet said, this didn’t work. Throw in people with different accents and oh my. Let the fun begin.
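For reference, extending a built-in slot type happens in the skill’s interaction model. Here’s a sketch of that JSON, written as a Python dict; Lizelle comes from the example above, and the second name is made up:

```python
# A sketch of the interaction-model JSON (as a Python dict) that extends
# Alexa's built-in U.S. first name slot type with uncommon names.
# "Anouk" is a made-up example.
first_name_type = {
    "name": "AMAZON.US_FIRST_NAME",
    "values": [
        {"name": {"value": "Lizelle"}},
        {"name": {"value": "Anouk"}},
    ],
}
```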
There is also the Abby/Abbey, Irvin/Ervin, and Jeff/Geoff problem: names with different spellings that are pronounced the same. I seemingly remedied this by extending the first name slot types with the different spellings for people who work at Watershed, but it’s a false fix, because it’s impossible to deduce which spelling a person wants from the way they speak a name. One approach would be to have the person using the app log in before recording the tweeker. That isn’t a complete solution, though, since Alexa still wouldn’t know which spelling to use for the other person involved in the tweeker.
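One partial mitigation, sketched below with a hypothetical ID, is to give each known spelling a slot value with synonyms so entity resolution always lands on the spelling we use at Watershed. It picks a spelling; it still can’t know which one the speaker meant.

```python
# A sketch: a slot value whose synonyms cover the homophone spellings, so
# entity resolution maps whatever Alexa hears to one canonical spelling.
# The "abby" ID is hypothetical.
abby = {
    "id": "abby",
    "name": {"value": "Abby", "synonyms": ["Abbey", "Abbie"]},
}
```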
Last, there is the problem of getting the correct name into its corresponding slot. Alexa is great at knowing when to prompt users for required slot values, but with names it’s hard to get right. David, for example, is both a common first name and a common last name, and there were often cases where Alexa confidently put a name in the wrong slot. One approach would be to redesign the VUI to prompt for each name individually, but that would increase the time it takes to record a tweeker and, honestly, sounds like a bad experience.
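For completeness, that redesign would boil down to eliciting each name slot explicitly. Here’s a rough sketch using the ask-sdk for Python; the slot name and prompt wording are hypothetical:

```python
from ask_sdk_model.dialog import ElicitSlotDirective


def elicit_missing_name(handler_input):
    """A rough sketch: prompt for one name at a time instead of letting
    Alexa guess which name belongs in which slot. The slot name and
    prompt below are hypothetical."""
    slots = handler_input.request_envelope.request.intent.slots
    if not slots["mentor_name"].value:
        return (
            handler_input.response_builder
            .speak("Who was the mentor for this tweeker?")
            .add_directive(ElicitSlotDirective(slot_to_elicit="mentor_name"))
            .response
        )
    return None  # both names captured; carry on
```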
Overcoming bad data
All of these technical issues add up to one huge challenge with this prototype: bad data. While it was incredibly easy to get data into Watershed, trying to drive insights about who is having tweekers and what they’re talking about from bad data is beyond problematic.
So, could voice technology be useful in this type of learning context? I think if the mentor and mentee identification problem could be solved, it would definitely be useful.
There are tons of other learning use cases we could explore with voice technology. Here are just a few:
- You could get a refresher on topics you’ve previously learned. Tracking those interactions could help a learning organization understand which topics people most often need to reacquaint themselves with.
- What are the top learning courses this week?
- Who are the most active learners this week?
- Who has compliance certifications expiring this month?
- What are the most watched videos this week?
- What percentage of people have completed the sales training program?
- Who are the top five people that completed sales training and have the highest sales?
These ideas are less problematic because, using the data we already have in Watershed, we could easily feed known values into Alexa so it knows what to listen for.
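As a quick sketch of what that could look like (every name here is hypothetical), we could generate slot values straight from data we already track in Watershed:

```python
# A sketch (all names hypothetical): build slot values for the interaction
# model from course titles already stored in Watershed, so Alexa listens
# for a known, closed set of phrases instead of free-form speech.
def to_slot_values(titles):
    return [{"name": {"value": t}} for t in titles]


known_courses = ["Sales Training", "Compliance Basics"]  # e.g., pulled from Watershed
course_type = {"name": "COURSE_NAME", "values": to_slot_values(known_courses)}
```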
Want to see more?
If you’re interested in viewing the code for these two prototypes, I’ve uploaded them to Watershed’s GitHub page.
[Editor's Note: This blog post was originally posted on May 31, 2017, and has been updated for comprehensiveness.]