Learning to detect named entities in bilingual code-mixed open speech corpora






This research addresses the problem of code-mixing in speech-based cognitive services and its subtasks of language identification, search, and named entity recognition in multilingual speech commands. According to the American Community Survey (ACS) published by the United States Census Bureau, more than 20 percent of U.S. residents speak a language other than English at home. Many bilingual speakers habitually, and even subconsciously, switch languages in mid-sentence and mix them across successive sentences. This happens, for example, when a user wants to listen to popular music by artists from different countries and uses the native pronunciation of an artist's name. Misrecognition of these embedded named entities by an automatic speech recognition (ASR) system can lead to wrong search results. For instance, when a user asks to play songs by Chinese singers on Spotify, home assistants frequently play the wrong songs because they recognize only English. When callers leave voicemail messages on Google Voice that are transcribed to text, specific named entities (people, places, and things) and the surrounding context of the messages are often misinterpreted. Malfunctions of this kind are inconvenient and detract from the overall user experience for home assistant users.

To develop a machine learning-driven approach to coping with such usability issues, I developed a research test bed centered on code-mixed bilingual sentences. I collected voice recordings from 40 individual participants covering multiple commands, multiple streaming music service names, and about 100 Chinese names. I then segmented and recombined these samples automatically using sound editing software to combinatorially enumerate a set of utterances, each of which is a short command phrase.
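The combinatorial enumeration step can be sketched as follows. This is a minimal illustration, not the actual test-bed code: the segment inventories below are hypothetical stand-ins (the study used recorded audio clips from 40 participants, cut with sound editing software), and the sketch joins text labels rather than concatenating waveforms.

```python
from itertools import product

# Hypothetical segment inventories standing in for the recorded clips.
commands = ["play", "search for"]
artists = ["Jay Chou", "Teresa Teng"]      # stand-ins for ~100 Chinese names
services = ["on Spotify", "on YouTube Music"]

def enumerate_utterances(commands, artists, services):
    """Combinatorially recombine segments into short command phrases:
    one utterance per (command, artist, service) combination."""
    return [" ".join(parts) for parts in product(commands, artists, services)]

utterances = enumerate_utterances(commands, artists, services)
# 2 commands x 2 artists x 2 services = 8 candidate utterances
```

With real audio, the same cross product would drive concatenation of the corresponding waveform segments, so a modest number of recordings yields a much larger labeled corpus.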
Instead of the traditional approach based on hidden Markov models (HMMs), I used a deep learning model based on the Baidu Deep Speech architecture, as implemented by contributors to the Mozilla DeepSpeech open-source repository on GitHub. This narrows the focus of the code-mixing task, and the associated supervised learning task, to language identification and segmentation of utterances in different languages at the phrase level. It also facilitates development of a prototype web application through which users can contribute their voice data to improve the system. In current and continuing work, I am improving the phrasal model using deep learning to develop a working prototype that integrates with cognitive service APIs (e.g., Amazon Alexa, Google Home) for Chinese/English music search.
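To make the phrase-level formulation concrete, the sketch below segments a transcribed command into contiguous same-language phrase spans. It is a toy text-side illustration, assuming a simple Unicode-range heuristic (CJK characters mark Chinese tokens); the actual system performs this identification on speech via the deep learning model, not on text.

```python
import re

def lang_of(token):
    # Heuristic: a token containing CJK ideographs is labeled Chinese ("zh"),
    # anything else is labeled English ("en").
    return "zh" if re.search(r"[\u4e00-\u9fff]", token) else "en"

def segment_by_language(tokens):
    """Group a token sequence into contiguous same-language phrase spans,
    e.g. an English command frame around an embedded Chinese named entity."""
    spans = []
    for tok in tokens:
        lang = lang_of(tok)
        if spans and spans[-1][0] == lang:
            spans[-1][1].append(tok)      # extend the current phrase span
        else:
            spans.append((lang, [tok]))   # start a new phrase span
    return [(lang, " ".join(toks)) for lang, toks in spans]

result = segment_by_language("play 周杰倫 on Spotify".split())
```

Here a code-mixed command splits into an English command phrase, a Chinese named entity, and an English service phrase, which is exactly the granularity the supervised learning task targets.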



Code-mixing, Speech recognition, Deep learning, Recurrent neural networks, Cognitive services, Bilingual named entities




Master of Science


Department of Computer Science

Major Professor

William H. Hsu