Chapter 5 deals with what we've defined as symbolic and non-symbolic communication. Symbolic communication is essentially any communication that uses a well-codified, structured language: speech and writing, but also sign language. Non-symbolic communication covers less structured, uncodified channels such as gesture, facial expression, body language and the like.
On this page we summarise work on the first two topics below; the other three each have their own page, as they are so central to VH work, and to our own interests!
- Speech Recognition (Speech to Text)
- Speech Generation/Synthesis (Text to Speech)
- Natural Language Understanding and Communication
- Natural Language Generation
- Internal Dialogue
Speech Recognition (ASR)
- Google Web Speech API demo - with accompanying demos on GitHub
- IBM Watson ASR demo
- Microsoft Bing ASR demo
- Speech Recognition Anywhere - Chrome extension, works pretty well
References of note include:
- Besacier, L., Barnard, E., Karpov, A., & Schultz, T. (2014). Automatic speech recognition for under-resourced languages: A survey. Speech Communication, 56, 85-100.
- Cutajar, M., Gatt, E., Grech, I., Casha, O., & Micallef, J. (2013). Comparative study of automatic speech recognition techniques. IET Signal Processing, 7(1), 25-46.
- Hinton, G., Deng, L., Yu, D., Dahl, G. E., Mohamed, A. R., Jaitly, N., ... & Kingsbury, B. (2012). Deep neural networks for acoustic modeling in speech recognition: The shared views of four research groups. IEEE Signal Processing Magazine, 29(6), 82-97.
- Johnson, M., Lapkin, S., Long, V., Sanchez, P., Suominen, H., Basilakis, J., & Dawson, L. (2014). A systematic review of speech recognition technology in health care. BMC Medical Informatics and Decision Making, 14(1), 94.
- Kelly, S. D. (2001). Broadening the units of analysis in communication: Speech and nonverbal behaviours in pragmatic comprehension. Journal of Child Language, 28(2), 325-349.
- Koolagudi, S. G., & Rao, K. S. (2012). Emotion recognition from speech: A review. International Journal of Speech Technology, 15(2), 99-117.
Speech Generation/Synthesis (TTS)
- SSML - the W3C Speech Synthesis Markup Language, also supported on Alexa
- Lyrebird - voice "cloning" - so-so results at the moment (Nov 2018)
- VocalID - voice "cloning"
- iSpeech - TTS service
(we'll add to this or post in the blog as we find new ones)
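Since SSML comes up above, a short example may help. The fragment below is a sketch of typical SSML markup, loosely following the W3C specification; exact tag and attribute support varies between engines (Alexa, for instance, documents its own supported subset), so treat this as illustrative rather than engine-specific.

```xml
<speak>
  Hello, and welcome back.
  <!-- pause for half a second before continuing -->
  <break time="500ms"/>
  <!-- adjust speaking rate and pitch for this sentence only -->
  <prosody rate="slow" pitch="+10%">
    This sentence is spoken slowly, at a slightly higher pitch.
  </prosody>
  <!-- tell the engine to read this as a day-month-year date -->
  <say-as interpret-as="date" format="dmy">23-01-2018</say-as>
</speak>
```

The `<speak>` root wraps everything to be synthesised; `<break>`, `<prosody>` and `<say-as>` are among the most widely supported elements, and are a good first stop when plain text comes out sounding flat.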
References of note include:
- Black, A. W., Bunnell, H. T., Dou, Y., Muthukumar, P. K., Metze, F., Perry, D., ... & Vaughn, C. (2012). Articulatory features for expressive speech synthesis. In 2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (pp. 4005-4008). IEEE.
- Kisner, J. (2018, Jan 23). The technology giving voice to the voiceless. The Guardian. Available online: https://www.theguardian.com/news/2018/jan/23/voice-replacement-technology-adaptive-alternative-communication-vocalid
- Taylor, P. (2009). Text-to-speech synthesis. Cambridge: Cambridge University Press.
- Yamagishi, J., Veaux, C., King, S., & Renals, S. (2012). Speech synthesis technologies for individuals with vocal disabilities: Voice banking and reconstruction. Acoustical Science and Technology, 33(1), 1-5.