Communication - Symbolic

Chapter 5 deals with what we've defined as symbolic and non-symbolic communication. Symbolic communication is essentially any communication that uses a well-codified and structured language, such as speech or writing, but also sign language. Non-symbolic communication covers less structured and uncodified methods such as gesture, facial expression and body language.

On this page we summarise work on the first two areas, speech recognition and speech generation. We have separate pages on the other three, as they are so key to virtual human (VH) work, and to our own interests!

We treat the first two relatively lightly because we think any flexible virtual human system should simply be able to use whatever ASR and TTS systems are available through an API. They aren't key to virtual human work, and their development is being driven by lots of other use cases. We do recognise, though, that speech recognition performance in particular can be improved by a tight feedback loop between the audio detection and the user intent derived so far from the conversation.
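
As a rough illustration of both points, the sketch below puts a recogniser behind a small swappable interface and then re-ranks its candidate transcripts using the intent derived so far. It is a minimal sketch under stated assumptions, not how any particular ASR product works: the names SpeechRecogniser, CannedRecogniser, INTENT_VOCABULARY, rescore and recognise are all hypothetical, the hypotheses and confidence values are made up, and the boost weight is arbitrary.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class Hypothesis:
    """One candidate transcript with the recogniser's own confidence (0..1)."""
    text: str
    confidence: float


class SpeechRecogniser(Protocol):
    """Hypothetical adapter interface; any vendor ASR API could sit behind it."""

    def transcribe(self, audio: bytes) -> list[Hypothesis]: ...


class CannedRecogniser:
    """Stand-in recogniser returning fixed, invented hypotheses for illustration."""

    def transcribe(self, audio: bytes) -> list[Hypothesis]:
        return [
            Hypothesis("book a fight to paris", 0.52),
            Hypothesis("book a flight to paris", 0.48),
        ]


# Hypothetical mapping from the intent derived so far to the words it makes likely.
INTENT_VOCABULARY = {
    "travel_booking": {"flight", "ticket", "hotel", "paris"},
}


def rescore(hypotheses: list[Hypothesis], intent: str, boost: float = 0.05) -> list[Hypothesis]:
    """Re-rank hypotheses, adding `boost` for each word that fits the intent."""
    expected = INTENT_VOCABULARY.get(intent, set())

    def score(hypothesis: Hypothesis) -> float:
        matches = sum(1 for word in hypothesis.text.lower().split() if word in expected)
        return hypothesis.confidence + boost * matches

    return sorted(hypotheses, key=score, reverse=True)


def recognise(recogniser: SpeechRecogniser, audio: bytes, intent: str) -> str:
    """Return the best transcript after biasing towards the dialogue intent."""
    return rescore(recogniser.transcribe(audio), intent)[0].text
```

With the canned data, calling recognise(CannedRecogniser(), b"", "travel_booking") prefers "book a flight to paris" even though the recogniser alone ranked "fight" higher, while an intent with no vocabulary leaves the original ranking untouched. Because the rest of the system only sees the interface, the recogniser behind it can be replaced without changing the feedback step.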

Speech Recognition (ASR)

Interesting links include:

References of note include:

  • Besacier, L., Barnard, E., Karpov, A., & Schultz, T. (2014). Automatic speech recognition for under-resourced languages: A survey. Speech Communication, 56, 85-100.
  • Cutajar, M., Gatt, E., Grech, I., Casha, O., & Micallef, J. (2013). Comparative study of automatic speech recognition techniques. IET Signal Processing, 7(1), 25-46.
  • Hinton, G., Deng, L., Yu, D., Dahl, G. E., Mohamed, A. R., Jaitly, N., ... & Kingsbury, B. (2012). Deep neural networks for acoustic modeling in speech recognition: The shared views of four research groups. IEEE Signal Processing Magazine, 29(6), 82-97.
  • Johnson, M., Lapkin, S., Long, V., Sanchez, P., Suominen, H., Basilakis, J., & Dawson, L. (2014). A systematic review of speech recognition technology in health care. BMC Medical Informatics and Decision Making, 14(1), 94.
  • Kelly, S. D. (2001). Broadening the units of analysis in communication: Speech and nonverbal behaviours in pragmatic comprehension. Journal of Child Language, 28(2), 325-349.
  • Koolagudi, S. G., & Rao, K. S. (2012). Emotion recognition from speech: A review. International Journal of Speech Technology, 15(2), 99-117.

Speech Generation/Synthesis

Interesting links and demos include:

(We'll add to this list, or post on the blog, as we find new ones.)

References of note include:

  • Black, A. W., Bunnell, H. T., Dou, Y., Muthukumar, P. K., Metze, F., Perry, D., ... & Vaughn, C. (2012). Articulatory features for expressive speech synthesis. In 2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (pp. 4005-4008). IEEE.
  • Kisner, J. (2015, Jan 23). The technology giving voice to the voiceless. The Guardian.
  • Taylor, P. (2009). Text-to-speech synthesis. Cambridge: Cambridge University Press.
  • Yamagishi, J., Veaux, C., King, S., & Renals, S. (2012). Speech synthesis technologies for individuals with vocal disabilities: Voice banking and reconstruction. Acoustical Science and Technology, 33(1), 1-5.