Speech Recognition Terminology
What are the key terms in speech recognition?
Artificial Intelligence (AI): The study and programming of computer systems that can perform tasks normally associated with human intelligence (including perception and discovery, decision-making, and speech recognition).
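Phoneme: The smallest unit of sound in a language that distinguishes one word from another. For example, the sounds /p/ and /b/ distinguish “pat” from “bat.”
Algorithm: A set of rules or step-by-step instructions that a computer follows to solve a problem or complete a task. In AI systems, algorithms underlie the decision-making capabilities.
Artificial Neural Networks: Often described as the “foundation of AI,” these are computing systems made up of layers of interconnected nodes, loosely modeled on the neurons of the brain, that computers use to perform human-like, intelligent problem-solving.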
Natural Language Processing (NLP): This is the area of AI that revolves around the study of machines interacting with humans through “natural” (i.e., human) language.
Machine Learning: This is the area of AI in which a machine learns from data and improves its performance automatically, without being explicitly programmed for each task.
Interactive Voice Response (IVR): This is the technical name for the interactive telephone systems that can respond to voice and dial pad commands.
User Interface (UI): This is where the program “comes to the surface” and interacts with a user. UI refers to all the means a system uses to interact with humans, including the desktop, display screens, keyboard, and mouse.
Directed Dialogue: In contrast with a user speaking freely to the user interface, this is where specific, pre-determined phrases need to be utilized for the system to recognize the command and act on it. This is especially common in IVR systems on customer service telephone lines.
Conversational Interface: The conversational interface is where the system takes in natural language, which in the case of speech recognition is always spoken.
Speech Engine: This is the software that takes spoken input, compares it to available vocabulary and assigns meaning to it.
Deep Learning: This is a subset of machine learning, which is a subset of AI. It’s considered “deep” because its neural networks have many layers. This depth lets it learn from unstructured, unlabeled data on a partially or completely unsupervised basis.
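To make the Directed Dialogue and Speech Engine entries above more concrete, here is a minimal Python sketch. It assumes the audio has already been turned into a text transcript, and it only illustrates the "compare to available vocabulary and assign meaning" step. The vocabulary, the intent names, and the function names are all hypothetical, and the fuzzy matching uses Python's standard difflib module as a stand-in for whatever scoring a real speech engine would use.

```python
import difflib

# Hypothetical vocabulary for a toy IVR menu: spoken phrase -> assigned meaning.
VOCABULARY = {
    "check balance": "BALANCE_INQUIRY",
    "pay bill": "BILL_PAYMENT",
    "speak to an agent": "TRANSFER_TO_AGENT",
}


def normalize(transcript):
    """Lowercase the transcript and collapse repeated whitespace."""
    return " ".join(transcript.lower().split())


def directed_dialogue(transcript):
    """Directed dialogue: accept only the exact pre-determined phrases.

    Returns the assigned meaning, or None if the phrase is not recognized.
    """
    return VOCABULARY.get(normalize(transcript))


def closest_match(transcript, cutoff=0.8):
    """Looser matching: pick the most similar vocabulary phrase, if any.

    The cutoff is an illustrative threshold, not a recommended value.
    """
    matches = difflib.get_close_matches(
        normalize(transcript), list(VOCABULARY), n=1, cutoff=cutoff
    )
    return VOCABULARY[matches[0]] if matches else None
```

In this sketch, directed_dialogue("Check Balance") is recognized because it matches a pre-determined phrase once normalized, while a freely worded request such as "what's my balance" is not. The closest_match helper tolerates small transcription slips such as "pay bil", but it is still far from a conversational interface, which would need to interpret natural language rather than compare strings.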