Glossary
Closed Captions - Encoded into the video as ancillary data. They can be turned on or off by the user.
CEA-608 - Line 21 Data Services
CEA-708 - Digital Television (DTV) Closed Captioning
Open Captions - Burned into the video image itself, so the user cannot turn them on or off.
Subtitles - The web equivalent of closed captions, often called web captions or video captions. They are text representations of dialogue and sounds, synchronized with online video content. Like traditional closed captions, they make content accessible to people who are deaf or hard of hearing so they can follow along.
SRT - SubRip Subtitle
WebVTT - Web Video Text Tracks
SCC - Scenarist Closed Captions
STL - Spruce Subtitle File
SMPTE-TT - Society of Motion Picture and Television Engineers Timed Text
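As an illustration of how two of the formats above differ, the sketch below writes the same cues as SRT and as WebVTT. SRT numbers each cue and uses a comma before the milliseconds, while WebVTT starts with a WEBVTT header line and uses a period. The function names and the (start_ms, end_ms, text) cue layout are assumptions made for this example, not part of either format.

```python
def format_timestamp(ms: int, decimal_sep: str) -> str:
    """Render milliseconds as HH:MM:SS<sep>mmm."""
    hours, rem = divmod(ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    seconds, millis = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}{decimal_sep}{millis:03d}"


def to_srt(cues) -> str:
    """cues: iterable of (start_ms, end_ms, text) tuples (hypothetical layout)."""
    blocks = []
    for index, (start_ms, end_ms, text) in enumerate(cues, start=1):
        # SRT: numbered cues, comma as the decimal separator.
        timing = f"{format_timestamp(start_ms, ',')} --> {format_timestamp(end_ms, ',')}"
        blocks.append(f"{index}\n{timing}\n{text}\n")
    return "\n".join(blocks)


def to_webvtt(cues) -> str:
    """Same cues as WebVTT: header line first, period as the decimal separator."""
    blocks = ["WEBVTT\n"]
    for start_ms, end_ms, text in cues:
        timing = f"{format_timestamp(start_ms, '.')} --> {format_timestamp(end_ms, '.')}"
        blocks.append(f"{timing}\n{text}\n")
    return "\n".join(blocks)


if __name__ == "__main__":
    demo = [(1000, 4000, "Hello and welcome."), (4500, 7250, "[applause]")]
    print(to_srt(demo))
    print(to_webvtt(demo))
```

Real caption files also carry styling, positioning, and other metadata that this sketch leaves out.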
Assistive Listening - refers to technology and devices designed to improve the hearing experience for individuals with hearing loss or difficulty in noisy environments. These systems amplify sound, reduce background noise, and enhance speech clarity, making it easier to engage in conversations or listen to audio content.
Communication Access Realtime Translation (CART) - a service that provides real-time, word-for-word transcription of spoken language into written text, making live events, meetings, and conversations accessible to individuals who are deaf or hard of hearing. A trained CART captioner listens to the speech and transcribes it instantly, displaying the text on a screen for users to follow along. This service is commonly used in educational, workplace, and public settings, offering immediate access to communication and ensuring inclusivity for individuals with hearing impairments.
Speech-to-text (STT) or Automatic Speech Recognition (ASR) - a technology that converts spoken language into written text. It uses algorithms, including machine learning and natural language processing (NLP), to recognize and transcribe speech.
Text-To-Speech (TTS) or Voice Synthesis or AI Dubbing - a technology that generates spoken language from text input. When the focus is on replicating natural-sounding human voices, especially for specific individuals or for highly realistic speech, terms like voice cloning and synthetic voice are also used. In contexts involving interactive responses, you might also hear conversational AI or speech generation.
Machine Translation (MT) - the automatic translation of text from one language to another by software. Neural machine translation (NMT) is a type of machine translation that uses artificial neural networks. Unlike traditional machine translation methods, which relied on statistical models or phrase-based systems, NMT uses deep learning techniques to improve translation accuracy and fluency by modeling context, grammar, and sentence structure.
Descriptive Audio (audio description or video description) - A narration track added to movies, TV shows, videos, and live performances that describes visual elements for people who are blind or visually impaired. The description provides details about key visual elements, like characters’ actions, facial expressions, scene changes, costumes, and other important visual information that enhances the storytelling experience.
Transcript - a written version of spoken content, capturing word-for-word what was said in audio or video recordings.
Speaker Identification or Diarization - the process of identifying and distinguishing between different speakers in an audio or video recording. In speech processing and transcription, diarization is often used to segment the audio into sections that correspond to individual speakers, essentially answering the question, “Who spoke when?”
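One way to picture diarization output is as a list of time-stamped segments, each labeled with a speaker. The sketch below assumes that representation (the Segment type and merge_turns helper are hypothetical, not any particular product's API) and merges consecutive segments from the same speaker into turns, which is the "who spoke when" view.

```python
from typing import List, NamedTuple


class Segment(NamedTuple):
    speaker: str   # label assigned by the diarizer, e.g. "A" or "Speaker 1"
    start: float   # seconds from the start of the recording
    end: float


def merge_turns(segments: List[Segment]) -> List[Segment]:
    """Merge consecutive segments from the same speaker into one turn."""
    merged: List[Segment] = []
    for seg in sorted(segments, key=lambda s: s.start):
        if merged and merged[-1].speaker == seg.speaker:
            # Same speaker keeps talking: extend the current turn.
            last = merged[-1]
            merged[-1] = last._replace(end=max(last.end, seg.end))
        else:
            merged.append(seg)
    return merged


if __name__ == "__main__":
    raw = [Segment("A", 0.0, 2.1), Segment("A", 2.1, 5.0), Segment("B", 5.0, 7.4)]
    for turn in merge_turns(raw):
        print(f"{turn.speaker}: {turn.start:.1f}s to {turn.end:.1f}s")
```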
In-Room Captions - converting speech to text then displaying that text on a screen for the audience to read in a venue.
Mobile Captions - converting speech to text then displaying that text on a personal mobile device such as phone or tablet for an individual to read.
Mobile Listening - Allowing an individual to listen to the spoken audio for a meeting on their personal mobile device in the original or translated languages.
STT Latency - The time it takes to convert from spoken words to text.
STTS Latency - The time it takes to convert from spoken words to text to translated text to spoken words: STT → MT → TTS
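The relationship between the two latency figures can be sketched as a simple sum, assuming the three stages run strictly one after another and ignoring network and buffering delays. The class and field names below are hypothetical, chosen only for this example.

```python
from dataclasses import dataclass


@dataclass
class StageLatencies:
    """Per-stage delays in milliseconds (illustrative values, not measurements)."""
    stt_ms: float  # spoken words -> text
    mt_ms: float   # text -> translated text
    tts_ms: float  # translated text -> spoken words

    @property
    def stts_ms(self) -> float:
        # Assumes sequential stages: STT -> MT -> TTS, with no overlap.
        return self.stt_ms + self.mt_ms + self.tts_ms


if __name__ == "__main__":
    sample = StageLatencies(stt_ms=800, mt_ms=200, tts_ms=500)
    print(f"STT latency:  {sample.stt_ms} ms")
    print(f"STTS latency: {sample.stts_ms} ms")
```

A pipeline that streams partial results between stages could see a lower end-to-end figure than this sum suggests.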
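Error Rate or Accuracy - Accuracy is the percentage of words that are transcribed correctly. Error rate is the percentage that are transcribed incorrectly, so a lower error rate means higher accuracy.
Caption Editing - The ability to edit closed caption or subtitle data.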
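Error rate is commonly reported as word error rate (WER): the number of word substitutions, deletions, and insertions needed to turn the transcript into a reference text, divided by the number of words in the reference. The sketch below is one minimal way to compute it with a word-level edit distance. The lowercasing and whitespace splitting are simplifying assumptions, since real scoring tools usually apply more careful text normalization.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / words in reference."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # Word-level Levenshtein distance, keeping only the previous row.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion (reference word missing)
                curr[j - 1] + 1,     # insertion (extra word in transcript)
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    wer = word_error_rate("the quick brown fox", "the quick brown dog")
    print(f"WER: {wer:.0%}")
```

Because insertions count as errors, WER can exceed 100%, so "accuracy = 100% minus WER" is only a rough approximation.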
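Serial Digital Interface (SDI) - a family of digital video interfaces for carrying uncompressed video over coaxial or fiber cable. CEA-608 and CEA-708 caption data is supported in the vertical ancillary data (VANC) space.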
Network Device Interface (NDI) - a method for moving live video streams over an IP network. It does not transfer any closed caption or subtitle data.
Web Content Accessibility Guidelines (WCAG) - Part of a series of web accessibility guidelines published by the Web Accessibility Initiative (WAI) of the World Wide Web Consortium (W3C), the main international standards organization for the Internet. They are a set of recommendations for making Web content more accessible, primarily for people with disabilities—but also for all user agents, including highly limited devices, such as mobile phones.