This article was co-written with the help of both ChatGPT and Google Bard as a demonstration of the technology discussed in this article. You can also read along with Aberdeen's President, Matt Cook in the recording below - but not really, this is Matt's voice cloned using a short clip of Matt's voice given to AI.
Artificial Intelligence (AI) has revolutionized numerous industries, and its influence on language-related technologies is particularly remarkable. In this blog post, we will explore how AI is transforming closed captioning, language translation, and even the creation of cloned voices. These advancements not only enhance accessibility and inclusion but also have far-reaching implications for communication in an increasingly globalized world.
Closed captioning is an essential feature for individuals who are deaf or hard of hearing, enabling them to access audiovisual content. Traditional closed captioning methods rely on human transcriptionists; however, AI-powered speech recognition algorithms have made significant strides in this field.
Using deep learning techniques, AI models can more accurately transcribe spoken words into text, providing real-time closed captioning. This is not up to the FCC guidelines for broadcast but is oftentimes good enough for other situations where the alternative is to have no closed captions at all. These models continuously improve their accuracy by analyzing large amounts of data and learning from diverse sources. As a result, AI has made closed captioning more accessible, enabling individuals to enjoy online videos with greater ease.
Our team is working hard to develop and launch AberScribe, our new AI transcript application powered by OpenAI, sometime in mid-2024. From any audio/video source file, the AberScribe app will create an AI-generated transcript that can be edited in our online transcript editor and exported into various caption formats. AberScribe will also have added features for creating other AI-generated resources from that final transcript. Resources like summaries, glossaries of terms, discussion questions, interactive worksheets, and many more - the possibilities are endless.
Sign up to join the waitlist and be one of our first users: https://aberdeen.io/aberscribe-wait-list/
Language barriers have long hindered effective communication between people from different linguistic backgrounds. However, AI-powered language translation has emerged as a game-changer, enabling real-time multilingual conversations and seamless understanding across different languages.
Machine Translation (MT) models, powered by AI, have made significant strides in accurately translating text from one language to another. By training on vast amounts of multilingual data, these models can understand and generate human-like translations, accounting for context and idiomatic expressions. This has empowered businesses, travelers, and individuals to engage in cross-cultural communication effortlessly.
In addition to written translation, AI is making headway in spoken language translation as well. With technologies like neural machine translation (NMT), AI systems can listen to spoken language, translate it in real-time, and produce synthesized speech in the desired language. This breakthrough holds immense potential for international conferences, tourism, and fostering cultural exchange.
The advent of AI has brought about significant advancements in speech synthesis, allowing for the creation of cloned voices that mimic the speech patterns and vocal identity of individuals. While cloned voices have sparked debates regarding ethical use, they also present exciting possibilities for personalization and accessibility.
AI-powered text-to-speech (TTS) models can analyze recorded speech data from an individual, capturing their vocal characteristics, intonations, and nuances. This data is then used to generate synthetic speech that sounds remarkably like the original speaker. This technology can be immensely beneficial for individuals with speech impairments, providing them with a voice that better aligns with their identity.
Moreover, cloned voices have applications in industries like entertainment and marketing, where celebrity voices can be replicated for endorsements or immersive experiences. However, it is crucial to navigate the ethical considerations surrounding consent and proper usage to ensure that this technology is used responsibly.
Artificial Intelligence continues to redefine the boundaries of accessibility, communication, and personalization in various domains. In the realms of closed captioning, language translation, and cloned voices, AI has made significant strides, bridging gaps, and enhancing user experiences. As these technologies continue to evolve, it is vital to strike a balance between innovation and ethical considerations, ensuring that AI is harnessed responsibly to benefit individuals and society as a whole.
Been tasked with figuring out how to implement closed captions in your video library? The process can be overwhelming at first. While evaluating closed captioning vendors, it’s good to understand the benefits of captioning, who your audience is, what to consider when it comes to quality, and what to expect from a vendor.
There are several things that an organization should consider and evaluate before choosing a closed captioning vendor. Some of the most important factors include:
Overall, closed captioning is a valuable tool that can benefit a wide range of audiences. It makes videos more accessible, engaging, and comprehensible for everyone.
By considering these factors, organizations can choose a closed captioning vendor that will meet their needs and provide a high-quality service:
Use these tips when evaluating closed captioning vendors and you’ll ensure that your videos are accessible to everyone and provide a positive viewing experience for all viewers.
In 2022, just days before winning the primary to become the Democratic candidate for the Senate in Pennsylvania, John Fetterman suffered a stroke. Like many stroke victims, he experienced a loss of function that persisted long after his recovery, including lingering auditory processing issues that made it challenging for him to understand spoken words. In interviews in the months that followed, John Fetterman relied on closed-captioning technology to help him comprehend reporters' questions and assist in his debates against his primary opponent, Dr. Mehmet Oz.
After he was elected to the US Senate, closed-captioning devices were installed both at his desk and at the front of the Senate chamber to facilitate his understanding of his colleagues as they spoke on the Senate floor. John Fetterman serves on several committees, including the Committee on Agriculture, Nutrition, and Forestry; the Committee on Banking, Housing, and Urban Affairs; the Committee on Environment and Public Works; the Joint Economic Committee; and the Special Committee on Aging. Closed-captioning has proven invaluable, benefiting both John Fetterman and his constituents in Pennsylvania, extending its utility beyond merely enabling him to watch TV at night or understand reporters.
With the assistance of closed-captioning technology, John Fetterman has been able to serve the people of Pennsylvania at the highest levels of government. During a hearing with the Senate Special Committee on Aging, Fetterman himself expressed gratitude for the transcription technology on his phone, stating, "This is a transcription service that allows me to fully participate in this meeting and engage in conversations with my children and interact with my staff." He later added, "I can't imagine if I didn't have this kind of bridge to allow me to communicate effectively with other people."
Captioning and transcription efforts extend well beyond being a mere requirement for broadcasting a program. As captioning technology continues to advance, an increasing number of individuals, like John Fetterman, will have the opportunity to participate in public life, even at the highest levels of government. They will serve others, even as transcription and captioning technology serves them.
Take a look at his setup in action here. Dedicated monitors with real-time captions displayed are becoming an increasingly popular setup at live events. Alternatively, explore the convenience of live captioning on mobile phones, making captions accessible from any seat in the venue. Either option is easily achievable — contact one of our experts to find out more.
On October 11, 2022, the Federal Communications Commission (FCC) released the latest CVAA biennial report to Congress, evaluating the current industry compliance as it pertains to Sections 255, 716, and 718 of the Communications Act of 1934. The biennial report is required by the 21st Century Communications and Video Accessibility Act (CVAA), which amended the Communications Act of 1934 to include updated requirements for ensuring the accessibility of "modern" telecommunications to people with disabilities.
FCC rules under Section 255 of the Communications Act require telecommunications equipment manufacturers and service providers to make their products and services accessible to people with disabilities. If such access is not readily achievable, manufacturers and service providers must make their devices and services compatible with third-party applications, peripheral devices, software, hardware, or consumer premises equipment commonly used by people with disabilities.
Despite major design improvements over the past two years, the report reveals that accessibility gaps still persist and that industry commenters are most concerned about equal access on video conferencing platforms. The COVID-19 pandemic has highlighted the importance of accessible video conferencing services for people with disabilities.
Zoom, BlueJeans, FaceTime, and Microsoft Teams have introduced a variety of accessibility feature enhancements, including screenreader support, customizable chat features, multi-pinning features, and “spotlighting” so that all participants know who is speaking. However, commentators have expressed concern over screen share and chat feature compatibility with screenreaders along with the platforms’ synchronous automatic captioning features.
Although many video conferencing platforms now offer meeting organizers synchronous automatic captioning to accommodate deaf and hard-of-hearing participants, the Deaf and Hard of Hearing Consumer Advocacy Organization (DHH CAO) pointed out that automated captioning sometimes produces incomplete or delayed transcriptions, and that even unavoidable slight delays in live captions may cause “cognitive overload.” Comprehension can be further hindered if a person who is deaf or hard of hearing cannot see the faces of speaking participants, for “people with hearing loss rely more on nonverbal information than their peers, and if a person misses a visual cue, they may fall behind in the conversation.”
At present, the automated captioning features on these conference platforms have an error rate of 5-10%. That’s 5 to 10 errors per 100 words spoken; with the average English speaker talking at a conversational rate of about 150 words per minute, you’re looking at the possibility of over a dozen errors a minute.
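To put that claim in concrete terms, the arithmetic is simple enough to sketch (the 150 WPM speaking rate and 5-10% error-rate figures come from the paragraph above):

```python
def errors_per_minute(words_per_minute: int, errors_per_100_words: int) -> float:
    """Expected caption errors per minute of speech at a given ASR error rate."""
    return words_per_minute * errors_per_100_words / 100

# At an average conversational rate of 150 WPM:
low = errors_per_minute(150, 5)    # 7.5 errors per minute at a 5% error rate
high = errors_per_minute(150, 10)  # 15 errors per minute at a 10% error rate
```

Even at the optimistic end of that range, a viewer relying on the captions encounters an error roughly every eight seconds.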
Earlier this year, our team put Adobe’s artificial intelligence (AI) powered speech-to-text engine to the test. We tasked our most experienced Caption Editor with using Adobe’s auto-generated transcript to create & edit the captions to meet the quality standards of the FCC and the deaf and hard of hearing community on two types of video clips: a single-speaker program and one with multiple speakers.
How did it go? Take a look: Human-generated Captions vs. Adobe Speech-to-text
Open captions and closed captions are both used to provide text-based representations of spoken dialogue or audio content in videos, but they differ in their visibility and accessibility options.
Here's the difference between closed and open captions:
Feature | Open Captions | Closed Captions
---|---|---
Visibility | Permanently embedded in the video | Separate text track that can be turned on or off |
Accessibility | Cannot be turned off | Can be turned on or off by the viewer |
Applications | Wide audiences, noisy environments | Diverse audiences, compliance with accessibility regulations |
Creation | Added during video production | Generated in real-time or embedded manually during post-production or uploaded as a sidecar file |
Both open and closed captions serve the purpose of making videos accessible to individuals who are deaf or hard of hearing, those who are learning a new language, or those who prefer to read the text alongside the audio.
The choice between open or closed captions depends on the specific requirements and preferences of the content creators and the target audience.
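To make the “sidecar file” row in the table above concrete: an SRT sidecar is just a plain-text list of numbered cues, each with a start/end timestamp and the caption text. A minimal sketch in Python (the cue text and timings here are invented for illustration):

```python
def srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp: HH:MM:SS,mmm."""
    ms = round(seconds * 1000)
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1_000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

def srt_cue(index: int, start: float, end: float, text: str) -> str:
    """Build one numbered SRT cue block."""
    return f"{index}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text}\n"

# Two illustrative cues, joined into a sidecar file body:
sidecar = "\n".join([
    srt_cue(1, 0.0, 2.5, "Hello, and welcome."),
    srt_cue(2, 2.5, 5.0, "Today we're talking about captions."),
])
```

Because the captions live in this separate file rather than in the video frames themselves, the player can turn them on or off, which is exactly the distinction the table draws.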
In the July ‘21 release of Premiere Pro, Adobe introduced its artificial intelligence (AI) powered speech-to-text engine to help creators make their content more accessible to their audiences. Their extensive toolset allows their users to edit, stylize, and export captions in all supported formats straight out of the sequence timeline of a Premiere Pro project. A 3-step process of auto-transcribing, generating, and stylizing captions all within the platform already familiar to its users delivers a seamless experience from beginning to end. But how is the accuracy of the final product?
Today, AI captions, at their best, have an error rate of 5-10% - much improved over the roughly 80% accuracy we saw just a few years ago. High accuracy is crucial for the deaf and hard-of-hearing audience, as each error adds to the possibility of confusing the message. To protect all audiences that rely on captioning to understand programming on television, the Federal Communications Commission (FCC) set a detailed list of quality standards that all captions must meet to be acceptable for broadcast back in 2015. Preceding those standards, the Described and Captioned Media Program (DCMP) published its Captioning Key manual over 20 years ago, and it has since been a valuable reference for captioning both entertainment and educational media targeted to audiences of all age groups. Simply having captions present on your content isn’t enough; they need to be accurate and best replicate the experience for all audiences.
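Error rates like the 5-10% cited above are conventionally computed as word error rate (WER): the word-level edit distance between a reference transcript and the ASR output, divided by the length of the reference. A minimal sketch (the example strings are invented):

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance divided by reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # Dynamic-programming table of edit distances between prefixes
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # deleting i reference words
    for j in range(len(hyp) + 1):
        d[0][j] = j  # inserting j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            substitution = d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            d[i][j] = min(substitution, d[i - 1][j] + 1, d[i][j - 1] + 1)
    return d[len(ref)][len(hyp)] / len(ref)

# An 8-word reference cut down to 2 words: 6 deletions -> WER of 0.75
score = wer("so it's so it's MVP of the year", "so it's")
```

Note that WER counts every substitution, insertion, and deletion equally; it says nothing about how badly a given error distorts the meaning, which is one reason broadcast standards go beyond a single accuracy number.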
Adobe’s speech-to-text engine has been one of the most impressive that our team has seen to date, so we decided to take a deeper look at it and run some tests. We tasked our most experienced Caption Editor with using Adobe’s auto-generated transcript to create & edit the captions to meet the quality standards of the FCC and the deaf and hard of hearing community on two types of video clips: a single-speaker program and one with multiple speakers. Our editor used our Pop-on Plus+ caption product for these examples, which are our middle-tier quality captions that fulfill all quality standard requirements but are not always 100% free of errors.
Did using Adobe’s speech-to-text save time, or did it create more work in the editing process than needed? Here’s how it went…
In-depth comparison documents that evaluate the captions cell-by-cell are available for download here:
In this example, we used the perfect scenario for AI: clear audio, a single speaker at an optimal words-per-minute (WPM) speaking rate, and no sound effects or music.
The captions contained the following issues that would need to be corrected by Caption Editor:
Here’s the clip with Adobe’s speech-to-text captions overlayed on the top half of the video, and ours on the bottom half.
For the next clip, we went with a more realistic example of television programming where there are multiple speakers, an area where AI is known to struggle and has difficulties identifying the speakers. This clip also features someone with a pronounced accent, commentators speaking over one another, and proper names of athletes – all of which our editors take the time to research and understand.
The same errors detailed in the single-speaker example are present throughout, among the other difficulties we expected it to have. In fact, there were so many errors that our editor was unable to use the transcript from Adobe and started from the beginning using our own workflow.
Here’s a sample of the first 9 cells of captions with what Adobe transcribes in the first column, notes from our Caption Editor, and how it should look.
Adobe’s Automated SRT Caption File | Issue | Formatted by Aberdeen
---|---|---
something you are never seen in your life, correct? | No speaker ID. | (Pedro Martinez) It's something you have never seen in your life,
 | “Correct” is spoken by new speaker. | (Matt Vasgersian) Correct!
So it's. | Missing text. | So it's--so it's MVP of the year!
So we're all watching something different. OK | | (Pedro) We're all watching something different.
He gets the MVP. | | Okay, he gets the MVP.
I'd be better off. | Completely misunderstood music lyrics. | ♪ Happy birthday to you ♪
Oh, you, you guys. | | (Matt) You guys.
Let me up here to dove into the opening night against the Hall of Fame. | Merged multiple sentences together. | Just left me up here to die. You left me up here to die against the hall of famer.
Take a look at the clip. Again, with Adobe's speech-to-text on the top and Aberdeen on the bottom.
In-depth comparison documents that evaluate the captions cell-by-cell are available for download here:
Overall, the quality of the auto-generated captions exceeded expectations, and we found them to be in the top tier of speech-recognition engines available. The timing and punctuation were particularly impressive. However, when doing a true comparison to the captioning work that we would consider acceptable, AI does not meet Aberdeen’s broadcast quality standard.
Aberdeen's post-production Caption Editors are detail-oriented, grammar-savvy, and always striving to portray every element of the program with 100% accuracy so that the viewer misses nothing. For our most experienced Caption Editor, editing and correcting the single-speaker clip took a 5:1 ratio in time; meaning, for every minute of video, it took 5 minutes to clean up the transcript and captions. Assuming your team is educated in the proper timing of caption cells, line breaks, and grammar, a 30-minute program may take over 2.5 hours to bring up to standards with a usable transcript. In the second example, the transcript was unusable and would have taken more time to clean up than it did to transcribe from scratch. Double that timeline now.
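The arithmetic behind that estimate is worth spelling out, since it scales linearly with your library (the 5:1 ratio is the figure from our single-speaker test above):

```python
def cleanup_hours(program_minutes: float, edit_ratio: float = 5.0) -> float:
    """Editing time implied by an N:1 cleanup ratio
    (N minutes of editing per minute of video)."""
    return program_minutes * edit_ratio / 60

half_hour_show = cleanup_hours(30)      # 2.5 hours for a 30-minute program
unusable_case = cleanup_hours(30) * 2   # roughly double when the AI transcript
                                        # must be thrown out and redone
```

Multiply that by the number of programs in a season and the staffing question in the next paragraph answers itself.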
Consider all of the above when using this service. Do you have the time and resources to train your staff to know how to edit auto-generated captions and get them up to the appropriate standards? How challenging may your content be for the AI? Whenever and however you make the choice, make sure you deliver the best possible experience to your entire audience.
This article is current as of February 4th, 2022.
A few months ago, Zoom announced that auto-generated captions (also known as live transcription) were now available for all Zoom meeting accounts. The development has been a long-awaited feature for the deaf and hard-of-hearing community.
As popular and ubiquitous as Zoom has become, it can be overwhelming to understand its multiple meeting features and options – especially with regard to closed captioning. Here at Aberdeen Broadcast Services, we offer live captioning services with our team of highly trained, experienced captioners with the largest known dictionaries in the industry. CART (Communication Access Realtime Translation) captioning is still considered the gold standard of captioning (See related post: FranklinCovey Recognizes the Human Advantage in Captioning). Our team at Aberdeen strives to go above and beyond expectations with exceptional captioning and customer service.
Whether you choose to enable Zoom’s artificial intelligence (AI) transcription feature or integrate a 3rd-party service, like Aberdeen Broadcast Services, the following steps will help ensure you’re properly setting up your event for success.
To get started, you'll need to enable closed captioning in your Zoom account settings.
Scroll down to the “Closed captioning” options.
In the top right, enable closed captions by toggling the button from grey to blue to “Allow host to type closed captions or assign a participant/3rd-party service to add closed captions.”
Below is a detailed description of the three additional closed captioning options in the settings...
This feature enables a 3rd-party closed captioning service, such as Aberdeen Broadcast Services, to caption your Zoom meeting or webinar using captioning software. The captions from a 3rd-party service are integrated into the Zoom meeting via a caption URL or API token that sends its captions to Zoom. For a 3rd-party service such as Aberdeen to provide captions within Zoom, this feature must be enabled.
As mentioned earlier in this post, auto-generated captions or AI captions became available to all Zoom users in October 2021. Zoom refers to auto-generated captions as its live transcription feature, which is powered by automatic speech recognition (ASR) and artificial intelligence (AI) technology. While not as accurate, ASR is an acceptable way to provide live captions for your Zoom event if you are not able to secure a live captioner. If you will be having a live captioner through a 3rd-party service in your meeting, do NOT check “Allow live transcription service to transcribe meeting automatically.”
Unless you expect to use Zoom’s AI live transcription for most of your meetings, it is best to uncheck or disable live transcription as Zoom’s AI auto-generated captions will override 3rd-party captions in a meeting if live transcription is enabled.
This setting gives the audience an additional option to view what is being transcribed during your Zoom meeting or webinar. In addition to viewing captions as subtitles at the bottom of the screen, users will be able to view the full transcript on the right side of the meeting.
The meeting organizer or host can control who is permitted to save a full transcript of the closed captions during a meeting. Enabling the Save Captions feature grants that access to the entire participant list in a meeting.
Transcript options from 3rd-party services may vary. At Aberdeen Broadcast Services, we provide full transcripts in a variety of formats to fit your live event or post-production needs. For more information, please see our list of captioning exports or contact us.
Once the webinar or meeting is live, the individual assigned as the meeting host can acquire the caption URL or API token.
As the host, copy the API token by clicking on the Closed Caption or Live Transcript button on the bottom of the screen and selecting Copy the API token, which will save the token to your clipboard.
By copying the API token, you will not need to assign yourself or a participant to type. Send the API token to your closed captioning provider to integrate captions from a 3rd-party service into your Zoom meeting. We ask that clients provide the API token at least 20 minutes before an event (and no earlier than 24 hours) to avoid any captioning issues.
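Behind the scenes, that “API token” is a URL that the captioning software sends caption text to, with a sequence number attached to each request so Zoom can order the captions. The sketch below shows how a 3rd-party integration might build such a request; the `seq` and `lang` parameter names reflect Zoom's closed-caption API as we understand it, and the token URL shown is invented for illustration:

```python
from urllib.parse import urlencode

def caption_post_url(api_token_url: str, seq: int, lang: str = "en-US") -> str:
    """Append the sequence number and caption language to the token URL.
    (`seq` and `lang` are our understanding of Zoom's expected parameters.)"""
    separator = "&" if "?" in api_token_url else "?"
    return f"{api_token_url}{separator}{urlencode({'seq': seq, 'lang': lang})}"

# Hypothetical token URL copied from the Zoom meeting:
token = "https://wmcc.zoom.us/closedcaption?id=123&ns=abc"
url = caption_post_url(token, seq=1)
# The captioner's software would then POST each caption line to `url`,
# e.g. with the `requests` library:
#   requests.post(url, data="Hello, and welcome.".encode("utf-8"))
```

This is why the token must reach your captioning provider before the event starts: each caption line is pushed to Zoom over this URL in real time, in sequence.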
Once the API token has been activated within your captioning service, the captioner will be able to test captions from their captioning software.
A notification in the Zoom meeting should pop up at the top saying “Live Transcription (Closed Captioning) has been enabled.” and the Live Transcript or Closed Caption button at the bottom of the screen will appear for the audience. Viewers can now choose Show Subtitle to view the captions.
Viewers will be able to adjust the size of captions by clicking on Subtitle Settings...
Yes! Captioning multiple breakout rooms occurring at the same time is possible using the caption API token to integrate with a 3rd-party, such as Aberdeen Broadcast Services. Zoom's AI live transcription option currently does not support multiple breakout rooms, which is why it is important to consult with live captioning experts to make that happen. Contact us to learn more about how it works.
Enjoy this post? Email sales@abercap.com for more information or feedback. We look forward to hearing your thoughts!
Closed captioning is an essential aspect of modern media consumption, bridging the gap of accessibility and inclusivity for diverse audiences. Yet, despite its widespread use, misconceptions about closed captioning persist. In this article, we delve into the most prevalent myths surrounding this invaluable feature, shedding light on the truth behind closed captioning's capabilities, impact, and indispensable role in enhancing the way we interact with video content.
Let’s debunk these common misunderstandings about closed captioning and gain a fresh perspective on the far-reaching importance of closed captioning in today's digital landscape.
While closed captions are crucial for people with hearing impairments, they benefit a much broader audience. They are also helpful for people learning a new language, those in noisy environments, individuals with attention or cognitive challenges, and viewers who prefer to watch videos silently.
While there are automatic captioning tools, they are not always accurate, especially with complex content, background noise, or accents. Human involvement is often necessary to ensure high-quality and accurate captions.
Some formats, like SCC, support positioning and allow captions to appear in different locations. However, most platforms use standard positioning at the bottom of the screen.
While it's possible to add closed captions after video production, it's more efficient and cost-effective to incorporate captioning during the production process. Integrating captions during editing ensures a seamless viewing experience.
Closed captioning is essential for television and films, but it's also used in various other video content, including online videos, educational videos, social media clips, webinars, and live streams.
Different platforms and devices may have varying requirements for closed caption formats and display styles. To ensure accessibility and optimal viewing experience, captions may need adjustments based on the target platform.
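As one concrete example of a platform-specific adjustment, converting an SRT caption file to WebVTT (the sidecar format most web players expect) is mostly a matter of adding a header and changing the millisecond separator in timestamps. A minimal sketch, with an invented cue:

```python
def srt_to_vtt(srt_text: str) -> str:
    """Convert SRT caption text to WebVTT: prepend the WEBVTT header and
    change the millisecond separator on timestamp lines from ',' to '.'."""
    lines = []
    for line in srt_text.splitlines():
        if "-->" in line:  # only timestamp lines, so cue text commas survive
            line = line.replace(",", ".")
        lines.append(line)
    return "WEBVTT\n\n" + "\n".join(lines)

srt = "1\n00:00:00,000 --> 00:00:02,500\nHello, and welcome.\n"
vtt = srt_to_vtt(srt)
```

Real conversions can involve more (styling cues, positioning, encoding), but even this small difference is enough to make an unconverted file fail on some platforms.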
Captioning standards and regulations vary between countries, and it's essential to comply with the specific accessibility laws and guidelines of the target audience's location.
While manual captioning can be time-consuming, there are cost-effective solutions available, including automatic captioning and professional captioning services. Moreover, the benefits of accessibility and broader audience reach often outweigh the investment.
In summary, closed captioning is a vital tool for enhancing accessibility and user experience in videos. Understanding the realities of closed captioning helps ensure that content creators and distributors make informed decisions to improve inclusivity and reach a broader audience.
At Aberdeen, we know the success of our company and our clients is dependent on the collective contributions and positive collaboration of all our team members. When we are all working toward the same goal, it’s important that each member of our team applies their areas of expertise and skills to achieve that goal. Successful teamwork balances a team’s abilities with the needs of the company, resulting in a more collaborative culture – and we’re all about that!
This carries over to our relationship with our clients. Our mantra has always been that when we come alongside your business, we’d like you to feel like we are an extension of your team. Think of our entire team as an additional member of yours that is always available – we’re the one that comes in early and stays late.
We recently took on a project from a long-time client and with all teams involved embracing that team-player mentality, communication was a breeze and everyone played a key role in coordinating a successful launch of our AberFast Lite service.
Here’s our President, Matt Cook, talking about what the value means to him along with a snippet from a discussion with our client on the project, John McKinnon of In Touch Ministries, where they discuss how the value of team players was the driving force behind everyone reaching their goal.
The full case study and interview with In Touch Ministries’ John McKinnon can be found here: https://aberdeen.io/blog/2021/08/25/case-study-satellite-replacement-with-terrestrial-delivery/.
This approach to tackling projects together as a team is a huge factor behind the success and excellence we bring to our service every day. We look forward to working with you and your team.
In this video, Becky Isaacs, Executive Vice President of Aberdeen Broadcast Services, interviews Rob and David from Arlington County, Virginia. The two have been partners in live captioning for the past decade. Rob serves as the executive producer for Arlington County's government cable channel, which encompasses both live streaming and traditional cable TV programming. Meanwhile, David manages engineering, provides production assistance, and oversees live captioning.
Arlington TV is the visual communications channel for Arlington County Government and serves a county of about 230,000 people. Described as a hyper-local version of C-SPAN, Arlington TV broadcasts county board meetings, commission meetings, talk shows, and community town hall meetings. With a mixture of cord-cutters and cable television subscribers in the community, it’s important that their programming is available across multiple outlets – Comcast Xfinity, Verizon FiOS, and online platforms like Facebook, YouTube, and Granicus.
Rob and David describe their technical setup, which involves connecting Aberdeen's live captioners to their audio and video systems via analog phone lines and the EGHT 1430 Ross OpenGear card, with Granicus handling the video feed and captions. This setup ensures that live captions are integrated into their broadcast and streaming content. They share advice for others in a similar position, emphasizing the importance of having a reliable setup and the benefits of human captioners for context and troubleshooting. They appreciate Aberdeen's flexibility and availability during unpredictable meetings, demonstrating a strong working relationship.
Arlington County began captioning due to a Justice Department requirement for ADA compliance. They had issues with smaller captioning companies in the beginning but eventually found Aberdeen back in 2009. Rob and David expressed their deep appreciation for Aberdeen's captioning services. They emphasized the absence of problems and the seamless captioning experience – Rob likened this achievement to an Emmy or Academy Award, highlighting that they haven't received complaints in many years, a testament to Aberdeen's reliability. David is particularly impressed by Aberdeen's accuracy and speed in handling challenging speakers, including those with accents or speech impediments. Aberdeen's team, including captioners and account managers, ensures that captioning support is available regardless of the time, accommodating early morning and late-night meetings.
Rob and David acknowledge that captioning often goes unnoticed when it works seamlessly, and they appreciate how Aberdeen's human captioners excel in understanding context, resulting in more accurate and contextually relevant captions. This focus on enhancing the viewer experience underscores the value of their partnership with Aberdeen.