Video Conferencing Caption Quality Addressed in the Latest CVAA Report

Photo of a conference call on Zoom

On October 11, 2022, the Federal Communications Commission (FCC) released its latest CVAA biennial report to Congress, which evaluates industry compliance with Sections 255, 716, and 718 of the Communications Act of 1934. The report is required by the 21st Century Communications and Video Accessibility Act (CVAA), which amended the Communications Act of 1934 with updated requirements for making "modern" telecommunications accessible to people with disabilities.

FCC rules under Section 255 of the Communications Act require telecommunications equipment manufacturers and service providers to make their products and services accessible to people with disabilities. Where such access is not readily achievable, manufacturers and service providers must instead make their devices and services compatible with third-party applications, peripheral devices, software, hardware, or customer premises equipment commonly used by people with disabilities.

Accessibility Barriers

Despite major design improvements over the past two years, the report finds that accessibility gaps persist and that commenters are most concerned about equal access on video conferencing platforms. The COVID-19 pandemic has underscored how important accessible video conferencing services are for people with disabilities.

Zoom, BlueJeans, FaceTime, and Microsoft Teams have introduced a variety of accessibility enhancements, including screen reader support, customizable chat features, multi-pinning, and "spotlighting" so that all participants know who is speaking. However, commenters have raised concerns about how well screen sharing and chat features work with screen readers, as well as about the platforms' synchronous automatic captioning.

Although many video conferencing platforms now offer meeting organizers synchronous automatic captioning to accommodate deaf and hard-of-hearing participants, the Deaf and Hard of Hearing Consumer Advocacy Organizations (DHH CAO) pointed out that automated captioning sometimes produces incomplete or delayed transcriptions. Even if slight delays in live captions cannot be avoided, these delays may cause "cognitive overload." Comprehension suffers further when a person who is deaf or hard of hearing cannot see the faces of the participants who are speaking, because "people with hearing loss rely more on nonverbal information than their peers, and if a person misses a visual cue, they may fall behind in the conversation."

Automated vs. Human-generated Captions

At present, the automated captioning features on these conferencing platforms have an error rate of 5-10%. That's 5 to 10 errors per 100 words spoken. Since the average English speaker talks at about 150 words per minute, that works out to roughly 7 to 15 errors every minute, or more than a dozen at the high end of the range.
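To make the arithmetic concrete, here is a minimal Python sketch that converts an error rate and a speaking rate into expected errors per minute. It assumes errors are spread evenly across the words spoken, which is a simplification; the 5-10% range and the 150 words-per-minute figure come from the paragraph above, and the `errors_per_minute` helper is a hypothetical name used only for illustration.

```python
# Hypothetical helper: expected caption errors per minute of speech,
# assuming errors are distributed evenly across all words spoken.
def errors_per_minute(error_rate: float, words_per_minute: float = 150) -> float:
    """Return error_rate (a fraction, e.g. 0.05 for 5%) times the speaking rate."""
    if not 0 <= error_rate <= 1:
        raise ValueError("error_rate must be a fraction between 0 and 1")
    return error_rate * words_per_minute


# Evaluate both ends of the 5-10% range cited above at 150 words per minute.
for rate in (0.05, 0.10):
    print(f"{rate:.0%} error rate -> {errors_per_minute(rate):.1f} errors per minute")
```

At 150 words per minute this prints 7.5 errors per minute for a 5% error rate and 15.0 for a 10% error rate, so "more than a dozen" applies to the upper end of the range.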

Earlier this year, our team put Adobe's artificial intelligence (AI) powered speech-to-text engine to the test. We tasked our most experienced Caption Editor with using Adobe's auto-generated transcript to create and edit captions that meet the quality standards of the FCC and the deaf and hard-of-hearing community for two types of video clips: a single-speaker program and one with multiple speakers.

How did it go? Take a look: Human-generated Captions vs. Adobe Speech-to-text