Winter Olympics Sign Language Broadcast Digital Human: The Perfect Combination of Technology and Humanity!
There are over 27 million hearing-impaired people in China. Like everyone else, they have an enormous demand for education, social interaction, entertainment, and access to information. As a mass medium, TV is one of the most popular communication channels. However, traditional human sign language interpretation involves a huge workload, and it is extremely difficult for interpreters to keep pace with the host. To allow hearing-impaired viewers to enjoy the Winter Olympics equally and conveniently, BRTV launched an intelligent sign language broadcast digital human with technical support from LUSTER and Zhipu AI. In just over three months, she studied a corpus of nearly 100,000 sign language samples, reaching a translation accuracy of up to 90%. In programs such as Beijing News and Good Morning Beijing, this digital human delivers special Winter Olympics sign language broadcasts for the hearing-impaired community.
But how did she do it? Ordinarily, it takes more than two years to become proficient in sign language: its movements carry complex expressions, and its word order differs from that of spoken language.
All of this is made possible by advances in intelligent digital human technology.
1. Highly accurate multimodal sign language corpus collection solution - building 100,000 high-quality Winter Olympics sign language corpus entries
In recent years, the construction of AI systems has focused on the algorithm and application layers, while the data layer has received far less attention; for digital-human-related industries in particular, the quantity, quality, and open availability of underlying databases remain clearly insufficient. Existing domestic sign language corpora are small and mostly based on two-dimensional images and videos, which cannot meet the needs of AI training. Sign language word order differs greatly from that of Chinese, dialect variation is complex, and information must be conveyed through expressions, mouth patterns, and actions. Therefore, in addition to traditional two-dimensional image and video acquisition, collecting three-dimensional body movement and expression data and expressing them as structured parameters is essential. Since sign language is a comprehensive information carrier combining body, gestures, and expressions, building a sign language corpus has an especially clear need for 3D motion capture.
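To make the idea of "structured parameter expression" concrete, here is a minimal sketch of what one multimodal corpus entry might look like. The field names, dimensions (72 joint-rotation values, 52 face blendshape coefficients), and file paths are illustrative assumptions, not the actual corpus schema:

```python
from dataclasses import dataclass
from typing import List

# Hypothetical sketch of one multimodal corpus entry: alongside the
# traditional 2-D video reference, it stores structured 3-D motion
# and expression parameters so the data can directly train AI models.
@dataclass
class SignCorpusEntry:
    chinese_text: str                   # source sentence in Chinese
    gloss_sequence: List[str]           # sign-language word (gloss) order
    video_path: str                     # traditional 2-D recording
    joint_rotations: List[List[float]]  # per-frame 3-D skeleton rotations
    face_blendshapes: List[List[float]] # per-frame expression coefficients
    mouth_pattern: List[str]            # mouth shapes aligned to glosses

entry = SignCorpusEntry(
    chinese_text="北京欢迎你",
    gloss_sequence=["北京", "欢迎", "你"],
    video_path="corpus/clip_0001.mp4",
    joint_rotations=[[0.0] * 72],       # one placeholder frame
    face_blendshapes=[[0.0] * 52],
    mouth_pattern=["bei", "huan", "ni"],
)
print(len(entry.gloss_sequence))  # 3
```

A structured record like this is what distinguishes a 3D-capture corpus from a plain video archive: every frame carries machine-readable pose and expression parameters rather than pixels alone.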
2. AI sign language digital brain - intelligent extraction of key semantics and automatic generation of sign language sequences with a high accuracy rate
The sign language digital brain is a computer system that imitates how a hearing-impaired person's brain processes language. Built on the "Wudao 2.0" super-large-scale artificial intelligence model, it converts Chinese text into a sequence of sign language words, mainly through two components: a Chinese semantic distillation model and a fast AI sign-word segmentation and encoding algorithm. Based on the Winter Olympics sign language corpus, the segmentation algorithm divides the distilled Chinese text into the corresponding sequence of sign language words, which serves as the input for the digital human's expression.
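The two stages described above can be sketched as a toy pipeline. This is not the production "Wudao 2.0" system: the stop-word list, gloss lexicon, and the simple filtering logic are illustrative assumptions standing in for the learned distillation and segmentation models:

```python
# Toy sketch of the two stages: semantic distillation (dropping
# function words that carry no sign of their own), then mapping the
# remaining tokens onto known sign-language glosses.
STOP_WORDS = {"的", "了", "吗", "呢"}       # function words dropped when signing
GLOSS_LEXICON = {"我们", "观看", "冬奥会"}   # illustrative gloss vocabulary

def distill(tokens):
    """Semantic distillation: keep only content-bearing tokens."""
    return [t for t in tokens if t not in STOP_WORDS]

def to_gloss_sequence(tokens):
    """Sign-word segmentation: keep tokens with a corresponding sign.
    The real system would also reorder tokens into sign language word
    order; that learned reordering step is omitted here."""
    return [t for t in tokens if t in GLOSS_LEXICON]

tokens = ["我们", "观看", "了", "冬奥会"]    # "We watched the Winter Olympics"
print(to_gloss_sequence(distill(tokens)))  # ['我们', '观看', '冬奥会']
```

The essential point the sketch captures is that the Chinese sentence is first compressed to its semantic core and only then re-expressed in the sign language vocabulary, which is why a plain word-for-word mapping would not work.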
3. Cross-modal anthropomorphic generation algorithm - converting sign language sequences into corresponding anthropomorphic actions, gestures, and expressions that are natural and easy to understand
The digital human is the carrier and presentation form of the Winter Olympics sign language broadcast. LUSTER's high-precision, realistic digital human full-process production solution enables one-click digital modeling and faithfully reproduces real human hair and skin down to pore-level detail, making the avatar more realistic and approachable. The cross-modal anthropomorphic generation algorithm takes the sign language word sequence and generates the corresponding motion information to drive the digital human model to perform the matching actions, gestures, and expressions. The algorithm accounts for both the coherence of adjacent gestures over short time spans and the semantic integrity and consistency of gestures over long time spans, supports driving at normal speech speed, and produces natural, smooth movements and expressions.
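One simple way to picture the "coherence of short-time adjacent gestures" is clip retrieval with crossfade blending: each gloss maps to a short motion clip of per-frame pose vectors, and the tail of one clip is blended into the head of the next. This is a minimal sketch under that assumption; the production algorithm is a learned generative model, and the clip data and blend length below are made up:

```python
# Illustrative sketch: map each gloss to a motion clip and linearly
# crossfade adjacent clips so neighbouring gestures transition smoothly.

def crossfade(prev_clip, next_clip, overlap=2):
    """Blend the last `overlap` frames of prev_clip into the first
    `overlap` frames of next_clip with linear weights."""
    blended = []
    for i in range(overlap):
        w = (i + 1) / (overlap + 1)          # weight ramps toward next_clip
        a, b = prev_clip[-overlap + i], next_clip[i]
        blended.append([(1 - w) * x + w * y for x, y in zip(a, b)])
    return prev_clip[:-overlap] + blended + next_clip[overlap:]

def glosses_to_motion(glosses, clip_bank):
    """Concatenate the motion clips for a gloss sequence with crossfades."""
    frames = clip_bank[glosses[0]]
    for g in glosses[1:]:
        frames = crossfade(frames, clip_bank[g])
    return frames

# Tiny 1-D "pose" clips for two glosses (real poses are full skeletons).
clip_bank = {
    "欢迎": [[0.0], [0.2], [0.4]],
    "你":   [[1.0], [0.8], [0.6]],
}
motion = glosses_to_motion(["欢迎", "你"], clip_bank)
print(len(motion))  # 4 frames: two 3-frame clips merged over a 2-frame overlap
```

Long-time semantic consistency, by contrast, cannot be handled frame-locally like this; it requires the generator to condition on the whole sentence, which is where the cross-modal model goes beyond simple blending.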
Technology makes life better!
The Winter Olympics sign language broadcast digital human allows all communities to enjoy the event equally, conveniently, and without barriers. In the future, this technology will provide more convenient services in more places, and LUSTER will continue to promote industrial innovation and give back to the community with better services.