OpenAI has developed a ‘Voice Engine’ that can clone a voice from an audio sample of just 15 seconds.
The tool is still in closed testing and could help people regain voices they have lost. However, the company acknowledges that it could also be misused.
OpenAI, the company behind the generative AI tool ChatGPT, has introduced a new voice cloning technology called “Voice Engine.” The audio model reproduces natural speech patterns, including intonation, from a relatively small amount of original audio.
“It is notable that a small model with a single 15-second sample can create emotive and realistic voices,” the company wrote in a blog post on Friday.
By comparison, the AI voice platform ElevenLabs offers an instant voice cloning tool that requires at least one minute of sample audio. For its professional service, roughly ten minutes of continuous speech are needed for optimal results.
The company presented several examples of what the technology can do. In one, a vascular brain tumor had severely impaired a young patient’s speech, and OpenAI cloned her voice from an older recording she had made for a school assignment. OpenAI shared a sample of how she sounds today.
OpenAI worked with Lifespan, a nonprofit health system affiliated with Brown University’s medical school, and with the developers of Livox, an “alternative communication app” for people with disabilities. The team used the recording the woman had made for a school presentation.
Voice Engine then provided real-time text-to-speech, allowing the patient to communicate fluently in her own voice.
OpenAI also demonstrated how HeyGen uses its technology to translate uploaded speech from one language into another while keeping it natural-sounding.
Voice Engine, reportedly first developed in late 2022, already powers the preset voices available in ChatGPT’s Voice and Read Aloud features and in OpenAI’s text-to-speech API. Citing recent developments, the company says it is proceeding cautiously before a wider release.
“We hope to start a dialogue on the responsible deployment of synthetic voices, and how society can adapt to these new capabilities,” OpenAI wrote, alluding to widespread concern over “deepfakes.” A growing number of private citizens, government officials, and celebrities are having their voices imitated for malicious purposes, including political campaigns, fake advertisements, and outright crime. U.S. President Joe Biden has called for stronger protections against the malicious use of AI voice impersonation.
Meta, for its part, said last summer that “potential risks of misuse” had led it to hold back the public release of its AI voice tool, Voicebox.
“In line with our approach to AI safety and our voluntary commitments, we are choosing to preview but not widely release this technology at this time,” according to OpenAI.
Even before any public release, OpenAI is placing restrictions on Voice Engine, including a list of prominent figures whose voices it will not replicate.
“We believe that any broad deployment of synthetic voice technology should be accompanied by voice authentication experiences that verify that the original speaker is knowingly adding their voice to the service and a no-go voice list that detects and prevents the creation of voices that are too similar to prominent figures,” according to OpenAI.
Partners testing Voice Engine today have agreed to OpenAI’s usage policies, which prohibit impersonating another person or organization without consent. The company also requires explicit and informed consent from the original speaker, and it does not allow developers to build ways for individual users to create their own voices.
The blog post states, “We will make a more informed decision regarding whether and how to deploy this technology at scale, based on these discussions and the outcomes of these small-scale tests.”
Alongside Voice Engine, OpenAI is developing several other projects. According to CEO Sam Altman, the company is preparing to launch GPT-5 this year. It has also showcased its generative video tool, Sora, which it says surpasses models such as Pika, Stable Video Diffusion, and Runway ML in technical sophistication.
For now, OpenAI limits access to Sora to “red teamers,” who probe it for potential abuse before any wider release.
Voice Engine could outperform other voice cloning tools, such as those from Meta, ElevenLabs, and WellSaid Labs, as well as open-source models like RVC.
OpenAI is also working on a secretive project known only by its codename, Q*. Sam Altman declined to share details but said the research team is focused on developing techniques that improve the reasoning abilities of AI.