July 22, 2024

Valley Post


OpenAI's AI voice model needs just a 15-second sample to work – Artificial Intelligence


The model, called Voice Engine, has been in development since late 2022 and is what ChatGPT's read-aloud feature is based on.

OpenAI is providing limited access to a text-to-speech platform it has developed. Called Voice Engine, it can generate a synthetic voice from a recorded clip of just 15 seconds. That voice can then read written prompts aloud in the speaker's own language or in a number of other languages. OpenAI said in a blog post that these small-scale deployments are helping to inform its approach, its safeguards, and its thinking about how Voice Engine can be used for good across a range of fields.

Companies with access to the platform include edtech company Age of Learning, visual storytelling platform HeyGen, frontline health software maker Dimagi, AI communication app maker Livox, and the health system Lifespan.

In the following samples, published by OpenAI, one can hear how Age of Learning uses the technology to generate pre-scripted voice-over content and to read out “real-time custom responses” written by GPT-4.

First, the reference audio clip in English, which serves as the basis:


Here are the three AI-generated audio clips, based on the reference clip above:




OpenAI said it began developing Voice Engine in late 2022, and that the technology already powers the preset voices in its text-to-speech API as well as ChatGPT's read-aloud feature. In an interview with TechCrunch, Jeff Harris, a member of the Voice Engine team, said the model was trained using “a combination of publicly available data and legally licensed materials.” OpenAI said the model will initially be available to only about 10 companies.


Text-to-audio generation is an area of artificial intelligence that is still developing. Most work so far has focused on instrumental or natural sounds. Far fewer efforts have targeted voice generation, partly because of the concerns OpenAI itself has raised about how such material might be used.

Meanwhile, the US government is trying to limit the harmful use of AI voice-generating technology. Just last month, the Federal Communications Commission banned robocalls that use voices generated by artificial intelligence, after complaints that citizens were receiving phone calls in which a cloned voice of President Biden urged voters not to go to the polls.

According to OpenAI, its partners have agreed to abide by the company's usage policies, which prohibit using voice generation to impersonate individuals or organizations without their prior consent. Partners must also obtain “explicit and informed consent” from the original speaker, must not build ways for individual users to create their own voices, and must inform listeners that the voices they hear are generated by artificial intelligence. OpenAI has also embedded watermarks in the audio clips so that it can trace their origin and monitor where the audio is used.

OpenAI has proposed several steps that it believes could limit the risks surrounding such tools: phasing out voice-based authentication for access to bank accounts, adopting policies that protect the use of people's voices in AI models, educating the public more broadly about malicious uses of AI, and developing systems to track where AI-generated content is used.
