Aflorithmic: “The potential of synthetic audio is similar to digital photography”

Aflorithmic, a London/Barcelona-based technology company, is pushing the boundaries of Audio-As-A-Service. It provides a platform for fully automated, scalable audio production using synthetic media, voice cloning and audio mastering, with the resulting audio deliverable to any platform, such as websites, mobile apps or smart speakers.

With it, Aflorithmic claims, anyone can create polished audio, from simple text-to-speech to music and complex audio engineering, without any prior experience in audio engineering.

For Dr. Timo Kunz, co-founder and CEO of Aflorithmic, the potential of synthetic audio is similar to that of digital photography: “In 2018, approximately 1,000 billion photos were taken. That’s more than 2.7 billion a day! It is estimated that ten percent of all photos ever taken were taken in the last twelve months. We expect a similar explosion in the production of synthetic audio.”

For Kunz, it is important to explain what synthetic audio really is. “Synthetic audio uses algorithms to create and manipulate sound. It can be music, speech, other sounds, or all of these mixed together. Most people will have experienced a product that uses text-to-speech (TTS) – i.e. text that is ‘translated’ into speech. You may know this from GPS navigation or Siri, or you may have heard it on TikTok. TTS models are often indistinguishable from human speakers.”

Aflorithmic employs highly qualified specialists in machine learning, software development, speech synthesis, AI research, audio engineering and product development. Its technology attracts professionals from around the world; the company recently hired former employees of Spotify, TikTok and Glovo.

Novobrief sat down with Kunz to discuss the future (and present) of synthesized audio, the complexities of cloning a voice, and the emergence of audio as a central medium in the years ahead.

What are the main uses of synthesized audio production today?

Audio is more than speech. We think of an audio experience as a vocal track plus sound design and post-production, bringing everything together and making the audio experience full and crisp. Aflorithmic has built an infrastructure to make this happen. It’s called api.audio, and it makes audio production scalable by automatically producing thousands of tracks in minutes.

We currently offer over 350 voices in over 50 languages from 8 voice providers, and this list keeps growing. We have also built a library of sound designs for different use cases such as advertising, news, education or lifestyle. Our product is API-first, which means it’s developer-focused. The big advantage is that it can be integrated with any platform, such as websites, mobile apps, smart speakers or games.
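To make the API-first idea more concrete, here is a minimal Python sketch of what such a workflow could look like. The endpoint paths, field names and voice identifier are illustrative assumptions, not Aflorithmic’s documented API; the sketch only mirrors the script, speech and sound-design/mastering steps described above.

```python
# Illustrative sketch of a script -> speech -> mastering workflow.
# All endpoint paths, field names and IDs are assumptions for illustration,
# not the documented api.audio API.
import requests

BASE_URL = "https://audio-service.example.com/v1"  # hypothetical base URL
HEADERS = {"x-api-key": "YOUR_API_KEY"}            # hypothetical auth header

# 1. Create a script: the text that will become the vocal track.
script = requests.post(
    f"{BASE_URL}/scripts",
    headers=HEADERS,
    json={"name": "daily-briefing", "text": "Welcome to today's briefing."},
).json()

# 2. Render the script as speech with a chosen voice (the TTS step).
speech = requests.post(
    f"{BASE_URL}/speech",
    headers=HEADERS,
    json={"script_id": script["id"], "voice": "example-voice"},
).json()

# 3. Apply a sound-design template and automated post-production (mastering).
master = requests.post(
    f"{BASE_URL}/mastering",
    headers=HEADERS,
    json={"script_id": script["id"], "sound_template": "news"},
).json()

print("Finished track URL:", master.get("url"))
```

Because each step is a plain API call, the same flow can be repeated over hundreds of scripts, which is what turns “thousands of tracks in minutes” into a question of throughput rather than studio time.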

What sectors are you currently working in?

Over the past three years, we have extensively explored the impact of synthetic media on different verticals. Synthetic media production is growing rapidly, and we believe it will totally change the way we produce and consume audio in the future. Currently we are seeing very strong interest from advertisers and publishers, and these are the main areas we will be focusing on this year.

Can you share some customer cases?

We collaborate with a content creation platform called Storyflash in Germany. They use api.audio to allow publishers to create their own audio newsletters. Since the process is completely automated, a publishing house can create new audio content from existing headlines almost without lifting a finger. It uses its own smart sound design, which changes depending on the content, as well as multiple speakers. The result is a far cry from the “speech-only” experience you might know from a screen reader; it’s more like a mini podcast or a short segment you’d hear on the radio. We’re currently talking to major publishers in Germany, Spain, the UK and the US, so expect to see more very soon.

Another use case is creating synthetic audio ads. Our technology is integrated with ad builders that let you type your ad text and choose a speaker and a sound design. At the press of a button, the ad builder creates your audio ad in a few seconds. One example is our work with VocaliD, an American artificial intelligence company that recently integrated our technology into their Parrot Studio product. Deployment is scheduled for early February.

Do you think synthesized audio + immersive video is the future or is it already here?

I think it’s about to become the present. We will see a significant increase by mid-2022 as more businesses adopt the technology. We are confident that some of these adoptions will use our infrastructure.

Can you tell me more about the social commerce project involving Metaverse elements that you are working on?

We are working on a project with our strategic investor Crowd Media. A dedicated team is working on social commerce, a conversational AI experience with avatars. Think of Kim Kardashian having a one-on-one video chat with each of her followers. A first version of this product is planned for this year. As for the Metaverse in the sense of a virtual world, we can definitely see the audio part running on api.audio, but we are not actively developing anything specifically for it at this time.

Cloning a voice

How does the voice cloning process work?

It starts with a conversation about what the voice is intended to be used for. Once that’s clear, we create a script for a voice actor to record, usually a few hours of audio data. These recordings are processed by our machine learning infrastructure and a model is created. We then run a quality check on the model, and eventually it becomes accessible on api.audio.

What are we talking about when we talk about an ethical approach to voice cloning?

It is important to value people’s work. Our goal is not to replace voice actors or sound engineers; we want to help their industry make the transition. The idea is that licensing your voice becomes a new source of revenue. It helps voice actors unlock more business opportunities simply by offering their model on platforms like api.audio. For example, we just added the original TikTok voice to api.audio. The voice actress behind the model, Bev Standing, sees these opportunities: she makes money every time her voice model is rendered. Another thing we value is the right to be forgotten: any voice actor who created a model with us can ask us to remove their voice clone from the API.

How do you prevent fraud or identity theft?

First, it is in our interest to minimize the possibility of our technology being misused. That said, both fraud and identity theft are ultimately committed by individuals.

Cloning a voice is not something you can do in an instant. Of course, it would be possible for someone to hire us, or one of the voice providers who collaborate with us, using stolen recordings. I am confident that we would detect any questionable request, but I also want to be clear that there is no guarantee that we can prevent misuse of our technology.

It’s a fine line between monitoring what customers do with our technology and respecting their privacy. I don’t want to fall into whataboutism, but you can also use Photoshop to fake identities or commit fraud. New technology always comes with risks, and I’m afraid voice cloning is no different.

How real can a cloned voice sound?

It all depends on how much data we have – the input determines the output. With a dedicated script, a good recording setup and a few hours of audio recordings, the voice will sound very real. For example, we are currently launching the world’s first podcast with a cloned speaker. Long-form content such as audiobooks is still difficult to create, because the nuances that make a speaker an excellent narrator are still missing from the cloned voice. However, even these limitations will eventually disappear.

The Year of Audio as a Service

How does your business model work?

We operate a SaaS pricing model with monthly payments for API usage and different packages. On top of that, we also charge production credits. Production credits are spent whenever you do things with the API: for example, creating a script costs less than rendering speech with different speakers and enhancing the track with our automated real-time post-production.

What do you see as your main challenges in terms of growth?

The biggest challenge for us is educating the market. Until now, it was not possible to scale audio production, because the process had to be manual: you needed a human speaker, a studio and experts such as sound and mastering engineers. Now you can create millions of audio tracks in just minutes. Understanding and adopting this technology takes time. We often come up with use cases for potential customers to inspire them and show them how accessible our infrastructure is.

Will 2022 be the year of Audio-As-A-Service?

We certainly think so. Technology is often like an iceberg: most of it sits below the waterline, and you only see the small portion above the surface. We have worked intensively over the past three years to build the world’s first audio-as-a-service infrastructure, and api.audio is now at a level where it is both robust and extremely flexible. As a B2B product, the sales cycles are quite long, so after a lot of pilots and integration work you will see many more companies using our infrastructure in 2022.

What are your plans and goals for 2022?

Most importantly, we will onboard a growing number of advertising and publishing clients. We will also be raising a Series A funding round. Finally, we will continue to push the boundaries of our technology to create an even better product.
