Show HN: An open source framework for voice assistants

by kwindla on 5/13/24, 5:21 PM with 39 comments
by awenix on 5/13/24, 6:59 PM

Nice to see an open source implementation. I have been seeing many startups get into this space, like https://www.retellai.com/, https://fixie.ai/, etc. They always end up needing speech-to-speech models (the current approach seems to be speech-text-text-speech, with multiple agents handling 1 listening + 1 speaking). Excited to see how this plays with the recently announced gpt-4o.

by ilaksh on 5/13/24, 6:12 PM

This is great, but we really need an audio-to-audio model in the open source world, like the one they demoed. Does anyone know of anything like that?

Edit: someone found one: https://news.ycombinator.com/item?id=40346992

by johnmaguire on 5/13/24, 6:17 PM

Siri came out in October 2011. Amazon Alexa made its debut in November 2014. Google Assistant's voice-activated speakers were released in May 2016.

From what I can tell, Siri is still a dumpster fire that nobody is willing to use. And I have no personal experience with Alexa, so I can't speak to it. But I do have a few Google Home speakers and an Android phone, and I have seen no major improvements in years. In fact, it has gotten worse - for example, you can no longer add items directly to AnyList[0], only Google Keep.

Or, as an incredibly simple example of something I thought we'd get a long time ago, it's still unable to interpret two-part requests, e.g. "please repeat that but louder," or "please turn off the kitchen and dining room lights."

I find voice assistants very useful - especially when driving, lying in bed, cooking, or when I'm otherwise preoccupied. Yet they have stagnated almost since their debut. I can only imagine nobody has found a viable way to monetize them.

What will it take to get a better voice assistant for consumers? Willow[1] doesn't seem to have taken off.

[0] https://help.anylist.com/articles/google-assistant-overview/

[1] https://heywillow.io/

edit: I realize I hijacked your thread to dump something that's been on my mind lately. Pipecat looks really cool, and I hope it takes off! I hope to get some time to experiment this weekend.

by userhacker on 5/14/24, 3:59 AM

Just made https://feycher.com that's similar, but has realtime lip syncing as well. Let me know if you are interested and we can chat.

by xan_ps007 on 5/13/24, 7:24 PM

We're also building bolna, an open source voice orchestration framework: https://github.com/bolna-ai/bolna

by russ on 5/13/24, 7:49 PM

LiveKit Agents, which OpenAI uses in voice mode, is also open source:

https://github.com/livekit/agents

by orliesaurus on 5/14/24, 3:11 AM

The whole VAD thing is very interesting. Keen to learn more about how it works, especially with multiple speakers!

by canadiantim on 5/13/24, 6:02 PM

Very cool, great work! I can def see myself using this when I start building in that direction.

by 35mm on 5/14/24, 2:30 PM

How would I go about using this to live translate phone calls?

by bamazizi on 5/13/24, 6:10 PM

I wonder how the just-announced "GPT-4o" with real-time voice impacts projects like this?

The demo of a real-time, multi-language translated conversation blew me away!