Posts: 2,407
Threads: 244
Joined: Oct 2007
With audio input and this guy, is it still PC-based mics, or can this be implemented in WebRIVA?
|Z-Wave|Sonos|Tivo|Hue|Plex|Roku|MyMovies|Echo|
Nest|Harmony|Neeo|LG TV|Smarthings|
Posts: 40,483
Threads: 491
Joined: Aug 2002
It isn't currently doable in WebRIVA. The audio still has to get to CQC somehow. If it were going to be supported, it would more likely be via your making a SIP-based phone call to CQC. That's an existing mechanism for getting audio from a phone to somewhere else, so it would be the better way to do it.
Dean Roddey
Explorans limites defectum
09-11-2017, 07:24 PM
(This post was last modified: 09-11-2017, 07:28 PM by Dean Roddey.)
An incredibly obvious issue just hit me. How do we deal with a wake-up word or phrase? There's no way to do that at all, which is a huge hole in this entire scheme. We can't just sit there continuously streaming audio to the Amazon server in the hope of hearing the wake word somewhere in all of that. The Echo processes the wake word locally somehow; I'm not sure how they go about that. But we have no way of doing that.
If it's some scenario where you press a button and talk, or you dial a number and talk, then that's one thing. But just starting a conversation is another thing altogether. I could watch for what appears to be just low-level noise, then watch for a short burst of audio followed by a short period of low-level noise, grab that, and send it off. But it would be constantly doing that, and what if there are other sounds going on? I'm not remotely likely to be able to get that right. And of course you pay for these audio transmissions as well. It's not much, but you don't want it sending off many hundreds a day.
So that basically makes it completely useless as a real replacement for the Echo, or as a replacement for local speech recognition in CQC Voice. Dang, why didn't I think of that before? From their FAQ:
Quote: Q. Do I need a wake word to invoke an Amazon Lex intent?
Amazon Lex does not support wake word functionality. The app that integrates with Amazon Lex will be responsible for triggering the microphone, i.e. push to talk.
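The quiet/burst/quiet heuristic mentioned above amounts to a crude energy-based endpointer. The sketch below is illustrative only: it assumes audio has already been cut into fixed-size frames and reduced to one RMS energy value per frame, and `find_bursts`, `threshold`, `min_speech_frames` and `min_silence_frames` are hypothetical names and tuning knobs, not anything CQC or Lex defines.

```python
from typing import List, Optional, Tuple


def find_bursts(
    energies: List[float],
    threshold: float,
    min_speech_frames: int = 3,
    min_silence_frames: int = 5,
) -> List[Tuple[int, int]]:
    """Return (start, end) frame pairs (end exclusive) for each run of frames
    at or above `threshold` that lasts at least `min_speech_frames` and is
    followed by at least `min_silence_frames` quiet frames."""
    bursts: List[Tuple[int, int]] = []
    start: Optional[int] = None  # first loud frame of the current candidate burst
    quiet_run = 0  # consecutive quiet frames seen since that burst began
    for i, e in enumerate(energies):
        if e >= threshold:
            if start is None:
                start = i
            quiet_run = 0  # a short dip inside speech does not end the burst
        elif start is not None:
            quiet_run += 1
            if quiet_run >= min_silence_frames:
                end = i - quiet_run + 1  # first quiet frame after the burst
                if end - start >= min_speech_frames:
                    bursts.append((start, end))
                start = None
                quiet_run = 0
    return bursts
```

With a quiet room, a four-frame burst is found cleanly. With a constant background above the threshold (a TV, say), the "trailing quiet" never arrives, so nothing is ever emitted, which is exactly the weakness described in the post.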
I mean, I guess I could use local speech recognition just for the wake phrase. Since that would be the only thing in the entire grammar, ambiguity wouldn't be much of an issue. I'd then have to quickly shut it down and start collecting audio myself to hand off to the Lex bot, then quickly set it back up again when the conversation completes. I guess that would work. We'd lose the better ability to pull text out of background noise for the wake word.
It would, though, again put us back to only being able to run in a tray app. We'd lose all of the nice ability I was counting on to move all this stuff into a service.
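The wake-phrase workaround described above is essentially a three-state loop. Here is a minimal sketch of just the transitions; the state and event names are hypothetical, and the actual recognizer start/stop and Lex calls are left out.

```python
from enum import Enum, auto


class State(Enum):
    LISTENING_FOR_WAKE = auto()  # local recognizer running; grammar holds only the wake phrase
    CAPTURING = auto()           # local recognizer shut down; collecting raw audio ourselves
    IN_CONVERSATION = auto()     # captured audio handed to the Lex bot; waiting for its reply


class Event(Enum):
    WAKE_PHRASE_HEARD = auto()
    UTTERANCE_CAPTURED = auto()  # still requires deciding when the user stopped talking
    BOT_NEEDS_MORE = auto()      # bot asked a follow-up question (e.g. to fill a slot)
    CONVERSATION_DONE = auto()   # bot fulfilled or failed the intent


# Allowed transitions. On entering CAPTURING the caller would stop the local
# recognizer; on returning to LISTENING_FOR_WAKE it would restart it.
TRANSITIONS = {
    (State.LISTENING_FOR_WAKE, Event.WAKE_PHRASE_HEARD): State.CAPTURING,
    (State.CAPTURING, Event.UTTERANCE_CAPTURED): State.IN_CONVERSATION,
    (State.IN_CONVERSATION, Event.BOT_NEEDS_MORE): State.CAPTURING,
    (State.IN_CONVERSATION, Event.CONVERSATION_DONE): State.LISTENING_FOR_WAKE,
}


def next_state(state: State, event: Event) -> State:
    """Return the next state; events that don't apply leave the state unchanged."""
    return TRANSITIONS.get((state, event), state)
```

Note that the `UTTERANCE_CAPTURED` transition hides the hard part: something on the client still has to decide that the user has finished speaking.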
Sigh...
Dean Roddey
Explorans limites defectum
Well, as I was thinking obsessively about this last night while trying to go to sleep, the opposite of the wake-up phrase problem came to me: how do we know when speech has ended? It's the same problem, but without a solution unless we do our own sensing. At best that would only work relative to a basically silent room; otherwise we'd be getting into doctoral-thesis-level signal processing.
Not sure how I failed to foresee these issues. Using local recognition as a hack for catching a wake-up phrase could have been done, but knowing when speech has ended really isn't practical. If it wouldn't work at all when the TV is on or the vacuum cleaner is running, then it wouldn't be of much use.
I assume that the Echo works differently, i.e. similarly to how local recognition does. Once it hears the wake-up word, it just starts streaming audio to the server until the server tells it that it has either recognized an utterance or failed to. So the Echo itself is not responsible for figuring these things out. It doesn't have to care what the local noise level is; it just passes audio to the server, which is responsible for trying to pull a recognizable signal out of the noise. The same goes for local recognition: we just feed audio to the engine and it tells us if it ever recognized something. We don't have to figure out when the user started or stopped talking.
Doing that with this API puts all that responsibility on the client, where (lacking some sort of push-to-talk mechanism) it's unlikely to be possible. If we had the local signal processing and language recognition capabilities to do that sort of thing, we probably wouldn't need Amazon anyway.
So, in short, I think that this is a wash. Too bad I wasted a few days on it, but hindsight is always more acute than foresight. At least it drove and tested some useful new cryptographic features that will probably come in handy at some point.
So I guess I should go back to the previous plan of improving CQC Voice's local recognition capabilities.
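To make the contrast concrete, here is a sketch of the streaming model the Echo and local engines are assumed to use, where the client never decides when speech started or stopped. `StreamingRecognizer`, `feed` and `run_utterance` are hypothetical names for illustration, not a real Amazon or CQC API.

```python
from typing import Iterable, Optional, Protocol


class StreamingRecognizer(Protocol):
    """Hypothetical engine interface: the engine, not the client, owns endpointing."""

    def feed(self, chunk: bytes) -> Optional[str]:
        """Consume one chunk of audio. Return recognized text once the engine
        decides the utterance is over, '' if it gave up, or None to keep going."""
        ...


def run_utterance(recognizer: StreamingRecognizer, chunks: Iterable[bytes]) -> Optional[str]:
    # Just keep streaming until the engine reaches a verdict; background noise
    # handling and end-of-speech detection are entirely the engine's problem.
    for chunk in chunks:
        result = recognizer.feed(chunk)
        if result is not None:
            return result or None  # '' means the engine failed to recognize anything
    return None  # audio source ran dry before the engine decided
```

With Lex, as described in the post, the loop above has no engine to defer to: the client would have to produce the finished utterance itself before sending it.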
Dean Roddey
Explorans limites defectum
I did think of one potentially clever way to use this guy. If you wanted to use their really nice voices for TTS, just set up a dummy Lex bot with one intent that has one utterance, which is just an open-ended literal value so that it will accept any text. Set up a simple JavaScript handler on the AWS side (it can tell the bot what to return to the caller), have it just echo back the text sent to it, and set up the bot to send that text back as audio.
So, to do TTS with their voices, just send it the text you want to speak, and it'll stream back the converted text as audio, which you then play. I think that would work.
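A rough sketch of that echo-bot trick, assuming the Lex V1 runtime API (the `lex-runtime` client in boto3) and its Lambda code-hook event format. It is written in Python rather than the JavaScript mentioned above, the slot name `Text` is hypothetical, and the bot name, alias and user ID are placeholders. The spoken voice would be whatever voice the bot is configured with.

```python
def lambda_handler(event, context):
    """Lex V1 fulfillment hook that echoes the caller's text back as the bot's reply.

    Assumes a hypothetical open-ended slot named 'Text'; falls back to
    inputTranscript (the raw input Lex received) if the slot is empty.
    """
    slots = event.get("currentIntent", {}).get("slots") or {}
    text = slots.get("Text") or event.get("inputTranscript", "")
    return {
        "sessionAttributes": event.get("sessionAttributes") or {},
        "dialogAction": {
            "type": "Close",
            "fulfillmentState": "Fulfilled",
            "message": {"contentType": "PlainText", "content": text},
        },
    }


def speak(text: str, bot_name: str, bot_alias: str, user_id: str) -> bytes:
    """Client side: send text in, ask for MP3 audio back (needs boto3 and AWS credentials)."""
    import boto3  # imported here so the handler above has no third-party dependency

    client = boto3.client("lex-runtime")
    response = client.post_content(
        botName=bot_name,
        botAlias=bot_alias,
        userId=user_id,
        contentType="text/plain; charset=utf-8",
        accept="audio/mpeg",
        inputStream=text.encode("utf-8"),
    )
    return response["audioStream"].read()
```

The returned bytes would then be handed to whatever audio player is available locally.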
Dean Roddey
Explorans limites defectum
Posts: 2,407
Threads: 244
Joined: Oct 2007
If you're going to do that, can we actually get a text bot?
Also, the one below got shut down after a couple of days. I can only assume it was by Amazon.
https://bespoken.io/blog/silent-echo-for...w-in-beta/