Throwing another (sort of hybrid) voice control option out there
#21
With audio input and this guy, is it still limited to PC-based mics, or can this be implemented in WebRIVA?
|Z-Wave|Sonos|Tivo|Hue|Plex|Roku|MyMovies|Echo|
Nest|Harmony|Neeo|LG TV|Smarthings|
#22
It wouldn't be doable currently in WebRIVA. The audio still has to get to CQC. If it were going to be supported, it would more likely be via a SIP-based phone call to CQC. That's an existing mechanism for getting audio from a phone to somewhere else, so that would be the better way to do it.
Dean Roddey
Software Geek Extraordinaire
#23
An incredibly obvious issue just hit me. How do we deal with a wake-up word or phrase? There's no way to do that at all, which is a huge hole in this entire scheme. We can't just sit there and continuously stream audio to the Amazon server in the hope of hearing the wake word somewhere in all of that. The Echo processes that wake word locally somehow; I'm not sure how they go about that. But we have no way of doing that.

If it's some scenario where you press a button and talk, or you dial a number and talk, then that's one thing. But to just start a conversation, that's another thing altogether. I could watch for what appears to be just low-level noise, then watch for a short burst of audio followed by a short period of low-level noise, grab that and send it off. But it would be doing that constantly, and what if there are other sounds going on? I'm not remotely likely to be able to get that right. And of course you pay for these audio transmissions as well. It's not much, but you don't want it sending off many hundreds a day.
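
To make the "quiet, short burst, quiet" idea concrete, here is a minimal Python sketch. Everything in it is an assumption for illustration: `find_burst`, its parameters, and the idea of working on a list of per-frame loudness values are hypothetical, not anything CQC or Lex provides. It also shows the weakness described above: with steady background sound the level never drops below the threshold, so nothing is ever detected.

```python
from typing import List, Optional, Tuple


def find_burst(
    levels: List[float],
    threshold: float,
    min_quiet: int,
    max_burst: int,
) -> Optional[Tuple[int, int]]:
    """Return (start, end) frame indexes of the first short burst that is
    preceded and followed by at least `min_quiet` quiet frames.

    `end` is exclusive. All names and parameters here are hypothetical.
    """
    quiet_before = 0   # consecutive quiet frames seen while idle
    quiet_after = 0    # consecutive quiet frames seen after the burst began
    start = None       # index where the current candidate burst began

    for i, level in enumerate(levels):
        loud = level >= threshold
        if start is None:
            if not loud:
                quiet_before += 1
            elif quiet_before >= min_quiet:
                start = i          # enough lead-in quiet, burst begins
                quiet_after = 0
            else:
                quiet_before = 0   # sound without lead-in quiet, ignore it
        elif loud:
            quiet_after = 0
            if i - start + 1 > max_burst:
                start = None       # too long to be a short command
                quiet_before = 0
        else:
            quiet_after += 1
            if quiet_after >= min_quiet:
                return (start, i - quiet_after + 1)
    return None


quiet_room = [0.01] * 5 + [0.5] * 3 + [0.01] * 5
tv_running = [0.30] * 5 + [0.5] * 3 + [0.30] * 5

print(find_burst(quiet_room, threshold=0.1, min_quiet=3, max_burst=10))
print(find_burst(tv_running, threshold=0.1, min_quiet=3, max_burst=10))
```

In the quiet-room data the burst is found; in the TV data the same words are buried in a noise floor above the threshold, and raising the threshold just moves the problem around.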

So that basically makes it completely useless as a real replacement for the Echo, or as a replacement for local speech recognition in CQC Voice. Dang, why didn't I think of that before? From their FAQ:

Quote: Q. Do I need a wake word to invoke an Amazon Lex intent?

Amazon Lex does not support wake word functionality. The app that integrates with Amazon Lex will be responsible for triggering the microphone, i.e. push to talk.

I mean I guess I could use local speech recognition just for the wake phrase. Given that that would be the only thing in the entire grammar, ambiguity wouldn't be much of an issue. I'd have to then quickly shut it down and start collecting audio myself to hand off to the lex bot. Then when the conversation completes, quickly set it back up again. I guess that would work. We'd lose the better ability to pull text out of background noise for the wake word.
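
The hand-off described above amounts to a small two-state machine. The following is only a sketch under stated assumptions: `VoiceFrontEnd`, `Mode`, and the `send` callback are hypothetical names, and `send` stands in for whatever call would actually deliver audio to the Lex bot. The local recognizer is assumed to call `on_wake_phrase()` when it hears the one phrase in its grammar.

```python
from enum import Enum, auto
from typing import Callable, List


class Mode(Enum):
    WAITING_FOR_WAKE = auto()  # local recognizer owns the microphone
    CAPTURING = auto()         # we collect raw audio for the remote bot


class VoiceFrontEnd:
    """Hypothetical coordinator between local wake detection and a remote bot."""

    def __init__(self, send: Callable[[bytes], None]) -> None:
        self.mode = Mode.WAITING_FOR_WAKE
        self._send = send                 # stand-in for the real upload call
        self._buffer: List[bytes] = []

    def on_wake_phrase(self) -> None:
        # Local recognition would be shut down here so we can take the audio.
        if self.mode is Mode.WAITING_FOR_WAKE:
            self._buffer = []
            self.mode = Mode.CAPTURING

    def on_audio(self, chunk: bytes) -> None:
        # Audio that arrives before the wake phrase is simply dropped.
        if self.mode is Mode.CAPTURING:
            self._buffer.append(chunk)

    def send_utterance(self) -> None:
        # Hand the collected audio off as one utterance.
        if self.mode is Mode.CAPTURING and self._buffer:
            self._send(b"".join(self._buffer))
            self._buffer = []

    def end_conversation(self) -> None:
        # Local recognition would be set back up here.
        self._buffer = []
        self.mode = Mode.WAITING_FOR_WAKE


sent: List[bytes] = []
front_end = VoiceFrontEnd(send=sent.append)
front_end.on_audio(b"ignored")      # no wake phrase yet
front_end.on_wake_phrase()
front_end.on_audio(b"turn on ")
front_end.on_audio(b"the lights")
front_end.send_utterance()
front_end.end_conversation()
print(sent, front_end.mode)
```

What the sketch leaves out is the hard part: something still has to decide when to call `send_utterance()`.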

It would, though, put us back to only being able to run in a tray app. We'd lose the nice ability I was counting on of moving all this stuff into a service.

Sigh...
Dean Roddey
Software Geek Extraordinaire
#24
Well, thinking obsessively about this more last night as I was trying to go to sleep, the opposite issue of recognizing a wake-up phrase or word came to me. How do we know when speech has ended? It's the same problem, but without a solution unless we do our own sensing. At best that would only work relative to a basically silent room, else we'd be getting into some doctoral thesis signal processing.

Not sure how I failed to foresee these issues. Using local recognition as a hack for catching a wake-up phrase could have been done, but knowing when speech has ended really isn't practical. If it wouldn't work at all when the TV is on, or the vacuum cleaner is running, then it wouldn't be of much use.
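
A naive end-of-speech check makes the limitation easy to see. This Python sketch is an illustration only: `end_of_speech`, the fixed `threshold`, and the `hangover` count are assumptions, and the input is a made-up list of per-frame loudness values. It declares speech finished once the level stays below the threshold for `hangover` frames in a row.

```python
from typing import List, Optional


def end_of_speech(levels: List[float], threshold: float, hangover: int) -> Optional[int]:
    """Return the index of the first frame of trailing silence, or None if the
    level never stays below `threshold` for `hangover` consecutive frames."""
    speaking = False
    quiet = 0
    for i, level in enumerate(levels):
        if level >= threshold:
            speaking = True
            quiet = 0
        elif speaking:
            quiet += 1
            if quiet >= hangover:
                return i - hangover + 1
    return None


speech = [0.6] * 4
print(end_of_speech(speech + [0.02] * 5, threshold=0.1, hangover=3))  # silent room
print(end_of_speech(speech + [0.30] * 5, threshold=0.1, hangover=3))  # TV still on
```

In the silent room the end is found right after the last spoken frame. With the TV on, the level never falls below the threshold, so the check never fires and the client would keep recording indefinitely.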

I assume that the Echo works differently, i.e. similar to how local recognition does. Once it sees the wake-up word, it just starts streaming audio to the server until the server tells it that it has either recognized an utterance or failed to. So the Echo itself is not responsible for having to figure these things out. It doesn't have to care what the local noise level is; it just passes audio to the server, which is responsible for trying to pull a recognizable signal from the noise. The same goes for local recognition: we just feed audio to the engine and it tells us if it ever recognized something. We don't have to figure out when the user started or stopped talking.

Doing that with this API puts all that responsibility on the client where, lacking some sort of 'push to talk' mechanism, it's unlikely to be possible. If we had the local signal processing and language recognition capabilities to do that sort of stuff, we probably wouldn't need Amazon anyway.


So, in short, I think that this is a wash. Too bad I wasted a few days on it, but hindsight is always more acute than foresight. At least it drove and tested some useful new cryptographic features that will probably come in handy at some point.

So I guess I should go back to the previous plan of improving the recognition capabilities of CQC Voice via local recognition capabilities.
Dean Roddey
Software Geek Extraordinaire
#25
I did think of one potentially clever way to use this guy. If you wanted to use their really nice voices for TTS, just set up a dummy Lex bot that has one intent with one utterance, which is an open-ended literal value, so that it will accept any text. Set up a simple JavaScript handler on the AWS side, which can tell the bot what to return to the caller, and have it just echo back the text sent to it. Then set up the bot to send back the text as audio.

So, to do TTS with their voices, just send it the text you want to speak, and it'll stream back the converted text as audio, which you then play. I think that would work.
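
For illustration, an echo handler along those lines might look like the sketch below. It is written in Python rather than JavaScript purely to keep it short, and it rests on assumptions that should be checked against the AWS documentation: that the handler runs as a Lambda function behind the bot, that the incoming event carries the user's text in `inputTranscript`, and that a `Close` dialog action with a plain-text message is an acceptable reply. The function name `lambda_handler` is just the conventional default.

```python
def lambda_handler(event, context):
    """Echo the text the bot received back to the caller.

    Assumed event/response shapes; verify against the Lex Lambda docs.
    """
    text = event.get("inputTranscript", "")
    return {
        "dialogAction": {
            "type": "Close",                  # end the exchange after one turn
            "fulfillmentState": "Fulfilled",
            "message": {
                "contentType": "PlainText",
                "content": text,              # the bot speaks this back
            },
        }
    }


# Local check with a hand-built event; no AWS call is made here.
reply = lambda_handler({"inputTranscript": "The garage door is open"}, None)
print(reply["dialogAction"]["message"]["content"])
```

The client would still have to ask the bot for an audio response and play what comes back; that side is not shown.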
Dean Roddey
Software Geek Extraordinaire
#26
If you're going to do that, can we actually get a text bot?

Also, the below got shut down after a couple of days. I can only assume by Amazon.

https://bespoken.io/blog/silent-echo-for...w-in-beta/
|Z-Wave|Sonos|Tivo|Hue|Plex|Roku|MyMovies|Echo|
Nest|Harmony|Neeo|LG TV|Smarthings|