Throwing another (sort of hybrid) voice control option out there
#1
[So, though it was promising in a number of ways, it was eventually realized over the course of this thread that technical limitations of this system mean it is of no use to us as a replacement for the speech recognition in something like CQC Voice. See the post below for details:

http://www.charmedquark.com/vb_forum/showthread.php?tid=10371&pid=148359#pid148359]

So Mike Potts has been after me for a while to look into Amazon's chatbot system. I read through some of the docs today, and it does offer an interesting option. The short version of it is:
  • It is essentially the intent system that the Echo uses, but we get to ship our own audio off to the servers, instead of it having to go through the Echo.
  • It has an API that would let me build up the intents and utterances and such, so that I could do something like CQC Voice does, i.e. create an easy to use system based on the auto-gen data, and update it any time you change the auto-gen configuration.
  • It is inherently conversation-oriented, unlike the Echo, so it could work like CQC Voice does on that front, asking for information not initially provided.
  • And of course it's cloud based like the Echo, not purely local like CQC Voice.
So it would be CQC Voice, but cloud based. It would have a more complex setup, like the Echo does, but it would also have the Echo's high level of speech recognition. It would be simpler than the Echo (in that I'd automatically build up the utterances and intents) but more difficult than CQC Voice, since you'd have to create the AWS account and do some setup, as you do with the Echo now.
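As a rough illustration of "automatically building up the utterances and intents" from the auto-gen data, here is a minimal Python sketch. The `AUTOGEN` dictionary, the `SetLightPower` intent, and the `IntentDef` layout are all hypothetical stand-ins, not CQC's actual auto-gen format or Amazon's model-building API. The point is only that the sample utterances stay fixed templates with `{Slot}` placeholders, while the slot values are pulled from the configuration, so regenerating after an auto-gen change picks up new rooms and lights.

```python
from dataclasses import dataclass, field

# Hypothetical stand-in for CQC's auto-gen configuration:
# room names mapped to the lights found in each room.
AUTOGEN = {
    "Kitchen": ["Overhead", "Island"],
    "Living Room": ["Lamp"],
}


@dataclass
class IntentDef:
    """A simplified, hypothetical intent definition."""
    name: str
    sample_utterances: list[str]
    slot_types: dict[str, list[str]] = field(default_factory=dict)


def build_light_intent(autogen: dict[str, list[str]]) -> IntentDef:
    """Build a light-control intent whose slot values come from the auto-gen data."""
    rooms = sorted(autogen)
    # De-duplicate light names across rooms before sorting.
    lights = sorted({light for names in autogen.values() for light in names})
    return IntentDef(
        name="SetLightPower",
        # Templates stay fixed; only the slot values change with the config.
        sample_utterances=[
            "turn {Power} the {Light} in the {Room}",
            "turn the {Room} {Light} {Power}",
        ],
        slot_types={"Room": rooms, "Light": lights, "Power": ["on", "off"]},
    )
```

Mapping an `IntentDef` onto the real service would still mean translating it into whatever intent and slot-type calls the Lex model-building API actually provides.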

It wouldn't be configurable like the Echo stuff is. If you want that, use the Echo as is. This would be for CQC-specific, predefined commands, as CQC Voice does, but with the Echo's high-quality speech recognition. You'd still want to use an array mic like the Kinect or something of that sort, but we wouldn't be limited by the quality of speech recognition available on PCs at this time. I don't think you could use the Echo itself as the mic; if you want to do that, you'd probably just use the existing Echo support.


I dunno. I'm iffy on it. It clearly has the huge advantage of massive processing power for audio-to-text conversion. But if it were done, I think it would have to replace CQC Voice, since I just can't see continuing to support three different schemes. It's hard enough as it is. So it sort of comes down to this: do we believe that we can get CQC Voice's speech recognition quality up another couple of notches? If so, I'd argue for keeping it instead. If not, and folks will always be somewhat underwhelmed by CQC Voice's local processing capabilities, then maybe it should be replaced and we just accept that cloud-based processing is a necessity, at least for the time being.

So I'm thinking let's see what I can do on the CQC Voice front with accepting multiple microphones, feeding multiple parallel speech recognition engines, and taking the best result from them. If that still isn't solid enough to get people excited, then maybe we dump it for something based on this chatbot technology.
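The "multiple mics, parallel engines, take the best" idea can be sketched in a few lines of Python. This assumes each engine reports a confidence score normalized to 0.0 to 1.0 for the same spoken phrase; the `Recognition` type, `pick_best`, and the 0.6 threshold are hypothetical, not anything CQC Voice actually exposes.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Recognition:
    """One engine's result for the same spoken phrase (hypothetical shape)."""
    mic_id: str
    text: str
    confidence: float  # assumed normalized to 0.0 .. 1.0


def pick_best(results: list[Recognition],
              min_confidence: float = 0.6) -> Optional[Recognition]:
    """Return the highest-confidence result, or None if none clears the threshold."""
    candidates = [r for r in results if r.confidence >= min_confidence]
    if not candidates:
        return None  # nothing trustworthy: better to ignore than misfire
    return max(candidates, key=lambda r: r.confidence)
```

Simply taking the maximum is the plainest rule. An alternative would be to let engines that agree on the same text reinforce each other before choosing, which may help when one mic is closer to a noise source.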

But, I'm open to opinions. Is the security and privacy of all local more important to you than likely better speech recognition?
Dean Roddey
Software Geek Extraordinaire
#2
Oh, and one other point... I need to read more, but I think this one also has the advantage over the Echo in that it doesn't require calling back into your home network. I think it's all done by HTTP, and the stuff to speak just comes back in the HTTP responses. If so, then it only requires an outgoing connection, with no need for an open port. That would be a fairly substantial advantage, given that setting up the secure open port is a big part of the Echo setup complexity.
Dean Roddey
Software Geek Extraordinaire
#3
Here is the developer doc.
http://docs.aws.amazon.com/lex/latest/dg/lex-dg.pdf

I am under the impression that it could operate just as the Echo does, but with text input instead of voice. Dean read the docs differently, though.
|Z-Wave|Sonos|Tivo|Hue|Plex|Roku|MyMovies|Echo|
Nest|Harmony|Neeo|LG TV|Smarthings|
#4
It could do that. You can send it text. But that's of minimal interest to me, given the amount of work involved and the ongoing maintenance. But you can also send it audio, and it provides more or less the same back end as the Echo, so you can do Echo-like things with your own microphones, and without the extra complication of having to open ports to get the responses back.

OTOH, once it was set up, it's your account and it's just an HTTP interface. You could send it text commands if you wanted. As long as you sent it a full utterance with no missing or incorrect bits, it would accept that as though it were spoken and you wouldn't have to deal with any back and forth to get more information.
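Whether a text command was "a full utterance with no missing or incorrect bits" shows up in the reply. A minimal Python sketch, assuming a Lex (V1) PostText-style response dictionary with `dialogState`, `intentName`, and `message` keys; the HTTP call itself and its request signing are left out, and `handle_reply` is a hypothetical helper name.

```python
# dialogState values meaning the bot still wants more input from the user.
NEEDS_MORE_INPUT = {"ElicitIntent", "ElicitSlot", "ConfirmIntent"}


def handle_reply(reply: dict) -> tuple[str, str]:
    """Classify a reply as ("ask", prompt), ("run", intent) or ("fail", message)."""
    state = reply.get("dialogState")
    if state in NEEDS_MORE_INPUT:
        # Incomplete utterance: pass the bot's prompt back to the user.
        return ("ask", reply.get("message", ""))
    if state in ("ReadyForFulfillment", "Fulfilled"):
        # Complete utterance: all slots were filled, so no back and forth.
        return ("run", reply.get("intentName", ""))
    return ("fail", reply.get("message", "Unrecognized command"))
```

A text-only client could treat anything other than `"run"` as an error, since it is sending complete commands and has no user to answer follow-up prompts.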
Dean Roddey
Software Geek Extraordinaire
#5
Text is my only interest, I'll use the Echos for voice.
|Z-Wave|Sonos|Tivo|Hue|Plex|Roku|MyMovies|Echo|
Nest|Harmony|Neeo|LG TV|Smarthings|
#6
Another option from Microsoft.

https://dev.botframework.com
|Z-Wave|Sonos|Tivo|Hue|Plex|Roku|MyMovies|Echo|
Nest|Harmony|Neeo|LG TV|Smarthings|
#7
Another big advantage I just thought about is that this would allow us to get away from the tray app. We have to use a tray app for CQC Voice now because speech recognition and TTS aren't supported in the background. If we didn't have to use the speech recognition engine, then it could be moved into a background server that would easily handle multiple simultaneous incoming audio streams and send them off to Amazon's servers to be processed. It could be configured to send any resulting reply text to whatever you want, to be spoken or displayed. So that could be a tray app or an IV or something else that can take text and speak it.

That would get us away from the one machine, one mic, one output arrangement we currently have. Any way you can get the audio stream to the server, and any way you can get the text back to a speaker, would be possible. Of course, if you want to use a phone or something, that gets into a whole other area of telephony that would have to be dug into. But that would also be a lot easier under this scheme.
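The background-server arrangement described above could look roughly like this Python `asyncio` sketch: many audio sources are handled concurrently, and each source's reply text is routed to a configured output. The `Recognizer` and `Sink` shapes, the `routes` mapping, and `fake_recognize` are hypothetical; a real recognizer would make the outgoing HTTP call to the cloud service.

```python
import asyncio
from typing import Awaitable, Callable

# Hypothetical shapes: a recognizer sends audio off for processing and returns
# reply text; a sink delivers (source_id, text) to a tray app, IV, TTS, etc.
Recognizer = Callable[[bytes], Awaitable[str]]
Sink = Callable[[str, str], None]


async def serve_stream(source_id: str, audio: bytes, recognize: Recognizer,
                       sinks: dict[str, Sink], routes: dict[str, str]) -> None:
    """Process one incoming audio stream and route its reply text."""
    reply = await recognize(audio)
    # Each source can be configured with its own output; otherwise use "default".
    sinks[routes.get(source_id, "default")](source_id, reply)


async def serve_all(streams: dict[str, bytes], recognize: Recognizer,
                    sinks: dict[str, Sink], routes: dict[str, str]) -> None:
    """Handle every incoming stream concurrently instead of one mic per machine."""
    await asyncio.gather(*(serve_stream(sid, audio, recognize, sinks, routes)
                           for sid, audio in streams.items()))


async def fake_recognize(audio: bytes) -> str:
    """Stand-in for the cloud round trip; just reports how much audio arrived."""
    await asyncio.sleep(0)
    return f"heard {len(audio)} bytes"


if __name__ == "__main__":
    sinks = {"default": lambda sid, text: print("speak:", sid, text),
             "iv": lambda sid, text: print("show on IV:", sid, text)}
    asyncio.run(serve_all({"kitchen": b"\x00" * 320, "den": b"\x00" * 160},
                          fake_recognize, sinks, routes={"den": "iv"}))
```

Keeping routing as plain configuration means adding a new output type (say, a phone bridge) is just another sink, with no change to how audio comes in.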
Dean Roddey
Software Geek Extraordinaire
#8
Have you looked any further at the chat/text bot thing? Between the Echos and the Microsoft voice stuff, I don't see an additional need there.

Interestingly, I saw today that in the future you'll be able to call up Cortana on an Echo, and Alexa on Cortana desktops.
|Z-Wave|Sonos|Tivo|Hue|Plex|Roku|MyMovies|Echo|
Nest|Harmony|Neeo|LG TV|Smarthings|
#9
I guess one thing we could consider as a first use of this API is to update our current Echo support to use it. I don't think it would require any changes in how you use it, i.e. it would be completely backwards compatible with our current Echo support. But it could let us get rid of the requirement to expose the web server and set up certificates and such, so it could make the current Echo support easier to set up and safer.
Dean Roddey
Software Geek Extraordinaire
#10
I think it's tough to say which direction is best until it's tested out. CQC Voice likely does have some room to get better, but the current one mic per PC, and having to locate a PC/mic combo, etc., is too constraining compared to Alexa/Google Home.

For me, I think it wasn't so much the local recognition power as the amplitude of the speech captured from a distance. Lower amplitudes just didn't get recognised properly. When listening to the recorded files via the debug code, I could always recognise what was said very well with my own ears; some recordings were just a bit fainter, as would be expected. Perhaps the more powerful cloud recognition engine would handle that better? I guess it would need to be tried with the same mic and the same speech, to see whether the Amazon engine recognised the lower amplitude voice better. If it did, then it would be well worth using in spite of the cloud requirements, IMO.
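To make "a bit fainter" measurable when comparing the same recordings across engines, a small Python sketch can report the RMS level in dBFS and optionally scale a clip to a target level. It assumes 16-bit little-endian mono PCM; `rms_dbfs`, `normalize`, and the -20 dBFS target are illustrative choices, not part of CQC Voice. Note that gain also boosts background noise, so whether a normalized clip actually recognises better would still need the side-by-side test suggested above.

```python
import math
import struct


def rms_dbfs(pcm: bytes) -> float:
    """RMS level of 16-bit little-endian mono PCM, in dB relative to full scale."""
    count = len(pcm) // 2
    if count == 0:
        return float("-inf")
    samples = struct.unpack(f"<{count}h", pcm[: count * 2])
    rms = math.sqrt(sum(s * s for s in samples) / count)
    return float("-inf") if rms == 0 else 20 * math.log10(rms / 32768)


def normalize(pcm: bytes, target_dbfs: float = -20.0) -> bytes:
    """Scale samples so the RMS level hits target_dbfs, clipping to the int16 range."""
    level = rms_dbfs(pcm)
    if level == float("-inf"):
        return pcm  # silence or empty input: nothing to scale
    gain = 10 ** ((target_dbfs - level) / 20)
    count = len(pcm) // 2
    samples = struct.unpack(f"<{count}h", pcm[: count * 2])
    scaled = [max(-32768, min(32767, round(s * gain))) for s in samples]
    return struct.pack(f"<{count}h", *scaled)
```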

That all said, Google Homes are going two for $250 CDN right now. I just don't know if the dollars for fancier array mics and PCs just for voice can compete with that. Might it be best to add Google Home support? Ideally, using that somehow (or Alexa) as the mic for CQC Voice would be fantastic, I think... but perhaps impossible. That'd be my vote. There's no Alexa support up here yet, which is why I bring up Google; I bet Alexa is quite economical too.