Project X Revealed!
OK, you guys have been listening to me talk about Project X for a while now. Here is a first demo video. I hope you are as excited about it as I am. I think it will make a huge difference to CQC's capabilities.

As of Jan 24, this is the latest, superseding the original post above.
Dean Roddey
Explorans limites defectum
OK, so that looks very cool!  I'm quite excited to learn more details.  

What are the system requirements to get that new functionality working? Just a microphone, or how does it hear me? Voice training? Is there a frontend to configure what it can interact with as far as CQC drivers/fields, or is everything on the table automagically?
Hoping that my surface pro 4 supports this.
Nest|Harmony|Neeo|LG TV|Smarthings|
Cool! I'm guessing there is some sort of natural language back-end processor that you are tying into?

Will this work with any CQC driver, how does it get mapped? 

Input devices: we have the Amazon Echo and perhaps Google Home as current options, not to mention Cortana, Siri, and Android voice recognition. How does Project X accept input?

When can we start playing with your new creation?
god damnit you had to do this while I was on a mega-travel project. Which is going really well, but could easily last for ANOTHER 12 months.

At least someone else will get to beta test this and shake it out first.
Some of my devices: Sonos, Aeotec zWave, Nest, Rain8Net, Various H/T
What's next: CQC-Voice, Brultech GEM

I'm assuming since we've been asking the question about Alexa that this will know which room a command was given so will know which lights to turn off when the command is given to turn off the bedroom lights. Super cool Dean! Interested to hear about the hardware to be able to put this into use around the house.
Seems interesting!  I am sure you don't have the bandwidth to write an AI from scratch, so you must be using a cloud-based AI.  I assume it must be cloud-based, otherwise we might need to install some serious hardware to make it work.  Let's hope that it's Alexa, as they are already the market leader and there has been a lot of work here on using it with CQC.  I now have an Echo and 3 Dots in the house, though I only use Weunch's Hue simulation driver.

An important issue will be to make it work out of the box, rather than the current process of loading bits of code into different Amazon websites.  Of course, the flexibility to load your own utterances would be great, but it should really do some basic stuff when you install it.

Personally, I think this is an end run around the Apple RIVA viewer.  While you have said that supporting the non-PC folks is the most significant business issue for CQC, why have an HTML IV always playing catch-up with the real PC version, when you can just talk to CQC on your phone or tablet?  Way to go.

PJG
Is this going to require a Kinect sensor?
Nest|Harmony|Neeo|LG TV|Smarthings|
(01-19-2017, 12:30 AM)pjgregory Wrote: Seems interesting!  I am sure you don't have the bandwidth to write an AI from scratch, so you must be using a cloud-based AI.  I assume it must be cloud-based, otherwise we might need to install some serious hardware to make it work.  Let's hope that it's Alexa, as they are already the market leader and there has been a lot of work here on using it with CQC.
while I suspect you are right about the cloud, I really hope you are wrong...
I really don't want my house bugged by who knows who...

Alexa, is Law Enforcement Listening?

and I suspect that is just the beginning, as it seems hackers are good at hacking, I don't want everything that is said within range of my HA becoming public... even if it is just sent to Russia... (not sure why they would care what I had to say, but hey, Russians!)

I would much rather have it run locally and have everything stay 100% local...

that said, it is cool...
needs a better name than Zira though...
maybe "Red Queen", as HAL has already been taken...and Dave's not here...  (I need better movie references...)

on the other hand, it is about time... I have been asking for a CQC-AI driver for like a decade now I think...
so does this mean the CQC-Psychic interface driver will be coming soon too?
NOTE: As one wise professional something once stated, I am ignorant & childish, with a mindset comparable to 9/11 troofers and wackjob conspiracy theorists. so don't take anything I say as advice...
OK, here's the skinny...


The hardware is a Kinect for Xbox One sensor (with a USB adapter cable to make it usable on the PC.) This is the V2 Kinect (not the V1, which was for the Xbox 360.) The reason for using the Kinect is that it's a microphone array, not just a single microphone. And it's got some nice functionality beyond what we are currently using that we can add over time, such as facial and skeletal recognition, and hand/body gestures for control. The former could allow for things like privileged commands, by proving that you are who you say you are.

It can also know where you are in the room, relative to other things, which could be used in interesting ways. I saw a demo where a guy set it up so that he could point to something he wanted to control and use the other hand to do a gesture to control it. It doesn't cost much more than a full sized Echo, but will allow for much more functionality over time.

The speech software is just Microsoft's Speech Platform 11, which provides the text-to-speech and speech recognition.

Purely Local, No Cloud

This means also that we don't need any cloud processing. The AI stuff is my own; we aren't using any outboard processing for that either. So no worries about the internet connection going down or your conversations being listened to. Well, someone could break into your network and do that maybe, but if that happens you have bigger problems to worry about.


What the cloud gets you is a lot of processing power, which is required to do open-ended, general language recognition. The Echo doesn't have to do much, just a few words in a row, the rest being fixed text as part of the utterances that it just has to match. But it can do that much without training, which we cannot. Even with all that server power we know it often doesn't get it right, because it has to choose from all possible words without any guidance or context.

So, in the Echo, when you have an utterance like "Please turn off the {target|kitchen light}", all it knows is that it has to match the first, fixed part, and then capture two words whose meaning it doesn't know. But it does translate that speech and gives us the two words.

In this scheme, we cannot do that. We don't get the text spoken. Everything spoken has to match pre-defined grammar rules, so the entire grammar is 'fixed'. That doesn't mean it's shipped as-is, though. The application can have rules in the grammar that it dynamically updates at runtime. So there can be a rule for lights, for playlists, etc., that will be updated when the program runs. But, in the end, it just matches that (from then on fixed) grammar, and all we get is what it thinks it matched and a confidence level for each rule that defined the spoken phrase.
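The Speech Platform consumes W3C SRGS XML grammars, so a dynamically updated rule might look roughly like this (a hypothetical sketch; the rule names and light names are invented for illustration, not CQC's actual grammar):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<grammar xmlns="http://www.w3.org/2001/06/grammar"
         version="1.0" xml:lang="en-US" root="turnOffLight">

  <!-- Fixed phrase plus a reference to the dynamic rule -->
  <rule id="turnOffLight" scope="public">
    <item>please turn off the</item>
    <ruleref uri="#lightName"/>
  </rule>

  <!-- Regenerated at runtime from the lights the system knows about -->
  <rule id="lightName">
    <one-of>
      <item>kitchen light</item>
      <item>bedroom light</item>
    </one-of>
  </rule>
</grammar>
```

The recognizer can only ever report one of the alternatives in the `<one-of>` list, which is exactly the "everything must match the grammar" constraint described above.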

So, for the type of utterance above, we'd get that it was the "turn off a light" rule plus the light-name rule. For the light rule, we would get what it considered the best match, along with a confidence level. That's how it asks you for confirmations, if the confidence level is not high enough. So, in the video, when I said "Please load the music playlist 'Classic Rock'", I don't get any of that text. It wasn't me looking at 'Classic Rock', seeing that it doesn't exist, finding 'Roots Rock', and asking if that's the one. The recognition software did the closest match (which was Roots Rock) but indicated it was low confidence, so I got a confirmation.
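That confirmation flow can be sketched in a few lines (a hypothetical illustration using Python's standard `difflib` for the fuzzy match and an invented threshold; the real logic lives inside the recognizer and CQC):

```python
# Sketch of confidence-gated matching: pick the closest configured
# playlist to what the recognizer matched, and flag low-confidence
# results so the app asks for confirmation instead of just acting.
import difflib

CONFIDENCE_THRESHOLD = 0.8  # invented value for illustration

def handle_playlist_request(heard: str, playlists: list[str]):
    """Return (best_match, needs_confirmation) for a recognized phrase."""
    # The grammar only ever yields something, so take the closest
    # configured playlist name even if the similarity is poor.
    best = difflib.get_close_matches(heard, playlists, n=1, cutoff=0.0)[0]
    confidence = difflib.SequenceMatcher(
        None, heard.lower(), best.lower()).ratio()
    return best, confidence < CONFIDENCE_THRESHOLD

# "Classic Rock" isn't configured, so the closest match ("Roots Rock")
# comes back with low confidence and would be confirmed first.
playlist, confirm = handle_playlist_request(
    "Classic Rock", ["Roots Rock", "Jazz Standards"])
```

Here `playlist` ends up as "Roots Rock" with `confirm` set, mirroring the "is that the one?" prompt in the video.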

So we can't have any open ended text in the grammar. That requires some clever work to get around as much as possible, but it still works out fairly well.

You do also have to enunciate reasonably well. Slurring words together is the biggest cause of misfires. I do that a lot, and I don't even need any alcohol to do it. OTOH, unlike the Echo, there are recognition files for US, Canadian, British, Australian, and Indian English. So it may be better at recognizing those accents than the Echo is.
Dean Roddey
Explorans limites defectum
