Announcing CQCVoice - CQC's All Local Voice Control System
#31
The important thing is the levels in the mixer since, as you say, there's no volume knob internally. For me, they usually peak at around 50 to 70% or so. So that may well be a big difference. I assume the 25% is when you are back at a distance? You should definitely see higher levels when up closer, or something would seem to be awry.

The input gain hasn't been automatically forced back down, has it? I've noticed that it often gets pushed back down behind your back.

Some of the nicer mics also have auto-gain, which is useful. You don't get distortion when up closer but it cranks up more when the levels are low. And some of them have reverb cancellation type processing as well, which is going to be more significant as the room gets larger.

It would be awfully nice if there was some mechanism to have multiple mics. Of course a mic array could be an array of mics around the room, but that would have to be done at the hardware level, so that it shows up as a single audio stream to the recognition engine.

Vivek has a Kinect and was talking about getting one of the higher end ones, so he could possibly provide us with a useful results comparison.

A quick search on "distributed microphone arrays" turns up a number of research-oriented papers. One of them is from 2013 and contains a reference to the 'emerging field of distributed microphone array processing', so apparently it's a fairly new thing.

Do you have the possibility of trying some alternate positions for the Kinect? It could make a significant difference. One thing you learn quickly if you ever set up a music studio room and try to tune it is that the interactions in a room with respect to frequency content are crazy, and a wildly fluctuating frequency response may confuse the recognition engine as much as volume fluctuations.
Dean Roddey
Software Geek Extraordinaire
#32
Distributed mic arrays are certainly not new, but they are starting to emerge for the home market. I regularly work with top-end ceiling and in-table mics for the corporate conference room space. Something like a triple-element ClearOne mic has a range of about 20' from center. That style of mic requires an expensive DSP device and a lot of tuning to remove noise and bad frequencies for proper performance. Each mic takes up three inputs. A simple one-mic system would cost in excess of $5k, which is a bit much for the home market. Each additional mic is another $1k.
To get the true ability to walk through your home and talk to the system, you would need multiple mics. The same is true of both Amazon and Google, since you really need multiple devices for a comprehensive multi-room system. That is why they both offer a smaller add-on device to go with a larger main system.
Thanks,
Dave Bruner
#33
(05-05-2017, 09:33 PM)Dean Roddey Wrote: The important thing is the levels in the mixer since, as you say, there's no volume knob internally. For me, they usually peak at around 50 to 70% or so. So that may well be a big difference. I assume the 25% is when you are back at a distance? You should definitely see higher levels when up closer, or something would seem to be awry.

The input gain hasn't been automatically forced back down, has it? I've noticed that it often gets pushed back down behind your back.

Do you have the possibility of trying some alternate positions for the Kinect? It could make a significant difference. One thing you learn quickly if you ever set up a music studio room and try to tune it is that the interactions in a room with respect to frequency content are crazy, and a wildly fluctuating frequency response may confuse the recognition engine as much as volume fluctuations.

Yes, the 25% is more from a moderate distance of 5-7 ft. When I sit at my desk with the Kinect in front of me, the meter is more in the top 50% range and up, depending on how loud I talk. If only there were a loudness setting for the Kinect, I think it would help to a point... I tried to see if there was a hack or something, but I think it's deliberately missing, as the Kinect is a higher-end mic and has its own processing built in, I imagine.

The gain is sitting at 100% constantly, no issues there. (unchecked exclusive mode)

I have tried different mic positions. I think your comments on that are spot on. There are a lot of variables affecting performance at a distance. I find it room and position dependent. My best results are in my office, about 15x15, with the door closed. The basement is pretty good too, in some areas more than others. It's a very large open space but has acoustical ceiling tiles and a low ceiling height. My worst experience is in another large, very open area of our house with 20 ft ceilings.

One other thing I noticed is that the names you pick for your lights or other things are important. Turn Office light ON (or Bring up) works almost always, even at a good distance, but turn (or bring up) Kitchen light ON is not good. The word Kitchen for some reason doesn't work well for me, so I may change it to something else and see if further tweaks like that help. That's just one example; there are others that work well or don't work well. This one seems to get reflected in your accuracy numbers (0.1 vs. 0.95). I do see times when it recognized what I said, but the accuracy is very low so it gets discarded. One option may be to allow us to tweak that ourselves, like a slider from very accurate to terrible, and let us live with the consequences? That may not work, but it's just a thought.

All in all Dean, I think you've done a pretty amazing job with this and it does work, just have to make sure our expectations are in check with the variables of how people speak and mic technology etc.  Same as how some people are easy to understand and some not so much... that carries through to this technology.

If anyone jumps on this and can compare mics like say Alexa, Google vs Kinect and maybe a more pro model that would be super interesting.
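
For what it's worth, the slider idea floated above could be as simple as mapping a slider position onto a minimum confidence. This is only a rough sketch: the function names, the 0 to 100 slider range, and the 0.95 and 0.10 endpoints are all made up for illustration and are not anything CQCVoice actually exposes.

```python
def slider_to_threshold(position, strict=0.95, lenient=0.10):
    """Map a slider position to a minimum confidence (all values hypothetical).

    position 0 means "very accurate" (only high-confidence matches pass),
    position 100 means "terrible" (almost anything passes).
    """
    if not 0 <= position <= 100:
        raise ValueError("slider position must be between 0 and 100")
    # Linear interpolation between the strict and lenient endpoints.
    return strict - (strict - lenient) * position / 100


def accept(confidence, position):
    """Return True if a recognition with this confidence should be acted on."""
    return confidence >= slider_to_threshold(position)
```

With the slider in the middle, a 0.6 confidence match would be acted on while a 0.5 match would still be discarded. The obvious consequence of sliding toward "terrible" is more false triggers from things like a nearby TV.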
#34
If you are getting 25% at 5 to 7 ft, then I can definitely see why you would be having issues further away. At that point the signal-to-noise ratio is going to be getting pretty low. Does talking considerably louder, just to test getting the levels up, make any difference? Of course, if there are echoes and reflections in the room, talking louder just excites more of them and may actually make it worse, I dunno.
Dean Roddey
#35
Yeah, if I use my Dad voice like the kids are in trouble, it's a bit better. Perhaps my Kinect is not quite as sensitive as some, not sure... It needs more gain, but it's maxed.
#36
I wonder if there are any internal settings. The settings we are changing are Windows-level settings. But there's no applet in the Control Panel for the Kinect or anything.
Dean Roddey
#37
Yeah I was wondering that too. I'll have to search around. A while back I was reading about program level commands to set gain, but I think it was still just Windows level not hardware but can't recall for sure..
#38
So the grass is on steroids here these last few weeks, so I went out and cut it today. As I was riding around, mind in neutral, I had a brain flash. The first thing you think about when thinking about multiple mics is the doctoral thesis thing, where you take the mics in an array and spread them out and have even fancier DSP software to get rid of echoes and delays and all that. But it doesn't have to work that way, at least not for a command and control style system that is just matching a grammar. It would for a conference room, I guess, where the goal is to boil it all down to one good version of the speaker's voice.

If Voice was updated to accept multiple mics (still would have to be arrays), it could just run multiple instances of the reco engine, each running independently. When you speak a command, all of them will hear it. Some may not hear it well enough to report anything, the others will likely report different confidence levels. But any good enough to report will all show up at basically the same time.

So, upon getting a reported event, it would wait a short period of time to see how many show up (not long enough to cause an issue, since they would all show up really close together). It would then take a 'vote' approach. It would compare what all of them reported for each value. If they all agree on a value, then that's pretty much gotta be it. If they don't, it creates an average score for each reported value and takes the best one. If none of them agree, or all of the levels for a given value are low, it treats it the way a low confidence value is treated now and asks for clarification.

Of course if the overall score from a given mic is low, it can just discard it and do voting among those that provided reasonable overall command confidences. Out of those, of course it would make sure that they are all reporting the same command. If some percentage agree, the others could be ignored and it could work out the above process on the ones that agree.

So that would effectively allow you to put three or four mics in a room and get very good coverage and improve the overall accuracy not just in terms of a mic always being closer to you, but by providing a group consensus on what was said.
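
To make the voting idea above a bit more concrete, here is a minimal Python sketch. It is not how CQCVoice works or is implemented. The `Report` shape, the `vote` function, and all three thresholds are hypothetical, and it assumes the reports from the short wait window have already been collected into a list.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass
class Report:
    """One recognition event from one mic's engine instance (hypothetical shape)."""
    mic: str
    command: str
    confidence: float                     # overall command confidence, 0.0 to 1.0
    values: Dict[str, Tuple[str, float]]  # value name -> (recognized text, confidence)


def vote(reports, min_overall=0.5, min_value=0.6, min_agree=0.6):
    """Return (command, values), or None if the user should be asked to clarify."""
    # Discard mics whose overall command confidence is too low.
    usable = [r for r in reports if r.confidence >= min_overall]
    if not usable:
        return None

    # Require that enough of the remaining mics report the same command.
    command, count = Counter(r.command for r in usable).most_common(1)[0]
    if count / len(usable) < min_agree:
        return None
    agreeing = [r for r in usable if r.command == command]

    chosen = {}
    for name in sorted({n for r in agreeing for n in r.values}):
        # Collect the confidences reported for each candidate text.
        confs = defaultdict(list)
        for r in agreeing:
            if name in r.values:
                text, conf = r.values[name]
                confs[text].append(conf)
        averages = {text: sum(c) / len(c) for text, c in confs.items()}
        best = max(averages, key=averages.get)

        if len(confs) == 1:
            chosen[name] = best  # every mic agrees, so accept it as is
            continue
        # Nobody agrees, or even the best average is weak: ask for clarification.
        nobody_agrees = all(len(c) == 1 for c in confs.values())
        if nobody_agrees or averages[best] < min_value:
            return None
        chosen[name] = best
    return command, chosen
```

With three mics that all report the same command and target, where two hear "on" with decent confidence and one hears "off" with low confidence, this picks "on". Note that a lone surviving mic counts as unanimous here, which may or may not be the behavior you would want in practice.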
Dean Roddey
#39
Here are some thoughts on using CQC Voice after a few weeks of it being up and running (Kinect Setup), with a comparison to the Echo setup I have in the same room.

* Big Positive - I see more activity on the CQC log monitor with Echo and the secured connection than I would have expected (Meaning someone is making some attempts).  Being all local is a big positive for CQC Voice  to not worry about intruders.

* About 6 feet away, it hits every command given. It becomes hard of hearing (like me) about 10' away and struggles to hear commands. Further than that, you have to yell. Echo can definitely pick up from further away.

* Having it tied into the AutoGen feature makes it very limited as compared to the Echo.  You have to have a V2 driver, whereas with the Echo, it is driven by global actions (for me) that I can set with any driver.

* CQC Voice struggles with off and on, over about 6 feet away.  It will normally do off, when you want on.  Echo seems to get it right most of the time.

* With a TV nearby, it does respond to certain things it hears (Echo does too but not as much), not a big deal but seems to be more sensitive.

* Volume Control - One thing I like about the Echo is that I can increase or decrease the speaker volume by voice command if the background noise has changed. With CQC Voice, you set it, and if it is not where you want it, you can't change it unless you go to a Windows screen and adjust it.

I think if we can get more people using it and trying new microphones, it would only help this. It's been fun testing this out and I do see great potential. It would be great if we could set it up to use Global Actions like the Echo so that it can control more than the AutoGen setup.
#40
You know about the 'room modes' right? That's one way you can invoke arbitrary global actions. I could add a voice control for the volume easily enough I guess.

Did you see the discussion above about multiple mics? That could be a way to deal with larger rooms.
Dean Roddey