I use .NET to handle the voice recognition. I use it at work, so I like to leverage that technology.
The Google idea is interesting. I wonder if the internet latency would be a problem.
You see, it's not just recognizing words that is the problem. When you deploy this in the real world, you will find all kinds of background noise: kids, dogs barking, the blender going in the kitchen, people watching YouTube videos in the background, music playing...
These distractions wreak havoc on the voice recognition system. THAT'S the big problem.
I have built some linguistic rules so that my program only responds to syntax that it is expecting, such as "set master bedroom lights to 75 percent".
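To illustrate the idea of only accepting expected syntax, here is a rough sketch in Python (not my actual .NET code). The pattern, the function name parse_command, and the assumption that the recognizer hands back plain text with the number as digits are all made up for the example.

```python
import re

# Hypothetical rule: "set <room> lights to <N> percent".
# Anything that does not fit this shape is ignored.
COMMAND = re.compile(
    r"^set (?P<room>[a-z ]+?) lights to (?P<level>\d{1,3}) percent$"
)

def parse_command(text):
    """Return (room, level) if text matches the expected syntax, else None."""
    match = COMMAND.match(text.strip().lower())
    if match is None:
        return None  # not a phrase we are expecting
    level = int(match.group("level"))
    if level > 100:
        return None  # reject out-of-range dimmer levels
    return match.group("room"), level
```

With this, "set master bedroom lights to 75 percent" gives back the room and the level, while stray speech like "turn up the music" is simply dropped.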
And it doesn't listen to anything you say unless you say "computer" first. That opens up the command phrases, just like on the Enterprise.
After you have not spoken to the computer for more than 30 seconds, you have to use "computer" again to get its attention.
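The "computer" gate with its 30 second timeout could look something like the following Python sketch. The class name WakeWordGate, the hear method, and the injectable clock are hypothetical and only there to show the logic. I am also assuming that each accepted phrase restarts the 30 second window.

```python
import time

class WakeWordGate:
    """Pass phrases through only while the wake word is still 'fresh'."""

    def __init__(self, wake_word="computer", timeout=30.0, clock=time.monotonic):
        self.wake_word = wake_word
        self.timeout = timeout      # seconds of silence before going back to sleep
        self.clock = clock          # injectable so the timeout can be tested
        self.last_heard = None      # time of last accepted speech; None = asleep

    def is_awake(self):
        if self.last_heard is None:
            return False
        return self.clock() - self.last_heard <= self.timeout

    def hear(self, phrase):
        """Return the phrase if it should be treated as a command, else None."""
        text = phrase.strip().lower()
        if text == self.wake_word:
            self.last_heard = self.clock()  # open (or re-open) the window
            return None
        if not self.is_awake():
            return None                     # ignore everything while asleep
        self.last_heard = self.clock()      # assumed: speech restarts the window
        return text
```

Whatever hear returns would then go to the syntax rules. Anything said before "computer", or more than 30 seconds after the last accepted phrase, never gets that far.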
I'm still nowhere close to satisfied. To really make this work will require AI that will probably be available in 10 to 15 years. You would need AI for the computer to be able to tell speech intended for the computer apart from background conversation.