Fastest rule-processing for automation?

wkearney99

Senior Member
For me the single biggest impediment to more elaborate home automation is the delay incurred by rule processing.  As in, you want the motion sensor to properly activate the lights, but do so based on a set of conditions: it's after 10pm, so only bring the lights up to 25%; or don't bother bringing them up to 100% if there's already enough daylight from the skylights and it's after 6:30am on a work day vs 9am on a weekend/holiday.  Meanwhile, fire an alarm event if there's not supposed to be anyone in the space at all.  Lots of variables and conditionals there.
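
Just to make that conditional pile-up concrete, here's a rough Python sketch of what a single motion event has to wade through.  This isn't any particular system's rule language, and all the names and the 300-lux "enough daylight" threshold are made up for illustration:

from datetime import datetime, time

# Rough sketch only; all inputs and thresholds here are hypothetical.
def on_motion(now: datetime, lux: float, space_should_be_empty: bool,
              is_workday: bool, set_light, fire_alarm):
    if space_should_be_empty:
        fire_alarm("unexpected motion")      # nobody is supposed to be here
        return
    if now.time() >= time(22, 0):
        set_light(25)                        # after 10pm, only bring lights to 25%
        return
    daylight_cutoff = time(6, 30) if is_workday else time(9, 0)
    if now.time() >= daylight_cutoff and lux > 300:   # assumed "enough skylight" level
        return                               # plenty of daylight already, do nothing
    set_light(100)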
 
Same thing for handling dimming changes made with a remote or other controls.  If interactive changes take too long, commands tend to get stacked up and you end up not being able to make the adjustments you want without a lot of fiddling.
 
Indeed, adjusting dimming levels is probably something that can be better handled by pre-programmed scenes, but there's usually a lot of time between using the space and knowing what kind of scenes would be most useful.
 
Automation has come a long way, but the recent detour into cloud-based services has not done us any favors when it comes to responsiveness (or reliability, for that matter).  Hardware processing power, meanwhile, has improved dramatically in the last decade.  Which (if any) automation systems have taken effective advantage of this?
 
There's always the complication of "you can't do it all within one framework" and then it becomes a hodge-podge of lashed together hacks.  Which very quickly fails the WAF.  So don't think I'm not aware of that.
 
I ask this more as a discussion starter than a quest for a specific answer.  But I'm curious what, if any, emphasis the different automation schemes have put on response times and interactivity.  Which ones are already known to be quick, or too slow?
 
A lot of it will depend on the actual devices under control. There are sort of two ways that information can flow:
 
1. Event driven
2. Polled
 
In the latter case, where the device has to be polled, it can only be polled so fast, and if there are a fair number of bits of info that have to be gotten, it could take half a second to a few seconds before the automation system can even know that something happened.  Or, more to the point, before the driver for that device knows it.  If the part of the automation system that invokes those sorts of rules ('triggered events' we'd call them) also has to poll the devices to watch for changes, then that adds more time to it.
 
If the whole thing is event driven, then the device immediately sends an event to the driver.  The driver sees that the value has changed and, if it's something it needs to report, maybe it sends out a datagram over the network reporting the change.  The part of the system that handles those triggered events puts it into a queue and processes it as soon as it can.  If the product is multi-threaded, then multiple handler threads are available to grab things out of the queue and handle them.
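
As a rough illustration of that event-driven path (a generic Python sketch, not how any particular product actually implements it):

import queue
import threading

event_q = queue.Queue()          # triggered-event reports waiting to be processed

def device_reported(change):
    # The driver got an async report from a device; queue it immediately.
    event_q.put(change)

def handle_triggered_event(change):
    # Run whatever rules are keyed off this particular change.
    pass

def worker():
    while True:
        change = event_q.get()   # blocks until something arrives, no polling
        handle_triggered_event(change)
        event_q.task_done()

# A small pool of handler threads pulls events off the queue in parallel.
for _ in range(4):
    threading.Thread(target=worker, daemon=True).start()

The point being that nothing in that path waits on a poll cycle; the only latency is the queue hand-off.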
 
 
In the event-driven case, things can happen quite quickly, almost instantly.  In the polled scenario, they can be quite piggy.  Given a reasonably well designed automation system, I'd think that the bulk of the issues come from devices.  Many devices just aren't well designed for automation system integration.  They have to be polled, they aren't quick to respond, and maybe they provide a lot of information, which makes it even worse, because that means it's longer and longer before the driver gets back around to the bit of info that might need to cause something to happen.
 
Systems like Z-Wave are a good example. If all your modules aren't of the asynchronously reporting type, then they have to be polled. Z-Wave is slow, so it could be five or ten or thirty seconds (in a large system of such modules) before the automation system even knows that something happened. So no matter how efficient the automation system is, it's going to be slow. If the devices report asynchronously, then the notification goes out as soon as it happens, and things are much faster. Of course it's not as fast as something that has a wired internet connection, since Z-Wave is still slow in comparison to those, but the async reporting makes a huge difference.
 
Yeah, here I didn't bother dealing with pure software for rule processing, as it got way too convoluted and literally time consuming; I went to hard loops for the sure bet of making things work.
 
I think one of the great overlooked scenarios in the event vs polled debate is having a hybrid.  As in, it's dumb to execute logic when the underlying conditions haven't changed.  The trick is knowing whether that's true or not.  That, and knowing whether its being true has any real impact on the desired outcome.
 
As in, when dealing with holidays and other calendar-related items it's far too expensive (processing-wise) to bother checking an internet-based calendar every time.  Better, perhaps, to have an asynchronous process babysitting that separately.  Sort of a cache-hit kind of mentality.  It does raise the resource requirements of keeping that data locally, and the babysitting, but it keeps it out of the loop for local event-based triggers.  It's not like we're talking about managing data for thousands of users and events.  Sure, there are lots of edge cases, but in a residential setting it doesn't strike me as being too big of a problem.
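
Something like this sketch is what I have in mind, in generic Python with a made-up fetch helper: a background thread babysits the calendar while the event path only ever touches the local copy:

import threading
import time

holiday_cache = set()            # dates we currently believe are holidays
cache_lock = threading.Lock()

def fetch_holidays_from_internet():
    # Placeholder for whatever calendar fetch you actually use (iCal feed, etc.)
    return set()

def babysit_calendar():
    # Background task: only this thread ever touches the internet calendar.
    while True:
        dates = fetch_holidays_from_internet()
        with cache_lock:
            holiday_cache.clear()
            holiday_cache.update(dates)
        time.sleep(6 * 3600)     # a few refreshes a day is plenty for this kind of data

def is_holiday(d):
    # Called from the fast, local, event-driven path; it's always just a cache hit.
    with cache_lock:
        return d in holiday_cache

threading.Thread(target=babysit_calendar, daemon=True).start()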
 
Trouble is, the management layers of a lot of HA systems I've encountered don't have much in the way of prioritizing and scheduling asynchronous (thankfully Chrome spell-checking gets that right for me) activities.  An awful lot gets done over and over in the main event code for a device/activity, and that doesn't seem terribly effective, at least with regard to response times.  Sure, it's great for being able to write simple code (grossly generalizing here), but a lot of that stacked up leads to poor performance.
 
Like tailoring lighting activities with calendars.  There are a lot of 'moving parts' involved.  But you don't always need to know the data is 100% accurate.  As in, school calendars: they're not going to change much once the school year starts, at least not for regular holidays.  But you do need to factor in the potential for weather-related or other interruptions and how that might ripple.  Lunch menus are predictable, until there's a snow day or other closure, then things get bumped.  The questions there are 'for how long' and 'when do they reset back to normal'.  Same thing for bus schedules, after-school activities and sporting events.  Factoring those into wake-up alarms or other 'daily assistance' activities benefits from being 'accurate', but they're not quite 'critical enough' to require resource-hungry confirmation every single time.  Yeah, it's nice if they're right, but there's some leeway to be allowed if some part of the puzzle isn't accessible.
 
I'm not taking aim for or against any one particular HA framework, just mulling over where things have been, where they are now and where it might be useful to see them moving toward.  
 
That and I recognize that what I might find logical doesn't necessarily translate into someone else's idea of what's worth doing, or at least not profitably.  It's annoyingly difficult/expensive to have personal assistants/staff in the real world.  It's not really much less difficult to have virtual ones.  Depending on staff to show up is just as annoying as having internet uplinks or servers go offline.  There's a trade-off to be made between paying for good help vs being Scrooge.  That, and not having your soul sold out for a quick buck by staff or cloud-based server management schemes.
 
Dean Roddey said:
Software based can be plenty fast. It just requires that all of the bits cooperate. 
 
Agreed.  But there's also the matter of whether the framework is designed to just slog through the tasks methodically and robotically (i.e., slowly), or whether there's a degree of learning possible.

In one sense I get what something like the Nest thermostat wants to accomplish.  But on the other hand I realize that chasing efficiency for something like stable space conditioning doesn't usually need much more than methodical scheduling.  Thus spending more for a Nest seems pretty dumb compared to using a decent programmable unit.  Or the 'reward' of their leaf scheme rings hollow, making the expensive purchase look bad in retrospect.
 
People want to make leaps of faith when they see things being automated.  They're quite often disappointed to discover there's no magic behind the curtain and what was advertised can't actually rise to their level of expectations.  There's plenty of fingers to point there, between unrealistic expectations and outright lies masquerading as marketing.  But it still falls back on stuff not being thought out and implemented in ways that effectively address user expectations.  
 
For triggered type events, there's little overhead in and of itself.  As long as the device reports state changes asynchronously, that gets rid of the primary issue.  The report then goes to the automation system.  All it has to do is decide whether that particular reported state change needs to make something happen.  For CQC there is a set of 'filters' used to look at the incoming report of a change and decide whether a given event needs to be triggered.  That doesn't take very long.  Any events that pass that test are queued up, and one of a set of worker threads grabs the next queued event and processes it.  There's not much overhead involved beyond how quickly the device can make the change apparent to CQC.
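
Conceptually (my own generic sketch, not CQC's actual filter mechanism or syntax) the filter pass is about this cheap:

import queue

work_q = queue.Queue()           # shared with the worker-thread pool

def matches(filters, report):
    # Each filter is just a cheap predicate on the reported change:
    # which device and field it came from, and what the new value is.
    return any(f(report) for f in filters)

def on_device_report(report, triggered_events):
    # Run the incoming change report past every configured triggered event.
    for ev in triggered_events:
        if matches(ev["filters"], report):
            work_q.put(ev)       # a worker thread will pick it up and run it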
 
For scheduled events, CQC pre-calculates the next run time for each event each time it's invoked.  It maintains a list of them, sorted by next run time.  All it has to do is look at the item at the top of the list.  If it's not time to run that one, it can't be time to run any of them.  If the top one's time has come, it queues it up on the same queue as above, and one of the worker threads grabs it and runs it.
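
In pseudocode terms (again my own sketch, not CQC's internals), that's essentially a priority queue keyed on next run time, with an assumed next_run_after callback standing in for the recalculation:

import heapq
import itertools
import time

schedule = []                    # heap entries: (next_run_time, seq, event)
seq = itertools.count()          # tie-breaker so equal times never compare events

def add_scheduled(event, first_run_time):
    heapq.heappush(schedule, (first_run_time, next(seq), event))

def scheduler_loop(work_q, next_run_after):
    # next_run_after(event, now) recalculates the event's next run time.
    while True:
        now = time.time()
        # If the item at the top of the heap isn't due, nothing else can be either.
        while schedule and schedule[0][0] <= now:
            _, _, ev = heapq.heappop(schedule)
            work_q.put(ev)                                   # a worker thread runs it
            heapq.heappush(schedule, (next_run_after(ev, now), next(seq), ev))
        time.sleep(0.25)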
 
Some simpler systems may not be multi-threaded, so they have to finish doing one thing before they can do another.  CQC is highly multi-threaded, and maintains 'thread farms' that are always waiting for something to do.  They take almost zero system resources while waiting but can very quickly wake up and grab new work, and since there is a set of them, they can process multiple events in parallel.  And of course CQC is also multi-process, since it's modular and there are multiple programs running, each of which is typically managing a considerable number of threads actively working on things or waiting to do so.
 
It may also be the case that GUI based programs are not going to react as fast because they are window message based, which is not a very high performance environment, and can be affected by your own interactions with the GUI. Client/server based systems like CQC keep the processing in the background, in that true multi-tasking/threaded environment where it can be done efficiently, regardless of what the user is doing in the foreground.
 
Protocol comms are usually the delay for Insteon.  CPU processing is very fast and decisions cause very little delay, imperceptible to the human eye.
 
I tend to use direct scene connections between my MS units and lights, while the ISY watches and turns the lights off again based on logic.  Lights are on full before the blink of the MS LED is done.  This does presume the decisions have been made in advance.
 
For light levels that vary by time of day this is quite easy: preset the lamp controller's brightness level and ramp speed based on the time of day or the ambient brightness of the room.  Also easily done with the ISY.
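
The logic behind those presets is about this simple (a generic Python sketch with made-up levels and times, not ISY programming):

from datetime import time

# Hypothetical presets: (on-level %, ramp seconds) chosen by time of day.
PRESETS = [
    (time(6, 0),  (80, 0.5)),
    (time(10, 0), (100, 0.5)),
    (time(20, 0), (50, 2.0)),
    (time(22, 0), (25, 2.0)),    # late evening: dim and slow
]

def preset_for(now):
    # The last entry wraps around past midnight until the first morning slot.
    level_ramp = PRESETS[-1][1]
    for start, lr in PRESETS:
        if now >= start:
            level_ramp = lr
    return level_ramp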
 
For manually entered triggers, like wall switches etc., a slight program delay is not as evident, especially if a 0.5 - 2.0 second ramp is added to soften the perception.
 
Insteon doesn't do anything by polling, as the complaints about that style of protocol never end.  Even TCP/IP protocols have turned to push technology after the original polling concept failed us in certain usage styles.
 
In the electrical grid world, most of the SCADA and RTU systems I dealt with still use polling for information gathering.
OTOH: huge processors are solely dedicated to that multi-comm job in a MAN system, and the lines are all dedicated to nothing other than that purpose.  This is not the case on a shared comm line like the Internet, or a single RF/powerline channel for HA.  If the master control were polling every half second, what are the chances of a trigger getting through without traffic collisions?
 
Well, I dunno about the half second thing being an issue.  On modern local LANs, that would be an imperceptible amount of traffic; I doubt it would ever be an issue.  I imagine quite a lot of things could be polled at a half second interval without any real problems, particularly if it was done at the UDP level.  And of course there are various efficiency tricks that can be played, like a serial number scheme so that the device can just send back a trivial 'nothing has changed' response instead of redundantly returning the same data over and over.
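
The serial number trick looks roughly like this; the wire protocol here is entirely made up, just to show the shape of it:

import socket

last_seq = 0    # sequence number from the device's last full reply

def poll(sock: socket.socket):
    # Hypothetical protocol: ask "anything newer than seq N?" instead of
    # "send me everything again".
    global last_seq
    sock.sendall(f"POLL {last_seq}\n".encode())
    reply = sock.recv(1024).decode().strip()
    if reply == "NOCHANGE":
        return None                          # trivial response, nothing to process
    seq_str, _, payload = reply.partition(" ")
    last_seq = int(seq_str)
    return payload                           # only returned when something changed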
 
But still, even a half second poll interval, and even assuming you can get all of the relevant info in a single call (which is often not the case), could mean noticeable delays, since that doesn't include response time from the device or downstream delays (even if small.)  So the average latency could be, say, 300 to 350ms, and the max could be from half a second to three quarters maybe, depending on device response time.  Those are noticeable delays, when added to the small amount that inevitably occurs downstream as things are being processed.
 
Another thing you have to consider is that most devices cannot be sent commands while there's an existing command or query outstanding (waiting to be responded to.)  So most device drivers have to hold up outgoing commands until the current query completes.  Since almost all triggered events (and human driven actions) require outgoing commands to devices, those also have to get counted in the time taken.  The more often a device is polled, the more likely it is that, when your event is triggered and you try to get it to do something, it has to delay that action to wait for the current poll call to complete.  There's nothing that can be done about this, other than changing the device to be non-polled of course.  When it reports asynchronously, this issue doesn't come up, other than an occasional poll that's often still required to keep the connection alive and make sure the device is still responding.
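
In driver terms it's basically one lock (or one I/O thread) around the transport, so an outgoing command has to wait out whatever poll is already in flight.  A generic sketch, with an assumed simple write/read transport:

import threading

class SimpleDriver:
    # Generic sketch only: one lock around the transport means an outgoing
    # command waits for whatever poll or query is already in flight.
    def __init__(self, transport):
        self.transport = transport           # serial port, socket, whatever
        self.io_lock = threading.Lock()

    def _exchange(self, msg):
        with self.io_lock:                   # a poll in progress delays us right here
            self.transport.write(msg)
            return self.transport.read()     # assumes a simple write/read transport

    def poll_status(self):
        return self._exchange(b"STATUS?")

    def send_command(self, cmd):
        # Triggered events and user actions land here; they queue up behind polls.
        return self._exchange(cmd)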
 
In a modern, multi-user system, other users' commands also have to be processed completely before another command can be sent, or more polling can be done, so those can also somewhat randomly introduce delays as well, though those would be much less common.
 
You're basing your response on Ethernet traffic, not the HA protocols that operate over powerline or RF.  Those are usually much slower than the slowest Ethernet, and they're the actual communication medium at issue here.
 
Now, using one of these slow HA protocols, try polling every one of my 100 devices, and the multiple nodes on each device, every half second and getting a response, Ack or Nak.  Not happening.  The Tx and Rx hardware would be tied up for 20 seconds completing one collision-free pass, against a 0.5 second poll interval.  We need an answer to a poll, or the status of everything is just a guess. :)
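
The arithmetic is brutal even with a generous assumption of 200ms per collision-free round trip (a made-up but plausible figure for a shared powerline/RF medium):

devices = 100
per_poll_s = 0.2     # assumed round trip per device on a slow shared medium
cycle_s = devices * per_poll_s
print(cycle_s)       # 20.0 seconds for one full pass, against a 0.5 second poll budget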
 
Why bother playing a numbers game without taking usage into account?  That's the fallacy of a lot of polling systems.  As I mentioned, not everything needs to be accurate or up to date all of the time.  I'd venture it'd be more efficient to monitor activity and track status locally rather than polling "just to be sure".  That, and learn which devices have states that can't reliably be counted on to be accurate.  If they're high-use devices, within a given timeframe, there'd be some value in polling them.  But they'd likely be the exception, not the rule.
 
Agreed.  Why poll a thermostat more than once per minute?  OTOH, most of my devices are wall switches and lamp/appliance modules.  If somebody turns a lamp on with a keypad and my HA doesn't know about it, and then the MS turns off the light that was supposed to stay on manually, we have a failure to communicate.
 
Reporting by exception works much better for keeping things current immediately.  This is a requirement for good logic-based HA in my book.  Many complain of this with Z-Wave after coming over to Insteon.  They just didn't know how bad it was with some controllers until experiencing the difference.
 
Thermostats?  Why poll them AT ALL?  If ever there was a device that's ill-suited to automation, by golly the thermostat is it.  Schedule it, set it and forget it.  If anything it ought to be handled on a separate bus so as to avoid congesting traffic for just about anything else.  Let the brains in the thermostat do the job and leave it be.
 
As for registering activity, this again seems ripe for tiered scheduling.  There are likely lots of devices we don't care about for a great portion of the time.  It's only during certain intervals that their status matters.  I may have 100+ devices, but based on activity sensing and scheduling it's likely I don't really care about more than a dozen of them at any given time.
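
A tiered scheme could be as simple as bumping a device's poll interval up or down as it moves in and out of the "currently interesting" set.  A rough sketch with made-up tier intervals:

import time

# Assumed tiers: how often to poll devices we do or don't currently care about.
ACTIVE_INTERVAL = 1.0     # seconds, for the dozen or so "interesting right now" devices
IDLE_INTERVAL = 60.0      # seconds, for everything else

class PolledDevice:
    def __init__(self, name):
        self.name = name
        self.active = False          # flipped by occupancy/schedule logic elsewhere
        self.next_poll = 0.0

    def due(self, now):
        return now >= self.next_poll

    def mark_polled(self, now):
        interval = ACTIVE_INTERVAL if self.active else IDLE_INTERVAL
        self.next_poll = now + interval

def poll_pass(devices, do_poll):
    # One pass over the device list; only the due ones generate protocol traffic.
    now = time.time()
    for d in devices:
        if d.due(now):
            do_poll(d)
            d.mark_polled(now)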
 
Unfortunately it's not possible to do what you are suggesting, or at least not practical.  It's impossible to know at what point something might need to know the state of device X.  If it needs to know it, it's not useful to get the state it was in 30 minutes ago; it needs to know the state it's in now.  And of course even if you think a thermostat isn't important, it's guaranteed that some percentage of users don't agree.  They may want something to happen if the temp changes by a certain number of degrees, or want to track how often it's on for power usage purposes, or react to the mode being changed from off to something else by checking the windows and warning the user if any are open, and so forth.
 
There's almost no device for which plenty of folks won't come up with a good argument for having up to date status, double negative I know. Ultimately you really need to have up to date status on everything under control. If the device cannot provide it async, then you have to poll it.
 
Of course not all polling is bad.  There are plenty of devices that are serial or USB, have fairly small amounts of data, and reply quickly.  They don't suffer particularly for being polled.  And it does allow the driver for those simple devices to in turn be very simple.  No one else is sharing that serial connection, so it doesn't matter if it's fairly highly utilized.
 
I think a lot of automation has suffered at the hands of edge cases, without ever fully addressing the realistic needs/desires of the masses.  All kinds of 'knobs on' fetishistic control, but failing to address how normal people live.
 
In that context, where life is pretty predictable, I'd venture it ought to be possible to fine tune things such that the amount of contention for the network would be minimized.  
 