Premise Z-Wave Status using RZCOP/VRCOP help

Motorola Premise
Wonky to say the least! I agree: I don't think the serial port issues I usually have (on several different devices), where I need to delete and re-add the port, are normal for Vista or Windows 7. The only thing I know is that even the Onkyo driver I made caused me issues, to the point where the port would stop working on the first reboot unless you toggled the transmission received flags.

You are correct, the ONLY user who has ever reported a problem close to mine is Chuck (who has the same Onkyo receiver). For the polling issue, no one else has ever had to reset a port periodically like me and I've also tried reading in the older forums for solutions. Note this problem was different in a big way: I actually had to reset the hardware to fix it! Also port spy never shows anything when I have the issue I normally have, but this time data was shown as being sent. I've never had a hardware failure like this, all of my problems have always been fixed by deleting and re-adding the port.

For this new issue, when I viewed port spy, nothing was being received, but data appeared to be resent several times (due to the driver's architecture, it resends a job if there is no response). After 3-4 tries the job was thrown out and the next job was processed: resent (no responses), thrown out, etc. The port status showed open and data was shown in port spy as being sent.

Hopefully I can use an old laptop as a test machine and send 10 second polls for a week and never reset the serial port to see what happens.
 
Is there anything special about the existing PC's serial ports? Is it a USB-to-serial-port converter? Is the hardware on the motherboard or on a plugin-card? Is there another app running on that PC that periodically tries to use a serial-port? How many moons have passed since your last sacrificial offering to UART, the god of serial-ports? Grasping at straws here ...
 
The hardware is comprised of two Digi Portservers that are on the network. Then, I use Digi's portserver software to emulate the ports locally on the SYS server. I thought it was hardware too, but drivers using a dll file like the MR26A (on a digi portserver too) have NEVER given me a problem! It's only custom drivers that do this strange serial issue. I also recall trying a usb to serial adapter and still having the problem, but it's been so long that I'm not sure. I can also try a USB adapter when I get back in addition to the laptop test.
 
Good news! I'm using a new mini ITX Premise server and after two days of polling I've gotten zero events for 2 consecutive failures. What is also strange is that the delay I was experiencing after adding the thermostat is also gone. Now I can reliably count to three after manually toggling a Vizia RF switch and see the automation browser update! I did move the RZC0P onto an 8 port USB to serial converter plugged directly into the mini ITX box. Do you think that did the trick or was it having a fresh install of Premise on a different PC?
 
Could be...or maybe not. I was having a heck of a time getting John's Insteon driver to work (and John did one h*** of a fine job of helping me troubleshoot...a true pro!). I finally nuked my server, reinstalled Premise and John's driver, and it has been bulletproof ever since.
 
Chuck,

Does nuke mean you uninstall Premise, then reinstall and use a backup file to restore everything? Or instead of restoring from a client backup file, did you actually have to re-import the Insteon driver separately and then reinstall everything else from a single backup file? I'd like to know the process you used because maybe I'll learn something ;)

Thanks in advance :)
 
etc6849,

What's your conclusion about the current version's (BETA_9.6) operational stability? Is it good enough for me to post this driver in the Downloads section?

PS
I had proposed a new feature in this post that would identify a Job's type. The idea was to selectively purge the Job queue of low-priority Thermostat commands as opposed to all low-priority commands (or something like that ... heck, it was 2 months ago and the details are foggy). BETA_9.6 has some but not all of the elements needed to implement this feature. However, if BETA_9.6 is working well, I'd rather not tinker with it at this point.
 
123, I think it's a great driver and has always been stable. I've been using it for many months successfully :)

However, the "2 consecutive failed Jobs" message returned shortly after my post; that's why I was asking Chuck how he nuked his system, to see what the proper method for starting over is.

The only thing I can think of at this point is the thermostat has a hardware issue. I'm only polling the thermostat as always. The only constants from my old system are the thermostat, RZC0P and Premise.

Is 120 seconds too short of a poll time for a thermostat? Do non-Leviton thermostats need a longer wait time than Leviton light switches? I don't think wait time matters the way you have designed your driver. I'm turning off "oneway" for the thermostat and setting the RZC0P to ping 4 lights every 20 seconds. I'll report the number of errors in two days. If I have no errors, I'll know with pretty good certainty that it's the thermostat.

PS: I reinstalled 9.6 beta Sunday and had only 3 "ViziaRF experienced 2 consecutive failed Jobs." events since then. It will be interesting to see if polling lights causes errors or not.

EDIT:
I don't know if it helps, but I have attached the results of a few thermostat polls. I need to study how you implemented thermostats, but I'm wondering if this operation is correct:

>N28SE66,2
>N28SE67,2,1
<N028:064,003,001
<E000
<X000

Shouldn't your jobque function wait for <E000 before sending N28SE67,2,1?
 


Attached are a bunch of switch polls too.

Note that there are a few cases like this:
>?N2
>?N2
>?N17
>?N17
<E000
<X000
<N002L000
<E000

Where the two lighting nodes are queried consecutively? I'm very confused now.
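A quick way to scan a PortSpy capture for this pattern is to flag every command that goes out while an earlier one is still waiting for its E000. The following is only a rough sketch in Python (not the driver's own VBScript); it assumes `>` marks a sent line and `<` a received line, as in the excerpts above, and that the order of lines in the log matches the real order on the wire. The function name `find_unacknowledged_sends` is made up for illustration.

```python
def find_unacknowledged_sends(lines, ack="E000"):
    """Return (index, command) for each send made before the previous send was acked."""
    flagged = []
    awaiting_ack = False  # True while a sent command has not yet seen its ack
    for index, raw in enumerate(lines):
        line = raw.strip()
        if line.startswith(">"):
            if awaiting_ack:
                flagged.append((index, line[1:]))
            awaiting_ack = True
        elif line.startswith("<") and line[1:] == ack:
            awaiting_ack = False
    return flagged


# The switch-poll excerpt quoted above.
switch_log = [">?N2", ">?N2", ">?N17", ">?N17", "<E000", "<X000", "<N002L000", "<E000"]
for index, command in find_unacknowledged_sends(switch_log):
    print(f"log line {index}: {command} was sent before the previous E000 arrived")
```

Run against the thermostat excerpt, the same check flags `N28SE67,2,1`. It only reports what the capture shows; if PortSpy's ordering is off, so is the result.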
 


... the two lighting nodes are queried consecutively? I'm very confused now.
That makes two of us.

I reviewed the operation of the job-queueing mechanism and I can't explain the clustered transmissions in the PortSpy logs you've posted. What is curious is that the same commands tend to cluster: the 67 and 68 thermostat commands and light status for nodes 2 and 17. There may be a clue there but I don't get it.

Summary of how the job-queuing mechanism works:
  1. A "Job" represents a ViziaRF command that waits its turn to be transmitted.
  2. If the queue is empty (i.e. there are no existing Jobs), the first job that is placed in the queue is processed immediately (i.e. command is sent immediately).
  3. If the queue is not empty (i.e. there are existing Jobs waiting their turn to be processed), the next job is placed last in the queue and waits its turn.
  4. The receipt of an E000, from the RZC0P, is what triggers the transmission of the next job in the queue.
  5. If E000 is not received within a few seconds of transmission, the same job will be re-transmitted (up to three tries).
#4 is what introduces a pause between transmissions and prevents commands from being transmitted in rapid succession.

So there are only three ways for a job to be sent:
1) If the queue is empty, the command is sent immediately.
2) If an E000 is received, after a command is sent, the next command in the queue is sent.
3) If an E000 is not received, the retry mechanism re-sends the same command.

Case #1 occurs after a period of inactivity.
Case #2 occurs when there are several commands awaiting transmission.
Case #3 occurs when there's a transmission problem.

Based on this operating theory, I can't see how two commands could be sent, in rapid succession, without waiting for at least one E000.
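For anyone following along, the three cases can be modelled in a few lines. This is a minimal sketch in Python, not the driver's actual VBScript; the class and method names (`JobQueue`, `on_e000`, `on_timeout`) are invented for illustration, the timer itself is left out (the caller invokes `on_timeout` when no E000 arrives in time), and dropping a job after the last retry is taken from the behaviour described earlier in the thread.

```python
from collections import deque

MAX_TRIES = 3  # "up to three tries" per the summary above


class JobQueue:
    def __init__(self, send):
        self.send = send      # callable that writes one command to the port
        self.jobs = deque()   # the head of the deque is the job in flight
        self.tries = 0        # transmissions so far for the job in flight

    def add(self, command):
        self.jobs.append(command)
        if len(self.jobs) == 1:
            self._transmit()  # case 1: queue was empty, send immediately

    def on_e000(self):
        if not self.jobs:
            return
        self.jobs.popleft()   # the job in flight is finished
        if self.jobs:
            self._transmit()  # case 2: E000 releases the next job

    def on_timeout(self):
        if not self.jobs:
            return
        if self.tries < MAX_TRIES:
            self.tries += 1
            self.send(self.jobs[0])  # case 3: re-send the same job
        else:
            self.jobs.popleft()      # give up on this job, move to the next
            if self.jobs:
                self._transmit()

    def _transmit(self):
        self.tries = 1
        self.send(self.jobs[0])


sent = []
queue = JobQueue(sent.append)
queue.add("?N2")
queue.add("?N17")   # queued behind ?N2, nothing is sent yet
queue.on_e000()     # E000 for ?N2 releases ?N17
print(sent)
```

In this model a second command can only leave the queue through `on_e000` or `on_timeout`, which is why back-to-back sends of different commands in the log are hard to explain.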

I'd be curious to see what DebugView has to say. BETA_9.6 logs everything to the Windows Debug Console and this log should be compared to what PortSpy has to say.

PS
Despite the inexplicable items in the PortSpy logs, it appears that all requests (thermostat and lighting status) do receive a reply so, overall, the driver is accomplishing its intended purpose.
 
If only I was home I could install debugview and get the results. Don't worry though, I can do this when I get home on Friday. I suspect port spy is lying to us about the exact timing of when commands are received.

I'm not sure how much the period of inactivity should be tweaked (from 1 below). It looks like you have this set to 5 seconds, which should be a reasonable time period for E000. I may try setting this higher and see what happens.

So there are only three ways for a job to be sent:
1) If the queue is empty, the command is sent immediately.
2) If an E000 is received, after a command is sent, the next command in the queue is sent.
3) If an E000 is not received, the retry mechanism re-sends the same command.

Case #1 occurs after a period of inactivity.
Case #2 occurs when there are several commands awaiting transmission.
Case #3 occurs when there's a transmission problem.

EDIT: I received two of the "2 consecutive failed Jobs" events since yesterday, so I deleted Vizia from custom devices, changed the baud rate to 2400 and increased the command delay to 250ms. It will be interesting to see how this works. I'm going to rerun the same test I talked about in the previous post. PS: when I added a fresh Vizia driver under custom devices, the funky port spy behavior fixed itself!
 
So changing the port and job timeout sounded like a good idea, but still gave about the same number of events. I reinstalled a fresh copy of beta 9.6 to try again. I have set all devices I have to one way and set the polling interval to 20 seconds.

I think I finally see the value of DebugView after using it to chase a problem; it looks invaluable if you are making a complex driver. DebugView did spot one possible issue, but it may be by design. It seems after there are no more jobs to process, a bad packet is set. I was also lucky enough to capture an error; search for "consecutive" in Debug view part 2.zip (found on next post).
 


Debug view part 2.zip attached. Part 2 captures a 2 consecutive job failure event. Note that part 1 and part 2 make up an entire debugview log from device discovery all the way to resetting the port.
 


I took a very quick look at the Debugview logs and was surprised to discover the large number of "Bad packet" messages. There shouldn't be any "Bad packets"; something is wrong.

"Bad packet" is displayed when "gViziaPacketParser" can't determine what was received from the RZC0P. It is possible that some sort of valid (yet undocumented?) command is being received that is not coded into gViziaPacketParser ... or there is something else that is genuinely wrong.

There are two places in gViziaPacketParser where it sets iResult=0 (indicates it couldn't determine the command). Add a debugout statement after each iResult=0 that displays the contents of the "sPkt" variable. This way, you'll see the command that makes gViziaPacketParser unhappy.
 
Thanks 123 for taking a look!

The Debugview log is attached. I tried just adding debugout sPkt to the global parser, but nothing would display under debugview? Could there be some hex code being received that does not correspond to a valid ascii character?

Anyways, I ended up adding the following to the OnChangeOnNewData script near the bottom:

debugout "rxTextLine: " & this.RxTextLine & " EOT"

Wrapping the invalid ascii character seemed to result in a debugview log entry with no characters between the wrapped text, but at least it resulted in a log entry! Any ideas on what to try?
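One thing that may help is logging the character codes instead of the characters themselves, so control bytes show up as hex rather than as nothing. Below is a rough sketch of the idea in Python (not the driver's VBScript); the function name `show_bytes` and the sample input are made up, since the thread doesn't say which byte is actually arriving. In VBScript the same thing can be done by looping over the string with `Mid`, `Asc` and `Hex`.

```python
def show_bytes(data: bytes) -> str:
    """Return data with every non-printable byte written as \\xNN."""
    out = []
    for b in data:
        if 0x20 <= b <= 0x7E:
            out.append(chr(b))         # printable ASCII is kept as-is
        else:
            out.append(f"\\x{b:02X}")  # everything else becomes visible hex
    return "".join(out)


# Hypothetical sample: a reply followed by CR/LF and a stray NUL byte.
print("rxTextLine: " + show_bytes(b"<E000\r\n\x00") + " EOT")
```

With output like this, an "empty" log entry turns into something that can be matched against the RZC0P documentation.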
 

