New Firmware 3.10A old comm/time problems back

10th May, 2013 - 0900C
 
Started from scratch this morning.
 
1 - reset EEPROM
2 - reset RAM
 
Using PCA via serial link. Updated Firmware to 3.11C. Started with a new configuration. 
 
1 - I changed IP to what I have been using and keys
2 - uploaded configuration.
3 - didn't work.  No IP connectivity.
4 - I then went to a console (test Omnitouch 5.7)
5 - noticed that console didn't change IP nor keys.
6 - entered IP and keys on legacy Omnitouch (using a spare next to the touchscreen hub)
7 - I then downloaded configuration via PCA/Serial
8 - I can now ping and access via IP using PCA
9 - next I configured Omnistat
10 - stuck here - I can see the Omnistat via PCA serial and network connection. 
11 - Keypad console, Omnitouch 5.7 and Omnitouch 5.7e's do not see the Omnistat (thermostat)
12 - reset ram - reset Omnistat (RC-80)
13 - Seeing the thermostat just fine with Snaplink and PCA (connected either via the network or serially).  I do not see the thermostat though with the keypad, Omnitouch 5.7 and Omnitouch 5.7e's. I have unpowered them and reset the Omnitouch 5.7's
14 - I am currently using an Omnitouch 5.7 spare (connected to the Omnitouch serial hub) for testing. 
15 - just noticed thermostat ambient temp went to zero and is staying there right now.
16 - unplugged power from thermostat.  Still reading zero ambient temperature from PCA
17 - unplugged network cable - temperature readings from omnistat came right up on PCA.
18 - reset RAM with network cable off.  PCA serially is still showing a thermostat temperature.  No console though is showing any thermostat connected.
19 - reloaded old config via PCA - left unplugged rest of serial devices for time being (X10,UPB,Z-Wave, Russound).
During the write of the config - PCA appeared to stop.  Cancelled write and did it again.
20 - Went to "test" Omnitouch and see my thermostat and temperature sensors just fine.
21 - left network cable off for time being.
 
I did notice that the status for the battery never went from 0 to a value during this testing.  When I put my configuration back in noticed that the battery status went to a value of 229 (battery is good and that is where it was at before testing).
 
I am still not sure why my thermostat didn't show up other than some misconfiguration.  That said though I did program my OPII in Florida with no issues (it is simpler though).  I must have missed something; so will try again at another time. (reset eeprom and starting from scratch)
 
Stuck.
 
It's been a few hours running and consoles continue to show ambient thermostat temperature fine.  Network connectivity appears fine.  Time sync'd once so far.  This is without the Russound, UPB, Z-Wave and X10 plugged in.  I did leave PCA connected via one serial port.  Left the IP Omnitouch screens on plus one Snaplink connection and coreserver.
 
10th May, 2013 - 1900C - 10 hours later - X10, UPB, Z-Wave and Russound serial devices unplugged.  RC-80 connected instead of the Omnistat2.  Time is still in sync, network connectivity is fine and ambient temperature is correct on OmniTouch (legacy and IP) screens, Snaplink screens, PCA (serially connected) and Coreserver.  This doesn't make sense to me.
 
Will connect UPB PIM tomorrow and see what happens. 
 
A few months ago I switched the X-10 TW-523 for the XTB-IIR.  It's been working fine and I do not think that would be an issue.
 
Time is fine.
 
11th May, 2013 - 0900C
 
Reconnected UPB, Z-Wave, Russound and X10.
 
11th May, 2013 - 1042C
 
Reconnected UPB, Z-Wave, Russound and X10 at around 0800c
 
Watching coreserver communications.  The time could be in sync for hours then it goes off by 5 seconds to minutes at a time with a correction.  Network connectivity either is really good while watching Snaplink or I see weird disconnects.  The Omnistat mostly is showing ambient temperature but occasionally does show zero degrees on the legacy serial consoles.  I never see this on the IP connected Omnitouch consoles.
 
Right now the coreserver time corrections are kind of fixing the issue, but it doesn't make sense to me that I need to do this.  I also tested with the network connection going to a switch that is not connected to the main house network.  It works this way, never losing time.  If I have it connected to the main network then it starts with the time thing; then the disconnects from the network and then the Omnistat issue.  If I disconnect the network then the time stays in sync, looking at it with the keypad consoles and Omnitouch serial consoles.
 
I am here thinking it's more of an issue with my configuration / firmware versus a hardware issue.  That is only a guess. 
 
I am though considering the purchase of another OPII board to swap this one out. 
 
11th May, 2013 - 1427C
 
Time sync has been corrected one time for 5 seconds from 1207C earlier today. 
 
TimeSyncTimer: Controller time 05/11/2013 12:07:55 out of sync by 5.0656419 seconds
 
This morning it was almost an every hour correction up to 30 seconds.  Ambient Omnistat temperature on Omnitouch legacy is fine.  No network issues. 
 
11th May, 2013 - 1627C
No time syncs have occurred all day.  This is getting very weird.  No changes, no unplugging any devices, nada...It's almost like before when the issue would go away by unplugging the network cable for a day or so.  I have left the older RC80 Omnistat connected along with the HS serial plugin disabled.
 
11th May, 2013 ~ 2300C
After 11 PM the time started going out of sync about every 10 minutes, off by an average of ~30 seconds.  Network connectivity is still OK and still seeing ambient Omnistat temps just fine.  Going to disconnect Russound, Z-Wave, X10 and UPB serial connections again today.  I will try adding one at a time to see if one of them is the issue. 
TimeSyncTimer: Controller time 05/12/2013 03:17:33 out of sync by 27.255793 seconds
TimeSyncTimer: Controller time 05/12/2013 03:28:34 out of sync by 26.113062 seconds
TimeSyncTimer: Controller time 05/12/2013 03:39:35 out of sync by 25.109735 seconds
TimeSyncTimer: Controller time 05/12/2013 03:50:34 out of sync by 26.26501 seconds
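
For anyone who wants to tally these corrections instead of eyeballing them, here is a minimal Python sketch.  It assumes the log lines always look exactly like the four quoted above (date as MM/DD/YYYY); the function names are my own and have nothing to do with coreserver itself.

```python
import re
from datetime import datetime
from statistics import mean

# Matches the TimeSyncTimer lines exactly as they are quoted in this thread.
LINE_RE = re.compile(
    r"TimeSyncTimer: Controller time (\d{2}/\d{2}/\d{4} \d{2}:\d{2}:\d{2}) "
    r"out of sync by ([\d.]+) seconds"
)

def parse_sync_lines(text):
    """Return a list of (timestamp, drift_seconds) tuples found in the log text."""
    events = []
    for match in LINE_RE.finditer(text):
        stamp = datetime.strptime(match.group(1), "%m/%d/%Y %H:%M:%S")
        events.append((stamp, float(match.group(2))))
    return events

def summarize(events):
    """Return the average drift and the gaps (in minutes) between corrections."""
    drifts = [drift for _, drift in events]
    gaps = [
        (later[0] - earlier[0]).total_seconds() / 60
        for earlier, later in zip(events, events[1:])
    ]
    return mean(drifts), gaps

SAMPLE = """\
TimeSyncTimer: Controller time 05/12/2013 03:17:33 out of sync by 27.255793 seconds
TimeSyncTimer: Controller time 05/12/2013 03:28:34 out of sync by 26.113062 seconds
TimeSyncTimer: Controller time 05/12/2013 03:39:35 out of sync by 25.109735 seconds
TimeSyncTimer: Controller time 05/12/2013 03:50:34 out of sync by 26.26501 seconds
"""

if __name__ == "__main__":
    avg, gaps = summarize(parse_sync_lines(SAMPLE))
    print(f"average drift: {avg:.1f} s")
    print("minutes between corrections:", [round(g, 1) for g in gaps])
```

Pasting a whole night of log into SAMPLE would show at a glance whether the corrections are regular or bunching up.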
 
12th May, 2013 ~ 0400C
Just tried a Snaplink connection from laptop (wirelessly connected).  It is very slow to connect versus testing on and off yesterday.  Disconnected Russound, Z-Wave, X10 and UPB serial connections again.  Also disconnected serial for PCA cable as I am currently not using PCA but rather just coreserver to look.
 
12th May, 2013 ~ 0757C
Unplugged board at PS and battery.  Disconnected one AUX 12VDC out going to multiple devices which were not really related to the panel itself.  Reconnected all serial devices that were earlier disconnected.
 
12th May, 2013 ~ 1122C
One time correction at 0835C and none afterwards.
TimeSyncTimer: Controller time 05/12/2013 08:35:54 out of sync by 6.3490704 seconds
 
12th May, 2013 ~ 1345C
No time corrections seen in coreserver.  Network is fine.  Serial comm to Omnitouch screens appears good.  I might remove long patch network cable and reconnect network as originally configured in the next few hours.  Next day or so if this continues will reconnect Omnistat2.
 
12th May, 2013 ~ 1621C
No time corrections seen in coreserver.   Shut down coreserver. Removed long patch cable and reconnected to local switch.  It did not connect as I could not ping IP.  Reset RAM.  It still wouldn't connect.  I then unplugged it (power and battery).  I also unplugged Omnitouch hub and Omnitouch video hub.  Plugged the OPII back in and then Omnitouch hub.  Network connectivity worked fine after this.  I then restarted coreserver and watched it resync time.  Will watch it for a bit now before reconnecting the Omnistat2.
 
12th May, 2013 ~ 1712C
Started again with the out of sync time.  This time kept seeing a loss of connectivity. Streaming HD movie from NAS to another switch (never really paid attention before though).
 
TimeSyncTimer: Controller time 05/12/2013 17:12:55 out of sync by 5.2240474 seconds
CoreServer: CONNECTION STATUS: Retrying
CoreServer: CONNECTION STATUS: Retrying
TimeSyncTimer: Controller time 05/12/2013 20:40:52 out of sync by 10.2693693 seconds
TimeSyncTimer: Controller time 05/12/2013 20:51:34 out of sync by 26.1075619 seconds
CoreServer: CONNECTION STATUS: Retrying
CoreServer: CONNECTION STATUS: Retrying
TimeSyncTimer: Controller time 05/12/2013 21:02:32 out of sync by 28.404737 seconds
TimeSyncTimer: Controller time 05/12/2013 21:24:34 out of sync by 28.0853262 seconds
CoreServer: Starting up server (21:30)
 
Shut down Homeseer Plugin.  Removed serial cable.  Plugged in patch cable.  Powered down and restarted OPII & Omnitouch hub.
 
TimeSyncTimer: Controller time 05/12/2013 21:45:36 out of sync by 24.151132 seconds
TimeSyncTimer: Controller time 05/12/2013 21:56:30 out of sync by 30.2364328 seconds
 
13th May, 2013 ~ 0800C
Time sync'd every 10 minutes up to 30 seconds off all night.  Network was fine.  Disconnected Russound, Homeseer, Z-Wave, UPB and X10.  Reset RAM, disconnected power.
 
13th May, 2013 ~ 1443C
No time syncs, network or comm issues.  Reconnecting UPB PIM.  Continue testing.
 
13th May, 2013 ~ 1854C
No time syncs, network or comm issues.  Reconnected Omnistat2.  Continue testing.
 
As discussed in other posts I have been having similar network connection related issues.  My OPII worked nicely for 6 or 7 years, then the network connection started to get flakey.  Loading changes to the panel automation would routinely fail so I just stopped making changes.  Then I wanted to use HaikuHelper and it became an issue again as HH would frequently report comm failures.  Finally I purchased an Omni 2e just to have something to swap in while getting the OPII repaired.  Made the required programming changes, put in the Omni 2e; has been working fine.
 
Great I thought, now I will send off the OPII for repair and swap them back.  Ooops-  I bench test the OPII and the network is back functioning (it had been totally non responsive for months).  So now what do I do...  If I send it in for repair will HAI charge me the $200 and return it "no problem found"?  How can they fix a problem they can't reproduce?
 
I see at least three posters here who are routinely having problems with their boards that they are trying to resolve themselves.  Are there any installers here (or HAI) with hints for tracking down intermittent problems of this sort?  I have bought and installed four Omni panels and influenced the purchase of many more over the years, but these ongoing issues are so frustrating I am hesitant to keep doing so until I better understand the underlying cause.
 
Does HAI have any sort of escalation process for troubleshooting (presumably only available to dealers)?  Since they appear to only repair and return the actual board you send in, if they think it is repaired but hidden problems may remain I would guess that installing dealers are more likely to install an entirely new board (which they can immediately swap in at the customer site) than to pull, repair and return the board to the customer.  Expensive for the customer, but very sensible for the dealer.
 
I would also think that the bad PR value of ongoing issues of this sort being discussed would be worth some enhanced assistance from HAI/Leviton to offset potential buyers choosing alternatives if they think they could run into similar problems in the future.  Having attended multi-day dealer training covering HAI and other products and interacting with professional installers who put these in routinely I know that Pete_C and others are taking a far more in-depth step by step approach to trying to track down the source of these problems than the typical dealer can afford to.  I would suggest that it would be a good investment of time on the part of HAI to directly (privately) assist him in determining once and for all what is going on with his problem so they can benefit from what they find to make it less likely to happen in the future, so that if it DOES happen again to another customer, the details will now be available in their FAQ/Knowledge Base to quickly resolve the problem.
 
In my case, I will leave the Omni 2e running my house, leave the OPII talking just to HH and see if one or both show signs of problems in the coming weeks.
 
I had an Ethernet port randomly stopping as well, and the problem turned out to be a bug in the automation programming.  I would not ever declare the hardware bad until you clear the programming and see if that fixes the problem.  The fact of the matter is that HAI's programming is very flexible and at the same time, it contains no tools to detect problems.  It is very easy to make a programming mistake, where say, Flag X changes Flag Y and then Flag Y changes Flag X, but not know it.  For whatever reason, the only indication of a problem is the Ethernet port stops working.  There are no other error codes or messages to tell you about this problem.  PC Access should do a better job at finding errors but it doesn't. 
 
So if you are having random Ethernet failures, rule out a programming bug first.
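
To show the kind of check I mean, here is a rough Python sketch of a loop detector.  It assumes you have written your automation down by hand as a table of "this changes that"; the dictionary format and the function name are made up for illustration and are not anything PC Access can export or run.

```python
def find_cycle(rules):
    """Return one loop as a list of names, or None if there is no loop.

    `rules` maps a trigger to the things it changes, for example
    {"Flag X": ["Flag Y"]} means a change to Flag X changes Flag Y.
    """
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the current path, finished
    state = {}
    path = []

    def visit(node):
        state[node] = GREY
        path.append(node)
        for target in rules.get(node, []):
            if state.get(target, WHITE) == GREY:
                # The target is already on the current path, so this is a loop.
                return path[path.index(target):] + [target]
            if state.get(target, WHITE) == WHITE:
                found = visit(target)
                if found:
                    return found
        path.pop()
        state[node] = BLACK
        return None

    for start in rules:
        if state.get(start, WHITE) == WHITE:
            found = visit(start)
            if found:
                return found
    return None

if __name__ == "__main__":
    # The mistake described above: X changes Y and Y changes X.
    print(find_cycle({"Flag X": ["Flag Y"], "Flag Y": ["Flag X"]}))
```

It is just a depth-first walk that remembers the current path, so it finds a loop of any length, not only the two-flag case.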
 
Among my problem isolation steps was to keypad clear all RAM and configuration.  It is possible that I forgot to power cycle the board after one of those tests; could that leave it stuck in a loop perhaps?  I had assumed that a full reset occurs following those functions.
 
If my OPII keeps running OK standalone my next step will be to put it back into production and see what happens.
 
This morning I cleared the ram.  I then again unplugged the serial port to homeseer, UPB, Z-Wave, Russound and X10. 
 
Unplugged the power/battery.  Restarted again. 
 
I have not seen any time corrections since around 7 AM this morning. 
 
I am not seeing any comm errors on the ambient temperatures from the Omnistat RC80 nor any network drop outs.
 
Now this is bugging me. 
 
I did originally test with no programming lines at all.  I will now add one serial connection at a time (UPB first) and let it run for a bit (days?).
 
pct88 said:
Among my problem isolation steps was to keypad clear all RAM and configuration.  It is possible that I forgot to power cycle the board after one of those tests; could that leave it stuck in a loop perhaps?  I had assumed that a full reset occurs following those functions.
 
If my OPII keeps running OK standalone my next step will be to put it back into production and see what happens.
 
Reset it, THEN disconnect the battery, then unplug it, and wait a minute. It will ask you to set the time. Until you do that, it's not really reset.  You have to cycle the power. For me, nothing short of that reset the Ethernet port.  I believe it's best to disconnect the battery first then power, and repower AC first, then connect the battery, in that order.
 
I am wondering if I maybe have a bad Z-Wave PIM?
 
It's a first gen PIM and I only have lamp modules / appliance modules on it.  They are working fine though.
 
Here I reset the RAM, disconnect the battery, disconnect power, disconnect same for Touchscreen hub, reconnect panel , reconnect touchscreen hub then do another reset of the RAM for good measure, then set the time.
 
I am still running fine over 12 hours now and earlier (noon) reconnected the Omnistat2 (removing the RC-80).  IP Omnitouch and legacy Omnitouch are doing fine.  Coreserver is doing fine with no time syncs. 
 
Started from scratch again.
 
1 - reset RAM
2 - reset EEPROM
3 - configured IP / access
4 - nothing else configured.
 
Just going to watch to see if time goes out of sync via IP connection/core server. 
 
I have disconnected all of the serial consoles (keypads and Omnitouch).  No serial devices (Russound, UPB, Z-Wave, X10, Homeseer) are plugged in.
 
Next going to start disconnecting all hardware on the board.  What a PITA this is turning out to be. 
 
I am still seeing the time sync issues with core server.
 
Now looking at my old threads to see when this intermittent issue came up.  Downgrading FW now to the 2.XX as I did notice it with the 3.10A firmware (even though it was very intermittent).  I am doing another eeprom/ram reset after this.
 
Just have coreserver connected via IP.  It would be nice if core server said whether the time correction was back or forward when syncing.  I am particular though about time in general and utilize an internal NTP server generating time from GPS satellites.
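
Something like the following is all I mean by direction.  This is only a Python sketch with a made-up function name; it is not how coreserver works internally, and the reference time in the example is invented for illustration.

```python
from datetime import datetime

def describe_correction(controller_time, reference_time):
    """Say which way the controller clock has to move to match the reference."""
    offset = (controller_time - reference_time).total_seconds()
    if offset > 0:
        return f"controller is {offset:.1f} s fast, set back"
    if offset < 0:
        return f"controller is {-offset:.1f} s slow, set forward"
    return "controller is in sync"

if __name__ == "__main__":
    # Controller timestamp taken from a log line above; the reference time is made up.
    controller = datetime(2013, 5, 12, 3, 17, 33)
    reference = datetime(2013, 5, 12, 3, 18, 0)
    print(describe_correction(controller, reference))
```

Knowing whether the panel clock runs fast or slow would help tell a drifting clock from one that stalls while the panel is busy.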
 
With firmware 2.XX running today, the time issue cropped up right away.  Then coreserver started to disconnect with resets. 
 
Initially it was about once an hour; then the network connectivity to the panel was no more.  LEDs still show network connectivity but I can no longer ping the interface.
 
During the day I added the thermostat to the new, otherwise empty configuration.  PCA, Coreserver and Snap-link all saw the thermostat fine.  The keypad and Omnitouch serial consoles did not.  Reset RAM a couple of times and the consoles never did show the thermostat. 
 
Upgraded firmware back to 3.11c just to see what happens tonight.  (again doing the EEPROM reset along with the RAM reset et al stuff).  I had a problem talking via the serial link to upgrade the firmware until I unplugged the network cable.
 
The thermostat not showing up on the serial consoles while showing up on Snap-link bugged me.  I had left it at the default name of TSTAT 1.  I changed the name to Thermostat and it showed up on the consoles (Omnitouch and Keypad).  Still just using PCA via a serial connection.  While watching the thermostat on my spare console I plugged in the network cable.  The ambient thermostat temperature went to zero.  I have coreserver shut down for a bit.
 
Called HAI/Leviton support and had a long chat this morning relating to my issues.  I have a service ticket open right now. 
 
It was a very enlightening conversation.  There is no repair necessary to the board nor will a new board fix the issues.
 
I am going to test the "fix". 
 
That said will test out solution and write about it here shortly. 
 
I am redoing what I undid yesterday so it will take me a bit.  Removing the RC-80 and reinstalling the Omnistat2 (really do like it much better).
 
Worked. 
 
The following is what I did to correct my issues relating to time sync, serial com to Omnitouch 5.7's (ambient temp being zero) and networking disconnects et al.
 
Early this morning called HAI Tech support; opened a ticket relating to said issues.
 
I won't say this is the correct way; just that it works for me.  This is a test set up and will probably change in the near future.
 
1 - I reconfigured the HAI OPII network IP to use a different 3rd octet, i.e. I just changed the third number in the IP address.
2 - connected it to a Gb switch by itself
3 - connected a POE switch to said switch in #2 for the Omnitouch 5.7E screens
4 - connected a small combo firewall router LAN side to the Gb switch. 
5 - configured the IP, subnet mask on the internal pieces of the firewall such that the IP subnet matched the new IP of the HAI OPII panel
 
This is an autonomous isolated network with connectivity to the main house network via the firewall WAN port on the combo firewall box.
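
A quick way to sanity check the addressing in steps 1 and 5 is Python's standard ipaddress module.  This is only a sketch; every address below is a placeholder I made up, and the /24 mask is an assumption, so substitute your own values.

```python
import ipaddress

def same_subnet(ip_a, ip_b, prefix_len):
    """True when both addresses fall in the same network for the given prefix length."""
    net_a = ipaddress.ip_interface(f"{ip_a}/{prefix_len}").network
    net_b = ipaddress.ip_interface(f"{ip_b}/{prefix_len}").network
    return net_a == net_b

# Placeholder addresses for illustration only, not the ones used in this thread.
MAIN_LAN_HOST = "192.168.1.50"    # some device on the main house network
PANEL_IP = "192.168.2.10"         # OPII after changing the third octet
FIREWALL_LAN_IP = "192.168.2.1"   # LAN side of the combo firewall router
PREFIX = 24                       # assumed 255.255.255.0 mask

if __name__ == "__main__":
    print("panel and firewall LAN share a subnet:",
          same_subnet(PANEL_IP, FIREWALL_LAN_IP, PREFIX))
    print("panel is off the main LAN subnet:",
          not same_subnet(PANEL_IP, MAIN_LAN_HOST, PREFIX))
```

Both lines should report True for this layout: the panel and the firewall LAN port have to match each other, and neither should land in the main house subnet.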
 
I currently have two setups for the security cameras.  One is based on the HAI legacy Omnitouch hub and the other is the ZoneMinder (http://zoneminder.com) box with 8 analog and 8 IP cameras (2-4 more or less on line).  They are interconnected.
 
In order to see the cameras on the IP Omnitouch 5.7e's or testing with Snaplink I will need to put them on the same aforementioned LAN or via the WAN port on the firewall.
 
You can also configure a 2nd or 3rd internal port/NIC on the firewall (I do this today with my Smoothwall box).  Attach the network to another interface and create rules on the firewall between the internal LANs.
 
The setup above is now working fine for me.  I am not seeing any network drop outs, no zero ambient temperatures for my thermostat on the Omnitouch 5.7's and the time is not going out of sync.  I have not set up coreserver such that it goes through the firewall but can get to the panel via PCA IP or PCA serially connected.
 
Here is the "why" piece.
 
The network interface on the HAI OPII is many years old.  The technology utilized in the firmware is also many years old. 
 
Simply put; the increase in the amount of traffic on my network (now some 70 plus devices) actually is taking down the network interface because it sees everything (it's very promiscuous). It's making it work so hard that it shuts down the NIC and has an effect on the rest of the HAI OS in the embedded firmware.   Lately I have been streaming from many boxes connected to my LCD TVs to the NAS boxes increasing the amount of traffic on my LAN.
 
Per HAI Technical support; there will not be a firmware upgrade to fix this issue mostly based on the network hardware and technology utilized.  There will not be a hardware change or add on to the HAI OPII panel to fix said issues.  (redesign of the board?).  Someone did mention maybe a DIY or HAI add on NIC to serial board could be a solution.
 
I understand that this is a bit blunt and you will need to rethink and restructure your network infrastructure to accommodate/correct these issues if you are having them today.
 
I am not too familiar with HAIKU but this issue may directly affect how that application talks to the panel over the network.
 
In one way though it does create a more secure network for the HAI OPII panel.  With today's home networking LANs though it does change the way that the OPII will be implemented if you are using LAN/WAN connectivity to access or control your panel (touch screens, tablets etc).  You will need basic LAN / WAN / Firewall / routing understanding; but it is basic stuff.
 
BTW - the bouncing / "saturated" HAI NIC can take the switch down.  I have seen this once with an older switch.
 
pete_c said:
Here is the "why" piece.
 
The network interface on the HAI OPII is many years old.  The technology utilized in the firmware is also many years old. 
 
Simply put; the increase in the amount of traffic on my network (now some 70 plus devices) actually is taking down the network interface because it sees everything (it's very promiscuous). It's making it work so hard that it shuts down the NIC and has an effect on the rest of the HAI OS in the embedded firmware.   Lately I have been streaming from many boxes connected to my LCD TVs to the NAS boxes increasing the amount of traffic on my LAN.
 
Per HAI Technical support; there will not be a firmware upgrade to fix this issue mostly based on the network hardware and technology utilized.  There will not be a hardware change or add on to the HAI OPII panel to fix said issues.  (redesign of the board?)
 
Sounds like the OPII interface can't handle excessive multicast / broadcast traffic on the network. As long as it is connected to a switch (not a hub) even if the NIC is in promiscuous mode it should only see traffic for itself and multicast / broadcast traffic.
 
If you get a chance open Wireshark on one of your computers, set the capture filter options to 'multicast or broadcast' and see how many results come in. I wouldn't think that normal ARP broadcasts from 70 devices would take down the OPII.
 
If you find a large amount of multicast traffic, which might be from video streaming, a switch with IGMP snooping should fix this. Switches without IGMP snooping transmit multicast packets to all connected devices. With IGMP snooping only devices that subscribe will receive the packets.
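
To make clear which frames still reach the panel on a switched network, here is a small Python sketch that sorts an Ethernet destination MAC address into broadcast, multicast or unicast.  The function name is my own; it only illustrates the address rule and does not capture anything.

```python
def classify_destination(mac):
    """Classify an Ethernet destination address as broadcast, multicast or unicast."""
    octets = bytes(int(part, 16) for part in mac.replace("-", ":").split(":"))
    if len(octets) != 6:
        raise ValueError(f"not a 48-bit MAC address: {mac!r}")
    if octets == b"\xff" * 6:
        return "broadcast"
    # The least significant bit of the first octet marks a group (multicast) address.
    if octets[0] & 0x01:
        return "multicast"
    return "unicast"

if __name__ == "__main__":
    for mac in ("ff:ff:ff:ff:ff:ff", "01:00:5e:00:00:fb", "00:1a:2b:3c:4d:5e"):
        print(mac, "->", classify_destination(mac))
```

A switch delivers unicast frames only to the port that owns the address, so on the panel's port it is the broadcast frames, and the multicast frames when there is no IGMP snooping, that add up.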
 
If you get a chance open Wireshark on one of your computers, set the capture filter options to 'multicast or broadcast' and see how many results come in. I wouldn't think that normal ARP broadcasts from 70 devices would take down the OPII.
 
Thank you RSW686.  Yup, I was playing with an old copy of Sniffer Pro a couple of days ago.  Will install Wireshark on the laptop.  I guess too I can just generate a bunch of traffic and see what happens.  The dinging of the interface has an immediate effect on what I see on the serial console (spare one by the panel).  I can also see the effects (as noted above) just by removing the network interface (almost instantly)
 
Just now did a quickie visual of the LEDs on the panel and noticed they are hardly blinking compared to before.  Right now though just have the Omnitouch IP screens connected (only IP).  I have to configure the firewall such that coreserver can do email because it just exits when I run it.
 
There must be some weird threshold cuz my problem was very intermittent initially and it did make sense that I could disconnect the NIC for a day and the next day it would be fine and work for months.
 
16th May, 2013 ~ 0640C
 
So far no issues relating to time sync, comm to serial touch and IP connectivity. 
 
Redoing the LAN side such that I shut off DHCP (?) and change the bit mask to 27 or 28.
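
For reference, this is what a 27 or 28 bit mask works out to.  A minimal Python sketch using the standard ipaddress module; the 192.168.2.0 network is a placeholder, not my actual addressing.

```python
import ipaddress

def usable_hosts(network):
    """Number of assignable host addresses, excluding the network and broadcast addresses.

    Only meaningful for IPv4 prefixes of /30 or shorter.
    """
    return ipaddress.ip_network(network).num_addresses - 2

if __name__ == "__main__":
    # Placeholder network for illustration only.
    for prefix in (27, 28):
        net = ipaddress.ip_network(f"192.168.2.0/{prefix}")
        print(f"/{prefix}: mask {net.netmask}, {usable_hosts(str(net))} usable hosts")
```

Either size leaves plenty of room for the panel, the IP touchscreens and the firewall LAN port while keeping the dedicated network small.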
 
I haven't connected the WAN side of the firewall yet.  Want to test coreserver and Snaplink on the main network.
 
Thinking here of just dedicated IP connected devices on the HAI network. 
 
Watching the network tonight for about an hour I saw two broadcast storms.
 
I am seeing network issues though with multiple wintel touchscreens connected to Homeseer via the logging on HS.
 

Attachments: pic-1.jpg, pic-2.jpg