OmniPro II "sudden" hard failure - SYSTEM RESETS and generally going berserk

jcd

Active Member
I've searched and read a number of posts regarding SYSTEM RESET issues without resolution, so I'm perhaps just adding my own story to the fray here with tiny hopes that maybe, just maybe I can leverage this hivemind to prevent me from otherwise having to scrap the totality of my OmniPro II system which I've had running mostly trouble-free for >10 years.

In short, one day I ran a firmware update on a few of my UniFi WAPs and my PoE switch, and that afternoon my OPII system went utterly crazy and died. Correlation does not equal causation, and I cannot imagine how these two events would conspire, but there you have it: that was the "trigger event". Prior to this, nothing else on my system had changed or been recently updated or modified. I'm using ~1300 lines of code, and this has effectively been my codebase for years with little change (i.e., I don't suspect any sort of code loop).

The OPII will not hold an Ethernet connection long enough to stay connected and download from PCA - this has ALWAYS been largely the case, and I blame the weak Ethernet port on the OPII, and the fact that I have very heavy traffic on my LAN: about 200 IP devices, 25+ 4MP IP cameras, 12+ iPads used as touchscreens (via Myro:Home), multiple PoE WAPs and other devices, and a ton of general network activity via PCs, laptops, servers, Synology storage servers, etc.

Since I'm talking about the LAN: A Ubiquiti USW-Pro-48-PoE powers the PoE side (600W budget, using about 200W of it), plus a few Cisco SG-200-48 switches, running through a Peplink multi-WAN router fed off Verizon LTE as primary ISP and a CenturyLink DSL as backup (we live in the sticks). HAI gear includes the OPII and a HiFi2 (I had the e-mail notifier board, but it was junk, so I took it out). I have tried connecting just my PC and the HAI on a small switch separated from the rest of the network traffic; this lets PCA download a bit better, but it did not otherwise obviate the OPII failures.

So first I get a string of scores of SYSTEM RESET messages, and I ignore them because otherwise the system is running just fine. Then I start getting error messages from my wireless sensors connected via the HAI 42A00-2 wireless receiver, but they are just sporadic trouble signals that "go away" after a time. Then one night (it's always at night), at 4am, every wireless sensor reports a trigger, not a trouble code, so all heck breaks loose, alarms go off, etc.

I start disconnecting devices one-by-one trying to get down to a stable system, but no luck. The OPII will no longer connect to several of the serial ports (HiFi2, Somfy URTSII, and the Z-wave interface) as well as the Ethernet port. SOME events run, but some don't...

I reset the EEPROM and started from scratch; nothing improved... I reached out to Leviton SARA to see if I could send the board in, and they said they're done with HAI. So I started looking for replacements (and what a daunting, ugly, and expensive job that has been so far).

Have I literally gone wire by wire and disconnected, tested, disconnected, tested? No, because I have literally every single expansion, every single zone, every single everything in use, and I'm not even sure where to start in a methodical way. Given that the errors "seemed" to begin with the wireless receiver, I could start with the RS485 bus, disconnect it, and see if that helps?

@pete_c kindly nudged me here from a tangentially-related thread in the HS forums, thinking it would be worth maybe a last-ditch effort to troubleshoot before I started from scratch.

What else could I provide to any kind/brave souls who might graciously be willing to wade in here?
 
Oh, and the keypad consoles keep dropping date/time... so maybe it is something on the bus... just not sure how to test that out.
 
So to me, this feels like you're either dealing with a bad cap issue, which can cause all sorts of havoc, or a network configuration issue.

From a network perspective, I would try the isolated network switch approach again, but pay closer attention to your link properties. Not knowing the max. capabilities of the Ethernet port myself, I'd play with the speed and duplex settings (go all the way down to 10Mbit).

Check for Ethernet errors using Get-NetAdapterStatistics or netstat -s on the Windows PC side, and try Wireshark to make sure you aren't dealing with collisions/out of order type packet issues.
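To make the netstat -s suggestion easier to act on, here is a minimal Python sketch (my own assumption, not a tool mentioned in the thread) that turns the text output into named counters and pulls out the error-type ones. It assumes the Windows output layout shown later in this thread; the function names and the keyword list used to decide what counts as "error-like" are hypothetical choices.

```python
import re

# "Packets Received = 3525339005" style rows (IP, TCP and UDP sections).
KV_ROW = re.compile(r"^\s*(?P<name>[A-Za-z][A-Za-z0-9 ]*?)\s*=\s*(?P<value>\d+)\s*$")
# "Destination Unreachable 31809 72062" style rows (ICMP sections: received, sent).
ICMP_ROW = re.compile(r"^\s*(?P<name>[A-Za-z][A-Za-z ]*?)\s+(?P<rx>\d+)\s+(?P<tx>\d+)\s*$")

# Hypothetical choice of which counter names count as "error-like".
ERROR_WORDS = ("Error", "Discard", "Fail", "Retransmit", "No Port")


def parse_netstat_s(text: str) -> dict[tuple[str, str], int]:
    """Turn Windows netstat -s text into {(section, counter name): value}."""
    counters: dict[tuple[str, str], int] = {}
    section = ""
    for line in text.splitlines():
        stripped = line.strip()
        if "Statistics" in stripped and "=" not in stripped:
            section = stripped  # e.g. "TCP Statistics for IPv4"
            continue
        if m := KV_ROW.match(line):
            counters[(section, m["name"])] = int(m["value"])
        elif m := ICMP_ROW.match(line):
            counters[(section, m["name"] + " (received)")] = int(m["rx"])
            counters[(section, m["name"] + " (sent)")] = int(m["tx"])
    return counters


def error_counters(counters: dict[tuple[str, str], int]) -> dict[tuple[str, str], int]:
    """Keep only counters whose name contains one of ERROR_WORDS."""
    return {key: value for key, value in counters.items()
            if any(word in key[1] for word in ERROR_WORDS)}
```

On the Windows PC it could be fed a live reading with subprocess.run(["netstat", "-s"], capture_output=True, text=True).stdout, and the result of error_counters printed or saved for later comparison.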
 
@electron thank you - yes, let's set the bad cap aside for a moment and focus on the network side. A couple of questions, just to help me get my head around it please:

1. Isolating to a switch off to the side, WHAT should I connect here to start? Simply the OPII and a PC running PCA to try to maintain a persistent connection between the two, absent any other traffic?

2. I ask the above because I've kinda tried that (will be more methodical, and try to employ some of the utilities you note) but it was hard because EVERYTHING needs to talk to everything else. For example, I tried to put all my IP cameras on their own switch... but then they need to talk to my Synology servers for recording, and they needed to talk to Myro:Home to display on the touchscreens, so the touchscreens all had to be on the same switch, and then my HiFi2 needed to see my OPII, and so on and so forth. But I get it - let me start without all of the interconnected stuff and see if I can keep the OPII and a single PC talking.

3. Would it make sense that an overload of IP traffic on the OPII network port would also cause the other serial interfaces to not be able to communicate? Does all the IP traffic somehow seep into the main "brain" of the OPII and just paralyze it?

Will go setup #1 tonight and report back.
 
PS: Just for giggles, I ran netstat -s on my PC "as is" with everything presently connected, and I got this (but have no idea what I'm looking at):

Microsoft Windows [Version 10.0.19045.2673]
(c) Microsoft Corporation. All rights reserved.

C:\Users\Jcd> netstat -s

IPv4 Statistics

Packets Received = 3525339005
Received Header Errors = 0
Received Address Errors = 248236
Datagrams Forwarded = 0
Unknown Protocols Received = 1
Received Packets Discarded = 3804226
Received Packets Delivered = 3522391549
Output Requests = 1996484583
Routing Discards = 0
Discarded Output Packets = 78818
Output Packet No Route = 29
Reassembly Required = 102046
Reassembly Successful = 50981
Reassembly Failures = 0
Datagrams Successfully Fragmented = 0
Datagrams Failing Fragmentation = 0
Fragments Created = 0

IPv6 Statistics

Packets Received = 162611604
Received Header Errors = 0
Received Address Errors = 0
Datagrams Forwarded = 0
Unknown Protocols Received = 0
Received Packets Discarded = 38090
Received Packets Delivered = 163223468
Output Requests = 455570665
Routing Discards = 0
Discarded Output Packets = 402
Output Packet No Route = 0
Reassembly Required = 54577
Reassembly Successful = 27258
Reassembly Failures = 0
Datagrams Successfully Fragmented = 0
Datagrams Failing Fragmentation = 0
Fragments Created = 0

ICMPv4 Statistics

Received Sent
Messages 41267 83161
Errors 0 0
Destination Unreachable 31809 72062
Time Exceeded 1687 33
Parameter Problems 0 0
Source Quenches 0 0
Redirects 48 0
Echo Replies 409 7314
Echos 7314 3752
Timestamps 0 0
Timestamp Replies 0 0
Address Masks 0 0
Address Mask Replies 0 0
Router Solicitations 0 0
Router Advertisements 0 0

ICMPv6 Statistics

Received Sent
Messages 168734 35172
Errors 0 0
Destination Unreachable 803 4309
Packet Too Big 0 0
Time Exceeded 0 20
Parameter Problems 0 0
Echos 0 0
Echo Replies 0 0
MLD Queries 0 0
MLD Reports 65 0
MLD Dones 0 0
Router Solicitations 0 100
Router Advertisements 0 0
Neighbor Solicitations 16598 14110
Neighbor Advertisements 151386 16630
Redirects 0 0
Router Renumberings 0 0

TCP Statistics for IPv4

Active Opens = 502820
Passive Opens = 567386
Failed Connection Attempts = 104750
Reset Connections = 61861
Current Connections = 190
Segments Received = 3477131332
Segments Sent = 2000991401
Segments Retransmitted = 6935906

TCP Statistics for IPv6

Active Opens = 13277
Passive Opens = 3221
Failed Connection Attempts = 433
Reset Connections = 1578
Current Connections = 6
Segments Received = 147291010
Segments Sent = 455583164
Segments Retransmitted = 724106

UDP Statistics for IPv4

Datagrams Received = 120171575
No Ports = 3726498
Receive Errors = 211533
Datagrams Sent = 10768890

UDP Statistics for IPv6

Datagrams Received = 33311563
No Ports = 31709
Receive Errors = 6389
Datagrams Sent = 630820

C:\Users\Jcd>
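These counters accumulate from when the PC's network stack started (normally its last boot), so on their own they say little. One hedged way to read them is as a share of the related traffic; the sketch below just does that arithmetic on a few of the numbers posted above. There is no universal "normal" value for these ratios, so the output is mainly useful as a baseline to compare against an isolated test.

```python
# Counters copied from the netstat -s output posted above (cumulative since boot).
posted = {
    "IPv4 packets received": 3_525_339_005,
    "IPv4 packets discarded": 3_804_226,
    "TCP/IPv4 segments sent": 2_000_991_401,
    "TCP/IPv4 segments retransmitted": 6_935_906,
    "UDP/IPv4 datagrams received": 120_171_575,
    "UDP/IPv4 receive errors": 211_533,
}


def percent(part: int, whole: int) -> float:
    """Share of whole represented by part, as a percentage (0.0 if whole is 0)."""
    return 100.0 * part / whole if whole else 0.0


ratios = {
    "IPv4 discarded / received": percent(posted["IPv4 packets discarded"],
                                         posted["IPv4 packets received"]),
    "TCP retransmitted / sent": percent(posted["TCP/IPv4 segments retransmitted"],
                                        posted["TCP/IPv4 segments sent"]),
    "UDP receive errors / received": percent(posted["UDP/IPv4 receive errors"],
                                             posted["UDP/IPv4 datagrams received"]),
}

for name, pct in ratios.items():
    print(f"{name:<32} {pct:.3f}%")
```

Each of these works out to well under 1% of its traffic; whether that is "high" can only be judged against the same ratios measured with the OPII and PC isolated.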
 
One more thing: I did change the port settings to 10Mbps half-duplex, and also tried to "tuck" the HAI and HiFi2 onto a small 100Mbps mini-switch to see if I could "throttle down" the traffic they were seeing, but neither helped, and that's about where my network knowledge ends.
 
@jcdavis you have been around a while...

Joined Dec 27, 2013

First test... the easy-button test. Ethernet hiccups will mangle the serial bus on the panel; they do not play well together.
I saw this issue when I went over 100 devices on my home LAN a few years back.

Here I have two Leviton/HAI 45A00-1 units and the other transceiver model. I used them for a bit, then wired everything. There have been glitches reported sometimes when using the 45A00-1; search the forum here.

Looks like your Ethernet port on the OP2 panel is "constipated". Have you tried the following?

A - disconnect the Ethernet port, reboot and only utilize your serial connections to the panel. Your serial devices will not hiccup and your time will stay stable with no Ethernet connectivity (if it is an Ethernet issue).

B - connect an old 10 Mbps hub to the Ethernet port, then on to your main LAN. Or, if you have a managed L2/L3 switch, set the port to 10 Mbps and route the traffic from the port to your main network. Years ago I tried this with a managed L2 switch and it did not work for me. Maybe it will for you.
I think Leviton used one at the last Expo in LAS they attended before discontinuing the HAI panels: while they were demoing the OP2 panel, it kept disconnecting from the Ethernet network there and trashing the serial bus.

C - route the traffic from the OP2 (this is what works best for me)

Any old SOHO combo router will work for testing. I'm guessing you may have one in a junk pile (even a 20-year-old SOHO wireless router will work fine).
1 - configure the WAN port to the same static IP you are using for your OP2 panel.
2 - configure the LAN port to a tiny subnet...using this ==> Online IP subnet calculator
3 - here is an example:

/29 subnet mask

Gateway IP of 192.168.0.1
Subnet mask 255.255.255.248

You will have IPs for 6 hosts.

Configure your OP2 panel for IP 192.168.0.2

Put the IP 192.168.0.2 in a DMZ or open port 4639.

See if that works. Here I have used a micro travel router running OpenWrt for years now.
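The /29 plan above can be sanity-checked with Python's standard ipaddress module before touching the router. This is a small sketch assuming the 192.168.0.0/29 LAN from the example; it confirms the mask, the six usable addresses, and that the router's LAN address (192.168.0.1) and the panel (192.168.0.2) both fall inside it. The helper names are hypothetical.

```python
import ipaddress


def describe_subnet(cidr: str) -> dict:
    """Summarize an IPv4 subnet: mask, usable host count and range, broadcast."""
    net = ipaddress.ip_network(cidr)
    hosts = list(net.hosts())
    return {
        "netmask": str(net.netmask),
        "usable_hosts": len(hosts),
        "first_host": str(hosts[0]),
        "last_host": str(hosts[-1]),
        "broadcast": str(net.broadcast_address),
    }


def all_inside(cidr: str, *addresses: str) -> bool:
    """True if every address is a usable host address in the subnet."""
    usable = set(ipaddress.ip_network(cidr).hosts())
    return all(ipaddress.ip_address(a) in usable for a in addresses)


# Values from the /29 example: router LAN side 192.168.0.1, panel 192.168.0.2.
print(describe_subnet("192.168.0.0/29"))
# {'netmask': '255.255.255.248', 'usable_hosts': 6, 'first_host': '192.168.0.1',
#  'last_host': '192.168.0.6', 'broadcast': '192.168.0.7'}
print(all_inside("192.168.0.0/29", "192.168.0.1", "192.168.0.2"))  # True
```

The network (.0) and broadcast (.7) addresses are excluded, which is why a /29 leaves six addresses for the router and panel.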

Microrouter DIY for use with the Omni Pro Panel Ethernet port

@dwalt is the Obi Wan HAI guru here and more than likely will add to this post...

There is also another support site here ==> Tech support for your HAI / Leviton Omni
 
Looks like you're dealing with a large number of errors. I would reboot that laptop and, right after the boot, take a look at the netstat -s output (take a screenshot or paste it into Notepad). Then run it again a couple of hours later (or when things break) and take another screenshot, just to see how quickly these values are going up (all while connected to the OP).
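The before/after comparison suggested here boils down to subtracting two snapshots and dividing by the elapsed time. A minimal sketch, assuming the counters have already been copied into dictionaries (by hand or with a parser); the counter names, values, and timestamps in the example are made up for illustration.

```python
from datetime import datetime, timedelta
from typing import Optional


def counter_growth(before: dict[str, int], after: dict[str, int],
                   t_before: datetime, t_after: datetime) -> dict[str, Optional[float]]:
    """Per-hour increase of each counter present in both snapshots.

    A counter that went down usually means the PC rebooted (counters reset)
    between snapshots, so it is reported as None instead of a negative rate.
    """
    hours = (t_after - t_before).total_seconds() / 3600
    if hours <= 0:
        raise ValueError("second snapshot must be taken after the first")
    growth: dict[str, Optional[float]] = {}
    for name, start in before.items():
        if name not in after:
            continue
        delta = after[name] - start
        growth[name] = delta / hours if delta >= 0 else None
    return growth


# Made-up example: two snapshots taken two hours apart.
t0 = datetime(2023, 3, 1, 20, 0)
t1 = t0 + timedelta(hours=2)
before = {"Segments Retransmitted": 1_000, "Received Packets Discarded": 500}
after = {"Segments Retransmitted": 1_600, "Received Packets Discarded": 500}
for name, rate in counter_growth(before, after, t0, t1).items():
    print(f"{name}: {rate} per hour")
```

A counter that climbs steadily while the OPII is connected, and stays flat when it is not, is the kind of pattern worth chasing.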
 
I jumped in here to this thread to respond to Nick way above, but now I seem to have hijacked it into the central theme of the thread I started which is really about the same topic, so I'm thinking I/we should migrate over to THAT thread perhaps? @electron I have a follow-up question about using netstat -s to troubleshoot, so I'll tag you there (in the other thread) if that's okay...?
 
@pete_c your comment/confirmation that the Ethernet port could botch the entire set of comms ports was enlightening... and oddly reassuring.
So maybe this could be as "simple" as a LAN problem, which would elate me like you cannot imagine (in a relative way ;) ) I tried your "reboot WITHOUT Ethernet" guidance, and it's only been an hour, but gosh if that doesn't seem to be an improvement so far... will keep at it.

@electron from the other thread I wanted to follow-up with a question about using netstat -s to troubleshoot: You said the quick log I posted showed a lot of errors. I presume this to mean simply that the # of packet errors in the various columns is high - don't know if this means "relative to the overall packet traffic", or if the expectation is that this should be zero, or close to it - i.e., I don't know how much "bad traffic" is simply a function of how an Ethernet network and/or adapter works.

That said, would one approach be to disconnect EVERY SINGLE IP device from the network, then run netstat -s and check it, connect one device, run it again, connect another device, run it again... etc.? If I know what the general "expected" level of bad/discarded packets or errors should be, could I then "see" when I added a device that was spiking this figure up? If so, should I be looking in the IPv4 Statistics section, the TCP section, the UDP section, or any/all?

I'm a networking neophyte: my approach has been to try to buy good hardware and "plug-n-play" but I do acknowledge that 200+ high-traffic devices is probably pushing things towards "plug-n-pray"!
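The one-device-at-a-time idea can be made more objective by normalizing: instead of raw error counts, compare errors per 10,000 packets received during each step, so a busy step isn't mistaken for a bad one. A sketch under stated assumptions: you record cumulative "packets received" and some error counter after each step, and anything more than three times the first step's rate is flagged (the factor is an arbitrary, hypothetical choice). One caveat: on a switched network the PC's counters only reflect traffic that actually reaches its own network card (its own connections plus broadcast and multicast), so a device that only talks to, say, the NVR may not move them at all; the switch's per-port error counters, if your switch exposes them, may be a better place to look for that kind of device.

```python
import math


def per_step_rates(snapshots, per=10_000):
    """Errors per `per` packets received during each step.

    snapshots: list of (label, cumulative_packets_received, cumulative_errors)
    in the order taken; the first entry is the starting reading, and each later
    label describes what was connected during the interval ending at that reading.
    """
    rates = []
    for (_, rx0, err0), (label, rx1, err1) in zip(snapshots, snapshots[1:]):
        received = rx1 - rx0
        errors = err1 - err0
        rates.append((label, per * errors / received if received > 0 else math.nan))
    return rates


def flag_spikes(rates, factor=3.0):
    """Labels of steps whose rate exceeds `factor` times the first step's rate."""
    baseline = rates[0][1]
    threshold = factor * baseline if baseline > 0 else 0.0
    return [label for label, rate in rates[1:] if rate > threshold]


# Hypothetical readings, not real measurements.
snapshots = [
    ("start", 0, 0),
    ("PC alone", 10_000, 2),
    ("+ OPII", 20_000, 4),
    ("+ camera 1", 30_000, 30),
]
rates = per_step_rates(snapshots)
print(rates)               # [('PC alone', 2.0), ('+ OPII', 2.0), ('+ camera 1', 26.0)]
print(flag_spikes(rates))  # ['+ camera 1']
```

Steps where nothing was received come out as NaN and are never flagged, so give each step enough time for some traffic to flow.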
 
@jcdavis,
I'm just throwing my 2 cents in here. There have been many posts about the OmniPro II and busy LANs... I will defer to @pete_c and the others to address that. My history with the OmniPro II has been on small LANs: your typical vacation home with some smart TVs, a few smartphones, etc. The OmniPro II has been very reliable in those cases. It stays on the network, PC Access works reliably, etc. So I would zero in on the suggestions above to get the OmniPro II on a quiet network segment. The OmniPro II has a 20+ year old Ethernet implementation, which was groundbreaking at the time. At this point, I think it's best to help the OPII out and give it a simple LAN subnet to live on.
Yesterday, I remotely accessed a system I installed over 8 years ago. PC Access worked fine.
You said:
"The OPII will not hold an Ethernet connection long enough to stay connected and download from PCA - this has ALWAYS been largely the case, and I blame the weak Ethernet port on the OPII,"

I'd suggest one test. Leaving the OPII off the house network, plug an Ethernet cable between the OPII and a laptop. Put your laptop on the same subnet using a static IP. See if PC Access works. We can provide steps if you need more detail.
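For the direct-cable test, the main thing to get right is that the laptop's static IP and subnet mask put it on the same subnet as the panel. A small sketch using Python's standard ipaddress module; 192.168.0.2 is borrowed from the router example earlier in the thread, and the laptop addresses and the helper name are hypothetical. The panel's own mask should match too, which this check does not see.

```python
import ipaddress


def direct_link_problems(laptop_cidr: str, panel_ip: str) -> list[str]:
    """List addressing problems for a laptop-to-panel cable test (empty = looks OK).

    laptop_cidr is the laptop's static address with prefix, e.g. "192.168.0.10/24".
    """
    laptop = ipaddress.ip_interface(laptop_cidr)
    panel = ipaddress.ip_address(panel_ip)
    problems = []
    if laptop.ip == panel:
        problems.append("laptop and panel share the same IP address")
    if panel not in laptop.network:
        problems.append(f"panel {panel} is outside the laptop's subnet {laptop.network}")
    elif panel in (laptop.network.network_address, laptop.network.broadcast_address):
        problems.append(f"panel {panel} is the network or broadcast address of {laptop.network}")
    return problems


print(direct_link_problems("192.168.0.10/24", "192.168.0.2"))  # []
print(direct_link_problems("192.168.0.10/29", "192.168.0.2"))  # outside 192.168.0.8/29
```

The second example shows why masks matter: 192.168.0.10 with a /29 mask lives in 192.168.0.8/29, so it cannot reach 192.168.0.2 directly even though the addresses look close.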

@dwalt
 
So I messed up and responded to the wrong thread you posted in (my mistake). It also looks like Pete already pointed you in the right direction, but here's what I was going to post (but forgot to click submit).

The number of errors seems high, which is why I want you to isolate the OmniPro and that Windows machine, take note of the counters before, let it sit there, and observe whether it still crashes. If it does, take another look at the counters so we can check how quickly those values are going up.

Couple of other things to check/be aware of:
  • Try replacing the Ethernet cable going to your HAI if you haven't already. Cables can go bad, which would result in errors/weird connectivity behavior.
  • If you're using DHCP, switch to a static IP on the OmniPro (which I'm guessing you have to do anyways for isolated testing). Double-check the subnet masks!
  • Confirm your link speed/duplex parameters, and drop to 10Mbps (but full duplex if not on a hub).

Since this Ethernet stack is very old, it's possible it just can't handle some of the network traffic it's seeing (IPv6, multicast, malformed packets, jumbo frames, ...), so in theory it's possible that your recent network changes are the root cause.

Unfortunately, without isolating the system, it's going to be difficult to troubleshoot this which does mean you'll have some downtime. If it stays stable, I recommend you just move the HAI components to their own isolated network as others have suggested.
 
I have about 65 LAN devices in my house. Lots of traffic of all types, VPNs, servers, SSL-VPNs, and the HAI ethernet and email board connect in without any isolation of any type, and it has worked great for many years. The "problem" does not seem universal.
 
I'm guessing it may depend on many variables, including the type of network traffic and the manufacturer/model of these devices. I've seen basic pings take down enterprise-grade telco equipment, a networked office printer take down the entire network, switch port counters hit a certain value and lock up the port until a power cycle, etc. All of these were older devices with firmware bugs that did not properly handle certain conditions.

Either way, since he mentioned there was a major network change that day, network isolation is still my recommended first step after trying new Ethernet cables.
 