120kHz signal storm, switchlincs to blame?

PeterW

Active Member
I went a bit crazy and have way too many 2-way devices. As could be predicted, I'm having the usual reliability problems.

However, in addition to the 'lost command' problem, I've run into something serious that my switchlinc devices seem to be implicated in.

In a nutshell, everything is working fine, and suddenly there is extreme 120KHz noise being spewed onto the line, totally hosing all X10 communication.

If I walk around the house, and turn on and off each switchlinc and watch what happens, I'll find the culprit.

A caveat.. I've had an X10 RF receiver device be the apparent culprit too. But for the most part, it seems to be the switchlincs. It is happening roughly once or twice every two to three days.

Usually, the culprit is one of the devices that I issue commands at specific times via a CM11A. But not always. Usually, it is one of the devices with the built-in repeater turned on, but again, not always.

At this point I'll rewind and give some more background info.

At the breaker box, I have a leviton 2-phase coupler/repeater (yes, the $60 one that seems to get some folks here so excited: the HCA02). I also have a smarthome 3-prong dryer coupler/repeater to try out, but have not done so yet - I don't have the room for it behind the dryer :-(

Most of the devices are on one phase, with some on the other phase. Some circuits are seriously overloaded with X10 transmitters.

Specifically, I have on each circuit:
Phase 1: 8 transmitters (including computer controllers). There are a lot of computers on this circuit, all behind X10 noise filters.
Phase 1: 10 transmitters. Half of master bedroom, hall, bathroom. one fluro light with an instant-on electronic ballast (no filter on its ballast yet, but I have them ready). It is on a switchlinc relay controller.
Phase 1: 3 transmitters. garage/bedroom
Phase 1: 8 transmitters. other half of master bedroom, tv room lights, kitchen lights (!). 6 fluro lights, using switchlinc relay controllers.
Phase 1: 1 transmitter (an appliancelinc for a fishtank with fluro lights)
Phase 1: 1 transmitter (another fishtank with fluro lights and appliancelinc, and a whole bunch of tv electronics behind an x10 noise filter/block).
Phase 2: 4 transmitters (2 fluro lights on switchlinc relays, two regular switchlincs). Another stack of computer gear behind a noise block.
Phase 2: 8 transmitters. (includes a wgldesigns wireless RF receiver, and about 30 fish tanks, all with fluro lights. the lights are on appliancelincs, I have noise blocks but have not installed them yet).

That list is slightly out of date, there are a couple of other 2-way devices scattered around as well. But today, there are 37 transmitters on phase 1, and 13 on phase 2.
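As a quick sanity check on those totals, here is a minimal Python sketch that adds up the per-circuit counts from the list above. The tuples are just (phase, transmitter count) copied from that list; the function name is made up for illustration.

```python
# (phase, transmitter count) for each circuit, copied from the list above.
circuits = [
    (1, 8), (1, 10), (1, 3), (1, 8), (1, 1), (1, 1),
    (2, 4), (2, 8),
]

def totals_by_phase(rows):
    """Sum the transmitter counts for each phase."""
    totals = {}
    for phase, count in rows:
        totals[phase] = totals.get(phase, 0) + count
    return totals

print(totals_by_phase(circuits))
```

The listed circuits add up to 31 on phase 1 and 12 on phase 2, so roughly 6 and 1 of today's 37 and 13 are the unlisted stragglers.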

For the most part, I get signal across the house. Having the fluro lights on or off doesn't seem to make much difference, so I don't think they are destroying too much signal. Under normal circumstances, there is very little 120khz noise. According to my testerlinc's 120khz activity log, there are 1 or 2 cycles of 120khz that blip every couple of seconds or so. It is generally fairly quiet. Mind you, I do have a lot of transmitters to suck in any noise....

If I poll all the devices (they all respond to status requests), I'll usually get a response to all, except one of the fishtanks which is on the end of a very long run with 9 other transmitters between it and the breaker box. In this case, very long means that it literally travels from the breaker box on one corner of the house, around the edge, to the diagonal corner where the tank is, with two other local transmitters. It then has a long run, and hits 7 others that are fairly close to the breakers. I'm not surprised I'm having trouble with that one.

Anyway.. besides the expected signal attenuation problems, I am having occasional problems with locked up modules. Appliancelinc and lamplinc in particular. Their local control stops, and they ignore commands, even when sent locally by a test controller plugged into the same outlet. They have reset to A1 several times now. I had two spare appliancelincs, and swapped out the ones that kept resetting to A1 and it seems to have stopped. Bah. I now send A1 status requests every few hours to see if anything arrives unexpectedly.. so far so good.

And then there is the X10 "system deadlock".

All of a sudden, X10 commands stop working. The software driving my CM11A (danlan.com's x10d) reports "unexpected poll 55" or the like. If I fire up my smarthome testerlinc, it reports BBK and BSC (bad block and bad start code) with a "quality" of 120+. If I look at the leviton coupler, it has a status light.. Its 'error' light is blinking once every second. While I'm in the garage at the breaker box, I usually isolate the coupler at this point.

When I walk around the house, looking at the status lights, they are generally either all off, or blinking once every second (or every half second). It is regular, with no variation.

If I press a button on a keypadlinc, its light flashes for quite some time while it tries to transmit the codes. Eventually it gives up.

At this point, I start pressing the 'off' side of the rocker on each switchlinc and watch the lights or the testerlinc. Eventually I'll get to a switchlinc and when I press the rocker, suddenly the X10 status light changes behavior and the BSC/BBK stops on the testerlinc.

Next comes a whole stream of X10 commands that have been backlogged. HCA fires off a bunch, lights change according to delayed schedules etc.

So far, the "culprit" has been:

Switchlinc relay: three times (Phase 1, fluro light, master bedroom, booster mode enabled)
Switchlinc relay: twice (Phase 1, fluro light, kitchen, booster mode enabled, same circuit as the one above)
Switchlinc dimmer: once (Phase 2, regular light bulb, on the 8 transmitter circuit including my mountain fish tanks)
Switchlinc dimmer: once (Phase 1, regular light bulb, in an otherwise unused room. this switch is quiescent 99.999% of the time, does not send or receive commands)
X10 RF receiver (Phase 1, master bedroom, same circuit as the first two in this list.. unplugging it caused the system to unjam)
Keypadlinc-6: once (Phase 1, in my son's computer room, on the same circuit as the computer controllers (8 transmitters total))

At first, I thought it might have been the 'boosterlinc' feature they recently added. As near as I can tell, this is just a single-phase repeater that detects the first of the repeated commands and retransmits over the second of the repeat.

But, that doesn't seem to be it, because it has turned up on devices without boosterlinc mode.

And the X10 RF receiver really messes up my theories too.

Now, I do have multiple repeaters, in addition to leviton HCA02 at the breaker box.

Smarthome say that their boosterlinc stuff is safe to use on multiple devices. I've been lucky to have devices on one long run where it was convenient to turn the repeater on at about 1/3 of the distance from the breaker to the end, and again at 2/3 of the way to the end.

Some of smarthome's examples show multiple freestanding boosterlinc modules being used along with a passive coupler at the breaker box. They say to get them in an outlet as close as possible to the breaker box.

However, I'm beginning to doubt the safety of this. I'm starting to wonder if I'm seeing a repeater loop storm or the like.

I don't recall for sure which devices have the integrated boosterlinc feature enabled. I'm just about ready to go through and reprogram every single device that has the capability to make sure it is off. Then plug in the freestanding devices at outlets. What this enables me to do is to yank all the repeaters (and isolate the leviton HCA02 at the breaker box) and see for sure if it is a repeater storm or not.

Oh, that reminds me.. I noticed something very curious on the X10 log right after breaking the last logjam.. The first few commands that my computer saw were something like... "M16 P16 O16". This is highly suspicious.. I *know* that I have nothing in the house that generates those codes. And, I've just realized that the smartlinc devices use codes like this for programming them! What the heck? Perhaps the jammed device is reporting some sort of diagnostic code? I'll write it down if it happens again.
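Spotting stray codes like these could be automated rather than left to memory. This is only a minimal Python sketch. It assumes the log has already been split into tokens such as "D4" or "DOFF" (the real x10d log format is not shown here), and the set of house codes in use (A, C, D, F) is an assumption for illustration. The function names are made up.

```python
def house_code(token):
    """Return the house code letter (A-P) that starts an X10 token like 'D4' or 'DOFF'."""
    letter = token[:1].upper()
    if len(token) < 2 or not ("A" <= letter <= "P"):
        raise ValueError(f"not an X10 token: {token!r}")
    return letter

def unexpected(tokens, known_house_codes):
    """Return the tokens whose house code is not one we knowingly use."""
    return [t for t in tokens if house_code(t) not in known_house_codes]

# Assumed set of house codes in use; adjust to the real installation.
IN_USE = {"A", "C", "D", "F"}

print(unexpected(["M16", "P16", "O16", "DOFF", "C2"], IN_USE))
```

Run against the sequence above, it flags M16, P16 and O16 and lets DOFF and C2 through.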

The other thing that I am wondering is whether the repeaters are implicated in getting one (or more) of the devices initially into a loop but are not needed to maintain the chaos. Having the repeaters entirely isolatable would probably answer this.

Anyway... I can deal with the signal attenuation problems. I can have extra circuits run to get the number of transmitters right down, etc etc. But the 120khz storm is driving me insane. Actually, it is driving my wife insane - I'm getting frustrated to the point that I was seriously considering giving the UPB system a road test and ripping all of the X10 devices out for one hell of a fire sale.

Does this 120khz storm problem ring bells for anybody? Is the HCA02 a pile of junk? Does it sound like I've triggered a repeater storm? Should I try out the ACT coupler/repeaters instead? (eg: CR-234) Is it worth investigating something else?

I ran across the lightolier compose firewalls.. they look like a pretty extreme solution, but would probably work.. or at least isolate the problem to a single circuit. I don't think I'm quite desperate enough to try this yet.

Any other ideas or pointers?

BTW: I ran across the insteon stuff today for the first time. Aargh! If I'd known that was coming, I might have waited. Or maybe not.. I read what happened with switchlincs when they were first introduced...
 

DavidL

Senior Member
Should I try out the ACT coupler/repeaters instead? (eg: CR-234)?

Yes. I tried 'em all and the ACT was a lot better for my install.
 

electron

Administrator
Staff member
Hi Peter, welcome to CocoonTech!

I have heard from several experts and vendors now that the HCA02 is not a good coupler. I have a new one on my desk (X10 Pro) as I am ready to yank the HCA02 since I seem to be having problems which I don't have when the coupler isn't involved. From what I hear, the ACT is indeed a much better coupler.

As for the new protocols, the ones to look out for are UPB, Insteon and ZigBee. UPB is the most 'mature' protocol out there, but the Insteon stuff looks promising (since it's backwards compatible with X10), as does ZigBee.
 

AutomatedOutlet

Senior Member
Hello Peter,

Welcome to the board. I too would be suspicious of the coupler/repeater.

I think the two best ones made for installation into the breaker box are the ACT CR234 and the X10 Pro XPCR.

The Smarthome plug-in coupler repeater also works well though. This is what I use in my house because I didn't want to go into the breaker box.

In another building at my house though I do use the Lightolier Firewall box. Although a retrofit is a lot of work, and not cheap, that is going to be the best reliability you can get.
 

PeterW

Active Member
Thanks for the suggestions! I don't normally like asking for help, but this was such a strange thing and I've seen folks post here about switchlinc problems (that I've seen too). I've written most of my own control software (for fun), but I still have a HCA running - I find it helpful for prototyping things etc.

I saw another storm a few minutes ago.

At 3:05:02pm, I sent D1 D2 D3 D4 DOFF and D1 D2 D3 D4 DOFF again immediately afterwards. These are in the master bedroom, on the same phase but two different circuits. D1 is a keypadlinc-6 with incandescent load, D2 and D3 are lamplincs. All were off already. On the other circuit, D4 is a switchlinc relay with a fluro load (also turned off), and I have had it tagged as a repeat offender. I think D4 is a repeater. It is on the circuit with the long run, and has kitchen fluro lights and a second repeater on it at the kitchen.

So.. The first grouped command was sent ok. With the second command, the CM11A wedged after transmitting the D2. The grouped command was "open" with D1 and D2 (and possibly D3) selected. It recovers, but doesn't appear to transmit.

Right before this was starting, I went to the bathroom. I would have turned on a fluro merely seconds before the meltdown, which just happened to be on the same circuit as D1-D3. (It was on a switchlinc relay). My CM11A saw the F02 FON at 3:04:20pm

I do not know exactly when I turned the lights off again, because it was not received by the CM11A or my HCA + powerlinc.

At 3:06:02pm, my software tried to transmit another command. It was addressed to devices on Phase 2 (stairway lights). The CM11A could not send it, and returned "unexpected poll: 55" shortly afterwards (3:06:08pm).

I saw this on my computer screen and went and looked, and sure enough, the status lights were all blinking.

So, I know that at 3:06pm, the system was dead.

I retraced my steps. I went downstairs and turned the bathroom lights on/off again. Nothing happened as far as X10 was concerned. The system was still dead. Note that I did NOT isolate the coupler/repeater this time. The lights were still blinking like clockwork.

On a wild hunch, I walked to the kitchen and pressed the 'off' on the switchlinc relay there (C2) for its fluro lights. I *know* this one has a repeater enabled. I've had this one implicated as the "culprit" before. I do not recall for sure if the lights were on or off to start with though. Note that this particular switch is on the same circuit as D4 in the 3:05pm command. D4 had boosterlinc repeater mode turned on. C2 does too.

The system unlocked. According to x10.log on my CM11A, I saw three unaddressed "COFF" commands.

Hmmmmmm. I wonder if D4 and C2 got in a repeater fight? Note that I did not touch the rocker on D4 this time to break the storm. I just happened to luck out on C2 first go.

I think it's time to turn off the boosterlinc repeater mode, and fast. I'll take occasional lost commands over a system deadlock any day.

I saw something in HCA's registry to have it log address and command instructions separately. It would have been really nice to see a second opinion for what the CM11A actually transmitted.

I'm also going to zoom down to the hardware store to get a 240V outlet. As an experiment, I'll connect it to the breaker panel instead of the HCA02 and plug in the smarthome coupler/repeater there instead of behind the dryer. It just so happens that there is a convenient breaker pair available for all this.. There was a run to a subpanel in the back yard where a previous owner had a pool or spa or something. It was removed, but it left a convenient ganged 240V breaker pair - just right for a second 240V dryer outlet. :)

(Hmm. While I'm there, I might connect up a 240V australian power outlet, my wife has some .au power tools in the garage that she'd like to use without a step-up transformer. I think I'd want a 220V GFCI on it though because what should be neutral on the .au outlet would actually be 120V hot.)
 

BraveSirRobbin

Moderator
Peter:

Don't think for a minute that just because you have a "noise" source behind a noise block, that it will no longer be a problem source.

I have two UPS systems and a computer that I had behind noise blocks and are/were still causing problems in my X-10 system. The computer problem in particular was so bad I yanked its power supply and replaced it with an Antec (fixed that problem).

I think you need to somehow unplug or eliminate your devices/sources and start troubleshooting a much simpler setup.

Regards and welcome to Cocoontech.

BSR
 

electron

Administrator
Staff member
Btw, I just figured out what the source of my 6 month old constant power line noise is: a cheap surge protector! I finally moved my computers to a new rack, and when I cleaned up my wiring, I ended up with an extra surge protector. Then I noticed that I no longer have any problems. The moral of the story is that the problem could be something as stupid as a dumb surge protector or even a computer power supply as BSR mentioned.
 

PeterW

Active Member
I got a reply from the smarthome folks. They think it is repeater chatter.

Here's what I've done this evening..

1) done a walk-through and reprogrammed every single switchlinc with repeater capability to make sure it is off. (change to a different HC:UC, test, then change it back again to make sure the programming sticks)

2) ripped out the HCA02

3) the breaker box is in the garage. I installed a 240V dryer outlet near the box. I have a smarthome signalinc coupler-repeater (#4826b) in there now.

4) walked around the house and collected a bunch of transmitters that I forgot about (IR, IO, temperature, etc).

5) Polled every device from my computer. I can "hear" every single pollable device in the house, except for the two at the end of the long run I described above.

At this point I have a single coupler/repeater in the house and appear to have X10 communication between my computer and all but two of the devices. Both endpoints are in worst-case scenarios. My computer controller is in a hell-hole of computer power supplies, the other end is a long run with the largest X10 transmitter installation in the house. The problem devices have fluro lights attached, and are currently turned on. I suspect the electronic ballasts are absorbing some signal. I'm betting that turning the lights off will improve things there, once my wife is out of there and I can test it.

I have some standalone boosterlinc devices that I can plug into outlets. I'm going to tempt fate and see if the strategic placement of it can improve the reachability of those two distant devices.

The risk of a repeater storm is still there, but being able to unplug the damn things should help from a diagnostic perspective. Smarthome say that the standalone boosterlinc devices will shut down for a while if they detect a storm, so in theory that means they should all cooperate to break it, IF that is what is going on.

I also have two dryer plugs. I wired up my spare passive coupler and the old leviton HCA02 to them, and I can plug them into the new dryer socket as an additional diagnostic aid.

BTW: the only source of 120khz noise I've found is my fisher & paykel clothes washing machine. It doesn't use a regular motor and gearbox. Instead, it has a direct drive stepper motor type device. When the machine is turned on, but the motor not active, it generates massive 120khz noise on the power. I assume its power supply is rather noisy. Putting a filter on it solved that problem.

I've checked a lot for noise, but have not found any. I *have* found signal absorbers all over the place! My wife's computer power strip was a big problem. A signal filter/blocker restored X10 comms to the rest of the room.

I've just discovered that my fridge is eating a whole bunch of signal! Unplugging the fridge restored communication to a fish tank on the same circuit. Damn, where is my spare 10-amp filter??

BTW2: If you have an X10 motion sensor in the same room while you're trying to program a switchlinc, be careful! When the light turns on in response to "set mode", the motion sensor transmits a 'light/dark' code. Guess what happens next? Hint: the switchlinc picks up the address. (It then turns on. The motion sensor detects light and sends the "unit-code off" for daylight. Light turns off. Motion sensor detects dark -> "unit-code on". Repeat. Get confused. Get headache. Watch family laughing at you while the light turns on/off every 2 seconds). Unplug your RF receiver while programming these things in rooms with RF motion sensors. :)
 

PeterW

Active Member
Ahh crap! So much for that theory.

I turned off the breaker for the lights that I can't reach via X10 from my computer. I wanted to double-check which outlets were on the circuit. I was actually trying to figure out a good location for the standalone repeater.

When I turned the breaker back on, the X10 system instantly went nuts in the usual fashion!

There are now no repeaters besides the one smarthome #4826b at the breaker box. The repeater in the breaker box was sitting there with red error lights blazing away, for both circuits! (How curious! There were no 240V appliances running that I know of)

I toggled all the switches on the circuit, figuring that one of them must have powered up badly. No effect!

So, I started walking around the house and discovered which switchlinc was the culprit this time. It turned out that it was one in a bathroom that I previously have not had problems with. This was on a different circuit, and was previously programmed to not be a repeater!

Argh! What the heck??

Why would powering on/off another circuit cause an unrelated switchlinc to go crazy and hose the system? And it was a "new offender" too..

(I'm just about to go and try to provoke it again...)
 

BraveSirRobbin

Moderator
Wow, you certainly seem to have your hands full there! Just a thought, can you program ALL of your switches to NOT be repeaters and then take out the booster links from your outlets and see if that solves the "storm"?
 

PeterW

Active Member
Yes, that is what I did.

I walked around to every single device that is capable of built-in boosterlinc mode and programmed it to A10 first (to make sure the programming stuck, checking that A10 ON/OFF works) and then back to its normal code.

I never got around to plugging in any standalone boosterlinc devices. I was verifying which outlets were on the troubled circuit by turning the breaker off and then back on (after a few minutes). BAM! instant storm! (Aside: aargh! the problem circuit is a lights-only one.. not a single outlet to add a booster!)

There was only one single repeater in the system... the #4826b coupler/repeater at the breaker box.

The bad news (or good news, depending on your perspective at the time) is that I can reproduce this easily. Turning the breaker off, waiting for about 10 seconds, and then back on again, triggers the storm in about 1 in 3 cases (with the coupler/repeater active), and in about 1 in 10 cases (with the coupler/repeater isolated).

To be clear, with zero repeaters in the system, it still happens.
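Those odds also say how many breaker cycles a test run needs before a quiet result means anything. A rough Python sketch, assuming each breaker cycle is an independent trial with a fixed storm probability (an assumption; the real behaviour may well depend on timing or device state). The function names are made up for illustration.

```python
import math

def p_at_least_one(p_storm, cycles):
    """Chance of seeing at least one storm in the given number of breaker cycles."""
    return 1.0 - (1.0 - p_storm) ** cycles

def cycles_needed(p_storm, confidence=0.95):
    """Smallest number of cycles that gives at least one storm with the given confidence."""
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - p_storm))

print(cycles_needed(1 / 3))   # coupler/repeater active, about 1 in 3
print(cycles_needed(1 / 10))  # coupler/repeater isolated, about 1 in 10
```

Under that assumption it takes about 8 cycles with the coupler/repeater active, or about 29 with it isolated, to have a 95% chance of seeing at least one storm. A handful of quiet cycles proves very little.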

Here's the curious thing. I have 4 switchlinc-relay devices on that circuit. I also have a keypadlinc-6, a plain switchlinc dimmer, and a switchlinc-relay-timer.

When I power up that circuit, in the non-storm case, I see 4 X10 command codes that match the switchlinc-relay states. eg: DOFF DOFF CON CON. What the heck? Why are these devices transmitting this command at powerup? Why only the switchlinc-relay devices?

This is smelling more and more like a firmware bug to me. Imagine what would happen if all the devices powered up close enough to try and transmit in unison on the same AC cycle? I could imagine that they'd all see the collision in unison, back off in unison (using the same linear or "pseudo-random" backoff times) and retry in unison. That would make for one HELL of a storm.
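To illustrate the theory, here is a toy Python model. It is NOT how X10 or the switchlinc firmware actually works (that is unknown here): time is cut into abstract slots, a device succeeds only if it is the sole sender in a slot, and colliding devices wait a backoff before retrying. All names and parameters are made up for illustration.

```python
import random

def simulate(n_devices, backoff, max_slots=1000):
    """Return the slot at which the last device gets through, or None if they never all do."""
    next_try = [0] * n_devices      # slot in which each device will next transmit
    attempts = [0] * n_devices      # collisions seen so far by each device
    pending = set(range(n_devices))
    for slot in range(max_slots):
        senders = [d for d in pending if next_try[d] == slot]
        if len(senders) == 1:
            pending.discard(senders[0])          # clean transmission
            if not pending:
                return slot
        elif len(senders) > 1:
            for d in senders:                    # collision: every sender backs off
                attempts[d] += 1
                next_try[d] = slot + 1 + backoff(d, attempts[d])
    return None

def lockstep(device, attempt):
    """Every device uses the same linear backoff, so they retry in unison."""
    return attempt

rng = random.Random(1)

def jittered(device, attempt):
    """Each device picks an independent random delay of 0-7 slots."""
    return rng.randrange(8)

print(simulate(3, lockstep))   # None: the devices collide on every retry
print(simulate(3, jittered))   # a slot number: the collisions sort themselves out
```

In this model identical backoff times never resolve the collision, while even a small amount of per-device randomness does. That is consistent with the firmware theory but does not prove it.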

What is making me even more suspicious is that I still see some of the CON/OFF/DON/OFF commands after breaking a storm that has been running for a while. Perhaps the particular firmware in these switches is retrying that startup code forever? Even more curiously, the switchlinc-relay devices come in the smarthome old-style (big) boxes, while the relatively new relay-timer devices come in the newer smaller boxes. I'm thinking that their switchlinc-relay devices are from an older production run with old firmware (perhaps buggy), perhaps because they don't sell many of them.

Hmm.

Another curious thing.. I've noticed that one of the devices "powers up" with the load on, but without turning the indicator lights on properly (The load is on, but the load light is not). I think I'll start by pulling that one out.

Anyway, today's task is to trigger the storm and then disconnect the switchlinc-relay devices one at a time to try and find if it is a single switch or the group that is the cause. (see? I said above that being able to trigger it at will could be a good thing)
 

PeterW

Active Member
Except when you tempt fate by saying that you can do it at will. It is proving extremely difficult to provoke today.

Oh well, at least it made me go and do a walkthrough and install all the noise blocks that I've been meaning to...
 

PeterW

Active Member
ok, here's the kicker. I have 5 switchlinc-relay devices on this circuit. D4, D5, D6 and the two distant ones are C2 and C3.

If I send "D4 D5 D6 DSTATUSREQUEST" (ie: the one command is group addressed to three switches), BAM! Instant 120khz storm.

Under normal circumstances, one wouldn't do something silly like this. But I figured it would be interesting to cause several switches to transmit simultaneously, to test the theory that the simultaneous startup transmission was causing problems. I wasn't expecting it to be quite *that* spectacular.

If I address two switches - causing two to try and transmit their reply simultaneously, everything is fine. If I address three, boom!

If I put it in this state and disconnect any of the three addressed switches, the storm stops. It doesn't seem to matter which one I disconnect. I've observed it with all of the lights on that circuit off, all of them on, and various combinations in between.

If anybody else has a bunch of these on a circuit, nearby each other.. can you try it? I think this is a firmware bug in the switches themselves.
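For anyone who wants to try it systematically, this Python sketch lists every group status request for a set of unit codes, from single switches up to the whole group. It only builds the command strings in the same notation as above; how they are actually sent depends on your controller software and is not shown. The function name is made up for illustration.

```python
from itertools import combinations

def status_request_plan(house, units):
    """List group status requests for every non-empty subset of units, smallest groups first."""
    plan = []
    for size in range(1, len(units) + 1):
        for group in combinations(units, size):
            addresses = " ".join(f"{house}{unit}" for unit in group)
            plan.append(f"{addresses} {house}STATUSREQUEST")
    return plan

for command in status_request_plan("D", [4, 5, 6]):
    print(command)
```

For D4, D5 and D6 that is seven commands: three singles, three pairs and the one triple. Working up from the singles shows exactly where things go wrong.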

The *other* bad news is that the signalinc coupler/repeater seems to be dying after 12 hours of use. It has been progressively losing phase-B today.

I know it is a figment of its imagination and not real noise, because swapping the wiring doesn't change the 'B' flickering red light, and it stops seeing commands on the other phase. Swapping back to the leviton HCA02 brings things back to normal. aargh.

Crap. I knew I should have ordered a CR-234 in advance.
 