Possible intermittent bug?

rossw

Version: v03.02.17a
 
I have a program that is running fine.
It does its thing in "automatic" mode which takes a few minutes to complete (by design), then returns to the head of the loop.
The first thing it does is read the state of VAR1 - this is how I can "insert" manual commands or, if you like, an "override" function.
When VAR1 is set non-zero, the code branches and does one of currently 13 different things instead of the "normal" loop.
VAR1 is not written to ANYWHERE in the program. It is a "read-only" variable. It gets written from another computer which wants to "queue up" something different to be done once the current cycle is complete.
 
It all works. And it works fine.
 
Mostly!
 
Somehow, intermittently, VAR1 is being re-written to a previous value. It usually seems to be after about 5 to 30 minutes.
I've run tcpdump looking for anything that is re-sending the command - nothing!
 
It's particularly annoying because the "core" routine (the automatic one) is what actually does the logging (it uses webset to push data to a server). When it's in an override mode it doesn't, so I lose my logging when this happens.
 
The "fix" is for me to just send the WC board another "api/setvar.cgi?varid=1&value=0" command to reset VAR1, and away it goes. But frequently it will reset itself to the last non-zero value it had! 
 
Anyone else observed anything even remotely similar? It's happened about 15 times in the last 24 hours, resulting in a lot of missed data (and a solar tracker that hasn't been tracking!)
 
 
 
 
OK, so I've modified the code now to try to nail this down.
Once I detect VAR1 has a non-zero value, I send an email,
then once the specified task has been completed, I send a second email and force VAR1 to 0.
I actually wanted it to keep doing the manual task until told otherwise, but this random restoring of a value is causing operational dramas, so this will have to do for now.
 
So, I sent the command "70002" and shortly afterwards got an email:  (time is UTC)
 
WEBCONTROL     Sent at: 01:49:20 On the 04/15/2013
VARs:
VAR1=70002 VAR2=0 VAR3=0 VAR4=430081 VAR5=255 VAR6=255 VAR7=7 VAR8=540
 
The command "70002" is correct, and the next email confirms the register has been reset.
 
WEBCONTROL     Sent at: 01:49:53 On the 04/15/2013
VARs:
VAR1=0 VAR2=0 VAR3=0 VAR4=430081 VAR5=2550255 VAR6=2550255 VAR7=7 VAR8=540

 
NOTHING else was sent to the webcontrol (I was working on something else, unrelated), yet I got another email!
 
WEBCONTROL     Sent at: 02:16:09 On the 04/15/2013
VARs:
VAR1=70002 VAR2=0 VAR3=0 VAR4=440090 VAR5=255 VAR6=255 VAR7=7 VAR8=540
 
(and the correct "completion" mail)... and then another, and another!!
 
 
WEBCONTROL     Sent at: 02:26:19 On the 04/15/2013
VARs:
VAR1=70002 VAR2=0 VAR3=0 VAR4=440093 VAR5=255 VAR6=255 VAR7=7 VAR8=540

 
WEBCONTROL     Sent at: 03:14:35 On the 04/15/2013
VARs:
VAR1=70002 VAR2=0 VAR3=0 VAR4=420110 VAR5=255 VAR6=255 VAR7=7 VAR8=540
 
The times between are not constant (or even close).
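
For reference, the gaps between those emails can be worked out with a few lines of Python. This is only a small sketch: the timestamps are copied from the emails above (starting from the 01:49:53 "completion" email), and the variable names are just for illustration.

```python
from datetime import datetime

# "Sent at" times copied from the emails above (all UTC, 04/15/2013).
# The first entry is the "completion" email; the rest are the unexpected
# VAR1=70002 emails that followed it.
stamps = ["01:49:53", "02:16:09", "02:26:19", "03:14:35"]

times = [datetime.strptime(s, "%H:%M:%S") for s in stamps]

# Seconds between each email and the one before it.
gaps = [int((b - a).total_seconds()) for a, b in zip(times, times[1:])]

for stamp, gap in zip(stamps[1:], gaps):
    print(f"{stamp}: {gap // 60} min {gap % 60} s after the previous email")
```

The gaps come out at roughly 26, 10 and 48 minutes, so there is no obvious fixed period to them.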
 
I've just refreshed the status screen (the popup window from the PLC PROGRAM page; I have web polling disabled) at 03:40 UTC and am watching to see if it happens again. (10 minutes so far and still nothing!)
 
Unless you have a non-blocking delay set on VAR1, which could restore the previous value, nothing else should know its previous value.
From your description, it sounds like you are doing a WEBSET, getting the server's return value, and then setting that value into VAR1. Have you tried setting the value into another VAR to see whether you get the same problem?
 
 
I am using webset to get data from the server (sun position), and to push data from the webcontrol to the server (tracker positions, actuator current etc), but VAR1 is ONLY used when I want to issue a manual command (eg, to send calibration data, or to query various operational parameters of the downrange equipment). When I want to do one of these "manual commands", I use the (above-mentioned) 192.168.1.15/api/setvar.cgi?varid=1&value=70002 (for example) commands with fetch or wget to set VAR1.
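
As a rough illustration of that request, here is the same URL built in Python instead of with fetch or wget. The host, path and parameter names are the ones quoted above; the `http://` scheme, the lack of any login, and the helper names `setvar_url` and `set_var` are assumptions made for this sketch.

```python
from urllib.parse import urlencode
from urllib.request import urlopen


def setvar_url(host, varid, value):
    """Build the setvar.cgi URL in the form quoted above."""
    query = urlencode({"varid": varid, "value": value})
    # Assumption: plain http with no credentials in the URL.
    return f"http://{host}/api/setvar.cgi?{query}"


def set_var(host, varid, value, timeout=5):
    """Hypothetical helper: send the request to a reachable board."""
    with urlopen(setvar_url(host, varid, value), timeout=timeout) as reply:
        return reply.status


if __name__ == "__main__":
    # Queue manual command 70002, then the "reset" that clears VAR1.
    print(setvar_url("192.168.1.15", 1, 70002))
    print(setvar_url("192.168.1.15", 1, 0))
```

Only `setvar_url` runs here; `set_var` would actually contact the board, so it is left uncalled.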
 
It has done it again 4 more times overnight.
 
I am not using the non-blocking command anywhere in this code, and except for the newly added single location where I clear the flag, there is NOWHERE in the code that actually SETS VAR1 to anything.
 
The tcpdump I have had running has still seen *NO* traffic being sent TO the webcontrol board despite multiple cases of VAR1 being "restored".
 
WebControl has an internal timer that updates all I/O and VARs every 100 ms. If the firmware were setting VAR1 to a value by itself, you would see it happen far more frequently.
Only VAR variables store their last value in an internal register; that is so a non-blocking delay can restore the value when it times out.
One way to isolate the problem is to change your code from VAR1 to VAR8 or some other VAR, to see if the problem persists.
 
We have not heard of anyone else reporting VAR1 restoring its previous value by itself. If that is a problem, we will want to find out how to duplicate it and fix it.
You should probably send some older boards back for a firmware update, so that you can test WEBSET with more boards and increase the chance of finding the problem.
 
OK, does someone else have a V3.02.17a firmware board lying about for a test?
 

START    

rw:
    MOD VAR1 10000 RAM2
    DIV VAR1 10000 VAR7
    ANDB VAR7 15 VAR7
    ROTL VAR7 4 RAM1
    SET VAR2 RAM1  
    ADD RAM1 RAM2 RAM1
    SET VAR8 RAM1  
    delay 500
    goto rw
 
end

 
Don't be concerned that the code seems obscure; it's actually remarkably simple.
Load this code into your WC board.
Watch on a status page the var values.
 
VAR2 and VAR8 are counting up.
HOW CAN THIS BE?
OK, so VAR8 changes because it's set to the value of RAM1.... sure, but RAM1 is calculated from VAR7, which is calculated from VAR1, which isn't changing!
And VAR7 wasn't changing, so it looks like it's the ROTL that's introducing the problem???
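
To spell out why this should not happen, here is a small Python model of one trip round the loop. It is only a sketch under stated assumptions: registers are treated as 32-bit non-negative values, ROTL is taken to be a plain rotate-left, and the names `rotl32` and `one_pass` are made up for the illustration.

```python
MASK32 = 0xFFFFFFFF  # assumption: registers behave as 32-bit values


def rotl32(value, count):
    """Rotate a 32-bit value left by `count` bits."""
    value &= MASK32
    count %= 32
    return ((value << count) | (value >> (32 - count))) & MASK32


def one_pass(var1):
    """One trip round the rw loop; returns (VAR7, VAR2, VAR8)."""
    ram2 = var1 % 10000        # MOD VAR1 10000 RAM2
    var7 = var1 // 10000       # DIV VAR1 10000 VAR7
    var7 &= 15                 # ANDB VAR7 15 VAR7
    ram1 = rotl32(var7, 4)     # ROTL VAR7 4 RAM1
    var2 = ram1                # SET VAR2 RAM1
    ram1 = ram1 + ram2         # ADD RAM1 RAM2 RAM1
    var8 = ram1                # SET VAR8 RAM1
    return var7, var2, var8


if __name__ == "__main__":
    # Nothing carries over between passes, so the same VAR1 always
    # gives the same VAR7, VAR2 and VAR8.
    for _ in range(3):
        print(one_pass(0), one_pass(70002))
```

In this model the outputs depend only on VAR1, so with VAR1 unchanged there is nothing in the arithmetic itself that could make VAR2 or VAR8 count up.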
 
Can anyone else duplicate/confirm this??
 
Rossw,
 
I have a v03.02.17 on my bench. It is not an "a" rev, but I decided to load your program and test it. It has been running for 10 minutes and all VARs are sitting at 0 so far.
I will check to see if I have a .17a in stock; if so, I will hook it up, test, and get back to you.
 
Tim
 
edit:
Well, I thought I had a new spare, but I don't.
It has been about an hour and the VARs are still at 0. Will let it run for the rest of today.
 
Ross,
 
We have a board loaded with 3.02.17a firmware, with your test PLC code pasted in and running. All VARs are still zero after 15 minutes of running.
I wonder if you could put the board on an isolated network (maybe you are already doing that) to see if the problem still shows up.
 
+++++++++++
update: the board has been running 3.02.17a firmware with your PLC code for over 1 hour; the VARs are all still zero, no change.
 
Well, it's been misbehaving all night....
 
 

START    
    SET VAR1 1  
RW:
    TSTEQ VAR1 0  
    GOTO MAIN   
    MOD VAR1 10000 RAM2
    DIV VAR1 10000 VAR7
    ANDB VAR7 15 VAR7
    MUL VAR7 16 RAM1     <--- THIS WORKS PROPERLY
    ROTL VAR7 4 RAM1     <--- THIS DOES NOT
    SET VAR2 RAM1  
    ADD RAM1 RAM2 RAM1
    SET VAR8 RAM1  
    DELAY 500   
    GOTO RW 

 
If I change the ROTL 4 to a MUL 16, the problem of the "self-incrementing" registers completely disappears.
I'm now waiting to see if the "periodic restoration of VAR1" is fixed or not...
 
Ross,
 
Your code is not complete. Where does GOTO MAIN go to? I did not see the label in your PLC code. It is also missing END. You must have cut a portion of your code out here... When I pasted your code into my testing board, it gave errors.
 
Todster, that MOD is for getting the remainder, so with MOD VAR1 10000 RAM2, when VAR1 is 1, RAM2 will hold 1. Are you getting the increment on the first program or on this last program? We tested the first PLC code Ross posted, and it does not increment. This is a different program: its VAR1 is not zero, so it is supposed to increment, I think.
 
Ross,
 
I modified your code to remove GOTO MAIN and added END as the last line. So far VAR1 stays at 1 and no other VAR changes value:
START   
 SET VAR1 1 
RW:
 TSTEQ VAR1 0 
 MOD VAR1 10000 RAM2
 DIV VAR1 10000 VAR7
 ANDB VAR7 15 VAR7
 MUL VAR7 16 RAM1
 ROTL VAR7 4 RAM1
 SET VAR2 RAM1 
 ADD RAM1 RAM2 RAM1
 SET VAR8 RAM1 
 DELAY 500  
 GOTO RW  
 END  
 
todster, could you please copy and paste this program into your board and see if any other value changes? Which firmware version do you run?
 
The code is indeed incomplete - there's a whole bunch of code after it. Because I only have one board, I need a QUICK way to get it back in service doing its job.
 
The test
    TSTEQ VAR1 0  
    GOTO MAIN  
lets me simply write 0 to VAR1 to return the board to active duty. Once it gets out to "main", it never returns.
 
Changing from ROTL to MUL has completely eliminated the second issue I was having.
The intermittent restoration of VAR1 is still happening - although not so much. (Only once in the last 3 hours).
A section of the code running now is:
 


MAIN:
    TSTNE VAR1 0  
    GOTO MANUAL   
    SET WSRPLY 0  
    WEBSET URL1 12  

.....
MANUAL:
    EMAIL EM1   
    MOD VAR1 10000 RAM2
    DIV VAR1 10000 VAR7
    ANDB VAR7 15 VAR7

    MUL VAR7 16 RAM1
    ADD RAM1 RAM2 RAM1
    SET VAR1 RAM1  
 
So when I write (say) "120001" into VAR1, the main loop picks up that it's non-zero and branches to "manual".
In manual, it takes the "command" (12) and the unit address (0001) and separates them out.
The command (12) is written to VAR7; the unit address (1) is in RAM2.
The command is ANDed with 0x0F (since the command word is only 4 bits long).
The command is then shifted left 4 bits (MUL 16) to align it in the most-significant 4 bits of the word.
Then the unit address is added (in the lower 4 bits), making an 8-bit word that I send downrange.
The "recalculated" (i.e. bit-aligned rather than decimal-aligned) word is re-written to VAR1.
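
As a worked example of that arithmetic, here is the same packing written in Python. It is a sketch only: the function name `pack_command` is made up, and it assumes non-negative values and the MOD/DIV/ANDB/MUL/ADD order shown in the MANUAL section above.

```python
def pack_command(var1):
    """Turn a decimal-aligned command (e.g. 120001) into the 8-bit word."""
    unit = var1 % 10000              # MOD VAR1 10000 RAM2 -> unit address
    command = (var1 // 10000) & 15   # DIV, then ANDB 15 -> 4-bit command
    return command * 16 + unit       # MUL 16, then ADD RAM1 RAM2


if __name__ == "__main__":
    for sent in (120001, 70002):
        word = pack_command(sent)
        print(f"{sent} -> {word} (0x{word:02X})")
```

So "120001" becomes 193 (0xC1) and "70002" becomes 114 (0x72). Note that, as in the PLC code, the unit address is added without being masked, so this only yields a clean 8-bit word while the address stays within 0 to 15.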
 
This is the interesting thing: the occasional restored value of VAR1 is this NEW VALUE.
So it's absolutely, definitely, being restored by SOMETHING INSIDE THE WEBCONTROL, and not by a packet being re-sent or reprocessed somehow.
 
Wayne, do you want to chat about this in IRC?
 
CAI_Support said:
Ross,
 
I modified your code to remove GOTO MAIN and added END as the last line. So far VAR1 stays at 1 and no other VAR changes value:
START   
 SET VAR1 1 
RW:
 TSTEQ VAR1 0          <-- you have to remove this one too.
 MOD VAR1 10000 RAM2   <-- or this line will be skipped.
 DIV VAR1 10000 VAR7
 ANDB VAR7 15 VAR7
 MUL VAR7 16 RAM1
 ROTL VAR7 4 RAM1
 SET VAR2 RAM1 
 ADD RAM1 RAM2 RAM1
 SET VAR8 RAM1 
 DELAY 500  
 GOTO RW  
 END  
 
todster, could you please copy and paste this program into your board and see if any other value changes? Which firmware version do you run?
 