On Tue, 2014-04-29 at 23:37 +1000, Steve at Digitronics wrote:
> Typically, a device works OK for a while (up to a month) but then starts logging kernel faults and exhibiting other weird symptoms such as script failures and vanishing scripts. Sometimes only a reboot or a power cycle will get a failed unit going again. The chance of there being actual faulty devices is now so vanishingly small as to be discounted, so we are trying to figure out what it is at this site that could be causing the same persistent failures on the series of devices.
Radiation: Is the unit in the path of somebody else's microwave or similar?

Heat: Could the unit be above or near an intermittent heat source? Or an "accidental lens" focusing heat on the unit, such as a concave metal or glass panel?

Vibration: Is the unit mounted on something that could be subject to intermittent vibration, such as a poorly mounted aircon unit?

Vandalism: Is the unit anywhere where an aggrieved person might be playing silly buggers with it? Some people are really touchy about wireless devices nearby.

Non-human pest activity: Rats, mice, insects etc. You'd expect visible damage if they were attacking the unit directly, but perhaps they are somehow causing vibration or overheating. If not to the unit, then possibly to the POE injector, the PSU or some other component.

Swap the PSU on the suspect end with the PSU on the other end. Does the problem follow the PSU?

Swap the units at the two ends. Does the problem follow the unit, or does the previously good unit that has "never had a failure of any kind" start exhibiting the same symptoms?

Is the POE source rated to provide enough power for the aggregate load of this device and any others it is supplying?

Long shot, but is there any chance that the method of attaching the unit is penetrating or deforming the unit in some way? Try mounting it differently just for a week or two to see if it makes a difference.

Has the CAT5 cable been checked *for POE use*? Some equipment checks only that the cable is good for data...

In situations like this, I also recommend my two debugging mantras:

   Ain't no such thing as magic.

   When you have eliminated the impossible, whatever remains, *however improbable*, must be the truth.

Regards, K.
-- 
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Karl Auer (kauer@nullarbor.com.au)                   work +61 2 64957435
http://www.nullarbor.com.au                        mobile +61 428 957160

GPG fingerprint: 231A B066 CF91 1216 4F0F F2AC CE25 B8AA 46DC CC4F
Old fingerprint: 1DB8 0599 13F0 E774 3811 6CA6 D6D0 AFA9 D91A 004C