Re: [MT-AU Public] Problems with multiple RBMetal2SHPn devices failing at one site
On Tue, 2014-04-29 at 23:37 +1000, Steve at Digitronics wrote:
Typically, a device works OK for a while (up to a month) but then starts logging kernel faults and exhibiting other weird symptoms such as script failures and vanishing scripts. Sometimes only a reboot or a power cycle will get a failed unit going again. The chances of there being actual faulty devices are now so vanishingly small as to be discounted, so we are trying to figure out what it is at this site that could be causing the same persistent failures on the series of devices.
Radiation: Is the unit in the path of somebody else's microwave or similar?
Heat: Could the unit be above or near an intermittent heat source? Or an "accidental lens" focusing heat on the unit, such as a concave metal or glass panel?
Vibration: Is the unit mounted on something that could be subject to intermittent vibration, such as a poorly mounted aircon unit?
Vandalism: Is the unit anywhere where an aggrieved person might be playing silly buggers with it? Some people are really touchy about wireless devices nearby.
Non-human pest activity: Rats, mice, insects etc. You'd expect visible damage if they were attacking the unit directly, but perhaps they are somehow causing vibration or overheating. If not to the unit, then possibly to the POE injector, the PSU or some other component.
Swap the PSU on the suspect end with the PSU on the other end. Does the problem follow the PSU?
Swap the units at the two ends. Does the problem follow the unit, or does the previously good unit that has "never had a failure of any kind" start exhibiting the same symptoms?
Is the POE source rated to provide enough power for the aggregate load of this device and any others it is supplying?
Long shot, but is there any chance that the method of attaching the unit is penetrating or deforming the unit in some way? Try mounting it differently just for a week or two to see if it makes a difference.
Has the CAT5 cable been checked *for POE use*? Some equipment checks only that the cable is good for data...
In situations like this, I also recommend my two debugging mantras:
Ain't no such thing as magic.
When you have eliminated the impossible, whatever remains, *however improbable*, must be the truth.
Regards, K.
-- ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Karl Auer (kauer@nullarbor.com.au) work +61 2 64957435 http://www.nullarbor.com.au mobile +61 428 957160 GPG fingerprint: 231A B066 CF91 1216 4F0F F2AC CE25 B8AA 46DC CC4F Old fingerprint: 1DB8 0599 13F0 E774 3811 6CA6 D6D0 AFA9 D91A 004C
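The swap tests Karl suggests (PSU, unit, cable) all follow the same logic, which can be sketched in a few lines of Python. This is only an illustration: the `SwapResult` type, its field names and the wording of the verdicts are assumptions made here, not anything posted in the thread.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SwapResult:
    """Outcome of exchanging one component between the bad site and a good one.

    All names here are hypothetical; they only illustrate the reasoning.
    """
    component: str      # what was exchanged, e.g. "PSU" or "CAT5"
    fault_moved: bool   # fault appeared wherever the swapped component went
    fault_stayed: bool  # fault persisted at the original (bad) site


def interpret(result: SwapResult) -> str:
    """Turn a swap observation into a verdict about the component."""
    if result.fault_moved and not result.fault_stayed:
        return f"{result.component}: suspect (fault followed it)"
    if result.fault_stayed and not result.fault_moved:
        return f"{result.component}: cleared (fault stayed with the site)"
    if result.fault_moved and result.fault_stayed:
        return f"{result.component}: inconclusive (fault at both ends)"
    return f"{result.component}: inconclusive (no fault seen; observe longer)"


# Example observation: the PSU was swapped and the bad site kept failing.
print(interpret(SwapResult("PSU", fault_moved=False, fault_stayed=True)))
```

Because the failures described can take up to a month to appear, the last case matters: a swap that shows no fault yet has not cleared anything.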
Hello Karl,
A good list.
Radiation is still something that is theoretically almost possible, but it would need to be transient, intentional and seriously powerful to cause hardware-like problems. A detailed inspection of the 3dB points zone from the antenna shows zero fixed sources of possible interference. There is one possible fixed source at about -15dB to the antenna beam path, but it is not pointing at our device, and appears to be operating at 5GHz or thereabouts. There are no fixed industrial installations within cooee of the site or the path.
Heat was an early contender, and we monitored the device operating temperature carefully against ambient. Although it did on occasion get a little (<10deg) above what we would have theoretically expected, it never reached a point where it would cause the hardware any concern. Also, the problems seem to bear no relation to time of day or ambient temperature. It is as likely to play up in the middle of a cold night as it is during a stinking hot day.
The unit is mounted quite firmly on a shortish pole (<2m) near the peak of the tile roof of a 2 storey domestic dwelling. The RB is attached to the pole by a pair of stainless hose clamps that are not over-tightened. We use this same method everywhere else and have had no dramas anywhere else. There is no adjacent aircon, and if I were a vandal I would have no idea how to get to it :-)
Being on a roof, it is possible that possums may get to it, but they would probably have difficulty negotiating the pole to get to the unit or the antenna. Birds would be able to access the antenna easily enough, and they have been seen perching on the antenna, but there is no evidence of damage to the high spec UG CAT5 leading to the unit or the RG513 from the unit to the antenna. It is also unlikely that birds would be near the gear in the middle of the night (no owls or bats seen).
The unit is powered by a single plugpack PSU through a single POE injector (supplied by Mikrotik). Both the injector and PSU have been changed more than once, and the problem remained and did not follow the PSU/injector. The device-reported voltage remains well within operating tolerances and the plugpack is overrated for the single unit it is supplying. We have tried different voltage plugpacks and have settled on the 24V unit currently attached.
We have not moved any of the troubled units to the other end. The other end is "difficult" to change. One of the troubled units worked fine as a WAP in another environment prior to installation at the problem site.
I am with you and Sherlock Holmes on this, but it is really starting to stretch it. It is now second on my lifelong list of improbable problems, right below the 1990s instance of a CPU that would not work unless illuminated.
Steve.
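Steve's observation that the failures are as likely on a cold night as on a hot day can be checked against the device logs by counting failures per hour of day. The following Python sketch assumes the failure times have already been extracted as ISO 8601 strings; the sample timestamps and the 06:00 to 18:00 definition of "day" are invented for illustration and are not data from the site.

```python
from collections import Counter
from datetime import datetime


def failures_by_hour(timestamps):
    """Return a 24-element list: number of failures in each hour of day."""
    counts = Counter(datetime.fromisoformat(ts).hour for ts in timestamps)
    return [counts.get(hour, 0) for hour in range(24)]


def day_night_split(hourly, day_start=6, day_end=18):
    """Split hourly counts into (day, night) totals; the boundaries are assumptions."""
    day = sum(hourly[day_start:day_end])
    night = sum(hourly) - day
    return day, night


# Hypothetical failure times, NOT taken from the thread.
sample = ["2014-04-02T03:15:00", "2014-04-09T14:40:00", "2014-04-21T23:05:00"]
print(day_night_split(failures_by_hour(sample)))
```

With only a handful of failures a month the counts will be too small for any statistical conclusion, but a strong clustering in a few hours would be worth a second look at heat or a time-dependent interference source.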
Hi Steve,
Do you have the opportunity to swap the CAT5 with another run at the same location, to see if the problem follows the cable?
I remember a long time ago, in my CB radio days, it was common for not-so-nice people to push pins through your coax, which made for a painful problem to find: everything would still work, but the SWR on the antenna would be woeful. I'm not saying that you have pins in the cable :-) but a fault in the cable might cause some strange issues while still happily supplying power.
Regards
Paul
Hello Paul,
Nice catch. The CAT5 is the last item on our possible list. We have it scheduled for replacement this coming weekend.
The current run is high quality underground rated CAT5. It is only around 15-20m in a domestic environment with no identified likely sources of large electrical transients. However, as pretty much a guess of last resort, we intend to replace it with top spec shielded CAT5 properly grounded to eliminate any possibility of induced transients from the environment. Fingers are utterly crossed.
Steve.
-----Original Message-----
From: Public [mailto:public-bounces@talk.mikrotik.com.au] On Behalf Of Paul Julian
Sent: Wednesday, 30 April 2014 13:01
To: MikroTik Australia Public List
Subject: Re: [MT-AU Public] Problems with multiple RBMetal2SHPn devices failing at one site
Hi Steve, do you have the opportunity to swap the cat5 with another which is at the same location to see if the problem follows the cable ?
I remember a long time ago in my cb radio days it was common for not so nice people to push pins through your coax which would provide a painful problem to find, everything would still work but your swr on the antenna would be woeful, I'm not saying that you have pins in the cable :-) but a fault in the cable might display some strange issues and probably still happily supply power successfully
Regards Paul
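For a sense of scale on the 15-20m run, here is a rough first-order estimate of the DC voltage drop on a healthy cable, written as a Python sketch. Every figure in it is an assumption for illustration: roughly 0.0842 ohm per metre for a 24 AWG copper conductor, two conductors in parallel per leg (as is usual for passive POE on a 10/100 link), and a hypothetical 12 W load, which is not a measured figure for this device.

```python
# All constants are illustrative assumptions, not measurements from the site.
OHMS_PER_METRE_24AWG = 0.0842  # typical solid copper 24 AWG conductor


def poe_voltage_at_device(supply_v, load_w, length_m,
                          conductors_per_leg=2,
                          ohms_per_m=OHMS_PER_METRE_24AWG):
    """Estimate the voltage arriving at the device over a passive POE run.

    First-order only: the current is taken as load_w / supply_v, which
    ignores the small extra current a constant-power load draws as the
    voltage sags.
    """
    leg_resistance = ohms_per_m * length_m / conductors_per_leg
    loop_resistance = 2 * leg_resistance  # out on one leg, back on the other
    current = load_w / supply_v
    return supply_v - current * loop_resistance


# 24 V plugpack, assumed 12 W load, 20 m of cable.
print(round(poe_voltage_at_device(24.0, 12.0, 20.0), 3))
```

Under these assumptions the drop is under a volt, consistent with Steve's report that the device-reported voltage stays within tolerance. A damaged or wet conductor would raise the resistance intermittently, which is the sort of fault a steady voltage reading can miss.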
I sincerely hope that resolves it. Water in a cable, or anything like that, can cause so many strange issues. I'll keep thinking anyway :-)
Regards
Paul
Hi Steve,
Totally outside the square here, but here goes...
You mentioned a UPS inline to the POE injector. Given your mention of a domestic setup, let's assume a cheap line-interactive unit (if that matters). If the UPS detected mains issues (brown-out, drop etc.), could the switch-over from mains to battery/inverter (50ms?) cause issues downstream to the tik?
I am sure you guys are better versed in grounding etc. than me, but I have seen electronics do some weird and wonderful stuff because of grounding issues.
cheers
Paul
Hello Paul,
The location is domestic, but the equipment isn't. The UPS is supplying a few items as well as the POE plugpack, and it is of the sine wave no-break variety, which does not have any gap in supply when becoming active.
Grounding can be quite problematic, but hopefully we can get it right. The trick is to avoid circulating currents in the shield. It's no good when your Faraday cage radiates its own energy. Given the mast and the PSU don't share any meaningful sort of ground reference, it should not be too difficult.
Steve.
-----Original Message-----
From: Public [mailto:public-bounces@talk.mikrotik.com.au] On Behalf Of Paul Arch
Sent: Wednesday, 30 April 2014 18:02
To: 'MikroTik Australia Public List'
Subject: Re: [MT-AU Public] Problems with multiple RBMetal2SHPn devices failing at one site
Hi Steve, Totally outside the square here.. but here goes...
You mentioned an UPS inline to the POE injector. Given your mention of a domestic setup let's assume cheap line interactive ( if that matters ) If the UPS detected mains issues ( brown-out, drop etc ) could the switch over from mains to battery/inverter ( 50ms? ) cause issues downstream to the tik ?
I am sure you guys are more well-versed in grounding etc than me, but I have seen electronics do some weird and wonderful stuff because of grounding issues.
cheers Paul
Letting anyone interested know that we changed and earthed the CAT5 last weekend, and are awaiting any further failures ...
participants (4)
- Karl Auer
- Paul Arch
- Paul Julian
- Steve at Digitronics