[CUWiN-Dev] Re: ath0: hardware error

Bill Comisky bcomisky at pobox.com
Mon Aug 29 18:27:12 CDT 2005


On Mon, 29 Aug 2005, Bill Comisky wrote:

> On Thu, 25 Aug 2005, David Young wrote:
>
>>  On Thu, Aug 25, 2005 at 03:29:16PM -0500, Bill Comisky wrote:
>> >  On Sun, 21 Aug 2005, David Young wrote:
>> > 
>> > >  I added some debug messages as I tried to track down the source of
>> > >  "ath0: hardware error; resetting".  The debug messages have not shown
>> > >  me any obvious problem, but they do seriously slow down some nodes,
>> > >  so I am taking them out.
>> > > 
>> > >  Apply these patches to your NetBSD sources.  They take out the noisy
>> > >  debug messages:
>> > > 
>> > >  Apply ath-undo-1 in sys/dev/ic/, ath-undo-2 in sys/dev/:
>> > > 
>> > > %  cd your-netbsd-sources/src/sys/dev/ic
>> > > %  patch < ath-undo-1
>> > > %  cd -
>> > > %  cd your-netbsd-sources/src/sys/dev
>> > > %  patch < ath-undo-2
>> > > 
>> > >  Dave
>> > 
>> >  Should I be seeing any "ath0: hardware error..." messages after these
>> >  patches have been reverse applied?  I'm still seeing them, sometimes a
>> >  lot of them.  I can see the CPU "% interrupt" in top shoot up to about
>> >  60% when they're spewing (with hw.ath0.debug=0x80000000 commented out
>> >  in sysctl.conf).  I feel like my build broke somehow, though I deleted
>> >  the build directory and unpacked the source fresh before "patch -R".
>>
>>  You may still see "ath0: hardware error...", but they should not come
>>  with the Rn and Tn lines that you used to see.
>>
>>  I don't know what causes ath0: hardware error.  The author of the driver
>>  tells me that I may need a PCI bus analyzer to figure it out. :-(
>>
>>  I am a PCI novice, but I strongly suspect that the error has something
>>  to do with PCI bus contention:
>>
>>  1) I get a lot more "ath0: hardware error" indications on my Soekris
>>  net4521 when it carries one or two Cardbus WiFi cards in addition to
>>  the MiniPCI Atheros card.  I scarcely get any such indications when the
>>  MiniPCI cards are not in there.
>>
>>  2) The madwifi driver (Linux version of ath, also by
>>  Sam Leffler) sets an unusually large PCI Latency Timer,
>>  <http://cvs.sourceforge.net/viewcvs.py/madwifi/madwifi/ath/if_ath_pci.c?only_with_tag=HEAD&view=markup>.
>>
>>  Madwifi may also set some other parameters differently from NetBSD,
>>  which accepts defaults---I will check a little later.
>>
>>  3) A discussant at
>>  <http://www.broadbandreports.com/forum/remark,9086546~mode=flat> mentions
>>  that changing his PCI Latency Timer from 32 to 64 helped prevent some
>>  system lockups when his Atheros card was activated.
>>
>>  4) The Soekris BIOS does not set PCI minimum grant, maximum latency,
>>  or latency timer to values that make any sense according to the
>>  explanation given at <http://www.reric.net/linux/pci_latency.html>.
>>
>>  Bill, will you do me a favor, and send me both dmesg(8) and pcictl(8)
>>  output for your Atheros card?  Here is how I got the pcictl info I wanted:
>
> Dave,
>
> After many upgrade/reboot cycles, I've found out a few things about the ath0 
> errors I'm seeing.  I found that only our local builds with our local 
> modifications were going into the endless "ath0: hardware error" loop. After 
> some more experimenting, I found that changing the SSID can trigger these 
> errors.  We had changed our local cuw_config_ssid setting to 
> 'cntwireless.net', and those builds were getting ath0 errors as soon as the 
> devices were brought up.  I found I could set off the errors and stop them by 
> calling 'ifconfig ath0 ssid' with a new SSID from the command line. 
> 'cuwireless.net' or anything of the same length always seemed ok; anything 
> longer triggered the errors.  Sometimes shorter SSIDs would trigger the error 
> as well, and sometimes the same SSID would cause the error and sometimes not; 
> in both cases it seemed to depend on what you were changing from.  Perhaps 
> some important chunk of memory is being 
> written over somewhere?  The repeatability of errors at boot time wasn't 
> 100%; occasionally (maybe only on 1st reboot after an upgrade?) I would get 
> results that confounded my expectation (errors when not expecting them, or no 
> errors when expecting them).
>
> My first set of experiments I did on source with the ath_undo patches reverse 
> applied.  In this case, I either got no console messages when switching SSIDs 
> (to cuwireless.net for example), or I got repeated groups of the following 
> when changing to a "bad" ssid, such as cntwireless.org:
>
> ath0: hardware error; resetting
> ath_stoprecv: rx queue 0x10f14fc, link 0xc5dff4d0
> [followed by a bunch of R0 lines]
>
> To make sure it wasn't something with our build process or source, I 
> downloaded and installed the CUWiN 0.5.8 release, which I don't think has the 
> ath_undo patches.  In this case, I got ath0: console messages (with more 
> stuff than above, see attachments) whenever I set the SSID, but the "bad" 
> cases set off an endless loop of these messages.  When I repeated the 
> experiments on a net4526 (had been working on a net4511), I couldn't 
> reproduce the endless loop.  I've attached files with console dumps of these 
> last experiments showing dmesg, pcictl dumps, and the output from "ifconfig 
> ath0 ssid".  I noticed from the dmesg output that the two radios I used had 
> different firmware versions (3.6 vs. 4.6), so I repeated the net4526 
> experiment after switching the radio.  I still didn't get the endless 
> stream of ath0 errors. For all of these tests, there were no other nodes up.
>
> Can you reproduce the repeated ath0: errors on a net4511 with an "ifconfig 
> ath0 ssid somebigssidhere"?  Does this give you any clues as to where the 
> problem may be?

An addendum:  With my local build (with ath_undo patches and some 
customizations) I do see the same stream of ath0: hardware errors on the 
net4526.  I can stop them by setting the SSID to cuwireless.net and start 
them by setting it to something longer.
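
For anyone who wants to try reproducing this, something along the lines of 
the following might help.  It is only a rough sketch: it assumes the 
interface is ath0 and the "good" baseline SSID is cuwireless.net, and the 
function names (ssid_class, hw_errors, try_ssid) and the 5 second settle 
time are made up for illustration, not anything from the CUWiN tree.  It 
sets an SSID with "ifconfig ath0 ssid" and reports how many new "hardware 
error" lines showed up in dmesg afterwards.

```shell
#!/bin/sh
# Sketch only: IFACE, BASELINE and SETTLE are assumed defaults, not
# values taken from the CUWiN build.
IFACE=${IFACE:-ath0}
BASELINE=${BASELINE:-cuwireless.net}

# Print whether an SSID is longer than, shorter than, or the same
# length as the baseline SSID.
ssid_class() {
    if [ "${#1}" -gt "${#BASELINE}" ]; then
        echo longer
    elif [ "${#1}" -lt "${#BASELINE}" ]; then
        echo shorter
    else
        echo same
    fi
}

# Count the "hardware error" lines currently in the kernel message buffer.
# The buffer is a ring, so on a very noisy node old lines can scroll out
# and the count can undershoot.
hw_errors() {
    dmesg | grep -c "$IFACE: hardware error"
}

# Set the SSID, wait a little, and report how many new errors appeared.
try_ssid() {
    before=$(hw_errors)
    ifconfig "$IFACE" ssid "$1"
    sleep "${SETTLE:-5}"
    after=$(hw_errors)
    echo "$1 ($(ssid_class "$1"), ${#1} chars): $((after - before)) new errors"
}
```

After sourcing the file as root, "try_ssid cntwireless.net" followed by 
"try_ssid cuwireless.net" should show whether the longer SSID starts the 
errors and the shorter one stops them.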

bill

--
Bill Comisky
bcomisky at pobox.com

