[CUWiN-Dev] CUWiN 1.0 wishlist?
Bill Comisky
bcomisky at pobox.com
Mon Jun 27 19:33:26 CDT 2005
On Mon, 27 Jun 2005, David Young wrote:
> On Mon, Jun 27, 2005 at 10:29:29AM -0500, Paul Smith wrote:
>> David Young <dyoung at pobox.com> wrote on 23/Jun/05 at 5:20 PM:
>>
>>> What do we need to add/change in CUWiN before release 1.0?
>>> Discuss. :-) I am especially hoping for input from Bill & Paul at CNT.
>>
[ .. snip .. ]
>> 2. Guidance on how to test hslsd in a controlled way. What is your build/test
>> environment like? Do you have some sort of test setup that you feed packets
>> to hslsd with? How do you know when it's "ready"? We have a test bed in our
>> lab, but it's difficult to control for certain variables; it would be nice if
>> we knew how to run hslsd like a brain in a vat to check for predictable
>> outcomes.
>
> I do not have a controlled test environment. I have a small (6-node)
> testbed inside my office that I am constantly cannibalizing and
> re-building.
sounds familiar :)
> There's been a lot of talk about "brain in a vat" testing. We simply
> do not have the resources to do that.
>
>> In trying to setup a multi-hop test bed we've had hslsd produce some
>> non-intuitive routing tables, had issues with nodes going offline only to
>> find them in the debugger when we check with the serial console (watchdog
>> seems to kick in right after we check serial console), and apparent
>> inconsistencies between the routing table and the routeviz output.
>
> Please file PRs on these issues. I need to know if hslsd is producing
> "non-intuitive" routing tables (what does this mean?). Also, nodes should
> not drop into the debugger. I need for you to send me a stack trace
> (trace/u at the db> prompt) when that happens. The watchdog is no help
> if it does not fire until the serial console is attached; just disable it.
I've disabled the watchdog on our testbed, so we'll see if it happens
again. We'll try to be more systematic now that our testbed is stable
(no more cannibalizing). Here are some symptoms I've seen in the last
few days. Right now everything seems stable: most of these issues went
away after the network was rebooted, and I'm not sure what circumstances
triggered them:
- default route periodically disappears and reappears
- no default route --> hslsd found not running; restarted it manually, OK
- asymmetric routes: gateway A shows a route to node C through node B, but
node C has a direct route to node A (this might be fine, but I wanted to check)
- routeviz caching issues: these were quite misleading while
troubleshooting. First, the browser was caching some of the images; a
no-cache <meta> tag should probably be added to the HTML headers. Then
I found that the /var/db/routeviz_layout.cache file wasn't being updated
(its timestamp was new, but the data was old). Manually deleting it
gave me routeviz output that matched the contents of
/var/db/linkstates and /var/db/vizlinks.
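To check the asymmetric-route question more systematically, one could collect each node's next hops by hand (for example from `netstat -rn` on every node) and compare forward and reverse paths. This is a hypothetical sketch: it assumes the tables have already been transcribed into a dict mapping node -> {destination: next hop}, and the node names are placeholders. It only flags pairs worth a closer look; an asymmetric pair is not necessarily a fault.

```python
from typing import Dict, List, Optional

# Hypothetical input format: node -> {destination: next hop}.
NextHops = Dict[str, Dict[str, str]]


def trace_path(tables: NextHops, src: str, dst: str,
               max_hops: int = 16) -> Optional[List[str]]:
    """Follow next hops from src toward dst.

    Returns the list of nodes visited, or None if a hop is missing,
    a node repeats (routing loop), or the path exceeds max_hops.
    """
    path = [src]
    node = src
    while node != dst:
        nxt = tables.get(node, {}).get(dst)
        if nxt is None or nxt in path or len(path) > max_hops:
            return None
        path.append(nxt)
        node = nxt
    return path


def is_symmetric(tables: NextHops, a: str, c: str) -> bool:
    """True if the path a->c is exactly the reverse of the path c->a."""
    fwd = trace_path(tables, a, c)
    rev = trace_path(tables, c, a)
    return fwd is not None and rev is not None and fwd == list(reversed(rev))


# Placeholder tables mirroring the case above: A reaches C via B,
# while C reaches A directly.
tables = {
    "A": {"C": "B", "B": "B"},
    "B": {"A": "A", "C": "C"},
    "C": {"A": "A", "B": "B"},
}
print(trace_path(tables, "A", "C"))    # ['A', 'B', 'C']
print(trace_path(tables, "C", "A"))    # ['C', 'A']
print(is_symmetric(tables, "A", "C"))  # False
```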
Paul already mentioned our attempts to attenuate links in our testbed to
encourage multiple hops. Beacons captured with 'tcpdump -ne -y
ieee802_11_radio' showed 60-70 dB on our "good" links, compared with just
a few dB on our "bad" links, yet the default route would flip back and
forth between nodes with good and bad links. I didn't check the routes in
both directions, so perhaps there was some asymmetry. I'm going to look
at this again now that I'm reading the ETX metric directly from the hslsd
dumps.
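For reference, ETX as originally defined by De Couto et al. estimates the expected number of transmissions on a link as 1 / (df * dr), where df and dr are the forward and reverse probe delivery ratios; a path's ETX is the sum over its links. Because it is driven by probe loss rather than signal strength, the dB figures only matter insofar as they change delivery ratios. The sketch below assumes hslsd follows that standard definition (it does not model whatever integer scaling hslsd applies in its dumps), and the delivery ratios are made-up numbers. They show how a modest change in loss on a direct link can flip which route wins, which is one possible explanation for a flapping default route.

```python
from typing import Iterable, Tuple


def link_etx(df: float, dr: float) -> float:
    """Standard ETX of one link: expected transmissions = 1 / (df * dr)."""
    if not (0.0 < df <= 1.0 and 0.0 < dr <= 1.0):
        return float("inf")  # no probes delivered in some direction: unusable
    return 1.0 / (df * dr)


def path_etx(links: Iterable[Tuple[float, float]]) -> float:
    """Path metric: sum of per-link ETX values."""
    return sum(link_etx(df, dr) for df, dr in links)


# Illustrative (made-up) delivery ratios, not measurements.
good = (0.95, 0.95)
two_hop = [good, good]          # gateway reached via a relay
direct_lossy = [(0.60, 0.60)]   # direct link on a bad interval
direct_better = [(0.70, 0.70)]  # same direct link, slightly less loss

print(round(path_etx(two_hop), 3))        # 2.216
print(round(path_etx(direct_lossy), 3))   # 2.778 -> two-hop path preferred
print(round(path_etx(direct_better), 3))  # 2.041 -> direct link preferred
```

Since the two routes sit within a fraction of a transmission of each other, ordinary variation in probe loss could be enough to move the default route back and forth.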
A few questions:
- hslsd dumps the linkstates periodically and when sent SIGINFO. Is the
period fixed (and if so, what is it)?
- If a node goes down, how long before its route is removed from the
routing table? How long before it disappears from the linkstate dump?
- Is there a way to compile the system to bypass the debugger db> prompt
and let the watchdog kick in? That is, for nodes we field, not our
testbed nodes. My assumption that the debugger is what keeps the watchdog
from rebooting the system may be wrong, though; maybe it's related to the
serial console or something else.
- I saw messages like these in /var/log/daemon earlier, but not now. Are
they a problem?
May 12 07:58:36 cuw hslsd: hsls_af_spf_due: due
May 12 07:58:36 cuw hslsd: hsls_af_spf_due: due
May 12 07:58:36 cuw hslsd: SPF net radius 3 -> 3
May 12 07:58:36 cuw hslsd: LSU mode HSLS_M_UNDEC -> HSLS_M_SLS
May 12 07:58:36 cuw hslsd: cannot set metric on foreign lsu
May 12 07:58:36 cuw hslsd: etx_metric_set: metric set failed: Resource temporarily unavailable
May 12 07:58:36 cuw hslsd: injected metric 320 for fec0::202:6fff:fe21:e938
May 12 07:58:37 cuw hslsd: cannot set metric on foreign lsu
May 12 07:58:37 cuw hslsd: etx_metric_set: metric set failed: Resource temporarily unavailable
May 12 07:58:37 cuw hslsd: injected metric 256 for fec0::202:6fff:fe21:e938
May 12 07:58:38 cuw hslsd: cannot set metric on foreign lsu
May 12 07:58:38 cuw hslsd: etx_metric_set: metric set failed: Resource temporarily unavailable
May 12 07:58:39 cuw hslsd: injected metric 256 for 10.0.146.77
May 12 07:58:39 cuw hslsd: cannot set metric on foreign lsu
May 12 07:58:39 cuw hslsd: etx_metric_set: metric set failed: Resource temporarily unavailable
May 12 07:58:39 cuw hslsd: injected metric 256 for fec0::202:6fff:fe21:f84c
May 12 07:58:40 cuw hslsd: hsls_af_spf_due: due
May 12 07:58:40 cuw hslsd: hsls_af_spf_due: due
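To see how often these messages recur, and whether they line up with the default-route flapping, a small script could tally them. This is a minimal sketch assuming the syslog line layout shown above ("May 12 07:58:36 cuw hslsd: ..."); it counts each distinct hslsd message and collects the injected metrics per destination. It says nothing about whether the messages indicate a bug.

```python
import re
from collections import Counter
from typing import Dict, Iterable, List, Tuple

# Matches lines like "May 12 07:58:36 cuw hslsd: <message>".
# \s+ tolerates syslog's double space before single-digit days.
LINE_RE = re.compile(r"^(\w{3}\s+\d+ \d\d:\d\d:\d\d) (\S+) hslsd: (.*)$")
INJECT_RE = re.compile(r"injected metric (\d+) for (\S+)$")


def summarize(lines: Iterable[str]) -> Tuple[Counter, Dict[str, List[int]]]:
    """Count hslsd messages and collect injected metrics per destination."""
    kinds: Counter = Counter()
    injected: Dict[str, List[int]] = {}
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # not an hslsd line
        msg = m.group(3)
        inj = INJECT_RE.match(msg)
        if inj:
            kinds["injected metric"] += 1
            injected.setdefault(inj.group(2), []).append(int(inj.group(1)))
        else:
            kinds[msg] += 1
    return kinds, injected


sample = """\
May 12 07:58:36 cuw hslsd: cannot set metric on foreign lsu
May 12 07:58:36 cuw hslsd: etx_metric_set: metric set failed: Resource temporarily unavailable
May 12 07:58:36 cuw hslsd: injected metric 320 for fec0::202:6fff:fe21:e938
May 12 07:58:37 cuw hslsd: cannot set metric on foreign lsu
May 12 07:58:37 cuw hslsd: etx_metric_set: metric set failed: Resource temporarily unavailable
May 12 07:58:37 cuw hslsd: injected metric 256 for fec0::202:6fff:fe21:e938
""".splitlines()

kinds, injected = summarize(sample)
for msg, n in kinds.most_common():
    print(f"{n:4d}  {msg}")
print(injected)
```

On a node, passing an open handle on /var/log/daemon to `summarize` would run it over the live log.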
>> We'd like to verify that hslsd is functioning properly and learn how to
>> manually check its routing decisions from log or other output.
>
> Log output: activate logging for shortest path first (SPF) using
> one of -l spf_{any,quiet,loud}. Activate logging of expired-LSA
> purging with -l purge_{any,quiet}. Look at the RIB updates using -l
> rib_{any,bufev,quiet}. Information provided by -l peer_{quiet,any}
> will be useful. You can find more of the -l options by grepping the .c
> files in hsls/, etx/, rib/, etc., for 'LOGLIB_.*SINK' .
>
> hslsd regularly dumps two databases to /var/db/. In /var/db/linkstates
> are all of the linkstates. In /var/db/vizlinks, the linkstates have
> been distilled into information for the visualizer.
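The grep suggestion above can also be scripted so that it walks subdirectories. This is roughly equivalent to running `grep -n 'LOGLIB_.*SINK'` over the .c files; the source-tree root and the default directory list (hsls/, etx/, rib/, taken from the reply, which also says "etc.") are assumptions to adjust for the actual checkout.

```python
import re
import sys
from pathlib import Path
from typing import List, Sequence, Tuple

SINK_RE = re.compile(r"LOGLIB_.*SINK")


def find_sinks(root: Path,
               subdirs: Sequence[str] = ("hsls", "etx", "rib")
               ) -> List[Tuple[Path, int, str]]:
    """Return (file, line number, line) for every .c line matching SINK_RE."""
    hits = []
    for sub in subdirs:
        base = root / sub
        if not base.is_dir():
            continue  # directory absent in this checkout
        for path in sorted(base.rglob("*.c")):
            with path.open(errors="replace") as f:
                for lineno, line in enumerate(f, 1):
                    if SINK_RE.search(line):
                        hits.append((path, lineno, line.rstrip()))
    return hits


if __name__ == "__main__":
    # Source-tree root from the command line, defaulting to the cwd.
    root = Path(sys.argv[1]) if len(sys.argv) > 1 else Path(".")
    for path, lineno, text in find_sinks(root):
        print(f"{path}:{lineno}: {text}")
```

Run it with the source-tree root as its only argument; extra directories can be passed through `subdirs`.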
The dump files have been helpful already. I've looked through the hsls
documentation before, but I'm afraid I'll have to study a bit to make
heads or tails of the other log output you mentioned. I feel like I need a
big flowchart/diagram of the data flow and decision tree labeled with all
the packets exchanged. If such a thing exists, send me a pointer.
routing 101 :)
thanks,
bill
--
Bill Comisky
bcomisky at pobox.com