[CUWiN-Dev] CUWiN 1.0 wishlist?

Mon Jun 27 19:33:26 CDT 2005

On Mon, 27 Jun 2005, David Young wrote:

> On Mon, Jun 27, 2005 at 10:29:29AM -0500, Paul Smith wrote:
>> David Young <dyoung at pobox.com> wrote on 23/Jun/05 at  5:20 PM:
>>
>>> What do we need to add/change in CUWiN before release 1.0?
>>> Discuss. :-) I am especially hoping for input from Bill & Paul at CNT.
>>

    [ .. snip .. ]

>> 2. Guidance on how to test hslsd in a controlled way. What is your build/test
>> environment like? Do you have some sort of test setup that you feed packets
>> to hslsd with? How do you know when it's "ready"? We have a test bed in our
>> lab but it's difficult to control for certain variables, it would be nice if
>> we knew how to run hslsd like a brain in a vat to check for predictable
>> outcomes.
>
> I do not have a controlled test environment.  I have a small (6-node)
> testbed inside my office that I am constantly cannibalizing and
> re-building.

sounds familiar :)

> There's been a lot of talk about "brain in a vat" testing.  We simply
> do not have the resources to do that.
>
>> In trying to set up a multi-hop test bed we've had hslsd produce some
>> non-intuitive routing tables, had issues with nodes going offline only to
>> find them in the debugger when we check with the serial console (watchdog
>> seems to kick in right after we check serial console), and apparent
>> inconsistencies between the routing table and the routeviz output.
>
> Please file PRs on these issues.  I need to know if hslsd is producing
> "non-intuitive" routing tables (what does this mean?).  Also, nodes should
> not drop into the debugger.  I need for you to send me a stack trace
> (trace/u at the db> prompt) when that happens.  The watchdog is no help
> if it does not fire until the serial console is attached; just disable it.

I've disabled the watchdog on our testbed, so we'll see if it happens
again.  We'll try to get more systematic now that our testbed is stable
(no more cannibalizing).  I can give you some symptoms I've seen in the
last few days.  Right now everything seems stable; most of these issues
appeared and then went away when we rebooted the network, and I'm not
sure what circumstances triggered them:

- default route periodically disappears and reappears

- no default route --> hslsd found not to be running; restarting it
   manually worked

- asymmetric routes: gateway A shows a route to node C through node B,
   while node C has a direct route to node A (this might be fine, but I
   wanted to check)

- routeviz caching issues:  This was pretty misleading in trying to
   troubleshoot.  First the browser was caching some of the images; a
   no-cache <meta> tag should probably be added to the html headers.  Then
   I found that the /var/db/routeviz_layout.cache file wasn't being updated
   (actually the timestamp was new, but the data was old).  Manually
   deleting it gave me routeviz output that matched the contents of
   /var/db/linkstates & /var/db/vizlinks.

Paul already mentioned our attempts to attenuate links in our testbed to 
encourage multiple hops.  The beacons from 'tcpdump -ne -y 
ieee802_11_radio' on our "good" links were 60-70dB, compared with just a 
few dB for our "bad" links, yet the default route would flip back and 
forth between nodes with good & bad links.  I didn't check the routes in
both directions; perhaps there was some asymmetry.  I'm going to be
looking at this again now that I'm reading the ETX metric from the hslsd 
dumps directly.
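
To show why checking both directions matters for an ETX-style metric, here
is a minimal sketch using the textbook ETX definition, 1 / (df * dr), where
df and dr are the forward and reverse probe delivery ratios.  This assumes
the textbook formula only; how hslsd actually measures, scales, or smooths
its metric is not stated in this thread, and the function names and the
delivery ratios below are made up for illustration.

```python
def etx(forward_ratio: float, reverse_ratio: float) -> float:
    """Textbook ETX for one link: expected transmissions per delivered frame.

    forward_ratio: fraction of our probes the neighbor received (assumed input).
    reverse_ratio: fraction of the neighbor's probes we received (assumed input).
    """
    if not (0.0 <= forward_ratio <= 1.0 and 0.0 <= reverse_ratio <= 1.0):
        raise ValueError("delivery ratios must be in [0, 1]")
    product = forward_ratio * reverse_ratio
    if product == 0.0:
        # No delivery in at least one direction: the link is unusable.
        return float("inf")
    return 1.0 / product


def path_etx(links):
    """Sum the per-link ETX over a path given as (forward, reverse) pairs."""
    return sum(etx(f, r) for f, r in links)


if __name__ == "__main__":
    # Hypothetical numbers: a direct link that is good one way, poor the other,
    # versus two hops over links that are good in both directions.
    direct = etx(0.95, 0.2)
    two_hop = path_etx([(0.9, 0.9), (0.9, 0.9)])
    print("direct link ETX:", round(direct, 2))
    print("two-hop path ETX:", round(two_hop, 2))
```

With these made-up ratios the two-hop path scores lower (better) than the
direct link, even though the direct link looks strong in one direction.  A
link measured in only one direction can therefore look better than the
metric will treat it.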

A few questions:

- hslsd dumps the linkstates periodically and when sent SIGINFO.  Is the
   period fixed (& if so what is it)?

- If a node goes down, how long before its route is removed from the
   routing table?  How long before it disappears from the linkstate dump?

- Is there a way to compile the system to bypass the debugger db> prompt
   and let the watchdog kick in?  That is, for nodes we field, not our
   testbed nodes.  My assumption that the debugger has something to do
   with the watchdog not rebooting the system may be faulty, though; maybe
   it's related to the serial console or something.

- I saw some of this in the /var/log/daemon earlier, but not now.  Is it a
   problem?

May 12 07:58:36 cuw hslsd: hsls_af_spf_due: due
May 12 07:58:36 cuw hslsd: hsls_af_spf_due: due
May 12 07:58:36 cuw hslsd: SPF net radius 3 -> 3
May 12 07:58:36 cuw hslsd: LSU mode HSLS_M_UNDEC -> HSLS_M_SLS
May 12 07:58:36 cuw hslsd: cannot set metric on foreign lsu
May 12 07:58:36 cuw hslsd: etx_metric_set: metric set failed: Resource temporarily unavailable
May 12 07:58:36 cuw hslsd: injected metric 320 for fec0::202:6fff:fe21:e938
May 12 07:58:37 cuw hslsd: cannot set metric on foreign lsu
May 12 07:58:37 cuw hslsd: etx_metric_set: metric set failed: Resource temporarily unavailable
May 12 07:58:37 cuw hslsd: injected metric 256 for fec0::202:6fff:fe21:e938
May 12 07:58:38 cuw hslsd: cannot set metric on foreign lsu
May 12 07:58:38 cuw hslsd: etx_metric_set: metric set failed: Resource temporarily unavailable
May 12 07:58:39 cuw hslsd: injected metric 256 for 10.0.146.77
May 12 07:58:39 cuw hslsd: cannot set metric on foreign lsu
May 12 07:58:39 cuw hslsd: etx_metric_set: metric set failed: Resource temporarily unavailable
May 12 07:58:39 cuw hslsd: injected metric 256 for fec0::202:6fff:fe21:f84c
May 12 07:58:40 cuw hslsd: hsls_af_spf_due: due
May 12 07:58:40 cuw hslsd: hsls_af_spf_due: due
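
To compare runs of messages like these over time, a small tally script may
help.  The following is only a sketch: it assumes the line layout shown
above ("<date> <host> hslsd: <message>", with no "[pid]" after the daemon
name), and the names summarize, LINE_RE and INJECT_RE are invented for this
example.  It counts each distinct message and collects the injected metric
values per address; it does not say whether the messages indicate a problem.

```python
import re
from collections import Counter

# Assumed layout: "<Mon> <day> <HH:MM:SS> <host> hslsd: <message>".
LINE_RE = re.compile(r"^(\w{3}\s+\d+\s+[\d:]+)\s+(\S+)\s+hslsd:\s+(.*)$")
INJECT_RE = re.compile(r"^injected metric (\d+) for (\S+)$")

# A few lines copied from the excerpt above.
SAMPLE = """\
May 12 07:58:36 cuw hslsd: cannot set metric on foreign lsu
May 12 07:58:36 cuw hslsd: etx_metric_set: metric set failed: Resource temporarily unavailable
May 12 07:58:36 cuw hslsd: injected metric 320 for fec0::202:6fff:fe21:e938
May 12 07:58:37 cuw hslsd: cannot set metric on foreign lsu
May 12 07:58:37 cuw hslsd: etx_metric_set: metric set failed: Resource temporarily unavailable
May 12 07:58:37 cuw hslsd: injected metric 256 for fec0::202:6fff:fe21:e938
"""


def summarize(lines):
    """Return (message counts, {address: [injected metrics in order]})."""
    counts = Counter()
    injected = {}
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # not an hslsd line in the assumed layout
        msg = m.group(3)
        inj = INJECT_RE.match(msg)
        if inj:
            counts["injected metric"] += 1
            injected.setdefault(inj.group(2), []).append(int(inj.group(1)))
        else:
            counts[msg] += 1
    return counts, injected


if __name__ == "__main__":
    counts, injected = summarize(SAMPLE.splitlines())
    for msg, n in counts.most_common():
        print(n, msg)
    for addr, metrics in injected.items():
        print(addr, metrics)
```

To run it over a whole log, pass the open log file (or any iterable of
lines) to summarize() in place of SAMPLE.splitlines().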

>> We'd like to verify that hslsd is functioning properly and learn how to 
>> manually check its routing decisions from log or other output.
>
> Log output: activate logging for shortest paths first (SPF) using
> one of -l spf_{any,quiet,loud}.  Activate logging of expired-LSA
> purging with -l purge_{any,quiet}.  Look at the RIB updates using -l
> rib_{any,bufev,quiet}.  Information provided by -l peer_{quiet,any}
> will be useful.  You can find more of the -l options by grepping the .c
> files in hsls/, etx/, rib/, etc., for 'LOGLIB_.*SINK' .
>
> hslsd regularly dumps two databases to /var/db/.  In /var/db/linkstates
> are all of the linkstates.  In /var/db/vizlinks, the linkstates have
> been distilled into information for the visualizer.
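
The search for 'LOGLIB_.*SINK' suggested above can be sketched as a short
script.  This is a rough equivalent of a recursive grep, not part of CUWiN;
the function name find_sinks is invented, and the directories to search are
whatever source directories you pass on the command line.

```python
import os
import re
import sys

# The pattern quoted in the message above.
SINK_RE = re.compile(r"LOGLIB_.*SINK")


def find_sinks(root):
    """Yield (path, line_number, text) for matching lines in .c files under root."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            if not name.endswith(".c"):
                continue
            path = os.path.join(dirpath, name)
            # errors="replace" keeps odd bytes in a source file from aborting the scan.
            with open(path, errors="replace") as f:
                for lineno, text in enumerate(f, 1):
                    if SINK_RE.search(text):
                        yield path, lineno, text.rstrip("\n")


if __name__ == "__main__":
    # Usage: python find_sinks.py <dir> [<dir> ...]
    for root in sys.argv[1:]:
        for path, lineno, text in find_sinks(root):
            print(f"{path}:{lineno}: {text}")
```

Each printed line shows the file, the line number, and the matching source
line, which should be enough to spot the sink names usable with -l.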

The dump files have been helpful already.  I've looked through the hsls 
documentation before, but I'm afraid I'll have to study a bit to make 
heads or tails of the other log files you mentioned.  I feel like I need a 
big flowchart/diagram of the data flow and decision tree labeled with all 
the packets exchanged.  If such a thing exists, send me a pointer. 
routing 101 :)

thanks,
bill

--
Bill Comisky
bcomisky at pobox.com