[Commotion-dev] [OTI-Tech] LTS Testing Update

Ben West ben at gowasabi.net
Wed Apr 24 23:43:55 UTC 2013


Hi Will,

Glad that an apparent bug with dhcp was caught.

Please see my response about the IBSS-RSN issues inline below.

On Wed, Apr 24, 2013 at 5:49 PM, Will Hawkins <
hawkinsw at opentechinstitute.org> wrote:

> A few notes to consider after some testing today with LTS and at the
> office:
>
> - Collectd "stalled" one of the nodes at LTS (where collectd is still
> enabled)
>
> - Stations seem to lose their mind w.r.t IBSS RSN and authorization. I'm
> watching debugging output from wpa_supplicant on one of the nodes in the
> office to see if I can determine the problem. Of course, it's working
> great now :-)
>
>
I've observed a definite issue with nodes not reliably 'authorizing'
themselves when joining an IBSS-RSN adhoc network, at least since r33202 or
so.  No idea on a cause, besides IBSS-RSN simply being buggy.

Besides doing something crude like putting 'sleep 30 ; wifi restart' in the
file /etc/rc.local, I have also written a slightly less crude hotplug.d
script that attempts to restart the wifi interface if the node finds a
remote station 'authenticated' but not 'authorized' on the adhoc network.
I've attached that script to this email, and you can save it on a node as
*/etc/hotplug.d/firewall/20_mesh_auth_check*.

Please note this is not a 100% effective solution, as the problem is not
just with a newly powered on node not consistently authorizing itself, but
*also* with some of the existing nodes in the adhoc network not consistently
authorizing the new node on their end too.  That is, I've seen instances
where node A lists node B as both 'authenticated' and 'authorized', but
where node B lists node A as 'authenticated' and *not* 'authorized.'  Oy.

So, when a new node joins an RSN-encrypted adhoc network, it looks like the
following steps must happen to ensure all nodes are authorized:

A. Newly powered-up node inspects output of 'iw wlan0 station dump' looking
for entries where a remote node is 'authenticated' but *not* 'authorized,'
and if so, restart the wifi and retry test.  Ideally, the node would repeat
this process X times until giving up.  The script I'm attaching tries to do
this, albeit without the option to give up after X times.

B. All existing nodes periodically via cronjob check their own local output
of 'iw wlan0 station dump', looking for new entries where a new remote node
is 'authenticated' but *not* 'authorized.'  If such an entry is found,
restart the *local* node's wifi and retry test.
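
For what it's worth, here is a rough sketch of how the check in steps A and B
could give up after X tries.  It is not the attached script: the names
unauthorized_stations, check_with_retries, WIFI_IF, MAX_TRIES and SETTLE_SECS
are made up for illustration, and it assumes 'iw wlan0 station dump' prints one
'authorized:' and one 'authenticated:' line per station.

```sh
#!/bin/sh
# Illustrative sketch only; function and variable names are hypothetical.

WIFI_IF="${WIFI_IF:-wlan0}"
MAX_TRIES="${MAX_TRIES:-3}"       # the "X" from step A
SETTLE_SECS="${SETTLE_SECS:-30}"  # assumed time for the handshakes to finish

# Read 'iw <if> station dump' text on stdin and print the MAC address of
# each station that is authenticated but not authorized.
unauthorized_stations() {
	awk '
		function report() {
			if (mac != "" && authn == "yes" && authz == "no") print mac
		}
		$1 == "Station"        { report(); mac = $2; authn = ""; authz = "" }
		$1 == "authorized:"    { authz = $2 }
		$1 == "authenticated:" { authn = $2 }
		END                    { report() }
	'
}

# Restart wifi while any such station remains, at most MAX_TRIES times.
# Returns 0 once every station is authorized, 1 after giving up.
check_with_retries() {
	tries=0
	while :; do
		bad=$(iw "$WIFI_IF" station dump | unauthorized_stations)
		[ -z "$bad" ] && return 0
		[ "$tries" -ge "$MAX_TRIES" ] && return 1
		logger "$WIFI_IF: stations not authorized: $bad"
		wifi down
		sleep 5
		wifi up
		sleep "$SETTLE_SECS"
		tries=$((tries + 1))
	done
}
```

Nothing runs until check_with_retries is called, e.g. from /etc/rc.local for
step A or from a cronjob for step B.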

Having both of these steps occur simultaneously on all nodes clearly could
lead to lots of ugly race conditions, so it's not ideal.  Likewise, it's
even less ideal for a particular node with active clients to restart its
wifi just because a new node powered on, but didn't get successfully
authorized.  Maybe an alternate way to achieve step B is just to have the
newly powered-up node repeatedly restart its wifi *until* it can
successfully ping all other nodes that appear on the adhoc network, though
that would be tedious and make startup very slow.
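
A minimal sketch of that ping-based alternative, assuming the addresses of the
other nodes are already known somehow (discovering them is left open here).
The function names and the MAX_RESTARTS / SETTLE_SECS knobs are hypothetical:

```sh
#!/bin/sh
# Illustrative sketch only; names are hypothetical.

# Return 0 only if every address passed as an argument answers a single ping.
all_peers_reachable() {
	for peer in "$@"; do
		ping -c 1 -W 2 "$peer" >/dev/null 2>&1 || return 1
	done
	return 0
}

# Restart wifi until all given peers answer, giving up after MAX_RESTARTS.
restart_until_reachable() {
	restarts=0
	until all_peers_reachable "$@"; do
		[ "$restarts" -ge "${MAX_RESTARTS:-5}" ] && return 1
		wifi down
		sleep 5
		wifi up
		sleep "${SETTLE_SECS:-30}"
		restarts=$((restarts + 1))
	done
	return 0
}
```

Note that with an empty peer list this reports success right away, so a node
that knows of no peers yet would never restart.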

> - luci_splash got itself into a nice "wedge" on one of the LTS nodes. I
> am going to do my best to get it unstuck. If we continue to see the
> problem, we'll have to take a hard look at pushing the splash rewrite to
> a higher priority.
>
> Will
>
> On 04/24/2013 04:26 PM, Ben West wrote:
> > Hi Will,
> >
> > I just checked config on a node again (running Attitude Adjustment circa
> > r35xxx), and I found these lines in /etc/crontabs/root which had been
> > commented out:
> >
> > #* * * * *      /usr/sbin/ff_olsr_test_gw.sh
> > #*/5 * * * *    /usr/sbin/ff_olsr_watchdog
> >
> > So, be on the lookout for ff packages that deploy these scripts,
> > although unfortunately it's not clear /which/ package includes these
> > particular files.  Maybe freifunk-common?
> >
> > On Wed, Apr 24, 2013 at 2:28 PM, Will Hawkins
> > <hawkinsw at opentechinstitute.org>
> > wrote:
> >
> >     Thanks for your response, Ben. We just recompiled an image w/o most of
> >     the ff software. We are now testing that image to see if things are any
> >     better. We will definitely note how ff-watchdog may be useful and how
> >     ff-gw-check is the likely culprit ;-)
> >
> >     Will
> >
> >     On 04/24/2013 02:17 PM, Ben West wrote:
> >     > The Freifunk watchdog package is actually a rather handy package, since
> >     > it will monitor any process you want (via periodic cronjob) and restart
> >     > that service if the active process disappears (aka crashes).  To my
> >     > knowledge, it doesn't directly start/stop any network interfaces.  But,
> >     > ff-watchdog does need to be configured to monitor the processes you care
> >     > about, and to not conflict with any other watchdog-style task.  That
> >     > conflict may be indirectly causing interfaces to go down or even olsrd
> >     > to stop in absence of a needed interface.
> >     >
> >     > Its config file is /etc/config/freifunk-watchdog, and here is an example
> >     > config I've used (for node using coovachilli):
> >     >
> >     > config process
> >     >     option process 'dropbear'
> >     >     option initscript '/etc/init.d/dropbear'
> >     >
> >     > config process
> >     >     option process 'crond'
> >     >     option initscript '/etc/init.d/cron'
> >     >
> >     > config process
> >     >     option process 'olsrd'
> >     >     option initscript '/etc/init.d/olsrd'
> >     >
> >     > config process
> >     >     option process 'chilli'
> >     >     option initscript '/etc/init.d/coovachilli'
> >     >
> >     > Are you sure you weren't having problems with the ff-gw-check package
> >     > instead?  I.e. uninstalled that package at the same time as
> >     > uninstalling ff-watchdog?  I think the gw-check package /will muck/
> >     > with default routes and possibly also restart active network interfaces
> >     > if it can't get a successful ping to freifunk.net or something.
> >     >
> >     > On Wed, Apr 24, 2013 at 8:07 AM, Dan Staples
> >     > <danstaples at opentechinstitute.org> wrote:
> >     >
> >     >     Moving this discussion to commotion-dev...
> >     >
> >     >     When I was previously setting the wireless interfaces to use channel 9
> >     >     instead of channel 5, the freifunk watchdog would routinely bring down
> >     >     the wireless interfaces. And I have no idea why. The only way I got it
> >     >     to work was uninstalling ff-watchdog. So see if that may be a reason why
> >     >     wireless interfaces are unavailable...there should be a note about it in
> >     >     logread.
> >     >
> >     >     I've also noticed that something is killing olsrd on DR1 nodes, without
> >     >     any clue in the log. The routing table will still have stale routes in
> >     >     it, indicating that olsrd isn't exiting cleanly. I wonder if it's being
> >     >     killed by the out-of-memory watchdog. When I was troubleshooting this
> >     >     before, I wrote a quick script that ran as a cronjob every minute, and
> >     >     it would pgrep olsrd. If olsrd was running, it would redirect the output
> >     >     of top into ~/top.out. If olsrd wasn't running, it would move the last
> >     >     ~/top.out as well as logread into a separate directory. That way,
> >     >     whenever olsrd was killed, there would be a record of top the minute
> >     >     before it crashed, as well as the log. Would this be useful for
> >     >     troubleshooting the LTS nodes?
> >     >
> >     >
> >     >
> >     > --
> >     > Ben West
> >     > http://gowasabi.net
> >     > ben at gowasabi.net
> >     > 314-246-9434
> >     >
> >     >
> >     > _______________________________________________
> >     > Commotion-dev mailing list
> >     > Commotion-dev at lists.chambana.net
> >     > https://lists.chambana.net/mailman/listinfo/commotion-dev
> >     >
> >
> >
> >
> >
> > --
> > Ben West
> > http://gowasabi.net
> > ben at gowasabi.net
> > 314-246-9434
>



-- 
Ben West
http://gowasabi.net
ben at gowasabi.net
314-246-9434
-------------- next part --------------
#!/bin/sh

. /lib/functions.sh
. /lib/functions/network.sh

statefile="/var/run/mesh-auth-check.state"
if [ -f "$statefile" ]; then
	state=$(cat "$statefile")
else
	state="check"
	echo "$state" > "$statefile"
fi
encryption=`uci get wireless.wlan0.encryption`
[ "$encryption" = "psk" -o "$encryption" = "psk2" ] && encryption="psk"

# Check whether wlan0 successfully authorized with remote radios.
# Plain [ ] tests keep this portable across /bin/sh implementations.
if [ "add" = "$ACTION" ] && [ "mesh" = "$INTERFACE" ] && [ "wlan0" = "$DEVICE" ] && [ "check" = "$state" ] && [ "psk" = "$encryption" ]; then

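	# Check for stations that are authenticated but not authorized.
	# 'iw' prints the 'authorized:' line directly above 'authenticated:',
	# hence the -B 1.
	not_authorized=$(iw wlan0 station dump | grep -B 1 'authenticated.*yes' | grep 'authorized.*no')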
	if [ -n "$not_authorized" ]; then
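		logger "wlan0 didn't auth successfully with remote radios"
		sleep 5
		# Skip the check for hotplug events fired while wifi goes down
		state="nocheck"
		echo "$state" > "$statefile"
		logger "stopping wlan0"
		wifi down
		sleep 5
		# Re-arm the check so the next 'add' event tests again
		state="check"
		echo "$state" > "$statefile"
		logger "restarting wlan0"
		wifi up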
	fi
fi

