[Commotion-dev] Stress test results

Ben West ben at gowasabi.net
Tue Nov 12 03:05:22 UTC 2013


I recently saw a WasabiNet node whose adhoc interface died due to memory
exhaustion, and I should point out that this node was *not* running serval
or commotiond.  (I.e., this isn't Commotion-OpenWRT firmware I'm writing
about, but something very similar.)  I've sent the relevant dmesg output to
the OpenWRT-devel list, which you can read here:

https://lists.openwrt.org/pipermail/openwrt-devel/2013-November/022398.html
https://lists.openwrt.org/pipermail/openwrt-devel/2013-November/022399.html

Of particular interest: since the node didn't spontaneously reboot or
become inaccessible, I SSH'ed in and found a 240 Kbyte dump file that
wpa_supplicant had written to /tmp, with about the same timestamp as when
memory errors began appearing in syslog.  Possibly this points to
wpa_supplicant itself as the source of intermittent memory leaks?  I'm using
the wpad package, and I haven't yet had a chance to try the version of
wpad-mini modified to include IBSS-RSN support.  Will, I'd be curious
whether switching to wpad-mini has any effect on the memory errors you're
seeing.

Besides that, please note that I run the coovachilli captive portal
instead of NDS, and coova is definitely a memory hog of questionable
stability.  That is, coova may turn out to be my problem, which would make
this unrelated to Commotion.

Finally, check out these kernel tweaks recommended on OpenWRT-devel, which
make the node reboot automatically upon an OOM error or kernel oops.
Naturally, these wouldn't fix the memory problem itself, but they're good
to know for reference.

" for routers in production i prefer setting
/proc/sys/vm/panic_on_oom = 2
/proc/sys/kernel/panic = 10

also if you like
/proc/sys/kernel/panic_on_oops = 1 "



On Mon, Nov 11, 2013 at 6:53 PM, Will Hawkins <hawkinsw at opentechinstitute.org> wrote:

> Using go (yes, that's right!) I was able to create a test program that
> opened enough simultaneous HTTP connections to force a crash.
>
> Thanks to the fact that we were running a serial console that was
> logging PicoStation console output, we were able to capture the crash
> information. I am attaching that here.
>
> Overall, it looks like the node simply runs out of memory. The first
> errors are when malloc()s in servald fail. Then, when things get really
> bad, there are errors from the wireless driver saying that it cannot
> allocate buffer space.
>
> Obviously the failures from the wireless driver are bad. They are
> probably ultimately what causes the node to reboot.
>
> I wonder, though, about the servald malloc() failures. I'm not sure if
> they are pure symptom (i.e., servald just happens to be the application
> most commonly allocating memory space when the crash happens and so its
> malloc()s fail first), or if it is part of the problem (i.e., servald
> causes memory usage to skyrocket under heavy load and *then* these
> larger memory problems start to occur).
>
> In any event, we got some logs, which is a good first step!
>
> Will
>
> _______________________________________________
> Commotion-dev mailing list
> Commotion-dev at lists.chambana.net
> https://lists.chambana.net/mailman/listinfo/commotion-dev
>
>


-- 
Ben West
http://gowasabi.net
ben at gowasabi.net
314-246-9434