[Commotion-dev] [Commotion-discuss] Memory Issues and Nightly Builds

Dan Staples danstaples at opentechinstitute.org
Wed Nov 6 17:54:10 UTC 2013


Since Dan Hastings has seen this happen with a lot of simultaneous
clients and with high-memory components disabled, it sounds like the
per-client RAM overhead is likely the cause. Ben, do you know exactly
where that RAM is used for each connecting client?

Dan, can you provide more detailed info on exactly what was happening
when you saw the node crash? How many simultaneous users were there,
and what were they doing (viewing a webpage on the internet, viewing
the node's administrative web interface, etc.)?

On Wed 06 Nov 2013 12:40:06 PM EST, Ben West wrote:
> Hi Dan,
>
> Thanks for offering more detail, especially that you see the nodes
> spontaneously reboot rather than simply having services crash.
>
> I would again point out that the Picostations will have a finite limit
> for simultaneous clients.  15 to 20 clients is quite a few, each
> client requiring a portion of available RAM.  It may be that a single
> Picostation is not going to be able to sustain all of them.
>
>
>
> On Wed, Nov 6, 2013 at 10:58 AM, Dan Staples
> <danstaples at opentechinstitute.org> wrote:
>
>     Regarding logging, I'm not sure that will work well since the nodes are
>     spontaneously rebooting themselves (due to OOM conditions), not the user
>     rebooting them. What we're going to try to do is attach a serial console
>     (thanks Will!) and try to slam the router with simultaneous users and
>     traffic.
>
>     Also, I don't think Dan is hosting local apps on the router itself
>     (correct me if I'm wrong), but just advertising them using the Commotion
>     apps portal. And that just takes a little space for the Avahi service
>     file...so hopefully that's not a problem.
>
>     We'll certainly report what we find with our stress testing.
>
>     Dan
>
>     On 11/06/2013 10:37 AM, Ben West wrote:
>     > I am also seeing sporadic memory consumption issues operating mesh nodes
>     > running AA r38347 in WasabiNet on Nanostation Loco M2.
>     >
>     > That is, using the same ath9k wifi driver and same underlying OS, but
>     > without the Commotion-specific tools like commotiond and servald.  I
>     > will see nodes boot up with ~26Mbytes memory usage and then gradually
>     > increase over the next few days until sporadic nodes start crashing with
>     > page allocation failures (aka memory exhausted).  This all is happening
>     > despite having 3Mbytes of compressed swap space allocated.  When I am
>     > able to log into crashed nodes to inspect, I will occasionally find the
>     > current memory usage to be /less/ than the average observed on bootup,
>     > along with ~500Kbytes sitting in swap.
>     >
>     > This seems to suggest something is very sporadically allocating itself a
>     > large chunk (multiple MBytes), but not residing in memory as such, and
>     > causing other processes to crash in consequence.  I do use the
>     > coovachilli captive portal in WasabiNet, which could be a culprit and
>     > thus unrelated to Commotion, but there could also be an underlying
>     > memory leak in the kernel or wifi driver.
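
[Editorial sketch, not part of the original thread.] The memory and swap figures described above can be sampled on any Linux node from /proc/meminfo. The following is a minimal sketch in Python, assuming the standard `MemTotal`, `MemFree`, `Buffers`, `Cached`, `SwapTotal` and `SwapFree` fields are present; the function names are hypothetical, and a 32MByte node would more likely run an equivalent shell or Lua script than a Python interpreter.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of integer kB values."""
    values = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # skip lines that are not "Name: value" pairs
        fields = rest.split()
        if fields and fields[0].isdigit():
            values[name.strip()] = int(fields[0])
    return values


def used_kb(info):
    """Memory in use, excluding buffers and page cache."""
    return (info["MemTotal"] - info["MemFree"]
            - info.get("Buffers", 0) - info.get("Cached", 0))


def swap_used_kb(info):
    """Swap currently in use."""
    return info.get("SwapTotal", 0) - info.get("SwapFree", 0)


def read_meminfo(path="/proc/meminfo"):
    """Read and parse the kernel's memory summary (Linux only)."""
    with open(path) as f:
        return parse_meminfo(f.read())
```

Logging `used_kb(read_meminfo())` and `swap_used_kb(read_meminfo())` at a fixed interval would show whether usage climbs steadily over days or jumps suddenly just before a failure.
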
>     >
>     > What are thoughts for having crashed nodes try to collect a debug report
>     > about themselves when a crash condition is detected (e.g. no Internet
>     > access, "page allocation failure" detected in syslog), and then write
>     > that report to flash somewhere before the node gets rebooted by its
>     > frustrated user?
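
[Editorial sketch, not part of the original thread.] The debug-report idea above could look roughly like the following Python sketch. It assumes the syslog is available as an iterable of text lines and that a memory summary has already been read as text; the function names and the report layout are hypothetical, and only the "page allocation failure" marker comes from the message itself.

```python
import time

# Marker string quoted in the message above.
CRASH_MARKER = "page allocation failure"


def find_crash_lines(log_lines, marker=CRASH_MARKER):
    """Return the log lines that contain the crash marker."""
    return [line.rstrip("\n") for line in log_lines if marker in line]


def build_report(log_lines, meminfo_text, now=None):
    """Build a text report if a crash marker is present, else return None."""
    hits = find_crash_lines(log_lines)
    if not hits:
        return None
    # time.gmtime(None) uses the current time; tests can pass a fixed value.
    stamp = time.strftime("%Y-%m-%d %H:%M:%S", time.gmtime(now))
    parts = ["debug report (UTC): " + stamp, "matching log lines:"]
    parts.extend("  " + hit for hit in hits)
    parts.append("memory summary:")
    parts.append(meminfo_text.strip())
    return "\n".join(parts) + "\n"


def write_report(report, path):
    """Append the report to a file; the caller picks a flash-backed path."""
    with open(path, "a") as f:
        f.write(report)
```

Because `build_report` returns None when no marker is found, a periodic job would only write to flash when a failure has actually been logged.
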
>     >
>     > Besides that, do note that nodes with only 32MBytes of RAM, like UBNT
>     > Picostations, are going to have difficulties hosting local apps for many
>     > users.  If Dan Hastings would be able to use an alternate device with
>     > 64Mbytes+ RAM, e.g. a UBNT Rocket, Unifi, or even an indoor TP-Link
>     > router (all of which should be able to run Commotion-OpenWRT), that may
>     > be a viable workaround in case chasing down memory leaks becomes too
>     > ornery.
>     >
>     >
>     >
>     > On Wed, Nov 6, 2013 at 8:54 AM, Dan Staples
>     > <danstaples at opentechinstitute.org> wrote:
>     >
>     >     +commotion-dev
>     >
>     >     If your nodes are crashing w/ 15-20 clients, while both serval and
>     >     commotion-splash are disabled, that is very worrisome!
>     >
>     >     I propose to the Commotion dev team that we urgently need to come up
>     >     with a way to simulate network load, so we can identify and fix the
>     >     causes of these types of crashes. Does anyone have ideas or experiences
>     >     with this? Perhaps we can take the technical discussion over to the
>     >     commotion-dev list only.
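
[Editorial sketch, not part of the original thread.] One simple way to approximate "15 to 20 students logging in at once" is to start that many client threads and release them together. The sketch below is an assumption about how such a test might be structured, not a tool used by the Commotion team; `simulate_clients` and `fetch_url` are hypothetical names, and HTTP requests alone do not reproduce the wifi association load of real clients.

```python
import threading
import urllib.request


def fetch_url(url, timeout=10):
    """Fetch one URL and return the number of bytes received."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return len(resp.read())


def simulate_clients(fetch, n_clients, requests_per_client=1):
    """Run `fetch` from n_clients threads that all start at the same moment.

    Returns a dict counting successful and failed calls.
    """
    results = {"ok": 0, "failed": 0}
    lock = threading.Lock()
    start = threading.Barrier(n_clients)  # releases all clients together

    def client():
        start.wait()
        for _ in range(requests_per_client):
            try:
                fetch()
                outcome = "ok"
            except Exception:
                outcome = "failed"
            with lock:
                results[outcome] += 1

    threads = [threading.Thread(target=client) for _ in range(n_clients)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

A run against a node might look like `simulate_clients(lambda: fetch_url("http://NODE_ADDRESS/"), 20)`, where `NODE_ADDRESS` is a placeholder for the node or local app under test. Taking `fetch` as a parameter keeps the load pattern separate from the kind of traffic generated.
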
>     >
>     >     And just an update for you Dan, earlier this week I found and fixed a
>     >     significant memory leak in Serval...not sure how much that will affect
>     >     the instability we've seen, but we'll soon know with some testing. The
>     >     fix will make its way into the nightly builds probably by the end of the
>     >     week.
>     >
>     >     As long as the rest of your network is DR1 or newer, the nightly builds
>     >     should be compatible.
>     >
>     >     Dan
>     >
>     >     On 11/06/2013 04:07 AM, Dan Hastings wrote:
>     >     > I was just checking to see if there had been any progress made on the
>     >     > nightly builds with fixing the memory overload causing the nodes to
>     >     > crash. To try to prevent my node from crashing, I disabled serval and
>     >     > the splash page. However, whenever I have 15 to 20 students log in to a
>     >     > local app at the start of class, my node crashes instantly. I'm wondering
>     >     > if upgrading to the latest nightly build might fix this issue. Lastly,
>     >     > if I upgrade to the latest nightly build, will it still work with the
>     >     > other nodes that do not have the latest build, or do I have to (or is it
>     >     > recommended to) upgrade all of the other nodes to the latest build as
>     >     > well?  Thanks for all the hard work.  Commotion is otherwise working
>     >     > wonders over here in the horn.
>     >     >
>     >     > Dan
>     >     >
>     >     > _______________________________________________
>     >     > Commotion-discuss mailing list
>     >     > Commotion-discuss at lists.chambana.net
>     >     > https://lists.chambana.net/mailman/listinfo/commotion-discuss
>     >     >
>     >
>     >     --
>     >     Dan Staples
>     >
>     >     Open Technology Institute
>     >     https://commotionwireless.net
>     >     OpenPGP key: http://disman.tl/pgp.asc
>     >     Fingerprint: 2480 095D 4B16 436F 35AB 7305 F670 74ED BD86 43A9
>     >     _______________________________________________
>     >     Commotion-dev mailing list
>     >     Commotion-dev at lists.chambana.net
>     >     https://lists.chambana.net/mailman/listinfo/commotion-dev
>     >
>     >
>     >
>     >
>     > --
>     > Ben West
>     > http://gowasabi.net
>     > ben at gowasabi.net
>     > 314-246-9434
>
>
>
>
>
> -- 
> Ben West
> me at benwest.name <mailto:me at benwest.name>
-- 
Dan Staples

Open Technology Institute
https://commotionwireless.net
OpenPGP key: http://disman.tl/pgp.asc
Fingerprint: 2480 095D 4B16 436F 35AB 7305 F670 74ED BD86 43A9

