[Commotion-discuss] [Commotion-dev] Memory Issues and Nightly Builds
Ben West
ben at gowasabi.net
Wed Nov 6 18:09:48 UTC 2013
It's relevant to point out that devices like Picostation and Nanostation
are normally intended for use as thin hotspots in high-usage environments,
i.e. with DHCP and NAT routing not done on the device itself. So
Commotion-OpenWRT issuing DHCP leases and performing NAT for one or two local
LANs onboard does consume memory that would otherwise go to serving 802.11n
clients. This is an inherent limitation of the chosen architecture.
Besides that, I would expect at least the following to devote a
portion of available RAM to each client on the public AP in
Commotion-OpenWRT:
- /proc/net/nf_conntrack entries
- nodogsplash (although possibly only on initial portal page viewing)
- uhttpd (again, only on portal page viewing)
- the ath9k driver itself
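To see which of these actually grows with client count, one could sample the kernel's own counters while clients associate. Below is a minimal sketch, not a Commotion tool: it assumes Python 3 is available wherever it runs (small OpenWRT images may not include it, in which case /proc/meminfo and /proc/net/nf_conntrack can be copied off the node first), and the function names are hypothetical. It reads MemFree/SwapFree and counts tracked connections, one per line of /proc/net/nf_conntrack.

```python
"""Hypothetical helper: snapshot memory and conntrack usage on a node.

Assumes the Linux procfs paths /proc/meminfo and /proc/net/nf_conntrack
(the latter only exists when the nf_conntrack module is loaded).
"""
from pathlib import Path


def parse_meminfo(text):
    """Return /proc/meminfo fields as integers (kB for most fields)."""
    fields = {}
    for line in text.splitlines():
        if ":" not in line:
            continue
        key, rest = line.split(":", 1)
        parts = rest.split()
        # Keep only fields whose first token is a plain integer.
        if parts and parts[0].isdigit():
            fields[key.strip()] = int(parts[0])
    return fields


def count_conntrack(text):
    """Each non-empty line of /proc/net/nf_conntrack is one tracked flow."""
    return sum(1 for line in text.splitlines() if line.strip())


def snapshot(meminfo_path="/proc/meminfo",
             conntrack_path="/proc/net/nf_conntrack"):
    """Collect free memory, free swap, and the conntrack entry count."""
    mem = parse_meminfo(Path(meminfo_path).read_text())
    ct_file = Path(conntrack_path)
    # None means conntrack is not loaded (or the file was not copied).
    ct = count_conntrack(ct_file.read_text()) if ct_file.exists() else None
    return {
        "MemFree_kB": mem.get("MemFree"),
        "SwapFree_kB": mem.get("SwapFree"),
        "conntrack_entries": ct,
    }
```

Calling `snapshot()` every few seconds while students log in would show whether MemFree drops in step with the conntrack count, or whether something else is eating the RAM.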
On Wed, Nov 6, 2013 at 11:54 AM, Dan Staples <danstaples at opentechinstitute.org> wrote:
> Since Dan Hastings has seen this happen with a lot of simultaneous
> clients and with high-memory components disabled, it sounds like that is
> likely the cause. Do you know exactly where that RAM is used for each
> connecting client?
>
> Dan, can you provide any more detailed info on exactly what was
> happening when you see the node crashing? How many simultaneous users,
> and what were they doing (viewing a webpage on the internet, or viewing
> the node's administrative web interface, etc)?
>
> On Wed 06 Nov 2013 12:40:06 PM EST, Ben West wrote:
> > Hi Dan,
> >
> > Thanks for offering more detail, especially that you see the nodes
> > spontaneously reboot rather than simply have services crash.
> >
> > I would again point out that the Picostations will have a finite limit
> > for simultaneous clients. 15 to 20 clients is quite a few, each
> > client requiring a portion of available RAM. It may be that a single
> > Picostation cannot sustain all of them.
> >
> > On Wed, Nov 6, 2013 at 10:58 AM, Dan Staples
> > <danstaples at opentechinstitute.org> wrote:
> >
> > Regarding logging, I'm not sure that will work well, since the nodes
> > are spontaneously rebooting themselves (due to OOM conditions) rather
> > than the user rebooting them. What we're going to try is to attach a
> > serial console (thanks Will!) and slam the router with simultaneous
> > users and traffic.
> >
> > Also, I don't think Dan is hosting local apps on the router itself
> > (correct me if I'm wrong), but just advertising them using the Commotion
> > apps portal. That just takes a little space for the Avahi service
> > file, so hopefully that's not a problem.
> >
> > We'll certainly report what we find with our stress testing.
> >
> > Dan
> >
> > On 11/06/2013 10:37 AM, Ben West wrote:
> > > I am also seeing sporadic memory consumption issues operating mesh
> > > nodes running AA r38347 in WasabiNet on Nanostation Loco M2.
> > >
> > > That is, using the same ath9k wifi driver and same underlying OS, but
> > > without the Commotion-specific tools like commotiond and servald. I
> > > will see nodes boot up with ~26 MBytes memory usage, which then
> > > gradually increases over the next few days until some nodes
> > > sporadically start crashing with page allocation failures (i.e.
> > > memory exhausted). This all happens despite having 3 MBytes of
> > > compressed swap space allocated. When I am able to log into crashed
> > > nodes to inspect, I will occasionally find the current memory usage
> > > to be /less/ than the average observed on bootup, along with
> > > ~500 KBytes sitting in swap.
> > >
> > > This seems to suggest something is very sporadically allocating
> > > itself a large chunk (multiple MBytes) that does not remain resident
> > > in memory, causing other processes to crash as a consequence. I do
> > > use the CoovaChilli captive portal in WasabiNet, which could be a
> > > culprit and thus unrelated to Commotion, but there could also be an
> > > underlying memory leak in the kernel or wifi driver.
> > >
> > > What are your thoughts on having crashed nodes collect a debug report
> > > about themselves when a crash condition is detected (e.g. no Internet
> > > access, or "page allocation failure" detected in syslog), and then
> > > write that report to flash somewhere before the node gets rebooted by
> > > its frustrated user?
> > >
> > > Besides that, do note that nodes with only 32 MBytes of RAM, like UBNT
> > > Picostations, are going to have difficulty hosting local apps for
> > > many users. If Dan Hastings is able to use an alternate device with
> > > 64 MBytes+ of RAM, e.g. a UBNT Rocket, UniFi, or even an indoor
> > > TP-Link router (all of which should be able to run Commotion-OpenWRT),
> > > that may be a viable workaround in case chasing down memory leaks
> > > becomes too ornery.
> > >
> > >
> > >
> > > On Wed, Nov 6, 2013 at 8:54 AM, Dan Staples
> > > <danstaples at opentechinstitute.org> wrote:
> > >
> > > +commotion-dev
> > >
> > > If your nodes are crashing w/ 15-20 clients, while both serval and
> > > commotion-splash are disabled, that is very worrisome!
> > >
> > > I propose to the Commotion dev team that we urgently need to come up
> > > with a way to simulate network load, so we can identify and fix the
> > > causes of these types of crashes. Does anyone have ideas or
> > > experiences with this? Perhaps we can take the technical discussion
> > > over to the commotion-dev list only.
> > >
> > > And just an update for you, Dan: earlier this week I found and fixed
> > > a significant memory leak in Serval. I'm not sure how much that will
> > > affect the instability we've seen, but we'll soon know with some
> > > testing. The fix will make its way into the nightly builds, probably
> > > by the end of the week.
> > >
> > > As long as the rest of your network is DR1 or newer, the nightly
> > > builds should be compatible.
> > >
> > > Dan
> > >
> > > On 11/06/2013 04:07 AM, Dan Hastings wrote:
> > > > I was just checking to see if there had been any progress on the
> > > > nightly builds in fixing the memory overload that causes the nodes
> > > > to crash. To try to prevent my node from crashing, I disabled serval
> > > > and the splash page. However, whenever I have 15 to 20 students log
> > > > in to a local app at the start of class, my node crashes instantly.
> > > > I'm wondering if upgrading to the latest nightly build might fix
> > > > this issue. Lastly, if I upgrade to the latest nightly build, will it
> > > > still work with the other nodes that do not have the latest build,
> > > > or do I have to (or is it recommended that I) upgrade all of the
> > > > other nodes to the latest build as well? Thanks for all the hard
> > > > work. Commotion is otherwise working wonders over here in the horn.
> > > >
> > > > Dan
> > > >
> > > > _______________________________________________
> > > > Commotion-discuss mailing list
> > > > Commotion-discuss at lists.chambana.net
> > > > https://lists.chambana.net/mailman/listinfo/commotion-discuss
> > > >
> > >
> > > --
> > > Dan Staples
> > >
> > > Open Technology Institute
> > > https://commotionwireless.net
> > > OpenPGP key: http://disman.tl/pgp.asc
> > > Fingerprint: 2480 095D 4B16 436F 35AB 7305 F670 74ED BD86 43A9
> > > _______________________________________________
> > > Commotion-dev mailing list
> > > Commotion-dev at lists.chambana.net
> > > https://lists.chambana.net/mailman/listinfo/commotion-dev
> > >
> > >
> > >
> > >
> > > --
> > > Ben West
> > > http://gowasabi.net
> > > ben at gowasabi.net
> > > 314-246-9434
> >
> > --
> > Dan Staples
> >
> > Open Technology Institute
> > https://commotionwireless.net
> > OpenPGP key: http://disman.tl/pgp.asc
> > Fingerprint: 2480 095D 4B16 436F 35AB 7305 F670 74ED BD86 43A9
> >
> >
> >
> >
> > --
> > Ben West
> > me at benwest.name
> --
> Dan Staples
>
> Open Technology Institute
> https://commotionwireless.net
> OpenPGP key: http://disman.tl/pgp.asc
> Fingerprint: 2480 095D 4B16 436F 35AB 7305 F670 74ED BD86 43A9
>
--
Ben West
http://gowasabi.net
ben at gowasabi.net
314-246-9434