[CUWiN-Dev] memory-hungry/starved nodes
tom
tom at anotherwastedday.com
Tue Mar 14 07:02:14 CST 2006
Daniel and I went by Mike's memory-hungry node yesterday and, on Dave's
advice, poked around to see what the problem was.
The most troubling thing we found was, of all things, two rather large
instances of ntpd running. Consider the following:
Upon node boot:
# du -ks /mfs
2400 /mfs
# df -h
Filesystem Size Used Avail Capacity Mounted on
/dev/wd0a 30M 25M 3.4M 87% /
tmpfs 192K 192K 0B 100% /dev
tmpfs 2.2M 2.2M 0B 100% /mfs
/etc 30M 25M 3.4M 87% /permanent/etc
/home 30M 25M 3.4M 87% /permanent/home
/tmp 30M 25M 3.4M 87% /permanent/tmp
/var 30M 25M 3.4M 87% /permanent/var
/mfs/etc 2.2M 2.2M 0B 100% /etc
/mfs/home 2.2M 2.2M 0B 100% /home
/mfs/tmp 2.2M 2.2M 0B 100% /tmp
/mfs/var 2.2M 2.2M 0B 100% /var
# top
load averages: 0.66, 0.47, 0.21 up 0 days, 0:03 02:42:18
29 processes: 1 runnable, 27 sleeping, 1 on processor
CPU states: 5.9% user, 0.0% nice, 18.2% system, 0.0% interrupt, 75.9% idle
Memory: 13M Act, 1984K Inact, 3816K Wired, 4300K Exec, 1400K File, 4732K Free
Swap:
PID USERNAME PRI NICE SIZE RES STATE TIME WCPU CPU COMMAND
1749 root 18 0 1092K 3428K pause 0:00 0.00% 0.00% ntpd
1724 root 18 0 1092K 1168K pause 0:00 0.00% 0.00% ntpd
2612 root 2 0 616K 848K select 0:00 0.00% 0.00% dhclient
1817 nobody 2 0 508K 1148K kqread 0:00 0.00% 0.00% thttpd
1677 root 2 0 480K 1052K select 0:00 0.00% 0.00% zebra
1128 root 2 0 316K 2404K netio 0:00 0.00% 0.00% sshd
2923 twiltziu 2 0 316K 2048K select 0:00 0.00% 0.00% sshd
1848 root 2 0 248K 1716K select 0:05 0.00% 0.00% sshd
1929 root 10 0 248K 900K wait 0:01 0.10% 0.10% sh
1870 root 2 0 204K 900K fifor 0:00 0.00% 0.00% sh
1690 root 10 0 172K 784K wait 0:01 0.05% 0.05% sh
6333 root 28 0 168K 1020K CPU 0:00 1.54% 0.34% top
2930 twiltziu 10 0 152K 760K wait 0:00 0.00% 0.00% sh
2928 root 10 0 144K 732K wait 0:00 0.00% 0.00% sh
1703 root 2 0 136K 1256K kqread 0:03 1.03% 1.03% hslsd
1750 root 2 0 112K 836K kqread 0:00 0.00% 0.00% syslogd
1911 root 10 0 108K 816K nanoslee 0:00 0.00% 0.00% cron
# pkill ntpd
# df -h
Filesystem Size Used Avail Capacity Mounted on
/dev/wd0a 30M 25M 3.4M 87% /
tmpfs 400K 192K 208K 48% /dev
tmpfs 11M 2.2M 8.7M 20% /mfs
/etc 30M 25M 3.4M 87% /permanent/etc
/home 30M 25M 3.4M 87% /permanent/home
/tmp 30M 25M 3.4M 87% /permanent/tmp
/var 30M 25M 3.4M 87% /permanent/var
/mfs/etc 11M 2.2M 8.7M 20% /etc
/mfs/home 11M 2.2M 8.7M 20% /home
/mfs/tmp 11M 2.2M 8.7M 20% /tmp
/mfs/var 11M 2.2M 8.7M 20% /var
# top
load averages: 1.42, 0.71, 0.31 up 0 days, 0:04 02:43:13
27 processes: 26 sleeping, 1 on processor
CPU states: 8.9% user, 0.0% nice, 18.8% system, 0.0% interrupt, 72.3% idle
Memory: 10M Act, 1984K Inact, 484K Wired, 3900K Exec, 1800K File, 11M Free
Swap:
PID USERNAME PRI NICE SIZE RES STATE TIME WCPU CPU COMMAND
2612 root 2 0 616K 848K select 0:00 0.00% 0.00% dhclient
1817 nobody 2 0 508K 1148K kqread 0:00 0.00% 0.00% thttpd
1677 root 2 0 480K 1052K select 0:00 0.00% 0.00% zebra
1128 root 2 0 316K 2404K netio 0:00 0.00% 0.00% sshd
2923 twiltziu 2 0 316K 2048K select 0:00 0.15% 0.15% sshd
1848 root 2 0 248K 1716K select 0:05 0.00% 0.00% sshd
1929 root 10 0 248K 900K wait 0:01 0.10% 0.10% sh
1870 root 2 0 204K 900K fifor 0:00 0.00% 0.00% sh
1690 root 10 0 172K 784K wait 0:01 0.05% 0.05% sh
7227 root 28 0 168K 1020K CPU 0:00 0.88% 0.20% top
2930 twiltziu 10 0 152K 760K wait 0:00 0.00% 0.00% sh
2928 root 10 0 144K 732K wait 0:00 0.00% 0.00% sh
1703 root 2 0 136K 1256K kqread 0:04 1.32% 1.32% hslsd
1750 root 2 0 112K 836K kqread 0:00 0.00% 0.00% syslogd
1911 root 10 0 108K 816K nanoslee 0:00 0.00% 0.00% cron
1905 root 2 0 72K 884K kqread 0:00 0.00% 0.00% inetd
1 root 10 0 68K 740K wait 0:00 0.00% 0.00% init
Killing other programs gave up some more memory as well, but none were as
large as ntpd.
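To put numbers on the difference, here is a minimal Python sketch that
parses the two Memory lines from the top output above and prints the change
in each field. The function name parse_top_memory is made up for
illustration, and since top rounds larger values to whole megabytes, the
deltas are only approximate.

```python
import re


def parse_top_memory(line):
    """Parse a top 'Memory:' line into a dict of {label: KiB}.

    Hypothetical helper for illustration; it only understands the
    '<number><K|M|G> <Label>' pairs seen in the output quoted above.
    """
    units = {"K": 1, "M": 1024, "G": 1024 * 1024}
    fields = {}
    for amount, unit, label in re.findall(r"(\d+)([KMG])\s+(\w+)", line):
        fields[label] = int(amount) * units[unit]
    return fields


# Memory lines copied from the two top runs (before and after pkill ntpd).
before = parse_top_memory(
    "Memory: 13M Act, 1984K Inact, 3816K Wired, 4300K Exec, 1400K File, 4732K Free"
)
after = parse_top_memory(
    "Memory: 10M Act, 1984K Inact, 484K Wired, 3900K Exec, 1800K File, 11M Free"
)

# Change per field, in KiB (positive means the value grew).
delta = {label: after[label] - before[label] for label in before}

for label, change in delta.items():
    print(f"{label:6s} {before[label]:6d}K -> {after[label]:6d}K ({change:+d}K)")
```

On these figures, Free grows by roughly 6.4M and Wired shrinks by roughly
3.3M once both ntpd instances are gone.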
Interestingly, about 5 minutes after we killed ntpd, the system locked up.
We're not sure whether that had anything to do with killing the process,
since the node always locked up about 5-10 minutes after we turned it on
anyway.
We don't know whether this might be related to the node not having any free
space to finish bootup properly:
# tail /var/log/messages
Aug 23 02:40:30 cuw cuw_config: Creating pipe /var/run/cuwconf_pipe
Aug 23 02:40:40 cuw /sbin/dhclient-script: reason PREINIT
Aug 23 02:40:44 cuw syslogd: /var/log/daemon: No space left on device
Aug 23 02:40:44 cuw syslogd: /var/log/daemon: No space left on device
Aug 23 02:40:45 cuw /sbin/dhclient-script: reason BOUND
Aug 23 02:40:45 cuw /sbin/dhclient-script: Routers: 192.168.1.1
Aug 23 02:41:04 cuw su: twiltziu to root on /dev/ttyp0
Aug 23 02:42:50 cuw hslsd: send LSU on interface fdb4:542d:dc11:b792:202:6fff:fe01:b792 failed
Aug 23 02:42:50 cuw hslsd: send LSU on interface fdb4:542d:dc11:1461:200:24ff:fec1:1461 failed
Aug 23 02:42:50 cuw hslsd: send LSU on interface fdb4:542d:dc11:1460:200:24ff:fec1:1460 failed
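The "No space left on device" errors from syslogd are consistent with the
boot-time df output, where /var shows 0B available. As a rough way to spot
this condition, the following Python sketch scans df output for filesystems
at or above a capacity threshold. The helper name full_filesystems and the
abridged sample are assumptions for illustration; the sample rows are taken
from the boot-time df listing earlier in this message.

```python
# Abridged copy of the boot-time `df -h` output quoted earlier.
DF_AT_BOOT = """\
Filesystem Size Used Avail Capacity Mounted on
/dev/wd0a 30M 25M 3.4M 87% /
tmpfs 192K 192K 0B 100% /dev
tmpfs 2.2M 2.2M 0B 100% /mfs
/mfs/var 2.2M 2.2M 0B 100% /var
"""


def full_filesystems(df_output, threshold=100):
    """Return (mount point, available) pairs at or above threshold percent.

    Hypothetical helper; assumes the six-column layout shown above.
    """
    full = []
    for line in df_output.splitlines()[1:]:  # skip the header row
        parts = line.split()
        if len(parts) < 6:
            continue  # ignore blank or wrapped lines
        capacity = int(parts[4].rstrip("%"))
        if capacity >= threshold:
            full.append((parts[5], parts[3]))
    return full


for mount, avail in full_filesystems(DF_AT_BOOT):
    print(f"{mount}: {avail} available")
```

Run against the df output taken after pkill ntpd, the same check would
report nothing, since /mfs and the mounts on top of it drop to 20%.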
Daniel and I thought perhaps we could edit the rc.d scripts so that ntpd
does not start at boot, which might give us enough breathing room to
"downgrade" the node to an older version until a solution is found for
nodes with less memory. I realize this isn't terribly useful
development-wise, but it would help the network in that area (since Mike's
house is a gateway).
Tom