[IMC-Tech] Re: why does the site stall all the time?

Zachary C. Miller zach at chambana.net
Fri Nov 4 10:33:31 CST 2005


Danielle Chynoweth wrote:
> with folks coming to the barnraising, we are hearing lots that they can't 
> access the site (not just last night, but often).  i have seen this prob 
> for months.  what's the reason?  is there someone we can hire to fix?  is 
> it little or big?

I'm Cc'ing this answer to Danielle's question to imc-tech and
chambana-future for everyone's general knowledge.

There is a bug somewhere in Dada that has been eating our server alive
for about a year now. We're 90% sure it is in Dada, but the nature of
the bug is that we can't tell what software it is in; it could be in
anything hosted on any of our 80-plus websites. I haven't had the time
to figure it out. Arun has spent some time looking into it and hasn't
been able to figure it out either.

The problem is that certain web hits cause the apache process serving
them to go into an infinite loop, consuming CPU time but never
returning anything to the browser. The hits never get logged to the
server log because they never run to completion, so it is impossible
for us to tell from the logs what software is causing the problem.
Apache operates as a collection of parallel server processes. One of
these hits ties up one process, and the browser accessing it seems to
hang; hitting reload generally reaches a different server process and
goes to completion. The "stuck" processes build up like plaque,
gradually slowing the entire server down as they consume all available
CPU time. They do not consume extra memory; it's just some kind of
busy-wait infinite loop. If a user goes crazy hitting reload over and
over and happens to be unlucky enough to keep causing stuck processes,
then the whole server can crash. We have a cron job that runs every 5
hours and restarts the apache server, which gets rid of the plaque of
stuck processes and restores normal server functionality for a while
(but for the 15 seconds that the apache server is restarting, all hits
to the webserver are rejected).
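
Because the stuck processes burn CPU without ever finishing, one way
to spot them between restarts is to look at accumulated CPU time per
process. The following is a minimal Python sketch of that idea, not
something already running on the server. It assumes a Linux procps
`ps` whose TIME column looks like [DD-]HH:MM:SS, a worker process
name containing "apache", and a 300 second threshold; the function
names and the threshold are made up for illustration.

```python
import subprocess


def parse_cpu_seconds(cputime):
    """Convert a ps TIME field ([DD-]HH:MM:SS) into whole seconds."""
    days = 0
    if "-" in cputime:
        day_part, cputime = cputime.split("-", 1)
        days = int(day_part)
    parts = [int(p) for p in cputime.split(":")]
    while len(parts) < 3:  # tolerate MM:SS
        parts.insert(0, 0)
    hours, minutes, seconds = parts
    return ((days * 24 + hours) * 60 + minutes) * 60 + seconds


def find_stuck_workers(ps_output, command="apache", max_cpu_seconds=300):
    """Return (pid, cpu_seconds) for matching processes over the threshold.

    ps_output is the text of `ps -eo pid,time,comm`. The command name
    and threshold are assumptions; adjust them for the real server.
    """
    stuck = []
    for line in ps_output.splitlines():
        fields = line.split()
        if len(fields) < 3 or not fields[0].isdigit():
            continue  # skip the header and malformed lines
        pid, cputime, comm = fields[0], fields[1], fields[2]
        if command not in comm:
            continue
        cpu = parse_cpu_seconds(cputime)
        if cpu > max_cpu_seconds:
            stuck.append((int(pid), cpu))
    return stuck


if __name__ == "__main__":
    out = subprocess.run(
        ["ps", "-eo", "pid,time,comm"],
        capture_output=True, text=True, check=True,
    ).stdout
    for pid, cpu in find_stuck_workers(out):
        print(f"pid {pid} has used {cpu}s of CPU")
```

A normal worker that serves many short requests accumulates CPU time
slowly, so the threshold would need tuning before this list could be
trusted. It only reports; deciding whether to kill the listed
processes is a separate step.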

It is possible that a dada upgrade would fix things. We have a TON of
untracked local patches made to our software (not by me!). It would
probably be fruitful for someone to talk to the dada developer about
this and find out if it is a known problem with old versions. I have
never been involved in dada installation or maintenance so I am not
the person to do this. I know nothing about what local patches we have
applied. I know nothing about the current state of Dada. The reason we
think it is Dada is that all the other websites run either
well-maintained packaged software that gets timely updates or
non-embedded CGI scripts, which could not break the webserver like
this. Or so I believe. I could be wrong. There could be some other
rogue or custom software on the site that is causing this
problem. Also it seems, anecdotally, that most/all of the people
reporting problems with hung page loads report them about the ucimc
site and not about other sites hosted on the server (they report the
resulting slowness for other sites, but not the infinite hangs).

It is possible that an upgrade to Apache2 would fix things. This is a
crapshoot. But the fact that a bug in some PHP code somewhere can
break the server this badly is itself definitely a bug in PHP or
Apache. This problem has persisted across many different versions of
apache and the Linux kernel. Even if I set the server timeout for
requests very low, these processes don't die. The server should be
able to protect itself from rogue, poorly written PHP code, but this
bug seems to sidestep those protections. My hope is that even if the
bug is in Dada, an upgrade to the latest Apache2 might give us server
code robust enough to keep that bug from taking down the whole server.
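
To illustrate the kind of protection meant here: a request timeout
only helps if the looping code ever checks it, whereas a CPU-time
limit enforced by the kernel does not depend on the process
cooperating. The Python sketch below shows that mechanism on a
throwaway child process using setrlimit. It is an illustration of the
general Unix facility only, not a description of how our Apache or
PHP is configured; the function name and the 1 second limit are
hypothetical.

```python
import resource
import signal
import subprocess
import sys


def run_with_cpu_limit(argv, cpu_seconds):
    """Run argv with a kernel-enforced CPU-time limit; return its exit code.

    When the child exceeds the soft limit the kernel sends it SIGXCPU,
    which terminates it by default, so a busy loop cannot run forever.
    """
    def apply_limit():
        # Runs in the child just before exec. Never raise the hard limit
        # above whatever the parent already allows.
        _, hard = resource.getrlimit(resource.RLIMIT_CPU)
        new_hard = cpu_seconds + 1
        if hard != resource.RLIM_INFINITY:
            new_hard = min(new_hard, hard)
        new_soft = min(cpu_seconds, new_hard)
        resource.setrlimit(resource.RLIMIT_CPU, (new_soft, new_hard))

    return subprocess.run(argv, preexec_fn=apply_limit).returncode


if __name__ == "__main__":
    # Stand-in for a runaway request: a pure busy loop.
    busy_loop = [sys.executable, "-c", "while True: pass"]
    code = run_with_cpu_limit(busy_loop, 1)
    print("killed by SIGXCPU:", code == -signal.SIGXCPU)
```

A limit like this counts CPU seconds, not wall-clock time, which
matches the busy-wait behavior described above: a process that is
merely waiting on a slow client would not be affected.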

The transition to Apache2 needs to be done with some guidance from
me. If someone with good Linux and Apache sysadmin experience is hired
to deal with it, I can give them some instructions and oversight and
they could handle it. This task is probably a 6-8 hour one.

The upgrade of Dada needs to be done with some guidance from either
Dan or Clint or Arun (and I have no idea which of them would be
available for such things). It's going to be a HUGE headache, but it
really has to be done eventually. The UCIMC can't afford to have its
software unmaintained. Someone has to step up to maintain it on a
consistent basis and to actually properly track our local patches. We
need to make sure that people stop going in and hacking the code
without carefully tracking the changes. I imagine that David Gehrig is
the right person to take on this job if he chooses to accept it. This
task could well be a 20-40 hour one.


> 
> thanks for all the equipment and support you offer the IMC.
> 
> - d
> 
> p.s. i saw your e-mail about your moving :( <-- frowny face and groogroo 
> and it has not gone ignored.  i doubt you will get much response til after 
> the barnraising, but that's not cause folks don't care.  FYI.
> 
> -- 
> | Danielle Chynoweth
> ----------------------------------------------------------------------
> | Come to our radio station Barnraising and community wireless       |
> | congregation!  November 11 to 13, Champaign-Urbana, Illinois.      |
> | Urbana Independent Media Center * Prometheus Radio * WRFU 104.5 FM |
> | Check http://www.prometheusradio.org/urbana/ for more details!     |
> ----------------------------------------------------------------------
> 

-- 
Zachary C. Miller - @= - http://zach.chambana.net/
IMSA 1995 - UIUC 2000 - Just Another Leftist Muppet - Ya Basta!
 Social Justice, Community, Nonviolence, Decentralization, Feminism,
 Sustainability, Responsibility, Diversity, Democracy, Ecology

