On Thursday, November 17, 2005 7:02 AM, suppressed wrote: <snip>
Well, I just use the "top" command and look at the total number of processes. I know that if the number of processes is up at around 350, then MaxClients has been reached. Incidentally, I use mod_log_sql to send the Apache log to a MySQL database, so for every Apache process a MySQL process is also launched, hence the roughly 350 processes in total.

I wrote a script yesterday that runs via cron. It checks whether a certain ic page returns a certain string. If the page does not return properly, it tries again, up to 5 times. If after 5 attempts it still doesn't get the page back, it restarts Interchange and e-mails an address to alert us that the site was restarted. This way we see the problem before MaxClients is reached, since I have figured out that MaxClients is not reached until some undetermined amount of time after the site stops processing requests.

So have you found/confirmed that the client count does keep ticking upwards once the site stops processing requests?

No, I have not done this. How would I go about finding this out?
Alternatively, just use "ps -elf | grep -c httpd" to see only the number of Apache processes. On my server this is exactly 150 once MaxClients is reached (i.e. my MaxClients setting). Actually, now that I come to think of it, I think it does fluctuate between, say, 147 and 150. So perhaps the Apache processes are dying off after all, and it's just that Interchange is consistently feeding it requests (presumably because the original ones failed), i.e. a never-ending loop. Yes, actually, this makes a lot of sense, doesn't it? To quote Kevin:
So could the situation be that once MaxClients has been reached, Interchange starts to get timeouts from Apache and so continually resends requests? Actually, the more I think about this, the more I realise I could do with understanding a little more about the interaction, communication protocol and communication sequence between Apache, mod_interchange and Interchange itself.

Requests that are waiting in the queue will look as if they are hanging. Stopping and resubmitting the requests will probably just make matters worse.
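A caveat on counting with "ps -elf | grep -c httpd": the "grep httpd" process itself usually appears in the ps listing, so the count can be one higher than the real number of Apache processes. The shell trick "grep -c '[h]ttpd'" avoids this, because the grep's own command line no longer contains the literal string "httpd". Below is a minimal Perl sketch that counts by exact command name instead. It assumes a ps that supports "-e -o comm=" (procps on Linux does) and that the binaries are named "httpd" and "mysqld". Apache is "apache2" on some systems, and Interchange runs under perl, so its short command name may differ. The sub names are hypothetical.

```perl
use strict;
use warnings;

# Count how many entries in @commands are exactly $name.
# Kept separate from the ps call so it can be checked without a live system.
sub count_named {
    my ($name, @commands) = @_;
    return scalar grep { $_ eq $name } @commands;
}

# List one bare command name per process and count those called $name.
# Assumption: ps supports "-e -o comm=" (procps on Linux does); the
# trailing "=" suppresses the header line.
sub count_processes {
    my ($name) = @_;
    my @commands = `ps -e -o comm=`;
    chomp @commands;
    s/^\s+|\s+$//g for @commands;    # trim any column padding
    return count_named($name, @commands);
}

unless (caller) {
    # Assumed binary names; adjust for your system.
    for my $name (qw(httpd mysqld)) {
        printf "%-8s %d\n", $name, count_processes($name);
    }
}
```

Because exact matching is used, names such as "httpd.worker" are not counted as "httpd"; relax the comparison if your Apache binary is named differently.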
Kevin (or someone else), would you mind providing a brief overview of the communication sequence and "handshaking" between Apache, mod_interchange and the Interchange daemon?
<snip>

Fantastic, thanks Ron, just implemented it! It occurs to me that it may be useful to call a few system commands before restarting Interchange, and to add the output from these commands to the alert e-mail.

No problem, I needed something to keep the site up at night and when I was away from my computer; this should help out.

e.g.

#Number of connections to Apache just before Interchange is restarted
netstat -nt | grep :80 | wc -l
netstat -nt | grep :443 | wc -l

#Number of httpd, interchange and mysql processes running just before Interchange is restarted
ps -elf | grep -c httpd
ps -elf | grep -c interchange
ps -elf | grep -c mysqld

#Number of connections each IP address has to the server just before Interchange is restarted
netstat -ntu | awk '{print $5}' | cut -d: -f1 | sort | uniq -c | sort -n

Although I could just about work out how to insert these into your script, it would involve educated guesswork and no doubt ugly coding. So I would be grateful if you could insert these system calls, or any other similar or additional commands that you think may provide useful information.

You could modify the checkic.html page to also do db lookups to verify that the db connection is still valid.

If you think this would be useful, would you mind also adding this, as my Perl isn't up to it? Thanks for your help!

I'll give it a try. I'm interested to see these values when the server is going down. I think they may be a bit off, though, since the server will be down or on its way down when this is run; still, it may help.
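Here is a minimal sketch of how the commands listed above could be run from the Perl check script and gathered into one block of text for the alert e-mail. The variable and sub names are hypothetical. The sketch only builds the report text and leaves however the script actually sends its mail untouched.

```perl
use strict;
use warnings;

# The commands suggested above, to be run just before Interchange is restarted.
# q{} stops Perl from interpolating the awk field reference $5.
my @DIAGNOSTIC_COMMANDS = (
    q{netstat -nt | grep :80 | wc -l},
    q{netstat -nt | grep :443 | wc -l},
    q{ps -elf | grep -c httpd},
    q{ps -elf | grep -c interchange},
    q{ps -elf | grep -c mysqld},
    q{netstat -ntu | awk '{print $5}' | cut -d: -f1 | sort | uniq -c | sort -n},
);

# Run each command through the shell and return a plain-text report:
# the command line, then whatever it printed (stdout and stderr).
sub collect_diagnostics {
    my @commands = @_;
    my $report = '';
    for my $cmd (@commands) {
        my $output = `($cmd) 2>&1` // '';
        $output = "(no output)\n" if $output eq '';
        $report .= "\$ $cmd\n$output\n";
    }
    return $report;
}

# Example: print the report; the check script would instead append
# it to the body of the alert e-mail it already sends.
print collect_diagnostics(@DIAGNOSTIC_COMMANDS) unless caller;
```

Wrapping each command in "( ... ) 2>&1" captures error messages from any part of the pipeline. For example, a missing netstat would show up in the e-mail rather than vanishing.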
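######## Snippet
use LWP::UserAgent;

# $url (the check page) is set earlier in the script, not shown here
my $browser = LWP::UserAgent->new;
$browser->timeout(30);    # give up after 30 seconds with no activity

my $up = 0;
for my $attempt (1 .. 5) {
    my $response = $browser->get($url);
    # the check page returns the string "UP" when Interchange is answering
    if ($response->is_success && $response->content =~ m/UP/) {
        $up = 1;
        last;    # page is fine, no further attempts needed
    }
}
##########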
Ron, does this code mean that it takes your script 2.5 minutes to recognise
that the server is down (i.e. 5 attempts x a 30-second timeout)? If so, would it be
better to reduce the timeout to about 5 seconds? Hopefully 5 tries with
a 5-second timeout shouldn't cause any false alarms. That way we would spot
the server going down more quickly, and the various system commands I have
suggested inserting may then give more useful information. Perhaps we can
even get away with less than a 5-second timeout; what do you think?
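The arithmetic behind the question is as follows. If every attempt runs into the full timeout, the worst case is attempts x timeout: 5 x 30 s = 150 s, versus 5 x 5 s = 25 s with a 5-second timeout. One caveat is that LWP::UserAgent's timeout is an inactivity timeout, not a cap on total request time: a request is abandoned only after that many seconds with no activity on the connection. A server that trickles data slowly could therefore take longer. Whether 5 seconds is safe depends on how long the check page normally takes under load.

The sketch below uses hypothetical sub names. It makes the number of attempts and the timeout parameters, and takes the fetch as a code reference so the retry logic can be exercised without a network.

```perl
use strict;
use warnings;

# Worst-case seconds before the check gives up, assuming every attempt
# runs into the full timeout (and ignoring any pause between attempts).
sub worst_case_seconds {
    my ($attempts, $timeout) = @_;
    return $attempts * $timeout;
}

# Call $fetch up to $attempts times; return 1 as soon as the content it
# returns contains "UP", or 0 if no attempt succeeds.
# $fetch is a code reference returning page content, or undef on failure.
sub site_is_up {
    my ($fetch, $attempts) = @_;
    for (1 .. $attempts) {
        my $content = $fetch->();
        return 1 if defined $content && $content =~ /UP/;
    }
    return 0;
}

# Build a fetcher around LWP::UserAgent with the given timeout.
# LWP is only loaded when this is actually called.
sub make_lwp_fetcher {
    my ($url, $timeout) = @_;
    require LWP::UserAgent;
    my $browser = LWP::UserAgent->new;
    $browser->timeout($timeout);
    return sub {
        my $response = $browser->get($url);
        return $response->is_success ? $response->content : undef;
    };
}
```

For example, site_is_up(make_lwp_fetcher($url, 5), 5), with $url being the check page the script already uses, would give up after roughly 25 seconds in the worst case.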
BTW, I have this running via cron at a 1-minute interval; are you doing the same? Thanks
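With a 1-minute cron interval and a check that can take up to 2.5 minutes in the worst case, one run could still be going when the next one starts. Two overlapping runs could then both decide to restart Interchange. One common guard is an exclusive, non-blocking flock on a lock file, sketched below. The lock file path and sub name are hypothetical examples.

```perl
use strict;
use warnings;
use Fcntl qw(:flock);

# Hypothetical lock file location; any path writable by the cron user works.
my $LOCK_FILE = '/tmp/checkic.lock';

# Try to take an exclusive lock without waiting. Returns the open
# filehandle (keep it in scope for as long as the lock is needed),
# or undef if another run already holds the lock.
sub acquire_lock {
    my ($path) = @_;
    open my $fh, '>>', $path or die "Cannot open $path: $!";
    return flock($fh, LOCK_EX | LOCK_NB) ? $fh : undef;
}

if (!caller) {
    my $lock = acquire_lock($LOCK_FILE);
    unless ($lock) {
        print "Previous check still running; skipping this run.\n";
        exit 0;
    }
    # ... run the page check, diagnostics and restart here ...
}
```

The lock is released automatically when the filehandle is closed or the script exits, so a crashed run does not leave a stale lock behind.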
_______________________________________________ interchange-users mailing list suppressed http://www.icdevgroup.org/mailman/listinfo/interchange-users