
Re: [Bigsister-general] bigsister performance


On Mon, 2005-10-17 at 21:36 +0200, Thomas Aeby wrote:
> On Mon, 2005-10-17 at 09:19 +0200, Rob Verduijn wrote:
> > Because bigsister wants to report all these tests she starts working so
> > hard that she consumes all the resources on the server grinding it down
> > to a halt.
> > Only a cold boot of the system can save her.
> 
> Hmh, just to make sure I understand correctly: Is this the Big Sister 
> server crashing or is it so many alarms going off (each within its own
> process ...), that the system crashes when the process table gets full
> and/or is out of memory?

This is a bit difficult to answer since the console stops responding. I
will keep an eye on bs to see what memory and other resources are in use.
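
Since the console is gone by the time the box locks up, one way to keep that eye on it would be to log a few numbers to disk at short intervals and read the last lines after the reboot. Below is a minimal sketch, not anything that ships with Big Sister: it assumes a Linux /proc filesystem, and the function names, the chosen fields and the 10 second delay are all made up for illustration.

```python
import os
import time


def parse_meminfo(text):
    """Return {field: value} from /proc/meminfo-style text (values mostly in kB)."""
    values = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        parts = rest.split()
        if parts and parts[0].isdigit():
            values[name.strip()] = int(parts[0])
    return values


def sample():
    """Collect one sample of load, process count and free memory (Linux only)."""
    with open("/proc/meminfo") as f:
        mem = parse_meminfo(f.read())
    # Every running process shows up as a numeric directory under /proc.
    nprocs = sum(1 for name in os.listdir("/proc") if name.isdigit())
    load1, _, _ = os.getloadavg()
    return {
        "load1": load1,
        "procs": nprocs,
        "MemFree": mem.get("MemFree"),
        "SwapFree": mem.get("SwapFree"),
    }


def log_samples(path, count, delay=10):
    """Append `count` samples to `path`, one every `delay` seconds (assumed value)."""
    with open(path, "a") as out:
        for _ in range(count):
            stamp = time.strftime("%Y-%m-%d %H:%M:%S")
            out.write(f"{stamp} {sample()}\n")
            out.flush()
            os.fsync(out.fileno())  # push to disk so the line survives a cold boot
            time.sleep(delay)
```

Calling something like `log_samples("/var/tmp/bs-watch.log", count=8640)` (the path is hypothetical) would cover about a day. If the process count or swap usage climbs just before the last entry, that would at least narrow down which of the guesses above applies.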

> As far as my experience goes, bbd/bsmon won't take a system down,
> since they are just single-threaded processes. If they get hit by a
> huge number of messages they will just process them as fast as they
> can, dropping messages if they are too slow. I don't actually see
> how they would hurt your system (unless you've got one of those systems
> that stop working under heavy load - seen that with rather old Linux
> kernels and/or megaraid RAID controller, for instance).

It's installed on a machine with a 3ware sata raid controller.
Running suse 9.2

> 
> > I need a solution for this problem.
> 
> Provided that the problem *is* related to a huge number of alarms
> going off, what about suppressing them using the "check" argument?

I've already configured check in bb_event_generator for everything that
has a dependency, but if the main router stays up and the entire bloody
ADSL WAN network goes down, I'm screwed: I'm going to get tons of ICMP
ping failures.
I tried delaying the alarms for 5 minutes, which means fewer failures in
total, but not zero.
(I'm not exactly thrilled about the service our WAN provider gives us,
but this is a €€€ decision from above, and I have no say in it.)

> 
> If not, I cannot judge from here what actually happens. Is there any
> chance you get more information? Out of memory (well, shouldn't bring
> the machine down)? Out of processes (shouldn't stop open shell sessions
> from accepting input)? I/O problems (probably a hardware problem, then)?
> No hints in syslog?

I'm afraid I have not found any hints in the syslog.
I'm going to dig through the older ones, but this can take some time.

One thing that might help:
I've got 8 uxmon-asroot files. I did this to spread the ping tests over
a bigger time interval so that the 400 clients don't all get pinged at
the same time.
(There were a lot more, and I am now waiting to see if this helps to
prevent crashes.)
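
To make the idea concrete, here is a rough sketch of the arithmetic behind splitting the hosts over 8 groups. It is only an illustration in Python, not uxmon configuration: the host names are invented, and the 300 second cycle is an assumption, since the real test interval is not stated here.

```python
def stagger(hosts, groups=8, interval=300):
    """Split hosts into `groups` batches, each with its own start offset in seconds.

    `interval` is the assumed length of one full test cycle.
    """
    # Round-robin slicing keeps the batches the same size (give or take one host).
    batches = [hosts[i::groups] for i in range(groups)]
    # Spread the batch start times evenly over the cycle.
    return [(interval * i // groups, batch) for i, batch in enumerate(batches)]


# Hypothetical host names standing in for the 400 clients.
hosts = [f"client{n:03d}" for n in range(400)]
schedule = stagger(hosts)

for offset, batch in schedule:
    print(f"start +{offset:>3}s: {len(batch)} hosts, first is {batch[0]}")
```

With these numbers each group holds 50 hosts and starts roughly 37 seconds after the previous one, so a WAN outage would at worst produce 50 failures per step instead of 400 in one burst.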

I'm also going to build a new bigsister server to make sure it's not a
hardware thingy (that was on my to-do list anyway).

Regards
Rob

> 
> BTW: What system are we talking about?
> 
> Best regards,
> Tom
> ----------------------------------------------------------------------------
> Thomas Aeby, Kirchweg 52, 1735 Giffers, Switzerland, Tel: (+41)264180040
> Internet: suppressed                       PGP public key available
> ----------------------------------------------------------------------------
> 
> 


