On Mon, 17-10-2005 at 21:36 +0200, Thomas Aeby wrote:
> On Mon, 2005-10-17 at 09:19 +0200, Rob Verduijn wrote:
> > Because bigsister wants to report all these tests she starts working so
> > hard that she consumes all the resources on the server grinding it down
> > to a halt.
> > Only a cold boot of the system can save her.
>
> Hmh, just to make sure I understand correctly: Is this the Big Sister
> server crashing or is it so many alarms going off (each within its own
> process ...), that the system crashes when the process table gets full
> and/or is out of memory?

This is a bit difficult to answer, since the console stops responding.
I will keep an eye on bs to see what memory and/or resources are used.

> As far as my experience goes, bbd/bsmon won't take a system down,
> since they are just single-threaded processes. If they get hit by a
> huge number of messages they will just process them as fast as they
> can, dropping messages if they are too slow. I don't actually see
> how they would hurt your system (unless you've got one of those systems
> that stop working under heavy load - seen that with rather old Linux
> kernels and/or megaraid RAID controller, for instance).

It's installed on a machine with a 3ware SATA RAID controller, running
SUSE 9.2.

>
> > I need a solution for this problem.
>
> Provided that the problem *is* related to a huge number of alarms
> going off, what about suppressing them using the "check" argument?

I've already configured check in bb_event_generator for everything that
has a dependency, but if the main router stays up and the entire bloody
ADSL WAN network goes down I'm screwed: I'm going to get tons of ICMP
ping failures. I tried delaying the alarms for 5 minutes, which means
fewer failures in total, but not zero.
(I'm not exactly thrilled about the service our WAN provider gives us,
but this is a €€€ decision from above, and I've got 0 to say about it.)

>
> If not, I cannot judge from here what actually happens. Is there any
> chance you get more information? Out of memory (well, shouldn't bring
> the machine down)? Out of processes (shouldn't stop open shell sessions
> from accepting input)? I/O problems (probably a hardware problem, then)?
> No hints in syslog?

I'm afraid I have not found any hints in the syslog. I'm going to dig
through the older ones, but this can take some time.

One thing that might help: I've got 8 uxmon-asroot files. I did this to
spread the ping tests over a bigger time interval, so that not all of the
400 clients get pinged at the same time. (There were a lot more, and I am
now waiting to see if this helps to prevent crashes.)

I'm also going to build a new bigsister server to make sure it's not a
hardware thingy (was a to-do anyway).

Regards
Rob

>
> BTW: What system are we talking about?
>
> Best regards,
> Tom
> ----------------------------------------------------------------------------
> Thomas Aeby, Kirchweg 52, 1735 Giffers, Switzerland, Tel: (+41)264180040
> Internet: suppressed                         PGP public key available
> ----------------------------------------------------------------------------