On Tue, 2005-10-18 at 17:34 +0200, Rob Verduijn wrote:

> This is a bit difficult to answer since the console stops responding, I
> will keep an eye on bs to see what memory and/or resources are used.

If you could live without alarms for a day or two (you have said this is
happening quite often :-)), would it be an option to switch off alarming
entirely for the ping tests in order to see if this is the problem?

Actually, Big Sister *is* forking the alarm sending command (usually
sendmail) regardless of how many alarms are going to go off. So, if we are
talking about a really large number of alarms, this might well get the
system into trouble. And I think Big Sister will be quicker at forking than
sendmail is at sending mails. So, memo for Tom: put a limit on the number of
alerts that are sent out within a given time (who will read 1000 alert
messages, anyway?)

> It's installed on a machine with a 3ware sata raid controller.
> Running suse 9.2

Ok, that's kernel 2.6.8, I think. I'm pretty sure the I/O locking problem
was solved long before that. Maybe you could give

    echo 2 > /proc/sys/vm/overcommit_memory

a try (on 2.6 kernels, 2 selects strict accounting, whereas 1 makes the
kernel always overcommit) in order to prevent the kernel from committing
more memory to processes than is available, and possibly get Big Sister in
trouble rather than the underlying system.

> I've already configured check in bb_event_generator for everything that
> has a dependency, but if the main router stays up

I see.

> (I'm not exactly thrilled about the service our wan provider gives us,
> but this is a €€€ decision from above, and I got 0 to say about it)

Well, we are all committed to saving money ... :-( After all, no matter how
badly the monitored entities behave, the monitoring application should stand
and monitor.

> I'm afraid I have not found any hints in the syslog.

Not even the (in)famous OOM killer ... that's bad news.
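As an aside on the alert limit noted in the memo above: a minimal sketch of what such a limit could look like, written in Python rather than taken from Big Sister's own code. The class name AlertLimiter, its parameters, and the numbers used are all hypothetical; nothing here reflects how Big Sister actually sends alarms.

```python
import time
from collections import deque


class AlertLimiter:
    """Allow at most max_alerts within any sliding window of window_seconds.

    Hypothetical helper for illustration only, not part of Big Sister.
    """

    def __init__(self, max_alerts, window_seconds, clock=time.monotonic):
        self.max_alerts = max_alerts
        self.window_seconds = window_seconds
        self.clock = clock      # injectable so the limiter can be tested
        self.sent = deque()     # timestamps of alerts that were let through
        self.suppressed = 0     # total number of alerts dropped so far

    def allow(self):
        """Return True if an alert may be sent now, False if it is dropped."""
        now = self.clock()
        # Forget alerts that have fallen out of the window.
        while self.sent and now - self.sent[0] >= self.window_seconds:
            self.sent.popleft()
        if len(self.sent) < self.max_alerts:
            self.sent.append(now)
            return True
        self.suppressed += 1
        return False


if __name__ == "__main__":
    # Assumed numbers: at most 20 alerts per 5 minutes, 1000 alarms at once.
    limiter = AlertLimiter(max_alerts=20, window_seconds=300)
    decisions = [limiter.allow() for _ in range(1000)]
    print(decisions.count(True), "sent,", limiter.suppressed, "suppressed")
```

Only an alarm for which allow() returns True would go on to fork the mail command; the suppressed count could be reported in a single summary message instead of one mail per alarm.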

> One thing that might help :
> I've got 8 uxmon-asroot files, I did this to spread the ping tests over
> a bigger time interval so that not all the 400 clients got pinged at
> the same time.

I see, are they all mainly executing the built-in ping test? I'm asking
because I'm searching for something that could consume resources ...

> I'm also going to build a new bigsister server to make sure it's not a
> hardware thingy (was a to do anyway).

Ok, this will certainly take some time and does not necessarily fix the
problem, I'm afraid.

Best regards,
Tom

----------------------------------------------------------------------------
Thomas Aeby, Kirchweg 52, 1735 Giffers, Switzerland, Tel: (+41)264180040
Internet: suppressed                   PGP public key available
----------------------------------------------------------------------------

_______________________________________________
Bigsister-general mailing list
suppressed
https://lists.sourceforge.net/lists/listinfo/bigsister-general