SUSE Manager/Osad and jabberd troubleshooting
Open file count exceeded
OSAD clients cannot contact the SUSE Manager Server, and jabberd takes a long time to respond on port 5222.
The maximum number of files that the jabber user can open is lower than the number of connected clients. Every client needs one always-open TCP connection, and each connection consumes one file handle, so once the limit is reached jabberd starts queuing and refusing connections.
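To check whether a process is close to its limit, the open file descriptors can be compared with the soft limit through /proc. The following is a minimal sketch, not an official tool: the function name fd_usage is made up for illustration, it only works on Linux, and reading another user's /proc entries normally requires root.

```shell
# Print "<open fds> <soft nofile limit>" for a given process ID (Linux /proc only).
fd_usage() {
    pid=$1
    # Each entry in /proc/<pid>/fd is one open file descriptor
    open=$(ls "/proc/$pid/fd" | wc -l | tr -d ' ')
    # "Max open files" line: field 4 is the soft limit, field 5 the hard limit
    soft=$(awk '/^Max open files/ { print $4 }' "/proc/$pid/limits")
    echo "$open $soft"
}

# Hypothetical usage as root, assuming the c2s process runs as user jabber:
#   fd_usage "$(pgrep -o -u jabber -x c2s)"
```

If the first number is close to the second, the limit described above is the likely cause.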
Add lines like the following to /etc/security/limits.conf:
jabber soft nofile <#clients + 100>
jabber hard nofile <#clients + 1000>
Substitute <#clients + 100> and <#clients + 1000> according to your setup, for example for 5000 clients:
jabber soft nofile 5100
jabber hard nofile 6000
Explanation: the soft limit is the maximum number of files a single process may open. In the SUSE Manager case the process with the highest consumption is c2s, which opens one connection per client. The 100 additional files accommodate any non-connection files that c2s needs to work correctly. The hard limit is the ceiling up to which a process may raise its own soft limit, so it is set with a larger margin. Note that nofile limits are enforced per process, so the router, s2s and sm processes each get the same allowance rather than sharing one total.
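The arithmetic can be scripted when the client count changes often. This is only a sketch: jabber_limits is a hypothetical helper name, and the margins of 100 and 1000 are simply the ones suggested in this section.

```shell
# Print limits.conf entries for the jabber user, given the number of OSAD clients.
# The margins (100 soft, 1000 hard) are taken from the recommendation in this section.
jabber_limits() {
    clients=$1
    echo "jabber soft nofile $((clients + 100))"
    echo "jabber hard nofile $((clients + 1000))"
}

jabber_limits 5000
```

The output can be reviewed and then added to /etc/security/limits.conf by hand. The new limits only apply to processes started after the change, so jabberd must be restarted.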
jabberd database corruption
After a disk-full error or a disk crash, the jabberd database might be corrupted, and jabberd then fails to start during spacewalk-service start:
Starting spacewalk services...
Initializing jabberd processes...
Starting router done
Starting sm startproc: exit status of parent of /usr/bin/sm: 2 failed
Terminating jabberd processes...
/var/log/messages shows more details:
jabberd/sm: starting up
jabberd/sm: process id is 31445, written to /var/lib/jabberd/pid/sm.pid
jabberd/sm: loading 'db' storage module
jabberd/sm: db: corruption detected! close all jabberd processes and run db_recover
jabberd/router: shutting down
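The log check can be automated, for example in a monitoring script. The sketch below assumes the message text is exactly as shown in the log excerpt above; jabberd_db_corrupt is a hypothetical function name.

```shell
# Exit 0 if the given log file contains the jabberd sm corruption message, non-zero otherwise.
jabberd_db_corrupt() {
    logfile=${1:-/var/log/messages}
    # -q: print nothing, only report the result through the exit status
    grep -q 'db: corruption detected' "$logfile"
}

# Hypothetical usage:
#   jabberd_db_corrupt && echo "jabberd database looks corrupted"
```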
Remove the jabberd database and restart. Jabberd will automatically re-create the database.
spacewalk-service stop
rm -Rf /var/lib/jabberd/db/*
spacewalk-service start
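A more cautious variant is to move the database files aside instead of deleting them, so they remain available for later inspection. This is a sketch and not part of the documented procedure: backup_and_clear_db and the backup directory name are assumptions made for illustration. On the same filesystem, mv is a rename and needs no extra disk space.

```shell
# Move the contents of the jabberd database directory into a backup directory
# and print the backup location. Both paths can be overridden by arguments.
backup_and_clear_db() {
    db_dir=${1:-/var/lib/jabberd/db}
    backup_dir=${2:-/var/lib/jabberd/db-backup-$(date +%Y%m%d%H%M%S)}
    mkdir -p "$backup_dir" || return 1
    for f in "$db_dir"/*; do
        # An empty directory leaves the glob unexpanded; skip that case
        [ -e "$f" ] || continue
        mv "$f" "$backup_dir"/ || return 1
    done
    echo "$backup_dir"
}

# Hypothetical usage, in place of the rm step:
#   spacewalk-service stop
#   backup_and_clear_db
#   spacewalk-service start
```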
An alternative is to switch jabberd to a different database backend (SQLite in the commands that follow), but SUSE Manager does not ship drivers for it:
rcosa-dispatcher stop
rcjabberd stop
cd /var/lib/jabberd/db
rm *
cp /usr/share/doc/packages/jabberd/db-setup.sqlite .
sqlite3 sqlite.db < db-setup.sqlite
chown jabber:jabber *
rcjabberd start
rcosa-dispatcher start
Cloned clients get disconnected repeatedly in a hard-to-predict pattern.
OSAD requires every client to have different credentials in /etc/sysconfig/rhn/osad-auth.conf. As soon as two clients have the same file they conflict with each other, and unfortunately this is reported very poorly in the logs.
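To confirm that clones share credentials, the osad-auth.conf files can be compared by checksum. The sketch below assumes the files have already been copied from the clients to one machine (one file per client, paths without spaces); find_duplicate_auth is a hypothetical helper name.

```shell
# Print one line per group of files with identical content, listing the file names.
find_duplicate_auth() {
    sha256sum "$@" | awk '
        { count[$1]++
          files[$1] = (files[$1] == "" ? $2 : files[$1] " " $2) }
        END { for (h in count) if (count[h] > 1) print files[h] }'
}

# Hypothetical usage, with files collected as <hostname>.conf in the current directory:
#   find_duplicate_auth *.conf
```

Each output line lists clients whose credentials are identical and therefore conflict.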
Update the server and all client tools to the latest maintenance update and wait 30 minutes. Updated clients detect this issue and recover from it automatically.