Difference between revisions of "SUSE Manager/Osad and jabberd troubleshooting"

From MicroFocusInternationalWiki
(Cloned hosts section removed: it has been obsolete for many months now)
 
Cloned hosts


Symptoms


Cloned clients get disconnected repeatedly in a hard-to-predict pattern.
 
 
Cause


OSAD requires every client to have different credentials in /etc/sysconfig/rhn/osad-auth.conf. As soon as two clients have the same file they conflict and, unfortunately, this is very poorly reported by our logs.
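Because the logs say little, one way to spot the conflict is to compare the files directly. The sketch below assumes that each client's osad-auth.conf has already been copied into a single directory, one file per client named after its host; that layout and the function name find_duplicate_osad_auth are hypothetical and only serve as an illustration.

```bash
#!/usr/bin/env bash
# Hypothetical helper: given a directory holding one copy of
# osad-auth.conf per client (file name = host name, no spaces),
# print the names of files whose contents match another file's.
find_duplicate_osad_auth() {
    local dir="$1"
    # md5sum prints "<32-char hash>  <name>"; sorting groups equal
    # hashes, and "uniq -w32 -D" keeps every line whose hash repeats.
    ( cd "$dir" && md5sum -- * ) | sort | uniq -w32 -D | awk '{print $2}'
}
```

For example, after copying each client's file to /tmp/osad-auth/<hostname>, running find_duplicate_osad_auth /tmp/osad-auth prints every host that shares its credentials with at least one other host.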
 
 
Cure
 
 
Update server and all client tools to the latest maintenance update and wait 30 minutes. Updated clients will detect and automatically heal from this issue.
 
 
  
 

Revision as of 05:05, 31 May 2016


Typical issues

Open file count exceeded

Symptoms

OSAD clients cannot contact the SUSE Manager Server, and jabberd takes a long time to respond on port 5222.
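To confirm the slow response, you can time a plain TCP connection to port 5222 from a client. The following is a sketch under stated assumptions. check_port is a hypothetical helper that relies on bash's /dev/tcp redirection and the coreutils timeout command. It only checks that the port accepts a TCP connection, not that jabberd completes an XMPP session.

```bash
#!/usr/bin/env bash
# Hypothetical helper: time a plain TCP connect to jabberd's client port.
# Relies on bash's /dev/tcp redirection and coreutils "timeout".
check_port() {
    local host="${1:-localhost}" port="${2:-5222}" secs="${3:-5}"
    local start end
    start=$(date +%s)
    # Open (and immediately drop) a TCP connection in a child bash,
    # which timeout kills if the connect takes longer than $secs.
    if timeout "$secs" bash -c "exec 3<>/dev/tcp/$host/$port" 2>/dev/null; then
        end=$(date +%s)
        echo "connected to $host:$port in $((end - start))s"
    else
        echo "no connection to $host:$port within ${secs}s"
        return 1
    fi
}
```

For example, check_port your-suse-manager-host 5222 10 (the host name is a placeholder) either reports the connect time or fails after ten seconds.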

Cause

The maximum number of files that the jabber user can open is lower than the number of connected clients. Every client needs one always-open TCP connection, and each connection consumes one file descriptor, so jabberd starts queuing and refusing connections.
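On a running server you can check this cause by comparing how many descriptors c2s holds with its soft limit. fd_usage below is a hypothetical helper. It reads the Linux files /proc/<pid>/fd and /proc/<pid>/limits, so it must run as root (or as jabber) to inspect the c2s process.

```bash
#!/usr/bin/env bash
# Hypothetical helper (Linux only): print how many file descriptors a
# process holds and its soft "Max open files" limit.
fd_usage() {
    local pid="$1" open soft
    # Every entry in /proc/<pid>/fd is one open descriptor; TCP sockets
    # to clients are counted here too.
    open=$(ls -1 "/proc/$pid/fd" | wc -l)
    # The limits row reads: Max open files <soft> <hard> files
    soft=$(awk '/^Max open files/ {print $4}' "/proc/$pid/limits")
    echo "open=$open soft=$soft"
}
```

For example, fd_usage "$(pidof c2s)" assumes a single c2s process. An open count close to the soft value suggests the limit is the bottleneck.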

Cure

Add lines like the following to /etc/security/limits.conf:

   jabber soft nofile <#clients + 100>
   jabber hard nofile <#clients + 1000>

Substitute <#clients + 100> and <#clients + 1000> according to your setup. For example, for 5000 clients:

   jabber soft nofile 5100
   jabber hard nofile 6000

Explanation: the nofile limits are configured for the jabber user but enforced per process. The soft limit is the number of open files a process may actually use. The hard limit is the ceiling up to which a process may raise its own soft limit. In SUSE Manager's case the highest-consuming process is c2s, which keeps one connection open per client. The 100 additional files accommodate any non-connection files that c2s needs to work correctly, and the higher hard limit leaves room to raise the soft limit. The router, s2s and sm processes each count their own open files against the same per-process limits.
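The arithmetic above (clients + 100 for the soft limit, clients + 1000 for the hard limit) can be sketched as a small generator for the limits.conf lines. jabber_limits is a hypothetical helper name and not a SUSE Manager tool.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print limits.conf lines for a given client count.
jabber_limits() {
    local clients="$1"
    if ! [[ "$clients" =~ ^[0-9]+$ ]]; then
        echo "usage: jabber_limits <number-of-clients>" >&2
        return 1
    fi
    # Soft limit: one descriptor per client connection in c2s, plus 100 spare.
    echo "jabber soft nofile $((clients + 100))"
    # Hard limit: ceiling the soft limit may be raised to, with 1000 spare.
    echo "jabber hard nofile $((clients + 1000))"
}
```

jabber_limits 5000 prints the two example lines shown earlier. As root, you can append its output to /etc/security/limits.conf after reviewing it.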

jabberd database corruption

Symptoms

After a disk-full error or a disk crash, the jabberd database may become corrupted, and jabberd then fails to start during spacewalk-service start:

   Starting spacewalk services...
   Initializing jabberd processes...
       Starting router                                                                   done
       Starting sm startproc:  exit status of parent of /usr/bin/sm: 2                   failed
   Terminating jabberd processes...

/var/log/messages shows more details:

   jabberd/sm[31445]: starting up
   jabberd/sm[31445]: process id is 31445, written to /var/lib/jabberd/pid/sm.pid
   jabberd/sm[31445]: loading 'db' storage module
   jabberd/sm[31445]: db: corruption detected! close all jabberd processes and run db_recover
   jabberd/router[31437]: shutting down
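You can check for this condition by searching the system log for the sm message quoted above. jabberd_db_corrupted is a hypothetical helper. The log path defaults to /var/log/messages as in the text. On systems that log only to the journal, you would need to point it at exported journal output instead.

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeed (exit 0) if the given log file contains
# the jabberd sm database corruption message.
jabberd_db_corrupted() {
    local log="${1:-/var/log/messages}"
    grep -q -e 'db: corruption detected' -- "$log"
}
```

For example: if jabberd_db_corrupted; then echo "jabberd database corrupted, apply the Cure steps"; fi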

Cure

Remove the jabberd database and restart. Jabberd will automatically re-create the database.

   spacewalk-service stop
   rm -Rf /var/lib/jabberd/db/*
   spacewalk-service start
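The three commands can also be wrapped so that a mistyped path can never expand to a dangerous rm. This is a minimal sketch. reset_jabberd_db and the SERVICE_CMD override are hypothetical, and unlike rm -Rf .../*, the find call also removes hidden files in the directory.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper around the cure: stop services, empty the jabberd
# database directory, start services again. SERVICE_CMD is an assumed
# override (e.g. SERVICE_CMD=true for a dry run).
reset_jabberd_db() {
    local db_dir="${1:-/var/lib/jabberd/db}"
    local service_cmd="${SERVICE_CMD:-spacewalk-service}"

    # Refuse "/" or anything that is not an existing directory.
    if [[ "$db_dir" == / || ! -d "$db_dir" ]]; then
        echo "refusing to clean '$db_dir'" >&2
        return 1
    fi

    "$service_cmd" stop || return 1
    # Delete everything below the directory but keep the directory itself;
    # jabberd re-creates the database on start.
    find "$db_dir" -mindepth 1 -delete || return 1
    "$service_cmd" start
}
```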

An alternative is to switch jabberd to another database backend (SQLite), although SUSE Manager does not ship drivers for it:

   rcosa-dispatcher stop
   rcjabberd stop
   cd /var/lib/jabberd/db
   rm *
   cp /usr/share/doc/packages/jabberd/db-setup.sqlite .
   sqlite3 sqlite.db < db-setup.sqlite
   chown jabber:jabber *
   rcjabberd start
   rcosa-dispatcher start

Upstream guides

Configuring Osad

https://fedorahosted.org/spacewalk/wiki/OsadHowTo

Jabber and OSAD client connection issues

https://fedorahosted.org/spacewalk/wiki/JabberAndOSAD