Difference between revisions of "SUSE Manager/Osad and jabberd troubleshooting"

From MicroFocusInternationalWiki

Revision as of 07:49, 12 October 2015

SUSE Manager Main Page

Typical issues

Open file count exceeded

Symptoms

OSAD clients cannot contact the SUSE Manager Server, and jabberd is slow to respond on port 5222.

Cause

The maximum number of files that the jabber user can open is lower than the number of connected clients. Every client needs one always-open TCP connection, and each connection consumes one file handle, so jabberd starts queuing and refusing connections.
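To confirm this cause on a running server, one can compare the limit in effect for the c2s process with the number of descriptors it currently holds. The following is a minimal sketch, assuming a Linux /proc filesystem and that the process is named c2s; the helper names nofile_limits_from and open_fd_count are illustrative only, and reading another user's /proc/<pid>/fd normally requires root.

```shell
#!/bin/sh
# Extract the soft and hard "Max open files" values from a
# /proc/<pid>/limits-style file (Linux procfs layout assumed).
nofile_limits_from() {
    awk '/^Max open files/ { print $4, $5 }' "$1"
}

# Count the file descriptors currently open by the given process id.
open_fd_count() {
    ls "/proc/$1/fd" | wc -l
}

# Example: inspect the oldest running c2s process, if there is one.
# The process name "c2s" is an assumption taken from the text.
pid=$(pgrep -o -x c2s 2>/dev/null || true)
if [ -n "$pid" ]; then
    echo "c2s limits (soft hard): $(nofile_limits_from "/proc/$pid/limits")"
    echo "c2s open descriptors:   $(open_fd_count "$pid")"
fi
```

If the descriptor count is close to the soft value, the limit is the likely bottleneck.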

Cure

Add lines like the following to /etc/security/limits.conf:

   jabber soft nofile <#clients + 100>
   jabber hard nofile <#clients + 1000>

Substitute <#clients + 100> and <#clients + 1000> according to your setup. For example, for 5000 clients:

   jabber soft nofile 5100
   jabber hard nofile 6000

Explanation: the soft file limit is the maximum number of open files for a single process. In SUSE Manager's case the process that consumes the most is c2s, which opens one connection per client. The 100 additional files accommodate any non-connection files that c2s needs to work correctly. The hard limit applies to all processes belonging to the jabber user, and accounts for open files from the router, s2s and sm processes as well.
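The arithmetic behind the two values can be scripted so the entries are regenerated whenever the client count changes. This is only a sketch of the rule of thumb given above (client count plus 100 and plus 1000); the function name jabber_nofile_lines is hypothetical and not part of SUSE Manager.

```shell
#!/bin/sh
# Print limits.conf entries for the jabber user, given a client count.
# The +100 and +1000 margins follow the rule of thumb in the text.
jabber_nofile_lines() {
    clients=$1
    printf 'jabber soft nofile %d\n' "$((clients + 100))"
    printf 'jabber hard nofile %d\n' "$((clients + 1000))"
}

# Example for 5000 clients.
jabber_nofile_lines 5000
```

The output can be reviewed and then appended to /etc/security/limits.conf by hand. Limits set there are applied at session start, so jabberd most likely needs a restart before the new values take effect.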

jabberd database corruption

Symptoms

After a disk-full error or a disk crash, the jabberd database may be corrupted, and jabberd then fails to start during spacewalk-service start:

   Starting spacewalk services...
   Initializing jabberd processes...
       Starting router                                                                   done
       Starting sm startproc:  exit status of parent of /usr/bin/sm: 2                   failed
   Terminating jabberd processes...

/var/log/messages shows more details:

   jabberd/sm[31445]: starting up
   jabberd/sm[31445]: process id is 31445, written to /var/lib/jabberd/pid/sm.pid
   jabberd/sm[31445]: loading 'db' storage module
   jabberd/sm[31445]: db: corruption detected! close all jabberd processes and run db_recover
   jabberd/router[31437]: shutting down
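A quick way to check for this symptom is to search the system log for the corruption message shown above. The sketch below assumes the message text is exactly as in the excerpt; the function name jabberd_db_corrupt is illustrative.

```shell
#!/bin/sh
# Return success if the given log file contains the jabberd
# corruption message quoted in the excerpt above.
jabberd_db_corrupt() {
    grep -q 'db: corruption detected' "$1"
}

# Example: check the default syslog file, if it is readable.
if [ -r /var/log/messages ] && jabberd_db_corrupt /var/log/messages; then
    echo "jabberd reported database corruption"
fi
```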

Cure

Remove the jabberd database and restart. Jabberd will automatically re-create the database.

   spacewalk-service stop
   rm -Rf /var/lib/jabberd/db/*
   spacewalk-service start

An alternative is to switch to another database backend such as SQLite, but SUSE Manager does not ship drivers for it:

   rcosa-dispatcher stop
   rcjabberd stop
   cd /var/lib/jabberd/db
   rm *
   cp /usr/share/doc/packages/jabberd/db-setup.sqlite .
   sqlite3 sqlite.db < db-setup.sqlite
   chown jabber:jabber *
   rcjabberd start
   rcosa-dispatcher start

Cloned hosts

Symptoms

Cloned clients get disconnected repeatedly in a hard-to-predict pattern.

Cause

OSAD requires all clients to have different credentials in /etc/sysconfig/rhn/osad-auth.conf. As soon as two clients share the same file they conflict, and unfortunately our logs report this very poorly.
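Because the logs give little help, duplicates can be found by comparing a checksum of the credentials file across clients. The sketch below assumes that a list of "<hostname> <checksum>" pairs has already been collected, for example by running sha256sum on /etc/sysconfig/rhn/osad-auth.conf on each client; the function name duplicate_auth_hosts and the input format are illustrative.

```shell
#!/bin/sh
# Input on stdin: one "<hostname> <checksum>" pair per line.
# Output: every input line whose checksum appears more than once.
duplicate_auth_hosts() {
    awk '{ count[$2]++; lines[NR] = $0; sums[NR] = $2 }
         END {
             for (i = 1; i <= NR; i++)
                 if (count[sums[i]] > 1) print lines[i]
         }'
}

# Example with made-up hostnames and shortened checksums.
printf 'web01 aaa111\nweb02 bbb222\nweb03 aaa111\n' | duplicate_auth_hosts
```

Hosts listed in the output share the same credentials file and are candidates for the conflict described above.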

Cure

Update server and all client tools to the latest maintenance update and wait 30 minutes. Updated clients will detect and automatically heal from this issue.


Upstream guides

Configuring Osad

https://fedorahosted.org/spacewalk/wiki/OsadHowTo

Jabber and OSAD client connection issues

https://fedorahosted.org/spacewalk/wiki/JabberAndOSAD