SUSE Manager/Scalability-research

From MicroFocusInternationalWiki
Revision as of 05:55, 24 June 2016 by SilvioMoioli (Talk | contribs) (Scalability research summary)

Scalability research summary

Scalability research on SUSE Manager is an ongoing process, governed by the roadmap defined in RFC 23 and curated mainly by Silvio Moioli.

This page summarises the studies and results so far. Full data sets and instructions for reproducing the tests are kept up to date on the Scalability research page in the GitHub development wiki.

Discussion is, as always, very welcome on the SUSE Manager mailing list.


List of benchmarks

  • Salt onboarding smoke test - 20160309
    • Purpose: make sure onboarding works on 100 minions
    • Main results: 100 minions can be onboarded in ~16 minutes on a low-end server
  • Minion channel switching via API smoke test - 20160310
    • Purpose: make sure channel switching is fast enough on 100 minions
    • Main results: channel switching takes well under a second per minion on a low-end server
  • Minion package upgrade smoke test, 1000 minions - 20160323
    • Purpose: check that our minimal hardware requirements make sense
    • Main results:
      • one kernel patch took ~52 minutes for 1000 minions, on a low-end server
      • several bugs were discovered and fixed, and some default settings were changed
      • hardware recommendations and best practices updated in the official documentation
  • Smoke tests (onboarding and patching) repeated - 20160519
    • Purpose: make sure no functional regressions were introduced in newer versions
    • Main results: two bugs fixed, no shipstopper performance regression found
  • Oracle RAC - 20160615
    • Purpose: validate the hypothesis (based on observation of the architecture) that RAC does not really help with SUSE Manager scalability
    • Main results:
      • hypothesis validated: adding hardware to a single server is recommended instead of adding nodes to a RAC
      • RAC is more prone to deadlocks, and recovery after a deadlock takes longer; Engineering recommends against RAC for this reason. RAC One Node was not tested, but should not have this problem (active-passive architecture)
      • nothing observed so far suggests that Postgres cannot match (or even exceed) Oracle performance
      • in at least some cases Postgres handles deadlock avoidance better than Oracle
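
To put the headline timings from the benchmark list in per-minion terms, the arithmetic can be sketched as below. This is only a rough illustration: the minion counts and durations are the approximate figures quoted above, while the dictionary and function names are made up for this example and are not part of any SUSE Manager tooling. The two benchmarks measure different operations, so the resulting numbers are not directly comparable with each other.

```python
# Approximate figures quoted in the benchmark list (low-end server).
# Keys and helper names are hypothetical, chosen for this illustration only.
BENCHMARKS = {
    "onboarding, 20160309": (100, 16),      # (minions, minutes)
    "kernel patch, 20160323": (1000, 52),   # (minions, minutes)
}


def seconds_per_minion(minions: int, minutes: float) -> float:
    """Average wall-clock seconds per minion for the whole run."""
    return minutes * 60 / minions


def minions_per_minute(minions: int, minutes: float) -> float:
    """Average number of minions completed per minute."""
    return minions / minutes


for name, (minions, minutes) in BENCHMARKS.items():
    print(
        f"{name}: "
        f"{seconds_per_minion(minions, minutes):.2f} s/minion, "
        f"{minions_per_minute(minions, minutes):.2f} minions/min"
    )
```

These are averages over the whole run and say nothing about how the time was distributed across minions; the detailed data sets on the GitHub development wiki remain the authoritative source.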