
Release: 2.4
Publish Date: October, 2007


Configuring a Terracotta Server Cluster

Introduction

For high availability, the Terracotta server can be clustered to run in ACTIVE-PASSIVE mode. In this mode, one server runs in ACTIVE mode, servicing requests from Terracotta clients, and one or more servers run in PASSIVE mode, acting as hot standbys for the ACTIVE server in case of a failure.

There are two ways to configure Terracotta servers to run in ACTIVE-PASSIVE mode:

  • ACTIVE-PASSIVE using shared disk
  • ACTIVE-PASSIVE over network

ACTIVE-PASSIVE using shared disk

In this configuration, the Terracotta server uses a shared disk (SAN, NFS, SMB) between the ACTIVE and the PASSIVES to replicate state. Note that the Terracotta server needs to run in persistent mode in this configuration.

Prerequisites

  • A shared disk between the ACTIVE and PASSIVES with file locking support.

    Note: if you are using NFS as the share, make sure rpc.statd and lockd are running on both the client AND the server to enable file locking.

Diagram

The following diagram depicts a typical Terracotta deployment using Shared Disk.

Configuration - Disk Based

  • Two or more servers should be defined in the <servers> section of the Terracotta config.
  • The <data> element of every server must point to the same directory on the shared disk.
  • The <persistence> section should indicate the <mode> as permanent-store.
  • For more information on the config, see the Configuration Guide and Reference documentation.

Sample Configuration:

<?xml version="1.0" encoding="UTF-8" ?>
<tc:tc-config xmlns:tc="http://www.terracotta.org/config"
                         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
                         xsi:schemaLocation="http://www.terracotta.org/schema/terracotta-4.xsd">
  <servers>
    <server name="Server 1">
      <!-- THIS DIRECTORY IS SHARED BETWEEN Server 1 AND Server 2 -->
      <data>/opt/terracotta/server-data</data>
      <dso>
        <persistence>
          <mode>permanent-store</mode>
        </persistence>
      </dso>
    </server>
    <server name="Server 2">
      <!-- THIS DIRECTORY IS SHARED BETWEEN Server 1 AND Server 2 -->
      <data>/opt/terracotta/server-data</data>
      <dso>
        <persistence>
          <mode>permanent-store</mode>
        </persistence>
      </dso>
    </server>         
  </servers>
  ...
</tc:tc-config>

Working Details

When multiple Terracotta servers run in ACTIVE-PASSIVE mode using a shared disk, they all try to acquire a lock on the data directory. The one that succeeds becomes the ACTIVE server and gains control of the cluster. The rest of the servers become PASSIVE-STANDBY and take over only when the ACTIVE server fails. When that happens and one of the PASSIVE servers becomes ACTIVE, the clients seamlessly connect to the new ACTIVE server and resume work.

A consistent view of the state of the cluster is maintained on the disk, so the new ACTIVE server can resume work from where the old one left off.

Sample NFS Configuration:

NFSv3/TCP
Server: rw, sync
Clients: rw,nosuid,soft,rsize=32768,wsize=32768

Advantages of running in disk share mode

  • Having multiple PASSIVE servers to protect against multiple failures does not add extra processing or load to the cluster.
  • The split-brain problem is avoided by having a central authority (here the file server) arbitrate control of the cluster.

Disadvantages of running in disk share mode

  • A common disk share with working file locks is needed to work in this mode.
  • Terracotta servers need to run in persistent mode. The performance of the Terracotta server is more directly affected by the shared disk performance than in the network ACTIVE-PASSIVE mode.

Troubleshooting

  1. Terracotta servers fail to come up.
    Check whether locking is enabled on your shared disk. Some services, such as NFS and SMB, require a separate lock daemon to be running to provide file locking. The Terracotta logs usually contain clear messages about these errors.
  2. Multiple Terracotta servers start up as ACTIVE servers.
    Check your config to make sure that the data directories for all the servers point to the same logical directory on the shared disk.

ACTIVE-PASSIVE over network

In this configuration, the Terracotta server replicates cluster state between the ACTIVE and PASSIVES over the network. A shared disk is NOT needed in this setup.

Prerequisites

  • For data-intensive clusters, it is recommended that the ACTIVE and the PASSIVE servers be connected to each other over a low-latency, high-bandwidth network.

Diagram

The following diagram depicts a typical Terracotta deployment using a Network.

Configuration - Network Based

  • Two or more servers should be defined in the <servers> section of the Terracotta config.
  • <l2-group-port> is the port used by the Terracotta server to communicate with other Terracotta servers.
  • The <ha> section should indicate the <mode> as networked-active-passive.
  • The <networked-active-passive> subsection has a configurable parameter called <election-time>, defined in seconds. This is the duration of the election that is run to choose an ACTIVE server. Choose its value based on the network latency and the load on the servers. The default value is 5 seconds.
  • For more information on the config, see the Configuration Guide and Reference documentation.

Sample Configuration:

<?xml version="1.0" encoding="UTF-8" ?>
<tc:tc-config xmlns:tc="http://www.terracotta.org/config"
                         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
                         xsi:schemaLocation="http://www.terracotta.org/schema/terracotta-4.xsd">
  <servers>
    <server name="Server 1">
      <data>/opt/terracotta/server1-data</data>
      <l2-group-port>9530</l2-group-port>
      <dso>
        <persistence>
          <mode>permanent-store</mode>
        </persistence>
      </dso>
    </server>
    <server name="Server 2">
      <data>/opt/terracotta/server2-data</data>
      <l2-group-port>9530</l2-group-port>
      <dso>
        <persistence>
          <mode>permanent-store</mode>
        </persistence>
      </dso>
    </server>
    <ha>
      <mode>networked-active-passive</mode>
      <networked-active-passive>
        <election-time>5</election-time>
      </networked-active-passive>
    </ha>
  </servers>
  ...
</tc:tc-config>

Working Details

When multiple Terracotta servers run in ACTIVE-PASSIVE mode over the network, an election is run to choose an ACTIVE server. Once an ACTIVE server is elected and agreed upon by all the other servers, it gains control of the cluster. The rest of the servers become PASSIVE-STANDBY and can take over only when the ACTIVE server fails.

When an ACTIVE server fails, one of the available servers in PASSIVE-STANDBY is chosen to be ACTIVE after an election, and the clients seamlessly connect to the new ACTIVE server and resume work.

When a PASSIVE server is started while an ACTIVE server is present, the PASSIVE server first needs to sync up state from the ACTIVE server before becoming PASSIVE-STANDBY. While it is syncing state from the ACTIVE server, it is in the PASSIVE-UNINITIALIZED state and cannot become the ACTIVE server in case of a failure, since its state is not fully synced up. During this time, the ACTIVE server carries the extra load of sending the state to the PASSIVE server that is syncing up. The time taken to sync up depends on the amount of data that needs to be synced and the current load on the cluster.

It is generally recommended that the ACTIVE and PASSIVE servers be started together to avoid large sync-ups unless there is a failure. It is also recommended that the ACTIVE and the PASSIVE servers run on similarly configured machines to get better throughput.

In this mode, the Terracotta servers can run in either persistent mode or non-persistent mode. If an ACTIVE server running in persistent mode goes down and a PASSIVE server takes over, the data directory of the crashed server has to be cleaned up before that server is brought back, since the cluster state might have changed since the crash. The new state will be synced from the current ACTIVE server when the server comes back up. The same applies to a crashed PASSIVE server running in persistent mode. If the data directory is not cleaned up, the server will not start, and it will print a clear message on detecting this condition.
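
The sample configuration above runs both servers in persistent mode. As a minimal sketch of the non-persistent alternative, a server entry might look like the following. The mode name temporary-swap-only is an assumption that is not stated in this article, so confirm it against the Configuration Guide and Reference for your release before relying on it.

```xml
<server name="Server 1">
  <data>/opt/terracotta/server1-data</data>
  <l2-group-port>9530</l2-group-port>
  <dso>
    <persistence>
      <!-- ASSUMPTION: "temporary-swap-only" as the non-persistent mode name;
           verify against the Configuration Guide for your release -->
      <mode>temporary-swap-only</mode>
    </persistence>
  </dso>
</server>
```

With a non-persistent server there should be no persisted cluster state to clean up before a restart, which removes the manual step described above at the cost of losing the on-disk copy of the data.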

Advantages of running in network mode

  • No common disk share is needed in this mode.
  • Terracotta servers need not run in persistent mode.

Disadvantages of running in network mode

  • Having multiple PASSIVE servers to protect against multiple failures does add extra processing overhead, though it is minimal after the initial sync.
  • The cluster can end up with a split-brain problem if a network failure severs the cluster into two or more disconnected subnetworks.
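
To illustrate the multiple-PASSIVE case, the following sketch extends the <servers> section of the sample configuration to three servers. The third server's name and data directory are hypothetical, and other per-server settings (such as <dso>) are left out for brevity; they would follow the sample configuration.

```xml
<servers>
  <server name="Server 1">
    <data>/opt/terracotta/server1-data</data>
    <l2-group-port>9530</l2-group-port>
  </server>
  <server name="Server 2">
    <data>/opt/terracotta/server2-data</data>
    <l2-group-port>9530</l2-group-port>
  </server>
  <!-- Hypothetical third server: a second PASSIVE for protection
       against a second failure -->
  <server name="Server 3">
    <data>/opt/terracotta/server3-data</data>
    <l2-group-port>9530</l2-group-port>
  </server>
  <ha>
    <mode>networked-active-passive</mode>
    <networked-active-passive>
      <election-time>5</election-time>
    </networked-active-passive>
  </ha>
</servers>
```

Each additional PASSIVE server has to sync state from the ACTIVE server, so the overhead noted above grows with the number of servers defined here.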

Troubleshooting

  1. Multiple Terracotta servers start up as ACTIVE servers.
    Adjust the election-time in the config to suit your network latency and load.
  2. When a crashed Terracotta server is restarted, it fails to come up.
    Clear the data directory of the crashed server if it is running in persistent mode.
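
As a sketch of the first fix, the <ha> section below raises <election-time> above its 5-second default. The value 10 is an arbitrary illustration, not a recommendation; a suitable value depends on the latency and load you observe in your own cluster.

```xml
<ha>
  <mode>networked-active-passive</mode>
  <networked-active-passive>
    <!-- Illustrative value only: the election window in seconds, raised
         from the default of 5 for a slower or more heavily loaded network -->
    <election-time>10</election-time>
  </networked-active-passive>
</ha>
```

A longer election gives servers on a slow network more time to find each other before one declares itself ACTIVE, but it also lengthens the time taken to choose a new ACTIVE server after a failure.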

Deployment

The typical deployment steps are listed in the [Deployment Guide].
