
Release: 2.4 Previous Releases
Publish Date: October, 2007



Deployment Guide

Introduction

This guide describes a typical Terracotta deployment across the major milestones of the Software Development Life Cycle (SDLC). It is intended for developers, architects and operators who have downloaded Terracotta DSO and want pointers, beyond the core documentation, on planning, implementing and deploying a typical Terracotta project through each phase of the SDLC.

This includes:

  • Choosing a Deployment Architecture.
  • Getting functionally complete with Terracotta.
  • Troubleshooting.
  • Understanding basic performance and scalability tuning.
  • Production Deployment.

If you have not had the opportunity to read the quick start guide or step through the introductory tutorials, you will want to work with those materials in order to get grounded in the core concepts and workings of Terracotta DSO. You can access the tutorials online at Terracotta Online Tutorials.

Pre-Implementation Planning

This includes:

  • Identifying applications across the enterprise that could benefit from clustering at the JVM level, as a service of the runtime. Plan the project phasing, i.e. identify and prioritize a subset of applications (or a single application) ahead of others:
    • Where application-server availability and/or application scale-out needs are most urgent OR
    • Where a transparent approach with no or minimal application code modifications (e.g. 3rd party code, legacy application) and development simplicity (e.g. application with complex, hard-to-maintain code) is desirable OR
    • Where level of effort/testing is lower, if moving an application to production in the shortest time is a goal.
  • Specification of Timelines, Project Plans.
  • Planning around Performance, Scalability and Availability SLAs and infrastructure needs. This can be done either as part of pre-development or towards the end of development, depending on organizational custom/best practice.

Development

Typical Terracotta implementations require less application development effort than competing approaches, because the product introduces no new APIs and maintains the semantics of natural Java. Terracotta products such as Terracotta for Sessions (and Spring) are point solutions to specific problems; they tend to be closer to full transparency and often require no development at all.

The typical steps include:

1) Terracotta Download

Setup:

  • Provision Development Environment (typically Developer sandbox)
  • Download the appropriate Terracotta binary distribution
Access to the Code

Source is available as a separate .tar file.

2) Terracotta Installation

Throughout this document, <tc-root> refers to the folder in which you chose to install Terracotta. All scripts referenced will have a .sh extension (.bat in the Windows distribution).

Typically:

  • Get the Terracotta Server up and running:
    • Install the Terracotta Distribution.
    • Start the Terracotta Server (run the start-tc-server.sh command in the <tc-root>/bin folder). For the Terracotta Sessions product, the Configurator GUI (tc-configurator.sh) simplifies these steps.
  • Add clustering to the client application by installing Terracotta alongside the client application and modifying or adding four JAVA_OPTS:
    • Prepend Terracotta Boot Jar to the Boot Classpath
    • Specify tc.install-root - location of the Terracotta install.
    • Specify tc.config - location of the configuration file.
    • Specify location of log files.
  • Choosing the appropriate JDK:
    • Terracotta is a Java-based clustering solution, and the Terracotta server ships with a version of the JRE. To run it with another supported JRE/JDK, either rename the jre directory under <tc-root> and create a soft link to the JRE/JDK of your choice, or set the TC_INSTALL_JAVA environment variable.
    • Separately, you may wish to keep running your client application on the JRE/JDK version you already use; the previous point applies to the client application server nodes as well. Note that whenever the JRE/JDK differs from the one shipped in the Terracotta distribution, you need to create a new boot jar by running the <tc-root>/bin/make-boot-jar.sh command.
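As a rough illustration of the JAVA_OPTS step above, the following sketch assembles the options into a single string. The paths are hypothetical placeholders, and the class is only a helper for illustration. `-Xbootclasspath/p:` is the standard JVM flag for prepending to the boot classpath (available up to Java 8); `tc.install-root` and `tc.config` are the property names given in the list. The log-file option is left out on purpose, since this guide does not name it.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative helper only: assembles the Terracotta-related JVM options. */
class TerracottaJavaOpts {

    /**
     * Builds the option list for a client JVM.
     * All three arguments are caller-supplied paths; nothing here is validated.
     */
    static List<String> build(String bootJar, String installRoot, String configFile) {
        List<String> opts = new ArrayList<>();
        // Prepend the Terracotta boot jar to the boot classpath (JVM flag, Java 8 and earlier).
        opts.add("-Xbootclasspath/p:" + bootJar);
        // Location of the Terracotta install.
        opts.add("-Dtc.install-root=" + installRoot);
        // Location of the configuration file.
        opts.add("-Dtc.config=" + configFile);
        // The fourth option (log-file location) is omitted: its name is not given in this guide.
        return opts;
    }

    public static void main(String[] args) {
        // Hypothetical paths, for illustration only.
        List<String> opts = build("/opt/tc/boot/dso-boot.jar", "/opt/tc", "/opt/tc/tc-config.xml");
        System.out.println("JAVA_OPTS=\"" + String.join(" ", opts) + "\"");
    }
}
```

Remember that the boot jar passed here must match the JRE/JDK the client actually runs on, as noted above.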

3) Terracotta Configuration

The Terracotta configuration is an XML file that defines the cluster configuration. Refer to the Configuration Guide and Reference, or to the tc-config-reference.xml file located in the <tc-root>/docs folder, for an exhaustive listing of all configuration elements.

If the preference is not to edit XML directly, then choices include:

  • Usage of the Eclipse plug-in from Terracotta that allows a user to configure DSO roots, define classes to be instrumented, and start and stop both your application and the Terracotta server and/or
  • In case of the Sessions product, use the Configurator, which allows a GUI based specification of configuration elements.

The main sections of the configuration are enumerated here:

Each element is listed with its purpose, followed by comments specific to TC-DSO, TC-Session and TC-Spring.

  • Roots
    • Purpose: Denote the object graphs that hold state that needs to be clustered. One or more root names, with the fully-qualified field names representing each root, are to be specified.
    • TC-DSO: n/a
    • TC-Session: Implicitly understood as the HashMap that contains HttpSessions (explicit definition not needed).
    • TC-Spring: Simply specify the Spring Beans that need to be clustered.
  • Instrumentation
    • Purpose: Every object that is "clusterable" must be instrumented. Any object defined as a DSO root, or referenced by a DSO root, must be included in the list of classes to be instrumented by DSO. Common practice: instrument everything (broadest scope) and then progressively narrow the scope of instrumentation by enumerating include/exclude expressions.
    • TC-DSO: n/a
    • TC-Session: n/a
    • TC-Spring: Limited autodiscovery capabilities with respect to what needs to be instrumented.
  • Locks
    • Purpose: Specify the type: Read/Write/Concurrent, and Auto-locks (based on the presence of the synchronization keyword) or Named-locks (where the cluster-wide lock has a specific name). Granularity: method-level or more granular.
    • TC-DSO: n/a
    • TC-Session: Implicitly defined in the product implementation.
    • TC-Spring: Implicitly defined in the product implementation in several use cases.
  • Transients
    • Purpose: Identify and specify fields that belong to the clustered object graph that should not survive network transport/replication and/or have the Java transient modifier. (To initialize these values on other nodes of the cluster, BeanShell scripting or some minor application refactoring may be needed.)
    • TC-DSO: n/a
    • TC-Session: n/a
    • TC-Spring: Referred to as a nondistributable field.
  • Other variables
    • Purpose: Specify the location of log files, debugging options, persistence mode, primary/secondary Terracotta host names, ports etc.
    • TC-DSO: n/a
    • TC-Session: n/a
    • TC-Spring: n/a
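To make the configuration elements more concrete, here is a minimal sketch of the kind of plain Java class they would refer to. The class and field names are hypothetical and no Terracotta API appears in it; the comments only indicate which configuration section each piece would most likely map to, under the descriptions given above.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical application class; plain Java with no Terracotta-specific code. */
class RoomSchedule {

    // Candidate root: the roots section would name this field by its
    // fully-qualified field name. HashMap and its contents must be instrumented.
    private final Map<String, String> bookings = new HashMap<>();

    // Candidate transient: node-local bookkeeping that need not be replicated.
    private transient long lastLocalUpdateNanos;

    // The synchronized keyword is what an auto-lock would key on;
    // this mutation would be configured as a write lock.
    public synchronized void book(String slot, String owner) {
        bookings.put(slot, owner);
        lastLocalUpdateNanos = System.nanoTime();
    }

    // Read-only access; a candidate for a read lock.
    public synchronized String ownerOf(String slot) {
        return bookings.get(slot);
    }

    public static void main(String[] args) {
        RoomSchedule schedule = new RoomSchedule();
        schedule.book("Room A 9:00 AM", "alice");
        System.out.println(schedule.ownerOf("Room A 9:00 AM"));
    }
}
```

The point of the sketch is that the class compiles and behaves the same with or without clustering; what changes is only which fields, classes and methods are listed in the configuration file.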

4) Troubleshooting

While several deployments have succeeded with a single pass over the elements in tc-config.xml, a typical deployment may need a few iterations on the file. These are usually prompted by exceptions raised when the application runs with Terracotta enabled.

Common exceptions include:

  • Non-Portable Exceptions (com.tc.exception.TCNonPortableObjectException). Common reasons and remedies:
    • Reason: The object in question is truly non-portable, i.e. it cannot be clustered. Examples include objects that hold a resource native to the OS/hardware (file handles, sockets etc.).
      Remedy: Mark these fields as "transient" in the Terracotta config. Note that marking a field transient may in some cases require an accompanying on-load (BeanShell) script and/or application refactoring, so that application expectations around rehydration of the transient field value are met.
    • Reason: The class that implements the field has not been instrumented.
      Remedy: Include it in the instrumentation section of the tc-config file.
    • Reason: The class that implements the field has not been instrumented and needs to be in the boot jar.
      Remedy: Add the class to the <additional-boot-jar-classes> section of the config file and regenerate the boot jar. (Note that in some cases Terracotta needs to add explicit support for a core Java class; simply adding it to the boot jar will not work, and you should escalate to Terracotta support. A list of unsupported classes is also publicly available in the Terracotta community content.)
  • Lock related exceptions (com.tc.object.tx.UnlockedSharedObjectException):
    Terracotta defines a "transaction" to coincide with the acquisition/release of a lock (e.g. begin/end of a synchronized block or acquisition/release of a named lock) - the Terracotta server co-ordinates cluster-wide access to the shared object-graph. Accessing the object without defining acceptable transaction (lock) semantics results in this exception. The typical remedy is to:
    • Examine the code for the presence of appropriately scoped synchronization semantics and/or tweak the auto-lock section of the configuration.
    • Alternatively, one could employ coarse-grained locks and introduce named locks (and map those to a method and thus not have to change any application source). These should be used carefully - since in the current implementation, these are cluster-wide locks.
  • The Terracotta Console is invaluable when it comes to troubleshooting your clustered Java application. The following image shows three instances of the JTable application (it ships as a sample with the DSO distribution), overlaid on top of the open console. The shared root (underlined) is expanded in the tree view in the right-hand pane. The single instance of the value in the cell for Room A at 9:00 AM is shown in the tree as well as in the three running instances of the application.

  • Successful clustering should result in:
    • Expected number of application-servers in the "Clients" node of the tree-widget on the left.
    • "Roots" node of the tree-widget on the left expanding to display object graph being clustered. If "Roots" portion of the tree is not expandable, then nothing is being clustered.
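The "application refactoring" remedy for transient fields mentioned above often comes down to lazy re-initialization. The sketch below shows that pattern with a hypothetical class. Note the loud assumption: it uses standard Java serialization purely as a stand-in to produce an object whose transient field is unset; Terracotta DSO does not replicate objects this way, and the sketch only demonstrates the rehydration pattern itself.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.StringWriter;

/** Hypothetical class holding one portable field and one non-portable resource. */
class AuditTrail implements Serializable {
    private static final long serialVersionUID = 1L;

    private final String nodeLabel;      // portable state
    private transient StringWriter sink; // stand-in for a file handle or socket

    AuditTrail(String nodeLabel) {
        this.nodeLabel = nodeLabel;
    }

    // Lazy rehydration: create the resource the first time it is needed on this node.
    private StringWriter sink() {
        if (sink == null) {
            sink = new StringWriter();
        }
        return sink;
    }

    synchronized void record(String message) {
        sink().write(nodeLabel + ": " + message + "\n");
    }

    synchronized String contents() {
        return sink().toString();
    }

    /** Stand-in for "the object arrives on another node": transient fields come back null. */
    static AuditTrail roundTrip(AuditTrail original) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(original);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (AuditTrail) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("round trip failed", e);
        }
    }

    public static void main(String[] args) {
        AuditTrail first = new AuditTrail("node-1");
        first.record("started");
        AuditTrail copy = roundTrip(first);
        copy.record("rehydrated"); // works although the transient field was not carried over
        System.out.print(copy.contents());
    }
}
```

Because every access goes through `sink()`, callers never see a null resource, which is the expectation the remedy above asks you to preserve.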

For a more exhaustive list of troubleshooting tips refer to our TroubleShooting Guide

5) Deployment Planning

Steps:

  • Choose a Deployment Architecture based on the application's SLA requirements around availability and scalability, the size of the clustered data, and budget/Total Cost of Ownership concerns. The options and their suitability are summarized below (pictorial representations follow):
    • Option I: Dedicated Terracotta Servers (2: 1 active/1 passive) that share disk via commonly used disk-sharing technologies (Figure 1).
      Suitability: High availability, with no Single Point of Failure (SPoF) in the architecture. Minimal memory/CPU impact on any application server node. Easier operational management: clustered data is homed on Terracotta servers, which can be operationally managed to a high QoS. (Most of this document assumes this recommended implementation.)
    • Option II: Terracotta Servers (2: 1 active/1 passive) co-hosted with application server nodes that share disk via commonly used disk-sharing technologies (Figure 2).
      Suitability: If provisioning/maintaining dedicated Terracotta servers cannot be accommodated and/or the application server nodes have additional, unused horsepower. This deployment still delivers high availability, i.e. no SPoF, at the expense of higher resource consumption (CPU/memory) on the 2 application servers that house the Terracotta servers.
    • Option III: Single dedicated Terracotta server (Figure 3).
      Suitability: If the Terracotta Server as a Single Point of Failure is acceptable, i.e. SLA needs around availability of the cluster are not extremely stringent. Minimal memory/CPU impact on any of the application server nodes. Configurations are still possible to ensure "restartability".
    • Option IV: Single Terracotta Server co-hosted with an application server node (Figure 4).
      Suitability: If the deployment is to be software-based alone and no dedicated provisioning of a server and/or disk-sharing technology can be accommodated. Suitable for applications with less stringent cluster availability SLAs and no CapEx (capital expenditure) budget.
Figure 1 - Terracotta Servers on dedicated machines:

Figure 2 - Terracotta Servers co-located with Application Server:

Figure 3 - Terracotta Server on a single, dedicated host:

Figure 4 - Single Terracotta Server co-located with an Application Server node:

Please note that currently just a single active Terracotta server is supported (use a passive for failover - a single active server ensures no split-brain issues). Multiple active Terracotta servers will be supported in the future for applications that execute such significant amounts of clustered I/O that a single Terracotta server would not suffice.

Options:

  • Terracotta Operational Provisioning: As seen in the earlier section, choices range from no hardware/disk provisioning to provisioning one or all of servers, disk, disk-sharing technologies and redundancy within the networking tier:
    • Terracotta Servers (if applicable) need to be sized to meet the demands of the application. Server sizing depends on:
      • The size of the clustered object graph
      • The number of Puts/Gets to the clustered object graph per peak-second and the extent of lock contention.
      • (Typically an Intel 2 CPU, 2G Heap (3G RAM) machine is on the high end, although this is heavily dependent on the application's needs.)
    • Shared Disk Technology and Disk Size (if applicable):
      • Disk is sized based on the amount of clustered data and the frequency of DGC runs (the Distributed Garbage Collector eliminates "cluster" garbage, and the space is later reclaimed on disk). Example: 100,000 sessions in a day at 50K per session would mean 5G of disk (say 10G to cover for emergencies), assuming DGC runs every 24 hours.
      • Disk shared between the primary and secondary Terracotta server. Choices of Disk sharing technology that Terracotta has tested with include:
        • SAN
        • NFS
        • Linux GFS
        • Or any other commonly used technologies. Terracotta Server is a JVM process so it is abstracted from the choice of any particular disk sharing technology. The above list constitutes technologies that Terracotta has internally tested with.
    • High Availability Considerations and Infrastructure Hardening (if applicable): Typically network redundancy in terms of Switches and Network Interface cards and virtual-ip technologies such as Cisco VRRP to ensure highly available network connectivity between L1 and L2.
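The disk-sizing example above is simple arithmetic, and a small calculator makes it easy to rerun with your own numbers. This is only a sketch: the class and method names are hypothetical, "50K" is assumed to mean 50,000 bytes and "G" to mean 10^9 bytes so that the result matches the 5G figure in the text, and the doubling to 10G is treated as a headroom factor of 2.

```java
/** Hypothetical sizing helper reproducing the disk-size example. */
class DiskSizing {

    /** Bytes of clustered data that accumulate between two DGC runs. */
    static long bytesBetweenDgcRuns(long itemsPerDay, long bytesPerItem, double dgcIntervalHours) {
        return Math.round(itemsPerDay * bytesPerItem * (dgcIntervalHours / 24.0));
    }

    /** Applies a safety margin, e.g. 2.0 to double the estimate. */
    static long withHeadroom(long bytes, double factor) {
        return Math.round(bytes * factor);
    }

    public static void main(String[] args) {
        // 100,000 sessions per day, 50,000 bytes each, DGC every 24 hours.
        long base = bytesBetweenDgcRuns(100_000L, 50_000L, 24.0);
        long provisioned = withHeadroom(base, 2.0);
        System.out.println("Estimated: " + base / 1_000_000_000L + "G");
        System.out.println("With headroom: " + provisioned / 1_000_000_000L + "G");
    }
}
```

Shortening the DGC interval scales the estimate down proportionally, which is why disk size and DGC frequency are planned together.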

Quality Assurance

Provision a QA Environment

Considerations:

  • The environment preferably mirrors the chosen Deployment Architecture in most respects. Typical QA environments lack Stage and/or Production characteristics in terms of:
    • Featuring a scaled-down number of Application nodes (L1s) And/Or
    • Lower capability hardware and other network/ infrastructure components.
  • Such an environment may already be available in most shops - so then the only consideration would be additional servers as Terracotta servers. Since the Terracotta server is a Java (software) based clustering solution, presumably one could co-host the Terracotta server with other applications on existing hardware, if the goal of this environment is to simply execute functional testing and not stress testing.

Code Coverage

Testing:

  • While the Terracotta approach is minimally intrusive and may require no application code changes in most cases, comprehensive testing of code paths is still recommended. This ensures that no non-portable, lock-related or other exceptions arise when other objects and/or sub-graphs join the clustered graph at run-time, as particular application functions are exercised.
  • Future testing can be incremental based on the scope of the change in the tc-config.xml and any accompanying code-changes.

Stage Environment Deployment

Provisioning a Stage Environment

Considerations:

  • The environment preferably mirrors the chosen Deployment Architecture in most respects. Typical Stage environments lack Production characteristics in terms of:
    • Featuring a scaled-down number of Application nodes (L1s)
    • But hopefully they mirror the production environment in terms of capabilities of the hardware and/or other network/infrastructure components, so any results of stress-testing are representative of the production environment.
  • Such an environment may already be available in most shops - so then the only consideration would be additional servers as Terracotta servers and disk and/or any network hardening (if that is not already provisioned for).

Scalability Testing

Setting up scalability and performance testing, typically involves:

  1. Defining typical "Transactions" that involve mutations/ reads of the shared object graph.
  2. Measuring TPS (Transactions per second) against higher-than-expected average Production Loads and at expected peak loads.
  3. Ensuring that configuration element values mirror those that will be employed in production. Examples include ensuring that:
    • Persistence Mode = ON (if appropriate - in this mode, the Terracotta server flushes state to disk, so state survives the Terracotta Server JVM lifecycle - ensures cluster "restartability").
    • DGC (Distributed Garbage Collection) = ON (this reclaims memory and/or space on disk when it runs at the frequency specified in the tc-config.xml).
    • In case of Sessions, appropriate values of
      • Session-timeout and
      • Frequency of the reaper are set.
  4. Running tests and comparing the results to baselines to determine the positive/negative impact of any tuning efforts.
  5. If the results of testing are acceptable, then the application is ready to be deployed to production. If not, one may need further iterations on the tc-config.xml, application refactoring and/or tweaks to operational facets of the deployment to guarantee scale and performance SLAs.
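Steps 1, 2 and 4 above can be sketched as a very small measurement harness. The names are hypothetical and the "transaction" is a placeholder that mutates a synchronized map; a real test would drive the application's own code paths under production-like load, typically with a dedicated load-testing tool.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.TimeUnit;

/** Hypothetical harness: measures transactions per second and compares against a baseline. */
class ThroughputProbe {

    /** Transactions per second for a given count and elapsed time in nanoseconds. */
    static double tps(long transactions, long elapsedNanos) {
        if (elapsedNanos <= 0) {
            throw new IllegalArgumentException("elapsedNanos must be positive");
        }
        return transactions * (double) TimeUnit.SECONDS.toNanos(1) / elapsedNanos;
    }

    /** Runs the transaction repeatedly on the calling thread and returns the observed TPS. */
    static double measure(Runnable transaction, int iterations) {
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            transaction.run();
        }
        long elapsed = Math.max(1L, System.nanoTime() - start);
        return tps(iterations, elapsed);
    }

    /** Percentage change of a candidate run relative to a baseline run. */
    static double percentChange(double baselineTps, double candidateTps) {
        return (candidateTps - baselineTps) / baselineTps * 100.0;
    }

    public static void main(String[] args) {
        Map<Integer, Integer> shared = new HashMap<>();
        int[] counter = {0};
        // Placeholder "transaction": one synchronized mutation of a shared map.
        Runnable transaction = () -> {
            synchronized (shared) {
                shared.put(counter[0]++ % 1000, counter[0]);
            }
        };
        double observed = measure(transaction, 100_000);
        System.out.printf("Observed TPS: %.0f%n", observed);
    }
}
```

Recording the baseline TPS before each tuning change, and the percentage change after it, gives the comparison that step 4 calls for.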

Scalability Considerations

Typical performance and scalability tuning involves tweaking the core elements of the Terracotta configuration and other aspects typical to any Java-based application:

  1. Re-examining Definition of Roots - How one defines DSO roots has a definite impact on performance:
    • Cluster only what is essential: Note that tc-config.xml allows for multiple root definitions. Reduction of amount of clustered data implies smaller replication payloads and smaller amounts of network and disk I/O. Use transient config elements (or re-factor application) to ensure that unnecessary object graphs are not being clustered. If you need access to a field of a shared object marked transient, be sure to test rehydration on other nodes.
  2. Maintain Locality of Reference (LoR) - Implies leaving the data and processing co-resident on the same node. Terracotta is location transparent and will work without LoR - however partitioning and/or sticky load-balancing helps avoid needless network-faults. Examples include: Session-clustering, where a load-balancer ensures sessions stay sticky to a given application server node and scenarios where compute needs to be parallelized e.g. POJO-based Master-Workers, a pattern open-sourced by Terracotta - You can download the pattern and sample implementation at: Parallel Web Spider – Example of the Master/Worker Pattern .
  3. Pay attention to Natural Java Semantics: As an example, consider Inner Classes: non-static inner classes have an implicit reference to their parent object. A shared instance of a non-static inner class will cause the parent object to become shared because of that implicit reference.
  4. Re-examining Locking Behavior - Re-check that lock types and granularity are not overly aggressive, i.e.
    • Minimize scope/duration of lock to reduce possible contention. Specify the right-type of lock (read/ write/ concurrent) where needed.
    • Use appropriate data-structures that support higher levels of concurrency: example - leverage the new java.util.concurrent classes where supported in Terracotta DSO, such as ConcurrentHashMap, which locks at the bucket level.
    • Re-factor application (where appropriate):
      • To minimize contention over the same object lock (where possible).
      • In certain cases, introduce batching into the application - so changes get flushed to the Terracotta Server at regular Terracotta "transaction boundaries", in cases where a DSO transaction, otherwise, would be too large. Examples: Hydrating a cache that as a single "transaction" would exceed heap; A batched UUID generator: See Implementing an efficient Id generator with Spring framework
      • Use generally good locking practices. The author Brian Goetz has contributed dozens of articles to the IBM Developer Works web site. You can access this link to see all of these articles: Brian Goetz - Java: Theory and Practice Articles
  5. Re-examining Scope of Instrumentation:
    • Instrumentation adds a small amount of overhead - so only instrument those classes that need to be portable (by enumerating include/exclude expressions to limit what is being instrumented).
  6. Garbage collection tuning on App-servers and Terracotta Server i.e.
    • Tuning the size of the Eden, Old and Survivor spaces.
    • Choice of garbage collector type (i.e. UseConcMarkSweepGC and/or UseParallelGC).
    • General Recommendations: Set your initial and maximum heap sizes to the same value (Xms, Xmx). Garbage collection pauses increase in duration with increases in the size of the heap. This is simply because there is more heap to scan. So be careful when increasing size of heap.
    • More comprehensive coverage of tuning can be found at: Tuning Garbage Collection with the 5.0 Java Virtual Machine
  7. Cache Policy Tuning:
    • Terracotta is effectively network attached memory and allows for specification of policies to determine how much of an object graph resides on the application server node and how much faults to Terracotta server RAM and then onto Terracotta server Disk.
    • The configuration of the virtual memory features (specified as a percentage of heap) on the Application Server should be appropriate for the use case and should not result in needless paging from and faulting to the Terracotta server. For example, an LRU cache might work well, where frequently accessed entries are local to the application server and less frequently accessed entries are paged in on demand from the Terracotta Server. However, the settings need to be specified so as to minimize needless network round-trips.
    • Similar settings on the Terracotta Server will prevent needless disk access.
  8. Miscellaneous:
    • Ensure usage of appropriate Disk Raid Levels to balance disk i/o performance with the need for availability.
    • Locate log-files and Terracotta Server data-files on different disks.
    • Turn off reflection support in config if you don't use reflection to mutate shared objects.
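The point about inner classes in item 3 above can be checked with plain Java reflection. In this sketch (all names hypothetical), the non-static inner class uses its enclosing object's state and therefore carries a hidden field referencing it, while the static nested class does not. Under the behavior described above, sharing an instance of the former would pull the enclosing object into the shared graph as well.

```java
import java.lang.reflect.Field;

/** Hypothetical outer class used to compare inner and static nested classes. */
class Calendar2007 {
    private final String owner;

    Calendar2007(String owner) {
        this.owner = owner;
    }

    // Non-static inner class: every Entry holds an implicit reference to its Calendar2007.
    class Entry {
        String describe() {
            return owner + "'s entry"; // reads state of the enclosing instance
        }
    }

    // Static nested class: independent of any Calendar2007 instance.
    static class DetachedEntry {
        final String label;

        DetachedEntry(String label) {
            this.label = label;
        }
    }

    /** True if the candidate class declares a field (including compiler-generated ones) of the owner type. */
    static boolean holdsReferenceTo(Class<?> candidate, Class<?> ownerType) {
        for (Field field : candidate.getDeclaredFields()) {
            if (field.getType() == ownerType) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println("Entry references outer: "
            + holdsReferenceTo(Entry.class, Calendar2007.class));
        System.out.println("DetachedEntry references outer: "
            + holdsReferenceTo(DetachedEntry.class, Calendar2007.class));
    }
}
```

Where the enclosing object should not be clustered, converting the inner class to a static nested class (and passing in only the data it needs) is one way to keep the shared graph small.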

Tools for Tuning

Performance:

  1. The Terracotta Admin Console (a Swing GUI) and JConsole provide visibility into the JMX instrumentation available within the Terracotta implementation, and hence are invaluable tools for understanding the characteristics of the clustered application. The Admin Console additionally displays (cluster-wide and per-client):
    • fault/flush rates,
    • number of object puts/gets,
    • number of times an object was retrieved off disk of the Terracotta Server and other information invaluable to quantitative profiling and debugging.
  2. Thread Dumps provide valuable insight, should there arise any particular issue that needs troubleshooting.
  3. Increasing Terracotta Debug levels (refer to the tc-config-reference.xml) provides detailed information around Instrumentation, Locking, Data replication, Client-reconnects etc.
  4. Network monitoring tools help profile bandwidth consumption. Terracotta's bandwidth consumption is expected to be low, given the fine-grained replication. Several such tools are available.
  5. Visual GC - Sun Visual GC - is an extremely useful tool for monitoring and profiling JVM memory usage.
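Thread dumps and basic memory figures can also be captured from inside a JVM through the standard `java.lang.management` API, which exposes the same platform MBeans that JConsole displays. The sketch below uses only that JDK API; it does not touch any Terracotta-specific MBeans, whose names are not covered in this guide, and the class name is hypothetical.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;

/** Hypothetical helper: captures a thread dump and basic heap/GC figures via platform MBeans. */
class JvmSnapshot {

    /** Returns a textual dump of all live threads (stack depth is truncated by ThreadInfo.toString). */
    static String threadDump() {
        StringBuilder dump = new StringBuilder();
        // false, false: skip locked-monitor and synchronizer details to stay portable across JVMs.
        for (ThreadInfo info : ManagementFactory.getThreadMXBean().dumpAllThreads(false, false)) {
            dump.append(info.toString());
        }
        return dump.toString();
    }

    /** Heap currently in use, in bytes. */
    static long heapUsedBytes() {
        return ManagementFactory.getMemoryMXBean().getHeapMemoryUsage().getUsed();
    }

    /** Total number of collections reported by all garbage collectors so far. */
    static long totalGcCount() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long count = gc.getCollectionCount();
            if (count >= 0) { // -1 means the collector does not report a count
                total += count;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println("Heap used (bytes): " + heapUsedBytes());
        System.out.println("GC runs so far: " + totalGcCount());
        System.out.println(threadDump());
    }
}
```

Logging such a snapshot at intervals during a stress test gives a lightweight record to set beside the Admin Console's fault/flush rates.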

Availability Testing and Considerations

Issues:

  1. As described in the section detailing Deployment Planning, ensure appropriate infrastructure hardening if SLAs around availability are high-end - and ensure that failure scenarios are monitored for, specifically around:
    • Failure of Primary/Secondary L2 processes and/or host.
    • Failure of network between Primary L2 and host. Expectation is that
    • Failure of shared disk.
    • Multiple failures.
  2. Ensure that the <client-reconnect-window> element in the config file (the Terracotta server waits for this duration to allow all clients to reconnect, under certain failure scenarios) is set to something reasonable for the application in question.

Production Deployment

Infrastructure:

  • Production Hardware Deployment: Following the provisioning discussion earlier in the Development stage, the expectation is that if server, disk or network-infrastructure upgrades are part of the deployment, they have been sized, installed and deployed.
  • Once the system is deployed to production, typical Terracotta-specific monitoring involves:
    • Primary L2 Process/L2 Host.
    • Secondary L2 Process/L2 Host.
    • Network connectivity between a given Application Node and the L2.
    • Disk usage/availability on both Primary and Secondary L2 boxes.
    • Exceptions in Application Server and/or Terracotta Server logs.
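The network-connectivity item in the monitoring list above could be covered by a simple TCP probe run from each application node. The following is a minimal sketch, not a substitute for a monitoring product: the class name is hypothetical, and the L2 host and port must be supplied by the operator, since this guide does not state them.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

/** Hypothetical probe: checks whether a TCP connection to the L2 can be opened. */
class L2ReachabilityCheck {

    /** Returns true if a TCP connection to host:port succeeds within the timeout. */
    static boolean canConnect(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            // Refused, timed out or unresolvable: treat all as "not reachable".
            return false;
        }
    }

    public static void main(String[] args) {
        if (args.length != 2) {
            System.err.println("usage: L2ReachabilityCheck <l2-host> <l2-port>");
            return;
        }
        boolean reachable = canConnect(args[0], Integer.parseInt(args[1]), 2000);
        System.out.println(args[0] + ":" + args[1] + (reachable ? " reachable" : " NOT reachable"));
    }
}
```

A successful connection only shows that the port is open; it says nothing about whether the server is the active or the passive L2, so process and log monitoring are still needed.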

Summary

Terracotta implies little intrusion into application development in order to impart its clustering capabilities for availability and/or scale-out purposes. Based on SLAs around availabilities and scale requirements and capital expenditure budgets, one can choose from a variety of deployment models that fit the bill. Community and Professional Support Options are available. You can also call (415) 738-4000 or email sales@terracottatech.com for more information.

Appendix


