Call us: +1-415-738-4000

Technical FAQ

The Technical FAQ answers questions on how to use Terracotta products, integration with other products, and solving issues. If your question doesn't appear here, consider posting it on the Terracotta forums. Other resources for resolving issues include:

  • Release Notes – Lists features and issues for specific versions of Terracotta products.
  • Compatibility Information – Includes tables on compatible versions of Terracotta products, JVMs, and application servers.
  • General FAQ – For non-technical questions.

The FAQ is divided into the following sections:

CONFIGURATION

How do I enable persistent mode?

In the servers section of your config.xml, add the following lines:

<server host="host1" name="server1">
...
  <dso>
  ...
    <persistence>
      <mode>permanent-store</mode>
    </persistence>
  ...
  </dso>
...
</server>

Note that the temporary-swap mode performs better than permanent-store, and should be used where persistance of shared data is not required across restarts.

How do I configure failover to work properly with two Terracotta servers?

Configure both servers in the <servers> section of the Terracotta configuration file. Start the two Terracotta server instances that use that configuration file, one server assumes control of the cluster (the ACTIVE) and the second becomes the backup (the PASSIVE). See the high-availability chapter in the product documentation for more information.

DEVELOPMENT

How do I know that my application has started up with a Terracotta client and are sharing data?

The Terracotta Developer Console displays cluster topology by listing Terracotta server groups and connected client nodes in a navigation tree. If you're using Ehcache, Ehcache with Hibernate, Quartz Scheduler, or Terracotta Web Sessions, special panels in the Developer Console become active to give greater visibility into application data. These panels display in-memory values, provide live statistics, and offer a number of controls for configuring and manipulating in-memory data.

In addition, check standard output for messages that the Terracotta client has started up without errors. Terracotta clients also log messages to a file specified in the <clients> section of the Terracotta configuration file.

Is there a maximum number of objects that can be held by one Terracotta server instance?

The number of objects that can be held by a Terracotta server instance is two billion, a limit imposed by the design of Java collections. It is unlikely that a Terracotta cluster will need to approach even 50 percent of that maximum. However, if it does, other issues may arise that require the rearchitecting of how application data is handled in the cluster.

Why is it a bad idea to change shared data in a shutdown hook?

If a node attempts to change shared data while exiting, and the shutdown thread blocks, the node may hang and be dropped from the cluster, failing to exit as planned. The thread may block for any number of reasons, such as the failure to obtain a lock. A better alternative is to use the cluster events API to have a second node (one that is not exiting) execute certain code when it detects that the first node is exiting. If you are using Ehcache, use the cluster-events Ehcache API. In general, you can use the Terracotta Toolkit API to set up cluster-events listeners.

If you're using DSO and the Terracotta Toolkit, you can call org.terracotta.api.Terracotta.registerBeforeShutdownHook(Runnable beforeShutDownHook) to perform various cleanup tasks before the Terracotta client disconnects and shuts down.

Note that a Terracotta client is not required to release locks before shutting down. The Terracotta server will reclaim those locks, although any outstanding transactions are not committed.

What's the best way for my application to listen to Terracotta cluster events such as lost application nodes?

If you are using Ehcache, use the cluster-events Ehcache API. In general, you can use the Terracotta Toolkit API to set up cluster-events listeners.

How can my application check that the Terracotta process is alive at runtime?

Your application can check to see if the system property tc.active is true. For example, the following line of code would return true if Terracotta is active at the time it is run:

Boolean.getBoolean("tc.active");

What are some ways to externally monitor a cluster?

See this question.

ENVIRONMENT

Where is there information on platform compatibility for my version of Terracotta software?

Information on the latest releases of Terracotta products, including a link to the latest platform support, is found on the Product Information. This page also contains a table with links to information on previous releases.

Can I run the Terracotta process as a Microsoft Windows service?

Yes. See Starting up TSA or CLC as Windows Service using the Service Wrapper.

How do I use my JRE instead of the one shipped with Terracotta?

Set the TC_JAVA_HOME environment variable to point to a supported JDK or JRE. The target should be the top level installation directory of the JDK, which is the directory containing the bin directory.

Do you have any advice for running Terracotta software on Ubuntu?

The known issues when trying to run Terracotta software on Ubuntu are:

  • Default shell is dash bash. Terracotta scripts don't behave under dash. You might solve this issue by setting your default shell to bash or changing /bin/sh in our scripts to /bin/bash.

  • The Ubuntu default JDK is from GNU. Terracotta software compatibility information is on the Product Information page.

  • See the UnknownHostException topic.

Which Garbage Collector should I use with the Terracotta Server (L2) process?

The Terracotta Server performs best with the default garbage collector. This is pre-configured in the startup scripts. If you believe that Java GC is causing performance degradation in the Terracotta Server, BigMemory is recommended as the simplest and best way to reduce latencies by reducing collection times.

Generally, the use of the Concurrent Mark Sweep collector (CMS) is discouraged as it is known to cause heap fragmentation for certain application-data usage patterns. Expert developers considering use of CMS should consult the Oracle tuning and best-practice documentation.

INTEROPERABILITY

Can I substitute Terracotta for JMS? How do you do messaging in Terracotta clusters?

Using Terracotta with a simple data structure (such as java.util.concurrent.LinkedBlockingQueue), you can easily create message queues that can replace JMS. Your particular use case should dictate whether to replace JMS or continue using it alongside Terracotta. See the Terracotta Toolkit API for more information on using a clustered queue.

Does Terracotta clustering work with Hibernate?

Through Ehcache, you can enable and cluster Hibernate second-level cache.

What other technologies does Terracotta software work with?

Terracotta software integrates with most popular Java technologies being used today. For a full list, contact us at info@terracottatech.com.

OPERATIONS

How do I confirm that my Terracotta servers are up and running correctly?

Here are some ways to confirm that your Terracotta servers are running:

  • Connect to the servers using the Terracotta Developer Console.
  • Check the standard output messages to see that each server started without errors.
  • Check each server's logs to see that each server started without errors. The location of server logs is specified in tc-config.xml.
  • Use the Terracotta script server-stat.sh or server-stat.bat to generate a short status report on one or more Terracotta servers.
  • Use a tool such as wget to access the /config or /version servlet. For example, for a server running on localhost and using DSO port 9510, use the following wget command to connect to the version servlet:
[PROMPT] wget http://localhost:9510/version

How can I control the logging level for Terracotta servers and clients?

Create a file called .tc.custom.log4j.properties and edit it as a standard log4j.properties file to configure logging, including level, for the Terracotta node that loads it. This file is searched for in the path specified by the environment variable TC_INSTALL_DIR (if defined), user.home, and user.dir.

Are there ways I can monitor the cluster that don't involve using the Terracotta Developer Console or Operations Center?

You can monitor the cluster using JMX. A good starting point is the Terracotta JMX guide. This document does not have a complete list of MBeans, but you can use a tool such as JConsole to view the MBeans needed for monitoring.

Statistics are available over JMX via the object name "org.terracotta:type=Terracotta Server,subsystem=Statistics,name=Terracotta Statistics Gatherer". The Terracotta Cluster Statistics Recorder has both command-line and RESTful interfaces. However, statistics recording can substantially degrade performance due its high resource cost.

Certain cluster parameters, such as heap size and cached-object count, are available via "org.terracotta:type=Terracotta Server,name=DSO".

Cluster events are available over JMX via the object name "org.terracotta:type=TC Operator Events,name=Terracotta Operator Events Bean".

How many Terracotta clients (L1s) can connect to the Terracotta Server Array (L2s) in a cluster?

While the number of L1s that can exist in a Terracotta cluster is theoretically unbounded (and cannot be configured), effectively planning for resource limitations and the size of the shared data set should yield an optimum number. Typically, the most important factors that will impact that number are the requirements for performance and availability. Typical questions when sizing a cluster:

  • What is the desired transactions-per-second?
  • What are the failover scenarios?
  • How fast and reliable is the network and hardware? How much memory and disk space will each machine have?
  • How much shared data is going to be stored in the cluster? How much of that data should be on the L1s? Will BigMemory be used?
  • How many stripes (active Terracotta servers) does the cluster have?
  • How much load will there be on the L1s? On the L2s?

The most important method for determining the optimum size of a cluster is to test various cluster configurations under load and observe how well each setup meets overall requirements.

TROUBLESHOOTING

After my application interrupted a thread (or threw InterruptedException), why did the Terracotta client die?

The Terracotta client library runs with your application and is often involved in operations which your application is not necessarily aware of. These operations may get interrupted, too, which is not something the Terracotta client can anticipate. Ensure that your application does not interrput clustered threads. This is a common error that can cause the Terracotta client to shut down or go into an error state, after which it will have to be restarted.

Why does the cluster seem to be running more slowly?

There can be many reasons for a cluster that was performing well to slow down over time. The most common reason for slowdowns is Java Garbage Collection (GC) cycles.

Another possible cause is when an active server is syncing with a mirror server. If the active is under substantial load, it may be slowed by syncing process. In addition, the syncing process itself may appear to slow down. This can happen when the mirror is waiting for specific sequenced data before it can proceed. This is indicated by log messages similar to the following:

WARN com.tc.l2.ha.L2HACoordinator - 10 messages in pending queue.
Message with ID 2273677 is missing still

If the message ID in the log entries changes over time, no problems are indicated by these warnings.

One indication that slowdowns are occurring on the server and that clients are throttling their transaction commits is the appearance of the following entry in client logs:

INFO com.tc.object.tx.RemoteTransactionManagerImpl - ClientID[2]
(: TransactionID=[65037) : Took more than 1000ms to add to sequencer  : 1497 ms

Why do all of my objects disappear when I restart the server?

If you are not running the server in persistent mode, the server will remove the object data when it restarts. If you want object data to persist across server restarts, run the server in persistent mode.

Why are old objects still there when I restart the server?

If you are running the server in persistent mode, the server keeps the object data across restarts. If you want objects to disappear when you restart the server you can either run in non-persistent mode or remove the data files from disk before you restart the server. See this question.

Why is the Terracotta Developer Console or Terracotta Operations Center timing out when it tries to connect to a Terracotta server?

If you've verified that your Terracotta cluster is up and running, but your attempt to monitor it remotely using a Terracotta console is unsuccessful, a firewall may be the cause. Firewalls that block traffic from Terracotta servers' JMX ports prevent monitoring tools from seeing those servers. To avoid this and other connection issues that may also be attributable to firewalls, ensure that the JMX and DSO ports configured in Terracotta are unblocked on your network.

Why does the Developer Console runs very slowly when I'm monitoring the cluster?

If you are using the Terracotta Developer Console to monitor a remote cluster, especially in an X11 environment, issues with Java GUI rendering may arise that slow the display. You may be able to improve performance simply by changing the rendering setup.

If you are using Java 1.7, set the property sun.java2d.xrender to "true" to enable the latest rendering technology:

-Dsun.java2d.xrender=true

For Java 1.5 and 1.6, be sure to set property sun.java2d.pmoffscreen to "false" to allow Swing buffers to reside in memory:

-Dsun.java2d.pmoffscreen=false

For information about this Java system property, see http://download.oracle.com/javase/1.5.0/docs/guide/2d/flags.html#pmoffscreen.

You can add these properties to the Developer Console start-up script (dev-console.sh or dev-console.bat).

Why can't certain nodes on my Terracotta cluster see each other on the network?

A firewall may be preventing different nodes on a cluster from seeing each other. If Terracotta clients attempt to connect to a Terracotta server, for example, but the server seems to not not have any knowledge of these attempts, the clients may be blocked by a firewall. Another example is a backup Terracotta server that comes up as the active server because it is separated from the active server by a firewall.

Client and/or server nodes are exiting regularly without reason.

Client or server processes that quit ("L1 Exiting" or "L2 Exiting" in logs) for seemingly no visible reason may have been running in a terminal session that has been terminated. The parent process must be maintained for the life of the node process, or use another workaround such as the nohup option.

Why is more than one active server coming up when I configured only one?

If you have a setup with one active Terracotta server instance and a number of standbys, but are seeing errors because more than one active server is in the cluster, a "split brain" scenario has occurred.

Due to network latency or load, the Terracotta server instances may not may be have adequate time to hold an election. Increase the property in the Terracotta configuration file to the lowest value that solves this issue.

If you are running on Ubuntu, see the note at the end of the UnknownHostException topic.

I have a cluster with more than one stripe (more than one active Terracotta server) but data is distributed very unevenly between the two stripes.

The Terracotta Server Array distributes data based on the hashcode of keys. To enhance performance, each server stripe should contain approximately the same amount of data. A grossly uneven distribution of data on Terracotta servers in a cluster with more than one active server can be an indication that keys are not being hashed well. If your application is creating keys of a type that does not hash well, this may be the cause of the uneven distribution.

Why is a crashed Terracotta server instance failing to come up when I restart it?

If it's running in persistent mode, the ACTIVE Terracotta server instance should come up with all shared data intact. However, if the server's database has somehow become corrupt, you must clear the crashed server's data directory before restarting.

I lost some data after my entire cluster lost power and went down. How can I ensure that all data persists through a failure?

If only some data was lost, then Terracotta servers were configured to persist data. The cause for losing a small amount of data could be disk "write" caching on the machines running the Terracotta server instances. If every Terracotta server instance lost power when the cluster went down, data remaining in the disk cache of each machine is lost.

Turning off disk caching is not an optimal solution because the machines running Terracotta server instances will suffer a substantial performance degradation. A better solution is to ensure that power is never interrupted at any one time to every Terracotta server instance in the cluster. This can be achieved through techniques such as using uninterruptible power supplies and geographically subdividing cluster members.

Why does the JVM on my SPARC machines crash regularly?

You may be encountering a known issue with the Hotspot JVM for SPARC. The problem is expected to occur with Hotspot 1.6.0_08 and higher, but may have been fixed in a later version. For more information, see this bug report.

SPECIFIC ERRORS AND WARNINGS

Why, after restarting an application server, does the error Client Cannot Reconnect... repeat endlessly until the Terracotta server's database is wiped?

The default value of the client reconnection setting l1.max.connect.retries is set to "-1" (infinite). If you frequently encounter the situation described in this question and do not want to wipe the database and restart the cluster, change the retry setting to finite value. See the high-availability page for more information.

What does the warning "WARN com.tc.bytes.TCByteBufferFactory - Asking for a large amount of memory..." mean?

If you see this warning repeatedly, objects larger than the recommended maximum are being shared in the Terracotta cluster. These objects must be sent between clients and servers. In this case, related warnings containing text similar to Attempt to read a byte array of len: 12251178; threshold=8000000 and Attempting to send a message (com.tc.net.protocol.delivery.OOOProtocolMessageImpl) of size may also appear in the logs.

If there are a large number of over-sized objects being shared, low-memory issues, overall degradation of performance, and OutOfMemory errors may result.

What is causing regular segmentation faults (segfaults) with clients and/or servers failing?

Segfaults may be caused by Hyperic (Sigar), the library used by Terracotta servers and clients to report on certain system resources, mainly activity data for CPU, combined CPU, disk, and network. You may need to turn off Sigar, thus losing the ability to monitor and record these network resources directly through Terracotta software. To turn off Sigar, set the Terracotta property sigar.enabled to false on the nodes that exhibit the error:

-Dsigar.enabled=false

For more information on setting Terracotta properties, see the Terracotta Configuration Guide and Reference.

When starting a Terracotta server, why does it throw a DBVersionMismatchException?

The server is expecting a Terracotta database with a compatible version, but is finding one with non-compatible version. This usually occurs when starting a Terracotta server with an older version of the database. Note that this can only occur with servers in the permanent-store persistence mode.

Why am I getting MethodNotFound and ClassNotFound exceptions?

If you've integrated a Terracotta product with a framework such as Spring or Hibernate and are getting one of these exceptions, make sure that an older version of that Terracotta product isn't on the classpath. With Maven involved, sometimes an older version of a Terracotta product is specified in a framework's POM and ends up ahead of the current version you've specified. You can use tools such as jconsole or jvisualvm to debug, or specify -XX:+TraceClassLoading on the command line.

I'm getting a Hyperic (Sigar) exception, and the Terracotta Developer Console or Terracotta Operations Center is not showing certain metrics?

These two problems are related to starting Java from a location different than the value of JAVA_HOME. To avoid the Hyperic error and restore metrics to the Terracotta consoles, invoke Java from the location specified by JAVA_HOME.

When I start a Terracotta server, why does it fail with a schema error?

You may get an error similar to the following when a Terracotta server fails to start:

 Error Message:
Starting BootJarTool...
2008-10-08 10:29:29,278 INFO - Terracotta 2.7.0, as of 20081001-101049
(Revision 10251 by cruise@rh4mo0 from 2.7)
2008-10-08 10:29:30,459 FATAL -
*******************************************************************************
The configuration data in the file at '/opt/terracotta/conf/tc-config.xml'
does not obey the Terracotta schema:
[0]: Line 8, column 3: Element not allowed: server in element servers

*******************************************************************************

This error occurs when there's a schema violation in the Terracotta configuration file, at the line indicated by the error text. To confirm that your configuration file follows the required schema, see the schema file included with the Terracotta kit. The kit includes schema files (*.xsd) for Terracotta, Ehcache, and Quartz configurations.

Why is a newly started passive (backup) Terracotta server failing to join the cluster as it tries to synchronize with the active server?

If a newly started passive (backup) Terracotta server fails with an error similar to com.tc.objectserver.persistence.db.DBException: com.sleepycat.je.LockTimeoutException: (JE 4.1.10) Lock expired, and then continues to fail with that error upon restart, then the synchronization phase between the passive and active Terracotta servers must be tuned. Specifically, try raising the value of the following Terracotta property:

<!-- Value is in microseconds. -->
<property name="l2.berkeleydb.je.lock.timeout" value="180000000" />

Note that you must restart the active server for this property to take effect.

The Terracotta servers crash regularly and I see a ChecksumException.

If the logs reveal an error similar to com.sleepycat.je.log.ChecksumException: Read invalid log entry type: 0 LOG_CHECKSUM, there is likely a corrupted disk on at least one of the servers.

Why is java.net.UnknownHostException thrown when I try to run Terracotta sample applications?

If an UnknownHostException occurs, and you experience trouble running the Terracotta Welcome application and the included sample applications on Linux (especially Ubuntu), you may need to edit the etc/hosts file.

The UnknownHostException may be followed by "unknown-ip-address".

For example, your etc/hosts file may contain settings similar to the following:

127.0.0.1       localhost
127.0.1.1    myUbuntu.usa myUbuntu

If myUbuntu is the host, you must change 127.0.1.1 to the host's true IP address.

NOTE: You may be able to successfully start Terracotta server instances even with the "invalid" etc/hosts file, and receive no exceptions or errors, but other connectivity problems can occur. For example, when starting two Terracotta servers that should form a mirror group (one active and one standby), you may see behavior that indicates that the servers cannot communicate with each other.

On a node with plenty of RAM and disk space, why is there a failure with errors stating that a "native thread" cannot be created?

You may be exceeding a limit at the system level. In *NIX, run the following command to see what the limits are:

ulimit -a

For example, a limit on the number of processes that can run in the shell may be responsible for the errors.

Why does the Terracotta server crash regularly with java.io.IOException: File exists?

Early versions of JDK 1.6 had a JVM bug that caused this failure. Update JDK 1.6 to avoid this issue.