The Technical FAQ answers questions on how to use Terracotta products, integration with other products, and solving issues. If your question doesn't appear here, consider posting it on the Terracotta forums. Other resources for resolving issues include:
The FAQ is divided into the following sections:
In the servers section of your config.xml, add the following lines:
<server host="host1" name="server1">
  ...
  <dso>
    ...
    <persistence>
      <mode>permanent-store</mode>
    </persistence>
    ...
  </dso>
  ...
</server>
Note that the temporary-swap mode performs better than permanent-store, and should be used where persistence of shared data is not required across restarts.
Configure both servers in the <servers> section of the Terracotta configuration file. Start the two Terracotta server instances that use that configuration file. One server assumes control of the cluster (the ACTIVE) and the second becomes the backup (the PASSIVE). See the high-availability chapter in the product documentation for more information.
The Terracotta Developer Console displays cluster topology by listing Terracotta server groups and connected client nodes in a navigation tree. If you're using Ehcache, Ehcache with Hibernate, Quartz Scheduler, or Terracotta Web Sessions, special panels in the Developer Console become active to give greater visibility into application data. These panels display in-memory values, provide live statistics, and offer a number of controls for configuring and manipulating in-memory data.
In addition, check standard output for messages that the Terracotta client has started up without errors. Terracotta clients also log messages to a file specified in the <clients> section of the Terracotta configuration file.
The number of objects that can be held by a Terracotta server instance is two billion, a limit imposed by the design of Java collections. It is unlikely that a Terracotta cluster will need to approach even 50 percent of that maximum. However, if it does, other issues may arise that require the rearchitecting of how application data is handled in the cluster.
If a node attempts to change shared data while exiting, and the shutdown thread blocks, the node may hang and be dropped from the cluster, failing to exit as planned. The thread may block for any number of reasons, such as the failure to obtain a lock. A better alternative is to use the cluster events API to have a second node (one that is not exiting) execute certain code when it detects that the first node is exiting. If you are using Ehcache, use the cluster-events Ehcache API. In general, you can use the Terracotta Toolkit API to set up cluster-events listeners.
If you're using DSO and the Terracotta Toolkit, you can call org.terracotta.api.Terracotta.registerBeforeShutdownHook(Runnable beforeShutDownHook) to perform various cleanup tasks before the Terracotta client disconnects and shuts down.
Note that a Terracotta client is not required to release locks before shutting down. The Terracotta server will reclaim those locks, although any outstanding transactions are not committed.
If you are using Ehcache, use the cluster-events Ehcache API. In general, you can use the Terracotta Toolkit API to set up cluster-events listeners.
Your application can check to see if the system property tc.active is true. For example, the following line of code would return true if Terracotta is active at the time it is run:
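A minimal sketch of such a check, assuming the property holds the string "true" while Terracotta is active. The class and method names are hypothetical. Boolean.getBoolean is the standard JDK call that returns true only when the named system property exists and equals "true" (ignoring case).

```java
final class TcActiveCheck {
    /** Returns true only if the tc.active system property exists and equals "true". */
    static boolean isTerracottaActive() {
        return Boolean.getBoolean("tc.active");
    }

    public static void main(String[] args) {
        System.out.println("Terracotta active: " + isTerracottaActive());
    }
}
```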
See this question.
Information on the latest releases of Terracotta products, including a link to the latest platform support, is found on the Product Information page. This page also contains a table with links to information on previous releases.
Set the TC_JAVA_HOME environment variable to point to a supported JDK or JRE. The target should be the top level installation directory of the JDK, which is the directory containing the
The known issues when trying to run Terracotta software on Ubuntu are:
The default shell is dash, not bash. Terracotta scripts don't behave correctly under dash. You might solve this issue by setting your default shell to bash or changing /bin/sh in our scripts to
The Ubuntu default JDK is from GNU. Terracotta software compatibility information is on the Product Information page.
See the UnknownHostException topic.
The Terracotta Server performs best with the default garbage collector. This is pre-configured in the startup scripts. If you believe that Java GC is causing performance degradation in the Terracotta Server, BigMemory is recommended as the simplest and best way to reduce latencies by reducing collection times.
Generally, the use of the Concurrent Mark Sweep collector (CMS) is discouraged as it is known to cause heap fragmentation for certain application-data usage patterns. Expert developers considering use of CMS should consult the Oracle tuning and best-practice documentation.
Using Terracotta with a simple data structure (such as java.util.concurrent.LinkedBlockingQueue), you can easily create message queues that can replace JMS. Your particular use case should dictate whether to replace JMS or continue using it alongside Terracotta. See the Terracotta Toolkit API for more information on using a clustered queue.
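The queue-as-messaging pattern can be sketched in a single JVM as follows. This is only an illustration of the producer/consumer usage: the class and method names are hypothetical, and sharing the queue across a Terracotta cluster requires configuration that is not shown here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

final class QueueMessagingSketch {
    /** Sends the messages from a producer thread and collects them on the calling (consumer) thread. */
    static List<String> relay(BlockingQueue<String> queue, List<String> messages) {
        Thread producer = new Thread(() -> {
            try {
                for (String message : messages) {
                    queue.put(message); // blocks only if a bounded queue is full
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        producer.start();

        List<String> received = new ArrayList<>();
        try {
            for (int i = 0; i < messages.size(); i++) {
                received.add(queue.take()); // blocks until a message is available
            }
            producer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while consuming", e);
        }
        return received;
    }

    public static void main(String[] args) {
        BlockingQueue<String> queue = new LinkedBlockingQueue<>();
        System.out.println(relay(queue, Arrays.asList("order-1", "order-2", "order-3")));
    }
}
```

With a single producer, messages arrive in the order they were sent, because the queue is first-in, first-out.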
Through Ehcache, you can enable and cluster Hibernate second-level cache.
Terracotta software integrates with most popular Java technologies being used today. For a full list, contact us at firstname.lastname@example.org.
Here are some ways to confirm that your Terracotta servers are running:
Run server-stat.bat to generate a short status report on one or more Terracotta servers.
[PROMPT] wget http://localhost:9510/version
Create a file called .tc.custom.log4j.properties and edit it as a standard log4j.properties file to configure logging, including level, for the Terracotta node that loads it. This file is searched for in the path specified by the environment variable TC_INSTALL_DIR (if defined),
You can monitor the cluster using JMX. A good starting point is the Terracotta JMX guide. This document does not have a complete list of MBeans, but you can use a tool such as JConsole to view the MBeans needed for monitoring.
Statistics are available over JMX via the object name "org.terracotta:type=Terracotta Server,subsystem=Statistics,name=Terracotta Statistics Gatherer". The Terracotta Cluster Statistics Recorder has both command-line and RESTful interfaces. However, statistics recording can substantially degrade performance due to its high resource cost.
Certain cluster parameters, such as heap size and cached-object count, are available via "org.terracotta:type=Terracotta Server,name=DSO".
Cluster events are available over JMX via the object name "org.terracotta:type=TC Operator Events,name=Terracotta Operator Events Bean".
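As a rough sketch, the object names quoted above can be parsed with the standard javax.management API and looked up on any MBean server connection. The class and method names here are hypothetical. The example queries the local platform MBean server only so that it runs anywhere; connecting to a remote Terracotta server's JMX port is not shown.

```java
import java.io.IOException;
import java.lang.management.ManagementFactory;
import javax.management.MBeanServerConnection;
import javax.management.MalformedObjectNameException;
import javax.management.ObjectName;

final class TerracottaJmxNames {
    // Object names quoted from the FAQ text.
    static final String STATISTICS =
        "org.terracotta:type=Terracotta Server,subsystem=Statistics,name=Terracotta Statistics Gatherer";
    static final String DSO = "org.terracotta:type=Terracotta Server,name=DSO";
    static final String OPERATOR_EVENTS =
        "org.terracotta:type=TC Operator Events,name=Terracotta Operator Events Bean";

    /** Parses a JMX object name, rethrowing syntax errors as unchecked exceptions. */
    static ObjectName parse(String name) {
        try {
            return new ObjectName(name);
        } catch (MalformedObjectNameException e) {
            throw new IllegalArgumentException(name, e);
        }
    }

    /** Reports whether an MBean with the given name is registered on the connection. */
    static boolean isRegistered(MBeanServerConnection connection, String name) {
        try {
            return connection.isRegistered(parse(name));
        } catch (IOException e) {
            return false; // treat a communication failure as "not visible"
        }
    }

    public static void main(String[] args) {
        // Assumption: the local platform MBean server stands in for a real server connection.
        MBeanServerConnection local = ManagementFactory.getPlatformMBeanServer();
        for (String name : new String[] {STATISTICS, DSO, OPERATOR_EVENTS}) {
            System.out.println(name + " registered: " + isRegistered(local, name));
        }
    }
}
```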
While the number of L1s that can exist in a Terracotta cluster is theoretically unbounded (and cannot be configured), effectively planning for resource limitations and the size of the shared data set should yield an optimum number. Typically, the most important factors that will impact that number are the requirements for performance and availability. Typical questions when sizing a cluster:
The most important method for determining the optimum size of a cluster is to test various cluster configurations under load and observe how well each setup meets overall requirements.
The Terracotta client library runs with your application and is often involved in operations which your application is not necessarily aware of. These operations may get interrupted, too, which is not something the Terracotta client can anticipate. Ensure that your application does not interrupt clustered threads. This is a common error that can cause the Terracotta client to shut down or go into an error state, after which it will have to be restarted.
There can be many reasons for a cluster that was performing well to slow down over time. The most common reason for slowdowns is Java Garbage Collection (GC) cycles.
Another possible cause is an active server syncing with a mirror server. If the active is under substantial load, it may be slowed by the syncing process. In addition, the syncing process itself may appear to slow down. This can happen when the mirror is waiting for specific sequenced data before it can proceed. This is indicated by log messages similar to the following:
WARN com.tc.l2.ha.L2HACoordinator - 10 messages in pending queue. Message with ID 2273677 is missing still
If the message ID in the log entries changes over time, no problems are indicated by these warnings.
One indication that slowdowns are occurring on the server and that clients are throttling their transaction commits is the appearance of the following entry in client logs:
INFO com.tc.object.tx.RemoteTransactionManagerImpl - ClientID (: TransactionID=[65037) : Took more than 1000ms to add to sequencer : 1497 ms
If you are not running the server in persistent mode, the server will remove the object data when it restarts. If you want object data to persist across server restarts, run the server in persistent mode.
If you are running the server in persistent mode, the server keeps the object data across restarts. If you want objects to disappear when you restart the server you can either run in non-persistent mode or remove the data files from disk before you restart the server. See this question.
If you've verified that your Terracotta cluster is up and running, but your attempt to monitor it remotely using a Terracotta console is unsuccessful, a firewall may be the cause. Firewalls that block traffic from Terracotta servers' JMX ports prevent monitoring tools from seeing those servers. To avoid this and other connection issues that may also be attributable to firewalls, ensure that the JMX and DSO ports configured in Terracotta are unblocked on your network.
If you are using the Terracotta Developer Console to monitor a remote cluster, especially in an X11 environment, issues with Java GUI rendering may arise that slow the display. You may be able to improve performance simply by changing the rendering setup.
If you are using Java 1.7, set the property sun.java2d.xrender to "true" to enable the latest rendering technology:
For Java 1.5 and 1.6, be sure to set property sun.java2d.pmoffscreen to "false" to allow Swing buffers to reside in memory:
For information about this Java system property, see http://download.oracle.com/javase/1.5.0/docs/guide/2d/flags.html#pmoffscreen.
You can add these properties to the Developer Console start-up script (
A firewall may be preventing different nodes on a cluster from seeing each other. If Terracotta clients attempt to connect to a Terracotta server, for example, but the server seems not to have any knowledge of these attempts, the clients may be blocked by a firewall. Another example is a backup Terracotta server that comes up as the active server because it is separated from the active server by a firewall.
Client or server processes that quit ("L1 Exiting" or "L2 Exiting" in logs) for seemingly no visible reason may have been running in a terminal session that has been terminated. The parent process must be maintained for the life of the node process, or use another workaround such as the
If you have a setup with one active Terracotta server instance and a number of standbys, but are seeing errors because more than one active server is in the cluster, a "split brain" scenario has occurred.
Due to network latency or load, the Terracotta server instances may not have adequate time to hold an election. Increase the
If you are running on Ubuntu, see the note at the end of the UnknownHostException topic.
The Terracotta Server Array distributes data based on the hashcode of keys. To enhance performance, each server stripe should contain approximately the same amount of data. A grossly uneven distribution of data on Terracotta servers in a cluster with more than one active server can be an indication that keys are not being hashed well. If your application is creating keys of a type that does not hash well, this may be the cause of the uneven distribution.
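The effect of a poorly hashing key type can be illustrated with a small sketch. The stripe assignment used here (hash code modulo the number of stripes) is an assumption made for illustration only, not the Terracotta Server Array's actual algorithm, and all class names are hypothetical.

```java
import java.util.Arrays;

final class KeyDistributionSketch {
    /** Key with a degenerate hashCode: every instance hashes to the same value. */
    static final class PoorKey {
        final int id;
        PoorKey(int id) { this.id = id; }
        @Override public boolean equals(Object o) {
            return o instanceof PoorKey && ((PoorKey) o).id == id;
        }
        @Override public int hashCode() { return 42; }
    }

    /** Key whose hashCode varies with its content. */
    static final class GoodKey {
        final int id;
        GoodKey(int id) { this.id = id; }
        @Override public boolean equals(Object o) {
            return o instanceof GoodKey && ((GoodKey) o).id == id;
        }
        @Override public int hashCode() { return Integer.hashCode(id); }
    }

    static Object[] poorKeys(int n) {
        Object[] keys = new Object[n];
        for (int i = 0; i < n; i++) keys[i] = new PoorKey(i);
        return keys;
    }

    static Object[] goodKeys(int n) {
        Object[] keys = new Object[n];
        for (int i = 0; i < n; i++) keys[i] = new GoodKey(i);
        return keys;
    }

    /** Counts how many keys land on each stripe under the assumed modulo assignment. */
    static int[] countPerStripe(Object[] keys, int stripes) {
        int[] counts = new int[stripes];
        for (Object key : keys) {
            counts[Math.floorMod(key.hashCode(), stripes)]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println("poor: " + Arrays.toString(countPerStripe(poorKeys(1000), 4)));
        System.out.println("good: " + Arrays.toString(countPerStripe(goodKeys(1000), 4)));
    }
}
```

Under this assumed assignment, the constant hash code places every key on a single stripe, while the content-based hash code spreads the same number of keys evenly.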
If it's running in persistent mode, the ACTIVE Terracotta server instance should come up with all shared data intact. However, if the server's database has somehow become corrupt, you must clear the crashed server's data directory before restarting.
If only some data was lost, then Terracotta servers were configured to persist data. The cause for losing a small amount of data could be disk "write" caching on the machines running the Terracotta server instances. If every Terracotta server instance lost power when the cluster went down, data remaining in the disk cache of each machine is lost.
Turning off disk caching is not an optimal solution because the machines running Terracotta server instances will suffer a substantial performance degradation. A better solution is to ensure that power is never interrupted at any one time to every Terracotta server instance in the cluster. This can be achieved through techniques such as using uninterruptible power supplies and geographically subdividing cluster members.
You may be encountering a known issue with the Hotspot JVM for SPARC. The problem is expected to occur with Hotspot 1.6.0_08 and higher, but may have been fixed in a later version. For more information, see this bug report.
The default value of the client reconnection setting l1.max.connect.retries is "-1" (infinite). If you frequently encounter the situation described in this question and do not want to wipe the database and restart the cluster, change the retry setting to a finite value. See the high-availability page for more information.
If you see this warning repeatedly, objects larger than the recommended maximum are being shared in the Terracotta cluster. These objects must be sent between clients and servers. In this case, related warnings containing text similar to
Attempt to read a byte array of len: 12251178; threshold=8000000
Attempting to send a message (com.tc.net.protocol.delivery.OOOProtocolMessageImpl) of size may also appear in the logs.
If there are a large number of over-sized objects being shared, low-memory issues, overall degradation of performance, and OutOfMemory errors may result.
Segfaults may be caused by Hyperic (Sigar), the library used by Terracotta servers and clients to report on certain system resources, mainly activity data for CPU, combined CPU, disk, and network. You may need to turn off Sigar, thus losing the ability to monitor and record these network resources directly through Terracotta software. To turn off Sigar, set the Terracotta property sigar.enabled to false on the nodes that exhibit the error:
For more information on setting Terracotta properties, see the Terracotta Configuration Guide and Reference.
The server is expecting a Terracotta database with a compatible version, but is finding one with a non-compatible version. This usually occurs when starting a Terracotta server with an older version of the database. Note that this can only occur with servers in the permanent-store persistence mode.
If you've integrated a Terracotta product with a framework such as Spring or Hibernate and are getting one of these exceptions, make sure that an older version of that Terracotta product isn't on the classpath. With Maven involved, sometimes an older version of a Terracotta product is specified in a framework's POM and ends up ahead of the current version you've specified. You can use tools such as jconsole or jvisualvm to debug, or specify -XX:+TraceClassLoading on the command line.
These two problems are related to starting Java from a location different than the value of JAVA_HOME. To avoid the Hyperic error and restore metrics to the Terracotta consoles, invoke Java from the location specified by JAVA_HOME.
You may get an error similar to the following when a Terracotta server fails to start:
Error Message: Starting BootJarTool...
2008-10-08 10:29:29,278 INFO - Terracotta 2.7.0, as of 20081001-101049 (Revision 10251 by cruise@rh4mo0 from 2.7)
2008-10-08 10:29:30,459 FATAL -
*******************************************************************************
The configuration data in the file at '/opt/terracotta/conf/tc-config.xml' does not obey the Terracotta schema:
: Line 8, column 3: Element not allowed: server in element servers
*******************************************************************************
This error occurs when there's a schema violation in the Terracotta configuration file, at the line indicated by the error text. To confirm that your configuration file follows the required schema, see the schema file included with the Terracotta kit. The kit includes schema files (*.xsd) for Terracotta, Ehcache, and Quartz configurations.
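One way to check a configuration file against a schema before starting the server is the JDK's standard javax.xml.validation API. The following is a minimal sketch, not a Terracotta tool: the class and method names are hypothetical, and the schema and configuration paths are supplied by the caller.

```java
import java.io.File;
import java.io.IOException;
import javax.xml.XMLConstants;
import javax.xml.transform.Source;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

final class ConfigSchemaCheck {
    /**
     * Validates an XML document against an XSD.
     * Returns null if the document is valid, otherwise a description of the first problem found
     * (including a problem with the schema itself).
     */
    static String validate(Source xsd, Source xml) {
        SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        try {
            Schema schema = factory.newSchema(xsd);
            schema.newValidator().validate(xml);
            return null;
        } catch (SAXException | IOException e) {
            return e.toString();
        }
    }

    public static void main(String[] args) {
        // Usage: java ConfigSchemaCheck <schema.xsd> <tc-config.xml>
        String problem = validate(new StreamSource(new File(args[0])),
                                  new StreamSource(new File(args[1])));
        System.out.println(problem == null ? "Configuration is valid." : "Invalid: " + problem);
    }
}
```

The validator stops at the first error, so a file with several violations may need more than one pass.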
If a newly started passive (backup) Terracotta server fails with an error similar to com.tc.objectserver.persistence.db.DBException: com.sleepycat.je.LockTimeoutException: (JE 4.1.10) Lock expired, and then continues to fail with that error upon restart, then the synchronization phase between the passive and active Terracotta servers must be tuned. Specifically, try raising the value of the following Terracotta property:
<!-- Value is in microseconds. -->
<property name="l2.berkeleydb.je.lock.timeout" value="180000000" />
Note that you must restart the active server for this property to take effect.
If the logs reveal an error similar to com.sleepycat.je.log.ChecksumException: Read invalid log entry type: 0 LOG_CHECKSUM, there is likely a corrupted disk on at least one of the servers.
Why is java.net.UnknownHostException thrown when I try to run Terracotta sample applications?
If an UnknownHostException occurs, and you experience trouble running the Terracotta Welcome application and the included sample applications on Linux (especially Ubuntu), you may need to edit the /etc/hosts file.
The UnknownHostException may be followed by "unknown-ip-address".
For example, your /etc/hosts file may contain settings similar to the following:
127.0.0.1 localhost
127.0.1.1 myUbuntu.usa myUbuntu
If myUbuntu is the host, you must change 127.0.1.1 to the host's true IP address.
NOTE: You may be able to successfully start Terracotta server instances even with the "invalid" /etc/hosts file, and receive no exceptions or errors, but other connectivity problems can occur. For example, when starting two Terracotta servers that should form a mirror group (one active and one standby), you may see behavior that indicates that the servers cannot communicate with each other.
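A quick way to see whether the local host name maps to a loopback address such as 127.0.1.1 is to resolve it with java.net.InetAddress. This is a diagnostic sketch only, with hypothetical class and method names. It reports what the JVM resolves and does not modify the hosts file.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

final class LocalHostCheck {
    /** Returns true if the host name or IP literal resolves to a loopback address (127.x.x.x or ::1). */
    static boolean resolvesToLoopback(String host) {
        try {
            return InetAddress.getByName(host).isLoopbackAddress();
        } catch (UnknownHostException e) {
            throw new IllegalStateException("Cannot resolve " + host, e);
        }
    }

    public static void main(String[] args) {
        String hostName;
        try {
            hostName = InetAddress.getLocalHost().getHostName();
        } catch (UnknownHostException e) {
            System.out.println("Local host name cannot be resolved: " + e.getMessage());
            return;
        }
        System.out.println(hostName + " resolves to loopback: " + resolvesToLoopback(hostName));
    }
}
```

If the host name resolves to a loopback address, other machines in the cluster cannot reach the node at that address, which matches the symptoms described above.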
You may be exceeding a limit at the system level. In *NIX, run the following command to see what the limits are:
For example, a limit on the number of processes that can run in the shell may be responsible for the errors.
Early versions of JDK 1.6 had a JVM bug that caused this failure. Update JDK 1.6 to avoid this issue.