Release: 2.4
Publish Date: October 2007
Concept and Architecture Guide
Introduction
Open Terracotta is an enterprise-class, open-source, JVM-level clustering solution. JVM-level clustering simplifies enterprise Java by enabling applications to be deployed on multiple JVMs, yet interact with each other as if they were running on the same JVM. Terracotta extends the Java Memory Model of a single JVM to include a cluster of virtual machines such that threads on one virtual machine can interact with threads on another virtual machine as if they were all on the same virtual machine with an unlimited amount of heap.
The programming model of an application clustered with Open Terracotta is the same as, or similar to, that of an application written for a single JVM. There is no developer API specific to Open Terracotta. Terracotta uses bytecode manipulation (a technique used by many Aspect-Oriented Software Development frameworks such as AspectJ and AspectWerkz) to inject clustered meaning into existing Java language features.
This document describes Terracotta's basic concepts and architecture. For a technical introduction to the technology, see our Introduction to Terracotta article on InfoQ.
TODO: https://jira.terracotta.org/jira/browse/DOC-136
Core Terracotta Concepts
This document is a guide to the core concepts and architecture of Terracotta.
Roots
A "root" is the top of a clustered object graph as declared in the Terracotta configuration. Object sharing in Terracotta starts with these declared roots. Any object that is reachable by reference from a Terracotta root becomes shared by Terracotta, which means that it is given a cluster-wide object id and that its changes are tracked by Terracotta and made available to the cluster.
Roots are declared in the Terracotta configuration by name. A root declaration in the Terracotta configuration must declare either an explicit field name in a Java class or a field expression. Any field may be declared a root, and the same root may be assigned to more than one field.
Fields that are declared as roots assume special behavior. This is the one area of Terracotta that diverges significantly from regular Java semantics.
- The first time a root is assigned by the first JVM that assigns it, the root is "created" in the Terracotta cluster and the object graph of the object assigned to the root field becomes the object graph of that root.
- Once assigned, the value of a root field may never be changed. After the first assignment, all subsequent assignments to fields declared to be that root are ignored.
Because roots are durable beyond the scope of a single JVM's lifecycle, they are sometimes called "superstatic" (which is just a term we made up). This is true no matter what the scope modifier of the root field is. If a static field is declared to be a root, that static field will assume a "superstatic" lifecycle. Likewise, if an instance field is declared to be a root, that instance field will assume a "superstatic" lifecycle. This can have a significant effect on code written with respect to root objects and can cause some astonishment if the behavior of Terracotta around roots is not well understood.
While the reference held by a root field cannot be changed, the object graph of the root object can. Typically, data structures like Maps, Lists, and other Collections are chosen as root objects, although any portable object can be declared a root. The contents of those data structures can change; it is only the assignment of the root field that cannot. There is one exception to this rule: root fields that contain literal values (see the description of literal values in the Clustered Objects section below). The value of a literal root can be freely reassigned (i.e., assignments are never ignored).
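The assignment rules above can be modeled in a few lines of plain Java. This is only a sketch of the described semantics and does not use Terracotta; `RootModel`, `assign`, and the simplified `isLiteral` check are hypothetical names introduced for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Plain-Java model of root assignment semantics; it does not use Terracotta.
class RootModel {
    private final Map<String, Object> roots = new HashMap<>();

    // Returns the value the root holds after the attempted assignment.
    Object assign(String rootName, Object value) {
        Object current = roots.get(rootName);
        if (current == null || isLiteral(current)) {
            roots.put(rootName, value);   // first assignment, or a literal root
            return value;
        }
        return current;                   // later assignments are ignored
    }

    // Assumption: a stand-in for Terracotta's literal check. The real set of
    // literal types is the one listed in com.tc.object.LiteralValues.
    private static boolean isLiteral(Object o) {
        return o instanceof String || o instanceof Number
            || o instanceof Boolean || o instanceof Character;
    }
}
```

In this model a second `assign("cache", otherMap)` returns the map from the first assignment, while the contents of that first map remain free to change.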
Further reading:
More on configuring roots
TODO:
https://jira.terracotta.org/jira/browse/DOC-137
- Root assignment can only happen once per field.
- Code snippet from Tim, FAQ entry and inbound link.
- Lvalue expression issue code snippet from Tim and link to JIRA issue
- Literal roots
- Portability and roots
- Link to the Roots section of the configuration guide
- Link to the Roots section of the Development Guide
- Best practices around declaring roots (what should be roots, etc)
- Issues around initialization code and roots.
Clustered Objects
When a root field is first assigned, the Java object assigned to that field and all objects reachable by reference from that object become clustered objects. This means that each object is given a cluster-wide unique id and all its field data is added to the current Terracotta transaction. When the Terracotta transaction is committed, all of the object data is sent to the Terracotta server. All changes to clustered objects must be made within the context of a Terracotta transaction.
Object data is composed of the values of reference and literal fields. Reference fields are references to other Java objects. Literal fields are data types that Terracotta considers to be not reference types. Terracotta literals are similar to (but not exactly the same as) Java primitives (the current set of literal types can be found in the com.tc.object.LiteralValues class).
When an object becomes referenced by a clustered object, it and all of the objects in its object graph become clustered objects themselves. Once a Java object becomes clustered, it is always a clustered object until it becomes garbage collected by the distributed garbage collector. All the changes to clustered objects are recorded to the current Terracotta transaction. For literal fields, the new value of that literal field is recorded in the Terracotta transaction. For reference fields, the cluster object id of the object to which the field refers is recorded in the Terracotta transaction.
There is guaranteed to be one and only one instance of a particular clustered object per ClassLoader per JVM. All changes made to a clustered object anywhere in the cluster will be applied to the same instance of the clustered object for its ClassLoader's context. This is to preserve object identity. See this series of blog entries for more information on Terracotta and object identity:
Clustered objects must be portable. An attempt to cluster a non-portable object will result in a runtime exception.
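As a rough illustration of how everything reachable from a root becomes clustered, the following plain-Java sketch walks an object graph and gives each distinct instance one id. It is a model only: `GraphNode`, `ClusterIdModel`, and the sequential numbering are assumptions made for the example, not Terracotta's actual id scheme.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.IdentityHashMap;
import java.util.Map;

// Hypothetical node type standing in for any portable application object.
class GraphNode {
    final String label;      // literal field: its value would be recorded
    GraphNode left, right;   // reference fields: the target's id would be recorded
    GraphNode(String label) { this.label = label; }
}

class ClusterIdModel {
    // Walks everything reachable from the object assigned to a root and
    // assigns one id per distinct instance (identity, not equals()).
    static Map<GraphNode, Long> share(GraphNode root) {
        Map<GraphNode, Long> ids = new IdentityHashMap<>();
        Deque<GraphNode> pending = new ArrayDeque<>();
        if (root != null) pending.push(root);
        long nextId = 1;
        while (!pending.isEmpty()) {
            GraphNode n = pending.pop();
            if (ids.containsKey(n)) continue;   // already clustered (handles cycles)
            ids.put(n, nextId++);
            if (n.left != null) pending.push(n.left);
            if (n.right != null) pending.push(n.right);
        }
        return ids;
    }
}
```

Keying the map by identity mirrors the guarantee that each clustered object corresponds to exactly one instance per ClassLoader per JVM.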
Virtual Heap
Terracotta virtual heap allows arbitrarily large clustered object graphs to fit within a constrained local heap size in client virtual machines. The local heap has a "window" on a clustered object graph. Portions of a clustered object graph are faulted in and paged out as needed. Virtual heap is similar in concept to virtual memory and is sometimes referred to as "Network Attached Memory."
Clustered objects are lazily loaded from the server as they are accessed by a client JVM. This happens by injecting augmented behavior around the GETFIELD bytecode instruction that checks to see if the object referred to by that field is currently instantiated on the local heap. If it isn't on the local heap yet, the client will request the object from the server, instantiate the object on the local heap, and ensure that the field refers to that newly instantiated object.
Conversely, less frequently used objects may be transparently purged from local heap by Terracotta, subject to the whims of the local JVM garbage collector. The amount of clustered object data kept in local heap is determined at runtime. Terracotta actively sets references to objects that have fallen out of the cache to null so that they may become eligible for local garbage collection. Because clustered objects may be lazily loaded, purged objects will be transparently retrieved from the server as references to them are traversed.
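The fault-in and purge cycle can be sketched in plain Java as a reference that loads its target on first access and can later be cleared. This is a simplified model, not Terracotta code: `ServerStoreModel` and `FaultingRef` are hypothetical names, and the server is reduced to an in-memory map.

```java
import java.util.HashMap;
import java.util.Map;

// Stand-in for the Terracotta server's object store.
class ServerStoreModel {
    private final Map<Long, String> objects = new HashMap<>();
    int fetches = 0;   // counts how often a client had to fault an object in

    void store(long id, String value) { objects.put(id, value); }

    String fetch(long id) {
        fetches++;
        return objects.get(id);
    }
}

class FaultingRef {
    private final long objectId;
    private final ServerStoreModel server;
    private String local;   // null means "not resident on the local heap"

    FaultingRef(long objectId, ServerStoreModel server) {
        this.objectId = objectId;
        this.server = server;
    }

    // Plays the role of the check injected around GETFIELD.
    String get() {
        if (local == null) {
            local = server.fetch(objectId);   // fault the object in
        }
        return local;
    }

    // Plays the role of Terracotta nulling out a reference so the local
    // garbage collector can reclaim the object.
    void purge() { local = null; }
}
```

Calling `get()` twice costs one fetch; after `purge()`, the next `get()` transparently fetches the object again.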
Distributed Garbage Collection
The Terracotta server has a distributed garbage collector that will remove clustered objects that are no longer referenced. In order for a clustered object to be eligible for garbage collection, it must not be reachable from any root (i.e., must not be part of a root's object graph) and it must not be resident in any client's heap. When the distributed garbage collector finds objects that are eligible for collection, it removes them from the server and from the server's persistent storage.
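The eligibility rule can be expressed as a reachability computation. The sketch below is a plain-Java model under simplifying assumptions (object ids as `Long` values, the reference graph as a map); `DgcModel` and `collectable` are hypothetical names, not part of Terracotta.

```java
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class DgcModel {
    // references maps each object id to the ids it refers to.
    // Returns the ids that are eligible for collection.
    static Set<Long> collectable(Map<Long, List<Long>> references,
                                 Set<Long> rootIds,
                                 Set<Long> residentOnClients) {
        // Mark everything reachable from any root.
        Set<Long> reachable = new HashSet<>();
        Deque<Long> pending = new ArrayDeque<>(rootIds);
        while (!pending.isEmpty()) {
            Long id = pending.pop();
            if (!reachable.add(id)) continue;   // already visited
            for (Long child : references.getOrDefault(id, Collections.<Long>emptyList())) {
                pending.push(child);
            }
        }
        // Eligible: known to the server, unreachable from every root,
        // and not resident in any client's heap.
        Set<Long> garbage = new HashSet<>(references.keySet());
        garbage.removeAll(reachable);
        garbage.removeAll(residentOnClients);
        return garbage;
    }
}
```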
Distributed garbage collection can be set to run automatically, or it can be triggered externally by calling the run-dgc shell script or through the server's JMX management interface.
Further reading:
For more information on configuring distributed garbage collection options, see the Configuration Guide and Reference
For more information on the run-dgc shell script, see the Terracotta Tools guide
For more information on the distributed garbage collector JMX interface, see the JMX Guide
Locks
Locks in Terracotta perform two duties: to coordinate access to critical sections of code between threads and to serve as boundaries for Terracotta transactions. Terracotta locks are analogous to synchronization in Java. Clustered locking is injected into your application based on the locks section of the Terracotta configuration. Each lock configuration stanza uses a regular expression that matches a set of methods. The locking that occurs in that set of methods is controlled by further configuration options specified in the lock configuration stanza.
One of the configuration options in the lock configuration stanza is the lock "level." Unlike Java, Terracotta locks come in four levels: write, synchronous-write, read, and concurrent. Write locks act like regular Java locks. They guarantee that only one thread in the entire cluster can acquire that lock at any given time. Synchronous-write locks add the further guarantee that the thread holding the lock will not release the lock until the changes made under the scope of that lock have been fully applied and ACKed by the server.
Read locks allow multiple threads to acquire the lock at a time, but those threads are not allowed to make any changes to clustered objects while holding the read lock. No thread may acquire a write lock if any thread holds a read lock. No thread may acquire a read lock if any thread holds a write lock. If a thread attempts to make modifications while holding a read lock, an exception will be thrown. If, however, it crosses another lock acquisition boundary on the same lock (e.g., a synchronization on the same object), and that new lock acquisition boundary is configured to be a write lock, that thread will try to upgrade its lock from a read lock to a write lock and will not be allowed to proceed until it acquires that upgraded write lock.
Read locks offer a significant performance advantage over write locks when multiple threads concurrently execute code that does not modify any clustered objects. However, Terracotta does not replace existing synchronization; because Java synchronization is a mutual exclusion lock, threads on the same JVM do not benefit from Terracotta read locks. Only threads on separate JVMs benefit from Terracotta read locks.
Concurrent locks are always granted. They do not protect critical code sections and serve only as transaction boundaries. Changes made within the scope of a concurrent lock are applied atomically. Transactions made in the same thread under the same concurrent lock are applied in order, but there is no guarantee about the order in which transactions made under a concurrent lock on different threads are applied. You should only use concurrent locks when you don't care about potential write-write conflicts between threads.
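The compatibility rules for read and write locks resemble those of `java.util.concurrent.locks.ReentrantReadWriteLock` within a single JVM. The sketch below uses that standard class as an analogy only; it is not how Terracotta implements clustered locks, and `LockLevelDemo` is a hypothetical name.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

class LockLevelDemo {
    // Tries the lock on another thread so that reentrancy on the calling
    // thread does not mask contention.
    private static boolean otherThreadCanLock(Lock lock) {
        return CompletableFuture.supplyAsync(() -> {
            boolean acquired = lock.tryLock();
            if (acquired) lock.unlock();
            return acquired;
        }).join();
    }

    // Returns, in order: reader admitted while a read lock is held,
    // writer admitted while a read lock is held,
    // reader admitted while the write lock is held.
    static boolean[] probe() {
        ReentrantReadWriteLock rw = new ReentrantReadWriteLock();
        boolean[] result = new boolean[3];
        rw.readLock().lock();
        try {
            result[0] = otherThreadCanLock(rw.readLock());
            result[1] = otherThreadCanLock(rw.writeLock());
        } finally {
            rw.readLock().unlock();
        }
        rw.writeLock().lock();
        try {
            result[2] = otherThreadCanLock(rw.readLock());
        } finally {
            rw.writeLock().unlock();
        }
        return result;
    }
}
```

One difference worth keeping in mind: `ReentrantReadWriteLock` does not support upgrading a held read lock to a write lock, whereas the Terracotta read lock described above attempts such an upgrade.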
In addition to the lock level, a lock stanza may be specified as a "named lock" or an "autolock." Methods that match an autolock stanza are augmented by Terracotta to acquire a cluster-wide lock on a clustered object wherever there is synchronization on that object. This works by adding a request for a clustered object's lock from the server around the MONITORENTER bytecode instruction and releasing that lock around the MONITOREXIT bytecode instruction. Synchronized methods are a syntactic shortcut for synchronizing on the object the method belongs to, so autolocks applied to synchronized methods will lock on the object to which the method belongs. You can think of autolocks as an extension of a method's existing Java synchronization to have a cluster-wide meaning.
A thread that attempts to execute a method that matches a named lock stanza must first acquire the lock of that name from the Terracotta server. Named locks are very coarse-grained and should only be used when autolocks are not possible.
Autolocks are fine-grained locks in the sense that the lock requested is the lock for a particular clustered object. The lock acquired is constructed based on the unique cluster id of the object being synchronized on. This provides a significant performance advantage over named locks, but autolocks only acquire a lock if the object being synchronized on is a clustered object. In autolocked methods, if the object being synchronized on is not a clustered object, then only the regular, local JVM lock is acquired. Likewise, autolocks applied to methods that have no synchronization in them will have no clustered locking.
Methods that match a lock stanza will have no clustered locking behavior if that method is not part of a class that has been included for Terracotta bytecode instrumentation.
A thread that holds the clustered lock for a clustered object may call wait() on that object. This will cause the thread to commit the current Terracotta transaction, release the clustered lock, and pause execution. Any other thread in the cluster that holds the clustered lock on that object may call notify() or notifyAll() on that object. The Terracotta server will select a waiting thread (in the case of a notify() call) or the set of all waiting threads (in the case of a notifyAll() call) and ensure that the appropriate waiting threads throughout the Terracotta cluster are notified. A thread that is notified will contend for the clustered lock and resume execution when granted that lock.
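Because clustered wait() and notify() follow the ordinary Java contract, code written for them looks like standard single-JVM code. The following is a plain-Java sketch that runs without Terracotta; `Mailbox` and `MailboxDemo` are hypothetical names. With the class instrumented, its synchronized methods matched by an autolock, and a `Mailbox` instance clustered, the same code would, per the description above, coordinate threads on different JVMs.

```java
class Mailbox {
    private String message;   // guarded by this object's lock

    synchronized void post(String m) {
        message = m;
        notifyAll();          // wake every thread waiting on this object
    }

    synchronized String take() throws InterruptedException {
        while (message == null) {
            wait();           // releases the lock and pauses until notified
        }
        String m = message;
        message = null;
        return m;
    }
}

class MailboxDemo {
    // Posts from one thread and takes from another.
    static String roundTrip(String text) {
        Mailbox box = new Mailbox();
        Thread producer = new Thread(() -> box.post(text));
        producer.start();
        try {
            String received = box.take();
            producer.join();
            return received;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }
}
```

The `while` loop around `wait()` is the standard guard against spurious wakeups and applies equally in the clustered case.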
Further reading:
More on configuring locks
TODO: http://jira.terracotta.org/jira/browse/DOC-140
- Greedy locks
- Link to the locks section of the configuration guide
- Link to the includes section of the configuration guide.
- Complete the discussion of lock upgrades and downgrades.
- Nested concurrent locks
- literal autolocks
- sync. block on a class object
- Synchronized static methods and autolocks
- Link to development guide describing best-practices around locking, lock scope, reducing lock contention, etc.
- Note about concurrency issues in Terracotta being basic concurrency issues. Reference to "Concurrent Programming in Java"
Transactions
Terracotta transactions are sets of clustered object changes that must be applied atomically. Transactions are bounded by lock acquisition and release. When a clustered lock is acquired by a thread, a Terracotta transaction is started and all changes to clustered objects made within the scope of that lock are added to that transaction. When the lock is released, the Terracotta transaction is committed.
When a lock is acquired, Terracotta guarantees that all of the transactions made under the scope of that lock in every JVM in the Terracotta cluster are applied locally on the heap of the acquiring thread. This ensures that threads always see a consistent view of clustered objects.
All changes to clustered objects must happen within the context of a Terracotta transaction. This means that a thread must acquire a clustered lock prior to modifying the state of any clustered objects. If a thread attempts to modify a clustered object outside the context of a Terracotta transaction, a runtime exception will be thrown. One special case is worth mentioning. If a thread synchronizes on an object that is not yet clustered, that thread does not acquire a clustered lock and does not start a Terracotta transaction. That thread may not make modifications to any shared objects, even if the object the thread has synchronized on becomes shared within the scope of that synchronization.
For example, this code will result in a runtime exception:
    Foo foo = new Foo();
    synchronized (foo) {
        foo.setBar("Hey, there. I'm setting the bar attribute.");
        myRoot.put("foo", foo);
    }
This code will resolve the issue:
    Foo foo = new Foo();
    synchronized (foo) {
        foo.setBar("Hey, there. I'm setting the bar attribute.");
        synchronized (myRoot) {
            myRoot.put("foo", foo);
        }
    }
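The relationship between locks and transactions described in this section can be modeled in plain Java. The sketch below is an assumption-laden illustration rather than Terracotta's implementation: `TransactionModel` is a hypothetical class, it ignores nested locks, and it uses `IllegalStateException` as a stand-in for whatever runtime exception Terracotta actually throws.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java model of lock-bounded transactions; it does not use Terracotta.
class TransactionModel {
    private List<String> open;   // changes in the current transaction, or null
    final List<List<String>> committed = new ArrayList<>();

    // Acquiring a clustered lock starts a transaction.
    void lock() {
        open = new ArrayList<>();
    }

    // Every change to a clustered object is added to the open transaction.
    void change(String description) {
        if (open == null) {
            throw new IllegalStateException(
                "clustered object modified outside a transaction");
        }
        open.add(description);
    }

    // Releasing the lock commits all recorded changes as one unit.
    void unlock() {
        if (open == null) {
            throw new IllegalStateException("no lock held");
        }
        committed.add(open);
        open = null;
    }
}
```

In terms of this model, the failing example above calls `change` without a preceding `lock`, because synchronizing on the not-yet-clustered `foo` does not acquire a clustered lock.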
TODO: http://jira.terracotta.org/jira/browse/DOC-141
- Moraga: synchronous commit: full ACK prior to lock release (configurable per lock stanza)
- FAQ/troubleshooting entry and inbound link: exception caused by modifying a shared object outside the scope of a lock.
- Outbound link to troubleshooting guide for resolving unlocked shared exception issues
- Outbound link to the developer guide for lock/transaction placement issues.
Distributed Method Invocation (DMI)
Any method of an object contained in a shared object graph can be distributed, meaning that an invocation of that method in one virtual machine will trigger the same method invocation on all the mirrored instances in other virtual machines. This is a useful mechanism for implementing a distributed listener model. It is important to note that distributed methods work in the context of a shared instance that defines that method. It is not sufficient that the instance's class be instrumented; the instance must also be contained in a shared object graph.
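The distributed listener idea can be sketched as a plain-Java model in which each simulated JVM holds one mirrored instance of a shared listener. `ListenerReplica` and `DmiModel` are hypothetical names, and the loop stands in for the cross-JVM dispatch that Terracotta would perform.

```java
import java.util.ArrayList;
import java.util.List;

// One mirrored instance of a shared listener, as it would exist on one JVM.
class ListenerReplica {
    final List<String> received = new ArrayList<>();

    void onEvent(String event) {
        received.add(event);
    }
}

class DmiModel {
    private final List<ListenerReplica> replicas = new ArrayList<>();

    // Simulates a JVM joining the cluster and materializing the shared listener.
    ListenerReplica addJvm() {
        ListenerReplica replica = new ListenerReplica();
        replicas.add(replica);
        return replica;
    }

    // Invoking the distributed method on any JVM runs it on every mirrored instance.
    void invokeOnEvent(String event) {
        for (ListenerReplica replica : replicas) {
            replica.onEvent(event);
        }
    }
}
```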
Further reading:
More on configuring distributed method invocation
Bytecode Instrumentation
Terracotta's clustering behavior is injected into application code at runtime by the use of bytecode instrumentation. Before the bytecode of a class is loaded by the JVM, Terracotta manipulates the bytecode of that class according to the Terracotta configuration. This includes acquiring clustered locks and pushing changes to clustered objects among other things.
Terracotta can be configured to instrument all or just a subset of the classes loaded into the JVM. You can elect to instrument all classes to make sure that there are no classes that don't have clustering behavior. Conversely, you can elect to instrument only a subset of classes that you know need to have clustering behavior injected into them. Restricting the set of classes included for instrumentation by Terracotta is useful for reducing the overhead at class load-time introduced by the Terracotta class instrumentation process. It is also useful for reducing any runtime overhead introduced by the clustering code injected into classes. While the overhead is minimal for code that doesn't manipulate shared objects or acquire shared locks, excluding classes that don't need Terracotta instrumentation eliminates that overhead entirely.
A class must match an "include" pattern in order to be instrumented. If a class doesn't match the included set, no instrumentation will occur, regardless of any further configuration that might pertain to that class. For example, if a root is configured in a class that has not been included for instrumentation, the field will not actually become a root.
Portability
Terracotta can cluster most Java objects, but there are limits to what can be added to a Terracotta cluster. An object that can be clustered by Terracotta is called "portable". In order for an object to be portable, its class must be instrumented. That is, Terracotta must weave in special modifications to the bytecode of an object's class before the class is loaded. Whether or not a class is instrumented is determined by the Terracotta configuration file. In addition, some classes are automatically instrumented by Terracotta.
Instances of most instrumented classes are portable, but there are a few constraints on object portability. Some objects are inherently non-portable because they represent JVM-specific or host machine-specific resources. Some of the filesystem-related classes such as java.io.FileDescriptor are examples of host machine-specific resources that are inherently non-portable. Thread and Runtime are examples of JVM-specific resources that are inherently non-portable.
Other objects are non-portable because their classes extend inherently non-portable classes, uninstrumented classes, or logically managed classes (see the topic Physically vs. Logically Managed Objects).
A portable object that becomes clustered by Terracotta is said to be "managed". A portable object becomes managed if it becomes reachable from another managed object. Everything reachable by a managed object is part of a "managed graph" of objects. Terracotta keeps track of changes made to managed objects by way of special functionality injected into regular classes when they are loaded. The classes of all portable objects must be instrumented in this way; likewise, all classes that directly access fields of managed objects must also be instrumented - even if they themselves will never be shared.
Boot JAR Classes
For most classes that need Terracotta functionality injected into them, this instrumentation can be performed transparently by Terracotta when the class is loaded. Some classes, however, are loaded too early in the lifecycle of the JVM for Terracotta to hook into their loading process. These are classes that are loaded by the boot classloader. Such classes cannot be instrumented at classload time, but must be pre-instrumented and placed in a special JAR file that is then prepended to the boot classpath.
This JAR file, called the "boot JAR", is created by the Terracotta boot JAR tool. Some of the classes in the Terracotta boot JAR are placed in it automatically by the boot JAR tool. Other classes may be added to the boot JAR by augmenting the Terracotta configuration to include them in the boot JAR section and then running the boot JAR tool.
A class that is loaded by the boot classloader cannot be shared by Terracotta unless it is in the boot JAR. Likewise, a class that has a superclass that is loaded by the boot classloader cannot be shared by Terracotta unless that superclass is in the boot JAR.
Physically vs. Logically Managed Objects
Most objects are managed by moving their field data around the Terracotta cluster. These classes of objects are described as "physically managed" because Terracotta records and distributes changes to the physical structure of the object. When a field of a physically managed object changes, the new value of the field is sent to the Terracotta server and to other members of the Terracotta cluster that currently have the changed object in memory.
Some classes of objects are not shared this way, however. Instead of moving the physical structure of such objects around, we record the methods called on those objects along with the arguments to those methods and then replay those method calls on the other members of the Terracotta cluster. These classes of objects are described as "logically managed" because Terracotta records and distributes the logical operations that were performed on them rather than changes to their internal structure.
Objects are logically managed either for performance reasons or because their internal structure is JVM-specific. Classes that use hashed structures such as java.util.Hashtable, java.util.HashMap, or java.util.HashSet are logically managed. The hashcodes used to create the internal structure of these classes are JVM-specific. If an object that has a structure based on JVM-specific hashcodes were physically managed, its structure on other JVMs in the Terracotta cluster would be incorrect, since the hashcodes would be different on the other JVMs.
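Logical management can be pictured as recording method calls and replaying them elsewhere. The following plain-Java sketch models that idea for a map; `LogicalMapModel` is a hypothetical name, and the in-memory log stands in for the operations Terracotta would send through the server.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Records the operations performed on a map so they can be replayed on a replica.
class LogicalMapModel {
    private final Map<String, String> local = new HashMap<>();
    private final List<String[]> log = new ArrayList<>();   // {operation, key, value}

    void put(String key, String value) {
        local.put(key, value);
        log.add(new String[] {"put", key, value});
    }

    void remove(String key) {
        local.remove(key);
        log.add(new String[] {"remove", key, null});
    }

    // Replays the recorded calls on a fresh map, as another JVM would.
    Map<String, String> replay() {
        Map<String, String> replica = new HashMap<>();
        for (String[] op : log) {
            if (op[0].equals("put")) {
                replica.put(op[1], op[2]);
            } else {
                replica.remove(op[1]);
            }
        }
        return replica;
    }

    Map<String, String> view() {
        return local;
    }
}
```

The replica builds its own internal bucket layout while replaying, which is why this approach is insensitive to hashcodes that differ between JVMs.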
Non-portable and Logically Managed Classes
Classes that have a non-portable class in their type hierarchy are not portable. Subclasses of non-instrumented classes fall into this category, as do subclasses of inherently non-portable classes like java.lang.Thread. A subclass of Thread is not portable because Thread itself is not portable, even if the subclass is instrumented.
While logically managed classes are themselves portable, there are some restrictions on the portability of classes that have logically managed classes in their type hierarchy. This is due to technical details of the Terracotta implementation of logically managed classes.
If the subclass of a logically managed class has declared additional fields, that class is not portable if:
- it directly writes to a field declared in the logically managed superclass
- it overrides a method declared in the logically managed superclass
If you find that a class is not portable because it inherits from a class that is non-portable due to being uninstrumented, you can modify the Terracotta configuration to include all of the classes in the type hierarchy for instrumentation. However, if you find that a class is not portable because of the other inheritance restrictions, you must refactor the class you want to make portable so that it does not violate the inheritance restriction.
Portability Contexts
There are a number of contexts in which objects become shared by Terracotta. In each of these contexts, the portability of the objects that are about to be shared is checked. If there is a portability error, Terracotta throws an exception.
Field Change
When a field of an object that is already shared changes, the object being newly assigned to that reference is checked for portability. If that object is not portable, Terracotta throws a portability exception.
If the newly referenced object is itself portable, then its object graph is traversed. If any object reachable by the newly referenced object is not portable, Terracotta throws a portability exception.
Logical Action
Methods called on logically-managed shared objects that change the state of that object are termed "logical actions". When such a method is called, the arguments to that method are checked for portability. If any of those objects are not portable, Terracotta throws a portability exception.
If the argument objects are themselves portable, then their object graphs are traversed. If any object reachable by the argument objects is not portable, Terracotta throws a portability exception.
Object Graph Traversal
When a top-level object becomes shared in any of the above contexts, Terracotta traverses all of the objects reachable in that top-level object's graph. During that traversal, if any non-portable object is encountered, Terracotta throws a portability exception.
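A traversal of this kind can be sketched with reflection. The code below is a simplified model and not Terracotta's traverser: it treats only `Thread`, `Runtime`, and `java.io.FileDescriptor` (the examples named earlier) as non-portable, does not descend into JDK types, ignores instrumentation, and uses `IllegalStateException` in place of the real portability exception. `DemoAddress`, `DemoLogger`, and `PortabilityModel` are hypothetical names.

```java
import java.io.FileDescriptor;
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.Set;

class DemoLogger {
    FileDescriptor out = new FileDescriptor();   // host machine-specific resource
}

class DemoAddress {
    String street1 = "1 Main St";
    DemoLogger logger;                           // null until assigned
}

class PortabilityModel {
    static void check(Object top) {
        Set<Object> seen = Collections.newSetFromMap(new IdentityHashMap<Object, Boolean>());
        walk(top, seen);
    }

    private static void walk(Object o, Set<Object> seen) {
        if (o == null || isLiteral(o) || !seen.add(o)) return;
        if (o instanceof Thread || o instanceof Runtime || o instanceof FileDescriptor) {
            throw new IllegalStateException("non-portable: " + o.getClass().getName());
        }
        // Assumption of this sketch: other JDK types are treated as opaque.
        if (o.getClass().getName().startsWith("java.")) return;
        for (Class<?> c = o.getClass(); c != null && c != Object.class; c = c.getSuperclass()) {
            for (Field f : c.getDeclaredFields()) {
                if (Modifier.isStatic(f.getModifiers()) || f.getType().isPrimitive()) continue;
                try {
                    f.setAccessible(true);
                    walk(f.get(o), seen);        // follow the reference
                } catch (IllegalAccessException e) {
                    throw new IllegalStateException(e);
                }
            }
        }
    }

    // Simplified stand-in for Terracotta's notion of literal values.
    private static boolean isLiteral(Object o) {
        return o instanceof String || o instanceof Number
            || o instanceof Boolean || o instanceof Character;
    }
}
```

A `DemoAddress` with no logger passes the check; once its `logger` field refers to a `DemoLogger`, the walk reaches the `FileDescriptor` and fails.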
Remedies
As of Terracotta 2.3, when a non-portable exception occurs, the entire object graph of the new objects being added will be printed to the log. See the non-portable-dump section of the Configuration Guide and Reference.
If a non-portable exception occurs, there are a number of possible remedies. Consider the following example classes:
    class Person {
        String firstName;
        String lastName;
        ...
    }
    class Customer extends Person {
        long customerID;
        ...
    }
    class MyProperties extends java.util.Properties {
        ...
    }
    class MyThread extends Thread {
        ...
    }
    class Address {
        private Logger logger;
        private String street1;
        private String street2;
        private State state;
        ...
    }
    class Logger {
        private FileOutputStream out;
        ...
    }
TODO: http://jira.terracotta.org/jira/browse/DOC-186
This section should be moved to the troubleshooting guide and augmented.
Class Not Included
If you want to share an instance of the Customer class, but the Customer class has not been included for instrumentation in the Terracotta configuration, simply add or modify an <include> declaration in the Terracotta configuration file that matches the Customer classname.
Superclass Not Included
If you want to share an instance of the Customer class, but the Person class hasn't been included for instrumentation in the Terracotta configuration, simply add or modify an <include> declaration in the Terracotta configuration file that matches the Person classname.
Subclass of a Logically-Managed Class
If you want to share an instance of the MyProperties class you will get a portability exception because it extends java.util.Properties which, in turn, extends java.util.Hashtable which is a logically-managed class.
The general solution is to refactor your code so there are no logically-managed classes in the type hierarchy of the objects you want to share. In this case, you can change from an inheritance model to an aggregation model by modifying the MyProperties class to contain a reference to a Properties object rather than extending the Properties class.
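A minimal sketch of the aggregation approach follows. The set of delegated methods is an assumption; a real refactoring would expose whichever `Properties` operations the application actually uses.

```java
import java.util.Properties;

// Holds a Properties instance instead of extending Properties, so there is
// no logically-managed class in MyProperties' own type hierarchy.
class MyProperties {
    private final Properties delegate = new Properties();

    String getProperty(String key) {
        return delegate.getProperty(key);
    }

    String getProperty(String key, String defaultValue) {
        return delegate.getProperty(key, defaultValue);
    }

    void setProperty(String key, String value) {
        delegate.setProperty(key, value);
    }
}
```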
Useful Information: It is only subclasses of logically-managed classes that cannot be shared. For example, direct instances of java.util.HashMap can always be shared without issue.
Subclass of a Non-Shareable Class
If you want to share an instance of the MyThread class, you will get a portability exception because it extends Thread which is not shareable. Thread is not shareable because it represents a JVM-specific resource.
The general solution is to refactor your code so there are no non-shareable classes in the type hierarchy of the objects you want to share. In this case, you can make MyThread implement Runnable and pass an instance of it to a Thread object.
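The refactoring might look like the sketch below. `MyTask` is a hypothetical replacement name for the former `MyThread`, and the `runs` counter is invented state used only to show that the work executed.

```java
// Formerly "class MyThread extends Thread"; the work now lives in a Runnable,
// so Thread is no longer in the class's type hierarchy.
class MyTask implements Runnable {
    private int runs;   // example state that could be part of a shared graph

    @Override
    public synchronized void run() {
        runs++;
    }

    synchronized int runs() {
        return runs;
    }
}

class MyTaskDemo {
    static int runOnce() {
        MyTask task = new MyTask();
        Thread worker = new Thread(task);   // the Thread itself stays local to this JVM
        worker.start();
        try {
            worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return task.runs();
    }
}
```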
Reference to Non-Shareable Class
If you try to share an instance of the Address class, its object graph is traversed and checked for portability. The traverser follows the Address.logger reference and tries to share all of the fields of the Logger object. When it reaches the Logger.out reference, it throws a portability exception because Logger.out refers to a FileOutputStream, which is a host machine-specific resource and cannot be shared.
In this case, it's not practical to remove the reference to the logger object. Terracotta recommends making a portion of the object graph transient so the unshareable FileOutputStream object is ignored by Terracotta.
Transience in Terracotta
In the Logger example from "Remedies", you see a case where a single reference from an entire object graph prevents you from sharing anything in that graph. This is a very similar situation to Java serialization, where a single reference to a non-serializable object can prevent an entire graph from being serialized.
Like Java serialization's use of the transient modifier, Terracotta provides a mechanism, Terracotta transience, to allow certain fields to be skipped during sharing. Terracotta also provides a richer model than Java serialization, allowing you to automatically run various methods or snippets of code when an object is loaded so that you can assign appropriate values to transient fields.
Although Terracotta transience and Java transience are similar, by default, Terracotta does not skip fields that are marked with the Java transient modifier when sharing an object. This is because Java serialization and Terracotta sharing are significantly different, and just because a field should not be serialized does not mean Terracotta should not share it.
Making Fields Transient
Making a field transient in Terracotta is similar to declaring a field transient for Java serialization. When the traverser reaches a transient field, it skips that field. The result is that the object referred to by that field reference and all objects that are only referenceable by that transient object are not checked for portability and they are not shared.
Fields of classes can be made transient in the <includes> section of the Terracotta configuration in one of two ways:
- Set an include declaration to honor the built-in Java transient field modifier. In this case, if your source code uses the transient field modifier on the Address.logger field, Terracotta will automatically make the Address.logger field transient.
- Declare specific fields to be transient by name. This allows you to make fields transient that haven't been declared transient in the source code.
All classes that match a given include declaration will be given the declared field transience behavior. Note that declaring a field transient to Terracotta does not prevent it from being serialized when using Java serialization. The two are different concepts, and the only way in which they are connected is that you can tell Terracotta to honor the Java transient keyword for Terracotta transience, as explained in the "Declaring On-Load Behavior" topic.
A field cannot be both a DSO root and transient. If a field is declared as both, it will be made a root (i.e., it will not be transient), and a warning will be logged.
Declaring On-Load Behavior
Just like Java serialization, when a shared object containing a transient reference is materialized on another node in the cluster, by default that transient reference is null. If this object were being shared using Java serialization, there would be two ways to initialize the Address.logger field before the object is used:
- Refactor the code to make the logger accessible only through a method that checks to see if the logger is null and performs lazy initialization of the logger if it is. This solution requires a fair amount of code change and also distorts the object model since you cannot refer to the logger field the way you normally would.
- The other alternative when using Java serialization is to define a special method on the Address class named readObject(java.io.ObjectInputStream). This method is guaranteed to be called when the Address object is deserialized, so you can initialize the logger field in this method before any thread has access to the newly deserialized object.
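The two Java serialization remedies above can be sketched as follows. This is a minimal illustration, not code from the Remedies example: the class name SerializableAddress, its street field, and the use of java.util.logging are assumptions made so that the example is self-contained.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.logging.Logger;

// Hypothetical stand-in for the Address class discussed above.
class SerializableAddress implements Serializable {
    private static final long serialVersionUID = 1L;

    private final String street;

    // Skipped by Java serialization, so it is null right after deserialization
    // unless one of the two remedies below restores it.
    private transient Logger logger = createLogger();

    SerializableAddress(String street) {
        this.street = street;
    }

    String getStreet() {
        return street;
    }

    private static Logger createLogger() {
        return Logger.getLogger(SerializableAddress.class.getName());
    }

    // Remedy 1: only expose the logger through an accessor that initializes it lazily.
    Logger getLogger() {
        if (logger == null) {
            logger = createLogger();
        }
        return logger;
    }

    // Remedy 2: Java serialization calls this hook while deserializing the object,
    // before any other thread can see it.
    private void readObject(ObjectInputStream in) throws IOException, ClassNotFoundException {
        in.defaultReadObject();
        logger = createLogger();
    }

    // Reports the raw field state without triggering lazy initialization.
    boolean isLoggerFieldSet() {
        return logger != null;
    }

    // Serializes and deserializes an instance in memory, for demonstration only.
    static SerializableAddress roundTrip(SerializableAddress original) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(original);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (SerializableAddress) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Calling SerializableAddress.roundTrip(new SerializableAddress("1 Main St")) returns a copy whose logger field has already been restored by readObject. In practice you would normally pick one remedy; both are shown here only for comparison.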
Terracotta has a similar feature that is configurable in the <includes> section of the configuration. Each include declaration can be augmented with on-load behavior which determines what further steps Terracotta takes beyond applying shared object data when loading a shared object into a virtual machine.
There are two flavors of Terracotta on-load behavior. The first is to declare a method by name to be called when an object is loaded. This is similar to the special method readObject used by Java serialization, but Terracotta lets you decide for yourself which method to use to initialize the transient field.
If you don't have a suitable method already defined or if, for some reason, you can't add such a method to the class in question, you can specify a BeanShell script in the Terracotta configuration that will be executed when the object is loaded.
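As a rough sketch of the first flavor, the class below keeps its transient initialization in one ordinary method that the constructor also uses. The class name OnLoadAddress, the method name initTransients, and the use of java.util.logging are all hypothetical. The sketch assumes that the logger field has been made transient in the Terracotta configuration and that initTransients has been named there as the on-load method; that configuration is not shown. The clearTransientsForDemo method exists only to imitate, in a single JVM, the state of an object that has just been loaded on another node.

```java
import java.util.logging.Logger;

// Hypothetical class; assumes "logger" is declared transient to Terracotta
// and "initTransients" is declared as the on-load method (configuration not shown).
class OnLoadAddress {
    private final String street;
    private Logger logger;

    OnLoadAddress(String street) {
        this.street = street;
        initTransients(); // normal construction path on the creating node
    }

    // Candidate on-load method: assigns values to fields that are not shared.
    void initTransients() {
        logger = Logger.getLogger(OnLoadAddress.class.getName());
    }

    void logStreet() {
        logger.info("street = " + street);
    }

    boolean hasLogger() {
        return logger != null;
    }

    // Demo helper only: imitates a freshly loaded object whose transient
    // reference has not been initialized yet.
    void clearTransientsForDemo() {
        logger = null;
    }
}
```

Keeping the initialization in a single method means the same code runs whether the object was constructed locally or loaded from the cluster.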
BeanShell Scripting
The Java BeanShell is a scripting language that is similar to ordinary Java code, with the advantages that it is more dynamic (for example, you don't have to declare types for your variables) and that it can be executed at runtime. The BeanShell has been submitted for standardization as JSR-274. A full description of the BeanShell is beyond the scope of this document, but you can view an introduction, tutorial, complete documentation, and the full specification at http://www.beanshell.org/.
Beyond the basic specification, Terracotta has made a few small modifications to the BeanShell environment to permit better use in the Terracotta run-time environment.
Referencing Shared Object in BeanShell
For technical reasons, it is not possible to refer to the object being shared as this in BeanShell code used in an on-load declaration in the Terracotta configuration file. Instead, you must refer to it as self. For example, to initialize the Address.logger field in the example code from the Remedies section, you might use the following on-load BeanShell script:
self.logger = Logger.getLogger(self.getClass());
Terracotta Limitations
This section describes limitations in Distributed Shared Objects technology.
Unknown Not-Instrumentable Classes
Some user-defined classes may be unshareable in ways that Terracotta cannot detect. For example, they may have native methods that make them impossible to share correctly.
Classes That Should Be Logically Managed
Terracotta cannot detect all cases where a user-defined or third-party library class should be logically managed. An example of such a case is user code that explicitly examines an object's hash code. Because the hash code of an object can be a VM-specific value, it might not be safe to share such objects. Typically, this occurs in collection libraries like GNU Trove.
There is currently no customer-facing way to make new classes logically-managed. There is an ongoing effort to add logically-managed support for commonly-used third-party libraries that require it. Terracotta does have limited support for the GNU Trove collections, for example. For more specific information about exactly which third-party classes have been adapted by Terracotta to be logically managed, please contact Terracotta support at support@terracottatech.com.
Non-Static Inner Classes
If a non-static inner class is included for instrumentation, its containing class must also be included for instrumentation and vice versa. Terracotta does not currently have a way to enforce this, so it is possible to start a Terracotta client in this state.
The symptom of this mis-configuration is NoSuchMethodErrors being thrown in instrumented methods that use direct field access on the uninstrumented inner or outer class's instance fields.
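Because Terracotta does not enforce this rule, a small reflection-based check can catch the mismatch before deployment. The sketch below is not a Terracotta facility. The class InnerClassIncludeCheck and the Order and LineItem example types are hypothetical, and the list of included classes is assumed to be supplied by hand rather than read from the Terracotta configuration.

```java
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper: checks a hand-maintained list of included classes.
class InnerClassIncludeCheck {

    // Example types: LineItem is a non-static inner class of Order and
    // accesses an instance field of its outer class directly.
    static class Order {
        int total;

        class LineItem {
            void add(int price) {
                total += price;
            }
        }
    }

    static boolean isNonStaticInner(Class<?> type) {
        return type.isMemberClass() && !Modifier.isStatic(type.getModifiers());
    }

    // Returns one message per inner/outer pair that is only half included.
    static List<String> findProblems(Class<?>... included) {
        Set<Class<?>> includedSet = new HashSet<>(Arrays.asList(included));
        List<String> problems = new ArrayList<>();
        for (Class<?> type : included) {
            if (isNonStaticInner(type) && !includedSet.contains(type.getDeclaringClass())) {
                problems.add(type.getName() + " is included but its outer class is not");
            }
            for (Class<?> nested : type.getDeclaredClasses()) {
                if (isNonStaticInner(nested) && !includedSet.contains(nested)) {
                    problems.add(type.getName() + " is included but its inner class "
                        + nested.getName() + " is not");
                }
            }
        }
        return problems;
    }
}
```

Passing only Order.class or only Order.LineItem.class produces one message each, while passing both produces an empty list.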
Performance Considerations
Although convenient for experimentation and prototyping, a class include pattern of *..* can have a negative impact on performance. The optimal set of included classes contains only those types that will be shared in Terracotta. In particular, the classes that make up the implementation of the web container should be excluded. For example, classes matching patterns such as org.apache.catalina..* or org.apache.jasper..* should be excluded unless absolutely necessary, since these are core classes of the Tomcat web container. The default configuration file included with Sessions Configurator contains exclude patterns suitable for many popular containers.
Terracotta for Sessions Concepts
TODO: http://jira.terracotta.org/jira/browse/DOC-147
This section needs to be written.
Further Reading
For a quick-start introduction to Terracotta Sessions, see the Sessions Quick-start Guide.
Terracotta for Spring Concepts
TODO: http://jira.terracotta.org/jira/browse/DOC-143
Terracotta for Spring is a runtime for Spring-based applications that provides high-availability and high-performance clustering for your Spring applications without changes to the application code.
With Terracotta for Spring, developers can create single-node Spring applications as usual. They define which Spring application contexts they want to have clustered in the configuration file. Terracotta for Spring handles the rest. Spring applications are clustered automatically and transparently and have the same semantics across the cluster as on the single node.
Terracotta for Spring does not require changes to existing code and does not require the source code. The application is transparently instrumented at load time, based on a declarative XML configuration.
Additionally, Terracotta for Spring does not require that classes implement Serializable, Externalizable, or other interfaces or annotations. This is possible because Terracotta does not use serialization (which flattens the entire object graph); instead, it sends only the data that has actually changed over the network.
Further Reading
• For a quick-start guide to using Terracotta for Spring, see the Spring Quick-start Guide.
Clustering Singleton Spring Beans
Life-cycle semantics and scope for Spring beans are preserved across the cluster within the same logical ApplicationContext. The current clusterable bean type is Singleton.
You can declaratively configure which beans in which application contexts (in a specific web application or in a stand-alone application) you want to cluster.
Within the web application, each of the application contexts that needs to be shared is identified and configured by specifying the set of bean configuration XML file(s) that are used to create the application context. For each application context you can specify several attributes including which Spring beans to cluster.
Each of the Spring beans to cluster is identified by its bean name. By default, all of a bean's references (fields) are clustered, but you can also choose to define certain references (fields) as "non-distributable", which means that they will not be clustered but will maintain a node-local value. This is important, since the bean might have references to classes that are holding on to node-specific resources, such as files and sockets.
Distribution of Application Context Events
Spring's event mechanism (local and synchronous) in the ApplicationContext is turned into asynchronous, distributed, and reliable events, which are still local within the same logical ApplicationContext. An event that is published in a distributed application context is broadcast to all of the listeners in each Java Virtual Machine (JVM). This can serve as a simpler and more performant alternative to publish/subscribe using Java Message Service (JMS). The messages/events can be any type of regular Java instance (POJO), and are not required to implement any specific interfaces.
Note: Terracotta for Spring changes the semantics of Spring application context events. The events are no longer handled within the transaction context of the publisher. Instead, each ApplicationListener is responsible for managing its own transaction context.
When distributed events are enabled, calls to ApplicationEventPublisher.publishEvent are distributed to all application contexts within the cluster.
Clustering of JMX State
Clustered beans can be exported using Spring JMX support. The Java Management Extensions (JMX) data is aggregated, which guarantees a single point of management and monitoring, as well as a coherent view of all JMX data in the cluster, regardless of where it is accessed from.
Terracotta Architecture
Terracotta Server
The Terracotta server is the heart of a Terracotta cluster. It performs two basic functions:
- Cluster-wide thread coordination and lock management
- Clustered object data management and storage
The server brokers between all threads on all client JVMs for lock requests. It keeps track of which locks are held by which threads on the client JVMs and accepts and responds to lock requests. It also keeps track of which threads in the client JVMs are waiting on clustered objects. When notify() or notifyAll() is called on a clustered object on a thread in a client JVM, the server chooses which set of threads in the cluster ought to be notified and sends notifications up to the appropriate clients.
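The kind of application code that this coordination applies to is ordinary Java monitor code, such as the sketch below. The class SharedWorkQueue and its methods are hypothetical. The sketch runs as-is in a single JVM; under Terracotta it assumes that the items collection is reachable from a declared root and that the synchronized blocks have been configured as clustered locks, neither of which is shown here.

```java
import java.util.LinkedList;
import java.util.Queue;

// Hypothetical example class; plain Java with no Terracotta-specific API.
class SharedWorkQueue {
    private final Queue<String> items = new LinkedList<>();

    // Adds an item and wakes up any threads waiting in take().
    void put(String item) {
        synchronized (items) {
            items.add(item);
            items.notifyAll();
        }
    }

    // Blocks until an item is available, then removes and returns it.
    String take() throws InterruptedException {
        synchronized (items) {
            while (items.isEmpty()) {
                items.wait();
            }
            return items.remove();
        }
    }

    // Single-JVM demonstration: one thread waits, the caller hands it an item.
    static String handOff(String item) {
        SharedWorkQueue queue = new SharedWorkQueue();
        String[] received = new String[1];
        Thread consumer = new Thread(() -> {
            try {
                received[0] = queue.take();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        consumer.start();
        queue.put(item);
        try {
            consumer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return received[0];
    }
}
```

In a cluster, the waiting thread and the notifying thread could be in different client JVMs, with the server deciding which waiters receive the notification.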
The server also manages object data and the persistence of that data. As clustered objects are changed on the client JVMs, the server receives those changes, stores those changes to disk as necessary, and sends those changes to other client JVMs in the cluster that need them. It also keeps track of the set of clustered objects currently resident in heap for each client JVM and responds to requests from client JVMs for objects that they don't currently have in their local heap.
Server Death from the Servers' Perspective
If a server process dies, what happens next depends on whether the server is in persistent or non-persistent mode and whether or not standby servers have been configured and are waiting to take over. If the server is in non-persistent mode, there can be no standby servers. The server may be restarted, but it will start fresh with no object data, and it will forcibly destroy any server data that it finds on disk.
If the dead server was in persistent mode and there are no standby servers, it may be restarted and clients that were connected when the previous instance died will be allowed to reconnect. Any clients that were not connected to the previous server instance will not be allowed to reconnect, since they are perceived by the server as having died or as having been part of another Terracotta cluster.
If the dead server was in persistent mode and there are standby servers, one of the standby servers will be elected as the new active server. As in the previous case, clients that were connected to the dead server when the previous instance died will be allowed to reconnect.
Further reading:
For more information on configuring multiple servers for high-availability, see the Servers section of the Configuration Guide and Reference
Client Reconnect and the Client Reconnect Window
When a standby or restarted server instance enters active service, there is a (configurable) time window during which previously connected clients are allowed to reconnect. When all previously connected clients have reconnected or the client reconnect time window has elapsed, the server will complete the handshakes with its reconnected clients and the cluster will assume normal operation.
During the client reconnect window, the server waits for every previously connected client to reconnect and does not resume normal operation while any of them is still missing. Any previously connected client that fails to reconnect by the time the client reconnect window closes will be perceived by the server as having died. When the client reconnect window closes, the server will assume normal operation.
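The rules of the client reconnect window can be modeled in a few lines. The sketch below is a simplified illustration of the behavior described above, not the server's implementation. The class ReconnectWindowModel, its methods, and the use of string client IDs and a manually advanced clock are all assumptions made for the example.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

// Hypothetical model of the reconnect window; time is advanced manually.
class ReconnectWindowModel {
    private final Set<String> pending;   // previously connected clients not yet back
    private final long windowMillis;
    private long elapsedMillis;

    ReconnectWindowModel(long windowMillis, String... previouslyConnected) {
        this.windowMillis = windowMillis;
        this.pending = new HashSet<>(Arrays.asList(previouslyConnected));
    }

    void advance(long millis) {
        elapsedMillis += millis;
    }

    // The window stays open until every expected client is back or time runs out.
    boolean isOpen() {
        return !pending.isEmpty() && elapsedMillis < windowMillis;
    }

    // Accepts only previously connected clients, and only while the window is open.
    boolean reconnect(String clientId) {
        return isOpen() && pending.remove(clientId);
    }

    // Once the window has closed, clients that never came back are treated as dead.
    Set<String> presumedDead() {
        if (isOpen()) {
            return Collections.<String>emptySet();
        }
        return Collections.unmodifiableSet(pending);
    }
}
```

For example, with clients "a" and "b" expected and a 1000 ms window, a reconnect from an unknown client "c" is rejected, "a" is accepted, and if "b" has not returned after 1000 ms it is reported by presumedDead().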
Further reading:
For more information on configuring the client reconnect window, see the client-reconnect-window section of the Configuration Guide and Reference
Client Death From the Server's Perspective
If a connection from a Terracotta client becomes disconnected, the server perceives that client as having died. The server will reclaim any locks currently held by that client. If a dead client tries to reconnect, that client will be rejected because it is in an indeterminate state.
Terracotta Client
A Terracotta 'client' is a JVM that participates in a Terracotta cluster and that your application runs in. Your application may run in a standalone JVM or in an application server. From the Terracotta cluster perspective, they are both Terracotta 'clients'.
On startup, a Terracotta client JVM initiates a network connection with a Terracotta server. Once the connection to the Terracotta server is made, the client is allowed to proceed with its normal startup operations. As classes are loaded into the client JVM, they are instrumented with Terracotta bytecode modifications according to the Terracotta configuration (for more information on bytecode modifications, see the instrumented-classes section of the Configuration Guide and Reference).
Server Death From the Client's Perspective
If a client JVM's network connection to the Terracotta server is disconnected, the client will attempt to reconnect to the server for a configurable number of seconds. If that fails, it assumes that the active server has died and that, at some point, a new active server will appear and start accepting connections. While the client is disconnected, any thread that attempts to acquire a shared lock will block. When the client perceives server death, the client will try to connect to the new active server. If it successfully connects to the new active server, it will initiate a handshake procedure. When and if the handshake procedure is complete, the client will automatically continue operation.
Further reading:
For more information on configuring multiple servers and client reconnect options, see the Servers section of the Configuration Guide and Reference.
Terracotta Cluster
A Terracotta cluster is any number of client JVMs connected to any number of Terracotta servers that share the same object data and locks. (As of Terracotta 2.3, a cluster with more than one active server is not supported.) The applications running in the client JVMs may be separate instances of the same logical application, or they may be separate logical applications that share some distributed object data and thread coordination. Whether there is a single application or multiple applications distributed across multiple Terracotta clients is an architectural distinction that you may choose to make in your application(s). There is no such distinction from the perspective of the Terracotta cluster.
The object data of the cluster may be persisted to an on-disk representation by the Terracotta server. Clusters may be run in persistent and in non-persistent mode.
Further reading:
For more information on configuring the persistence mode of a Terracotta cluster, see the persistence section of the Configuration Guide and Reference
Non-Persistent Clusters
For a cluster running in non-persistent mode, some data may be written to disk to be paged out of memory and retrieved later. However, a cluster running in non-persistent mode cannot survive a server restart. If a server running in non-persistent mode terminates, all of the client JVMs in that entire cluster must be terminated as well. The old cluster's data must be removed prior to the new server starting. When the new server instance starts, the cluster starts with fresh data, and no clients connected to previous instances of the server will be allowed to reconnect.
Persistent Clusters and High Availability
The Terracotta server of a cluster running in persistent mode can survive a server restart and, in fact, multiple servers may be configured for high-availability. In persistent mode, the server ensures that a consistent and stable view is persisted to disk so that the server may be restarted at any time.
If multiple servers are configured for high availability, they must also be configured to write their data to a shared filesystem that supports file locks. Any number of servers may be configured, but a single server will be active and the rest will be passive. Should the active server fail, one of the passive servers will automatically be promoted to be the new active server, the clients will automatically reconnect to the new active server, and the cluster will resume normal operations.
Further reading:
For more information on configuring multiple servers, see the Servers section of the Configuration Guide and Reference
For more information on configuring the servers' data location, see the server data section of the Configuration Guide and Reference
Cluster Events
Cluster events, introduced in Terracotta 2.3, provide a mechanism for a Terracotta client to be notified of significant events that occur in the cluster, such as client connects and disconnects.
Application specific behavior can then be defined based on cluster membership events. Examples include:
- Repartitioning of data or building in work acquisition logic based on worker (client JVM) failures - in a POJO-based-grid use-case
- Sending out an alert to operational staff or dynamically provisioning a new server into the cluster etc.
The callback code has to be implemented by the application developer, since it is necessarily application specific. There are two ways to implement the callback:
- Through the Terracotta JMX interface: This method is recommended for end users of Terracotta. Find the MBean that emits notifications on client JVM connects, disconnects, and perhaps other events in the future (org.terracotta:type=Terracotta Cluster,name=Terracotta Cluster Bean), and write code that implements NotificationListener and reacts to the notifications received from the MBean.
For more information on the Terracotta cluster events JMX interface and some sample code, see the JMX Guide
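A listener for the JMX approach might look roughly like the sketch below. It uses only the standard javax.management API and the MBean name given above. The class ClusterEventListener is hypothetical, how the MBeanServerConnection is obtained is left to the caller, and the notification types that the MBean actually emits are not assumed here, so the listener simply records whatever it receives. The JMX Guide remains the authoritative source for the sample code.

```java
import java.io.IOException;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import javax.management.InstanceNotFoundException;
import javax.management.MBeanServerConnection;
import javax.management.MalformedObjectNameException;
import javax.management.Notification;
import javax.management.NotificationListener;
import javax.management.ObjectName;

// Hypothetical listener; records each notification as "type: message".
class ClusterEventListener implements NotificationListener {
    // MBean name as given in this guide.
    static final String CLUSTER_BEAN =
        "org.terracotta:type=Terracotta Cluster,name=Terracotta Cluster Bean";

    private final List<String> events = new CopyOnWriteArrayList<>();

    @Override
    public void handleNotification(Notification notification, Object handback) {
        // Application-specific reactions (repartitioning, alerting) would go here.
        events.add(notification.getType() + ": " + notification.getMessage());
    }

    List<String> events() {
        return events;
    }

    static ObjectName clusterBeanName() {
        try {
            return new ObjectName(CLUSTER_BEAN);
        } catch (MalformedObjectNameException e) {
            throw new IllegalArgumentException(e);
        }
    }

    // Registers this listener with the cluster MBean over an existing connection.
    void attachTo(MBeanServerConnection connection)
            throws InstanceNotFoundException, IOException {
        connection.addNotificationListener(clusterBeanName(), this, null, null);
    }
}
```

Once attached, each notification emitted by the MBean is delivered to handleNotification, which is where the application-specific callback logic belongs.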
Terracotta Scalability
For an in-depth discussion of how the Terracotta architecture scales, see this article on Terracotta scalability
Appendix
Contacting Terracotta
Contact Terracotta at the following:
Web site: http://www.terracotta.org
Online forums: http://forums.terracottatech.com/forums/
Information: info@terracottatech.com
Platform Support
See Platform Support for information on which platforms are supported by Terracotta.
See Integrations to see the status of integrations with third-party technologies.
Copyright Information
Copyright © 2005-2007
Terracotta, Inc.
All Rights Reserved
This publication (the "Documentation") and the Terracotta software which it describes (the "Software") are protected to the maximum extent permitted under applicable law, including but not limited to, the regulations set forth in Title 17 of the United States Code, and California law. This Documentation, or any parts thereof, may not be reproduced in any form, by any method, for any purpose, without the express written consent of Terracotta. Terracotta makes no warranty, either express or implied, including but not limited to any implied warranties of merchantability or fitness for a particular purpose, with respect to the Software discussed in this Documentation, and the Documentation itself (collectively, "the Materials"). The Materials are made available solely on an "as-is" basis. In no event shall Terracotta be liable to anyone for special, collateral, incidental, indirect, punitive, exemplary, or consequential damages in connection with, or arising from the purchase or use of, the Materials. Under no circumstances and regardless of the cause of action alleged, shall Terracotta's liability exceed the purchase price of the Software described herein. Terracotta reserves the right to revise and improve its Software and Documentation as it deems fit. The Documentation describes the state of the Software at the time of publication.
Trademarks
"Terracotta," the stylized "T" logo, and "Open Terracotta" are trademarks of Terracotta. Java and all Java-based trademarks and logos are trademarks or registered trademarks of Sun Microsystems, Inc. in the United States and other countries. All other brand names, product names, or trademarks belong to their respective holders.
Government Use
Use, duplication, or disclosure by the U.S. Government is subject to restrictions as set forth in FAR 12.212 (Commercial Computer Software-Restricted Rights) and DFAR 267.7202 (Rights in Technical Data and Computer Software), as applicable.