Taylor's Brain
Brief intro: I'm the Product Manager for Terracotta. I read our forums religiously and try my best to help people with questions they have about Terracotta. I find I repeat myself quite often - in forums and in person - so I thought it might be a good idea to record some of the most common things here. Get some of the stuff I know about Terracotta out of my Brain, if you will...
This list of "things" is by no means ordered, I am just going to start logging things I see.
How do I do "X" with Terracotta
One of the most common questions I get from people is "Terracotta doesn't have an API, so how do I use it?" It comes in the form of "how do I share an object between two threads, or keep a sorted list, or coordinate the activity of two JVMs", and sometimes in the form of FUD e.g. "You can't do x, y, or z" with Terracotta because it doesn't have an API.
The answer is always the same - don't concern yourself with Terracotta at first. Just think about a single JVM, use normal Java, and figure out how you would do that same thing using two threads. You can even code your entire application in a single VM and not use Terracotta at all to unit test. Do it something like this:
public class Main
{
    public static void main(String[] args)
    {
        new MyThread1().start();
        new MyThread2().start();
    }
}
After that, just use Terracotta to cluster the appropriate data structure, and start one thread in one VM on one machine, and the other thread in another VM on another machine.
Now, change your main methods to be:
public class Main1
{
    public static void main(String[] args)
    {
        new MyThread1().start();
    }
}

public class Main2
{
    public static void main(String[] args)
    {
        new MyThread2().start();
    }
}
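As a minimal single-JVM sketch of the "two threads first" idea, the two threads below coordinate through one shared queue. The class name SharedQueueDemo and the producer and consumer lambdas (standing in for MyThread1 and MyThread2) are assumptions made for illustration, and the code uses Java 8 lambdas for brevity. Nothing here is Terracotta-specific; the queue is simply the kind of shared data structure you would later choose to cluster.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical stand-ins for MyThread1 and MyThread2: a producer and a
// consumer that coordinate through one shared queue inside a single JVM.
class SharedQueueDemo {
    // The shared data structure; both threads see the same instance.
    static final BlockingQueue<Integer> QUEUE = new LinkedBlockingQueue<>();

    // Runs both threads to completion and returns what the consumer received.
    static List<Integer> run(int count) {
        List<Integer> received = new ArrayList<>();

        Thread producer = new Thread(() -> {
            for (int i = 0; i < count; i++) {
                QUEUE.add(i);
            }
        });

        Thread consumer = new Thread(() -> {
            try {
                for (int i = 0; i < count; i++) {
                    received.add(QUEUE.take()); // blocks until an item arrives
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        producer.start();
        consumer.start();
        try {
            producer.join();
            consumer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return received;
    }

    public static void main(String[] args) {
        System.out.println(run(5));
    }
}
```

Splitting this into two main classes, as in Main1 and Main2 above, would mean starting only the producer in one and only the consumer in the other.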
Read this blog post by Shay Banon, author of Compass (a distributed search engine built on top of Lucene), and this post I wrote detailing how he and I built the integration for Compass in a few hours using this very concept.
Yellow flags
A lot of people post on our forums with some problem or other. And I always ask them to post their tc-config.xml file. You may wonder why this is. It's because just by looking at their tc-config.xml, I can get a sense of what they are trying to do with their application, and I can also look out for a set of "yellow flags" - things that from experience I know mean there might be a problem and deserve more attention.
So, here's a short list of things I look out for:
On performance (part 1)
A lot of people come to Terracotta, use it for the first time, and find it works absolutely perfectly for them and exceeds their performance expectations in every way.
And then there is a second set of people who come to Terracotta and find, after playing with it or integrating it into their application, that the performance is simply not what they expected and doesn't even come close to their needs.
What's going on?
Well, the difference is in the use case. There are ways to use Terracotta such that it can give absolutely stunning performance results, and ways in which the performance will be let's say less than stunning. But the important part of the story is the people that found the performance to be stunning. If you happen to have fallen into the second bucket, don't despair, you can actually get the performance you need. It may involve a little bit of tuning, but I promise you it will not be outrageously hard - most things in Terracotta are really just good Java practices, so if you know Java, you already know Terracotta.
So rest assured that with some small changes to your application - or to Terracotta's configuration - stunning performance can be achieved.
Here are some things to think about:
- Terracotta reads from memory. If you have uncontended reads, your "cluster" reads can go nearly as fast as memory. On my MacBook Pro I achieved 500,000 reads per second. That's for one machine. With two machines you can get 1MM, with three 1.5MM, and so on. It is possible to get some really stunning numbers.
- Terracotta writes deltas. This is really important, and it means that you can easily take advantage of this behavior. If you have an object graph that is 10kb and you are updating a String of 100 bytes, the savings between sending the entire object graph of 10kb and 100 bytes is pretty big. In this contrived example (100 bytes vs. 10kb) your application will send some 99% less network traffic than if it were using a traditional approach that uses Java Serialization. But. If your object graph is 200 bytes, and you are modifying 200 bytes, you won't see any difference at all! So your performance depends on your application.
- Just like in Java, performance is primarily driven by locking. And just like in Java, you should write your application to be "correct" and then tune its performance. Oftentimes the correct application has enough performance to meet its requirements. But when it doesn't, don't fear - tuning Terracotta locking is no different from tuning Java locking. You can move from a fine-grained lock to a coarse-grained lock, or vice versa, to improve performance. And we have tools to help you identify performance hotspots.
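The arithmetic behind the delta example in the list above can be sketched as follows. This is back-of-the-envelope only: the class and method names are made up for illustration, 10kb is taken to mean 10,000 bytes, and real traffic would also include protocol overhead that is not modelled here.

```java
// Back-of-the-envelope arithmetic for the delta example (hypothetical names).
class DeltaSavings {
    // Percentage of bytes NOT sent when only the changed bytes go over the wire.
    static double percentSaved(long graphBytes, long changedBytes) {
        return 100.0 * (graphBytes - changedBytes) / graphBytes;
    }

    public static void main(String[] args) {
        // Large graph, small change: most of the traffic is avoided.
        System.out.println(percentSaved(10_000, 100));
        // The whole graph changes: nothing is saved.
        System.out.println(percentSaved(200, 200));
    }
}
```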
Let's move on then, to a real world case that comes up so often, I want to just write it down as a case study of how to tune locks for better performance...
On locking, performance, maps, and striping locks
One of the most common use cases of Terracotta is clustering a map. You can use a Hashtable, a HashMap, a ConcurrentHashMap or any other Java Map.
Before we get started, if you are still using a Hashtable, I hope a) you have legacy code that you cannot change or b) you thought very carefully about why you are using a Hashtable and decided that for convenience a Hashtable was ok. I am not saying you should never use a Hashtable (some people do) but I am saying you should really look carefully at why you are using it, because generally speaking its performance will be worse than the other options.
Ok, so you have a map. Let's look at a common operation:
public void update(K myKey, V myValue)
{
    MyObject foo = myMap.get(myKey);
    foo.update(myValue);
    myMap.put(myKey, foo);
}
First, the third line of code (myMap.put(myKey, foo)) is not needed in Terracotta. There is no notion of "put the object back" as Terracotta plugs right in to the Java Memory Model, so Terracotta knows you made an update to the foo object, and you don't need to "tell" the map about that change.
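The same is already true of plain Java in a single JVM, which is the behavior Terracotta preserves. As a small sketch (the NoPutBack and Counter names are hypothetical, with Counter standing in for MyObject), mutating an object obtained from a map is visible through the map without any put:

```java
import java.util.HashMap;
import java.util.Map;

// Plain Java: a map holds references, so mutating the object you got from
// the map is already visible through the map. No put() is required.
class NoPutBack {
    // Hypothetical value type standing in for MyObject.
    static class Counter {
        int value;
    }

    static int demo() {
        Map<String, Counter> map = new HashMap<>();
        map.put("a", new Counter());

        Counter c = map.get("a");
        c.value = 42;              // mutate in place, no map.put("a", c)

        return map.get("a").value; // the map sees the same object
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```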
Ok, second, unless you only have 1 thread, or are using Hashtable - and you probably shouldn't be - this code is going to blow up somewhere unless we put in some synchronization. Let's look at a naive first pass:
public void update(K myKey, V myValue)
{
    synchronized (myMap) {
        MyObject foo = myMap.get(myKey);
        foo.update(myValue);
    }
}
All well and good. If this were under Terracotta, we would need some config to go along with it (assuming the map is a root):
<locks>
  <autolock>
    <method-expression>void MyPackage.MyClass.update(..)</method-expression>
    <lock-level>write</lock-level>
  </autolock>
</locks>
Ok. So what's wrong with this? Well, it locks on myMap, so there is only one mutex for myMap. The synchronized keyword uses this mutex to make code execute mutually exclusively in the Java Virtual Machine. That's good - right? We want to make sure no one else performs an operation at the same time, so that concurrent operations don't corrupt the heap.
Well...not quite. Turns out if we look closely at what we are doing, we are getting an object from a map and then updating that object. We don't need such coarse-grained synchronization to cover this kind of update, we can actually do something much better, and not compromise the integrity of our program.
To do so, we can use the notion of "lock striping", which is just a fancy term for using many locks. In this case, the object "foo" has its own lock. Considering that every call to our method "update" will probably come with a different key, our code will in fact be trying to update many different physical objects if called from different threads. If those calls happen concurrently (at the same time) then our mutually exclusive lock prevents updates to different objects happening at the same time. So let's fix this.
public void update(K myKey, V myValue)
{
    MyObject foo;
    synchronized (myMap) {
        foo = myMap.get(myKey);
    }
    synchronized (foo) {
        foo.update(myValue);
    }
}
There! Now we can update our "foo" object without blocking anyone else from updating their "foo" object. Note that the tc-config.xml remains unchanged.
But wait, can we do better?
Yes. In Terracotta, we use the notion of a "read" lock to let operations that would normally be mutually exclusive (one and only one thread) run concurrently. In the code above we are only "reading" from the map. So we don't in fact need a mutually exclusive lock because no changes are being made to the map. If we can make the get call to the map not mutually exclusive, then we can safely read from the map, and then get a write lock on our "foo" object, and we will have all but eliminated any contention. This is the essence of lock-striping - eliminate lock contention.
So, let's do that:
public void update(K myKey, V myValue)
{
    MyObject foo = getObject(myKey);
    synchronized (foo) {
        foo.update(myValue);
    }
}

public MyObject getObject(K myKey)
{
    synchronized (myMap) {
        return myMap.get(myKey);
    }
}
Now we need to change the Terracotta config to be the following:
<locks>
  <autolock>
    <method-expression>void MyPackage.MyClass.update(..)</method-expression>
    <lock-level>write</lock-level>
  </autolock>
  <autolock>
    <method-expression>* MyPackage.MyClass.getObject(..)</method-expression>
    <lock-level>read</lock-level>
  </autolock>
</locks>
You can also use ReentrantReadWriteLock if you are using Java 1.5 or above. Terracotta supports ReentrantReadWriteLock just as you would expect - read locks give you read locks, and write locks give you write locks.
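For comparison, here is a plain-Java sketch of the same striping idea using ReentrantReadWriteLock for the map and a per-object monitor for the values. The StripedStore and Account names are hypothetical (Account stands in for MyObject), lambdas are used for brevity, and no Terracotta configuration is shown or implied.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

class StripedStore {
    // Hypothetical value type standing in for MyObject.
    static class Account {
        private long balance;
        void update(long delta) { balance += delta; }
        long balance() { return balance; }
    }

    private final Map<String, Account> map = new HashMap<>();
    private final ReadWriteLock mapLock = new ReentrantReadWriteLock();

    // Structural change to the map: needs the write lock.
    void add(String key) {
        mapLock.writeLock().lock();
        try {
            map.put(key, new Account());
        } finally {
            mapLock.writeLock().unlock();
        }
    }

    // Lookup only: many threads may hold the read lock at once.
    Account getObject(String key) {
        mapLock.readLock().lock();
        try {
            return map.get(key);
        } finally {
            mapLock.readLock().unlock();
        }
    }

    // Per-object lock: updates to different accounts do not block each other.
    void update(String key, long delta) {
        Account account = getObject(key);
        synchronized (account) {
            account.update(delta);
        }
    }

    long balance(String key) {
        Account account = getObject(key);
        synchronized (account) {
            return account.balance();
        }
    }

    // Two threads hammer the same two accounts; no update may be lost.
    static long demo() {
        StripedStore store = new StripedStore();
        store.add("a");
        store.add("b");
        Runnable work = () -> {
            for (int i = 0; i < 1000; i++) {
                store.update("a", 1);
                store.update("b", 2);
            }
        };
        Thread t1 = new Thread(work);
        Thread t2 = new Thread(work);
        t1.start();
        t2.start();
        try {
            t1.join();
            t2.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return store.balance("a") + store.balance("b");
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```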
On performance (part 2)
Here's a really common thing I hear: "I have a program that runs X fast without Terracotta, and when I add Terracotta, it goes Y fast, which is slower, why?"
This is a classic apples to oranges comparison. Terracotta does not make your program faster. It makes it scalable. There's a difference.
Faster means that on a single machine, your program can do some percentage more operations in a given time (per second usually). But that physical machine is bound to have an upper limit somewhere. It's only got a single cpu, or 2 cpus, or maybe even 4 or 8, and they might even run at 3.0 GHz or more. But you can't add more horsepower to the machine (well sometimes you can, but even there, you have to stop at some point - there are only so many processor slots on a motherboard).
Scalable means you can add more processing power and get more throughput. Now scalability often comes at a price. And clustering certainly does. So you will probably see a performance drop with Terracotta for a single instance of your program, but you've gained two things for that price:
a) Scalability
b) Availability
Let's talk about Scalability. Let's say you measure your application, and before Terracotta it can do 100 operations per second. You add Terracotta, and it can only do 80 operations per second. That's a 20% overhead, which sounds bad, right? But wait. You can now add a second machine. And with Terracotta, that second machine will add an equal amount of horsepower to the equation. So now your application cluster can do 160 operations per second, which is 60% FASTER than the original single machine could manage without Terracotta.
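The numbers in that example work out as follows. This sketch assumes perfectly linear scaling, exactly as the example does, and the class and method names are hypothetical.

```java
// Arithmetic for the scalability example (hypothetical names).
class ClusterThroughput {
    // Assumes each added node contributes the same throughput.
    static int opsPerSecond(int perNodeOps, int nodes) {
        return perNodeOps * nodes;
    }

    // Percentage change of the cluster relative to the unclustered baseline.
    static int percentGain(int baselineOps, int clusterOps) {
        return 100 * (clusterOps - baselineOps) / baselineOps;
    }

    public static void main(String[] args) {
        int baseline = 100; // single machine, no clustering
        int perNode = 80;   // per-machine throughput once clustered
        for (int nodes = 1; nodes <= 3; nodes++) {
            int total = opsPerSecond(perNode, nodes);
            System.out.println(nodes + " node(s): " + total
                    + " ops/s, " + percentGain(baseline, total) + "% vs. baseline");
        }
    }
}
```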
Next, let's talk about Availability. Before Terracotta was added to your application, if you had a power failure, all the data in your application would disappear (unless you wrote it to disk - and even in that case if you had a complete meltdown of the machine hosting your application the data would still disappear). But now that your application is using Terracotta, if the single instance of your application disappears - for any reason - the data is still intact. And that means if you have additional nodes in your cluster running your application, your application will continue to operate in the face of the failure of the single node. Furthermore, Terracotta can write your application's data to disk with nothing more than a trivial configuration change. And it can do so for more than one server and for each server's disk, preventing loss not only in the case of a single failure, but in the case of a complete failure. This is availability.
There's no API. So how do I use it?
Right. Generally, this is the hardest thing for people to get their heads around. If there's no API, what the heck does Terracotta do then?
Well, I like to say that Java is the API. What does that mean? It means you get to write Java, and Terracotta can make objects in your heap highly available across the cluster.
What does that mean? We're getting into the weeds. Here's what you have to do. 20 years of network programming, databases, RMI, J2EE are in your brain, and they are preventing you from seeing the simple essence of what Terracotta is trying to give you.
Very simply, you have a single JVM. It has a heap. That's where objects live. You have threads, that execute instructions, which operate on objects, that live in your heap. It looks like this:
- Java Virtual Machine
- Heap - that holds objects
- Threads - that execute application code (embedded in the objects that live in the heap)
If we want to "share" data between two threads, we don't even think about how to do that, because all threads in a single Java Virtual Machine have access to the Heap. And Threads in a single Java Virtual Machine can coordinate with one another (using Java primitives such as synchronized, wait, and notify), so coordinating threads in a single Virtual Machine is so trivial that we don't even give it a moment's thought.
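As a reminder of how little code that single-JVM coordination takes, here is a minimal sketch using synchronized, wait, and notifyAll. The WaitNotifyDemo class and its method names are made up for illustration.

```java
// Two threads in one JVM coordinating through a shared object on the heap.
class WaitNotifyDemo {
    private final Object lock = new Object();
    private boolean ready = false;
    private String message;

    // Called by the producing thread.
    void publish(String value) {
        synchronized (lock) {
            message = value;
            ready = true;
            lock.notifyAll(); // wake up any thread blocked in await()
        }
    }

    // Called by the consuming thread; blocks until publish() has run.
    String await() {
        synchronized (lock) {
            while (!ready) { // loop guards against spurious wakeups
                try {
                    lock.wait();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException(e);
                }
            }
            return message;
        }
    }

    // One thread publishes, the calling thread waits for the result.
    static String demo() {
        WaitNotifyDemo shared = new WaitNotifyDemo();
        new Thread(() -> shared.publish("hello")).start();
        return shared.await();
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```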
But that all goes horribly wrong when we go to two JVMs. Now our picture looks like:
- Java Virtual Machine 1
- Heap - that holds objects
- Threads - that execute application code (embedded in the objects that live in the heap)
- Java Virtual Machine 2
- Heap - that holds objects
- Threads - that execute application code (embedded in the objects that live in the heap)
The two systems are completely disconnected. So what do we do if we want to somehow share data or coordinate activity across those two? Enter 20 years of solutions. J2EE. Databases. RMI. JMS. Sockets. Files. ESBs. SOA. And on. And on. And on.
What if, instead, you could in essence just add more threads (and heap)? Well, on a single physical machine that's not possible. And if you add another JVM it's not connected. But with Terracotta, the physical JVMs are connected in such a way that this is exactly what you are doing. Adding more physical JVMs to the cluster simply adds more heap, and more threads, to one big logical Java Virtual Machine. Which means you don't have to bring in an external technology just to share data between two physical systems, or coordinate activities between two physical systems. Just start 1 thread in one system, and another thread in a second system. And use plain Java.
That's the concept. Try our Cookbook section to get a feel for using Terracotta.
Don't use broad wildcard statements in the config
The Terracotta configuration file (often referred to as tc-config.xml) lets you identify the classes to instrument and lock very specifically, or very broadly (using wildcards).
Let's look at some examples. First, a very specific tc-config.xml:
<tc:tc-config xmlns:tc="http:
xmlns:xsi="http:
xsi:schemaLocation="http:>
  <application>
    <dso>
      <instrumented-classes>
        <include>
          <class-expression>com.mycompany.Foo</class-expression>
        </include>
      </instrumented-classes>
      <locks>
        <autolock>
          <method-expression>void com.mycompany.Foo.update(String)</method-expression>
          <lock-level>write</lock-level>
        </autolock>
      </locks>
    </dso>
  </application>
</tc:tc-config>
And the wildcard "instrument and lock the world":
<tc:tc-config xmlns:tc="http:
xmlns:xsi="http:
xsi:schemaLocation="http:>
  <application>
    <dso>
      <instrumented-classes>
        <include>
          <class-expression>*..*</class-expression>
        </include>
      </instrumented-classes>
      <locks>
        <autolock>
          <method-expression>* *..*.*(..)</method-expression>
          <lock-level>write</lock-level>
        </autolock>
      </locks>
    </dso>
  </application>
</tc:tc-config>
Which one is better? Generally speaking I prefer the specific file to the overly broad. The specific file will be the most efficient, and will not cause you problems down the line by instrumenting or locking a class that you did not intend to be instrumented or locked. Generally speaking it is better to know exactly what Terracotta is doing in your application than to wildcard the configuration and just cover everything.
That being said, there are some advantages to the second approach. Certainly, when first learning Terracotta, it's a bit overwhelming to have to list out every class and every method in the tc-config.xml just to get something working. You'll notice the demos that ship with the Terracotta installation tend to use the wildcard notation. I am not sure if I like this, but it does help people new to Terracotta get started right away. The only problem is that the wildcard style is appropriate for demo apps and naive first integrations - not production applications. Which is why I like it - and don't. It's a good technique - it just should be used with a bit of caution.
If you aren't confused already: sometimes, I think a hybrid approach is the right choice. For example, the SharedEditor demo that ships with Terracotta declares instrumentation on an entire package. In the context of that application, I think that is the right thing to do. Here's the config:
<instrumented-classes>
  <!-- Here, we say, instrument all of the classes found under the 'demo.sharededitor.models' package -->
  <include>
    <class-expression>demo.sharededitor.models.*</class-expression>
    <honor-transient>true</honor-transient>
  </include>
</instrumented-classes>
It basically says to instrument all classes in the demo.sharededitor.models package. Since the model part of the application represents the state, this is a very valid thing to declare as instrumentable. And so I like it. You just have to make smart decisions - don't use wildcards to save yourself some typing, only use them to cover the data structures you intend to cluster. If there is a reasonable range, such as a package containing the model classes of your application, use wildcards.
Well, I hope you found that helpful. If you are new to Terracotta and configuration then you should read the Configuring Terracotta document, which walks you through the process of configuring Terracotta to cluster your application. It describes an outline and a process flow that will help you get your application clustered with Terracotta.
So you're using DMI. Hmmmm...
Distributed Method Invocation - or DMI - is a neat trick that Terracotta provides just in case you can't get the thing you are trying to do done using other mechanisms. And that's why I get a bit nervous when someone uses it. Here's why...
Terracotta is sometimes a bit hard for people to understand - it's really a different way to think about distributed computing. So when people first come to Terracotta, they look for something they already understand to help them figure out what Terracotta does. And sometimes they stumble onto DMI, probably because of the name mind you, and they go, "AHA! Now I understand Terracotta. It's kinda like RMI without the RMI gunk - I just make a method fire across the cluster. I get this."
Well, I am obviously paraphrasing quite a bit, and I don't mean to put anyone down. I completely understand why DMI is the first thing people get - it's the closest thing they understand from their previous context and understanding of distributed computing. If anything it's a failing on our part to properly educate people on using Terracotta...which we are always trying to improve of course...but I digress...
So the reason DMI gives me caution when someone uses it is because it generally indicates they haven't quite gotten Terracotta yet, and it's possible they may be using DMI (on accident) instead of the full set of features Terracotta provides. You see DMI is a bit of a hack - it's more of an exception to the rule. A last resort if you will, when other techniques won't work.
Why do I say it's a "hack"? Well the main principle of Terracotta is to provide a seamless integration with the Java language. The core premise is that distributed computing can be modeled in the same way a multi-threaded application can be modeled, and in fact that there is no difference between the two models. But DMI is different. It's not natural Java in the sense that a single VM provides the same level of functionality - there is no DMI analog in a single JVM.
Now that doesn't mean DMI isn't useful. It is. It's just that when I see it, it tells me to look a little closer, and see if this person is using DMI because they really need it, or because it was just the first thing that jumped out at them. Don't get me wrong - DMI is really cool and fun - it's just something you should use in case of emergency, not on a regular basis.
No read locks
No application does pure writes. If it did, it wouldn't be useful, it would just be a black hole. So somewhere, you are doing a read only operation in your application, I guarantee it! So if there aren't read locks in your tc-config.xml, I know one of two things:
- You know about read locks, but decided instead to omit the lock altogether to do a "dirty" read
- You don't know about read locks, and everything is a write lock (unnecessarily)
- You don't know about read locks, and some of your synchronization is a dirty read on accident
Oops, that's three.
Well, for the first bullet, I don't recommend dirty reads unless you really really know you have a bottleneck. Premature optimization is the root of all evil. Second bullet is an experience thing, so this entry is here to help you know there is the notion of read locks, and you should get to know them. Third bullet is really the same experience thing, except it's Terracotta's fault that it doesn't tell you when you are reading from a shared object without a read lock present (like we do when you try to write to a shared object without a write lock present). I want to fix that and make that behavior default but optional (e.g. you can turn it off after you know what you are doing).
Named Locks
Named locks are a rather blunt instrument. They are useful in certain circumstances, but generally speaking you should be using autolocks. So if you are using Named Locks, you should know exactly why you are using them.
Oftentimes, if you are using Named Locks because there is no existing synchronization, you can use the auto-synchronized feature. Auto-synchronized synthesizes a synchronized block on the method. The advantage of auto-synchronized vs. named locks is that with a synchronized autolock, you have a mutually exclusive lock for each object instance; with a named lock, you have ONE lock for all object instances.
To learn how to use the auto-synchronized feature, check out the locking recipe
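The difference between "one lock per instance" and "ONE lock for all instances" can be seen in plain Java. The sketch below is only an analogy: GLOBAL_LOCK stands in for the single lock a named lock gives you, the LockScopeDemo name is hypothetical, and no Terracotta API or configuration is involved. Two threads each work on their own object and try to meet at a barrier while inside their critical sections; they can only meet if the locks do not exclude each other.

```java
import java.util.concurrent.BrokenBarrierException;
import java.util.concurrent.CyclicBarrier;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.atomic.AtomicBoolean;

class LockScopeDemo {
    // One lock shared by every instance: the plain-Java analog of a named lock.
    static final Object GLOBAL_LOCK = new Object();

    // Returns true if two threads, each working on its own instance, were
    // inside their critical sections at the same time.
    static boolean canOverlap(boolean useGlobalLock) {
        CyclicBarrier bothInside = new CyclicBarrier(2);
        AtomicBoolean overlapped = new AtomicBoolean(true);

        Runnable work = () -> {
            Object instance = new Object(); // each thread gets its own instance
            Object lock = useGlobalLock ? GLOBAL_LOCK : instance;
            synchronized (lock) {
                try {
                    // Succeeds only if the other thread is inside its own
                    // critical section before the timeout expires.
                    bothInside.await(1, TimeUnit.SECONDS);
                } catch (TimeoutException | BrokenBarrierException e) {
                    overlapped.set(false);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    overlapped.set(false);
                }
            }
        };

        Thread a = new Thread(work);
        Thread b = new Thread(work);
        a.start();
        b.start();
        try {
            a.join();
            b.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return overlapped.get();
    }

    public static void main(String[] args) {
        System.out.println("per-instance locks overlap: " + canOverlap(false));
        System.out.println("one global lock overlaps:   " + canOverlap(true));
    }
}
```

With per-instance locks the two threads proceed concurrently; with the single shared lock the second thread has to wait for the first, which is the contention a named lock introduces.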