Call us: +1-415-738-4000

Enterprise Ehcache API Guide

Enterprise Ehcache includes APIs for extending your application's capabilities.

Enterprise Ehcache Search API For Clustered Caches

Enterprise Ehcache Search is a powerful search API for querying clustered caches in a Terracotta cluster. Designed to be easy to integrate with existing projects, the Ehcache Search API can be implemented with configuration or programmatically. The following is an example from an Ehcache configuration file:

<cache name="myCache" maxElementsInMemory="0" eternal="true">
 <searchable>
   <searchAttribute name="age" />
   <searchAttribute name="first_name" expression="value.getFirstName()" />
   <searchAttribute name="last_name" expression="value.getLastName()" />
   <searchAttribute name="zip_code" expression="value.getZipCode()" />
 </searchable>
 <terracotta />
</cache>

To learn more about the Ehcache Search API, refer to the following:

Sample Code

The following example assumes there is a Person class that serves as the value for elements in myCache. With the exception of "age" (which is bean style), each expression attribute in searchAttribute is set to use an accessor method on the cache element's value. The Person class must have accessor methods to match the configured expressions. In addition, assume that there is code that populates the cache. Here is an example of search code based on these assumptions:

// After CacheManager and Cache created, create a query for myCache:

Query query = myCache.createQuery();

// Create the Attribute objects.

Attribute<String> last_name = myCache.getSearchAttribute("last_name");
Attribute<Integer> zip_code = myCache.getSearchAttribute("zip_code");
Attribute<Integer> age = myCache.getSearchAttribute("age");

// Specify the type of content for the result set.
// Executing the query without specifying desired results
// returns no results even if there are hits.

query.includeKeys(); // Return the keys for values that are hits.

// Define the search criteria.
// This following uses Criteria.and() to set criteria to find adults
// with the last name "Marley" whose address has the zip code "94102".

query.addCriteria(last_name.eq("Marley").and(zip_code.eq(94102)));

// Execute the query, putting the result set
// (keys to element that meet the search criteria) in Results object.

Results results = query.execute();

// Find the number of results -- the number of hits.

int size = results.size();

// Discard the results when done to free up cache resources.

results.discard();

// Using an aggregator in a query to get an average age of adults:

Query averageAgeOfAdultsQuery = myCache.createQuery();
averageAgeOfAdultsQuery.addCriteria(age.ge(18));
averageAgeOfAdultsQuery.includeAggregator(age.average());
Results averageAgeOfAdults = averageAgeOfAdultsQuery.execute();

If (averageAgeOfAdults.size() > 0) {
  List aggregateResults = averageAgeOfAdults.all().iterator().next().getAggregatorResults();
  double averageAge = (Double) aggregateResults.get(0);
}

The following example shows how to programmatically create the cache configuration, with search attributes.

Configuration cacheManagerConfig = new Configuration();

CacheConfiguration cacheConfig = new CacheConfiguration("myCache", 0).eternal(true);

Searchable searchable = new Searchable();
cacheConfig.addSearchable(searchable);

// Create attributes to use in queries.
searchable.addSearchAttribute(new SearchAttribute().name("age"));

// Use an expression for accessing values.
searchable.addSearchAttribute(new SearchAttribute()
       .name("first_name")
       .expression("value.getFirstName()"));

searchable.addSearchAttribute(new SearchAttribute()
       .name("last_name").expression("value.getLastName()"));
searchable.addSearchAttribute(new SearchAttribute()
       .name("zip_code").expression("value.getZipCode()"));

cacheManager = new CacheManager(cacheManagerConfig);
cacheManager.addCache(new Cache(cacheConfig));

Ehcache myCache = cacheManager.getEhcache("myCache");

// Now create the attributes and queries, then execute.
...

Dynamic Search provides flexibility by allowing the search configuration to be changed after a cache is initialized. This is done with one method call, at the point of element insertion into the cache. The DynamicAttributesExtractor method returns a map of attribute names to index and their respective values. This method is called for every Ehcache.put() and replace() invocation.

If you think that you will want to add search attributes after a cache is initialized, you can explicitly indicate the dynamic search configuration. Set the allowDynamicIndexing attribute to "true" to enable use of the dynamic attributes extractor:

<cache name="cacheName" ...>
    <searchable allowDynamicIndexing="true">
       ...
    </searchable>
</cache>

Assuming that we have previously created Person objects containing attributes such as name, age, and gender, the following example shows how to create a dynamically searchable cache and register the DynamicAttributesExtractor:

Configuration config = new Configuration();
config.setName("default");
CacheConfiguration cacheCfg = new CacheConfiguration(“PersonCache”);
cacheCfg.setEternal(true);
cacheCfg.terracotta(new TerracottaConfiguration().clustered(true));
Searchable searchable = new Searchable().allowDynamicIndexing(true);

cacheCfg.addSearchable(searchable);
config.addCache(cacheCfg);

CacheManager cm = new CacheManager(config);
Ehcache cache = cm.getCache(“PersonCache”);
final String attrNames[] = {“first_name”, “age”};
// Now you can register a dynamic attribute extractor that would index the cache elements, using a subset of known fields
cache.registerDynamicAttributesExtractor(new DynamicAttributesExtractor() {
    Map<String, Object> attributesFor(Element element) {
        Map<String, Object> attrs = new HashMap<String, Object>();
        Person value = (Person)element.getObjectValue();
        // For example, extract first name only
        String fName = value.getName() == null ? null : value.getName().split(“\\s+”)[0];
        attrs.put(attrNames[0], fName);
        attrs.put(attrNames[1], value.getAge());
        return attrs;
    }
});
// Now add some data to the cache
cache.put(new Element(10, new Person(“John Doe”, 34, Person.Gender.MALE)));

Given the code above, the newly put element would be indexed on values of name and age fields, but not gender. If, at a later time, you would like to start indexing the element data on gender, you would need to create a new DynamicAttributesExtractor instance that extracts that field for indexing.

Dynamic Search Rules

  • A dynamically searchable cache must have a dynamic extractor registered before data is added to it. (This is to prevent potential races between extractor registration and cache loading which might result in an incomplete set of indexed data, leading to erroneous search results.)

  • Each call on the DynamicAttributesExtractor method replaces the previously registered extractor, as there can be at most one extractor instance configured for each such cache.

  • If a dynamically searchable cache is initially configured with a predefined set of search attributes, then this set of attributes will always be queried for extracted values, regardless of whether or not there is a dynamic search attribute extractor configured.

  • The initial search configuration takes precedence over dynamic attributes, so if the dynamic attribute extractor returns an attribute name already used in the initial searchable configuration, an exception will be thrown.

  • Clustered Ehcache clients do not share dynamic extractor instances or implementations. In a clustered searchable deployment, the initially configured attribute extractors cannot vary from one client to another, and this is enforced by propagating them across the cluster. However, for dynamic attribute extractors, each clustered client maintains its own dynamic extractor instance, not shared with the others. Each distributed application using dynamic search must therefore maintain its own attribute extraction consistency.

Note: Dynamic search is available for offheap-backed caches only.

Stored Search Indexes

Searches occur on indexes held by the Terracotta server. By default, index files are stored in /index under the server's data directory. However, you can specify a different path using the <index> element:

...
<server>
  <data>%(user.home)/terracotta/server-data</data>
  <index>%(user.home)/terracotta/index</index>
  <logs>%(user.home)/terracotta/server-logs</logs>
  <statistics>%(user.home)/terracotta/server-statistics</statistics>
...
</server>
...

To enhance performance, it is recommended that you store server data and search indexes on different disks.

Best Practices for Optimizing Searches

  1. Construct searches wisely by including only the data that is actually required.

    • Only use includeKeys() and/or includeAttribute() if those values are actually required for your application logic.
    • If you don't need values or attributes, be careful not to burden your queries with unnecessary work. For example, if result.getValue() is not called in the search results, then don't use includeValues() in the original query.
    • Consider if it would be sufficient to get attributes or keys on demand. For example, instead of running a search query with includeValues() and then result.getValue(), run the query for keys and include cache.get() for each individual key.

    Note: includeKeys() and includeValues() have lazy deserialization, which means that keys and values are de-serialized only when result.getKey() or result.getValue() is called. However, there is still some time cost with includeKeys() and includeValues(), so consider carefully when constructing your queries.

  2. Searchable keys and values are automatically indexed by default. If you do not need to search against keys or values, turn off automatic indexing with the following:

    <cache name="cacheName" ...>
      <searchable keys="false" values="false"/>
      ...
      </searchable>
    </cache>
    
  3. Limit the size of the results set with query.maxResults(int number_of_results). Another recommendation for managing the size of the result set is to use a built-in Aggregator function to return a summary statistic (see the net.sf.ehcache.search.aggregator package in this Javadoc).

  4. Make your search as specific as possible. Queries with "ILike" criteria and fuzzy (wildcard) searches may take longer than more specific queries. Also, if you are using a wildcard, try making it the trailing part of the string instead of the leading part ("321*" instead of "*123"). If you want leading wildcard searches, then you should create a <searchAttribute> with the string value reversed in it, so that your query can use the trailing wildcard instead.

  5. When possible, use the query criteria "Between" instead of "LessThan" and "GreaterThan", or "LessThanOrEqual" and "GreaterThanOrEqual". For example, instead of using le(startDate) and ge(endDate), try not(between(startDate,endDate)).

  6. Index dates as integers. This can save time and may even be faster if you have to do a conversion later on.

  7. Searches of eventually consistent caches are faster because queries are executed immediately, without waiting for pending transactions at the local node to commit. Note: This means that if a thread adds an element into an eventually consistent cache and immediately runs a query to fetch the element, it will not be visible in the search results until the update is published to the server.

  8. Because changes to the cache may not be available to queries until the changes are applied cluster wide (or, for transactional caches, commit() is called), it is possible to get results that include removed elements, inconsistent data from the same query executed at different times, or calculated values that are no longer accurate (such as those returned by aggregators). You can take certain precautions to prevent these types of problems. For example, if your search uses aggregators, add all aggregators to the same query to get consistent data. If your code attempts to get values using keys returned by a query, use null guards.

Enterprise Ehcache Cluster Events

The Enterprise Ehcache cluster events API provides access to Terracotta cluster events and cluster topology.

Cluster Topology

The interface net.sf.ehcache.cluster.CacheCluster provides methods for obtaining topology information for a Terracotta cluster. The following table lists these methods.

Method Definition
String getScheme() Returns a scheme name for the cluster information. Currently TERRACOTTA is the only scheme supported. The scheme name is used by CacheManager.getCluster() to return cluster information (see Events API Example Code).
Collection getNodes() Returns information on all the nodes in the cluster, including ID, hostname, and IP address.
boolean addTopologyListener(ClusterTopologyListener listener) Adds a cluster-events listener. Returns true if the listener is already active.
boolean removeTopologyListener(ClusterTopologyListener) Removes a cluster-events listener. Returns true if the listener is already inactive.

The interface net.sf.ehcache.cluster.ClusterNode provides methods for obtaining information on specific Terracotta nodes in the cluster. The following table lists these methods.

Method Definition
getId() Returns the unique ID assigned to the node.
getHostname() Return the hostname on which the node is running.
getIp() Return the IP address on which the node is running.

Cluster Events

Terracotta Distributed Ehcache Cluster Event Flow

The interface net.sf.ehcache.cluster.ClusterTopologyListener provides methods for detecting the following cluster events:

  • nodeJoined(ClusterNode) – Indicates that a node has joined the cluster. All nodes except the current node detect this event once per the current node's lifetime. For the current node, see nodeJoined for the Current Node.
  • nodeLeft(ClusterNode) – Indicates that a node has left the cluster. All remote nodes can detect this event once per the current node's lifetime. The current node may fail to detect this event, but if it is still functioning it may automatically rejoin the cluster (as a new node) once the connection to the cluster is reestablished.
  • clusterOnline(ClusterNode) – The current node is able to perform operations within the cluster. This event can be received by the current node only after its nodeJoined event is emitted.
  • clusterOffline(ClusterNode) – The current node is unable to perform operations within the cluster. This event can be received by the current node only after the clusterOnline event.


Events API Example Code

/* Get cluster data.
* Local ehcache.xml exists, with at least one cache configured 
* with Terracotta clustering. */

CacheManager mgr = new CacheManager(); 
CacheCluster cluster = mgr.getCluster("TERRACOTTA");
// Get current nodes
Collection<ClusterNode> nodes = cluster.getNodes();
for(ClusterNode node : nodes) {
   System.out.println(node.getId() + " " + node.getHostname() + " " + node.getIp());
}

// Register listener
cluster.addTopologyListener(new ClusterTopologyListener() {
   public void nodeJoined(ClusterNode node) { System.out.println(node + " joined"); }
   public void nodeLeft(ClusterNode node) { System.out.println(node + " left"); }
   public void clusterOnline(ClusterNode node) { System.out.println(node + " enabled"); }
   public void clusterOffline(ClusterNode node) { System.out.println(node + " disabled"); }
});
NOTE: Programmatic Creation of CacheManager
If a CacheManager instance is created and configured programmatically (without an ehcache.xml or other external configuration resource), getCluster("TERRACOTTA") may return null even if a Terracotta cluster exists. To ensure that cluster information is returned in this case, get a cache that is clustered with Terracotta: // mgr created and configured programmatically. CacheManager mgr = new CacheManager(); // myCache has Terracotta clustering. Cache cache = mgr.getEhcache("myCache"); // A Terracotta client has started, making available cluster information. CacheCluster cluster = mgr.getCluster("TERRACOTTA");

nodeJoined for the Current Node

Since the current node joins the cluster before code adding the topology listener runs, the current node may never receive the nodeJoined event. You can detect if the current node is in the cluster by checking if the cluster is online:

cluster.addTopologyListener(cacheListener);
if(cluster.isClusterOnline()) {
  cacheListener.clusterOnline(cluster.getCurrentNode());
}

Bulk-Load API

The Enterprise Ehcache bulk-load API can optimize bulk-loading of caches by removing the requirement for locks and adding transaction batching. The bulk-load API also allows applications to discover whether a cache is in bulk-load mode and to block based on that mode.

NOTE: The Bulk-Load API and the Configured Consistency Mode
The initial consistency mode of a cache is set by configuration and cannot be changed programmatically (see the attribute "consistency" in <terracotta>). The bulk-load API should be used for temporarily suspending the configured consistency mode to allow for bulk-load operations.

The following table lists the bulk-load API methods that are available in org.terracotta.modules.ehcache.Cache.

Method Definition
public boolean isClusterBulkLoadEnabled() Returns true if a cache is in bulk-load mode (is *not* consistent) throughout the cluster. Returns false if the cache is not in bulk-load mode (*is* consistent) anywhere in the cluster.
public boolean isNodeBulkLoadEnabled() Returns true if a cache is in bulk-load mode (is *not* consistent) on the current node. Returns false if the cache is not in bulk-load mode (*is* consistent) on the current node.
public void setNodeBulkLoadEnabled(boolean) Sets a cache's consistency mode to the configured mode (false) or to bulk load (true) on the local node. There is no operation if the cache is already in the mode specified by setNodeBulkLoadEnabled(). When using this method on a nonstop cache, a multiple of the nonstop cache's timeout value applies. The bulk-load operation must complete within that timeout multiple to prevent the configured nonstop behavior from taking effect. For more information on tuning nonstop timeouts, see Tuning Nonstop Timeouts and Behaviors.
public void waitUntilBulkLoadComplete() Waits until a cache is consistent before returning. Changes are automatically batched and the cache is updated throughout the cluster. Returns immediately if a cache is consistent throughout the cluster.

Note the following about using bulk-load mode:

  • Consistency cannot be guaranteed because isClusterBulkLoadEnabled() can return false in one node just before another node calls setNodeBulkLoadEnabled(true) on the same cache. Understanding exactly how your application uses the bulk-load API is crucial to effectively managing the integrity of cached data.
  • If a cache is not consistent, any ObjectNotFound exceptions that may occur are logged.
  • get() methods that fail with ObjectNotFound return null.
  • Eviction is independent of consistency mode. Any configured or manually executed eviction proceeds unaffected by a cache's consistency mode.

Bulk-Load API Example Code

The following example code shows how a clustered application with Enterprise Ehcache can use the bulk-load API to optimize a bulk-load operation:

import net.sf.ehcache.Cache;

public class MyBulkLoader {
  CacheManager cacheManager = new CacheManager();  // Assumes local ehcache.xml.
  Cache cache = cacheManager.getEhcache("myCache"); // myCache defined in ehcache.xml.
  cache.setNodeBulkLoadEnabled(true); // myCache is now in bulk mode.

  // Load data into myCache...

  // Done, now set myCache back to its configured consistency mode.
  cache.setNodeBulkLoadEnabled(false); 
}

On another node, application code that intends to touch myCache can run or wait, based on whether myCache is consistent or not:

...
  if (!cache.isClusterBulkLoadEnabled()) {

   // Do some work.
   }

  else {

    cache.waitUntilBulkLoadComplete()
    // Do the work when waitUntilBulkLoadComplete() returns.
}
...

Waiting may not be necessary if the code can handle potentially stale data:

...
  if (!cache.isClusterBulkLoadEnabled()) {

   // Do some work.
   }

  else {

   // Do some work knowing that data in myCache may be stale.

   }
...

Unlocked Reads for Consistent Caches (UnlockedReadsView)

Certain environments require consistent cached data while also needing to provide optimized reads of that data. For example, a financial application may need to display account data as a result of a large number of requests from web clients. The performance impact of these requests can be reduced by allowing unlocked reads of an otherwise locked cache.

In cases where there is tolerance for getting potentially stale data, an unlocked (inconsistent) reads view can be created for Cache types using the UnlockedReadsView decorator. UnlockedReadsView requires Ehcache 2.1 or higher. The underlying cache must have Terracotta clustering and use the strong consistency mode. For example, the following cache can be decorated with UnlockedReadsView:

<cache name="myCache"
     maxElementsInMemory="500"
     eternal="false">
   <persistence strategy="distributed"/>
   <terracotta clustered="true" consistency="strong" />
</cache>

You can create an unlocked view of myCache programmatically:

Cache cache = cacheManager.getEhcache("myCache");
UnlockedReadsView unlockedReadsView = new UnlockedReadsView(cache, "myUnlockedCache");

The following table lists the API methods available with the decorator net.sf.ehcache.constructs.unlockedreadsview.UnlockedReadsView.

Method Definition
public String getName() Returns the name of the unlocked cache view.
public Element get(final Object key) `public Element get(final Serializable key) Returns the data under the given key. Returns null if data has expired.
public Element getQuiet(final Object key) `public Element getQuiet(final Serializable key) Returns the data under the given key without updating cache statistics. Returns null if data has expired.

UnlockedReadsView and Data Freshness

By default, caches have the following attributes set as shown:

<cache ... copyOnRead="true" ... >
...
  <terracotta ... consistency="strong"  ... />
...
</cache>

Default settings are designed to make distributed caches more efficient and consistent in most use cases.

Explicit Locking

The explicit locking methods for Enterprise Ehcache provide simple key-based locking that preserves concurrency while also imposing cluster-wide consistency. If certain operations on cache elements must be locked, use the explicit locking methods available in the Cache type.

The explicit locking methods are listed in the following table:

public void acquireReadLockOnKey(Object key) Set a read lock on the element specified by the argument (key).
public void acquireWriteLockOnKey(Object key) Set a write lock on the element specified by the argument (key).
public void releaseReadLockOnKey(Object key) Remove a read lock from the element specified by the argument (key).
public void releaseReadLockOnKey(Object key) Remove a write lock from the element specified by the argument (key).
public boolean isReadLockedByCurrentThread(Object key) Returns true if the current thread holds a read lock on the element specified by the argument (key).
public boolean isWriteLockedByCurrentThread(Object key) Returns true if the current thread holds a write lock on the element specified by the argument (key).

The following example shows how to use explicit locking methods:

String key1 = "123";
Foo val1 = new Foo();
cache.acquireWriteLockOnKey(key1);
try {
      cache.put(new Element(key1, val1));
   } finally {
      cache.releaseWriteLockOnKey(key1);
   }

// Now safely read val1.
cache.acquireReadLockOnKey(key1);
try {
     Object cachedVal1 = cache.get(key1).getValue();
   } finally {
     cache.releaseReadLockOnKey(key);
   }

For locking available through the Terracotta Toolkit API, see Locks.

Configuration Using the Fluent Interface

You can configure clustered CacheManagers and caches using the fluent interface as follows:

...
Configuration configuration =
new Configuration().terracotta(newTerracottaClientConfiguration()
  .url("localhost:9510"))
              // == <terracottaConfig url="localhost:9510 />
  .defaultCache(new CacheConfiguration("defaultCache", 100))
              // == <defaultCache maxElementsInMemory="100" ... />
  .cache(new CacheConfiguration("example", 100)
             // == <cache name="example" maxElementsInMemory="100" ... />
  .timeToIdleSeconds(5)
  .timeToLiveSeconds(120)
            // added these TTI and TTL attributes to the cache "example"
  .terracotta(new TerracottaConfiguration()));
           // added <terracotta /> element in the cache "example"

// Pass the configuration to the CacheManager.
this.cacheManager = new CacheManager(configuration);
...

Write-Behind Queue in Enterprise Ehcache

If your application uses the write-behind API with Ehcache and you cluster Ehcache with Terracotta, the write-behind queue automatically becomes a clustered write-behind queue. The clustered write-behind queue features the following characteristics:

  • Atomic – Put and remove operations are guaranteed to succeed or fail. Partial completion of transactions cannot occur.
  • Distributable – Work is distributable among nodes in the cluster.
  • Durable – Terracotta clustering guarantees that a lost node does not result in lost data. Terracotta servers automatically ensure that another node receives the queued data belonging to the lost node.
  • Performance enhancement – Asynchronous writes reduce the load on databases.

The write-behind queue is enabled for a cache with the <cacheWriter /> element. For example:

<cache name="myCache" eternal="false" maxElementsInMemory="1000">
   <persistence strategy="none"/>
   <cacheWriter writeMode="write-behind" 
                maxWriteDelay="8" 
        rateLimitPerSecond="5"
                writeCoalescing="true" 
        writeBatching="true" 
        writeBatchSize="20"
            writeBehindMaxQueueSize="500" 
        retryAttempts="2" 
        retryAttemptDelaySeconds="2">
       <cacheWriterFactory class="com.company.MyCacheWriterFactory"
                properties="just.some.property=test; another.property=test2"      
            propertySeparator=";"/>
   </cacheWriter>
</cache>

Values for <cacheWriter /> attributes can also be set programmatically. For example, the value for writeBehindMaxQueueSize, which sets the maximum number of pending writes (the maximum number of elements that can be waiting in the queue for processing), can be set with net.sf.ehcache.config.CacheWriterConfiguration.setWriteBehindMaxQueueSize().

See the Ehcache documentation for more information on the write-behind API and on using synchronous write-through caching.