BigMemory uses a Search index that is maintained at the local node. The index is stored under a directory in the DiskStore and is available whether or not persistence is enabled. Any overflow from the on-heap tier of the cache is searched using indexes.
Search operations perform in O(log(n)) time. For tips that can aid performance, see Best Practices.
For caches that are on-heap only, attributes are extracted during query execution rather than ahead of time, and indexes are not used. Instead, the query iterates over the cache, taking advantage of fast on-heap access to do the equivalent of a table scan. Each element in the cache is visited only once.
On-heap search operations perform in O(n) time. For performance results, see Maven-based performance test, where an average of representative queries takes 4.6 ms for a 10,000 entry cache, and 427 ms for a 1,000,000 entry cache.
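The difference between the two access patterns described above can be sketched in plain Java. This is only an illustration of a table scan versus a sorted-index lookup; the class and method names are hypothetical, a `Map` stands in for the cache, and none of this is BigMemory code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical illustration only; this is not BigMemory code.
class ScanVersusIndexSketch {

    // On-heap style: visit every entry once and test the attribute, O(n).
    static List<String> scan(Map<String, Integer> ageByKey, int age) {
        List<String> hits = new ArrayList<>();
        for (Map.Entry<String, Integer> e : ageByKey.entrySet()) {
            if (e.getValue() == age) {
                hits.add(e.getKey());
            }
        }
        return hits;
    }

    // Index style: build a sorted attribute -> keys structure ahead of time.
    static NavigableMap<Integer, List<String>> buildIndex(Map<String, Integer> ageByKey) {
        NavigableMap<Integer, List<String>> index = new TreeMap<>();
        for (Map.Entry<String, Integer> e : ageByKey.entrySet()) {
            index.computeIfAbsent(e.getValue(), k -> new ArrayList<>()).add(e.getKey());
        }
        return index;
    }

    // Finding the attribute value in the sorted structure is O(log n).
    static List<String> lookup(NavigableMap<Integer, List<String>> index, int age) {
        return index.getOrDefault(age, new ArrayList<>());
    }
}
```

The scan pays its cost on every query, while the index pays once at build time and again on each update, which is the trade-off behind the two complexity figures.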
Construct searches by including only the data that is actually required.
Use includeKeys() and/or includeAttribute() only if those values are required for your application logic.
If result.getValue() is not called on the search results, do not use includeValues() in the query.
Instead of running a query with includeValues() and then calling result.getValue(), consider running the query for keys and calling cache.get() for each individual key.
Note: includeKeys() and includeValues() have lazy deserialization, which means that keys and values are deserialized only when result.getKey() or result.getValue() is called. However, calls to includeKeys() and includeValues() do take time, so consider carefully when constructing your queries.
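To make the idea of lazy deserialization concrete, here is a minimal plain-Java sketch. `LazyResultSketch` is a hypothetical stand-in, not the Ehcache `Result` class, and the `Supplier` represents whatever work is needed to deserialize a stored value.

```java
import java.util.function.Supplier;

// Hypothetical stand-in for a search result; not the Ehcache Result class.
class LazyResultSketch<V> {
    private final Supplier<V> deserializer;
    private V value;
    private boolean loaded;

    LazyResultSketch(Supplier<V> deserializer) {
        this.deserializer = deserializer;
    }

    // The deserialization cost is paid on first access only.
    V getValue() {
        if (!loaded) {
            value = deserializer.get();
            loaded = true;
        }
        return value;
    }
}
```

In this sketch, a result whose getValue() is never called never triggers the supplier, which mirrors why unused values cost less than used ones but are still not free to include.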
Searchable keys and values are automatically indexed by default. If you will not be including them in your query, turn off automatic indexing with the following:
<cache name="cacheName" ...>
<searchable keys="false" values="false">
...
</searchable>
</cache>
Limit the size of the result set. Depending on your use case, you might consider maxResults or an Aggregator:
query.maxResults(int number_of_results)
Sometimes maxResults is useful where the result set is ordered such that the items you want most are included within the maxResults.
If a summary value is all you need, consider an Aggregator such as count(). For details, see the net.sf.ehcache.search.aggregator package in the Ehcache Javadoc.
Make your search as specific as possible.
Queries with iLike criteria and fuzzy (wildcard) searches might take longer than more specific queries.
If you are using a wildcard, try making it the trailing part of the string instead of the leading part ("321*" instead of "*123").
TIP: If you want leading wildcard searches, you should create a <searchAttribute> with the string value reversed in it, so that your query can use the trailing wildcard instead.
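The reversed-attribute tip can be sketched as two small helpers: one that produces the value to store in the extra attribute, and one that rewrites a leading-wildcard pattern into a trailing-wildcard one. The class and method names are hypothetical, and how the reversed value is wired into a <searchAttribute> (for example through an attribute extractor) is left out.

```java
// Hypothetical helper names; the reversed value would be stored as an
// extra search attribute alongside the original string.
class ReversedAttributeSketch {

    // Value to store in the additional (reversed) search attribute.
    static String reversedValue(String original) {
        return new StringBuilder(original).reverse().toString();
    }

    // Turns a leading-wildcard pattern such as "*123" into the
    // trailing-wildcard pattern "321*" to run against the reversed attribute.
    static String toTrailingWildcard(String leadingWildcardPattern) {
        if (!leadingWildcardPattern.startsWith("*")) {
            throw new IllegalArgumentException("expected a leading '*'");
        }
        String suffix = leadingWildcardPattern.substring(1);
        return reversedValue(suffix) + "*";
    }
}
```

A value such as "abc123" is stored reversed as "321cba", so a search for everything ending in "123" becomes a search for everything starting with "321".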
When possible, use the query criteria "Between" instead of "LessThan" and "GreaterThan", or "LessThanOrEqual" and "GreaterThanOrEqual". For example, instead of using le(startDate) and ge(endDate), try not(between(startDate,endDate)).
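The logic behind replacing two comparisons with one negated range can be checked with plain predicates. The sketch below assumes an inclusive range, in which case the two-comparison form uses strict comparisons; with exclusive endpoints the same reasoning gives the "or equal" forms. The names are hypothetical and these are not the Ehcache Criteria classes.

```java
// Hypothetical plain-Java predicates; not the Ehcache Criteria classes.
class BetweenComplementSketch {

    // Inclusive range check (an assumption): start <= value <= end.
    static boolean between(int value, int start, int end) {
        return start <= value && value <= end;
    }

    // Two-comparison form of "outside the range".
    static boolean outsideWithTwoComparisons(int value, int start, int end) {
        return value < start || value > end;
    }

    // Single range criterion, negated.
    static boolean outsideWithNotBetween(int value, int start, int end) {
        return !between(value, start, end);
    }
}
```

Both forms select the same elements; the single range criterion simply gives the search one condition to evaluate instead of two.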
Index dates as integers. This can save time and can also be faster if you have to do a conversion later on.
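One possible integer encoding for dates is sketched below. The yyyyMMdd layout and the helper names are assumptions chosen for illustration; an epoch-day count would work equally well. The encoded value would be what the search attribute holds.

```java
import java.time.LocalDate;

// Hypothetical helper; the encoding choice (yyyyMMdd) is an assumption.
class DateAsIntSketch {

    // Encodes a date as an int such as 20140315; for positive years the
    // integer ordering matches the date ordering.
    static int toInt(LocalDate date) {
        return date.getYear() * 10_000 + date.getMonthValue() * 100 + date.getDayOfMonth();
    }

    // Decodes the int back to a date when the application needs it.
    static LocalDate fromInt(int encoded) {
        return LocalDate.of(encoded / 10_000, (encoded / 100) % 100, encoded % 100);
    }
}
```

Because ordering is preserved, range criteria on the integer attribute select the same elements as the equivalent date range.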
Searches of eventually consistent BigMemory Max data sets are fast because queries are executed immediately, without waiting for the commit of pending transactions at the local node. Note: This means that if a thread adds an element into an eventually consistent cache and immediately runs a query to fetch the element, it will not be visible in the search results until the update is published to the server.
Unlike cache operations, which have selectable concurrency control or transactions, queries are asynchronous and Search results are eventually consistent with the caches.
Although indexes are updated synchronously, their state lags slightly behind that of the cache. The only exception is when the updating thread performs a search.
For caches with transactions, an index does not reflect the new state of the cache until commit has been called.
Unexpected results might occur if:
Aggregators, such as sum(), disagree with the same calculation done by re-accessing the cache for each key and repeating the calculation yourself.
Because the state of the cache can change between search executions, the following is recommended:
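One defensive pattern that fits a cache whose state can change after a search is to guard against missing entries when re-accessing it by key. The sketch below is an assumption-laden illustration: a plain `Map` stands in for the cache, the key list stands in for keys returned by an earlier search, and the names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: a plain Map stands in for the cache, and the key list
// stands in for keys returned by an earlier search.
class NullGuardSketch {

    static List<String> valuesStillPresent(List<String> keysFromSearch, Map<String, String> cache) {
        List<String> values = new ArrayList<>();
        for (String key : keysFromSearch) {
            String value = cache.get(key); // may be null if the entry was removed
            if (value != null) {
                values.add(value);
            }
        }
        return values;
    }
}
```

Keys whose entries disappeared between the search and the lookup are skipped rather than causing a NullPointerException further down the line.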
BigMemory SQL supports using the presence or absence of null as a search criterion:
select * from searchable where birthDate is null
select * from searchable where birthDate is not null
The Search API supports the same criteria:
myQuery.addCriteria(cache.getAttribute("middle_name").isNull());
The opposite case: require that a value for the attribute must be present:
myQuery.addCriteria(cache.getAttribute("middle_name").notNull());
which is equivalent to:
myQuery.addCriteria(cache.getAttribute("middle_name").isNull().not());
Alternatively, you can call constructors to set up equivalent logic:
Criteria isNull = new IsNull("middle_name");
Criteria notNull = new NotNull("middle_name");