Query Interceptor Design

Problem

Attribute fields are often related to each other. For example, one field may be derived from another, or there may be a one-to-many relationship with a threading key.

To provide efficient querying, each field could have its own attribute index. However, each additional index slows down ingest and takes up disk space. Instead, queries against one field could be translated into queries against another, related field. It would then be possible to query multiple fields while only indexing one of them.

Proposed Solution

Provide a pluggable mechanism to re-write queries. For example, if two fields are known to be related in a certain way, a query against the unindexed field could be re-written as a query against the indexed one.

Provide some 'system level' re-writers that can be configured without any code. For example, a 'one-to-many' re-writer could transform "foo = 'foo'" into "bar = 'bar1' OR bar = 'bar2'", based on some kind of database lookup. Possibly just provide ones that use the default back-end (e.g. Accumulo, HBase).
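
The one-to-many transformation described above can be sketched as follows. This is a minimal illustration, not the proposed implementation: the Filter, Equals, And and Or types are simplified stand-ins for a real filter tree, and OneToManyRewriter and its lookup parameter (which stands in for the database lookup) are hypothetical names.

```scala
// Simplified stand-in for a filter tree (assumption: a real re-writer
// would operate on the filter carried by the query, not on these types)
sealed trait Filter
final case class Equals(field: String, value: String) extends Filter
final case class And(children: Seq[Filter]) extends Filter
final case class Or(children: Seq[Filter]) extends Filter

object OneToManyRewriter {

  /**
    * Rewrites equality predicates on `from` into predicates on `to`.
    *
    * @param lookup hypothetical stand-in for the database lookup, mapping
    *               a value of `from` to the related values of `to`
    */
  def rewrite(filter: Filter, from: String, to: String, lookup: String => Seq[String]): Filter =
    filter match {
      case Equals(`from`, value) =>
        lookup(value) match {
          case Seq()       => filter             // no mapping known: keep the original predicate
          case Seq(single) => Equals(to, single) // one related value: plain equality
          case many        => Or(many.map(v => Equals(to, v)))
        }
      // recurse so that nested predicates are rewritten as well
      case And(children) => And(children.map(rewrite(_, from, to, lookup)))
      case Or(children)  => Or(children.map(rewrite(_, from, to, lookup)))
      case other         => other                // predicates on other fields are untouched
    }
}
```

Leaving the predicate unchanged when the lookup returns nothing is one possible choice; it falls back to the original (unindexed) query rather than silently returning no results.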

Re-writers will need to be plugged into the ingest pipeline as well, so that they have a chance to, e.g., store the lookups from one field to another.

Proposed Implementation

Provide the following API:

trait QueryInterceptor extends Closeable {

  /**
    * Called exactly once after the interceptor is instantiated
    */
  def init(ds: GeoMesaDataStore[_]): Unit

  /**
    * Modifies the query in place
    */
  def rewrite(query: Query): Unit

  /**
    * Option to track features being added or removed
    */
  def updater(sft: SimpleFeatureType): Option[QueryStateUpdater]
}

trait QueryStateUpdater extends Closeable with Flushable {
  /** Tracks a feature being added */
  def add(sf: SimpleFeature): Unit
  /** Tracks a feature being removed */
  def remove(sf: SimpleFeature): Unit
}

Query interceptors would be configured as class names in the SimpleFeatureType user data. They would be instantiated with a hook to the data store, which provides access to the backing database (HBase, Accumulo, etc.). Generic implementations may be possible using the GeoMesa IndexAdapter. Any additional configuration could generally be done with Typesafe Config, using ConfigFactory.load() to pick up configuration files on the classpath, which would avoid having to specify complex configuration in the user data.
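
As a rough sketch of how class names in the user data could be turned into interceptor instances, the following uses plain reflection. The user data key, the comma-separated format, the requirement for a no-argument constructor, and the names Interceptor, NoOpInterceptor and InterceptorLoader are all assumptions made for illustration. The Interceptor trait is a cut-down stand-in for the proposed QueryInterceptor: its init() takes no data store argument so that the example stays self-contained.

```scala
import java.io.Closeable

// Cut-down stand-in for the proposed QueryInterceptor trait (assumption:
// the real init would receive the data store, and rewrite is omitted here)
trait Interceptor extends Closeable {
  def init(): Unit
}

// Trivial implementation, used only to demonstrate loading
class NoOpInterceptor extends Interceptor {
  var initialized = false
  override def init(): Unit = initialized = true
  override def close(): Unit = ()
}

object InterceptorLoader {

  // Hypothetical user data key holding a comma-separated list of class names
  val Key = "geomesa.query.interceptors"

  /**
    * Instantiates and initializes each configured interceptor.
    * Assumes every class has a public no-argument constructor.
    */
  def load(userData: java.util.Map[AnyRef, AnyRef]): Seq[Interceptor] = {
    val classNames = Option(userData.get(Key)).map(_.toString).toSeq
      .flatMap(_.split(",").toSeq)
      .map(_.trim)
      .filter(_.nonEmpty)
    classNames.map { name =>
      val interceptor =
        Class.forName(name).getDeclaredConstructor().newInstance().asInstanceOf[Interceptor]
      interceptor.init() // called exactly once, right after instantiation
      interceptor
    }
  }
}
```

For example, putting classOf[NoOpInterceptor].getName under InterceptorLoader.Key in a java.util.HashMap and passing it to InterceptorLoader.load would yield a single initialized NoOpInterceptor. An interceptor that needs richer settings could read them inside init from classpath configuration, as described above.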

Hooks would be added in QueryRunner, which is the single point that all queries pass through. Any interceptors for a type would be passed the query, with a chance to mutate it in place.
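
The hook itself could be quite small. The sketch below is illustrative only: Query here is a simplified mutable stand-in (a type name plus a filter string) rather than the real query class, and Rewriter and InterceptingRunner are hypothetical names, not existing GeoMesa classes.

```scala
// Simplified mutable stand-in for a query (assumption: the real query
// carries a filter object, not a string)
final class Query(val typeName: String, var filter: String)

// Stand-in for the rewrite half of the proposed QueryInterceptor trait
trait Rewriter {
  def rewrite(query: Query): Unit
}

/**
  * Hypothetical hook: holds the interceptors configured for each feature
  * type and applies them before the query is planned.
  */
final class InterceptingRunner(interceptors: Map[String, Seq[Rewriter]]) {

  /** Passes the query to each interceptor for its type, in order, mutating it in place */
  def prepare(query: Query): Query = {
    interceptors.getOrElse(query.typeName, Seq.empty).foreach(_.rewrite(query))
    query
  }
}
```

Because each interceptor mutates the same query object, the order of interceptors matters: a later interceptor sees the output of the earlier ones. Types with no configured interceptors pass through unchanged.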

During writes, an updater would be created for each interceptor that requires one. The updaters would be added to our feature writer, and would probably need to be added to the various MapReduce classes as well (this may be a good chance to consolidate that type of writing code into a single place).
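
A minimal sketch of the write path, under stated assumptions: Feature is a simplified stand-in for SimpleFeature, StateUpdater mirrors the proposed QueryStateUpdater, and LookupUpdater and UpdatingWriter are hypothetical names. The lookup is kept in an in-memory map here, whereas a real updater would presumably persist it to the backing database.

```scala
import java.io.{Closeable, Flushable}
import scala.collection.mutable

// Simplified stand-in for SimpleFeature
final case class Feature(id: String, attributes: Map[String, String])

// Mirrors the proposed QueryStateUpdater, using the stand-in Feature type
trait StateUpdater extends Closeable with Flushable {
  def add(f: Feature): Unit
  def remove(f: Feature): Unit
}

/** Records `from` -> `to` value pairs, buffering them until flush */
final class LookupUpdater(from: String, to: String, store: mutable.Map[String, Set[String]])
    extends StateUpdater {

  private val pending = mutable.ArrayBuffer.empty[(String, String)]

  override def add(f: Feature): Unit =
    for (k <- f.attributes.get(from); v <- f.attributes.get(to)) {
      pending += ((k, v))
    }

  // Removal is a no-op in this sketch: other features may share the same
  // mapping, so deleting it safely would need some form of reference counting
  override def remove(f: Feature): Unit = ()

  override def flush(): Unit = {
    pending.foreach { case (k, v) => store(k) = store.getOrElse(k, Set.empty[String]) + v }
    pending.clear()
  }

  override def close(): Unit = flush()
}

/** Hypothetical writer wrapper that notifies every updater of each write */
final class UpdatingWriter(write: Feature => Unit, updaters: Seq[StateUpdater])
    extends Closeable with Flushable {

  def append(f: Feature): Unit = {
    write(f)
    updaters.foreach(_.add(f))
  }

  override def flush(): Unit = updaters.foreach(_.flush())
  override def close(): Unit = updaters.foreach(_.close())
}
```

Wrapping the write function this way keeps the updater calls in one place, which is the kind of consolidation mentioned above; the same wrapper could in principle be reused by other write paths.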