GeoMesa Accumulo Distributed Runtime Tips

This how-to article provides tips and tricks for working with the GeoMesa Accumulo distributed runtime.

The key concepts covered are:

  • VFS Contexts
  • Namespaces
  • Upgrading GeoMesa on Accumulo
  • Table-based config and Migration

Classpath Contexts and VFS

Accumulo uses the Apache Commons VFS (Virtual File System) library to load server-side code, such as iterators, into tablet servers at runtime. The Accumulo documentation provides an extensive overview of this functionality and is worth reading to understand it fully. GeoMesa uses it to install its distributed runtime jars.

Drawbacks of using lib/ext

Accumulo previously recommended placing iterators in the lib/ext folder of the Accumulo installation directory, but this approach has been deprecated since roughly version 1.5. It has major drawbacks:

  1. You must restart the tablet servers to upgrade
  2. You cannot run multiple versions of GeoMesa
  3. Upgrades are difficult

These issues are all mitigated by using classpath contexts and VFS. If you are currently using lib/ext, see the Table Based Configuration and Migration Tips section at the end of this document.

Classpath contexts are named configuration settings that point to a specific group of jar files to load into the classloader. They are generally defined in the shell:

> accumulo shell -u root -p secret
Shell - Apache Accumulo Interactive Shell
- 
- version: 1.9.2
- instance name: local
- instance id: 901f50c7-e41a-47db-8553-199816ed455
- 
- type 'help' for a list of available commands
- 
root@local> config -s general.vfs.context.classpath.geomesa136=hdfs://localhost:9000/accumulo/classpath/geomesa136/[^.].*.jar
root@local> config
...
site       | general.vfs.context.classpath.geomesa136 ............... | 
system     |    @override ........................................... | hdfs://localhost:9000/accumulo/classpath/geomesa136/[^.].*.jar
...
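
The same context definition can be generated for any version. The following is a minimal Python sketch, not part of GeoMesa or Accumulo: the function name context_setup_commands, the local jar path, and the default HDFS root (copied from the example above) are illustrative assumptions. It only builds command strings, so nothing is run against the cluster.

```python
# Sketch: build the commands needed to stage a jar in HDFS and define a
# VFS classpath context for it. All names here are hypothetical helpers.

DEFAULT_HDFS_ROOT = "hdfs://localhost:9000/accumulo/classpath"  # from the example above


def context_setup_commands(context, jar_path, hdfs_root=DEFAULT_HDFS_ROOT):
    """Return (hdfs_commands, shell_command) for one classpath context."""
    context_dir = f"{hdfs_root}/{context}"
    hdfs_commands = [
        # Create a dedicated directory for this context and upload the jar.
        f"hdfs dfs -mkdir -p {context_dir}",
        f"hdfs dfs -put {jar_path} {context_dir}/",
    ]
    # Same property name and jar pattern as in the shell session above.
    shell_command = (
        f"config -s general.vfs.context.classpath.{context}="
        f"{context_dir}/[^.].*.jar"
    )
    return hdfs_commands, shell_command


if __name__ == "__main__":
    # The jar path is a placeholder; substitute your distributed runtime jar.
    hdfs_cmds, shell_cmd = context_setup_commands(
        "geomesa136", "/path/to/geomesa-distributed-runtime.jar"
    )
    for cmd in hdfs_cmds:
        print(cmd)
    print(shell_cmd)
```

Keeping one HDFS directory per context name is what lets several versions sit side by side without touching each other's jars.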

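Note that you should change the VFS cache directory, which defaults to a location under java.io.tmpdir, so that the operating system's tmpwatch does not delete the cached jars that VFS loads classes from. This is a documented setting and can be changed in the cluster's site configuration (accumulo-site.xml for the 1.x release shown here):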

  <property>
    <name>general.vfs.cache.dir</name>
    <value>/opt/accumulo/tmp/accumulo-vfs-cache</value>
  </property>

You can verify this in the shell:

root@local> config
...
default    | general.vfs.cache.dir .................................. | /tmp/accumulo-vfs-cache-hdfs
site       |    @override ........................................... | /opt/accumulo/tmp/accumulo-vfs-cache
...
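
When checking many properties, reading the config listing by eye gets tedious. Below is a small Python sketch that extracts the effective value of one property from captured output. It assumes the three-column "scope | name | value" layout shown above and assumes that the last @override row under a property is the value in effect; the function name effective_value is hypothetical.

```python
# Sketch: find the effective value of a property in `config` output.
# Assumes the "scope | name .... | value" layout shown in this article.

SAMPLE = """\
...
default    | general.vfs.cache.dir .................................. | /tmp/accumulo-vfs-cache-hdfs
site       |    @override ........................................... | /opt/accumulo/tmp/accumulo-vfs-cache
...
"""


def effective_value(config_output, prop):
    """Return the last listed value for prop, or None if prop is absent."""
    value = None
    in_prop = False  # True while reading rows that belong to prop
    for line in config_output.splitlines():
        parts = [p.strip() for p in line.split("|", 2)]
        if len(parts) != 3:
            in_prop = False
            continue
        _scope, name, val = parts
        name = name.rstrip(". ")  # drop the dotted padding after the name
        if name == prop:
            in_prop = True
            value = val
        elif in_prop and name == "@override":
            value = val  # a later scope overrides the earlier one
        else:
            in_prop = False
    return value


if __name__ == "__main__":
    print(effective_value(SAMPLE, "general.vfs.cache.dir"))
```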


Namespaces

Namespaces provide logical groupings of tables as well as a convenient way to configure GeoMesa Accumulo's classpath. When coupled with properly configured VFS classpath contexts, they make it easy to upgrade your system between GeoMesa versions or to support multiple GeoMesa versions on the same cluster. Name namespaces for applications and NOT for GeoMesa versions: a version-based name becomes misleading as soon as the namespace is upgraded, and it makes running multiple applications and versions on the same cluster confusing.

To create a namespace simply use the shell:

root@local> createnamespace mycoolproject

Linking Namespaces to GeoMesa Versions

Once you have created a namespace for your application you can link it to a GeoMesa version by assigning a classpath context to the namespace. By default there is no classpath context assigned:

root@local> config -ns mycoolproject
...
default    | table.classpath.context ................................ | 
...

In the example above we created a VFS context named "geomesa136", which we can now link to our namespace:

root@local> config -ns mycoolproject -s table.classpath.context=geomesa136

Now we should see it in the config:

root@local> config -ns mycoolproject
...
default    | table.classpath.context ................................ | 
namespace  |    @override ........................................... | geomesa136
...

This means that all tables created in this namespace will have GeoMesa 1.3.6 available to them.

Supporting Multiple Versions

Using the above methods you can easily support multiple versions on the same Accumulo cluster. For instance, to support an application with GeoMesa 2.0.2 alongside 1.3.6, you can create a new VFS classpath context and a separate namespace for your second project:

root@local> config -s general.vfs.context.classpath.geomesa202=hdfs://localhost:9000/accumulo/classpath/geomesa202/[^.].*.jar
root@local> createnamespace myotherproject
root@local> config -ns myotherproject -s table.classpath.context=geomesa202

Now you have two applications running different versions of GeoMesa and they will not collide.

Upgrading the GeoMesa Version

To upgrade a GeoMesa version for a namespace follow these steps:

  1. Create a new VFS classpath context for the new GeoMesa version.
  2. Stop ingest. GeoMesa writers must be in sync with the distributed runtime.
  3. Remove the classpath context from your namespace.
  4. Assign the new VFS classpath context to your namespace.
  5. Restart ingest with the new version.
  6. Upgrade readers. Stopping them first is optional: readers still on the old version will not break the system, but they will throw errors until they are upgraded.

For example, let's upgrade the mycoolproject namespace from GeoMesa 1.3.6 to 1.3.7:

root@local> config -s general.vfs.context.classpath.geomesa137=hdfs://localhost:9000/accumulo/classpath/geomesa137/[^.].*.jar
root@local> config -ns mycoolproject -d table.classpath.context
root@local> config -ns mycoolproject -s table.classpath.context=geomesa137
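
The three shell commands above follow a fixed pattern, so they can be generated rather than typed. This is a minimal Python sketch under stated assumptions: upgrade_commands is a hypothetical helper, the HDFS root is the one used throughout this article, and the jars for the new context are assumed to be in place already. Stopping and restarting ingest (steps 2 and 5) remains a manual step outside this script.

```python
# Sketch: generate the Accumulo shell commands for steps 1, 3 and 4 of
# the upgrade procedure. Nothing is executed; the commands are returned.

DEFAULT_HDFS_ROOT = "hdfs://localhost:9000/accumulo/classpath"  # from this article


def upgrade_commands(namespace, new_context, hdfs_root=DEFAULT_HDFS_ROOT):
    """Return the shell commands that move a namespace to new_context."""
    return [
        # Step 1: define the new VFS classpath context.
        f"config -s general.vfs.context.classpath.{new_context}="
        f"{hdfs_root}/{new_context}/[^.].*.jar",
        # Step 3: remove the old context from the namespace.
        f"config -ns {namespace} -d table.classpath.context",
        # Step 4: assign the new context to the namespace.
        f"config -ns {namespace} -s table.classpath.context={new_context}",
    ]


if __name__ == "__main__":
    for command in upgrade_commands("mycoolproject", "geomesa137"):
        print(command)
```

The printed lines can be pasted into the shell, or saved to a file and run in one go if your Accumulo version's shell offers an execute-file option (check its help output).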

Using the Tools

The GeoMesa Accumulo tools distribution ships with a script, located in its bin directory, to assist with your initial setup.

Key Takeaways, Do's and Don'ts

Note that you can also name contexts per application, such as app1-v1, app1-v2, app1-v3. The key is to never reuse a single context over and over; instead, put new jars in a new context and then assign it to a namespace or set of tables when you are ready to switch.


  1. Use VFS classloading over lib/ext. Always.
  2. Generally name your namespaces and GeoMesa catalogs after projects/applications
  3. Include some sort of one-up version number at the end of your classpath context names (e.g. context1, context2 or geomesa137, geomesa138, etc.)
  4. Accumulo can support multiple GeoMesa versions on a single cloud install
    1. GeoServer cannot. Run multiple GeoServers, one per GeoMesa version.
  5. Create a classpath context for each new GeoMesa version. This makes it available to others on the system and allows it to be used for multiple applications.
  6. When upgrading, create a new context and then update the namespace's configuration afterwards
    1. DO NOT change the jar files in HDFS or add additional jar files to an existing context.
      1. This can lead to inconsistencies within the VFS classloader and force you to restart the tservers.
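
To make the one-up naming rule mechanical, a helper can pick the next unused context name. The sketch below is illustrative only: next_context_name is a hypothetical function, and it assumes you have already collected the existing context names (for example from the config listing).

```python
# Sketch: choose the next one-up context name so that an existing
# context is never reused for new jars.
import re


def next_context_name(existing, prefix):
    """Return prefix followed by one more than the highest number in use."""
    pattern = re.compile(rf"^{re.escape(prefix)}(\d+)$")
    highest = 0
    for name in existing:
        match = pattern.match(name)
        if match:
            highest = max(highest, int(match.group(1)))
    return f"{prefix}{highest + 1}"


if __name__ == "__main__":
    contexts = ["app1-v1", "app1-v2", "geomesa137"]
    print(next_context_name(contexts, "app1-v"))
```

Names that do not match the prefix are ignored, so contexts belonging to different applications can share one cluster without interfering with each other's numbering.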

Table Based Configuration and Migration Tips

Table based configuration

Individual tables can also be configured to use the VFS classloaders. Namespaces are still recommended, however, because they are easier to upgrade and allow permissions to be assigned to multiple users per namespace. An example of table-based configuration:

config -t project_z3_table -s table.classpath.context=geomesa137
config -t project_z2_table -s table.classpath.context=geomesa137
config -t project_attr_table -s table.classpath.context=geomesa137
config -t project_id_table -s table.classpath.context=geomesa137
config -t project_stats -s table.classpath.context=geomesa137

As you can see, the classpath context must be configured and upgraded per table instead of per namespace, which can be onerous for large numbers of schemas. If you go this route, consider scripting the Accumulo shell commands that perform the update.
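
One way to script the per-table update is to generate the config commands from a list of table names. This is a minimal Python sketch: table_context_commands is a hypothetical helper, and the table names are the ones from the example above, so substitute the tables that actually belong to your catalog.

```python
# Sketch: generate one `config -t` command per table so that every
# table of a schema is switched to the same classpath context.

TABLES = [
    "project_z3_table",
    "project_z2_table",
    "project_attr_table",
    "project_id_table",
    "project_stats",
]  # table names taken from the example above


def table_context_commands(tables, context):
    """Return the shell commands that set context on each table."""
    return [
        f"config -t {table} -s table.classpath.context={context}"
        for table in tables
    ]


if __name__ == "__main__":
    for command in table_context_commands(TABLES, "geomesa137"):
        print(command)
```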

Migration from using lib/ext

If you were previously using the lib/ext folder for your iterators you can still migrate to using VFS classpaths for existing tables by doing the following:

  1. Stop ingest and services if possible. You may need to restart the tservers once.
  2. Remove the distributed runtime from lib/ext
  3. Create a context named for your version number (e.g. config -s general.vfs.context.classpath.geomesa137=hdfs://localhost:9000/accumulo/classpath/geomesa137/[^.].*.jar )
  4. Configure your tables for the classpath context as listed above
  5. Restart ingest and services. If the new context is not picked up, try restarting the tservers.

You may be able to start using VFS without removing the iterator from lib/ext by using this setting:

general.vfs.context.classpath.geomesa137.delegation=post

The description of this option states:

The default behavior follows the Java ClassLoader contract in that classes, if they exists, are loaded from the parent classloader first. You can override this behavior by delegating to the parent classloader after looking in this classloader first

No Downtime GeoServer upgrade with Scan Time Config

Idea: use the scan-time iterator configuration for blue/green deploys of GeoServer:

  1. Create a VFS context with the new iterator
  2. Modify GeoMesa to support a data store property to change the VFS classpath context on every scan
  3. Create a new GeoServer with the new client library and set the property on the data store
  4. See if the upgrade works. If it does, you can switch over to the new GeoServers live with no