This How-To article provides tips and tricks for working with the GeoMesa Accumulo distributed runtime.
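The key concepts covered are VFS class loading in Accumulo, classpath contexts, namespaces, running multiple GeoMesa versions on one cluster, upgrading, table-based configuration, and migrating from lib/ext.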
Accumulo uses the Apache Commons VFS (Virtual File System) library to load server-side code, such as iterators, into tablet servers at runtime. The Accumulo documentation gives an extensive overview of this functionality and is worth reading in full. GeoMesa uses it to install its distributed runtime jars.
Accumulo previously recommended placing iterators in the lib/ext folder of the Accumulo installation directory, but this approach has been deprecated since roughly version 1.5. It has major drawbacks such as:
These issues are all mitigated by using classpath contexts and VFS. If you are currently using lib/ext, see the table-based configuration and migration tips at the end of this document.
Classpath contexts are named configuration settings that point to a specific group of jar files to load into the classloader. They are generally defined in the shell:
    > accumulo shell -u root -p secret
    Shell - Apache Accumulo Interactive Shell
    -
    - version: 1.9.2
    - instance name: local
    - instance id: 901f50c7-e41a-47db-8553-199816ed455
    -
    - type 'help' for a list of available commands
    -
    root@local> config -s general.vfs.context.classpath.geomesa136=hdfs://localhost:9000/accumulo/classpath/geomesa136/[^.].*.jar
    root@local> config
    ...
    site    | general.vfs.context.classpath.geomesa136 ............... |
    system  |    @override ........................................... | hdfs://localhost:9000/accumulo/classpath/geomesa136/[^.].*.jar
    ...
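The context above points at an HDFS directory, so the distributed runtime jar has to be copied there before tablet servers can load it. The following is a minimal sketch of that staging step, written as a dry run that only prints the commands. The helper name `stage_jar_cmds`, the local jar path, and the `/accumulo/classpath` base directory are assumptions for illustration; substitute the jar shipped with your GeoMesa distribution.

```shell
#!/bin/sh
# Sketch: print the HDFS commands that stage a jar for a classpath context.
# stage_jar_cmds is a hypothetical helper; the jar path and the
# /accumulo/classpath base directory are assumptions, not GeoMesa defaults.
stage_jar_cmds() (
  jar="$1"                          # local path to the distributed runtime jar
  ctx="$2"                          # context name, e.g. geomesa136
  base="${3:-/accumulo/classpath}"  # HDFS directory holding one folder per context
  printf 'hdfs dfs -mkdir -p %s/%s\n' "$base" "$ctx"
  printf 'hdfs dfs -put %s %s/%s/\n' "$jar" "$base" "$ctx"
)

# Dry run: prints two commands and changes nothing.
stage_jar_cmds ./geomesa-accumulo-distributed-runtime.jar geomesa136
```

Printing the commands first makes it easy to review them; once they look right, the output can be piped to `sh` on a host with the Hadoop client installed.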
Note that you should change the VFS cache directory, which defaults to a location under java.io.tmpdir, so that the OS tmpwatch job does not delete the cached jars that VFS class loading depends on. This is documented and easily changed in the cluster's site configuration:
    <property>
      <name>general.vfs.cache.dir</name>
      <value>/opt/accumulo/tmp/accumulo-vfs-cache</value>
    </property>
You can verify this in the shell:
    root@local> config
    ...
    default | general.vfs.cache.dir .................................. | /tmp/accumulo-vfs-cache-hdfs
    site    |    @override ........................................... | /opt/accumulo/tmp/accumulo-vfs-cache
    ...
Namespaces provide logical groupings of tables as well as a convenient way to configure GeoMesa Accumulo's classpath. When coupled with properly configured VFS classpath contexts, they make it easy to upgrade your system between GeoMesa versions or to support multiple GeoMesa versions on the same cluster. Name namespaces for applications and NOT for GeoMesa versions: a version-named namespace becomes misleading as soon as it is upgraded, and it makes running multiple applications and versions on the same cluster confusing.
To create a namespace simply use the shell:
    root@local> createnamespace mycoolproject
Once you have created a namespace for your application you can link it to a GeoMesa version by assigning a classpath context to the namespace. By default there is no classpath context assigned:
    root@local> config -ns mycoolproject
    ...
    default | table.classpath.context ................................ |
    ...
In the above example we created a VFS context named "geomesa136", which we can now link to our namespace:
    root@local> config -ns mycoolproject -s table.classpath.context=geomesa136
Now we should see it in the config:
    root@local> config -ns mycoolproject
    ...
    default   | table.classpath.context ................................ |
    namespace |    @override ........................................... | geomesa136
    ...
This means that all tables created in this namespace will have GeoMesa 1.3.6 available to them.
Using the above methods you can easily support multiple versions on the same Accumulo cluster. For instance, to support an application with GeoMesa 2.0.2 alongside 1.3.6, you can create a new VFS classpath context and a separate namespace for your second project:
    root@local> config -s general.vfs.context.classpath.geomesa202=hdfs://localhost:9000/accumulo/classpath/geomesa202/[^.].*.jar
    root@local> createnamespace myotherproject
    root@local> config -ns myotherproject -s table.classpath.context=geomesa202
Now you have two applications running different versions of GeoMesa and they will not collide.
To upgrade a GeoMesa version for a namespace follow these steps:
For example, let's upgrade the mycoolproject namespace from GeoMesa 1.3.6 to 1.3.7:
    root@local> config -s general.vfs.context.classpath.geomesa137=hdfs://localhost:9000/accumulo/classpath/geomesa137/[^.].*.jar
    root@local> config -ns mycoolproject -d table.classpath.context
    root@local> config -ns mycoolproject -s table.classpath.context=geomesa137
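The three shell commands in the upgrade example follow a fixed pattern, so they can be generated for any namespace and context. The sketch below is one way to do that; `upgrade_namespace_cmds` is a hypothetical helper, and the default HDFS base URL is simply the one used throughout this article.

```shell
#!/bin/sh
# Sketch: emit the Accumulo shell commands that move a namespace to a new
# classpath context. upgrade_namespace_cmds is a hypothetical helper; the
# default base URL is the example value from this article, not a standard.
upgrade_namespace_cmds() (
  ns="$1"     # namespace to upgrade, e.g. mycoolproject
  ctx="$2"    # new context name, e.g. geomesa137
  base="${3:-hdfs://localhost:9000/accumulo/classpath}"
  # 1. Define the new context, pointing at the new jar directory.
  printf 'config -s general.vfs.context.classpath.%s=%s/%s/[^.].*.jar\n' "$ctx" "$base" "$ctx"
  # 2. Remove the old context assignment from the namespace.
  printf 'config -ns %s -d table.classpath.context\n' "$ns"
  # 3. Assign the new context to the namespace.
  printf 'config -ns %s -s table.classpath.context=%s\n' "$ns" "$ctx"
)

upgrade_namespace_cmds mycoolproject geomesa137
```

The output can be pasted into an interactive Accumulo shell session after the new jars are in place in HDFS.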
GeoMesa Accumulo tools ships with a script to assist you in your initial setup. It's in the bin directory.
Note that you can also create contexts named like app1-v1, app1-v2, app1-v3. The key point is not to re-use a single context over and over, but to put new jars in a new context and then assign it to a namespace or set of tables when you are ready.
Namespaces are recommended because they make upgrades easy and allow permissions to be assigned to multiple users per namespace, but individual tables can also be configured to use the VFS classloaders. For example:
    config -t project_z3_table -s table.classpath.context=geomesa137
    config -t project_z2_table -s table.classpath.context=geomesa137
    config -t project_attr_table -s table.classpath.context=geomesa137
    config -t project_id_table -s table.classpath.context=geomesa137
    config -t project_stats -s table.classpath.context=geomesa137
As you can see, the classpath context must then be configured and upgraded per table instead of per namespace, which can be onerous for large numbers of schemas. If you go this route, it is recommended to script the Accumulo shell commands that perform the update.
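As a starting point for such scripting, here is a minimal sketch that generates one `config -t` command per table. `table_context_cmds` is a hypothetical helper name, and the table names are the ones from the example above.

```shell
#!/bin/sh
# Sketch: generate one "config -t" command per table for a given context.
# table_context_cmds is a hypothetical helper, not part of GeoMesa or Accumulo.
table_context_cmds() (
  ctx="$1"    # classpath context to assign, e.g. geomesa137
  shift       # the remaining arguments are table names
  for table in "$@"; do
    printf 'config -t %s -s table.classpath.context=%s\n' "$table" "$ctx"
  done
)

table_context_cmds geomesa137 \
  project_z3_table project_z2_table project_attr_table \
  project_id_table project_stats
```

One way to apply the result is to redirect the output to a file and pass that file to the Accumulo shell with its `-f` (execute file) option; check the shell's help output on your version before relying on this.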
If you were previously using the lib/ext folder for your iterators you can still migrate to using VFS classpaths for existing tables by doing the following:
You may be able to start using VFS without removing the iterator from lib/ext by using this setting:
The description of this option states:
Idea: Use the scan-time iterator config for blue/green deploys of GeoServer: