Roadmap - April 2016

Goals of the Roadmap

We'd like to capture the current state of GeoMesa in terms of major work items, desired features, and crazy new ideas! Feel free to add notes, updates, etc. Please don't delete ideas; mark them with strikethrough instead, unless an idea is truly no longer needed. If you add a new item, try to flesh it out with a good description so that it's useful to other people who may pick up your idea in the future.

Tooling & Automation

Installation automation - AWS, Cloudera, Hortonworks, general
Data modeling tooling
JEMA-like workflow tooling

- hosted analytics

- YARN app for load balancing workflows

- parallelization

- data and analytic hosting platform?

Amazon Marketplace?
Tools for HBase, Cassandra, Dynamo, etc. - We have tools for Accumulo and Kafka; maybe add some basic tools for the other ones.
Push-button deploys - everything

Indexing, Performance, Optimization

Pluggable SFCs (space-filling curves)
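To make "pluggable" concrete, here is a minimal Python sketch of a curve abstraction with one Z-order (Morton) implementation. The `SpaceFillingCurve` interface, the `bits` parameter, and the fixed lon/lat bounds are illustrative assumptions, not GeoMesa's actual SFC API.

```python
from abc import ABC, abstractmethod


class SpaceFillingCurve(ABC):
    """Hypothetical pluggable interface: map a lon/lat point to a 1-D index."""

    @abstractmethod
    def index(self, lon: float, lat: float) -> int:
        ...


class ZOrderCurve(SpaceFillingCurve):
    """Z-order (Morton) curve: interleave the bits of quantized lon and lat."""

    def __init__(self, bits: int = 16):
        self.bits = bits          # bits of precision per dimension
        self.cells = 1 << bits    # number of cells per dimension

    def _quantize(self, v: float, lo: float, hi: float) -> int:
        # Map v in [lo, hi] to an integer cell in [0, cells - 1]
        cell = int((v - lo) * self.cells / (hi - lo))
        return min(max(cell, 0), self.cells - 1)

    def index(self, lon: float, lat: float) -> int:
        x = self._quantize(lon, -180.0, 180.0)
        y = self._quantize(lat, -90.0, 90.0)
        z = 0
        for i in range(self.bits):
            # lon bits go to even positions, lat bits to odd positions
            z |= ((x >> i) & 1) << (2 * i)
            z |= ((y >> i) & 1) << (2 * i + 1)
        return z
```

Callers would depend only on `SpaceFillingCurve`, so a Hilbert or other curve could be swapped in as another subclass.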
Interval index? 

start and end times

  • G. Sfakianakis, I. Patlakas, N. Ntarmos and P. Triantafillou, "Interval indexing and querying on key-value cloud stores," Data Engineering (ICDE), 2013 IEEE 29th International Conference on, Brisbane, QLD, 2013, pp. 805-816.
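One simple way to index start and end times in a sorted key-value store is to key each interval by its start time and cap the allowed duration, so an overlap query becomes a bounded range scan plus a filter on the end time. This is a minimal sketch under that assumption and is not necessarily the technique of the paper cited above; `IntervalIndex` and `max_duration` are hypothetical names, and the sorted Python list stands in for sorted row keys.

```python
import bisect
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    fid: str
    start: int  # e.g. epoch millis
    end: int


class IntervalIndex:
    """Sketch: key intervals by start time, bound scans with a max duration."""

    def __init__(self, max_duration: int):
        self.max_duration = max_duration
        self._keys = []  # sorted start times (stand-in for sorted row keys)
        self._rows = []  # intervals aligned with _keys

    def insert(self, iv: Interval) -> None:
        if iv.end - iv.start > self.max_duration:
            raise ValueError("interval longer than max_duration")
        i = bisect.bisect_right(self._keys, iv.start)
        self._keys.insert(i, iv.start)
        self._rows.insert(i, iv)

    def overlapping(self, q_start: int, q_end: int) -> list:
        # Any interval overlapping [q_start, q_end] has start <= q_end and
        # end >= q_start; since end <= start + max_duration, its start is
        # also >= q_start - max_duration, which bounds the range scan
        lo = bisect.bisect_left(self._keys, q_start - self.max_duration)
        hi = bisect.bisect_right(self._keys, q_end)
        # The range scan over-selects, so filter on the end time
        return [iv.fid for iv in self._rows[lo:hi] if iv.end >= q_start]
```

The trade-off is that a large `max_duration` widens every scan, so very long intervals might need a separate index.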
Cost-based Optimization

- summary stats

- vacuum analyze (PostgreSQL-style statistics collection)
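Summary stats could feed a cost-based planner roughly like this: keep a histogram per attribute, estimate how many rows each candidate index would scan, and pick the cheapest. This is a hedged sketch; `Histogram`, `choose_index`, and the equi-width binning are assumptions, not an existing GeoMesa component.

```python
class Histogram:
    """Equi-width histogram over [lo, hi], the kind of per-attribute
    summary stat an analyze-style job could store (assumption)."""

    def __init__(self, lo: float, hi: float, bins: int):
        self.lo, self.hi, self.bins = lo, hi, bins
        self.counts = [0] * bins

    def _bin(self, v: float) -> int:
        b = int((v - self.lo) * self.bins / (self.hi - self.lo))
        return min(max(b, 0), self.bins - 1)

    def add(self, v: float) -> None:
        self.counts[self._bin(v)] += 1

    def estimate(self, q_lo: float, q_hi: float) -> int:
        # Count every bin the range touches: coarse (may overestimate) but cheap
        return sum(self.counts[self._bin(q_lo):self._bin(q_hi) + 1])


def choose_index(estimates: dict) -> str:
    """Pick the index expected to scan the fewest rows."""
    return min(estimates, key=estimates.get)
```

For example, `choose_index({"z3": 500, "attr": 30})` selects `"attr"`; the estimates for each index would come from its own stats.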


Move these out into separate line items as we flesh them out

- Aggregate/batch simple features in iterator like we do for bin queries
- spatiotemporal compound indexes for all attribute indexes? (this would really only help with attribute equals queries)

- don't double-store the feature ID (once in the row, once in the simple feature) - would require changes to deserialization and might complicate batching simple features

- automatically sample results for WMS queries based on expected result size

    • we already have sampling; for WMS queries, sending more results than can be displayed on the map isn't necessary
    • would need to tie into query planning for expected result sizes

- use lazy deserialization in client

    • currently we deserialize the full feature in kvsToFeatures
    • instead we could use the KryoBufferSimpleFeature for lazy evaluation
    • we'd have to extend it so that it was mutable - possibly have some delegated mutable state
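The WMS sampling idea could start from a simple budget of roughly one feature per map pixel. The sketch below derives a sampling fraction from an expected result count and samples deterministically by hashing the feature ID. `sampling_fraction`, `keep`, and `features_per_pixel` are hypothetical names, and the sketch is independent of GeoMesa's existing sampling.

```python
import zlib


def sampling_fraction(expected_results: int, width_px: int, height_px: int,
                      features_per_pixel: float = 1.0) -> float:
    """Fraction of features worth sending for a map of the given size."""
    budget = width_px * height_px * features_per_pixel
    if expected_results <= budget:
        return 1.0  # everything fits, no sampling needed
    return budget / expected_results


def keep(feature_id: str, fraction: float) -> bool:
    # Hash-based sampling is deterministic: a feature is kept or dropped
    # consistently across tiles and repeated requests
    bucket = zlib.crc32(feature_id.encode("utf-8")) % 10_000
    return bucket < fraction * 10_000
```

The expected result count is exactly what query planning would have to supply.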
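Lazy deserialization with delegated mutable state could look like the sketch below: attribute offsets are found on first access, each attribute is decoded only when read, and writes go to an overlay map instead of the buffer. The length-prefixed layout, `LazyFeature`, and `encode` are assumptions for illustration; this does not reproduce the Kryo encoding used by `KryoBufferSimpleFeature`.

```python
import struct


class LazyFeature:
    """Hypothetical layout: 4-byte attribute count, then each attribute as a
    4-byte big-endian length followed by UTF-8 bytes."""

    def __init__(self, buf: bytes):
        self._buf = buf
        self._offsets = None  # (start, length) per attribute, found lazily
        self._overrides = {}  # delegated mutable state: index -> new value

    def _index(self):
        if self._offsets is None:
            (count,) = struct.unpack_from(">i", self._buf, 0)
            pos, offsets = 4, []
            for _ in range(count):
                # Only the length prefixes are read here, nothing is decoded
                (length,) = struct.unpack_from(">i", self._buf, pos)
                offsets.append((pos + 4, length))
                pos += 4 + length
            self._offsets = offsets
        return self._offsets

    def get(self, i: int) -> str:
        if i in self._overrides:
            return self._overrides[i]
        start, length = self._index()[i]
        # Decode just this attribute's bytes
        return self._buf[start:start + length].decode("utf-8")

    def set(self, i: int, value: str) -> None:
        # Writes never touch the underlying buffer
        self._overrides[i] = value


def encode(values) -> bytes:
    out = struct.pack(">i", len(values))
    for v in values:
        b = v.encode("utf-8")
        out += struct.pack(">i", len(b)) + b
    return out
```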
Spatial Joins    
Pan and zoom caching    
Configurable z shards

- currently hard-coded at four shards

- number of ranges we compute might need to be based on the shards so we don't create too many
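The interaction between shard count and range count can be sketched as follows: each row key gets a shard prefix, so every z range has to be scanned once per shard, and the planner may need to coarsen the z ranges to stay under a range budget. The one-byte shard prefix, CRC32 shard assignment, `max_ranges`, and the merge-smallest-gap strategy are all assumptions for illustration.

```python
import zlib


def row_key(feature_id: str, z: int, num_shards: int) -> bytes:
    """Hypothetical row layout: 1 shard byte (num_shards <= 256), then 8-byte z."""
    shard = zlib.crc32(feature_id.encode("utf-8")) % num_shards
    return bytes([shard]) + z.to_bytes(8, "big")


def scan_ranges(z_ranges, num_shards: int, max_ranges: int):
    """Expand non-overlapping z ranges across every shard, first merging
    ranges so that shards * ranges stays within max_ranges."""
    z_ranges = sorted(z_ranges)
    budget = max(1, max_ranges // num_shards)
    while len(z_ranges) > budget:
        # Merge the adjacent pair separated by the smallest gap
        gaps = [z_ranges[i + 1][0] - z_ranges[i][1] for i in range(len(z_ranges) - 1)]
        i = gaps.index(min(gaps))
        z_ranges[i:i + 2] = [(z_ranges[i][0], z_ranges[i + 1][1])]
    out = []
    for shard in range(num_shards):
        prefix = bytes([shard])
        for lo, hi in z_ranges:
            out.append((prefix + lo.to_bytes(8, "big"), prefix + hi.to_bytes(8, "big")))
    return out
```

Merging trades a few extra scanned rows for fewer ranges, which is the balance a configurable shard count would have to manage.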

Complex Feature Support    

Integration Efforts

Integration of GeoMesa with new open source tools, databases, platforms, etc.

Jupyter/IPython/R notebooks



Metrics/Graphite and predefined dashboards

- streaming metrics

NiFi - flow definitions for canonical setups
Hive (see Hortonworks visual SQL builder)
Cloud-native storage

You can't afford to use AWS for petabytes?

S3 Storage - store binary files, fairly static

Use case: weather data

Support/Customer Relations/Compliance

Avro WFS MimeType plugin - need to open source - GEOMESA-840

What do we need additional documentation for?

  • HBase?
  • Kafka?
  • Streaming?
  • Converters

Should we open up Confluence? What about documentation that changes or new tutorials?

FAQ page - for common errors, configuration, etc.
Data-defined age-off - Need to be able to age off data based on some criteria in the SFT (SimpleFeatureType) or visibilities - GEOMESA-899
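A data-defined age-off rule might look like the predicate below, which expires a feature once its date attribute is older than a retention period chosen by its visibility label. The `dtg` and `visibility` field names, the dict-based feature, and the per-visibility TTL map are hypothetical; such a check could run wherever age-off is applied, for example in a server-side iterator.

```python
from datetime import datetime, timedelta


def should_age_off(feature: dict, now: datetime, default_ttl: timedelta,
                   ttl_by_visibility: dict) -> bool:
    """Hypothetical rule: a feature expires once its date attribute is older
    than the retention period for its visibility label (or the default)."""
    ttl = ttl_by_visibility.get(feature.get("visibility"), default_ttl)
    return now - feature["dtg"] > ttl


def age_off(features, now, default_ttl, ttl_by_visibility):
    # Keep only the features that have not expired
    return [f for f in features
            if not should_age_off(f, now, default_ttl, ttl_by_visibility)]
```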
Query Auditing

Need to be able to audit all queries against GeoServer WFS/WMS/WPS/REST/etc.

  • Not just GeoServer...

Andrew Hulbert

Seapy, Thomas, Chris, Jim

Accumulo 1.7 - Support for Accumulo 1.7



- polish

- open source

- modularity

New Features, Capabilities, & Brainstorms

Columns: Area | Description/Ideas | POC/Lead | JIRA Tickets | Importance/Usefulness (low/medium/high)
Schema Evolution - ability to gracefully evolve schemas

Streaming Dissemination

- quotas, rate limiting, geofencing, filtering
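Rate limiting and geofencing for streaming dissemination could be combined per subscriber, as in this sketch using a token bucket and a bounding-box fence. `TokenBucket`, `disseminate`, and the message fields `x`/`y` are hypothetical; time is passed in explicitly so the behavior is deterministic.

```python
class TokenBucket:
    """Per-subscriber limiter: `rate` messages/second, bursts up to `capacity`."""

    def __init__(self, rate: float, capacity: float, now: float = 0.0):
        self.rate, self.capacity = rate, capacity
        self.tokens = capacity
        self.last = now

    def allow(self, now: float) -> bool:
        # Refill for the elapsed time, capped at capacity
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


def in_geofence(x: float, y: float, bbox) -> bool:
    min_x, min_y, max_x, max_y = bbox
    return min_x <= x <= max_x and min_y <= y <= max_y


def disseminate(msg: dict, bucket: TokenBucket, bbox, now: float) -> bool:
    # Filter first so messages outside the fence don't consume quota
    return in_geofence(msg["x"], msg["y"], bbox) and bucket.allow(now)
```

A quota could be layered on top as a counter that is reset per billing period.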

SQL - Support for SQL queries
Native API

finishing it, almost done.

merge with blob store?

no schema evolution concerns


Blob Store - More file formats (S3, other image formats, etc.)
Avro/JSON store

Don't transform to SFs; store native Avro/JSON as "blobs" and index them

  • Push predicates down to the Avro/JSON on the server side as jsonpath/avropath expressions
Vector Tiles
  • Support the features of vector tiles in geomesa (like skipping features)
  • Decimation, server-side support for it, etc.
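Server-side decimation for vector tiles could be as simple as keeping at most one point per output pixel. This sketch assumes point features already clipped to the tile's bounding box and a 256-pixel tile; `decimate` is a hypothetical helper, not part of any vector tile library.

```python
def decimate(points, bbox, tile_px: int = 256):
    """Keep the first point that lands in each pixel of a tile covering bbox."""
    min_x, min_y, max_x, max_y = bbox
    seen, kept = set(), []
    for x, y in points:
        px = int((x - min_x) * tile_px / (max_x - min_x))
        py = int((y - min_y) * tile_px / (max_y - min_y))
        # Clamp so points on the max edge fall in the last pixel
        cell = (min(max(px, 0), tile_px - 1), min(max(py, 0), tile_px - 1))
        if cell not in seen:
            seen.add(cell)
            kept.append((x, y))
    return kept
```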
Binary File Datastores    
Count-Min Sketch
  • hook into the writer and batch job API; store in the metadata table
  • hook into the query planner
  • use Z2/Z3?
  • configurable precision
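A count-min sketch keeps `depth` rows of `width` counters. Each key increments one counter per row, and the estimate is the minimum across rows, so it can overcount on collisions but never undercounts; `width` and `depth` are the precision knobs. Sketches from separate writers or batch jobs merge by cell-wise addition. The keys here (Z2 cell IDs rendered as strings) and the BLAKE2b-based hashing are assumptions for illustration.

```python
import hashlib


class CountMinSketch:
    def __init__(self, width: int, depth: int):
        self.width, self.depth = width, depth
        self.table = [[0] * width for _ in range(depth)]

    def _cols(self, key: str):
        for row in range(self.depth):
            # A row-specific salt gives each row its own hash function
            h = hashlib.blake2b(key.encode("utf-8"), digest_size=8,
                                salt=row.to_bytes(16, "big")).digest()
            yield row, int.from_bytes(h, "big") % self.width

    def add(self, key: str, count: int = 1) -> None:
        for row, col in self._cols(key):
            self.table[row][col] += count

    def estimate(self, key: str) -> int:
        return min(self.table[row][col] for row, col in self._cols(key))

    def merge(self, other: "CountMinSketch") -> None:
        # Requires the same width/depth; counters simply add up
        for r in range(self.depth):
            for c in range(self.width):
                self.table[r][c] += other.table[r][c]
```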
Attribute Bloom filters?
  • Intersection queries
  • push attribute Bloom filters into the iterator to do bitwise comparisons
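For intersection queries, Bloom filters built with the same size and hash functions can be combined with a bitwise AND, which is the kind of comparison an iterator could do cheaply. This is a minimal sketch with `BloomFilter` and its parameters as hypothetical names. Note that ANDing two filters can yield more false positives than a filter built directly from the intersected set.

```python
import hashlib


class BloomFilter:
    """Bloom filter stored as a Python int bitset, so filters with the same
    size and hash count can be intersected with `&`."""

    def __init__(self, num_bits: int, num_hashes: int, bits: int = 0):
        self.m, self.k, self.bits = num_bits, num_hashes, bits

    def _positions(self, value: str):
        digest = hashlib.sha256(value.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        # Double hashing: derive k bit positions from two base hashes
        return [(h1 + i * h2) % self.m for i in range(self.k)]

    def add(self, value: str) -> None:
        for p in self._positions(value):
            self.bits |= 1 << p

    def might_contain(self, value: str) -> bool:
        return all((self.bits >> p) & 1 for p in self._positions(value))

    def intersect(self, other: "BloomFilter") -> "BloomFilter":
        # Bitwise AND: a value in both sets is still reported as "maybe present"
        return BloomFilter(self.m, self.k, self.bits & other.bits)
```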