RoadMap - April 2016

Goals of the Roadmap

We'd like to capture the current state of GeoMesa in terms of major work items, desired features, and crazy new ideas! Feel free to add notes or updates. Please don't delete ideas; mark them with strikethrough instead, unless the item is really just not needed. If you do add a new item, try to flesh it out with a good description so that it's useful to other people who may pick up your idea in the future.

Tooling & Automation

Goal/Area | Description | POC/Lead | JIRA | Status
Installation automation | AWS, Cloudera, Hortonworks, general
Data modeling tooling   
JEMA like workflow tooling

- hosted analytics

- YARN app for load balancing workflows

- parallelization

- data and analytic hosting platform?

   
Amazon Marketplace?    
Tools for HBase, Cassandra, Dynamo, etc. | We have tools for Accumulo and Kafka. Maybe add some basic tools for the other ones.
Push Button Deploys | Everything

Indexing, Performance, Optimization

Goal/Area | Description | POC/Lead | JIRA | Status
Pluggable SFCs   
Interval index? 

start and end times

  • G. Sfakianakis, I. Patlakas, N. Ntarmos and P. Triantafillou, "Interval indexing and querying on key-value cloud stores," Data Engineering (ICDE), 2013 IEEE 29th International Conference on, Brisbane, QLD, 2013, pp. 805-816.
   
Cost-based Optimization

- summary stats

- vacuum analyze

  
Performance

Move these out into separate line items as we flesh them out

- Aggregate/batch simple features in iterator like we do for bin queries
- spatiotemporal compound indexes for all attribute indexes? (this would really only help with attribute equals queries)

- don't double store feature ID (once in row, once in simple feature) - would require changes to deserialization and might complicate batching simple features

- automatically sample results for WMS queries based on expected result size

    • we have sampling already; for WMS queries, sending more results than can be displayed in the map isn't necessary
    • would need to tie into query planning for expected result sizes

- use lazy deserialization in client

    • currently we deserialize the full feature in kvsToFeatures
    • instead we could use the KryoBufferSimpleFeature for lazy evaluation
    • we'd have to extend it so that it was mutable - possibly have some delegated mutable state
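
The lazy deserialization idea above can be sketched roughly as follows. This is only an illustration in Python, not GeoMesa code: `LazyFeature` and `serialize` are hypothetical stand-ins (the real candidate named above is KryoBufferSimpleFeature), and the byte layout is an assumption made for the example. It shows attributes being decoded only on first access, with writes going to a separate delegated mutable map so the serialized buffer is never modified.

```python
def serialize(attributes):
    """Pack string attributes into one buffer plus an offset table (hypothetical layout)."""
    buffer = bytearray()
    offsets = {}
    for name, value in attributes.items():
        encoded = value.encode("utf-8")
        offsets[name] = (len(buffer), len(buffer) + len(encoded))
        buffer.extend(encoded)
    return bytes(buffer), offsets


class LazyFeature:
    """Hypothetical lazily deserialized feature with delegated mutable state."""

    def __init__(self, buffer, offsets):
        self._buffer = buffer    # serialized bytes, never modified
        self._offsets = offsets  # attribute name -> (start, end) in the buffer
        self._decoded = {}       # attributes decoded so far
        self._overrides = {}     # mutable state layered over the buffer

    def get(self, name):
        # Client-side updates win over the serialized value.
        if name in self._overrides:
            return self._overrides[name]
        # Decode on first access only, then reuse the cached value.
        if name not in self._decoded:
            start, end = self._offsets[name]
            self._decoded[name] = self._buffer[start:end].decode("utf-8")
        return self._decoded[name]

    def set(self, name, value):
        # Mutation never touches the underlying buffer.
        self._overrides[name] = value

    @property
    def decoded_count(self):
        return len(self._decoded)
```

With this shape, a client that reads only one attribute pays for decoding only that attribute, and setting a value does not force a full deserialization.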
   
Spatial Joins    
Pan and zoom caching    
Configurable z shards

- currently hard-coded at four shards

- number of ranges we compute might need to be based on the shards so we don't create too many
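
A minimal sketch of the two points above, under stated assumptions: the shard is derived from a hash of the feature ID, and every z-range has to be repeated once per shard, so the total number of scan ranges grows with the shard count. The function names, the CRC32 hash, and the range budget are all hypothetical and chosen only for illustration; this is not how GeoMesa necessarily computes shards.

```python
import zlib


def shard_for(feature_id, num_shards):
    """Pick a shard from a stable hash of the feature ID (assumed scheme)."""
    return zlib.crc32(feature_id.encode("utf-8")) % num_shards


def ranges_per_shard(target_total_ranges, num_shards):
    """Scale the per-shard range budget down as the shard count goes up."""
    return max(1, target_total_ranges // num_shards)


def sharded_ranges(z_ranges, num_shards):
    """Repeat each (lower, upper) z-range once per shard prefix."""
    return [(shard, lower, upper)
            for shard in range(num_shards)
            for lower, upper in z_ranges]
```

For example, with a hypothetical budget of 2000 total ranges, four shards leave room for 500 z-ranges per shard, while sixteen shards leave room for only 125.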

   
Complex Feature Support    

Integration Efforts

Integration of GeoMesa with new Open Source tools, databases, platforms, etc

Product | Description/Ideas | POC/Lead | JIRA | Status

DynamoDB

   

Cassandra

    
Kudu/Impala/Parquet    

Jupyter/iPython/R notebook

    

Graphite

Metrics/graphite and predefined dashboards

- streaming metrics

   
NiFi | flow definitions for canonical setups
Hive (see Hortonworks visual sql builder)   
Pig    
Cloud native storage

You can't afford to use AWS for petabytes?

S3 Storage - store binary files, fairly static

Use case: Using weather stuff

Openstack?

 

   

Support/Customer Relations/Compliance

Item | Description/Ideas | POC/Lead | JIRA | Status
Avro WFS MimeType plugin | need to open source | GEOMESA-840
Documentation

What do we need additional Documentation for?

  • HBase?
  • Kafka?
  • Streaming?
  • Converters

Should we open up Confluence? What about documentation that changes or new tutorials?

   
FAQ page | for common errors, configuration, etc.
Data-defined age-off | Need to be able to age off data based on some criteria in the SFT or visibilities | GEOMESA-899
Query Auditing

Need to be able to audit all queries against GeoServer WFS/WMS/WPS/REST/etc.

  • Not just GeoServer...

Andrew Hulbert

Seapy, Thomas, Chris, Jim

GEOMESA-1173 
Accumulo 1.7 | Support for Accumulo 1.7

Visualization

Stealth

- polish

- open source

- modularity

New Features, Capabilities, & Brainstorms

Area | Description/Ideas | POC/Lead | JIRA Tickets | Importance/Usefulness (low/medium/high)
Schema Evolution | ability to gracefully evolve schemas
Streaming

Streaming Dissemination

- quotas, rate limiting, geofencing, filtering

   
SQL | Support for SQL queries
NanoCubes    
Native API

finishing it, almost done.

merge with blob store?

no schema evolution concerns

 

   
Blob Store | More file formats (S3, other image formats, etc.)
Avro/JSON store

Don't transform to SFs; store native Avro/JSON as "blobs" and index them

  • Push down predicates to JSON/Avro as jsonpath, avropath on the server side
   
Vector Tiles
  • Support the features of vector tiles in GeoMesa (like skipping features)
  • Decimation: server-side support, etc.
   
Binary File Datastores    
Count Min Sketch
  • hook into writer and batch job api, stored in metadata table
  • hook into query planner
  • use z2,z3?
  • configurable precision
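
For reference, a count-min sketch with configurable precision can be sketched as below. This is a generic textbook version in Python, not a GeoMesa implementation: the class name, the SHA-256 based hashing, and the default `epsilon`/`delta` values are assumptions for illustration. The keys could be attribute values or, as suggested above, z2/z3 cell values; how the table would be stored in the metadata table is not shown.

```python
import hashlib
import math


class CountMinSketch:
    """Approximate frequency counts; estimates never fall below the true count."""

    def __init__(self, epsilon=0.01, delta=0.01):
        # Standard sizing: width controls the error, depth controls the confidence.
        self.width = math.ceil(math.e / epsilon)
        self.depth = math.ceil(math.log(1 / delta))
        self.table = [[0] * self.width for _ in range(self.depth)]

    def _index(self, item, row):
        # One hash per row, derived by salting the item with the row number.
        digest = hashlib.sha256(f"{row}:{item}".encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % self.width

    def add(self, item, count=1):
        for row in range(self.depth):
            self.table[row][self._index(item, row)] += count

    def estimate(self, item):
        # Collisions can only inflate a cell, so the minimum is the best estimate.
        return min(self.table[row][self._index(item, row)]
                   for row in range(self.depth))
```

A smaller `epsilon` widens each row and tightens the error bound at the cost of more stored counters, which is the precision trade-off a configuration option would expose.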
   
Attribute Bloom filters?
  • Intersection queries
  • push attribute bloom filter into the iterator to do bitwise comparisons.
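
The bitwise comparison mentioned above could look roughly like this. It is a minimal generic Bloom filter in Python, offered as a sketch only: the class, the bit count, the number of hashes, and the SHA-256 based hashing are all assumptions, and nothing here reflects an existing GeoMesa iterator. The filter is held as a single integer so that the membership check is one AND and one comparison.

```python
import hashlib


class BloomFilter:
    """Set membership with possible false positives but no false negatives."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0  # the whole filter as one integer bit set

    def mask(self, value):
        # Bit mask with one bit set per hash function.
        result = 0
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{value}".encode("utf-8")).digest()
            result |= 1 << (int.from_bytes(digest[:8], "big") % self.num_bits)
        return result

    def add(self, value):
        self.bits |= self.mask(value)

    def might_contain(self, value):
        # Bitwise comparison: every bit of the value's mask must be set.
        m = self.mask(value)
        return (self.bits & m) == m
```

For an intersection query across two attributes, one possible approach is to keep a filter per attribute and keep a row only if both filters report a possible match; any hit would still need an exact check because of false positives.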