RoadMap - April 2016

Goals of the Roadmap

We'd like to capture the current state of GeoMesa in terms of major work items, desired features, and crazy new ideas! Feel free to add notes or updates. Please don't delete ideas; mark them with strikethrough instead, unless the item really isn't needed. If you add a new item, try to flesh it out with a good description so that it's useful to other people who may pick up your idea in the future.

Tooling & Automation

Goal/Area

Description

POC/Lead

JIRA

Status

Installation automation

- AWS, Cloudera, Hortonworks, general

 

 

 

Data modeling tooling

  • Auto detect SFTs from data via command line tools GEOMESA-1139, GEOMESA-1195

  • Data Modeling UI/Studio

  • Schema Evolution

 

 

 

JEMA-like workflow tooling

- hosted analytics

- YARN app for load balancing workflows

- parallelization

- data and analytic hosting platform?

 

 

 

Amazon Marketplace?

 

 

 

 

Tools for HBase, Cassandra, DynamoDB, etc.

We have tools for Accumulo and Kafka. Maybe add some basic tools for the other back ends.

 

 

 

Push Button Deploys

Everything

 

 

 

Indexing, Performance, Optimization

Goal/Area

Description

POC/Lead

JIRA

Status

Pluggable SFCs

S2 evaluation, Hilbert, etc. @Andrew Annex (Deactivated), please add the NASA thing here. MetroHash?

http://healpix.sourceforge.net/

http://blog.christianperone.com/2015/08/googles-s2-geometry-on-the-sphere-cells-and-hilbert-curve/

 

 

 

 

Interval index? 

start and end times

  • G. Sfakianakis, I. Patlakas, N. Ntarmos and P. Triantafillou, "Interval indexing and querying on key-value cloud stores," Data Engineering (ICDE), 2013 IEEE 29th International Conference on, Brisbane, QLD, 2013, pp. 805-816.

 

 

 

Cost-based Optimization

- summary stats

- vacuum analyze

@Former user (Deleted)

 

 

Performance

Move these out into separate line items as we flesh them out

- Aggregate/batch simple features in iterator like we do for bin queries
- spatiotemporal compound indexes for all attribute indexes? (this would really only help with attribute equals queries)

- don't double store feature ID (once in row, once in simple feature) - would require changes to deserialization and might complicate batching simple features

- automatically sample results for WMS queries based on expected result size

- use lazy deserialization in client

 

 

 

Spatial Joins

 

 

 

 

Pan and zoom caching

 

 

 

 

Configurable z shards

- currently hard-coded at four shards

- number of ranges we compute might need to be based on the shards so we don't create too many
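The relationship between shard count and range count noted above can be sketched as follows. This is an illustrative sketch only, not GeoMesa's actual implementation: the function names (`shard_for`, `ranges_per_shard`, `scan_ranges`), the CRC32 hash, and the idea of a total range budget are all assumptions made for the example.

```python
import zlib


def shard_for(feature_id, num_shards):
    # Hypothetical: a stable hash so a feature always maps to the same shard.
    return zlib.crc32(feature_id.encode("utf-8")) % num_shards


def ranges_per_shard(max_total_ranges, num_shards):
    # Every z-range is repeated once per shard, so divide the overall
    # range budget by the shard count (never dropping below one range).
    return max(1, max_total_ranges // num_shards)


def scan_ranges(z_ranges, num_shards):
    # Prefix each (lo, hi) z-range with every shard id.
    return [(shard, lo, hi) for shard in range(num_shards) for lo, hi in z_ranges]


# With 4 shards, 2 z-ranges turn into 8 scan ranges.
ranges = scan_ranges([(0, 10), (20, 30)], num_shards=4)
budget = ranges_per_shard(max_total_ranges=2000, num_shards=4)
```

Making `num_shards` a parameter rather than a constant is the point of the item; the budget split shows why the range planner would need to know the shard count.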

 

 

 

Complex Feature Support

 

 

 

 

Integration Efforts

Integration of GeoMesa with new Open Source tools, databases, platforms, etc

Product

Description/Ideas

POC/Lead

JIRA

Status

DynamoDB

 

@Andrew Annex (Deactivated)

 

 

Cassandra

 

 

 

 

Kudu/Impala/Parquet

 

 

 

 

Jupyter/IPython/R notebooks

 

 

 

 

Graphite

Metrics/graphite and predefined dashboards

- streaming metrics

 

 

 

NiFi

- flow definitions for canonical setups

 

 

 

Hive

 (see Hortonworks visual sql builder)

 

 

 

Pig

 

 

 

 

Cloud native storage

You can't afford to use AWS for petabytes?

S3 Storage - store binary files, fairly static

Use case: Using weather stuff

OpenStack?

 

 

 

 

Support/Customer Relations/Compliance

Item

Description/Ideas

POC/Lead

JIRA

Status

Avro WFS MimeType plugin

need to open source

@James Hughes

GEOMESA-840

 

Documentation

What do we need additional Documentation for?

  • HBase?

  • Kafka?

  • Streaming?

  • Converters

Should we open up Confluence? What about documentation that changes or new tutorials?

 

 

 

FAQ page

for common errors, configuration, etc

 

 

 

Data-defined age-off

Need to be able to age off data based on some criteria in the SFT or visibilities

@Andrew Hulbert

GEOMESA-899

 

Query Auditing

Need to be able to audit all queries against a GeoServer (WFS/WMS/WPS/REST/etc.)

  • Not just GeoServer...

@Andrew Hulbert

Seapy, Thomas, Chris, Jim

GEOMESA-1173

 

Accumulo 1.7

Support for Accumulo 1.7

 

 

 

Visualization

Stealth

- polish

- open source

- modularity

New Features, Capabilities, & Brainstorms

Area

Description/Ideas

POC/Lead

JIRA Tickets

Importance/Usefulness (low/medium/high)

Schema Evolution

ability to gracefully evolve schemas

 

 

 

Streaming

Streaming Dissemination

- quotas, rate limiting, geofencing, filtering

 

 

 

SQL

Support for SQL queries

 

 

 

NanoCubes

 

 

 

 

Native API

finishing it, almost done.

merge with blob store?

no schema evolution concerns

 

 

 

 

Blob Store

More file formats (S3, other image formats, etc)

 

 

 

Avro/JSON store

Don't transform to SFs, store native avro/json as "blobs" and index them

  • Push down predicates to json/avro as jsonpath,avropath on the server side

 

 

 

Vector Tiles

  • Support the features of vector tiles in GeoMesa (like skipping features)

  • Decimation: server-side support, etc.

 

 

 

Binary File Datastores

 

 

 

 

Count Min Sketch

  • hook into writer and batch job api, stored in metadata table

  • hook into query planner

  • use z2,z3?

  • configurable precision
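For readers unfamiliar with the data structure, here is a minimal count-min sketch with configurable precision. It is a generic sketch, not GeoMesa code: the class name, the salted SHA-256 hashing, and the use of string cell keys (standing in for z2/z3 prefixes) are assumptions made for the example.

```python
import hashlib


class CountMinSketch:
    """Approximate frequency counter; estimates never undercount."""

    def __init__(self, width=1024, depth=4):
        # width and depth set the precision: wider rows mean fewer
        # collisions, more rows make it less likely that all rows collide.
        self.width = width
        self.depth = depth
        self.table = [[0] * width for _ in range(depth)]

    def _columns(self, key):
        # One hash per row, derived by salting the key with the row number.
        for row in range(self.depth):
            digest = hashlib.sha256(f"{row}:{key}".encode("utf-8")).digest()
            yield row, int.from_bytes(digest[:8], "big") % self.width

    def add(self, key, count=1):
        for row, col in self._columns(key):
            self.table[row][col] += count

    def estimate(self, key):
        # The minimum across rows is the least-inflated counter.
        return min(self.table[row][col] for row, col in self._columns(key))


# Hypothetical usage: count features per index cell as they are written.
sketch = CountMinSketch(width=2048, depth=5)
for z_cell in ["cell-a"] * 100 + ["cell-b"] * 3:
    sketch.add(z_cell)
```

A writer hook would call `add` per feature, and a query planner could call `estimate` to guess result sizes. The table is a fixed-size array, so it could plausibly be serialized into a metadata table.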

 

 

 

Attribute Bloom filters?

  • Intersection queries

  • push attribute bloom filter into the iterator to do bitwise comparisons.
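The bitwise comparison idea can be illustrated with a small Bloom filter. This is a generic sketch under stated assumptions, not GeoMesa's design: the class and method names, the bit-set-as-integer representation, and the salted SHA-256 hashing are all hypothetical.

```python
import hashlib


class BloomFilter:
    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0  # a Python int used as a bit set

    def _positions(self, value):
        # Derive num_hashes bit positions by salting the value.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{value}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, value):
        for pos in self._positions(value):
            self.bits |= 1 << pos

    def might_contain(self, value):
        # False means definitely absent; True means possibly present.
        return all((self.bits >> pos) & 1 for pos in self._positions(value))

    def might_intersect(self, other):
        # Bitwise AND of two filters built with the same parameters:
        # zero proves the sets are disjoint, non-zero means they may overlap.
        return (self.bits & other.bits) != 0


# Hypothetical usage: a stored filter of attribute values vs. a query filter.
stored = BloomFilter()
for name in ["alice", "bob"]:
    stored.add(name)

query = BloomFilter()
query.add("bob")
```

An iterator holding the query filter could skip any block whose stored filter yields `might_intersect(...) == False`, since that result is never a false negative.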