

Spatial In-Memory Big Data Analytics

Simba is a distributed in-memory spatial analytics engine based on Apache Spark. It extends the Spark SQL engine across the system stack to support rich spatial queries and analytics through both the SQL and DataFrame query interfaces. Simba also introduces native indexing support over RDDs, which it uses to build efficient spatial operators. In addition, it extends Spark SQL's query optimizer with spatial-aware and cost-based optimizations that make the best use of existing indexes and statistics.


Core Features

SQL & DataFrame API

Simba extends the SQL and DataFrame query interfaces of Spark SQL, providing a natural way to express complex spatial analysis queries.
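To make "spatial analysis queries" concrete, here is a minimal sketch in plain Python of what two common spatial predicates, a rectangular range query and a k-nearest-neighbor (kNN) query, compute over a table of points. It is not Simba's SQL or DataFrame syntax; the function names, the row layout `(id, x, y)`, and the sample data are assumptions made for illustration only.

```python
import heapq
import math

# Hypothetical sample data: (id, x, y) rows standing in for a table of points.
POINTS = [
    (1, 1.0, 1.0),
    (2, 2.0, 5.0),
    (3, 4.0, 4.0),
    (4, 9.0, 2.0),
    (5, 5.0, 5.0),
]


def range_query(rows, x_min, y_min, x_max, y_max):
    """Return rows whose point lies inside the axis-aligned rectangle (inclusive)."""
    return [r for r in rows if x_min <= r[1] <= x_max and y_min <= r[2] <= y_max]


def knn_query(rows, qx, qy, k):
    """Return the k rows closest to (qx, qy) by Euclidean distance, nearest first."""
    return heapq.nsmallest(k, rows, key=lambda r: math.hypot(r[1] - qx, r[2] - qy))


if __name__ == "__main__":
    print(range_query(POINTS, 3.0, 3.0, 6.0, 6.0))
    print(knn_query(POINTS, 4.0, 3.0, 2))
```

Both functions scan every row, which is the baseline that a spatial engine tries to beat with indexing and query optimization.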

Index over RDDs

Simba supports building native (spatial) indexes over RDDs inside the kernel to achieve superior query performance over large data sets.
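The idea of indexing partitioned data can be sketched without Spark. In the plain-Python illustration below, each partition keeps a local index plus its bounding box, so a range query can skip whole partitions and then probe only the relevant index cells. A uniform grid is used purely for simplicity and is an assumption, not a description of the index structures Simba actually builds; `IndexedPartition`, `CELL`, and the other names are hypothetical.

```python
from collections import defaultdict

CELL = 10.0  # assumed grid cell size; a real system would tune or derive this


class IndexedPartition:
    """One non-empty partition of (x, y) points plus a local uniform-grid index."""

    def __init__(self, points):
        self.points = list(points)
        if not self.points:
            raise ValueError("this sketch assumes non-empty partitions")
        # Local index: grid cell -> points that fall in that cell.
        self.cells = defaultdict(list)
        for p in self.points:
            self.cells[self._cell(p)].append(p)
        xs = [p[0] for p in self.points]
        ys = [p[1] for p in self.points]
        # Bounding box, used to prune whole partitions before probing them.
        self.bounds = (min(xs), min(ys), max(xs), max(ys))

    @staticmethod
    def _cell(p):
        return (int(p[0] // CELL), int(p[1] // CELL))

    def range_query(self, x_min, y_min, x_max, y_max):
        """Probe only the grid cells that intersect the query rectangle."""
        hits = []
        for cx in range(int(x_min // CELL), int(x_max // CELL) + 1):
            for cy in range(int(y_min // CELL), int(y_max // CELL) + 1):
                for p in self.cells.get((cx, cy), ()):
                    if x_min <= p[0] <= x_max and y_min <= p[1] <= y_max:
                        hits.append(p)
        return hits


def overlaps(bounds, x_min, y_min, x_max, y_max):
    """True if the bounding box intersects the query rectangle."""
    bx0, by0, bx1, by1 = bounds
    return not (bx1 < x_min or bx0 > x_max or by1 < y_min or by0 > y_max)


def indexed_range_query(partitions, x_min, y_min, x_max, y_max):
    """Skip partitions whose bounds miss the query, then use each local index."""
    hits = []
    for part in partitions:
        if overlaps(part.bounds, x_min, y_min, x_max, y_max):
            hits.extend(part.range_query(x_min, y_min, x_max, y_max))
    return hits
```

The benefit of such a layout depends on how the data is partitioned: if nearby points land in the same partition, most partitions can be skipped for a selective query.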

Efficient Algorithms

Simba implements efficient algorithms for different spatial operators, which are tailored to its indexing support and underlying Spark engine.

Query Optimizations

Simba adds spatial-aware and index-aware optimizations to both the logical and physical optimizers of Spark SQL, and uses a cost-based optimization (CBO) module to select good query plans.
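As a rough illustration of cost-based plan selection, the plain-Python sketch below estimates the selectivity of a range query from table statistics and picks whichever of two plans has the lower estimated cost. The cost model, its constants, the uniform-distribution assumption, and names such as `TableStats` and `choose_plan` are all hypothetical; they are not taken from Simba or from Spark SQL.

```python
from dataclasses import dataclass


@dataclass
class TableStats:
    row_count: int
    extent: tuple  # (x_min, y_min, x_max, y_max) covering all points
    has_index: bool


# Assumed cost constants, chosen only to make the trade-off visible.
SCAN_COST_PER_ROW = 1.0    # cost of reading one row in a full scan
INDEX_COST_PER_ROW = 4.0   # cost of fetching one matching row via the index
INDEX_FIXED_COST = 50.0    # fixed overhead of probing the index


def estimate_selectivity(stats, query):
    """Fraction of rows expected to match, assuming uniformly spread points."""
    ex0, ey0, ex1, ey1 = stats.extent
    qx0, qy0, qx1, qy1 = query
    width = max(0.0, min(ex1, qx1) - max(ex0, qx0))
    height = max(0.0, min(ey1, qy1) - max(ey0, qy0))
    extent_area = (ex1 - ex0) * (ey1 - ey0)
    return (width * height) / extent_area if extent_area > 0 else 1.0


def choose_plan(stats, query):
    """Return 'index_scan' or 'full_scan', whichever has the lower estimated cost."""
    full_scan_cost = stats.row_count * SCAN_COST_PER_ROW
    if not stats.has_index:
        return "full_scan"
    selectivity = estimate_selectivity(stats, query)
    index_scan_cost = INDEX_FIXED_COST + selectivity * stats.row_count * INDEX_COST_PER_ROW
    return "index_scan" if index_scan_cost < full_scan_cost else "full_scan"
```

With these assumed constants, a query covering a small part of the data extent favors the index, while a query covering most of it falls back to a full scan.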


Publications

Simba: Efficient In-Memory Spatial Analytics
Dong Xie, Feifei Li, Bin Yao, Gefei Li, Liang Zhou, Minyi Guo
SIGMOD 2016

Simba: Spatial In-Memory Big Data Analysis (Demo Paper)
Dong Xie, Feifei Li, Bin Yao, Gefei Li, Zhongpu Chen, Liang Zhou, Minyi Guo
SIGSPATIAL 2016