Storage and processing background

  Raster:
    Big Data: wide column store, distributed file system
    Geospatial Big Data: array database/key-value store, RDD, wide column store
    Geospatial Data: OB-RDBMS with extension to raster, or traditional file-based image storage and processing software

  Vector:
    Big Data: relational DBMS, wide column store
    Geospatial Big Data: distributed file system, relational DBMS complemented with spatial extensions, or wide column store and key-value store with GIS functions
    Geospatial Data: OB-RDBMS

  Point cloud:
    Big Data: key-value store
    Geospatial Big Data: key-value store, RDD
    Geospatial Data: OB-RDBMS with extension to point cloud storage and processing, or conventional software solutions

  Text based:
    Big Data: distributed file system, document store DBMS, wide-column store
    Geospatial Big Data: distributed file system, document store DBMS, wide-column store, RDD
    Geospatial Data: conventional GIS processing applied (often with format conversion)

Common solutions

  Raster:
    Big Data: Apache Accumulo, Cloudera
    Geospatial Big Data: Rasdaman, SciDB, GeoTrellis, GeoMesa, GeoWave operating on top of different DB engines
    Geospatial Data: GRASS GIS, SAGA GIS, Orfeo, OSSIM, gvSIG, QGIS, PostGIS Raster, etc.

  Vector:
    Big Data: Cassandra, HBase, distributed file system
    Geospatial Big Data: Apache Hadoop, Hive, HBase, Accumulo, MongoDB, Neo4j with extensions for spatial functions and existing libraries (e.g., MapR)
    Geospatial Data: PostGIS, SpatiaLite, MySQL, QGIS

  Point cloud:
    Big Data: distributed file system
    Geospatial Big Data: Apache Spark with extensions for spatial functions (e.g., Spark Lidar)
    Geospatial Data: PostGIS, LAStools, rLiDAR, GeoPlus, GRASS GIS LiDAR tools

  Text based:
    Big Data: Cassandra, Cloudera, HBase, Neo4j, CouchDB, MongoDB, Hortonworks, MillWheel
    Geospatial Big Data: Apache Storm, S4, Spark, Hive
    Geospatial Data: desktop software (GPS tracklog processing, etc.)

Table 1: Characteristics of Big Data, Geospatial Big Data and Geospatial Data with common solutions
Defining distributed geospatial computing or processing is also a challenge. The Encyclopedia of GIS (Yang, 2008) defines distributed geospatial computing (DGC) as "geospatial computing that resides on multiple computers connected through computer networks", where "geospatial computing communicates through wrapping applications, such as web server and web browser". In other words, distributed geospatial computing is geoprocessing performed within a distributed computing environment. In the Handbook of Geoinformatics (2009), Yu et al. focus on a multi-agent system with ontologies to perform distributed processing of geospatial data. The distributed processing of geospatial data is continuously evolving together with the evolution of computer networks. One milestone we would like to emphasize from this evolution is the release of Google Earth in 2004, which changed citizens' everyday life and popularized geospatial applications. Furthermore, Google's solutions still lead in the processing of massive datasets and in the development of easy-to-use interfaces (e.g., Google BigTable), and they play an important role in open-source community developments. Since distributed systems support heterogeneous network and infrastructural backgrounds, cloud solutions have been developed to exploit the advantages of distributed systems and to make services available for geospatial computing as well.
Method
In previous work we carried out a feasibility study on technological and conceptual aspects. The outcome was presented in our previous paper (Nguyen Thai and Olasz, 2015), where we described the architecture of the demo application as well as processing results for calculating the NDVI from Landsat 8 satellite images for the territory of Hungary. The processing results were convincing, so we started to design and implement the IQLib framework. This framework should be able to store metadata on our datasets, tracking partitioned data, their locations, and the partitioning method. It should distribute data to processing nodes, deploy existing processing services on the processing nodes, and execute them in parallel.
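To illustrate the kind of processing service involved, the following is a minimal sketch of an NDVI computation on Landsat 8 imagery, assuming the red (band 4) and near-infrared (band 5) bands are available as single-band GeoTIFFs and using the rasterio library; the file names are hypothetical and this is not the exact code of the demo application.

```python
# Minimal NDVI sketch: NDVI = (NIR - Red) / (NIR + Red).
# Assumes single-band GeoTIFF inputs; file names are hypothetical.
import numpy as np
import rasterio

def compute_ndvi(red_path, nir_path, out_path):
    with rasterio.open(red_path) as red_src, rasterio.open(nir_path) as nir_src:
        red = red_src.read(1).astype("float32")
        nir = nir_src.read(1).astype("float32")
        profile = red_src.profile

    # Guard against division by zero where both bands are 0.
    denom = nir + red
    with np.errstate(divide="ignore", invalid="ignore"):
        ndvi = np.where(denom == 0, 0.0, (nir - red) / denom)

    profile.update(dtype="float32", count=1)
    with rasterio.open(out_path, "w", **profile) as dst:
        dst.write(ndvi, 1)

compute_ndvi("LC08_B4.tif", "LC08_B5.tif", "ndvi.tif")
```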
As a result, IQLib has three major modules, each responsible for a major step in GIS data processing: the Data Catalogue module, the Tiling and Stitching module, and the Distributed Processing module.
The Data Catalogue module is responsible for storing metadata corresponding to survey areas. A survey area contains all datasets that are logically related to an inspection area, regardless of their data format and purpose of use. We want to store all available, known, and useful information about these data for processing.
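As an illustration only (the actual IQLib schema is not reproduced here), a catalogue entry could track fields such as the following; all field names are assumptions.

```python
# Illustrative sketch of a Data Catalogue entry for one dataset in a
# survey area; field names are assumptions, not the IQLib schema.
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class DatasetRecord:
    dataset_id: str
    survey_area: str                     # logical inspection area this dataset belongs to
    path: str                            # physical location (local path, HDFS URI, ...)
    data_format: str                     # e.g. "GeoTIFF", "Shapefile", "LAS"
    crs: str                             # coordinate reference system, e.g. "EPSG:23700"
    parent_id: Optional[str] = None      # set for tiles, pointing at the raw dataset
    partitioning: Optional[str] = None   # tiling method used, if any
    extra: dict = field(default_factory=dict)  # any further useful information
```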
The Tiling and Stitching module does exactly what its name suggests. Tiling algorithms are usually performed on raw datasets before a specific processing service is run on the given data, while stitching usually runs after the processing services have successfully finished. After the tiling algorithms have processed the raw data, the tiled data are distributed across the processing nodes by the data distribution component, and the metadata of the tiled datasets are registered in the Data Catalogue. With this step we always know the parents of tiled data.
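A minimal sketch of such a tiling step for a raster dataset, assuming fixed-size square tiles; the tile size, naming scheme, and catalogue entry fields are illustrative assumptions, not the IQLib implementation.

```python
# Split a raster into fixed-size tiles and return catalogue entries
# linking each tile back to its parent dataset.
import rasterio
from rasterio.windows import Window

def tile_raster(src_path, dataset_id, tile_size=1024):
    entries = []
    with rasterio.open(src_path) as src:
        for row in range(0, src.height, tile_size):
            for col in range(0, src.width, tile_size):
                window = Window(col, row,
                                min(tile_size, src.width - col),
                                min(tile_size, src.height - row))
                profile = src.profile.copy()
                profile.update(width=window.width, height=window.height,
                               transform=src.window_transform(window))
                tile_path = f"{dataset_id}_{row}_{col}.tif"
                with rasterio.open(tile_path, "w", **profile) as dst:
                    dst.write(src.read(window=window))
                # Register the tile with a pointer to its parent so the
                # Data Catalogue always knows the parents of tiled data.
                entries.append({"dataset_id": f"{dataset_id}_{row}_{col}",
                                "path": tile_path,
                                "parent_id": dataset_id})
    return entries
```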
The Distributed Processing module is responsible for running processing services on the tiled datasets.
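As a simplified, single-machine stand-in for this module (IQLib targets multiple processing nodes, which this sketch only approximates with local worker processes), each tile can be handed to an existing processing service in parallel; the service command shown is hypothetical.

```python
# Run an existing processing service on each tile in parallel.
# Local-process approximation of distributed execution; the
# "my_processing_service" executable is a hypothetical placeholder.
import subprocess
from multiprocessing import Pool

def run_service(tile_path):
    # Invoke the external processing tool on one tile of the dataset.
    subprocess.run(["my_processing_service", tile_path], check=True)
    return tile_path

def process_tiles(tile_paths, workers=4):
    with Pool(workers) as pool:
        return pool.map(run_service, tile_paths)
```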