Angéla Olasz
Department of Geoinformation, Institute of Geodesy, Cartography and Remote Sensing (FÖMI), 5. Bosnyák sqr.,
Budapest, Hungary, olasz.angela@fomi.hu
Binh Nguyen Thai
Department of Cartography and Geoinformatics, Eötvös Loránd University (ELTE), 1/A Pázmány Péter sétány,
Budapest, Hungary, ntb@inf.elte.hu
Corresponding Author
Angéla Olasz
olasz.angela@fomi.hu
PeerJ Preprints | https://doi.org/10.7287/peerj.preprints.2226v1 | CC BY 4.0 Open Access | rec: 4 Jul 2016, publ: 4 Jul 2016
Geospatial Big Data processing in an open source distributed computing environment
Abstract
In recent years, distributed computing has reached many areas of computer science including geographic and
remote sensing information systems. However, distributed data processing solutions have primarily been focused
on processing simple structured documents, rather than complex geospatial data. Hence, migrating current
algorithms and data management to a distributed processing environment may require a great deal of effort. In
data processing, different aspects must be considered, such as speed, precision or timeliness, all depending on
data types and processing methods. Available data volume and variety are evolving as never before, quickly
exceeding the capabilities of traditional algorithms and hardware environments in terms of data
storage, management and computation. Improved efficiency is required to exploit the available information
derived from Geospatial Big Data. Most of the current distributed computing frameworks have important
limitations regarding transparent and flexible control of processing (and/or storage) nodes. Hence, this paper presents a
prototype for the distribution (“tiling”), aggregation (“stitching”) and processing of Big Geospatial Data, focusing on the
distribution and processing of raster data. Furthermore, we introduce our own data and metadata catalogue, which
stores the “lifecycle” of datasets and is accessible to users and processes. The data distribution framework
places no restrictions on the programming environment and can execute scripts (and workflows) written in different
languages (e.g. Python, R or C#). It is capable of processing raster, vector and point cloud data, allowing full
control of data distribution and processing. In this paper, the IQLib concept
(https://github.com/posseidon/IQLib/) and the background of its practical realization as a prototype are presented,
formulated within the IQmulus EU FP7 research and development project (http://www.iqmulus.eu). Further
algorithmic and implementation details will be the focus of the oral presentation.
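
To make the "tiling" and "stitching" terminology concrete, the following is a minimal sketch of the idea for a small in-memory raster. It is not the IQLib API: the function names (tile, process_tiles, stitch, double), the list-of-rows raster representation and the offset-keyed tile dictionary are all assumptions made purely for illustration. In a distributed setting, each tile would presumably be sent to a separate processing node rather than handled in one loop.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical types for illustration only (not taken from IQLib).
Raster = List[List[float]]
Offset = Tuple[int, int]  # (row, col) of a tile's upper-left cell


def tile(raster: Raster, tile_rows: int, tile_cols: int) -> Dict[Offset, Raster]:
    """Split a raster into rectangular tiles keyed by their upper-left offset.

    Tiles on the right and bottom edges may be smaller than the requested size.
    """
    if tile_rows <= 0 or tile_cols <= 0:
        raise ValueError("tile dimensions must be positive")
    n_rows = len(raster)
    n_cols = len(raster[0]) if raster else 0
    tiles: Dict[Offset, Raster] = {}
    for r in range(0, n_rows, tile_rows):
        for c in range(0, n_cols, tile_cols):
            tiles[(r, c)] = [row[c:c + tile_cols] for row in raster[r:r + tile_rows]]
    return tiles


def process_tiles(tiles: Dict[Offset, Raster],
                  func: Callable[[Raster], Raster]) -> Dict[Offset, Raster]:
    """Apply a per-tile function; here sequentially, standing in for remote nodes."""
    return {offset: func(block) for offset, block in tiles.items()}


def stitch(tiles: Dict[Offset, Raster], n_rows: int, n_cols: int) -> Raster:
    """Reassemble tiles into a raster of the given size using their offsets."""
    out: Raster = [[0.0] * n_cols for _ in range(n_rows)]
    for (r, c), block in tiles.items():
        for i, row in enumerate(block):
            out[r + i][c:c + len(row)] = row
    return out


def double(block: Raster) -> Raster:
    """Example per-cell operation that needs no neighbouring tiles."""
    return [[2 * v for v in row] for row in block]


if __name__ == "__main__":
    raster = [[float(r * 5 + c) for c in range(5)] for r in range(3)]
    tiles = tile(raster, 2, 2)
    result = stitch(process_tiles(tiles, double), 3, 5)
    print(result)
```

This sketch only covers per-cell operations. Algorithms that need neighbouring cells (for example, filters with a moving window) would require overlapping tiles, which are not handled here.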