The Curious Case of Querying Large Raster Datasets on the Fly via an Internet-Facing Portal
Geospatial data is quite different from most other data you can work with, and it can be difficult to understand and manipulate. One of our clients came to us with a unique request: work with raster datasets and find the value of a particular pixel, along with statistical information, on the fly and in a form that a non-GIS user could understand. This may seem like a simple enough process, but let me explain how different it is beneath the surface. First of all, let's look at the different kinds of geospatial datasets.
Geospatial data can be broadly divided into two parts:-
Vector Data: Vector data represents geographic features as points, lines, and polygons. It includes information about the coordinates, shape, and attributes of these features. Examples of vector data include road networks, boundaries, buildings, and vegetation boundaries. Vector data is often used for detailed spatial analysis and precise representation of features.
Raster Data: Raster data represents geographic information as a grid of cells or pixels, where each cell contains a value. It is commonly used to represent continuous phenomena, such as elevation, satellite imagery, and climate data. Raster data is structured as a grid, and each cell corresponds to a specific location and carries information about the attribute being represented.
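To make the link between a cell and its location concrete, here is a minimal sketch of how a map coordinate can be translated into a row and column of the grid. It assumes a north-up raster described by a six-value geotransform in the order GDAL uses (origin x, pixel width, rotation, origin y, rotation, pixel height). The grid values, the coordinates, and the function names are made up for illustration and are not taken from the project.

```python
import math

def world_to_pixel(geotransform, x, y):
    """Convert a map coordinate to (row, col) for a north-up raster."""
    origin_x, pixel_w, rot_x, origin_y, rot_y, pixel_h = geotransform
    if rot_x != 0 or rot_y != 0:
        # Rotated rasters need the full affine inverse; not covered here.
        raise ValueError("this sketch only handles north-up rasters")
    col = math.floor((x - origin_x) / pixel_w)
    # pixel_h is negative for north-up rasters, so rows grow southwards.
    row = math.floor((y - origin_y) / pixel_h)
    return row, col

def cell_value(grid, geotransform, x, y):
    """Return the cell value at a map coordinate, or None if outside the grid."""
    row, col = world_to_pixel(geotransform, x, y)
    if not (0 <= row < len(grid) and 0 <= col < len(grid[0])):
        return None
    return grid[row][col]

# Hypothetical 3 x 4 elevation grid with 30 m cells (illustrative values only).
ELEVATION = [
    [120, 125, 130, 128],
    [118, 122, 127, 126],
    [115, 119, 121, 124],
]
# Upper-left corner at (500000, 4200000); negative height means north-up.
GEOTRANSFORM = (500000.0, 30.0, 0.0, 4200000.0, 0.0, -30.0)

if __name__ == "__main__":
    print(world_to_pixel(GEOTRANSFORM, 500045.0, 4199950.0))
    print(cell_value(ELEVATION, GEOTRANSFORM, 500045.0, 4199950.0))
```

The same arithmetic is what lets a portal answer "what is the value here?" without drawing a map: the user supplies a coordinate, and the lookup reduces to two divisions and an index into the grid.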
At GISKernel we have expertise in working with different types of geospatial data, so one of our clients came to us with a unique request: process raster data smoothly and quickly. Because raster data is continuous and often very large, on-the-fly information about a particular cell either has to be stored in a cache or takes time to show up. Furthermore, the client had no mapping interface through which to process the request; they simply wanted the information for a particular cell, either as a pixel value or as a band value.
Because rasters are huge files, finding a particular pixel value without a mapping interface is like finding a needle in a haystack. Rasters and databases also don't fit together well: tables have to be created from the raster datasets, which in turn changes how the raster datasets have to be created. On top of that, we needed to filter a huge amount of spatial data that tens of thousands of people would be able to query on the fly just by uploading raster files to the portal.
To tackle all of the above scenarios, we built a custom application which consisted of:-
Using AWS S3 buckets for data storage:- We chose S3 buckets for data storage because they keep the data continuously available in the cloud: different users can upload files, and the application can reach the data from anywhere with an internet connection.
Backend data management via PostgreSQL:- We chose PostgreSQL because it is open source and free to use. It also handles geospatial data types well. With the raster2pgsql loader (a command-line tool that ships with PostGIS rather than an extension in its own right) on the backend, raster datasets can be loaded into database tables and queried quickly, either directly from the database or through an API.
GDAL libraries:- We also used the GDAL libraries, which greatly improved raster data handling.
Django-based API application:- We used this API to provide a seamless experience for our users once they have access to the application hosted over the internet. This front end lets users run every query they need, from getting a particular pixel value in a raster dataset to getting the mean, median, and mode of an extent, which massively reduced their dependence on accessing the datasets directly and then applying queries by hand.
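The extent statistics mentioned above can be sketched in a few lines. This is an illustration of the idea only, not the production code: in the application these queries ran against the database and GDAL, whereas the sketch below uses plain Python lists and the standard `statistics` module. The function name `window_stats`, the sample grid, and the `-9999` nodata value are all assumptions made for the example.

```python
from statistics import mean, median, multimode

def window_stats(grid, row_min, row_max, col_min, col_max, nodata=None):
    """Summarise the cells inside an inclusive row/column window.

    Cells equal to `nodata` are skipped. Returns None when the window
    contains no valid cells.
    """
    values = [
        grid[r][c]
        for r in range(row_min, row_max + 1)
        for c in range(col_min, col_max + 1)
        if grid[r][c] != nodata
    ]
    if not values:
        return None
    return {
        "count": len(values),
        "mean": mean(values),
        "median": median(values),
        # multimode returns every most-common value, so ties are not hidden.
        "mode": multimode(values),
    }

# Hypothetical land-cover grid; -9999 marks cells with no data.
LAND_COVER = [
    [1, 1, 2, -9999],
    [1, 3, 2, 2],
    [4, 3, 2, 5],
]

if __name__ == "__main__":
    # Rows 0..1 and columns 1..3 of the grid above.
    print(window_stats(LAND_COVER, 0, 1, 1, 3, nodata=-9999))
```

An API endpoint built on this idea would only need to turn the user's extent into a row/column window, call a function of this kind, and return the resulting dictionary as JSON, which is what keeps the result readable for a non-GIS user.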
Architecture:-
Application Overview:-
The simple structure of the application, built for an unusual task, greatly reduced the burden of database maintenance and made spatial datasets accessible to all users, including non-GIS users. As GIS folk, we understand how rasters work, and it is easy for us to load a grid onto a map and picture a particular value. The value of this tool is that any non-GIS user can extract data from a raster dataset without the hassle of downloading, loading, and later deleting GIS software.