DataPump, StatMap’s server-side ETL (Extract, Transform, Load) software, enables you to pull and push data between different locations, including databases and standalone file structures, so that you can output what is needed in the form in which it is needed.
It provides the means for the eVO Platform to integrate your back-office system databases with the eVO Spatial Data Repository (SQL Server 2008+, PostGIS, or Oracle 10g+).
DataPump is provided as part of the standard eVO Platform application server installation, enabling the Spatial Data Repository to become your enterprise spatial data warehouse, pulling data from all business systems within your organisation.
The rich and powerful DataPump ETL capabilities include the following:
Using the Earthlight EVO internet client interface, you can publish DataPump ETL workflows to EVO server as routines that run either at scheduled times or in response to a trigger or event;
Manipulate the structure of data: changing data from one format to another, applying calculations to field values, merging fields, etc.
Change and transform Projections – choose any one of the EPSG defined coordinate reference systems;
Apply pre-built transformations, or design your own: DataPump is configurable enough that you can build workflows from scratch or re-use existing transformation workflows for extracting, transforming and loading data from one system, or format, to another.
Consume and translate complex Real-Time data from published data source feeds or publish data from your Spatial Data Repository.
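As a rough illustration of what a transformation workflow of this kind does, the sketch below chains two steps over a set of records: merging fields and applying a calculation to a field value. It is not DataPump's own API; every function name and field name here (`merge_fields`, `calculate`, `run_workflow`, `house_no`, `area_sq_m` and so on) is hypothetical and chosen only for the example.

```python
from typing import Callable, Dict, Iterable, List

Record = Dict[str, object]
Step = Callable[[Record], Record]


def merge_fields(sources: List[str], target: str, sep: str = " ") -> Step:
    """Return a step that joins several source fields into one target field."""
    def step(record: Record) -> Record:
        # Keep every field except the ones being merged.
        out = {k: v for k, v in record.items() if k not in sources}
        # Skip empty source values so the separator is not doubled up.
        parts = [str(record[name]) for name in sources
                 if record.get(name) not in (None, "")]
        out[target] = sep.join(parts)
        return out
    return step


def calculate(target: str, fn: Callable[[Record], object]) -> Step:
    """Return a step that stores the result of a calculation in a new field."""
    def step(record: Record) -> Record:
        out = dict(record)
        out[target] = fn(record)
        return out
    return step


def run_workflow(records: Iterable[Record], steps: List[Step]) -> List[Record]:
    """Apply each step, in order, to every record."""
    result = []
    for record in records:
        for step in steps:
            record = step(record)
        result.append(record)
    return result


# Hypothetical source rows and workflow definition.
source_rows = [{"house_no": "12", "street": "High Street", "area_sq_m": 25000.0}]
workflow = [
    merge_fields(["house_no", "street"], "address"),
    calculate("area_ha", lambda r: r["area_sq_m"] / 10000),
]
loaded = run_workflow(source_rows, workflow)
```

Because each step is a plain function, the same step can be re-used in several workflows, which mirrors the re-use of existing transformations described above.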
Gazetteers: OS AddressBase & LLPG Management
All varieties of gazetteers, including Open Gazetteers, are fully handled by DataPump – including OS AddressBase and LLPG / NLPG.
DataPump supports the management of all three Ordnance Survey AddressBase products:
- AddressBase™
- AddressBase Plus™
- AddressBase Premium™
With DataPump, there is no need to worry about maintaining AddressBase and LLPG gazetteers: maintenance can be fully automated to ensure that your eVO gazetteers are all up to date, removing the need for manual intervention.
Using .csv full and update files from the Ordnance Survey, DataPump loads the raw files directly into the eVO Spatial Data Repository: either SQL Server 2008 (and later), PostGIS 2.0 (and later) or Oracle 10g (and later).
When loading both OS AddressBase and LLPG gazetteers, DataPump creates a geometry field and populates it with the geographical location of the address.
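The idea of deriving a geometry from each address record can be sketched as follows. This is only an illustration, not DataPump's implementation: it assumes the raw file exposes easting and northing columns named `X_COORDINATE` and `Y_COORDINATE`, the UPRN values are made-up sample data, and the output field name `geometry_wkt` is hypothetical. The geometry is written as Well-Known Text (WKT), which each of the supported databases can parse.

```python
import csv
import io

# Made-up sample rows; column names are an assumption for this sketch.
SAMPLE_CSV = """UPRN,X_COORDINATE,Y_COORDINATE
100000000001,530000.00,180000.00
100000000002,530125.50,180250.75
"""


def point_wkt(easting: float, northing: float) -> str:
    """Format a coordinate pair as a WKT point with two decimal places."""
    return f"POINT ({easting:.2f} {northing:.2f})"


def add_geometry(rows):
    """Yield a copy of each row with a WKT geometry built from its coordinates."""
    for row in rows:
        row = dict(row)
        row["geometry_wkt"] = point_wkt(
            float(row["X_COORDINATE"]), float(row["Y_COORDINATE"])
        )
        yield row


records = list(add_geometry(csv.DictReader(io.StringIO(SAMPLE_CSV))))
```

In a real load the WKT would be passed to the database's geometry constructor together with the coordinate reference system of the source data.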
It is very simple and fast to use any of the AddressBase products as your enterprise gazetteer, or as one of a number of gazetteers. eVO Platform products can support one or more gazetteers – each of which published applications can use.
When matched with eVO Platform’s geocoding capabilities, AddressBase and the LLPG become a very powerful means of integrating all of your address-based business systems with the eVO Platform.
Single Point of Truth – Spatial Data Warehouse
The server-side DataPump ETL tools can be used to automate data flows between system databases, ensuring that data is synchronised between them and the spatial data repository database – in effect, creating an enterprise spatial data warehouse. In this way, you can provide a central ‘point of truth’, realising the full data warehouse concept – enabling analysis to be undertaken upon cross-organisation business data.
The unified view of spatial data – including all address-based systems, such as social care and educational management systems – gives you a repository that is ready for spatial analysis across business data, so that underlying trends and patterns can be identified.
Open Standard & 3rd Party Applications
Because DataPump reads and writes open standards, the data that it imports into the spatial data repository can be used by 3rd party applications, such as MapInfo Professional, ESRI ArcGIS, and QGIS.
Similarly, exported data can be used by 3rd party systems that read widely supported GIS file formats, including OGC-compliant ones (e.g. .shp, .tab, .kml).
It is then simple to publish views of the business data, or summary data, to the general public in order to improve communication and enable self-service to reduce FOI requests and the time taken to respond to them.
Import / Export
DataPump can import data from, and export data to, both databases and flat files. When importing data, it works in one of three modes:
(i) Scheduled – scheduled to run on the application server;
(ii) File Watcher – triggered by changes to files within a location on a file server;
(iii) Ad-hoc – run when manually triggered by an administrator.
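The File Watcher mode can be pictured as comparing two snapshots of a watched folder and importing whatever has changed. The sketch below shows that idea using simple polling of file modification times. It is an assumption about one way such a trigger could work, not a description of DataPump's internals, and the function names `snapshot` and `changed_files` are hypothetical.

```python
import os
import tempfile
from typing import Dict, List


def snapshot(folder: str) -> Dict[str, float]:
    """Map each file name in the folder to its last-modified time."""
    return {
        entry.name: entry.stat().st_mtime
        for entry in os.scandir(folder)
        if entry.is_file()
    }


def changed_files(before: Dict[str, float], after: Dict[str, float]) -> List[str]:
    """Return names that are new, or whose modification time has changed."""
    return sorted(
        name for name, mtime in after.items() if before.get(name) != mtime
    )


# Demonstration against a temporary folder standing in for the file server.
with tempfile.TemporaryDirectory() as watched:
    before = snapshot(watched)
    with open(os.path.join(watched, "update.csv"), "w") as handle:
        handle.write("id,name\n")
    after = snapshot(watched)
    pending = changed_files(before, after)
```

A scheduled import would run the same import routine on a timer instead, and an ad-hoc import would run it on request.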
Web Feature Service (WFS)
DataPump consumes published Web Feature Services (WFS), which ensures that your spatial data warehouse is synchronised with externally published datasets – such as the WFS feeds published by public bodies, e.g. Natural England and the British Geological Survey (BGS).
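For readers unfamiliar with WFS, consuming a feed starts with a GetFeature request. The sketch below builds such a request using the standard WFS 2.0.0 key-value parameters. The endpoint URL and the layer name are placeholders rather than real services, and the helper `get_feature_url` is hypothetical; it does not represent how DataPump itself is configured.

```python
from typing import Optional
from urllib.parse import urlencode


def get_feature_url(base_url: str, type_name: str,
                    count: Optional[int] = None,
                    srs_name: Optional[str] = None) -> str:
    """Build a WFS 2.0.0 GetFeature request URL from key-value parameters."""
    params = {
        "service": "WFS",
        "version": "2.0.0",
        "request": "GetFeature",
        "typeNames": type_name,  # WFS 2.0.0 spelling (1.x uses typeName)
    }
    if count is not None:
        params["count"] = str(count)  # limits the number of features returned
    if srs_name:
        params["srsName"] = srs_name  # requested coordinate reference system
    return f"{base_url}?{urlencode(params)}"


# Placeholder endpoint and layer name, for illustration only.
url = get_feature_url("https://example.org/wfs", "ns:protected_sites", count=100)
```

Fetching this URL on a schedule and loading the response is what keeps a local copy in step with the published dataset.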
For public sector organisations, DataPump provides you with the ability to consume Full and Change Only Updates from your local land and property gazetteer (LLPG) management software, using the standard DTF exchange formats. eVO Platform products – including Earthlight and Aurora – use the designated gazetteers as the basis for their lightning fast address search and query facilities.
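The difference between a full load and a Change Only Update can be sketched as below: a full load replaces the gazetteer, whereas a change-only file is applied record by record. The example assumes that each change record carries a change-type code of I (insert), U (update) or D (delete) and is keyed by UPRN. The field names `CHANGE_TYPE`, `UPRN` and `ADDRESS`, the sample values and the function `apply_changes` are illustrative assumptions, not the DTF specification or DataPump's API.

```python
from typing import Dict, Iterable


def apply_changes(gazetteer: Dict[str, dict],
                  changes: Iterable[dict]) -> Dict[str, int]:
    """Apply insert/update/delete records to a gazetteer keyed by UPRN."""
    counts = {"I": 0, "U": 0, "D": 0}
    for change in changes:
        kind = change["CHANGE_TYPE"]
        uprn = change["UPRN"]
        if kind in ("I", "U"):
            # Store the record without the change-type marker.
            gazetteer[uprn] = {k: v for k, v in change.items()
                               if k != "CHANGE_TYPE"}
        elif kind == "D":
            gazetteer.pop(uprn, None)
        else:
            raise ValueError(f"Unknown change type: {kind!r}")
        counts[kind] += 1
    return counts


# Made-up sample data.
gazetteer = {
    "1": {"UPRN": "1", "ADDRESS": "1 Old Road"},
    "2": {"UPRN": "2", "ADDRESS": "2 Old Road"},
}
update_file = [
    {"CHANGE_TYPE": "U", "UPRN": "1", "ADDRESS": "1 New Road"},
    {"CHANGE_TYPE": "D", "UPRN": "2"},
    {"CHANGE_TYPE": "I", "UPRN": "3", "ADDRESS": "3 Old Road"},
]
summary = apply_changes(gazetteer, update_file)
```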
Data Quality Assessment / Quality Control
Maintaining and assessing data quality and controlling what is stored within the Spatial Data Repository is vital to ensuring the integrity of your enterprise data. Open Geospatial Consortium (OGC) standards for geometry and geographical representation are vitally important in ensuring that data can be shared freely between software and systems.
Whilst the sophisticated and powerful data editing and maintenance capabilities of EVO Platform products ensure that data is kept OGC-compliant, data from other sources can contain errors and can fail to meet the quality standards required by organisations.
Rule-based Quality Assessment Checks
DataPump ETL provides Quality Assurance and Quality Control (QA/QC) report checking capabilities to ensure that the quality of your enterprise data is continually checked and examined. DataPump reports on records whose geometry is non-compliant, or whose business attribute field values fall outside the ranges that you have set for the dataset within your DataPump QA/QC checking routines.
As and when external datasets are imported, the QA/QC routines in DataPump can be triggered to run immediately after the import, or can be scheduled to run at chosen intervals. The routines generate text reports which can be used to identify and correct the errors inherent in the imported data.
DataPump undertakes all checking and QA/QC for data either within or being loaded into the Spatial Data Repository database. The broad areas of Quality Assurance and Quality Control cover Spatial, Geometry and Attribute value domains.
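A rule-based check of this kind can be sketched as a list of rules, each of which either passes a record or returns a line for the text report. The example below covers one attribute rule (a value range) and one geometry rule (a polygon ring must have at least four points and be closed, as OGC Simple Features requires). The rule names, field names and report layout are hypothetical and are not DataPump's actual configuration.

```python
from typing import Callable, Dict, Iterable, List, Optional

Record = Dict[str, object]
Rule = Callable[[Record], Optional[str]]


def value_in_range(field: str, low: float, high: float) -> Rule:
    """Attribute rule: the field must be present and within [low, high]."""
    def check(record: Record) -> Optional[str]:
        value = record.get(field)
        if value is None or not (low <= value <= high):
            return f"{field}={value!r} outside {low}..{high}"
        return None
    return check


def ring_is_closed(field: str) -> Rule:
    """Geometry rule: a ring needs at least 4 points and must end where it starts."""
    def check(record: Record) -> Optional[str]:
        ring = record.get(field) or []
        if len(ring) < 4:
            return f"{field} has fewer than 4 points"
        if ring[0] != ring[-1]:
            return f"{field} is not closed"
        return None
    return check


def run_checks(records: Iterable[Record], rules: List[Rule],
               id_field: str = "id") -> List[str]:
    """Return one report line per failed rule, prefixed with the record id."""
    report = []
    for record in records:
        for rule in rules:
            problem = rule(record)
            if problem:
                report.append(f"{record.get(id_field)}: {problem}")
    return report


# Made-up records: the second breaks both rules.
buildings = [
    {"id": 1, "floors": 3, "ring": [(0, 0), (0, 1), (1, 1), (0, 0)]},
    {"id": 2, "floors": 250, "ring": [(0, 0), (0, 1), (1, 1), (1, 0)]},
]
report = run_checks(buildings, [value_in_range("floors", 0, 100),
                                ring_is_closed("ring")])
```

Each report line names the record and the rule it failed, which is the information needed to locate and correct the source data.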
Rule Enforcement and Topology
To complement the powerful and highly configurable Network Connectivity within Earthlight Galactic, DataPump offers the ability to run tasks which can be set to split lines where the implemented rules state that edges should be joined – via a junction feature – where they cross, irrespective of whether there are coincident vertices or end nodes on each of the edges.
This enables topology for network data sets to be created and then Quality Assessed via automated or manually invoked processes.
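The geometric core of such a rule can be sketched for the simplest case: two straight edges that cross without sharing a vertex. The example finds the crossing point, which would become the junction feature, and splits each edge into two at that point. It is a minimal illustration under that assumption only; the function names are hypothetical, and multi-vertex lines, tolerances and overlapping collinear edges are deliberately left out.

```python
from typing import List, Optional, Tuple

Point = Tuple[float, float]
Edge = Tuple[Point, Point]


def crossing_point(a: Edge, b: Edge) -> Optional[Point]:
    """Return the point where two segments cross in their interiors, if any."""
    (x1, y1), (x2, y2) = a
    (x3, y3), (x4, y4) = b
    # Cross product of the two direction vectors; zero means parallel.
    denom = (x2 - x1) * (y4 - y3) - (y2 - y1) * (x4 - x3)
    if denom == 0:
        return None
    # t and u are the fractional positions of the crossing along a and b.
    t = ((x3 - x1) * (y4 - y3) - (y3 - y1) * (x4 - x3)) / denom
    u = ((x3 - x1) * (y2 - y1) - (y3 - y1) * (x2 - x1)) / denom
    if 0 < t < 1 and 0 < u < 1:
        return (x1 + t * (x2 - x1), y1 + t * (y2 - y1))
    return None


def split_at_crossing(a: Edge, b: Edge) -> Optional[Tuple[Point, List[Edge]]]:
    """Split both edges at their crossing; return the junction and four edges."""
    junction = crossing_point(a, b)
    if junction is None:
        return None
    return junction, [(a[0], junction), (junction, a[1]),
                      (b[0], junction), (junction, b[1])]


# Two edges that cross with no shared vertex.
result = split_at_crossing(((0, 0), (4, 4)), ((0, 4), (4, 0)))
```

After the split, all four resulting edges share an end node at the junction, so connectivity can be derived from coincident end points alone.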