SAP Data Quality Management, microservices for location data (DQMms) offers cloud-based microservices for address cleansing, geocoding, and reverse geocoding on the SAP Cloud Platform (SCP). It lets you embed address cleansing and enrichment services within any business process or application, so you can quickly leverage the value of complete and accurate address data. The key capabilities of DQMms are:
- Address Cleansing
  - Validate addresses against postal authority reference data
  - Standardize the way addresses look and are structured
  - Correct incorrect address components
  - Enhance addresses with missing attributes
  - Assign diagnostic codes that describe why an address was incorrect or what was corrected
- Geocoding
  - Append latitude and longitude to addresses
- Reverse Geocoding
  - Provide addresses given a latitude and longitude
One of the possible integration scenarios is with SAP Data Services (DS). To that end, a new platform transform, called DQM Microservices, was added in DS 4.2 SP8 (May 2017), seamlessly integrating DS with DQMms.
Using the built-in DS data quality (DQ) capabilities involves a licensing cost (for the DQ module) and requires additional licensing of reference data (the address and geocoding directories) as well as management of that data on premise. With DQMms, you can now quickly and easily set up this functionality with a run-time Data Integrator license only.
With a full DS license, you can set up hybrid scenarios, e.g. to better manage the overall cost of address validation and cleansing: use the on-premise address directories for countries where you have large volumes of data, and send the records for other countries to DQMms.
You can find more information about the DS – DQMms integration on sapspot.com in:
- section 5.5.3 DQM Microservices of the SAP Data Services Reference Guide
- section 6 Connecting to SAP Data Quality Management, microservices for location data of the SAP Data Services Supplement for SAP
You can download sample code from the SAP Data Services Blueprints wiki page; you'll find the zip file and documentation under the heading Data Quality Management 4.2 Microservices blueprints.
In this blog, I'll guide you through all the steps required for building a DS job that implements the hybrid scenario outlined above.
Enable the service with your SCP account
1. Access your SCP Cockpit.
2. Select Services in the menu on the left.
3. Select the Data Quality Services tile.
4. Enable the service.
Create an SAP DQM microservices datastore
Before you can use the DQM Microservices transform, you must create an SAP DQM Microservices datastore. In DS Designer:
1. In the Datastores tab of the object library, right-click and select New.
2. Give the datastore a name.
3. For the Datastore type, choose SAP DQM Microservices.
4. Enter the Connection URL.
- Return to your SCP Cockpit.
- Select Services in the menu on the left.
- Select the Data Quality Services tile.
- Select the Application URL link.
- Copy the Application URL from the pop-up window displaying the Available Endpoints.
- Paste it into the datastore definition.
5. Enter the Client ID.
- Return to your SCP Cockpit.
- In the Data Quality Services Overview, select the OAuth settings link.
- Select the Register New Client button in the Clients tab.
- Enter a name for your client and select dqmmicro for the Subscription. The Client ID will be generated; copy and paste it into the datastore definition.
- Select Client Credentials for Authorization Grant and enter the Client secret.
6. Enter the Client Secret as specified above.
7. Enter the Access Token URL.
- Return to your SCP Cockpit.
- Select the Branding tab in the OAuth Settings pane.
- Copy and paste the Token Endpoint into the datastore definition.
8. If your DS server is behind a corporate firewall, don't forget to enter the proxy host name and port number.
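The datastore fields you just filled in map one-to-one onto a standard OAuth 2.0 client-credentials flow: DS uses the Client ID and Client Secret against the Access Token URL to obtain a bearer token, then calls the service at the Application URL. The sketch below shows that token step outside of DS, just to make the mechanics concrete; the URL and credential values are placeholders for what you copied from the SCP Cockpit, not real endpoints.

```python
import base64
import json
import urllib.request

# Placeholders for the values copied from the SCP Cockpit in the steps above.
TOKEN_URL = "https://<your-token-endpoint>/oauth2/api/v1/token"  # Access Token URL
CLIENT_ID = "my-client-id"          # generated when registering the OAuth client
CLIENT_SECRET = "my-client-secret"  # the Client Secret you entered

def token_request(token_url: str, client_id: str, client_secret: str) -> urllib.request.Request:
    """Build the client-credentials grant request (RFC 6749, section 4.4)."""
    creds = base64.b64encode(f"{client_id}:{client_secret}".encode()).decode()
    return urllib.request.Request(
        token_url,
        data=b"grant_type=client_credentials",
        headers={"Authorization": f"Basic {creds}",
                 "Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )

def fetch_token(token_url: str, client_id: str, client_secret: str) -> str:
    """Return the bearer token that accompanies every DQMms call."""
    with urllib.request.urlopen(token_request(token_url, client_id, client_secret)) as resp:
        return json.load(resp)["access_token"]
```

The DQM Microservices datastore performs this exchange for you; the sketch is only meant to clarify what each datastore field is used for.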
Create a DS address cleansing dataflow
I have a table with addresses from many countries across the world. Because I am based in Belgium, most of my addresses are from there. Therefore, I have the Belgian address directory installed on my DS server and will use it for cleansing the local addresses. Because the cost of many of the other address directories is prohibitive, I will cleanse the addresses from other countries with the DQMms service.
Here is an extract of my ADDRESSES table:
Create a DS job with a single dataflow. The table is used as source in that dataflow:
In the Case transform I distinguish between Belgian and other addresses. When I first ran the job, I discovered my Chinese addresses weren't cleansed. That's because China isn't supported by DQMms yet; the list of supported countries can be found in the Country Coverage section of the DQM microservices Developer Guide at sapspot.com. So, I decided to cleanse the Chinese addresses using the Global Address directory that is part of a standard DS installation.
This is the Case transform definition:
All Belgian and Chinese records are routed to the Global_Address_Cleanse transform; all other addresses go to Base_DQM_Microservices.
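The routing rule in the Case transform boils down to a single expression on the country column. Here is that logic as a Python sketch, assuming the countries are already in 2-character ISO format (the column name is illustrative):

```python
# Countries cleansed on premise: BE because I have its directory licensed
# locally, CN because DQMms does not support China yet.
ON_PREMISE = {"BE", "CN"}

def route(country_code: str) -> str:
    """Return the name of the transform a record is routed to."""
    if country_code in ON_PREMISE:
        return "Global_Address_Cleanse"
    return "Base_DQM_Microservices"
```

Adding another on-premise directory later only means extending the set in the Case transform's expression.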
The Base_DQM_Microservices transform has the same layout as the DQ transforms: an input, an options, and an output tab. There are only a few options available: enter the Datastore and select the Service. You can choose between addressCleanse (which covers geocoding, too) and reverseGeo.
A subset of the extended list of typical address cleanse options can be set through one of the Configurations defined in the SCP Cockpit. There are a few predefined configurations, and you can also define your own. Selecting no configuration (select NONE from the dropdown) gives you many possible output fields with default formatting; no further Settings can be entered in that case.
Note: DQMms requires the Country information in 2-character ISO format. So, if your countries are coded differently, or written out in full, consider preceding the Base_DQM_Microservices transform with a Country_ID transform.
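What that normalization step does can be approximated as follows. The lookup table here is a tiny hand-made sample for illustration only; the real Country_ID transform recognizes many more spellings and languages.

```python
# Minimal sample lookup, not the Country_ID transform's actual reference data.
NAME_TO_ISO2 = {
    "BELGIUM": "BE",
    "BELGIQUE": "BE",
    "CHINA": "CN",
    "FRANCE": "FR",
    "GERMANY": "DE",
}

def to_iso2(country: str) -> str:
    """Normalize a country value to the 2-character ISO code DQMms expects."""
    value = country.strip().upper()
    if value in NAME_TO_ISO2.values():  # already a known ISO-2 code
        return value
    return NAME_TO_ISO2.get(value, "")  # empty string: flag for manual review
```

Records that come back with an empty country code would then be routed aside for review instead of being sent to the service.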
I have selected following list of output fields:
The default format for DQMms street and region names is LONG. I have changed the Primary Type Style and Region Style to match that format:
For the Global_Address_Cleanse transform, I have selected a similar list of output fields.
Because field names and formats differ slightly between the two transforms, I need a couple of Query transforms before both streams can be merged and all results written to a single target table. See the next picture for the results (increase your browser's zoom level if you cannot read them!). As you can see, both transforms have done quite a good job.
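The renaming done in those Query transforms amounts to mapping each stream's output fields onto one common target schema before the merge. A sketch of the idea, with purely illustrative field names (the actual output field names of the Global_Address_Cleanse and Base_DQM_Microservices transforms differ from these):

```python
def harmonize(record: dict, mapping: dict) -> dict:
    """Rename a record's fields to the common target schema."""
    return {target: record.get(source, "") for source, target in mapping.items()}

# Illustrative mappings; substitute the real output field names of each transform.
GAC_MAP = {"PRIMARY_NAME1": "STREET", "LOCALITY1": "CITY", "POSTCODE1": "POSTCODE"}
DQM_MAP = {"street": "STREET", "city": "CITY", "postcode": "POSTCODE"}

gac_rows = [{"PRIMARY_NAME1": "Rue de la Loi", "LOCALITY1": "Bruxelles", "POSTCODE1": "1000"}]
dqm_rows = [{"street": "Main St", "city": "Springfield", "postcode": "12345"}]

# Merge the two harmonized streams into one result set for the target table.
merged = [harmonize(r, GAC_MAP) for r in gac_rows] + [harmonize(r, DQM_MAP) for r in dqm_rows]
```

In the DS job, the Merge transform plays the role of the list concatenation above: once both streams share the same schema, their union loads into a single target table.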