SAP HANA Text Analysis, SAP HANA

Integrating Twitter with SAP HANA for Text Analysis

I want to share what I learned while streaming tweets with Java and inserting them into SAP HANA for text and token analysis. The idea is simple: apply SAP HANA's built-in text analysis to real-time information, and I found Twitter to be a good source of real-time data for understanding text analysis better. Below is a short implementation. Many people and organizations have already built something similar, so I am only adding my own experience, learnings and challenges here. Before you start implementing text analysis, keep the following things in mind.

  • Which language you will write the code in. I used Java; Python works as well.
  • How you will get real-time data, that is, whether you have access to an API that provides it. The Twitter APIs provide exactly this kind of real-time information; for example, you can analyze political, sports, technology or geo-tagged tweets.

I chose to analyze tweets related to SAP HANA (#SAPHANA, #IoT, #SAP). These hashtags are used later to fetch tweets through the Twitter API.

  • A basic understanding of the text analysis capabilities of SAP HANA.
  • Eclipse IDE installed. If it is not, download the latest version from http://www.eclipse.org/downloads/eclipse-packages/
  • A developer account at Twitter: https://dev.twitter.com/

You will be taken to the Twitter developer page. Click Create New App and fill in the required information shown below.

 

Create your Twitter Application

The next step is to note down all the security tokens you need for consuming the Twitter APIs. Below is a screenshot of my security tokens.

Now click Create Access Token.

Your access token is then generated.

Download the latest version of Twitter4J, the Java library for the Twitter API, to use in your project:

http://twitter4j.org/en/index.html

Below is a screenshot of the latest Twitter4J download:

Twitter API libraries will be used later.

Install the SAP HANA Client if it is not already installed. You can get it from the SAP Service Marketplace; it contains the JDBC library for accessing HANA from Java.

Go to Service Marketplace -> Software Downloads -> Installation and Upgrades -> Browse Our Download Catalog -> SAP In-Memory (SAP HANA) -> SAP HANA Platform and download the HANA Client.

Below is a screenshot of the HDB Client. The important thing to check is that it includes the JDBC driver.

Install the HDB Client on your machine (check whether you need the 32-bit or 64-bit version before downloading).

Once these steps are done, open the Eclipse IDE, switch to the Java perspective and, in the Package Explorer, right-click and choose Import.

Click Finish. Your project will be imported into the Package Explorer.

Switch to the HANA Development perspective to create the table that will store the tweet information. Execute the commands below in the SAP HANA SQL Console.

SET SCHEMA "<YOUR_SCHEMA>";

CREATE COLUMN TABLE TWEETS(
    -- Tweet IDs are 64-bit numbers, so BIGINT is used instead of INTEGER
    "ID" BIGINT NOT NULL,
    "USER_NAME" NVARCHAR(100),
    "CREATED_AT" DATE,
    "TEXT" NVARCHAR(140),
    "HASH_TAGS" NVARCHAR(100),
    PRIMARY KEY("ID"));

After creating the table in HANA, switch to the configuration folder and change the settings for HANA and Twitter connectivity. Open the Java configuration file and make the following changes:

1- If you are behind a proxy, set the proxy variable to true and enter the proxy details.

2- HANA database host, port, user, schema and password.

3- The Twitter tokens received above, including the consumer key and consumer secret.

4- The search term you want to fetch from Twitter, such as #SAP or #SAPHANA.
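
The settings listed above could also be kept outside the source code. The following is only a minimal sketch of that idea using java.util.Properties. The class name TweetLoaderConfig and all property keys (proxy.enabled, hana.host, hana.port, hana.schema, twitter.searchTerm) are my own assumptions for illustration and are not taken from the project's configuration file. User, password and the Twitter tokens would be read the same way and should be kept out of version control.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

/** Hypothetical settings holder; the project's real configuration class may look different. */
final class TweetLoaderConfig {
    final boolean useProxy;
    final String hanaHost;
    final int hanaPort;
    final String hanaSchema;
    final String searchTerm;

    private TweetLoaderConfig(Properties p) {
        // Optional settings fall back to a default value.
        useProxy = Boolean.parseBoolean(p.getProperty("proxy.enabled", "false"));
        searchTerm = p.getProperty("twitter.searchTerm", "#SAPHANA");
        // Required settings fail fast with a clear message.
        hanaHost = require(p, "hana.host");
        hanaPort = Integer.parseInt(require(p, "hana.port"));
        hanaSchema = require(p, "hana.schema");
    }

    private static String require(Properties p, String key) {
        String value = p.getProperty(key);
        if (value == null || value.trim().isEmpty()) {
            throw new IllegalArgumentException("Missing required setting: " + key);
        }
        return value.trim();
    }

    /** Parses settings given in the usual key=value properties format. */
    static TweetLoaderConfig parse(String propertiesText) {
        Properties p = new Properties();
        try {
            p.load(new StringReader(propertiesText));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new TweetLoaderConfig(p);
    }

    public static void main(String[] args) {
        TweetLoaderConfig c = parse("hana.host=localhost\nhana.port=30015\nhana.schema=MY_SCHEMA\n");
        System.out.println(c.hanaHost + ":" + c.hanaPort + " schema=" + c.hanaSchema
                + " search=" + c.searchTerm + " proxy=" + c.useProxy);
    }
}
```

Failing fast on a missing host, port or schema makes configuration mistakes visible before any connection attempt is made.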

After updating these details, test the connection to Twitter: open TwitterConnection.java and execute the file.

Then test the connection to SAP HANA: open HDBConnection.java and execute the file.
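
For readers who want to see what such a connection test can look like, here is a small sketch using plain JDBC. It is not the code of HDBConnection.java. The class name HanaConnectionCheck, the host, port, schema, user and password are placeholders of mine. It assumes the HANA JDBC driver from the HDB Client (ngdbc.jar) is on the classpath and uses the jdbc:sap:// URL form with a currentschema option, which is how I understand the driver expects it.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;

/** Hypothetical connection check; not the HDBConnection class from the project. */
final class HanaConnectionCheck {

    /** Builds a HANA JDBC URL of the form jdbc:sap://host:port/?currentschema=SCHEMA. */
    static String jdbcUrl(String host, int port, String schema) {
        return "jdbc:sap://" + host + ":" + port + "/?currentschema=" + schema;
    }

    /** Returns true only if a connection can be opened and validated within 5 seconds. */
    static boolean canConnect(String url, String user, String password) {
        try (Connection conn = DriverManager.getConnection(url, user, password)) {
            return conn.isValid(5);
        } catch (SQLException e) {
            // Typical causes: driver JAR missing, wrong port, wrong credentials.
            System.err.println("Connection failed: " + e.getMessage());
            return false;
        }
    }

    public static void main(String[] args) {
        // Placeholder values; replace them with your own system details.
        String url = jdbcUrl("localhost", 30015, "MY_SCHEMA");
        System.out.println(url);
        System.out.println(canConnect(url, "MY_USER", "MY_PASSWORD") ? "connected" : "not connected");
    }
}
```

If the driver JAR is missing from the build path, the check reports a failure instead of throwing, which makes it easy to tell a classpath problem from a wrong password by reading the printed message.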

Before executing TwitterSearch.java, configure the Twitter4J libraries properly. Otherwise the application will not run and you will encounter errors saying that the source of a class cannot be found, so I thought I should mention how to add the Twitter4J JARs to the build path.

Right-click the project.

Click Configure Build Path -> Java Build Path -> Add External JARs, go to the libraries folder of Twitter4J and select all JARs.

Make sure all the JARs are present in the libraries folder.

Click Apply. This makes all the classes available to your application, and you can see all the JARs under the Referenced Libraries folder.

TweetDAO.java is used for inserting the tweet data into the HANA system. Here the SQL statement is prepared first and then executed.
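
The prepare-then-execute pattern described above can be sketched with a JDBC PreparedStatement as follows. This is not the actual TweetDAO.java. The class name TweetInsertSketch and the helper fit are hypothetical, and the column list simply mirrors the TWEETS table created earlier. Tweet IDs are 64-bit values, so the sketch assumes the ID column can hold a long.

```java
import java.sql.Connection;
import java.sql.Date;
import java.sql.PreparedStatement;
import java.sql.SQLException;

/** Hypothetical insert helper; the project's TweetDAO may be structured differently. */
final class TweetInsertSketch {

    // One placeholder per column; values are bound later instead of being concatenated.
    static final String INSERT_SQL =
        "INSERT INTO \"TWEETS\" (\"ID\", \"USER_NAME\", \"CREATED_AT\", \"TEXT\", \"HASH_TAGS\") "
      + "VALUES (?, ?, ?, ?, ?)";

    /** Cuts a value to the column width so an overlong string does not make the insert fail. */
    static String fit(String value, int maxLength) {
        if (value == null) {
            return null;
        }
        return value.length() <= maxLength ? value : value.substring(0, maxLength);
    }

    /** Prepares the statement, binds the tweet fields and executes the insert. */
    static void insert(Connection conn, long id, String userName, Date createdAt,
                       String text, String hashTags) throws SQLException {
        try (PreparedStatement ps = conn.prepareStatement(INSERT_SQL)) {
            ps.setLong(1, id);                    // assumes ID is a 64-bit column
            ps.setString(2, fit(userName, 100));  // USER_NAME NVARCHAR(100)
            ps.setDate(3, createdAt);             // CREATED_AT DATE
            ps.setString(4, fit(text, 140));      // TEXT NVARCHAR(140)
            ps.setString(5, fit(hashTags, 100));  // HASH_TAGS NVARCHAR(100)
            ps.executeUpdate();
        }
    }

    public static void main(String[] args) {
        System.out.println(INSERT_SQL);
        System.out.println(fit("a tweet text that is longer than the limit", 12));
    }
}
```

Binding values through placeholders also avoids problems with quotes and other special characters that are common in tweet text.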

With all the configuration and code in place, it is time to invoke the Twitter API to fetch data from Twitter and insert the tweets into the HANA system. Execute the TwitterSearch.java file.
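
Each fetched tweet also needs a value for the HASH_TAGS column. How TwitterSearch.java fills it is not shown here, and Twitter4J can return hashtag entities for a status directly. Purely as a library-free illustration, the sketch below collects hashtags from the tweet text with a regular expression. The class name HashTagExtractor and the comma-separated output format are assumptions of mine.

```java
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper that derives a HASH_TAGS value from tweet text. */
final class HashTagExtractor {

    // A '#' followed by one or more letters, digits or underscores (Unicode-aware).
    private static final Pattern HASH_TAG =
        Pattern.compile("#(\\w+)", Pattern.UNICODE_CHARACTER_CLASS);

    /** Returns the distinct hashtags in order of first appearance, joined by commas. */
    static String hashTags(String tweetText) {
        Set<String> tags = new LinkedHashSet<>();
        Matcher m = HASH_TAG.matcher(tweetText);
        while (m.find()) {
            tags.add("#" + m.group(1));
        }
        return String.join(",", tags);
    }

    public static void main(String[] args) {
        System.out.println(hashTags("Real-time analytics with #SAPHANA and #IoT on #SAP"));
    }
}
```

The comparison is case-sensitive, so #SAPHANA and #saphana would be kept as two separate tags; normalizing the case first is an option if that is not wanted.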

Go to the HANA system and run a SELECT on the "TWEETS" table.

Now leverage the text analysis capabilities of SAP HANA by creating a full-text index on the TWEETS table. Here is the syntax:

CREATE FULLTEXT INDEX "TWEETS_FTI" ON "TWEETS"("TEXT")
TEXT ANALYSIS ON CONFIGURATION 'EXTRACTION_CORE';

When you execute this command, a full-text index is created on the table and text analysis runs on its data. In addition, a table named $TA_TWEETS_FTI is created, which contains the token information for the TWEETS table.

Below is the structure of the $TA_TWEETS_FTI table:

Now you can preview the data of $TA_TWEETS_FTI to get a better understanding of the text analysis done by SAP HANA.

Here is the analysis produced by SAP HANA text analysis:

In the image above you can see that the search term #SAPHANA is highlighted and has the highest count in the table. You can now build your data model on top of the $TA_TWEETS_FTI table and apply different WHERE clauses for analysis, such as combinations of tweets about SAP HANA and IoT, or SAP HANA and Cloud.
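
As a starting point for such an analysis, the sketch below counts how often each token occurs in $TA_TWEETS_FTI and prints the most frequent ones over JDBC. It relies on the TA_TOKEN and TA_TYPE columns of the generated table. The class name TopTokens and the MENTIONS alias are my own, and the row limit is applied through JDBC's setMaxRows rather than in the SQL text.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

/** Hypothetical report over the text analysis result table. */
final class TopTokens {

    // Groups the extracted tokens and orders them by how often they occur.
    static final String TOP_TOKENS_SQL =
        "SELECT \"TA_TOKEN\", \"TA_TYPE\", COUNT(*) AS \"MENTIONS\" "
      + "FROM \"$TA_TWEETS_FTI\" "
      + "GROUP BY \"TA_TOKEN\", \"TA_TYPE\" "
      + "ORDER BY COUNT(*) DESC";

    /** Prints the most frequent tokens; 'limit' caps the number of rows returned. */
    static void print(Connection conn, int limit) throws SQLException {
        try (PreparedStatement ps = conn.prepareStatement(TOP_TOKENS_SQL)) {
            ps.setMaxRows(limit);
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    System.out.printf("%-40s %-30s %d%n",
                        rs.getString(1), rs.getString(2), rs.getLong(3));
                }
            }
        }
    }

    public static void main(String[] args) {
        // Without a live connection this only shows the statement that would be run.
        System.out.println(TOP_TOKENS_SQL);
    }
}
```

Adding a WHERE clause on TA_TOKEN or TA_TYPE to this statement is one way to narrow the result to the combinations mentioned above.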
