Welcome Guest, you are in: Login

QVSource - The QlikView API Connector (Docs)

»

RSS RSS

Navigation

For Everyone
For QlikView Users
For Advanced Users
For .NET Developers
If you like QVSource please consider showing your interest and support.





Search the wiki
»

Wiki Links


PoweredBy

Sentiment Analysis & Text Analytics Connector For QlikView

RSS
Modified on 2012/05/07 16:09 by chrisb Categorized as connector, social

Usage

The sentiment analyser and text analytics connector allows you to add a sentiment score for text fields in your application. Additionally, some of the APIs we connect to (see list below) also provide other text analytics features such as POS (parts of speech), topic, concept and keyword extraction etc.

We currently connect to 5 different APIs which between them cover sentiment for the following languages:

  • English
  • Dutch
  • French
  • German
  • Portuguese
  • Spanish

These APIs are:

  • Alchemy (sign up for an API key here). Please contact AlchemyAPI for pricing. (Also supports Concepts, Categories, Keywords (with or without sentiment) and Language extraction).

  • Chatterbox (sign up for an API key here). Note this API supports English, Spanish, Portuguese, French and Dutch. For pricing see here.

  • Repustate (sign up for an API key here). Supports English, Spanish, German and French (test interface here). Arabic in development. For pricing see here. (Also supports 'Parts of Speech' (POS) extraction of verbs and adjectives) and extraction of the sentiment of individual parts of the document.

  • Twitter Sentiment. Note that the QVSource connector translates the sentiment scores from this API to -1 (negative), 0 (neutral) or +1 (positive). Note also that this API is only for classifying short 'Tweet length' strings only - long strings will result in an error and a sentiment classification of 0. This API is free to use.


Please see the documentation for the relevant API for further information on how the score is calculated. You should also register with the provider for an API key which you will then need to enter into QVSource.

Please also check out our Semiocast connector for sentiment analysis.

If you have a suggestion for other APIs/algorithms we can include please let us know.

In order to use this connector you should have already loaded a table into your QlikView application containing the text you wish to analyse. In the example below we have loaded a table named 'Timeline' with a text field named 'text' earlier in the load script.

Method 1 - Linking on Text

We then use the following script to loop through each row in the 'Timeline' table and create augment it with a new field named 'sentiment_score' for each text value.


Sub urlEncode(str)

	let str=replace(str, '%', '%25'); // should be first
	let str=replace(str, '#', '%23');
	let str=replace(str, ' ', '%20');
	let str=replace(str, '$', '%24');
	let str=replace(str, '&', '%26');
	let str=replace(str, '+', '%2B');
	let str=replace(str, ',', '%2C');
	let str=replace(str, '/', '%2F');
	let str=replace(str, '\', '%5C');
	let str=replace(str, ':', '%3A');
	let str=replace(str, ';', '%3B');
	let str=replace(str, '=', '%3D');
	let str=replace(str, '?', '%3F');
	let str=replace(str, '@', '%40');
	let str=replace(str, '[', '%5B');
	let str=replace(str, ']', '%5D');
	let str=replace(str, '>', '%3E');
	let str=replace(str, '<', '%3C');
	let str=replace(str, chr(39), '%27');
	
	call=str;
		
End sub

LET noRows = NoOfRows('Timeline'); // get the total number of rows in the table
   
for i=1 to $(noRows) // loop through every row
	
     let text = FieldValue('text',$(i)); // get the value for "text" field on each row
     let textEncoded = text;
     call urlEncode(textEncoded);

     Sentiment:
     LOAD
    	'$(text)' as text, // Use this if you want to link your table on the text
    	status as sentiment_status,
    	score as sentiment_score
     FROM
     [http://localhost:5555/QVSource/SentimentAnalyser/?table=Sentiment&SentimentEngine=Repustate&Message=$(textEncoded)] // specify url param
     (txt, utf8, embedded labels, delimiter is '\t', msq);
next

Note - In the above script the SentimentEngine=Repustate which sentiment API to access.

If we wish to use one of the other sentiment APIs we can simply change this as follow:

http://localhost:5555/QVSource/SentimentAnalyser/?table=Sentiment&SentimentEngine=Alchemy&Message=$(text)
http://localhost:5555/QVSource/SentimentAnalyser/?table=Sentiment&SentimentEngine=Viralheat&Message=$(text)
http://localhost:5555/QVSource/SentimentAnalyser/?table=Sentiment&SentimentEngine=TwitterSentiment&Message=$(text) http://localhost:5555/QVSource/SentimentAnalyser/?table=Sentiment&SentimentEngine=Chatterbox&Message=$(text)

Pro Tip

Performance improvements have been found by 'cleaning' a copy of the status text in the load statement for the table where it is actually first loaded, rather than in the 'for / next' loop above.

So for example on loading the status updates from Facebook, we might introduce an additional column named 'Statuses_message_sentiment' where we remove these problematic characters.

We would then use this column in the script above, removing the need for the urlEncode function.

Note that the following replace statement should really include all the replace statements in the urlEncode function above.


FacebookConnector_Statuses:
LOAD
	userid as userid,
	id as Statuses_id,
	replace(replace(replace(replace(replace(replace(replace(replace(replace(message,'#', ''),'[',''),']',''),'&','and'),'',''),'',''),'',''),';',''),chr(39),'') AS Statuses_message_Sentiment,
	message as Statuses_message,
	...etc...
FROM
[http://localhost:5555/QVSource/IndustrialCodeBox_FacebookConnector/?table=Statuses]
(txt, utf8, embedded labels, delimiter is '\t', msq);



Caching

The sentiment analyser currently caches the sentiment scores. If you want to clear the cache follow the instructions here. This might be useful, for example, if the algorithm used to score the sentiment changes.

Backing Up The Cache

The cache will likely grow to a considerable size and retrieving items from the cache will be significantly faster than hitting the sentiment API. For this reason it is recommended that you take regular backups of this cache so that it can be restored in the future if necessary.

By default the cache should be found at the following location (unless you have changed it):
[INSTALLATION_DIR]\QVSource\Data\SentimentAnalyser\Cache

Change Log

1.0.7 - 07/05/12
  • Added new ChunkedSentiment table to Repustate provider.
  • Repustate Sentiment table now supports other languages (NOTE that you also need an account/API key with Repustate that supports this).
  • Added Usage table to Viralheat and AlchemyAPI providers to indicate remaining quota.

1.0.6 - 16/04/12
  • Added RankedConcepts, Categories, Keywords and Language tables to Alchemy connector.
  • Adjectives and Verbs tables in Repustate connector now show status column result.
  • Renamed to 'Sentiment Analysis & Text Analytics Connector'.
  • Now links to correct help page.
  • Refactored code to access Mashape APIs (e.g. Chatterbox) into SDK for use by other connectors.

1.0.5 - 02/03/12
  • Minor performance updates to Repustate and ChatterBox connectivity.

1.0.4 - 15/02/12
  • Added missing outgoing endpoint (free-dev.cbanalytics.co.uk) to list on Advanced tab (for proxy configuration etc.)

1.0.3 - 13/01/12
  • Added note to AlchemyAPI about free key.
  • Minor fix to UI when switching between sentiment APIs.

1.0.2 - 20/12/11
  • Now remembers the last sentiment API used.
  • ChatterBox API support added which provides sentiment for short messages (< 300 characters) in English, Spanish, Portuguese, French and Dutch.
  • Fixed issue where error responses had error message in incorrect results columns.

1.0.1 - 24/11/11
  • The value of the SentimentEngine parameter passed in the url is now case insensitive.

1.0.0 - 19/11/11
  • First non beta release.
  • Now throws an error if the sentiment engine passed in the url (previously defaulted to TwitterSentiment)
© Copyright 2011 - 2012 Industrial CodeBox Ltd