Linear Regression modeling with Tableau

Tableau despite being an excellent tool to quickly visualize the data can also be used for creation and verification of Linear regression models used for predictive analytics. The ability of Tableau to integrate with external statistical languages like Python or R allows it to use the Regression models built in those languages to directly be used in Tableau.

 

Integration of R and Tableau

  • Download and install software:

To integrate R with Tableau, we would need R Studio:

R download link: https://cran.r-project.org/bin/windows/base/

R Studio download link: https://www.rstudio.com/products/rstudio/download/

We would also need Tableau desktop: https://www.tableau.com/products/desktop

  • Open R Studio and Type below commands on R command line:

install.packages(“Rserve”);

library(Rserve);

Rserve()

  • Open Tableau desktop and goto Help Menu -> Settings and performance -> Manage External Service Connection

  •  Select Localhost and port 6311

  • Test Connection and Ok.

Development of Linear Regression Model:

After integrating RServe and Tableau, we are all set to embed the R code for linear regression model creation into R calculated fields.

The sample data used here is an open source data available for download from Duke university’s website: http://www2.stat.duke.edu/~mc301/data/movies.html

The data contains a sample of 651 movies, their reviews, critics score etc. (The data dictionary is also present at above link).

Let us try to develop a regression model to predict the audience score from various other dependent variables like IMDB Rating or Critics Score.

We will first analyze the relationships among these variables via a scatter plot among them in Tableau:

 

Above figure shows two plots:

IMDB votes Vs Audience Score

Critics Score Vs Audience Score

Clearly, Critics Score seems to have a greater linear relationship with Audience Score. That simply means that Critics Score is a better predictor of Audience Score rather than IMDB Votes.

Let us write a calculated field called “Predicted Audience Score”

Tableau’s SCRIPT_REAL function can be used to embed R or Python code in Tableau’s calculation.

Here we have used Critics Score to Predict Audience Score.

Let us plot the Predicted Audience Score Vs Audience Score.

As can be seen clearly the plot of the Predicted Audience Score Vs Audience Score comes out as a perfect straight line.

 

Conclusion:

Given above is a very basic example of achieving a simple linear regression model using Tableau and R.

The advanced and much more sophisticated Linear regression model has been developed in R and can be located at below GitHub URL:

https://github.com/shashibhushan86/Linear_Regression/blob/master/reg_model_project.Rmd

 

 

Posted by Shashi Bhushan, 0 comments

SQL vs NoSQL Database

Difference between SQL and NoSQL

It is widely known that 80 percent of the world’s data is unstructured or semi-structured data, NoSQL concept has beceome more and more popular. This artical help you understand basic difference between SQL and NoSQL data and which database should you choose to use.

There are two important terminology you need to know.

ACID

ACID as shorthand for Atomicity, Consistency, Isolation, and Durability. These 4 are basic principle of SQL database. You can click on the above link of ACID to know more about the details.

In a simple usecase, a transfer of funds from one bank account to another, even involving multiple changes such as debiting one account and crediting another, is a single transaction. The transaction can be either successful or fail, And the state can only be from one valid status to another.

BASE

A BASE system gives up on consistency. – Basically available indicates that the system does guarantee availability, in terms of the CAP theorem. – Soft state indicates that the state of the system may change over time, even without input. This is because of the eventual consistency model. – Eventual consistency indicates that the system will become consistent over time, given that the system doesn’t receive input during that time.

Simple Example to explain

Eventual consistency:

  1. I watch the weather report and learn that it’s going to rain tomorrow.
  2. I tell you that it’s going to rain tomorrow.
  3. Your neighbor tells his wife that it’s going to be sunny tomorrow.
  4. You tell your neighbor that it is going to rain tomorrow.

Eventually, all of the servers (you, me, your neighbor) know the truth (that it’s going to rain tomorrow), but in the meantime the client (his wife) came away thinking it is going to be sunny, even though she asked after one or more of the servers (you and me) had a more up-to-date value.

As opposed to Strict Consistency / ACID compliance:

  1. Your bank balance is $50.
  2. You deposit $100.
  3. Your bank balance, queried from any ATM anywhere, is $150.
  4. Your daughter withdraws $40 with your ATM card.
  5. Your bank balance, queried from any ATM anywhere, is $110.

At no time can your balance reflect anything other than the actual sum of all of the transactions made on your account to that exact moment.

Reason to use a SQL DATABASE

Even as NoSQL databases gain popularity for their speed and scalability, there are still situations where a highly structured SQL database may be preferable. Here are a few reasons you might choose an SQL database:

  • You need to ensure ACID compliancy ACID protects the integrity of your database by prescribing exactly how transactions interact with the database. Generally, NoSQL databases sacrifice ACID compliancy for flexibility and processing speed, but for many e-commerce and financial applications, an ACID-compliant database remains the preferred option.
  • Your data is structured and unchanging If your business is not experiencing massive growth that would require more servers and you’re only working with data that’s consistent, then there may be no reason to use a system designed to support a variety of data types and high traffic volume.

REASONS TO USE A NOSQL DATABASE

When all of the other components of your server-side application are designed to be fast and seamless, NoSQL databases prevent data from being the bottleneck. Big data is the real NoSQL motivator here, doing things that traditional relational databases cannot.

  • Storing large volumes of data that often have little to no structure. A NoSQL database sets no limits on the types of data you can store together, and allows you to add different new types as your needs change. With document-based databases, you can store data in one place without having to define what “types” of data those are in advance.
  • Rapid development. If you’re developing within two-week Agile sprints, cranking out quick iterations, or needing to make frequent updates to the data structure without a lot of downtime between versions, a relational database will slow you down. NoSQL data doesn’t need to be prepped ahead of time.
Posted by Sun Fei, 0 comments

Choose Right Tableau Authentication Method

Authentication and Authorization

To Understand how various login method that tableau server supports. We need to understand two basic concept:

  • Authentication is the process of ascertaining that somebody really is who he claims to be.
  • Authorization refers to rules that determine who is allowed to do what

Let’s put these 2 big words in tableau context. Tableau can use various authentication method, local authention, AD, SAML,OpenID etc. These are ways to let tableau server determine who you are. After user is authenticated. Tableau server then decide what resource can be accessed by that user.

In other words, whichever method you choose to authenticate users. it manages only how to validate somebody really is who he claims to be. And Determine who is allowed to do what is always managed by tableau server.

Local Authentication VS Active Directry

When you install Tableau Server, you must select the process that the server will use to manage user authentication: local authentication or Active Directory. Before you install Tableau Server, you should understand how these two options impact your overall authentication strategy.

After this configuration is complete, you cannot change the authentication method. In fact, to change this configuration, you must uninstall the server, delete the configuration on the computer, and then reinstall the server.

Use local authentication if any of the following are true:
  • Your organization does not manage users with Active Directory
  • You do not want to use Active Directory
  • You want to use SAML/OpenID for authentication and single sign-on
Use Active directory if following is true:

If you are already using Active Directory to manage users in your organization, then we recommend selecting Active Directory authentication during Tableau setup to make user provisioning and management easier.

Create SSO experience

Both local authentication and active directory can create experience that user don’t have to explicitly sign in to Tableau Server. Instead, the credentials they’ve used to authenticate already are used to authenticate them to Tableau Server, and they can skip the step of entering a username and password to access Tableau Server.

Local Authentication Active Directory
SAML SAML
OpenID Integrated Windows Authentication
Kerberos
SSO supported by local authentication
  • SAML: With SAML, an external identity provider (IdP) authenticates the user’s credentials, and then sends a security assertion to Tableau Server that provides information about the user’s identity
  • OpenID: OpenID Connect is a standard authentication protocol that lets users sign in to an identity provider (IdP) such as Google. After they’ve successfully signed in to their IdP, they are automatically signed in to Tableau Server
SSO supported by active directory
  • SAML Same as above
  • Integrated Windows Authentication: Automatic logon uses Microsoft SSPI to sign in your users based on their Windows username and password. Users are not prompted for credentials, which creates an experience similar to single sign-on (SSO).
  • Kerberos If Kerberos is enabled in your environment and if the server is configured to use Active Directory authentication, you can provide users with access to Tableau Server based on their Windows identities.
Posted by Sun Fei, 0 comments

Resolve Tableau rows dropping issue

Reading data correctly from a data source is the basis of the correct and meaningful visualization! This is true for all visualization tools and Tableau is no exception.

Issue

A peculiar Issue was faced with Tableau data reading once when it suddenly started dropping the data rows that it was reading from a csv file. The environment setup was as follows:

There were ^ delimited csv files getting generated daily from Oracle database by using PL/SQL code.

The files were fairly large (700 MB).

There were more than 100 columns in the csv file with almost 20 measure columns.

These csv files were then used to be appended to the Tableau Data extract (TDE) daily.

The Issue was caught during the Post production phase when the sum of Nominal Flow and Interest Flow columns in Tableau data extract was not matching with that in the database.

Careful study of the data showed a curious observation that only the rows containing the decimal numbers were missing and only the Integers were being showed up in the data extract. This Issue persisted even if we change the data type of the measure column in Tableau to Number(Decimal). The two measure column data has been highlighted below:

Further research proved that this was actually an Issue with the Microsoft’s Jet database engine which is used by Tableau to connect to all Microsoft provided data sources like xlsx, access, csv, txt etc.

http://kb.tableau.com/articles/Issue/resolving-incorrect-data-issues-related-to-jet

A brief limitations of the Microsoft’s Jet database engine are listed in below article:

http://kb.tableau.com/articles/Issue/limitations-to-data-and-file-sizes-with-jet-based-data-sources

Also, a reply from Tableau support confirmed the same:

When we connect to a CSV file as data source by default connection(open the file directly), the data type of a column is determined by the first 1024 rows. We can see that the column “Nominal Flow” has a decimal number 51.63 at row 331 and the column “Interest Flow” has no decimal numbers within the first 024 rows. So Tableau Desktop determines the data types for “Nominal Flow” and “Interests Flow” are “Number(Decimal) and “Number(Whole)” respectively. However, “Interests Flow” actually has both decimal numbers and whole numbers(integers). Those integers are properly displayed while those decimal numbers are displayed as NULL. “

 

 

Solution:

The solution was to force Tableau to read the data as a decimal number. One way to do it was to insert a dummy record at the beginning of the dataset while exporting the data but this would not be a clean solution specially when we try to join/blend two data sources.

There was another solution, to export the measure columns as floats while exporting from the database and it required a tweak in the PL/SQL code. It was done by a code like this:

case when mod(“Nominal Flow”,1)=0 then trim(to_char(“Nominal Flow”))||’.00′ else trim(to_char(“Nominal Flow”)) end AS “Nominal Flow”,

CASE WHEN MOD(“Interest Flow”,1)=0 THEN TRIM(TO_CHAR(“Interest Flow”))||’.00′ ELSE TRIM(TO_CHAR(“Interest Flow”)) END AS “Interest Flow”

 

Hence the final output looked like:

When Tableau was connected with the above data output and converted into TDE, It worked like a charm.

Tableau was able to Identify the column as float data type and no rows were found to be getting dropped.

Posted by Shashi Bhushan, 0 comments
Reading TAP 3.12 (ASN1) Files with Dataverse

Reading TAP 3.12 (ASN1) Files with Dataverse

When I first worked at Lavastorm, many years ago now, it was most commonly used for Revenue Assurance and Fraud Analytics in Telecommunications companies. Today, the same visual mapping of data processes allows analysts from all industries rapidly build data models and apply analytics to data from many disparate sources.

Revenue Assurance commonly requires that you acquire data as close to the source as possible, the objective being that you would identify any discrepancy in how business rules had been applied to the data as early as possible in the data value chain. The consequence being that downstream processes, including billing, would use this information and if errors existed this could result in a financial loss or gain, rework and/or impact customer experience.

That’s where Dataverse’s ability to read complex data types, and build your own complex data readers, can come in handy. It allows you to quickly take a vendor or standard specifications (such as the GSMA’s TAP specification for sharing data about roaming customers) and read the data, converting it into records.

Continue reading →

Posted by Michael Lowe in Dataverse, Telecommunications, 0 comments
Importing global weather data since 1850 into Tableau

Importing global weather data since 1850 into Tableau

Optimizing your supply chain is a never ending task. Recently a client asked us to explore localized weather patterns with Tableau. This article shows how we were able to massage the data before importing it into Tableau.

We will use the NOAA (National Centers for Environmental Information) and GSOM (Global Summary of the Month) weather data into Tableau to do some analytics.

Continue reading →

Posted by Michael Lowe in Dataverse, 0 comments
Load more