Wednesday, 28 October 2015

Logistic Regression - Interpret Output - Part 2

In the previous blog, we explained the initial set of logistic regression output and statistics. In this blog, the next set of logistic regression output and statistics is discussed. We have received a great response from data scientists and analytics professionals: 39 Facebook likes.

Analysis of Maximum Likelihood Estimates

Parameters in logistic regression are estimated using Maximum Likelihood Estimation (MLE). The significance of each explanatory variable's parameter is assessed using the Wald Chi-Square test.
Parameter: The intercept and the explanatory variables used in the logistic model; their weights are estimated using MLE.
DF: Degrees of Freedom. This is required for testing variable significance.
Estimate: Estimates are the beta coefficients for each explanatory variable. The logistic regression function models the log odds of the binary dependent variable. By default, SAS models the dependent variable value 0 (the lower ordered value), but this can be changed by using the DESCENDING option in PROC LOGISTIC.
The logistic regression model, with its parameters and independent variables:
Log[p / (1 - p)] = Intercept (B0) + B1*Gender + B2*GeogBks + B3*ItalArt + B4*Recency
Standard Error: Estimated standard error of the beta coefficient.
Wald Chi-Square: The Wald Chi-Square statistic is calculated as (Estimate / Standard Error) squared. It is used for testing the significance of each explanatory variable.
Pr>ChiSq: The p-value of the calculated Wald Chi-Square statistic under the Chi-Square distribution with the given degrees of freedom (DF). With 1 DF, this equals the two-tailed p-value of the corresponding z statistic (Estimate / Standard Error).
If the p-value is less than 0.05, the null hypothesis that the beta coefficient for a predictor is zero is rejected at the 5% significance level. In the below example, all the variables can be selected at the 5% significance level.
[Figure: Analysis of Maximum Likelihood Estimates]
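The Wald Chi-Square and Pr>ChiSq columns can be reproduced by hand for a single coefficient. The sketch below is a minimal illustration, not the SAS implementation: it assumes 1 degree of freedom and the usual definition of the Wald statistic as the squared ratio of estimate to standard error. The function names and the numbers are made up for illustration and are not taken from the SAS output.

```python
import math


def wald_test(estimate, std_error):
    """Return (Wald chi-square, p-value) for one coefficient with 1 DF."""
    z = estimate / std_error
    chi_sq = z ** 2
    # For 1 DF the upper tail of the chi-square distribution equals the
    # two-sided normal tail: P(Z^2 > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi_sq / 2.0))
    return chi_sq, p_value


def predicted_probability(intercept, coefs, values):
    """Invert the logit: p = 1 / (1 + exp(-log_odds))."""
    log_odds = intercept + sum(b * x for b, x in zip(coefs, values))
    return 1.0 / (1.0 + math.exp(-log_odds))


# Hypothetical estimate and standard error (illustrative only)
chi_sq, p = wald_test(estimate=0.8, std_error=0.2)
print(f"Wald Chi-Square = {chi_sq:.2f}, Pr > ChiSq = {p:.6f}")
print("Significant at 5%:", p < 0.05)
```

The second helper shows how the log odds from the model equation convert back to a probability once the beta coefficients are known.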

Monday, 26 October 2015

Logistic Regression: How do you interpret output?

Originally published on RamG Data Analytics & Insights (www.ramganalytics.com)
In the previous blog, we elaborated on Why and How to learn Predictive Modelling?
One of the commonly used statistical techniques is Logistic Regression. In this blog, the focus is on understanding logistic regression output. We are using SAS for executing logistic regression, but similar results and statistics are produced for logistic regression in other tools such as SPSS and R.
The output of a logistic regression is explained in a simplified way. Logistic Regression output contains model selection and performance criteria or statistics. We have used the default SAS Logistic Regression output to illustrate the important statistics. A number of additional options can be used for specific requirements, such as getting the ROC curve or the classification table (CTABLE).
SAS has the following important sections in Logistic Regression output:
    • Model Information
    • Response Profile and Model Convergence Status
    • Model Fit Statistics and Testing Global Null Hypothesis: BETA=0
    • Analysis of Maximum Likelihood Estimates and Odds Ratio Estimates
    • Association of Predicted Probabilities and Observed Responses

Model Information

Model Information provides details on the input dataset name and the response variable used. Logistic Regression can also be used for building ordinal and multinomial regression models, so this section states whether a binary logit or a different model is built.
Below is an example of two Logistic Regression outputs: one with the response variable (target or dependent variable) “Florence” as binary, and a second with the response variable “F” as ordinal.
[Figure: Model Information]

Sunday, 18 October 2015

R Interview Questions

R is an open source statistical computing environment, and R Studio is an IDE that uses R for Data Science and Analytics.
An increasing number of organizations are migrating their statistical and data analytics work to R and are looking for analytics professionals and data scientists who have R experience.
In a job interview, organizations test candidates for logical thinking, knowledge of statistical and machine learning techniques, and R/R Studio skills.
Depending on the role and expectations, the mix of questions will vary. Some of the questions that could be asked to evaluate a candidate's R programming skills are:
  1. What are the different data types in R?
  2. What is the difference between a Matrix and a Data Frame in R?
  3. Why do you need the apply() family of functions? What is the difference between sapply() and lapply()?
  4. How do you import data into R?

T Tests using R: Explained with Examples

Statistical tests are used to interpret data analysis results correctly and to base conclusions on statistical significance rather than on chance occurrences.
Another important rationale is that conclusions are typically based on a sample, and we aim to ensure that the inferences are relevant for the population (or for another sample).
In one scenario, the average spend for one sample was $1200 and the average spend for the other sample was $1210. Mathematically, the two average values are different, but
  • Is the difference significant?
  • Is the difference due to chance, or is there sufficient evidence that it is real?
To answer these questions, we can test the difference using statistical tests.
The T Test is used for testing the mean of a sample or comparing the means of two samples.
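To make the two-sample comparison concrete, here is a minimal sketch that computes a t statistic for two samples whose means are $1200 and $1210. It is written in Python with only the standard library, and it assumes Welch's unequal-variance form of the test, which is one common choice rather than necessarily the one used in the full post. The spend values are invented so that the means match the scenario above.

```python
import math
import statistics


def welch_t(sample_a, sample_b):
    """Return (t statistic, approximate degrees of freedom) for two samples."""
    n_a, n_b = len(sample_a), len(sample_b)
    mean_a = statistics.mean(sample_a)
    mean_b = statistics.mean(sample_b)
    # Squared standard error of each mean: sample variance / sample size
    se2_a = statistics.variance(sample_a) / n_a
    se2_b = statistics.variance(sample_b) / n_b
    t_stat = (mean_a - mean_b) / math.sqrt(se2_a + se2_b)
    # Welch-Satterthwaite approximation to the degrees of freedom
    df = (se2_a + se2_b) ** 2 / (
        se2_a ** 2 / (n_a - 1) + se2_b ** 2 / (n_b - 1)
    )
    return t_stat, df


# Hypothetical spend values (means are 1200 and 1210)
spend_a = [1150, 1230, 1190, 1260, 1170]
spend_b = [1205, 1240, 1180, 1225, 1200]

t_stat, df = welch_t(spend_a, spend_b)
print(f"t = {t_stat:.3f}, df = {df:.1f}")
```

The t statistic is then compared against the t distribution with the computed degrees of freedom to obtain a p-value. A statistic close to zero relative to the spread in the samples suggests the $10 difference could easily be due to chance.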

Thursday, 15 October 2015

Survival Modeling using R - Simplified




Survival Modelling is a family of techniques used when the time to an event is important.
Survival Models can be used for predicting the time of an event (when a customer will take up a product) or estimating the duration until the next event occurs (a customer's next visit to a retail store).
Some of the applications of Survival Modeling across industry verticals.
Some of the concepts related to Survival Modeling are
Survival Function: Probability of surviving until time *t* is called survival function. It is normally represented as S(t).
Hazard Rate: Event rate at time *t*, given survival until *t*. This is also called the hazard or failure rate.
Censoring: When the event time for a case under analysis is not fully observed (for example, the event has not yet occurred when observation ends), the case is said to be censored.
[Figure: Survival function plot]
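The three concepts above can be tied together with a small sketch. The code below estimates S(t) with the Kaplan-Meier product-limit estimator, which is an assumption on our part (the excerpt does not name an estimator), and it is written in plain Python rather than R so that it needs no packages. The durations and censoring flags are hypothetical.

```python
def kaplan_meier(durations, event_observed):
    """Return [(time, S(time))] at each distinct observed event time.

    durations: time to the event, or to censoring, for each case
    event_observed: True if the event occurred, False if censored
    """
    event_times = sorted(
        {t for t, e in zip(durations, event_observed) if e}
    )
    survival = 1.0
    curve = []
    for t in event_times:
        # Cases still under observation just before time t
        at_risk = sum(1 for d in durations if d >= t)
        # Events (not censorings) that happen exactly at time t
        events = sum(
            1 for d, e in zip(durations, event_observed) if e and d == t
        )
        # events / at_risk is the discrete hazard at time t
        survival *= 1.0 - events / at_risk
        curve.append((t, survival))
    return curve


# Hypothetical data: months until a customer takes up a product
durations = [2, 3, 3, 5, 6, 8]
observed = [True, True, False, True, False, True]

for t, s in kaplan_meier(durations, observed):
    print(f"S({t}) = {s:.3f}")
```

Censored cases never count as events, but they stay in the at-risk count until their censoring time, which is how the partial information they carry is used.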

Thursday, 8 October 2015

Customer Analyst - A Glimpse

A Customer Analyst provides data-driven insights for customer or segment managers. Customer analysts work on customer analytics, which involves understanding customer behavior using data across the customer life cycle: Customer Acquisition, Customer Growth/Development and Customer Retention. Some of the business decision drivers in customer analytics are based on market segmentation, customer behavioural segmentation, lifestyle segmentation, value-based segmentation and strategies, and predictive analytics: cross-sell/up-sell, attrition modeling, next best action framework, customer lifetime value (CLTV) modeling and a few others.
Customer Life Cycle
Typical Salary of a Customer Analyst
Role and Expectations from a Customer Analyst
Skills Required for a good Customer Analyst
and much more.. read further 

Sunday, 4 October 2015

Facebook Groups- How engaged are Data Scientists?

In this blog, the aim is to understand the engagement of users in Facebook groups. Facebook groups are created to reach a target audience or fellow members with a “relevant” message. Group members could be targeted for
  • Building brand or creating awareness
  • Sharing blogs or gaining audience for a website
  • Indirect product or services selling
  • Building profile by sharing knowledge
Group members get relevant updates that keep them aware of the latest developments, learn about various techniques and applications, and get help from fellow members, especially on Data Science and Analytics applications.
A few groups are analysed, and a high-level summary of the number of members of these groups is given below.
[Figure: Members]
 Read more on  Who are the high contributors? and much more