Friday, 20 November 2015

Statistics vs. Machine Learning: Dilemma of Analytics Practitioner

Author: Rajneesh Pathak

Today the analytics industry draws on multiple disciplines that help solve problems by learning from data. Techniques from Statistics, Operations Research, Machine Learning / Statistical Learning, and Econometrics, along with Market Research, can solve both similar and very diverse problems that analytics practitioners face today. Though a seasoned user of analytics handles this confluence of disciplines, and the availability of competing and complementary algorithms, with ease, people continue to debate the differences between these disciplines and which is superior. Given that many big names in industry are betting big on Machine Learning, this debate is intensifying even further.




Saturday, 14 November 2015

Reading Large Files into R

Three questions come up consistently across forums, and we wanted to summarize the answers for you. These questions are:
  • How do you read large files in R?
  • How do you merge data frames in R?
  • How do you sort a data frame in R?
R is an open source statistical computing environment, and it provides a number of packages for advanced analytics and data science applications.
In the course of any advanced analytics or data science work, we have to perform a number of data manipulation or munging activities. One of the first steps is to read large files.
So one of the first questions typically asked in a number of forums is "How do you read a large file in R?" In this blog, we summarize the commonly used methods for reading large files into R/RStudio.

Wednesday, 4 November 2015

Random Forest using R: Tutorial

Random Forest: Overview

Random Forest is an ensemble learning technique for classification and regression. It is one of the most commonly used predictive modelling and machine learning techniques. Before studying the random forest algorithm, it is recommended to understand the decision tree algorithm and its applications. A non-technical description of decision trees.
A simple explanation of why it is called “Random Forest”.
Random Forest Infographic

Tuesday, 3 November 2015

64% of posts in a Facebook group do not have a single like

In the previous blog, we shared insights on the posts that group members have shared in a few Facebook groups. The groups are related to SAS programming, SAS Statistics, Big Data News, Big Data Analytics & Decision Science, and R Programming.
We found that only a few people contribute posts to a Facebook group. We shared statistics on the number of members in each group and on who posts the most in each group.
In this blog, we want to understand the engagement of other members and the relevance of the posts: what percentage of posts attract attention and engagement from fellow group members?
Engagement is defined as the sum of likes, comments and shares of a post. We would expect any post that is relevant to the members to attract likes, shares and/or comments. Comments can be negative, but for simplicity we still count them as engagement.
Our Approach
  • Extract Facebook Data for a few Facebook groups
  • Append the data from all the Facebook groups
  • Create Engagement variable
  • Analyse & Visualize engagement level for the posts
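As a rough illustration of the "create engagement variable" and "analyse" steps, here is a minimal sketch. It is written in Python purely for illustration and is not the tooling used for this analysis. The sample records and the field names `type`, `likes`, `comments` and `shares` are hypothetical, not the actual Facebook export format.

```python
# Hypothetical sample records: one dict per post, with made-up counts.
posts = [
    {"type": "link",   "likes": 0, "comments": 0, "shares": 0},
    {"type": "link",   "likes": 0, "comments": 1, "shares": 0},
    {"type": "status", "likes": 3, "comments": 2, "shares": 1},
    {"type": "photo",  "likes": 0, "comments": 0, "shares": 0},
    {"type": "link",   "likes": 5, "comments": 0, "shares": 0},
]


def engagement(post):
    """Engagement as defined in the text: likes + comments + shares."""
    return post["likes"] + post["comments"] + post["shares"]


def percent_zero(posts, metric):
    """Percentage of posts for which the given metric is zero."""
    if not posts:
        return 0.0
    zero_count = sum(1 for post in posts if metric(post) == 0)
    return 100.0 * zero_count / len(posts)


# Share of posts with no engagement at all, and with no likes.
no_engagement = percent_zero(posts, engagement)
no_likes = percent_zero(posts, lambda post: post["likes"])
print(f"Posts without any engagement: {no_engagement:.0f}%")
print(f"Posts without a single like: {no_likes:.0f}%")
```

Passing the metric as a function keeps the same calculation reusable for likes, comments, shares or the combined engagement score.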
We use Excel, R and Tableau for extracting the data, summarizing it and visualizing the key metrics.
Over 16k posts have been posted in the 5 Facebook groups considered for the analysis. A significant percentage of the posts are "Link" shares.
