5 common biases in big data


Today, businesses are aware that a huge part of their decision-making is shaped by big data. The sheer availability of data does not guarantee its relevance, and neither does the analysis of big data by data scientists and analysts, because human judgment can be flawed. Moreover, many factors can affect data, positively or negatively, so data may fluctuate over time. That is why it is crucial for data teams to know how to draw the right inferences from big data. This is only possible when data analysts and scientists are aware of the biases that exist and of the ways to counter them.

Special thanks to Nate DW for the link to this article. The best of the five is “Simpson’s Paradox”. No, not the one where Homer smashed his little boy’s piggy bank and is wondering what he’s done. It’s when a trend shows up in every one of several groups of data but disappears, or even reverses, once you combine the groups. This is an excellent read for those of you who are labeling yourselves “Data Scientist”. I’m just a “Data Tinkerer”.
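To see how Simpson’s Paradox can happen, here is a minimal sketch in Python. The success counts are illustrative, patterned on the textbook kidney-stone treatment example, and the names are hypothetical. Treatment A has the better success rate in both the “small” and the “large” group. Treatment B still looks better when the groups are pooled, because A was used mostly on the harder “large” cases.

```python
# Illustrative (hypothetical) counts: (successes, trials) per treatment and group.
data = {
    "A": {"small": (81, 87), "large": (192, 263)},
    "B": {"small": (234, 270), "large": (55, 80)},
}

def rate(successes, trials):
    """Success rate as a fraction."""
    return successes / trials

def per_group_rates(data):
    """Success rate of each treatment within each group."""
    return {t: {g: rate(*c) for g, c in groups.items()} for t, groups in data.items()}

def pooled_rates(data):
    """Success rate of each treatment after adding all groups together."""
    pooled = {}
    for t, groups in data.items():
        successes = sum(c[0] for c in groups.values())
        trials = sum(c[1] for c in groups.values())
        pooled[t] = rate(successes, trials)
    return pooled

by_group = per_group_rates(data)
pooled = pooled_rates(data)

# Within every group, A beats B...
a_wins_each_group = all(by_group["A"][g] > by_group["B"][g] for g in by_group["A"])
# ...but once the groups are pooled, B looks better.
b_wins_pooled = pooled["B"] > pooled["A"]

print(a_wins_each_group, b_wins_pooled)  # True True
```

The lesson for the “Data Tinkerer”: before trusting an aggregate trend, check whether the groups being combined differ a lot in size or difficulty.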


Via: 5 common biases in big data

SQL Server Machine Learning Services – Part 1: Python Basics


With the release of SQL Server 2017, Microsoft changed the name of R Services to Machine Learning Services (MLS) and added support for Python, a widely implemented programming language known for its straightforward syntax and code readability. As with the R language, you can use Python to transform and analyze data within the context of a SQL Server database and then return the modified data to a calling application.

Here it is again: Python being used as the programming language of data. This series will introduce you to the renamed “R Services”, now called “Machine Learning Services”. MLS is a simple recipe: take the “R” stats engine, add a pinch of Python, add a cup of training. This might be an interesting concept. Guess I need to break out the Python in 30 Days book again.
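For a feel of what “transform data inside SQL Server and return it” looks like, here is a minimal sketch that assumes pandas is installed. In SQL Server 2017 the Python code is passed as the @script argument of the sp_execute_external_script stored procedure, with @language = N'Python'. The rows returned by the @input_data_1 query reach the script as a pandas DataFrame named InputDataSet. Whatever DataFrame the script assigns to OutputDataSet is sent back to the caller. The sketch simulates that contract locally, so you can try the script body outside the database. The column names, values and threshold are hypothetical.

```python
import pandas as pd

# Stand-in for the DataFrame SQL Server would pass in from @input_data_1.
# Columns and values are hypothetical.
InputDataSet = pd.DataFrame({
    "ProductID": [1, 2, 3],
    "UnitPrice": [10.0, 25.0, 40.0],
    "Quantity": [3, 1, 2],
})

# --- body of the @script parameter ---
df = InputDataSet.copy()
df["LineTotal"] = df["UnitPrice"] * df["Quantity"]
# Keep only rows at or above a (hypothetical) threshold.
OutputDataSet = df[df["LineTotal"] >= 30.0].reset_index(drop=True)
# --- end of @script body ---

print(OutputDataSet)
```

Inside the database, only the lines between the two marker comments would go into @script. SQL Server supplies InputDataSet and reads OutputDataSet back as the result set.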


Data Visualization Basics for Data Scientists

“A picture is worth a thousand words”, the old saying goes, and in some cases a picture is worth even more than that. The human eye is composed of some 30 or more discrete components and, together with the optic nerves and the brain functions that process sight, can take in a contrast ratio of around 100,000:1 (over time) and distinguish about 10 million colors. That sight-brain pathway is a pattern-matching wonder and has “regions of interest” that the eye/brain connection focuses on (http://www.cambridgeincolour.com/tutorials/cameras-vs-human-eye.htm).

As one of our primary senses, sight is immeasurably important for conveying information, and it’s vital for the Data Scientist to understand how best to use various visualizations to display and discuss data.

This article references a book from 2013 that is still a must-read for anyone attempting data visualization at any level. The best lesson is to look through other people’s eyes to appreciate how the information must “look”. I go by a simple rule: “If my wife, who is not technical or a data scientist, can’t understand the visual, it probably needs more work.”

Via: Microsoft Developer, Buck Woody

Manning free programming eBooks

Just click the icon to the left, follow the instructions to sign up, and you’ll be added to Manning’s Deal of the Day mailing list. This means that during the month of December you’ll receive an email each day with a special discount on a particular Manning publication, good for that day only. So remember to check your inbox regularly and act fast if you want the deal!

3D Graphics JS Library WhiteStorm is coming


Imagine writing a JavaScript application with full 3D graphics capability. It sounds simple, but once you include the physics of an object, shadows, and where the light is coming from, it starts to get overwhelming. This article by Alexander introduces the WhiteStorm JS library. I went to the WhiteStorm website and tried some of the examples there. I can see it as an extension to the D3.js charts library, used to create three-dimensional business intelligence charts that you can view from any angle, through a full 360 degrees, just by dragging the mouse. WhiteStorm, combined with some other components, could be a significant improvement for complex analytics. Yes, I know they are touting WhiteStorm as a gaming library, but think a little outside the box. I have added this to my review and test list. News at 11PM.
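Drag-to-rotate viewing of a 3D chart boils down to two steps. First, turn the mouse movement into a rotation angle and rotate every point of the chart. Then project the rotated points onto the flat screen. The minimal sketch below shows just that idea in plain Python. It is not WhiteStorm’s API, which is JavaScript and is not shown here. The 720-pixels-per-turn scale and the bar coordinates are hypothetical. It also uses a simple orthographic projection rather than the perspective camera a real 3D library would more typically use.

```python
import math

def drag_to_angle(dx_pixels, pixels_per_turn=720):
    """Map horizontal mouse movement to a rotation angle in radians.
    Hypothetical scale: dragging 720 px turns the chart a full 360 degrees."""
    return 2 * math.pi * dx_pixels / pixels_per_turn

def rotate_y(point, angle_rad):
    """Rotate a 3D point (x, y, z) about the vertical y-axis."""
    x, y, z = point
    c, s = math.cos(angle_rad), math.sin(angle_rad)
    return (c * x + s * z, y, -s * x + c * z)

def project(point):
    """Orthographic projection: drop the depth (z) coordinate."""
    x, y, _ = point
    return (x, y)

# Hypothetical tops of three bars in a 3D chart: (category, value, series).
bars = [(0.0, 5.0, 0.0), (1.0, 3.0, 0.0), (1.0, 4.0, 1.0)]

angle = drag_to_angle(180)  # a 180 px drag is a quarter turn
screen = [project(rotate_y(b, angle)) for b in bars]
print(screen)  # x values may show tiny floating-point residue
```

After a quarter turn, the “series” axis, which previously ran into the screen, now runs left to right. That is why the rotated view can separate bars that overlapped in the front view.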


Master Big Data, Master Data Science first


This is a very good article to read. It is academic in tone but still very relevant to business. The term “Data Science” was coined in 1974, and it was one of the courses I took in college. Now it is back, defining a set of skills you should consider learning to help manage and use any “Big Data” you may have.