Is your data ready for Data Science?

01_Is your Data ready

Before you can do data science, your data needs to be:

  1. Relevant
  2. Connected
  3. Accurate
  4. Enough to work with

02_Relevant Data

Relevant data: the numbers in each row are relevant to each other and to the question you want to answer.

03_Connected Data

Most data sets are missing some values. It’s common to have holes like this and there are ways to work around them.

If you look at the table on the left, there’s so much missing data that it’s hard to find any kind of relationship between grill temperature and patty weight. This is an example of disconnected data.
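
To make the idea of disconnected data concrete, here is a minimal sketch in plain Python that measures how many values are missing and how many rows are complete. The column names `grill_temp` and `patty_weight` follow the example above, but the values and the helper functions are made up for illustration.

```python
# Hypothetical table: each row is one burger, None marks a missing value.
rows = [
    {"grill_temp": 375, "patty_weight": None},
    {"grill_temp": None, "patty_weight": 0.25},
    {"grill_temp": 400, "patty_weight": 0.33},
    {"grill_temp": None, "patty_weight": None},
    {"grill_temp": 425, "patty_weight": None},
]


def missing_fraction(rows, column):
    """Fraction of rows in which `column` has no value."""
    return sum(r[column] is None for r in rows) / len(rows)


def complete_rows(rows, columns):
    """Rows that have a value for every column in `columns`."""
    return [r for r in rows if all(r[c] is not None for c in columns)]


columns = ["grill_temp", "patty_weight"]
for c in columns:
    print(c, "missing fraction:", missing_fraction(rows, c))

usable = complete_rows(rows, columns)
print(len(usable), "of", len(rows), "rows are complete")
```

If only a handful of rows are complete, as here, there is little to connect one column to the other, no matter how many rows the table has.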

04_Accurate Data

‘X’ is the center of these arrows. If ‘X’ is close to the bull’s eye, the data is accurate.

Precise means the arrows are grouped tightly. Imprecise means the arrows are spread out.
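
The difference between accurate and precise can be sketched with a few lines of Python. This is only an illustration under stated assumptions: arrows are (x, y) positions, the bull’s eye sits at (0, 0), and the function names and sample coordinates are invented for this example.

```python
import math


def center(arrows):
    """The 'X': the average position of all arrows."""
    xs = [x for x, _ in arrows]
    ys = [y for _, y in arrows]
    return (sum(xs) / len(xs), sum(ys) / len(ys))


def accuracy_error(arrows, bullseye=(0.0, 0.0)):
    """Distance from the center of the arrows to the bull's eye (small = accurate)."""
    cx, cy = center(arrows)
    return math.hypot(cx - bullseye[0], cy - bullseye[1])


def spread(arrows):
    """Average distance of the arrows from their own center (small = precise)."""
    cx, cy = center(arrows)
    return sum(math.hypot(x - cx, y - cy) for x, y in arrows) / len(arrows)


# Hypothetical targets.
tight_but_off = [(3.0, 3.1), (3.1, 3.0), (2.9, 3.0), (3.0, 2.9)]  # precise, not accurate
wide_but_centered = [(2.0, 0.0), (-2.0, 0.0), (0.0, 2.0), (0.0, -2.0)]  # accurate, not precise

for name, arrows in [("tight_but_off", tight_but_off), ("wide_but_centered", wide_but_centered)]:
    print(name, "accuracy error:", accuracy_error(arrows), "spread:", spread(arrows))
```

The first group has a small spread but its center is far from the bull’s eye. The second group is centered on the bull’s eye but spread out.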

05_Enough Data

Think of each data point in your table as being a brush stroke in a painting. If you have only a few of them, the painting can be pretty fuzzy, and it’s hard to tell what it is.

If you add some more brush strokes, then your painting starts to get a little sharper.

When you have barely enough strokes, you can see just enough to make some broad decisions.

As you add more data, the picture becomes clearer and you can make more detailed decisions.
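
One way to see the picture sharpen is to simulate it. The sketch below is an assumption-laden illustration, not part of the original video: it draws made-up patty weights from a normal distribution (mean 0.25, standard deviation 0.05, both invented) and checks how much the estimated average wobbles from one sample to the next at different sample sizes.

```python
import random
import statistics


def estimate_spread(sample_size, trials=200, seed=0):
    """How much the sample mean varies across repeated samples of a given size."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    estimates = []
    for _ in range(trials):
        # Hypothetical measurements: true mean 0.25, standard deviation 0.05.
        sample = [rng.gauss(0.25, 0.05) for _ in range(sample_size)]
        estimates.append(statistics.mean(sample))
    return statistics.stdev(estimates)


for n in (5, 50, 500):
    print(n, "data points -> spread of the estimate:", estimate_spread(n))
```

With only a few data points the estimate moves around a lot between samples. With more data points it settles down, which is the numerical version of the painting getting sharper.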

Video can be watched at the following link:

https://docs.microsoft.com/en-us/azure/machine-learning/studio/data-science-for-beginners-is-your-data-ready-for-data-science

Many thanks to Brandon Rohrer [Senior Data Scientist] from Microsoft Azure Machine Learning