Azure Machine Learning Studio: the best place for initial data analysis?

While looking at a big data example of New York Yellow Cab taxi trips (a relatively small one, at 1.7 million records), I am coming to the conclusion that, if like us you are using Microsoft tools, the best place for initial analysis is Azure Machine Learning Studio (Azure ML), as opposed to Excel, Power BI or bespoke analysis using e.g. Kendo UI. That includes the all-important first step of finding outliers and errors.

Why Azure ML for initial analysis?

  1. It loads data quite quickly (e.g. just over a minute to import almost 2 million records from an Azure SQL database). This is currently much quicker than Power BI.
  2. It automatically produces histograms and box plots of numeric fields (see the images above and below, where the field FareAmount has been selected). We can tell immediately from the box plot that there are several outliers, some of which are probable errors that will need to be either corrected or removed, since FareAmount should not have negative values!
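
To make the box plot point concrete, here is a minimal Python sketch of the usual box plot rule, which flags anything more than 1.5 times the interquartile range beyond the quartiles. It is only an illustration: the fare values are made up rather than taken from the taxi data, the function name box_plot_outliers is hypothetical, and I am not claiming this is exactly how Azure ML computes its own plots.

```python
from statistics import quantiles


def box_plot_outliers(values, whisker=1.5):
    """Return (lower_fence, upper_fence, outliers) using the common 1.5 * IQR rule."""
    # Quartiles of the data; "inclusive" treats the values as the whole population.
    q1, _median, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - whisker * iqr
    upper_fence = q3 + whisker * iqr
    # Anything outside the fences would be drawn as an individual point on a box plot.
    outliers = [v for v in values if v < lower_fence or v > upper_fence]
    return lower_fence, upper_fence, outliers


# Made-up fare amounts (assumption: not real taxi data), including one
# negative value and one implausibly large one.
fares = [6.5, 7.0, 8.5, 9.0, 10.5, 12.0, 13.5, 15.0, -52.0, 250.0]

lower_fence, upper_fence, outliers = box_plot_outliers(fares)

# A fare can never legitimately be negative, so these are errors, not just outliers.
negative_fares = [f for f in fares if f < 0]

print("fences:", lower_fence, upper_fence)
print("outliers:", outliers)
print("negative fares:", negative_fares)
```

The separate check for negative values matters because the two questions are different: an outlier may still be a genuine (if unusual) trip, whereas a negative FareAmount is an error whatever the rest of the distribution looks like.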

Why do data scientists use R and Python, as opposed to other languages like C#?

As a “proper” programmer, used to programming in heavy-duty compiled languages like C# (and before that C++ and C), my reaction on discovering during my Data Science journey that R and Python are heavily used by data scientists was: why??

Why would anyone use an interpreted language, which is therefore bound to be slower? And why would anyone go to the trouble of using yet another language when there are perfectly good compiled languages around like C#, F# and VB.NET?

The answer seems to be partly that R and Python are free (open source), and partly that they have excellent visualisation tools, which the other languages currently lack.