As I continue on my #datascience, #bigdata and #ai journey, I am pleased to have just completed my 6th course on the Microsoft Professional Program for Big Data with 84%: Delivering a Data Warehouse in the Cloud.
There are 4 more courses left on the program, and I am still on track to complete the program by my target of the end of January 2019.
As I continue on my #datascience, #bigdata and #ai journey, I am very pleased to have just completed my 5th course on the Microsoft Professional Program for Big Data with 98%: Introduction to NoSQL Data Solutions.
There are 5 more courses left on the program, and this means I am still on track to complete the program by my target of the end of January 2019.
I am delighted to have received a President’s Award for input on data science from outgoing Institute and Faculty of Actuaries President Marjorie Ngwenya, FIA at yesterday’s AGM at Staple Inn in London.
The IFoA is a tremendously vibrant organisation, and I believe IFoA and other actuaries have an important role to play in helping businesses and organisations make the most of the torrents of data becoming available, whilst also helping protect consumers from unethical use of such data. In particular, I am very pleased that the IFoA is collaborating with the Royal Statistical Society in the vital area of the ethical use of data in data science. (For example, a joint event was held earlier this month on the Industrialisation and Professionalisation of Data Science.)
#The2040Economy: in my opinion, every business, charity, school, government department, etc. needs to harness appropriate data for its organisation as soon as possible and extract value from it, or it will be overtaken by competitors who do.
And we (society across the whole planet) need to prepare for a new economy in which very few work.
Think about it: how will your children (or grandchildren) cope in a world where the economy doesn’t need them to work? How will society as a whole finance support for them?
While looking at a (relatively small, 1.7 million records) big data example of New York Yellow Cab taxi trips, I am coming to the conclusion that, if you are using Microsoft tools as we do, the best place for initial analysis is Azure Machine Learning Studio (Azure ML), rather than Excel, Power BI or bespoke analysis using e.g. Kendo UI. That initial analysis includes the all-important first step of finding outliers and errors.
Why Azure ML for initial analysis?
- It loads data quite quickly (e.g. just over a minute to import almost 2 million records from an Azure SQL database). This is currently much quicker than Power BI.
- It automatically produces histograms and box plots of numeric fields (see the images below, and above, where the field FareAmount has been selected). We can tell immediately from the box plot that there are several outliers, and in fact probable errors that will need to be either corrected or removed, since FareAmount should not have negative values!
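
For readers without Azure ML to hand, the box plot check described above can be roughly reproduced in a few lines of Python. This is only a sketch of the common 1.5 × IQR box plot rule, not necessarily how Azure ML draws its own plots. The `box_plot_outliers` helper and the sample fares are made up for illustration and are not taken from the taxi data.

```python
from statistics import quantiles


def box_plot_outliers(values, k=1.5):
    """Return (lower_fence, upper_fence, outliers) using the k * IQR rule."""
    # Quartiles of the data; "inclusive" treats the values as the full population.
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    # Anything beyond the fences would be drawn as an individual point on a box plot.
    outliers = [v for v in values if v < lower or v > upper]
    return lower, upper, outliers


# Hypothetical FareAmount values (illustrative only, not real trip records).
fare_amounts = [5.5, 7.0, 8.5, 9.0, 10.5, 12.0, 13.5, 15.0, 52.0, -4.5, 250.0]

lower, upper, outliers = box_plot_outliers(fare_amounts)

# Negative fares are probable errors regardless of where the fences fall.
negative_fares = [v for v in fare_amounts if v < 0]

print(f"Fences: {lower:.2f} to {upper:.2f}")
print("Outliers:", sorted(outliers))
print("Negative fares (probable errors):", negative_fares)
```

Keeping the negative-value check separate from the fence check is deliberate: a fare can be a statistical outlier and still be genuine (a very long trip), whereas a negative fare is almost certainly a data error.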