Uploading Big Data: it’s very different from normal data

The above screenshot shows an initial analysis (in Microsoft Power BI) of 1,723,099 New York taxi trip records uploaded to the cloud. The top chart is a scatter plot of Trip Distance (in miles) against Total Fare Amount (in US$). It shows straight away that there are some outliers in the data (e.g. some trips cost over $1,000 despite covering only short distances). These records are almost certainly errors (e.g. the fare was entered with the decimal point in the wrong place, such as $1000.00 instead of $10.00) and should be corrected or removed. Similar errors in the Trip Distance field had already been removed: 2 records had implausible distance values (300,833 miles for a total fare of $14.16, and 1,666 miles for a total fare of $10.30).
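As a rough illustration of how such records could be flagged before analysis, here is a minimal Python sketch. The two thresholds and the function names are assumptions made for this example, not values taken from the analysis above, and they would need tuning against the real data.

```python
from typing import Iterable, List, Tuple

# Hypothetical thresholds (assumptions for this sketch, not from the dataset).
MAX_PLAUSIBLE_MILES = 200.0
MAX_PLAUSIBLE_FARE_PER_MILE = 50.0

Trip = Tuple[float, float]  # (trip distance in miles, total fare in US$)


def is_suspect(trip_distance: float, total_amount: float) -> bool:
    """Return True when a trip looks like a data-entry error."""
    # Zero, negative, or implausibly long distances are flagged outright.
    if trip_distance <= 0 or trip_distance > MAX_PLAUSIBLE_MILES:
        return True
    # Otherwise flag trips whose fare per mile is implausibly high.
    return total_amount / trip_distance > MAX_PLAUSIBLE_FARE_PER_MILE


def split_records(records: Iterable[Trip]) -> Tuple[List[Trip], List[Trip]]:
    """Separate records into (clean, suspect) lists for later review."""
    clean: List[Trip] = []
    suspect: List[Trip] = []
    for distance, fare in records:
        target = suspect if is_suspect(distance, fare) else clean
        target.append((distance, fare))
    return clean, suspect
```

Keeping the suspect records in their own list, rather than discarding them, leaves the choice between correcting and removing them open.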

Before big data can be analysed, it often needs to be moved from its original sources (e.g. separate CSV or TXT files, or a stream) to somewhere it can be collated and processed (e.g. an online database, Microsoft Power BI, or an XDF (external data frame) file that can be analysed by Microsoft R Server).
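One common way to move a large file without holding it all in memory is to load it in batches. The sketch below is only an illustration: it uses Python's built-in sqlite3 module as a stand-in for an online database, and the table name, column names and batch size are assumptions made for the example.

```python
import csv
import sqlite3
from itertools import islice
from typing import Iterable, Iterator, List, Tuple


def batches(rows: Iterable, size: int) -> Iterator[List]:
    """Yield successive lists of at most `size` items from `rows`."""
    iterator = iter(rows)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


def load_csv(lines: Iterable[str], connection: sqlite3.Connection,
             batch_size: int = 10_000) -> int:
    """Load CSV lines into a `trips` table in batches; return the row count.

    `lines` can be an open file or any iterable of CSV text lines.
    The column names below are hypothetical.
    """
    connection.execute(
        "CREATE TABLE IF NOT EXISTS trips (trip_distance REAL, total_amount REAL)"
    )
    reader = csv.DictReader(lines)
    rows: Iterator[Tuple[float, float]] = (
        (float(r["trip_distance"]), float(r["total_amount"])) for r in reader
    )
    loaded = 0
    for batch in batches(rows, batch_size):
        connection.executemany("INSERT INTO trips VALUES (?, ?)", batch)
        connection.commit()  # commit per batch so progress survives a failure
        loaded += len(batch)
    return loaded
```

A call might look like `load_csv(open("trips.csv", newline=""), sqlite3.connect("trips.db"))`, where both file names are placeholders.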

Why adding chatbots makes financial sense for your organisation

Adding a chatbot to your organisation’s website can provide a more interactive experience for your users while at the same time reducing demands on your staff’s time. Chatbots can help to:

  • free your team to deal with more complex enquiries or tasks
  • speed up employee training by providing a very accessible and intuitive source for staff to obtain information internally
  • automate complex workflows (such as providing quotes or booking services)
  • provide availability 24/7, 365 days a year
  • provide an alternative user interface for your apps to the traditional point-and-click menu/button system

Why do data scientists use R and Python, as opposed to other languages like C#?

As a “proper” programmer, used to programming in heavy-duty, compiled languages like C# (and before that C++ and C), my reaction on discovering during my Data Science journey that R and Python are heavily used by data scientists was: why??

Why would anyone use an interpreted language, which is therefore bound to be slower, and why would anyone go to the trouble of using yet another language when there are perfectly good compiled languages around like C#, F# and VB.NET?

The answer seems to be partly that R and Python are free (open source), and partly that they have excellent visualisation tools, which the other languages currently lack.

Creating a corporate dashboard (using OData and Microsoft Power BI)

Building a corporate dashboard so that you have key management information at a glance

(This article was first posted on 17 June 2017 on a different blog site, but was migrated here on 05 Feb 2018.)

I have recently been building some corporate dashboards (as recommended by Daniel Priestley in his best-selling book “24 Assets: Create a digital, scalable, valuable and fun business that will thrive in a fast changing world”). From chapter 15 of the book:

A key asset is a dashboard that allows the team to see how the business is performing. Carefully select some of the metrics that drive performance and make sure they show up prominently on your dashboard. You might select metrics like cash at bank, payments collected, expected invoices, revenue per employee or monthly users; the general rule is that whatever you measure will improve.

Accessing your valuable and key data

Dashboards need data, and this data will almost certainly need to come from a variety of sources in your organisation. There are lots of different ways of exposing your data sources so that the key information can be pulled into your dashboard. I reviewed several options (including direct connections to databases, Web API or MVC from websites, and OData). My conclusion was that OData seemed the best current approach. Your data is valuable, so whatever method you use needs to be secure (i.e. with access protected via encryption and passwords); you can do this with OData (and with the other methods I have mentioned too). (Contact us if you need help with this.)
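To give a feel for what pulling data through OData involves, the following Python sketch builds a query URL using the standard OData query options $select, $filter and $top. The service address, entity set and field names are hypothetical, and the sketch only constructs the URL; sending the request over HTTPS with credentials is left out.

```python
from typing import Dict, Optional, Sequence
from urllib.parse import quote, urlencode


def build_odata_url(service_root: str, entity_set: str,
                    select: Optional[Sequence[str]] = None,
                    filter_expr: Optional[str] = None,
                    top: Optional[int] = None) -> str:
    """Build an OData query URL from a few common query options."""
    options: Dict[str, str] = {}
    if select:
        options["$select"] = ",".join(select)  # fields to return
    if filter_expr:
        options["$filter"] = filter_expr       # row filter expression
    if top is not None:
        options["$top"] = str(top)             # maximum number of rows
    url = f"{service_root.rstrip('/')}/{entity_set}"
    if options:
        # Keep "$" and "," readable; percent-encode spaces and the rest.
        url += "?" + urlencode(options, quote_via=quote, safe="$,")
    return url


# Hypothetical service and entity names, for illustration only.
url = build_odata_url(
    "https://example.com/odata/",
    "Invoices",
    select=["Customer", "Amount"],
    filter_expr="Amount gt 100",
    top=10,
)
print(url)
```

A dashboard tool that supports OData feeds can typically be pointed at a URL of this shape, so the filtering happens at the source rather than in the dashboard.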