The screenshot above shows an initial analysis (in Microsoft Power BI) of 1,723,099 New York taxi trip records uploaded to the cloud. The top chart is a scatter plot of Trip Distance (in miles) against Total Fare Amount (in US $). It shows straightaway that there are some outliers in the data: some trips cost over $1,000 despite covering only short distances. These records are almost certainly errors (for example, a fare entered with the decimal point in the wrong place, such as $1000.00 instead of $10.00) and should be corrected or removed. Similar errors in the Trip Distance field had already been removed: 2 records had implausible distance values (300,833 miles for a total fare of $14.16, and 1,666 miles for a total fare of $10.30).
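As a rough illustration of the kind of check described above, the following Python sketch flags records whose fare or distance looks implausible. The `Trip` class, the `is_suspect` function and all three threshold constants are assumptions made for this example, not values taken from the original analysis; the first two sample trips are invented, while the last two use the distance and fare figures quoted above.

```python
from dataclasses import dataclass


@dataclass
class Trip:
    distance_miles: float
    total_fare_usd: float


# Hypothetical plausibility limits, chosen only for illustration.
MAX_DISTANCE_MILES = 200.0   # longer than any realistic city taxi trip
MAX_FLAT_FARE_USD = 50.0     # generous allowance for fixed charges and tolls
MAX_FARE_PER_MILE_USD = 10.0


def is_suspect(trip: Trip) -> bool:
    """Return True if the trip looks like a data-entry error."""
    if trip.distance_miles <= 0 or trip.total_fare_usd <= 0:
        return True
    if trip.distance_miles > MAX_DISTANCE_MILES:
        return True
    # The fare should not exceed a generous ceiling for the distance covered.
    ceiling = MAX_FLAT_FARE_USD + MAX_FARE_PER_MILE_USD * trip.distance_miles
    return trip.total_fare_usd > ceiling


trips = [
    Trip(2.1, 10.00),       # invented: an ordinary short trip
    Trip(2.1, 1000.00),     # invented: decimal point probably misplaced
    Trip(300833, 14.16),    # implausible distance quoted in the text
    Trip(1666, 10.30),      # implausible distance quoted in the text
]
flags = [is_suspect(t) for t in trips]
```

Whether a flagged record should be corrected or dropped still needs a human decision; a rule like this only narrows down where to look.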
Before big data can be analysed, it often needs to be moved from its original sources (e.g. separate csv or txt files, or a stream) to somewhere it can be collated and processed (e.g. an online database, Microsoft Power BI, or an xdf (extensible data format) file that can be analysed by Microsoft R Server).
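One common way to move a file that is too large to hold in memory is to stream it into the database in batches. The sketch below is a minimal example under stated assumptions: it uses Python's built-in `sqlite3` module as a stand-in for an online database, and the column names `trip_distance` and `total_amount`, the `trips` table and the `load_csv_in_batches` function are hypothetical names chosen for illustration.

```python
import csv
import io
import sqlite3
from itertools import islice


def load_csv_in_batches(lines, conn, batch_size=10_000):
    """Stream CSV rows into SQLite one batch at a time; return rows loaded."""
    reader = csv.DictReader(lines)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS trips (trip_distance REAL, total_amount REAL)"
    )
    loaded = 0
    while True:
        # islice pulls at most batch_size rows, so memory use stays bounded.
        batch = [
            (float(row["trip_distance"]), float(row["total_amount"]))
            for row in islice(reader, batch_size)
        ]
        if not batch:
            break
        conn.executemany("INSERT INTO trips VALUES (?, ?)", batch)
        conn.commit()
        loaded += len(batch)
    return loaded


# A tiny in-memory sample stands in for a large csv file.
sample = io.StringIO(
    "trip_distance,total_amount\n2.1,10.00\n0.8,6.30\n5.4,21.80\n"
)
conn = sqlite3.connect(":memory:")
rows_loaded = load_csv_in_batches(sample, conn, batch_size=2)
```

With a real file, the same function could be given an open file handle in place of the `io.StringIO` sample, and the connection pointed at the target database.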
“Uploading Big Data: it’s very different from normal data”
(Posted by Patrick Lee on 1 August 2017 at a different location, and migrated here on 5 February 2018.)
Why is there a range of answers, even using a given set of assumptions? Are these differences real, or artificial?
It would clearly make a difference whether a company’s pension liabilities were £475m, £500m or £525m …
The value of an organisation’s defined benefit (final salary or CARE, i.e. career average revalued earnings) pension plan promises normally depends on many uncertainties, including:
- how long the plan members and their partners are expected to live
- what proportion of active members will leave service, or retire on ill-health grounds, before reaching normal retirement age
- what the future rates of salary growth and price inflation (and hence pension increases) will be
- what the future rates of reinvestment (for cashflow mismatches) will be, assuming that a perfectly matching asset portfolio can’t be found (normally no such portfolio exists).
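To give a feel for how assumptions like these feed into a liability figure, here is a deliberately simplified Python sketch that values one member's pension as a discounted, survival-weighted stream of payments. It is not an actuarial valuation method: the `liability_value` function, the single constant annual survival probability and every input value are assumptions made for illustration, and it ignores salary growth, leaving service and ill-health retirement.

```python
def liability_value(annual_pension, years_to_retirement, discount_rate,
                    pension_increase_rate, annual_survival_prob,
                    max_payment_years=60):
    """Present value of a pension paid yearly from retirement onwards.

    Simplifying assumptions: a constant annual survival probability,
    fixed pension increases, and one flat discount rate.
    """
    total = 0.0
    for k in range(max_payment_years):
        t = years_to_retirement + k  # years from now until this payment
        payment = annual_pension * (1 + pension_increase_rate) ** k
        survival = annual_survival_prob ** t  # chance the member is alive at t
        total += payment * survival / (1 + discount_rate) ** t
    return total


# Hypothetical inputs: a 10,000 a year pension starting in 20 years.
base = liability_value(10_000, 20, 0.040, 0.025, 0.980)
lower_discount = liability_value(10_000, 20, 0.035, 0.025, 0.980)
longer_lived = liability_value(10_000, 20, 0.040, 0.025, 0.985)

print(f"lower discount rate: {lower_discount / base - 1:+.1%}")
print(f"longer lived:        {longer_lived / base - 1:+.1%}")
```

Printing the ratios shows how far the value moves when a single input is nudged. Both a lower discount rate and longer lifespans increase the liability.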
“How corporate pension liabilities could vary by 10% or more, even on an agreed set of assumptions”