Data mining requires data
Data mining requires and justifies huge investments. The smallest part is the data mining software itself. A much bigger part is the investment in data warehouse technology, a subject about which I’ve been posting extensively recently on DBMS 2.com. But there’s yet another part of the picture, namely investing in actually gathering data for analysis. I’ve written about that before, most recently in a blog post I published elsewhere and am now copying below.
Analytic business processes — or the areas of overlap between analytics and business process — are poorly understood. Business Activity Monitoring and Operational BI? Great buzzwords, but there’s way too little thought put into figuring out exactly which metrics are most useful for making which kinds of business decisions. Continuous planning/budgeting? The surface has only been scratched. A numerate, “one-truth” enterprise culture? Hah. When we identify an enterprise that truly has a pervasive numbers-oriented culture, it usually is one that winds up pathologically managing to a purely short-term set of goals. (But some exceptions to that rule are among the great corporations of the world.)
One area that really needs more consideration is data capture. You can’t analyze data you don’t have. Certain industries have indeed recognized this. E.g., travel and gaming have been hugely successful with loyalty cards; indeed, casino giant Harrah’s probably gets over 100% of its profits via targeted marketing based on the mining of its loyalty card data. Credit transaction data and the like are of course also heavily exploited. I made this whole case in a Computerworld column a year ago, and if you missed it, I still suggest checking that column out.
But that’s all transactional data. The story for text data is much worse. Indeed, survey forms typically try to force people away from just saying what they think, instead giving them endless checklists that bring back unhappy memories of SATs and #2 pencils. Yet text mining technology now exists that makes it possible to glean crucial information from free-form text. If you haven’t already checked it out, you should.
Particularly interesting, I think, are some examples in the area of text data and analytics.
Comments
5 Responses to “Data mining requires data”
In most cases, data mining does not require “huge investments”. The biggest investment necessary for data mining is in paying for someone qualified to do the data mining. Assuming the data to be analyzed already exists in some sort of database, all that is needed is a decent PC (at most about $3,000) and software ($2,500 or less). This is what I’ve used for several years to build predictive models used to manage several billion dollars’ worth of risk. One can pay more, but I’m not sure what benefit that provides.
Will,
The investment I was referring to was in building and maintaining the data stores, which get up to 100s of terabytes these days in some cases. (Petabytes get mentioned occasionally too, but I don’t know of a single instance where data mining is truly carried out on that scale.)
But yes, that’s more true in some businesses — especially ones with LOTS of customers or prospects — than others.
Thanks for your comment,
CAM