I’m playing with alternating between the publishing page and blog entries and am not sure which I like better.
Which businesses really need a data scientist? https://bit.ly/2JInbhj
My focus on business intelligence over the last few years, my long-term interest in artificial intelligence, and the growth of machine learning came together to drive the content for my latest Forbes article.
It’s been a while since I watched a webinar, but since business intelligence (BI) and artificial intelligence (AI) are overlapping areas of interest, I watched Tuesday’s TDWI webinar on machine learning (ML). As the definition of machine learning expands beyond pure AI to take in BI’s advanced analytics, it’s interesting to see where people are going with the subject.
The host was Fern Halper, VP of Research at TDWI. The guests were:
- Mike Gualtieri, VP, Forrester,
- Ashok Swaminathan, Senior Director, Product Management, SAP,
- Chandran Saravana, Senior Director, Advanced Analytics, SAP.
Ms. Halper began with a short presentation including her definition of ML as “enabling computers to learn patterns with minimal human intervention.” It’s a bit different from the last time I reviewed one of her webinars, but that’s ok because my definition is also evolving. I’ve decided to use my own: “Technology that can investigate data in an environment of uncertainty and make decisions or provide predictions that inform the actions of the machine or people in that environment.” Note that I differ from my past, purist view of machine learning as machines adjusting their own algorithms. I’ve done so because we have to adapt to the market. As BI analytics have advanced to provide great insight in data discovery, predictive analytics and more, many areas of BI and the purist area of learning have overlapped. Learning patterns can happen through pure statistical analysis and through self-adaptive algorithms in AI-based machines.
The most interesting part of Fern Halper’s segment was a great chart showing the results of a survey asking about the importance of different business drivers behind ML initiatives. What makes the chart interesting, as you can see, is that it splits results between those groups investigating ML and those who are actively using it.
What her research shows is that while the highest segments for the active categories are customer related, once companies have seen the benefits of ML, the advantages of it for almost all the other areas jump significantly over views held during the investigation phase.
A panel discussion followed, with Ms. Halper asking what sounded like pre-discussed questions (especially considering the included and relevant slides) of the three panelists. The statements by the two SAP folks weren’t bad; they were just very standard and lacked any strong differentiators. SAP is clearly building an architecture to leverage ML within its environment, but there were no case studies, and I felt the integration between the SAP pieces didn’t bubble up to the business level.
The reason to listen to this segment is Mr. Gualtieri. He was very clear and focused on his message. While I quibble with some of the things he said about data scientists, that soap box isn’t for here. He gave a nice overview of the evolving state of ML for enterprise. The most important part of that might have been missed by folks, so I’ll bring it up here.
Yes, TensorFlow, R, Python and other tools provide frameworks for machine learning implementations, but they’re still at a very technical level. They aren’t for business analysts and management. He mentioned that the next generation of tools is starting to arrive, one that, just as with the advent of BI, will allow people with less technical experience to more quickly build models and gain insights from machine learning.
That’s how new technology grows, and I’d like to see TDWI focus on some of the new tools.
This was a good webinar, worth the time for those of you who are interested in a basic discussion of where machine learning is within the enterprise landscape.
Cloudera held a pretty impressive web event this morning. It was a mini-conference, with keynotes, some breakout tracks and even a small vendor area. The event was called Cloudera Now, and the link currently goes to the registration page. I’ll update it if the link changes once the event is available on demand.
The primary purpose was to present Cloudera as the company for data support in the rapidly growing field of Machine Learning (ML). Given the state of the industry, I’ll say it was a success.
As someone who has an MS focused on artificial intelligence (ancient times…) and has kept up with it, there were holes, but the presentations I watched did set the picture for people who are now hearing about it as a growing topic.
The cleanest overview was a keynote presentation by Tom Davenport, Professor of IT and Management, Babson College. It’s worth registering just for that if you want a well-presented overview.
Right after that, he and Amy O’Connor, Big Data Evangelist at Cloudera, had a small session that was interesting. On the amusing side, I like how people are finally beginning to admit, as Amy mentioned, that the data scientist might not be just one person. I’ll make a fun “I told you so” comment by pointing to an article I published more than three years ago: The Myth of the Data Scientist.
After the keynotes, there were three sessions of presentations, each with three seminars from which to choose. The three I attended were just ok, as they all dove too quickly into product pitches. Given the higher-level context of the whole event, I would have liked them all to have spent far longer discussing the market and concepts, with a much briefer pitch toward the end of the presentations. In addition, the presenters seemed too addicted to the word “legacy,” without some of them really knowing what it meant or even, in one case, getting it right.
However, those were minor problems given what Cloudera attempted. For those business people interested in hearing about the growing intersection between data, analytics, and machine learning, go to Cloudera’s site and take some time to check out Cloudera Now.
I’d been thinking of writing a column on p-values, since the claim that data “scientists” can provide valuable predictive analytics is a regular feature of the business intelligence (BI) industry. However, my heavy statistics are years in my past. Luckily, there’s a great Vox article on p-values and how some scientists are openly stating that p < .05 isn’t stringent enough.
It’s a great introduction. Check it out.
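For those who want to see the threshold argument in numbers rather than prose, here’s a small standard-library Python simulation. It’s my own toy illustration, not anything from the Vox article: it runs many experiments where the null hypothesis is true (a fair coin) and counts how often each significance cutoff is still crossed by chance.

```python
import math
import random

def two_sided_p(z):
    # Two-sided p-value for a standard normal test statistic z.
    return math.erfc(abs(z) / math.sqrt(2))

def simulated_null_pvalue(n=100, rng=random):
    # Flip a fair coin n times and z-test H0: P(heads) = 0.5.
    heads = sum(rng.random() < 0.5 for _ in range(n))
    z = (heads - n * 0.5) / math.sqrt(n * 0.25)
    return two_sided_p(z)

random.seed(42)
pvals = [simulated_null_pvalue() for _ in range(10_000)]
frac_05 = sum(p < 0.05 for p in pvals) / len(pvals)
frac_005 = sum(p < 0.005 for p in pvals) / len(pvals)
# Even with no real effect, roughly the nominal share of experiments
# clears each bar: around one in twenty at p < .05, far fewer at p < .005.
```

The point of the stricter p < .005 camp is visible right there: the looser cutoff lets through an order of magnitude more false positives.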
Data Lakes, the renamed ODS, aren’t the only solution for accessing data. Think actual need, understand supporting metadata, then build your data ingestion plan. Read my latest TechTarget column.
I’ve seen a few company webinars recently. As I have serious problems with their marketing, but don’t wish that to imply a problem with the technology, this post will discuss the issues while leaving the companies anonymous.
What matters is letting business decision makers separate the hype from what they really need to look at when investigating products. I’m in marketing and would never deny its importance, but there’s a fine line between good marketing and misrepresentation, and that line is both subjective and fuzzy.
As the title suggests, I’ll discuss the line by describing my views of two buzzwords in business intelligence (BI). The first has been used for years, and I’ve talked about it before: the concept of self-service BI. The second is the fairly new and rapidly increasing use of the word “machine” in marketing.
Self-Service Still Isn’t
As I discussed in more detail in a TechTarget article, BI vendors regularly claim self-service when the software isn’t. While advances in technology and user interface design are rapidly approaching full self-service for business analysts, the term is usually aimed at business users, and for them it’s just not true.
I’ve seen a couple of recent presentations with that message strewn throughout the webinars, but the demonstrations show anything but that capability. Linking data still requires more expertise than the typical business user has. Even worse, some vendors limit things further: the analysts still create basic reports and templates, within which business people can wander with a bit of freedom. Whatever the claims, I don’t consider that to approach self-service.
The result is that some companies provide a limited self-service within the specified data set, a self-service that strongly limits discovery.
That said, the fact that self-service is either misunderstood or overpromised doesn’t change the point that the technology still allows customers to gain far more insight than they could even five years ago. The key is to take the promises with a grain of salt.
When you see it, ignore the phrase “self-service.”
Prospective BI buyers need to focus on whether or not the current state of the art presents enough advantages over existing corporate methodologies to provide proper ROI. That means you should evaluate vendors based on specific improvements over your existing analytics, and the products should be rigorously tested against your own needs and your team’s expertise.
Machine learning, to be discussed shortly, has exploded in usage throughout the software industry. What I recently saw, from one BI vendor, was a fun little marketing ploy to leverage that without lying. That combination is the heart of marketing and, IMO, differs from the nonsense about self-service.
Throughout the webinar, the presenter referred to the platform as “the machine.” Well, true. Babbage’s Analytical Engine was a precursor to our computers, so complex software can reasonably be viewed as a machine. The usage brings to mind the concept of machine learning while never actually claiming the product does it.
That’s the difference: “self-service” states something the products aren’t, while “machine” might vaguely bring machine learning to mind but does not directly claim it. I am both amused and impressed by that usage. Bravo!
Machine Learning and Natural Language Processing
This phrase needs a larger article, one I’m working on, but I would be remiss to not mention it here. The two previous sections do imply how machine learning could solve the self-service problem.
First, what’s machine learning? No, it’s not just complex analytics. Expert systems (ES) are a segment of artificial intelligence focused on machines that can learn new things. Current analytics can use very complex algorithms, but they drive user insight rather than provide their own.
Machine learning is the ability of a program to learn new things, even adding code that changes its algorithms and data as it learns. A question to an expert system has one answer the first time, and a different answer later, as the system learns from the mistakes in its first response.
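To make that “different answer after feedback” idea concrete, here’s a minimal, hypothetical sketch in Python. It’s a toy perceptron-style learner of my own invention, not any vendor’s implementation: the same question gets one answer before feedback and another after the program adjusts itself.

```python
class OnlineLearner:
    """Toy learner whose answer to the same question can change after feedback."""

    def __init__(self, n_features):
        self.weights = [0.0] * n_features

    def answer(self, features):
        # Answer 1 ("yes") if the weighted score is positive, else 0 ("no").
        score = sum(w * x for w, x in zip(self.weights, features))
        return 1 if score > 0 else 0

    def learn(self, features, correct_answer):
        # Adjust the weights only when the current answer was wrong.
        error = correct_answer - self.answer(features)
        if error:
            self.weights = [w + error * x
                            for w, x in zip(self.weights, features)]

learner = OnlineLearner(2)
question = [1.0, 0.5]
first = learner.answer(question)    # initial guess, before any feedback
learner.learn(question, 1)          # feedback: the right answer was 1
second = learner.answer(question)   # same question, new answer
```

The mechanics here are trivial, but the shape is the point: the program’s behavior changes because of what it got wrong, which is exactly what plain analytics, however complex, don’t do.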
Natural language processing (NLP) is more obvious. It’s the evolving understanding of how we speak, type and communicate using language. The advances have meant an improved ability for software to respond to people without their clicking through lots of parameters to set up a search. The goal is to allow people to type or speak queries and for the ES to respond with information at the business level.
The hope I have is that the blend will allow IT to set up systems that can learn the data structures in a company and basic queries that might be asked. That will then allow business users to ask questions in a non-technical manner and receive information in return.
Today, business analysts have to directly set up the dashboards, templates and other tools that business people use, which often requires too much technical knowledge. When a business person has a new idea, it goes back into a slow cycle in which the analyst has to hook in more data, add new templates and more.
When the business analyst can focus on teaching the ES where data is, what data is and the basics of business analysis, the ES can focus on providing a more adaptable and non-technical interface to the business community.
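As a toy illustration of that division of labor, here’s a hypothetical Python sketch. Everything in it (the table names, the measures, the keyword matching) is invented for illustration; real NLP would be far richer, but the shape is the same: IT teaches the system where data lives, and a business user asks in plain language.

```python
# What IT/analysts "teach" the system: where data lives and how to measure it.
# All names here are invented for illustration.
CATALOG = {
    "revenue": ("finance.sales", "SUM(amount)"),
    "headcount": ("hr.employees", "COUNT(*)"),
}

def answer(question):
    """Map a plain-language business question to a query via taught keywords."""
    q = question.lower()
    for keyword, (table, measure) in CATALOG.items():
        if keyword in q:
            return f"SELECT {measure} FROM {table}"
    return "Sorry, I haven't been taught about that yet."

print(answer("What was our revenue last quarter?"))
# A real system would also parse "last quarter" into a date filter and learn
# new phrasings over time; this keyword match only shows the shape of the idea.
```

The business user never sees the catalog; the analyst never sees the question. That separation is the self-service that today’s tools promise but don’t yet deliver.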
Machine learning, i.e. expert systems, and NLP are what will lead to truly self-service business applications. They’re not here yet, but they are on the horizon.
The new books section of my library had a text I almost didn’t check out. Unfortunately, I did. It’s “The Content Trap” by Bharat Anand, and it’s another great example of what academics miss about the real world. The book, judging from the flyleaf and introduction, presents itself as arguing that social networks are important and content isn’t. While the recent presidential election might imply that’s true, the author is supposedly knowledgeable about business and is focused on helping management strategy.
The problem is that I didn’t get twenty pages into the book before Mr. Anand displayed his complete misunderstanding of the business of technology. His chapter three is about “networks” and the first example purports to explain why Apple lost to Microsoft in the 1980s. He provides some semantically nil blather about “direct network effects” and “indirect network effects,” while assiduously avoiding what happened.
There are a number of reasons for Apple’s failure to get a significant market share at that time, among which are:
- Jobs and Wozniak ran a perfectionist organization while Gates and Allen quickly got “good enough” products to the market.
- Microsoft’s founders understood what IBM’s off-the-shelf production meant for rapidly entering a market while Apple wanted complete control of hardware, software and networking.
- Apple went for high-end price and élan rather than the factors that attract a business market quickly looking to move many things off of the mainframe and onto a manager’s desk.
- While Microsoft quickly adapted to larger screens, more functional mice, and other newer technology easier for business users, Apple stuck with the Mac’s small screen, one button mouse, and other limitations for far too long.
While the author talks about “network effects,” he doesn’t seem to show any understanding of the key products that provided them for Microsoft: the elements that became Microsoft Office, in particular the spreadsheet. To talk about networks at a high, completely theoretical level while claiming to give a case study does nothing to display an understanding of the issues involved.
That brings us back to Mr. Anand’s primary, fallacious point. The PC didn’t create a network; mainframe reports already provided one. His page 13 graphic of the hub-and-spoke versus multiply connected network has a simplistic accuracy but again misses the point. In the traditional method, most content was centralized. What changed is not just users talking to each other around central content, as he presents, but each user having his or her own content that needs to be integrated.
The spreadsheet, and so many things since then, allowed individual managers to create their own content and then share it, faster than they previously could do the same. That led to a speed-up of business reactions.
However, it also led to multiple versions of content and the question of “versions of truth” that those of us in business intelligence daily address. We understand the power of networks, but also understand that without content and control over it business will have serious problems.
Content and networks can be seen as two halves of a coin. However, as the Apple example shows, they’re really two faces of a die, with many other factors that also matter. Bharat Anand doesn’t seem to comprehend that, but seems instead to be quickly taking advantage of a market condition to exploit a network without content. It’s clear that, if you’re only interested in making money, networks will help. For example, an impressive academic title might get a lot of libraries to buy your book. However, to be truly of value, there must be content. The Content Trap lacks content. The author has made money and added another line to his CV, but he’s added nothing of value to the ecosystem.
Nobody in business should pay attention to this book.
My latest article on TechTarget is about how business intelligence and the Internet of Things (IoT) overlap.