Author Archives: David Teich

P-values and what they mean for business intelligence and data scientists

I’d been thinking of writing a column on p-values, since the claim that data “scientists” can provide valuable predictive analytics is a regular feature of the business intelligence (BI) industry. However, my heavy statistics work is years in my past. Luckily, there’s a great Vox article on p-values and how some scientists are openly stating that P<.05 isn’t stringent enough.

It’s a great introduction. Check it out.
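For readers who want to see what a threshold such as P<.05 actually measures, here is a minimal sketch in Python. It is my own illustration, not taken from the Vox article: it assumes a simple yes/no outcome with a 50/50 null hypothesis, the function name `two_sided_p_value` is hypothetical, and the stricter .005 cutoff is only an example of a more stringent threshold.

```python
from math import comb


def two_sided_p_value(successes: int, trials: int) -> float:
    """Exact two-sided p-value under a fair 50/50 null hypothesis."""
    expected = trials / 2
    observed_gap = abs(successes - expected)
    # Count every outcome at least as far from 50/50 as the one observed.
    extreme_outcomes = sum(
        comb(trials, k)
        for k in range(trials + 1)
        if abs(k - expected) >= observed_gap
    )
    # Each of the 2**trials sequences is equally likely under the null.
    return extreme_outcomes / 2 ** trials


# Compare a few hypothetical results against two thresholds.
for heads in (55, 61, 65):
    p = two_sided_p_value(heads, 100)
    print(f"{heads}/100: p = {p:.4f}, below .05: {p < 0.05}, below .005: {p < 0.005}")
```

The point of the comparison is that the same result can clear a .05 threshold and still fail a stricter one, so the cutoff chosen changes what counts as a finding.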

BI Buzzwords for business management: Self-service and machine learning briefly explained

I’ve seen a few company webinars recently. I have serious problems with their marketing but don’t wish to imply a problem with their technology, so this post will discuss the issues while leaving the companies anonymous.

What matters is letting business decision makers separate the hype from what they really need to look at when investigating products. I’m in marketing and would never deny its importance, but there’s a fine line between good marketing and misrepresentation, and that line is both subjective and fuzzy.

As the title suggests, I’ll discuss the line by describing my views of two buzzwords in business intelligence (BI). The first has been used for years, and I’ve talked about it before: the concept of self-service BI. The second is the fairly new and rapidly increasing use of the word “machine” in marketing.

Self-Service Still Isn’t

As I discussed in more detail in a TechTarget article, BI vendors regularly claim self-service when the software isn’t. While advances in technology and user interface design are rapidly approaching full self-service for business analysts, the term is usually directed at business users. For them, the claim just isn’t true.

I’ve seen a couple of recent webinars that had that message strewn throughout, but the demonstrations showed anything but that capability. The linking of data still requires more expertise than the typical business user has. Even worse, some vendors limit things further. The analysts still create basic reports and templates, within which business people can wander with a bit of freedom. Though self-service is claimed, I don’t consider that to approach it.

The result is that some companies provide a limited self-service within the specified data set, a self-service that strongly limits discovery.

As mentioned, the fact that self-service is either misunderstood or overpromised doesn’t change the fact that the technology allows customers to gain far more insight than they could even five years ago. The key is to take the promises with a grain of salt.

When you see it, ignore the phrase “self-service.”

Prospective BI buyers need to focus on whether or not the current state of the art presents enough advantages over existing corporate methodologies to provide proper ROI. That means you should evaluate vendors based on specific improvements to your existing analytics, and rigorously test the products against your own needs and your team’s expertise.

Machine

Machine learning, to be discussed shortly, has exploded in usage throughout the software industry. What I recently saw, from one BI vendor, was a fun little marketing ploy to leverage that without lying. That combination is the heart of marketing and, IMO, differs from the nonsense about self-service.

Throughout the webinar, the presenter referred to the platform as “the machine.” Well, true. Babbage’s machines were analytic engines, the precursors to our computers, so complex software can reasonably be viewed as a machine. The usage brings to mind the concept of machine learning while clearly not claiming it.

That’s the difference: “self-service” states something the products aren’t, while “machine” might vaguely bring to mind machine learning but does not directly imply it. I am both amused and impressed by that usage. Bravo!

Machine Learning and Natural Language Processing

This topic needs a larger article, one I’m working on, but I would be remiss to not mention it here. The two previous sections do imply how machine learning could solve the self-service problem.

First, what’s machine learning? No, it’s not complex analytics. Expert systems (ES) are a segment of artificial intelligence focused on machines which can learn new things. Current analytics can use very complex algorithms, but they just drive user insight rather than provide their own.

Machine learning is the ability for the program to learn new things and to even add code that changes algorithms and data as it learns. A question to an expert system has one answer the first time, and a different answer as it learns from the mistakes in the first response.
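To make “one answer the first time, and a different answer as it learns” concrete, here is a minimal sketch in Python. It is an illustration only: I’m using a basic perceptron as a stand-in for a learning system, the class and variable names are hypothetical, and the question is reduced to two made-up numeric features. It only shows the program changing its own data (the weights), not rewriting its code.

```python
class Perceptron:
    """A tiny learner that adjusts its weights when told it was wrong."""

    def __init__(self, n_features: int, learning_rate: float = 1.0):
        self.weights = [0.0] * n_features
        self.bias = 0.0
        self.learning_rate = learning_rate

    def predict(self, features):
        # Answer 1 ("yes") only when the weighted score is positive.
        score = self.bias + sum(w * x for w, x in zip(self.weights, features))
        return 1 if score > 0 else 0

    def learn(self, features, correct_label):
        # Nudge the weights toward the correct answer after a mistake.
        error = correct_label - self.predict(features)
        if error:
            self.bias += self.learning_rate * error
            self.weights = [
                w + self.learning_rate * error * x
                for w, x in zip(self.weights, features)
            ]


model = Perceptron(n_features=2)
question = [1.0, 0.0]  # hypothetical encoding of a business question

first_answer = model.predict(question)   # untrained guess
model.learn(question, correct_label=1)   # feedback: the right answer was 1
second_answer = model.predict(question)  # same question, asked again

print(first_answer, second_answer)
```

The same question gets a different answer the second time because the feedback changed the stored weights, which is the behavior the paragraph above describes.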

Natural Language Processing (NLP) is more obvious. It’s the evolving understanding of how we speak, type and communicate using language. The advances have meant an improved ability for software to respond to people without requiring them to click through lots of parameters to set up a search. The goal is to allow people to type or speak queries and for the ES to then respond with information at the business level.

The hope I have is that the blend will allow IT to set up systems that can learn the data structures in a company and basic queries that might be asked. That will then allow business users to ask questions in a non-technical manner and receive information in return.

Today, business analysts have to set up the dashboards, templates and other tools that are directly used by business, and those tools often require too much technical knowledge. When a business person has a new idea, it has to go back through a slow cycle where the analyst has to hook in more data, add new templates and more.

When the business analyst can focus on teaching the ES where data is, what data is and the basics of business analysis, the ES can focus on providing a more adaptable and non-technical interface to the business community.

Machine learning, i.e. expert systems, and NLP are what will lead to truly self-service business applications. They’re not here yet, but they are on the horizon.

Book Review: The Content Trap

The new books section of my library had a text I almost didn’t check out. Unfortunately, I did. It’s “The Content Trap” by Bharat Anand, and it’s another great example of what academics miss about the real world. The book, from the flyleaf and introduction, presents itself as attempting to say that social networks are important and content isn’t. While the recent presidential election might imply that’s true, the author is supposedly knowledgeable about business and is focused on helping management strategy.

The problem is that I didn’t get twenty pages into the book before Mr. Anand displayed his complete misunderstanding of the business of technology. His chapter three is about “networks” and the first example purports to explain why Apple lost to Microsoft in the 1980s. He provides some semantically nil blather about “direct network effects” and “indirect network effects,” while assiduously avoiding what happened.

There are a number of reasons for Apple’s failure to get a significant market share at that time, among which are:

  • Jobs and Wozniak ran a perfectionist organization while Gates and Allen quickly got “good enough” products to the market.
  • Microsoft’s founders understood what IBM’s off-the-shelf production meant for rapidly entering a market while Apple wanted complete control of hardware, software and networking.
  • Apple went for high-end price and élan rather than the factors that attract a business market quickly looking to move many things off of the mainframe and onto a manager’s desk.
  • While Microsoft quickly adapted to larger screens, more functional mice, and other newer technology easier for business users, Apple stuck with the Mac’s small screen, one-button mouse, and other limitations for far too long.

While the author talks about “network effects,” he doesn’t seem to show any understanding of the key products that provided them for Microsoft: the elements that became Microsoft Office, in particular the spreadsheet. To talk about networks at a high, completely theoretical, level while claiming to give a case study does nothing to display an understanding of the issues involved.

That brings us back to Mr. Anand’s primary, fallacious, point. The PC didn’t create a network. Mainframe reports were already provided to a network. His page 13 graphic about the hub and spoke versus multiple connection network has a simplistic accuracy but again misses the point. In the traditional method, most content was centralized. What changed, and what he misses, is not just users talking to each other around central content, as he presents, but each user having his or her own content that needs to be integrated.

The spreadsheet, and so many things since then, allowed individual managers to create their own content and then share it, faster than they previously could do the same. That led to a speed-up of business reactions.

However, it also led to multiple versions of content and the question of “versions of truth” that those of us in business intelligence address daily. We understand the power of networks, but also understand that without content and control over it, business will have serious problems.

Content and networks can be seen as two sides of a coin. However, as the Apple example shows, they’re really two faces of a die, with many other factors that also matter. Bharat Anand doesn’t seem to comprehend that, but seems instead to be quickly taking advantage of a market condition to abuse a network without content. It’s clear that, if you’re only interested in making money, networks will help. For example, an impressive academic title might get a lot of libraries to buy your book. However, to be truly of value, there must be content. The Content Trap lacks content. The author has made money and added another line to his CV, but he’s added nothing of value to the ecosystem.

Nobody in business should pay attention to this book.

DBTA Webinar: Too many cooks, yet again

Sadly, DBTA is becoming known for taking interesting companies, putting them in a blender and having each lose its message. A recent webinar that included Cask, Attunity and HPE Security, all in a one-hour time slot, again shows the problem. It was a mess.

Cask is a young Hadoop company with an interesting opportunity (Disclosure: As I’m discussing marketing, I need to mention I recently interviewed for a position at Cask). The company is working to put wrappers around Hadoop code to make it easier for IT to use the data platform. One of their products is Cask Hydrator, to help populate the database. That begins to move the message of Hadoop out of the early adopter phase and into a business message, but the presentation was still far too technical.

Attunity then presented and a key point was that they make data ingest easy. If that sounds like a similar message to Cask’s, you’re right. Why the two were together on the webinar when much of what they said sounded like competition wasn’t clear. On the good side, Attunity did a far better job at presenting a business message, both in how the presenter talked about the products and in which case studies were used.

HPE Security made another appearance, tacked onto the end of a presentation. Data security is critical, and HPE has put together a very good message on it, but it didn’t even vaguely fit the tone and arena of the previous presenters.

When Companies Should Share a Stage

The smaller companies seem to have a problem. It’s simple: Their involvement in webinars might be driven by marketing, but it’s being controlled by bean counters. Each of the three companies had something good to say, and each should have taken the time to say it in a stand-alone webinar. However, sharing costs was made to be the primary issue and so the mess ensued.

When should firms share the spotlight? That should happen when the item missing from the top of the presentation is there. The missing piece is having a joint story to tell. None of the case studies mentioned the companies working in partnership. None. When multiple vendors work to provide a complete solution to a client, even if the vendors might sometimes compete, there’s a strong case for multiple companies in a webinar.

This webinar was not that. It was companies not feeling strongly enough about themselves for the other executives to overrule the COOs or CFOs and push a solid webinar about themselves.

All of these companies are worth looking at within the big data arena, just not in such a forced-together setting. Stand on your own or show a joint project.