Tag Archives: teich communications

WhereScape at BBBT: Another Intriguing Product Without a Clear Message

Last Friday’s BBBT presentation was by Michael Whitehead, CEO, WhereScape. The company seems to have a very interesting and useful product, but there’s a huge communications gap that needs to be addressed.

What They Do

One marketing issue up front: I got most of this section from my own experience and WhereScape’s web site, not from Michael’s presentation. When someone presenting to a group of analysts opens by proudly announcing that it’s “guaranteed there’s no corporate marketing in the presentation at all,” there’s a disconnect, and it shows.

WhereScape has two products, Red and 3D, to help build and maintain data structures. The message is focused on data warehouses, but I’ll discuss that more in the next section. One issue was that their demonstration didn’t work, apparently because of a problem connecting their tablet to the BBBT display system, so much of what I’m saying is theory rather than anything demonstrated.

Red is their tool to build data warehouses. Other tools exist and have been around for decades, Informatica being just one competing firm.

3D is where the differentiation comes in. Everyone in IT understands the nightmare of upgrading major software installations such as ERP, CRM and EDW systems. Even migrating from one version of a single vendor’s product to the next can involve months of planning, testing and building, followed by more months of parallel runs to be safe. A better way of analyzing and modifying data structures that compresses that time frame can have a large positive impact on a corporation. That’s what WhereScape is attempting.

What They Say

However, their message is all “Automation! Automation! Automation!” and the short part of the demo that worked showed some automated analysis but also a lot of clicks to accomplish the task. From what I saw, it will definitely speed up the tasks, if it works as advertised, with clear time and money savings, but it’s not as automated as implied, and I think a better message is needed.

In addition, their message is focused on data warehouses while Michael said “We’re in the automation business not the data warehouse business,” which really doesn’t say anything.

Michael did talk for a bit about the bigger data picture that includes data warehouses as part of the full solution, but again there was no clear message. While saying that he doesn’t like the term Data Lake, he’s another who can’t admit that it’s just the ODS. There was also a discussion of the logical data warehouse, also not something new.

One critical point Mr. Whitehead mentioned is something I’ve heard from a few people recently: Hadoop and other “unstructured databases” aren’t really unstructured. They support late binding, the ability to avoid defining a structure a priori, instead getting the data first and then understanding and defining a usable structure for analysis.
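For readers who haven’t worked with late binding, often called schema-on-read, here is a minimal Python sketch under simple assumptions: raw events are kept as JSON text exactly as captured, and a structure is bound only when someone reads them for a particular analysis. The event fields and the `read_with_schema` helper are hypothetical illustrations, not anything WhereScape showed.

```python
import json

# Raw records are stored exactly as they arrived; no schema is declared up front.
RAW_EVENTS = [
    '{"user": "a1", "page": "/home", "ms": "120"}',
    '{"user": "b2", "page": "/cart", "ms": "340", "referrer": "ad"}',
    '{"user": "a1", "page": "/checkout"}',
]

def read_with_schema(raw_lines, schema):
    """Bind a structure at read time: pick and convert only the fields an analysis needs.

    `schema` maps a field name to a conversion function; missing fields become None.
    """
    rows = []
    for line in raw_lines:
        record = json.loads(line)
        row = {}
        for field, convert in schema.items():
            value = record.get(field)
            row[field] = convert(value) if value is not None else None
        rows.append(row)
    return rows

# One analyst's view of the raw data: page load times as integers.
timing_view = read_with_schema(RAW_EVENTS, {"page": str, "ms": int})
```

A second analyst could call `read_with_schema` on the same raw lines with a different field map, such as `{"user": str, "referrer": str}`, without reloading anything. That is the flexibility late binding buys; the structure still has to be defined, just later.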

What They Need to Say

This is the tough one, and not something I’m going to solve in a short column. The company is targeting a sweet spot. The volume of data to be accessed has exploded, and that includes EDWs that aren’t going away, the misnamed concept of Big Data and much more. Many products have been created to build databases to manage that data, but the business intelligence industry is still where packaged back-end systems were in the 1990s: building is easier than maintaining and upgrading. A firm that can help IT manage those tasks in an efficient, affordable and accurate way will do well.

WhereScape seems to be aimed at that. However, their existing two-fold focus on automation and data warehousing is wrong. First, it doesn’t seem all that automated yet and, even if it were, automation is the tool rather than the benefit. They need to focus on the ROI that the automation provides IT. Second, from what was discussed, the application has wider applicability than just EDWs. It can address data management issues for a wider range of business intelligence sources, and the message needs to include that.

Summary

Though the presentation was very disjointed, WhereScape seems to have focused on a clearly relevant and necessary niche in the market: How to better maintain and upgrade the major data sources needed to gain business understanding.

Right now, while there is a marketing staff at the company, WhereScape’s message seems to be coming solely from the co-founder and CEO. That was OK in the very early days, but they now have some good customer stories (Michael led with Tesco’s success in this presentation), and it’s time to leverage a stronger and clearer core message to the market.

The issue seems to be one I’ve repeatedly seen in messaging. The speed of the industry has increased, and business intelligence is, as a whole, crossing Geoffrey Moore’s chasm. That means even younger firms need to transition from a startup, technically focused message to a broader one much more rapidly than vendors needed to in the past.

While WhereScape has what seem to be the strong underpinnings of a successful product, they need to do some serious brainstorming in order to clarify and incorporate a business-oriented message throughout their communications channels – including in presentations by founders.

An ODS by any other name still smells like data

Data warehouse theory originally posited extracting data from systems, performing transformations on them and loading the resulting schemas into the data warehouse. It was a straight flow of information. However, the difference between theory and practice quickly reared its head. Today, people are talking about Data Lakes and Data Swamps. They’re not new, they’re just the ODS updated for modern data.

Data Warehouses and the ODS

Academics don’t have to deal with operational systems. In the 1980s and 1990s, those systems were growing, with ERP, CRM and other systems increasing the complexity and volume of data. Those mission-critical systems, however, weren’t designed for extraction of information. They were primarily running on RDBMS systems with locking schemes that could grind processes and transactional systems to a halt while an extraction program held open large blocks of records as it transformed basic data into star schemas. Something needed to be done.

There was also a secondary effect that was very important to some people. IT, just as with every other department in a large enterprise, isn’t monolithic. The people managing the operational systems knew their systems were mission critical and also knew that, in reality, those systems were big but fragile. They weren’t happy about opening their operational systems to other IT folks who were interested in non-operational things. Those folks answering other business problems? They were viewed as intruders, getting in the way of the “real work.”

For both reasons, intrusions into the operational systems were something to be kept to a minimum. IT organizations began using an Operational Data Store (ODS): quickly open the operational systems, suck all the data out, willy-nilly (yes, I decided to use that term in a tech article…), put it in an isolated system, and let the operational systems go back to prime performance.

It was then the ODS that was the source of the data warehouse ETL process. On a tangent, this is why the people now arguing about ETL v ELT amuse me. It’s been ELETL for decades, if we want to be honest; but who cares? I’d rather have a BLT than spend so many cycles on slightly different acronyms for concepts that ETL handily describes, even in permutations.
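To make the ELETL point concrete, here is a toy Python sketch, assuming an in-memory list stands in for the operational system and plain dictionaries stand in for the ODS and the warehouse. The first extract-and-load is a quick, untransformed copy; the transformation runs later, against the ODS, never touching the source. All table and field names are made up for illustration.

```python
# Hypothetical operational system: order lines as they sit in the source.
SOURCE_ORDERS = [
    {"order_id": 1, "region": "East", "product": "A", "qty": 3, "unit_price": 10.0},
    {"order_id": 2, "region": "West", "product": "A", "qty": 1, "unit_price": 10.0},
    {"order_id": 3, "region": "East", "product": "B", "qty": 2, "unit_price": 25.0},
]

def extract_and_load_ods(source):
    """E + L: copy every record out quickly, untransformed, so the source is released."""
    return [dict(row) for row in source]

def transform(ods_rows):
    """T: aggregate ODS detail into revenue by region, the kind of summary a warehouse holds."""
    revenue = {}
    for row in ods_rows:
        revenue[row["region"]] = revenue.get(row["region"], 0.0) + row["qty"] * row["unit_price"]
    return revenue

def load_warehouse(warehouse, summary):
    """Final L: write the transformed summary into the warehouse table."""
    warehouse.update(summary)
    return warehouse

ods = extract_and_load_ods(SOURCE_ORDERS)          # E, L into the ODS
warehouse = load_warehouse({}, transform(ods))     # E from ODS, T, L into the warehouse
```

Whether you call that ELT, ETL or ELETL, the flow is the same: the operational system is only touched once, by the fast copy.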

The ODS comes into its own

The IT folks who were working to provide reports for mid- and high-level managers were always trying to tweak enterprise software reports, trying to extract nuggets of value. The data warehouse was a step forward and helped build a bigger picture. However, the creation of star schemas and other DW techniques aggregated data and lost a lot of detail. A manager would see an issue and want to backtrack, to drill down into the data to know more.

The ODS became the way to do so. Very quickly, the focus changed from ODS in front of the data warehouse to both working side-by-side. Having all that raw data available gave the business analysts a way of providing much more detail and information to the business user. The first big BI companies, such as Cognos, Business Objects and more, leveraged the two data stores to provide an ability to drill down past the aggregate information into the more detailed data.
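A rough sketch of that drill-down pattern in Python, assuming the warehouse holds only regional totals and the ODS keeps the store-level rows they were built from. The data and function names are hypothetical; real BI tools generate the equivalent queries against both stores.

```python
# Hypothetical ODS detail rows.
ODS_DETAIL = [
    {"region": "East", "store": "E1", "sales": 500},
    {"region": "East", "store": "E2", "sales": 150},
    {"region": "West", "store": "W1", "sales": 400},
]

def aggregate_by_region(rows):
    """The warehouse view: detail rolled up to one number per region."""
    totals = {}
    for row in rows:
        totals[row["region"]] = totals.get(row["region"], 0) + row["sales"]
    return totals

def drill_down(rows, region):
    """Go back to the ODS for the detail behind one aggregate value."""
    return [row for row in rows if row["region"] == region]

summary = aggregate_by_region(ODS_DETAIL)      # what the manager sees first
east_detail = drill_down(ODS_DETAIL, "East")   # what the manager asks for next
```

The aggregate tells the manager that East is up; only the detail shows that one store is carrying the region.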

Having that large volume of data from multiple operational systems also intrigued people who weren’t data warehouse focused. They wanted to sift the raw data for technical or performance trends, things that weren’t of interest to the typical DW designers and users but were important to mid-level management in manufacturing, marketing and other departments. Business analysts supporting those people began to turn more and more to analysis directly on the ODS data.

The ODS comes to the fore – by another name

That was happening in the 1990s, at the same time another key phenomenon was growing: The Web. The growth of the web meant a lot more data about a lot more things. Web sites are operational systems to marketing in just as critical a way as an assembly line is to manufacturing. People became interested in ensuring that what visitors to web sites did was captured and available for analysis. However, as the volume of web traffic grew exponentially, new issues had to be looked at to handle that data.

Columnar databases were one solution, a way to speed up analysis of dimensions of information across individual records. The vastly larger amount of data also helped push emerging MPP technologies and drove creation of Hadoop and other technologies that could manage much larger data sources much faster and more cost efficiently than could individual Unix servers.
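The columnar idea is easy to see in miniature. This Python sketch, using made-up web-visit records, holds the same data as a list of rows and as one list per field, then averages a single field from each. In Python the speed difference is negligible; the point is the access pattern. A columnar database reads only the column it needs from disk and can compress each column well because its values are similar, which is where the real gains come from.

```python
# The same three web-visit records, laid out two ways.
ROWS = [
    {"visitor": "v1", "page": "/home", "seconds": 30},
    {"visitor": "v2", "page": "/cart", "seconds": 45},
    {"visitor": "v3", "page": "/home", "seconds": 15},
]

def to_columns(rows):
    """Pivot row records into one list per field (a columnar layout)."""
    columns = {key: [] for key in rows[0]}
    for row in rows:
        for key, value in row.items():
            columns[key].append(value)
    return columns

COLUMNS = to_columns(ROWS)

def average_row_store(rows, field):
    # Row layout: every whole record is visited to read a single field.
    return sum(row[field] for row in rows) / len(rows)

def average_column_store(columns, field):
    # Columnar layout: only the one list holding that field is read.
    values = columns[field]
    return sum(values) / len(values)
```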

However, the web folks were new to IT and grew up in a different generation than the folks who designed and drove data warehousing. It’s natural to want to take ownership of concepts, especially those on the edge. So the folks working with these new data sources began talking about Big Data as somehow completely different from what came before. If that were the case, they needed to think of some term for the database where they dumped all the data extracted from web sites. Data Lakes became one term. We’ve heard data swamp and other attempts to create unique terms so a company can differentiate itself from others. However, there’s already a name.

The ODS exists. It’s evolved. It’s moved forward. But it’s still the ODS.

Yes, really

“But,” you say, “an ODS is operational information and the data lake is so much more!” Well, not quite. There are two main problems with that argument.

First, times change. When the ODS was coined, the focus was on the back-end systems such as ERP, CRM, accounting and other fairly closed systems. It was before the web, before the ubiquity of mobile devices, before the wall between back-end and customer-facing systems was destroyed.

As mentioned, not only web sites but the internet itself is an operational system for your business – and not just for ecommerce companies. From lead generation to maintenance and training, the internet is a key tool for providing operational support and generating business-critical operational details.

Second, just as ETL can mean a number of things, so can ODS extend past a pure theory while still being relevant. CRM systems are considered operational, but they still contain sentiment and other information in comment fields. Likewise, the vast volume of data from a call center’s voice recording system that is dumped into the ODS has two components. There are basic details about the operation of the call center, such as number of calls, call length and other purely operational details. There are also additional details about customers that can be distilled for strategic purposes, including sentiment analysis. Just because an operational system captures data that can be used for more than purely operational decision making doesn’t change the fact that the extracted information resides in an ODS.

Summary

Information technologists of all generations need to realize that things change but retain context. The ODS isn’t what it was thirty years ago, but the data lake also isn’t some new creation born full-blown from the web. There are few truly revolutionary technologies. You can be a brilliant person and contribute much to technology and business and still not be a revolutionary.

The ability to manage vastly larger amounts of data than we had twenty years ago is critical. There are many innovative things being done. However, I consider the first expert systems, the first MPP algorithms and other similar technologies to be revolutionary. What is being done to let businesses gain insight by combining more and more data from ever more diverse sources is no less valuable to the industry because it is an evolutionary change.

The ODS has evolved. It doesn’t need a new name, just a tad more respect.

TDWI Webinar and Best Practices Report: Real-Time Data, BI and Analytics

TDWI held a webinar this morning to promote their new Best Practices Report on real-time data, BI and analytics. It’s worth a glance.

The report and presentation were team efforts by Philip Russom, David Stodder, and Fern Halper. As usual, the report centered on a survey, and it was a survey of IT people rather than business users. The report relates, “The majority of survey respondents are IT professionals (63%), whereas the others are consultants (20%) and business sponsors or users (17%).” Not much room there for the opinions of the people who need to use BI. Still, for understanding the IT perspective, it’s interesting.

The most valuable point in the presentation came from Dave Stodder, who noted what too many folks ignore: much of the demand for real-time data is constrained by the inability of the major operational systems, such as ERP and CRM, to move from batch to real-time support. While BI firms can prepare for that, the big bottleneck to adoption is whether the vendors of those systems provide, and users adopt, systems that allow real-time extraction in an effective manner.

One issue that the TDWI folks and many others in our industry have is a misconception around the phrase “operational systems.” Enterprise software folks have grown up thinking of operations as synonymous with business operations. That’s not the case. All three of the analysts made that error even while discussing the fact that the internet of things means more devices are becoming data sources.

Those people who provide manufacturing software understand that and have for years. There’s much that can be leveraged from that sector but I don’t hear much mentioned in our arena. Fern Halper did mention IT operations as an area already using basic analytics, but I think the message could be stronger. Network management companies have decades of experience in real time monitoring and analysis of performance issues and that could be leveraged.

Build, buy or borrow are options for software as well as other industries, but I only see people considering building. We should be looking more to other software sectors for inspiration and partnerships.

There was also a strange bifurcation that Dave Stodder and Fern Halper seem to be making between BI and analytics. Analytics are just one facet of BI. I don’t see a split being necessary.

At the end of the presentation, they reviewed their top ten priorities (page 43 of the report). Most are very standard, but I’ll point to the second, “Don’t expect the new stuff to replace the old stuff.” It’s relevant because vendors seem to think that revolutionary trumps evolutionary. It doesn’t. Each step in new forms of BI, such as predictive analytics, extends the ability to help business users make better decisions. It’s layered on top of the rest of the analysis to build a more complete picture; it doesn’t replace it.

Qlik Sense at the BBBT: Setting Up for the Future

Qlik was at the BBBT last week to talk about Qlik Sense. The presenters were Josh Good, Director of Product Marketing, and Donald Farmer, VP of Innovation and Design. It was a good presentation and Qlik Sense seems like the start of a good product, but let me start by discussing a tangent.

A startup’s voice: A marketing tangent

Startups usually have a single voice, the founder, CTO or somebody who is the single and sole owner of the vision. Sometimes it’s somebody who is put forward as the visionary, correctly or incorrectly. It takes a level of maturity in a company to clarify a core message to the level where it’s replicable by a wider variety of people and for the original spokespeople to let go. While the modern BI industry is still fairly young and every analyst group talks about the untapped market, Qlik is one of the biggest players in our nascent business.

Donald Farmer is a great presenter, a smart man and was, until recently, the sole Qlik voice I heard in every presentation. While I don’t always agree with him, he’s a pleasure to hear. Yet I continually thought “why him, always?” There might be somebody else briefly doing a demo, but he was THE voice of Qlik.

It’s not only because of my product marketing experience that I was pleased to hear from Josh. He wasn’t the demo dolly, but led the presentation with Donald chiming in. They worked well together. It’s clear that both of the startup issues I mentioned are being addressed by a maturing Qlik marketing organization that is now using multiple voices well.

Qlik Sense

I’ve blogged about other companies recently, talking about the focus on UI. Thankfully, it’s spreading. Companies that focused, in the early days, on the business analysts are realizing that they need to better address the business knowledge worker. Qlik Sense has a nice, clean interface. It’s nowhere near the overcrowded confusion of most products from a few years back. For those who want to see it, the client software is freely downloadable so you can try it out.

The one issue I have is, again, the same one I’ve mentioned with many other vendors: ETL. Josh was another person who started the demo by importing a spreadsheet. Yes, I know there’s a lot of data in them and all products need to access spreadsheets, but it’s one way of avoiding the ETL issue. Other than very basic, departmental data, more complex decision making always involves other sources. It’s the heterogeneity of data that is today’s big issue. However, that’s a weak spot hidden by just about everyone.

What was nice was the software’s intelligence in building an initial data relationship diagram based on field name relationships. It’s a start, and if they keep at it the feature can grow into something that more easily shows the business user the links between different pieces of information.
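Qlik didn’t describe how their diagram is built, so the following Python sketch is only a guess at the simplest version of the idea: propose a link between two tables whenever they share an identically named field. The table and column names are hypothetical.

```python
from itertools import combinations

# Hypothetical tables a user has loaded, described only by their column names.
TABLES = {
    "orders": ["order_id", "customer_id", "order_date", "amount"],
    "customers": ["customer_id", "name", "region"],
    "shipments": ["shipment_id", "order_id", "ship_date"],
}

def infer_links(tables):
    """Propose a (left_table, right_table, field) link wherever two tables share a field name."""
    links = []
    for (left, left_cols), (right, right_cols) in combinations(tables.items(), 2):
        for field in sorted(set(left_cols) & set(right_cols)):
            links.append((left, right, field))
    return links
```

Even this naive rule shows why the feature needs to grow: it would miss `cust_id` versus `customer_id`, and it would wrongly link any two tables that both happen to have a generic column such as `name`.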

A number of vendors have recently begun having their software examine the data and propose initial visualizations based on data type. It’s an easy way for users to get going. Qlik Sense doesn’t do that, and the response when that was raised was marketing fluff, but the display for choosing chart types is better than most. Rather than a drop-down list of charts, it displays the types as mini-images. That will do for now.
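Here is a sketch of the kind of data-type rules those vendors apply, written in Python. The rules (date against number suggests a line chart, category against number a bar chart, number against number a scatter plot, anything else a table) are my own illustrative assumptions, not any vendor’s published logic.

```python
from datetime import date

def column_kind(values):
    """Classify a column as 'date', 'number' or 'category' from its values."""
    if all(isinstance(v, date) for v in values):
        return "date"
    if all(isinstance(v, (int, float)) and not isinstance(v, bool) for v in values):
        return "number"
    return "category"

def suggest_chart(x_values, y_values):
    """Hypothetical rules of thumb pairing column kinds with a starting chart type."""
    x_kind, y_kind = column_kind(x_values), column_kind(y_values)
    if x_kind == "date" and y_kind == "number":
        return "line"       # trends over time
    if x_kind == "category" and y_kind == "number":
        return "bar"        # comparisons across groups
    if x_kind == "number" and y_kind == "number":
        return "scatter"    # relationships between measures
    return "table"          # no obvious visual default
```

The suggestion is only a starting point; the user still picks the final type, which is where a clear chart-selection display like Qlik’s matters.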

Mobile done well

One fantastic part of the demo was how well they’ve integrated mobile into the system. They were going to show it anyway, but before Josh could get to it there was a problem with his PC. He quickly pulled up his iPad and, using the same account, continued on his way with the same information, well formatted for the new display. A key point is that Qlik isn’t just using mobile devices for display; Josh was creating visualizations on the device.

That other data…

I’ve already mentioned heterogeneity. A number of younger companies, focused on the Cloud, have created clear links to Salesforce and other cloud data sources to easily let SMBs and departments access those data sources. Qlik does not have that capability, nor does it offer easy access to major ERP and CRM systems. That will still take strong interactions with IT to create links and access for the users.

That matters to me, for one example, because of the repeated demo examples from the sales arena. Yes, sales managers remain heavy users of spreadsheets, but SFA systems have made strong inroads and the ability to combine those sources quickly for sales management is critical.

Data Governance: Thinking ahead

One area where Qlik seems to excel is in thinking about the issues of data governance. Even in this early version of Qlik Sense they’ve included some powerful ways of controlling access, from both administrative and business user standpoints. I’ve seen other vendors talk about it, with only some of them willing to show it if questioned. Josh and Donald brought it up as part of their basic presentation and showed a nice interface.

Just as the growth of PCs gave individuals power while hurting data governance, BI needs to get ahold of those issues and help the end user and IT work together to manage corporate data to follow business and legislative policies. Qlik’s focus on that is an important differentiator.

Summary

Qlik Sense is a new product. It has very good visualization, which should be expected from Qlik, and has moved forward to an improved UI for ease of use. While they still have issues of concern with data access, their data governance implementation seems to be ahead of the curve and is well thought out. It’s an early generation product, so it doesn’t bother me that it has some holes. The critical thing is to look at the products in the perspective of your timeframe of needs and see if it’s right for you.

Just as importantly, from my marketing perspective, is the maturation of the marketing message and team. I’m hearing multiple voices speaking the same message. On the product and corporate fronts, Qlik is moving ahead in a good direction.

Sisense at the BBBT: High Performance BI at Low Cost?

The latest presentation at the BBBT was by Amit Bendov, CEO, Sisense. First marketing warning: If you’re going to their web site, be prepared. Maybe it’s only for some weird Halloween thing, but the yellow and black background of the web site is one of the ugliest things I’ve seen from a professional company. However, let’s look under the covers, because it gets better.

The company was founded in 2004 and Amit says the first sales were in 2010. There’s a good reason for that delay. They are yet another young company who talks about being a full stack BI provider, being more than a visualization tool but also supposedly providing ETL, data storage and the full flow for your information supply chain from source systems to display. That technology took a while to develop.

Technology: Better integration of memory and disk

The heart of their system is a patent-pending technology that tightly integrates CPU cache, RAM and disk to better leverage all storage methods for higher performance. The opportunities that technology provides are enough that they’ve received $50 million (USD) in venture funding, $30 million of it in their latest round, earlier this year.

As they are a startup, it’s no surprise that the case studies given were for SMBs or departments within enterprises. That’s the normal pattern, where a smaller group takes advantage of flexibility to try new products to solve focused problems. As their customer list includes companies such as eBay, Wix, ESPN and Merck, companies with lots of data, those early entries increase the potential if Sisense continues to perform.

Another key technology component is their columnar database. They created a proprietary one to be able to support their management technology. That’s completely understandable as their database isn’t purely on disk or memory, but in a combined mix that needs special database management.

The final key to their technology is that they worked to ensure the software runs on commodity chips from the x86 heritage. That means it runs on normal, affordable, off-the-shelf servers, not on high-priced appliances.

The combination of the speed and affordability of the technology is justification for the rounds of funding they’ve received.

Really full stack?

One fuzziness that I’ve mentioned with other full stack vendors is the ETL side of the process. The growth of Cloud companies such as Salesforce, and the accessibility of their APIs, means that you can get a lot of information out of systems aimed at SMB. However, true enterprise ETL means accessing a very wide variety of systems with much less easy or open APIs. When Mr. Bendov talked about multiple systems, it seems, from presentation and demo, that he’s talking about multiple instances of simple databases or open APIs, and not a breadth of source types. There wasn’t a lot of choice in the connection section of his application.

That’s not a problem for companies at Sisense’s state of maturity, as long as there’s a business plan to expand to more enterprise sources. They need to focus on proving the technology in the short term and having more heterogeneous access in their tool bag for the future.

Another issue is the question of what, exactly, their database is. Amit Bendov made a brief comment about not needing a data warehouse, but as I and others quickly brought up, there are two problems with that statement. First, they would seem to be a data warehouse. They’re extracting information from source systems, transforming that information even if not into the old star-schema structures, and providing the aggregate information for analysis. Isn’t that a high-level description of a warehouse? Second, as they’re young and focused on SMBs or departments, like other companies focused on visualization, they might need to look at customer demands and get access to corporate data warehouses as another source.

The old definition of a federated data warehouse seems to be evolving into today’s environment where sometimes an EDW is a source, other times a result and sometimes it’s made up of multiple accessible components such as Sisense and other databases. Younger companies who disparage EDWs need to be careful if they wish to address the enterprise market. The EDW is evolving, not dying off.

User interface and more

One of my first trips to Israel was, in part, when my boss and I had to bring a couple of UI specialists to show Mercury Interactive’s programmers why it might be nice to rethink application interfaces. It’s wonderful what twenty years have wrought. Amit Bendov says that Sisense has one UI specialist for every two programmers, and the user interface shows that. While I mentioned that they need broader ETL access, the simplicity of getting to sources is clear. While you still will need a business analyst to understand some column names, it’s a very easy to use interface.

The same is true in the visualization portions of their application. While it’s still a simpler tool, it has all the basics and is very clear to understand and use.

Paving the way for their spread into the enterprise, the Sisense team also supports single sign-on, basic data access control, both in global administration and in the user interface, and other things that will be needed to convince a larger corporation to spread the technology.

Summary

Sisense looks like a startup in a great position. Their technology is well thought out and seems to be performing very well in the early stages. Affordable, fast, business intelligence is something nobody will turn down.

The challenge is two-fold:

  • Do they have the technology plans to help them address larger enterprise issues?
  • Do they have the mindset to understand the importance not only of marketing, but of shifting that marketing to a more business-oriented focus?

This is the same refrain you’ve heard from me before and which you’ll hear again. This is the Chasm challenge. Their technology has a great start, but their web site and presentation show they aren’t yet thinking bigger and we’ll have to see what the future holds both for the technology and the messaging.

Business intelligence is a very visible market and one growing quickly. While small companies need to focus on the early adopters, they must very rapidly learn how to address the enterprise, both in products and marketing.

High-performance BI at a reasonable cost is a great sell, but Sisense isn’t yet ready for the full enterprise. Sisense has a great start, but life is fluid.

TDWI / Actuate Webinar on Visualization: Not much there

Maybe it’s because of the TDWI conference now going on in San Diego, but this morning’s webinar on “Making Data Beautiful for Business Users” seemed a bit of an afterthought. The presenters were Dave Stodder, TDWI Director of Research, and Allen Bonde, VP Product Marketing and Innovation, Actuate. There were a few interesting moments, but not a lot of even basic content.

Dave Stodder began with a whole bunch of quotes from other people. I admit, it’s a quick way to put together a presentation, but then you should paraphrase and explain why the quotes matter rather than just reading them verbatim – we, the audience, are already doing that.

However, then he got to the three main goals of improving visualization in BI:

  • Improving self-service
  • Shortening the path to insight
  • Advancing business agility

To be honest, those are accurate but also valid for every other point in reporting throughout history. Businesses always want to enable decision makers to help make more accurate and timely decisions through better information.

What followed was one of the keys to TDWI success: An interesting slide based on one of their surveys.

Improved operational efficiency was a clear number one. The problem is that the data most likely came from IT respondents rather than from business users. I asked about that, but the question wasn’t answered. I predict that if you asked business users you’d find the next two items, faster response and identifying new opportunities, would be at the top.

One important point Dave Stodder made was about alert fatigue. It’s tempting to have visualizations and other tactics that alert any time things change, but too many alerts mean people stop paying attention. It reminded me of my days as a sales engineer, back in the days of pagers. Another SE and I had to sit down with one of the salespeople and explain that if he appended 911 to every page, then nothing was important.
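One common way to fight alert fatigue, not something presented in the webinar, is a cooldown: after an alert fires, further breaches of the same threshold are suppressed for a set period. A minimal Python sketch, with hypothetical readings, threshold and cooldown values:

```python
def filter_alerts(readings, threshold, cooldown):
    """Return the times at which an alert would actually be sent.

    readings:  list of (time, value) pairs in time order
    threshold: a value above this counts as a breach
    cooldown:  after an alert, breaches within this many time units are suppressed
    """
    sent = []
    last_sent = None
    for time, value in readings:
        if value <= threshold:
            continue  # no breach, nothing to report
        if last_sent is None or time - last_sent >= cooldown:
            sent.append(time)
            last_sent = time
    return sent

readings = [(0, 5), (1, 12), (2, 13), (3, 14), (4, 6), (10, 15), (11, 16)]
alerts = filter_alerts(readings, threshold=10, cooldown=5)
```

Five breaches produce only two alerts, at times 1 and 10. Tuning the cooldown is a business decision as much as a technical one: too short and you are back to 911 on every page.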

The only part purely focused on visualizations consisted of two slides. One was just a collection of a few visualization types and the other was another TDWI survey about which visualization types are currently being implemented. There wasn’t a discussion of the appropriateness of the ones being used the most, any reason to better focus on some being ignored, or any discussion about how many are provided by packaged BI tools versus home-grown by the supposedly valuable data scientists.

Allen Bonde then took over and didn’t focus on visualization. He gave a rather generic Actuate sales pitch, mentioning platforms built for scale, the importance of an open community and didn’t show any visuals on visualization.

It wasn’t that the presentation was terrible, only that it was far too generic. What was said about visualizations could be said about just about any reporting, and there wasn’t really any direct focus on visualization. It’s one thing to quote Tufte; it’s another to have a discussion about current tools and what’s coming. The latter was missing.

Maybe after the conference we’ll see another webinar with clearer focus.

SQL v Hadoop: The Wrong Conversation

“No SQL!”

“Hadoop doesn’t require you to work in SQL!”

The claims are everywhere, but do they mean anything? To ruin the suspense: No.

There seems to be a big misunderstanding, or a big lack of communication, in the realm of big data. I keep hearing company after company compare Hadoop to SQL, claiming the former is somehow better than the latter. Sadly, that’s comparing apples to screwdrivers.

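Hadoop is a data storage and processing technology. It’s built on a distributed, MPP-style architecture that spreads data and work across clusters of servers, whether on premises or in the Cloud. Hadoop compares to flat files, relational databases and other methods for storing information in structures.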

SQL is a query language. It’s similar to an API in that it’s just a way to communicate with the data source. Long ago, in the dawn of time, SQL was tightly tied to DB2 and the relational environment that spawned the syntax. However, along came the 1980s, Unix servers and PCs, bringing the need to access lots of different data sources and an unwillingness to learn a separate query language for each one.

Along came ODBC to the rescue. It standardized core query syntax on the SQL paradigm and, under the covers, let each ODBC driver developer translate almost-standard queries into the native language of a given data source. It extended SQL’s reach to new kinds of sources.
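ODBC itself is a C-level API, but Python’s DB-API (PEP 249) follows the same idea, so it makes a convenient, runnable illustration: the calling code speaks ordinary SQL through a standard interface, and the driver deals with the specific data source. This is a sketch using the `sqlite3` driver that ships with Python; `run_query` and the `orders` table are made up for the example.

```python
import sqlite3


def run_query(connection, sql, params=()):
    """Run a parameterized SQL query through any PEP 249 (DB-API 2.0) connection.

    The caller writes ordinary SQL; the driver behind `connection` is
    responsible for talking to the specific data source.
    """
    cursor = connection.cursor()
    try:
        cursor.execute(sql, params)
        return cursor.fetchall()
    finally:
        cursor.close()


# sqlite3 is used only because it ships with Python; another DB-API driver's
# connection could be passed to run_query without changing the function.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("East", 120.0), ("West", 80.0), ("East", 30.0)],
)
rows = run_query(
    conn,
    "SELECT region, SUM(amount) FROM orders GROUP BY region ORDER BY region",
)
print(rows)  # [('East', 150.0), ('West', 80.0)]
```

Even here the standard is only "almost standard": DB-API drivers declare different placeholder styles (`sqlite3` uses `?`, many others use `%s`), which is the same kind of gap ODBC drivers paper over.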

In the meantime, as RDBMS vendors looked for ways around the basic limitations of relational databases, they added features such as stored procedures that took SQL even further from its origins in the basic definition and querying of relational structures.

So now we have a mass of coders who have only worked with large, primarily Web-oriented databases built on non-RDBMS technology. No surprise, they had to code their own interfaces and queries, getting into the details of the newer systems. At the same time, they probably brushed through an overview of RDBMS and SQL in school and then never used it again.

That has led to a misunderstanding of the difference between a database and a query language. As a result, the “No SQL” message will slow their progress in integrating their solutions with the existing IT data infrastructure.

There’s a large need for people who can work with Hadoop and other younger data sources. There’s also a vast pool of people who know SQL. Yes, there will always be a need for Hadoop gurus, just as there is for every technology, but the folks wanting to get information out of data sources don’t need to know the internals of those sources; they need to get the information, and they know SQL.

A number of vendors have figured that out and are now offering SQL as a means to access Hadoop. It’s a natural fit, an extension of what the people pushing Hadoop are hoping to achieve. Hadoop and other distributed, non-row based architectures are there to expand knowledge. They’re great ways to better understand the vast body of data coming in from many new sources. However, until you can get that data to the business knowledge worker, it’s not information. SQL is the clearest way to quickly bridge that gap.
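To show why it’s the same question underneath, here’s a sketch that counts page views twice: once with hand-coded map and reduce functions in the style of a MapReduce job, and once as a one-line SQL GROUP BY. Everything runs in one Python process with made-up clickstream data; a real Hadoop job spreads the map and reduce phases across a cluster, and engines such as Hive translate SQL-like queries into that kind of distributed job.

```python
import sqlite3
from collections import defaultdict

# Hypothetical clickstream records: (page, visitor_id)
clicks = [
    ("home", "a"), ("pricing", "b"), ("home", "c"),
    ("docs", "a"), ("home", "b"), ("pricing", "c"),
]


def map_phase(records):
    """Emit a (key, 1) pair for each record, as a MapReduce mapper would."""
    for page, _visitor in records:
        yield page, 1


def reduce_phase(pairs):
    """Group pairs by key and sum the values, as a reducer would."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)


hand_coded = reduce_phase(map_phase(clicks))

# The same question asked declaratively in SQL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE clicks (page TEXT, visitor_id TEXT)")
conn.executemany("INSERT INTO clicks VALUES (?, ?)", clicks)
via_sql = dict(conn.execute("SELECT page, COUNT(*) FROM clicks GROUP BY page"))

print(hand_coded == via_sql)  # True
```

The answer is identical; the difference is who has to write the plumbing.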

The people who realize that it’s not an either/or decision, who understand that Hadoop and SQL not only can but should work together are the people who will drive their companies forward by quickly addressing real business needs.

SQL v Hadoop is the wrong conversation. SQL and Hadoop is the right one.

Webinar: IBM, Actuate and Cirro describe faster analytics

Today’s webinar was hosted by Database Trends and Applications. While there are important things to talk about, I’ll start with an amusing point: the inverse relationship between company size and presenter title, found in every webinar but wonderfully on display here. The three presenters were:

  • Mark Theissen, CEO, Cirro
  • Peter Hoopes, VP/GM, BIRT Analytics Division, Actuate
  • Amit Patel, Program Director, Data Warehouse Solutions Marketing, IBM

The topic was “Accelerating your Analytics for Faster Insights.” That is a lot to cover in less than an hour, made even tighter by a tag team of three people from different companies. I must say I was pleasantly surprised by how well they integrated their messages.

Mark Theissen was up first. There were a lot of fancy names for what Cirro does, but think ETL, as it’s much easier. Mark’s point is that no single repository can handle all enterprise data, even if that made sense. Cirro’s goal is to provide on-demand distributed analytics, using federation to link multiple data sources so businesses can analyze more complete information. It’s a strong point that people have forgotten in the last few years, during the typical “the latest craze will solve everything” focus on Hadoop that has minimized the importance of reaching multiple sources.
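Federation means answering one query across sources that stay where they are. A small, runnable way to see the idea is SQLite’s `ATTACH DATABASE`, which lets one connection join tables living in separate database files. This is only an analogy under stated assumptions: the two files stand in for hypothetical CRM and ERP systems, `build_source` is a made-up helper, and a product like Cirro works across different engines and networks, which this sketch does not attempt.

```python
import os
import sqlite3
import tempfile


def build_source(path, ddl, insert_sql, rows):
    """Create one stand-alone database file; each file plays the role of a separate system."""
    conn = sqlite3.connect(path)
    with conn:  # commits the inserts on success
        conn.execute(ddl)
        conn.executemany(insert_sql, rows)
    conn.close()


workdir = tempfile.mkdtemp()
crm_path = os.path.join(workdir, "crm.db")  # hypothetical CRM source
erp_path = os.path.join(workdir, "erp.db")  # hypothetical ERP source

build_source(crm_path, "CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)",
             "INSERT INTO customers VALUES (?, ?)", [(1, "Acme"), (2, "Globex")])
build_source(erp_path, "CREATE TABLE invoices (customer_id INTEGER, total REAL)",
             "INSERT INTO invoices VALUES (?, ?)", [(1, 500.0), (1, 250.0), (2, 90.0)])

# One connection, one query, two sources: the data is never copied into a single store.
conn = sqlite3.connect(crm_path)
conn.execute("ATTACH DATABASE ? AS erp", (erp_path,))
result = conn.execute(
    """
    SELECT c.name, SUM(i.total)
    FROM customers AS c
    JOIN erp.invoices AS i ON i.customer_id = c.id
    GROUP BY c.name
    ORDER BY c.name
    """
).fetchall()
print(result)  # [('Acme', 750.0), ('Globex', 90.0)]
```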

Peter Hoopes followed to talk about doing the analytics. One phrase he used deserves more discussion: “speed wins.” So many people are focused on the admittedly important area of immediate retail feedback on the web and on mobile devices. There, yes, speed can win. However, not always. Sometimes thought helps too. That’s one reason why complex analysis for high-level business strategy and planning is different from putting an ad on a phone as you walk by a store. There are clear reasons for speed, even in analytics, but it should not be the only focus in a BI decision.

IBM’s Amit Patel then came on to discuss the meat of the matter: DB2 BLU. This is IBM’s foray into in-memory, columnar databases. It’s a critical addition to the product line. The advantages of in-memory processing have created a need for every major player to have an offering, and IBM does the “me too!” well; but how does IBM differentiate itself?

As someone who understands the need to integrate transaction and analytic systems and agrees that both need to co-exist, I was intrigued by what Amit had to say. Transactions go into the normal DB2 environment while being shadowed into the columnar BLU environment to speed analytics. Think about it: transactions can still be managed with the row-oriented technologies best suited for them while the information is, in parallel, moved to the analytics database that happens to be in memory. It seems to be a good way to begin blending the technologies and let each do what it does best.
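A toy Python model of that row-plus-column shadowing idea may help. Each insert lands in a row-oriented list, good for fetching one whole transaction, and is mirrored into per-column arrays, good for scanning one field across many transactions. The `DualStore` class and its fields are hypothetical and say nothing about how DB2 BLU actually implements shadowing.

```python
from dataclasses import dataclass, field


@dataclass
class DualStore:
    """Toy model: each transaction is written to a row store and shadowed into
    a column store. Illustrates the shape of the idea only."""
    rows: list = field(default_factory=list)  # row-oriented: one tuple per transaction
    columns: dict = field(default_factory=lambda: {"account": [], "amount": []})

    def insert(self, account, amount):
        # Row side: keeps a whole record together, good for fetching one transaction.
        self.rows.append((account, amount))
        # Column side: appends each field to its own array, good for scans and aggregates.
        self.columns["account"].append(account)
        self.columns["amount"].append(amount)

    def get_transaction(self, position):
        return self.rows[position]

    def total_amount(self):
        # An analytic query touches only the one column it needs.
        return sum(self.columns["amount"])


store = DualStore()
for account, amount in [("A-1", 100), ("B-7", 250), ("A-1", 40)]:
    store.insert(account, amount)

print(store.get_transaction(1))  # ('B-7', 250)
print(store.total_amount())      # 390
```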

For a slightly techie comment, I did like what Mr. Patel was saying about IBM’s management of memory and CPU. After all, while IBM is one of the largest software vendors in the world, too many folks forget its hardware background. One quick mention in a sentence about “hardware vendors such as Intel and IBM…” was a great touch, a message that can help IBM differentiate its knowledge of MPP from that of pure software companies. As a marketing guy, I smiled big time at how smoothly that was brought up.

Summary

The three presenters did a good job of pointing out that the heterogeneous nature of enterprise data isn’t going away; rather, it’s expanding. Each company, in its own way, put forward how it helps address that complexity. Still, it takes three companies.

As the BI market continues to mature, the companies who manage to combine the enterprise information supply chain components most smoothly will succeed. Right now, there’s a message being presented by three players. Other competitors also partner for ETL, data storage and analytics. It sounds interesting, but the market’s still young. Look for more robust messages from single vendors to evolve.

HP Vertica at the BBBT: Technology v Solution

The latest BBBT presentation was from HP Vertica’s Will Cairns and Steve Sarsfield. I know it’s hard to miss HP’s presence in any market, but for those few of you who have, HP acquired Vertica in early 2011. Vertica is a columnar database focused on large data sources for analytics. Will and Steve were a good tag team, switching back and forth as needed, so unlike in other presentation reviews, I will rarely note who said what.

The installations they mentioned running on HP Vertica range from 1.5 terabytes at the small end up to very large ones such as Facebook, their largest customer. Without a doubt, HP plays at the larger end of the analytics market. They have a strong and powerful database, and HP’s hardware experience and Vertica’s database knowledge seem to have been integrated far better than other HP acquisitions of the previous decade.

The problem I often come back to discuss, whether talking about a startup or a company such as HP, is the issue of technical problems versus business solutions.

Will Cairns did say one thing that many who talk about unstructured data should pay attention to. His very accurate point is that “unstructured data doesn’t stay unstructured long.” We talk about conversations as unstructured, but to get information from them, we must parse the syntax of sentences, look for key words and meaning, and extract the semantics. Those items can then be structured in order to compare, analyze and draw conclusions.
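A deliberately crude sketch of Will’s point: free-text notes become structured records (topic counts per note) that can then be totaled and compared like any other table. The notes, the `TOPICS` keyword mapping and the `structure` function are invented for illustration; real text analytics uses far richer parsing than a keyword lookup.

```python
import re
from collections import Counter

# Hypothetical call-center notes (unstructured text).
notes = [
    "Customer says the billing page is slow and the invoice was wrong.",
    "Invoice arrived late; customer happy with support.",
    "Billing error again, customer wants a refund.",
]

# Hypothetical keyword-to-topic mapping.
TOPICS = {
    "billing": "billing", "invoice": "billing", "refund": "billing",
    "slow": "performance", "late": "delivery", "support": "support",
}


def structure(note_id, text):
    """Turn one free-text note into a structured record of topic counts."""
    words = re.findall(r"[a-z]+", text.lower())
    topics = Counter(TOPICS[w] for w in words if w in TOPICS)
    return {"note_id": note_id, **topics}


records = [structure(i, text) for i, text in enumerate(notes)]

# Once structured, the notes can be totaled and compared.
overall = Counter()
for record in records:
    overall.update({k: v for k, v in record.items() if k != "note_id"})
print(overall.most_common(1))  # [('billing', 5)]
```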

However, the weak spot, in my eyes, is his title. He constantly referred to “supporting data scientists” rather than supporting data science. As programmers who know statistics create more and more packages that can analyze data, what matters is the analytical capability being provided to business people, not the people who call themselves data scientists, who also exist only to serve the end business use.

One interesting techie note about their MPP database is that there isn’t a permanent lead node. While there seems to be no intelligent allocation of nodes beyond basic load balancing, the ability to designate a lead node at run time, based on that balancing rather than in advance, does imply a good ability to manage distributed resources.
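Here’s a minimal sketch of what choosing a lead node per query, based on load, could look like. The `Node` class, the least-busy policy and the name tie-breaker are assumptions for illustration, not a description of Vertica’s actual algorithm.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    active_queries: int


def choose_lead(nodes):
    """Pick the least-loaded node to coordinate the next query.

    Hypothetical policy: no node is permanently the leader; the choice is made
    per query from current load, with the node name as a tie-breaker.
    """
    return min(nodes, key=lambda n: (n.active_queries, n.name))


cluster = [Node("node1", 3), Node("node2", 1), Node("node3", 1)]
lead = choose_lead(cluster)
lead.active_queries += 1  # the chosen node takes on the coordination work
print(lead.name)                  # node2
print(choose_lead(cluster).name)  # node3
```

Because the choice is recomputed for each query, the next query lands on a different node once the first one is busier, which is the point of not fixing a lead node in advance.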

A question I’ve asked a few folks who push columnar databases came up again in this presentation. They were talking about something called projections, which seemed to be ways to index the data for faster access. However, they claimed it’s not indexing but gave no clear explanation.

I then asked the question that always intrigues me. Columnar databases clearly have a great strength in analytics across records, because indexes aren’t needed to scan a column. But both row-based and column-based analyses have value, so getting a clearer picture of how any database supports both seems important. I pointed out that indexes in row-based databases exist to allow faster searches on columns. The question is: what techniques are used to speed up row-based lookups in columnar databases if no indexes exist? They didn’t have an answer.
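For what it’s worth, one common answer in column stores generally (not something the presenters said) is to keep every column in the same sort order, so a row is just the same position across all columns, and to keep small min/max summaries per storage block, often called zone maps, so a lookup can skip blocks that can’t contain the value. Whether such a summary counts as an “index” is partly semantics. The sketch uses made-up data, an assumed sort on customer id and a hypothetical block size of four.

```python
from bisect import bisect_left

# Columns stored separately but position-aligned: row i is (ids[i], cities[i], amounts[i]).
# Assumption: data is kept sorted by customer id.
ids = [101, 104, 107, 110, 113, 116, 119, 122]
cities = ["Oslo", "Lima", "Pune", "Kyiv", "Lagos", "Quito", "Perth", "Porto"]
amounts = [50, 75, 20, 90, 65, 10, 40, 85]

BLOCK = 4  # values per storage block (hypothetical)
# Min/max metadata per block of the sort column: (low, high, start position).
zone_map = [(ids[i], ids[min(i + BLOCK, len(ids)) - 1], i) for i in range(0, len(ids), BLOCK)]


def find_row(customer_id):
    """Return the full row for one id without a separate per-value index."""
    for low, high, start in zone_map:
        if low <= customer_id <= high:  # skip blocks whose range can't match
            block = ids[start:start + BLOCK]
            offset = bisect_left(block, customer_id)  # sorted, so binary search
            if offset < len(block) and block[offset] == customer_id:
                pos = start + offset
                # The same position in every column reassembles the row.
                return ids[pos], cities[pos], amounts[pos]
    return None


print(find_row(113))  # (113, 'Lagos', 65)
print(find_row(105))  # None
```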

One slide that created a great conversation was one on the types of analytics and their definitions. Claudia Imhoff and others questioned the difference between predictive, prescriptive and pre-emptive analytics. While better clarity is definitely needed, the attempt is a great conversation starter for the industry.

HP Vertica - Hindsight to Foresight slide

Summary

HP Vertica seems to be a database that should be evaluated for large-volume analytics. However, they seem to focus on the technology, not on why companies want the technology. There was no real discussion of results, or of partnerships with BI vendors to provide end-user value. I expect that successful sales won’t be purely HP. They are focused on IT and programmers who are building very complex algorithms. They’ll need either a channel or ISV partner to round out the picture for an enterprise that needs to see the full business value chain.

It seems to be a very strong product, but only part of the solution.

TDWI, Claudia Imhoff and SAP: Data Architecture Matters

In a busy week for TDWI webinars, today’s presentation by Claudia Imhoff, Intelligent Solutions, and Lothar Henkes, SAP, was about the continuing discussion of the data warehouse’s place in the data world.

While many younger techies think the latest technology is a panacea, and many older techies stay far too skeptical for far too long, the reality is that the data warehouse isn’t going away; it has to integrate with the newer technologies to keep improving the information provided to business knowledge workers.

One of Claudia’s early slides talked about data sources. While most people are focused on both the standard packaged software and the rush of unstructured data from the Web, call centers, etc., Claudia makes clear something companies are just beginning to realize and address: sensor data is just as important as the rest and is also driving data volumes. Business information continues to come from further afield and from a wider variety of sources, and all of it must be integrated.

Much of her talk, she mentioned, came out of a couple of years of work between herself and Colin White formalizing the changing data architecture environment. Data warehouses are still the place for production reports and analytics, where data provenance and clarity are absolutely necessary, while the techniques used on early-stage data, such as streaming and Hadoop analytics, are more exploratory and investigative. The duo posit that the combination of data integration, data management (including EDWs), data analysis and decision management is the “glue in the middle,” binding sources, deployment and distribution technologies, and reporting and analytics options into a real system that provides value.

The picture they put together is good and Claudia Imhoff’s presentation should be looked at for a better understanding of where we are; but I wouldn’t be me if I didn’t have a couple of issues.

The first is that she is a bit too enamored of mobile technology. It’s here and must be addressed, but statements such as “nobody has a desktop, everything is mobile” must be corrected. A JD Power survey last year showed that only 20% of tablets are used for work. On the other hand, Forrester Research has pointed out that a strong majority of business people now use two devices for their information.

The issue for business intelligence is not that people are switching from desktops (including laptops in docking stations) but that smart providers of information need to build UIs that address the needs of large monitors, tablets and smartphones, addressing each device’s uniqueness while ensuring a similarity of user experience.

The second issue is a new term thrown out during the presentation. It’s “data refinery” and, as Claudia mentioned in her presentation, it’s the same thing others are calling a data swamp, data lake or numerous other terms. There’s an easy term everyone has used for years: Operational Data Store (ODS). I’m a marketing guy and I understand the urge for everyone to try to coin a term that will catch on, but it’s not needed in this case.

While it’s a separate topic (yeah, another concept for a column!), I’ll briefly point out my objections here. Even back in the late 1990s, during my brief sojourn at Informatica, we were talking about how the ODS could be more than just a place to quickly extract information from operational systems, so as not to stress them by running transformations directly against them. It has always also been a place to take an initial look at data before beginning transformations into star schemas and the like. The ODS hasn’t changed. What’s changed are the underlying technologies that support larger data stores and the higher-level analytics that let us better analyze what’s in the ODS.

That brings us to one main point Claudia Imhoff made during her wrap-up, the section on business considerations. She points out that people really need to understand the importance of each data source and the data within it. Just because we can extract everything doesn’t mean we need to save everything. Her example was customer sampling. Yes, you can get all the customer data, but you only need all of it when you narrowcast. For higher-level decision making, those who understand confidence levels know that sampling can reach very high levels of certainty, so sampling can still speed decision making and save costs. Disk space might be less expensive in the Cloud, but it’s not free. We’re in the business of helping companies improve themselves, so we need to look at the bigger picture.
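The arithmetic behind Claudia’s sampling point is standard. For estimating a proportion with simple random sampling, the required sample is n₀ = z²·p(1−p)/E², where z comes from the confidence level, E is the margin of error and p = 0.5 is the most conservative guess; a finite population correction shrinks it for small populations. A sketch in Python, with `sample_size` as a made-up helper name:

```python
import math
from statistics import NormalDist


def sample_size(confidence, margin, p=0.5, population=None):
    """Sample size needed to estimate a proportion.

    Uses the normal-approximation formula n0 = z^2 * p * (1 - p) / margin^2,
    with p = 0.5 as the most conservative guess, and an optional finite
    population correction n = n0 / (1 + (n0 - 1) / N).
    """
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # two-sided critical value
    n0 = z * z * p * (1 - p) / (margin * margin)
    if population is not None:
        n0 = n0 / (1 + (n0 - 1) / population)
    return math.ceil(n0)


print(sample_size(0.95, 0.03))                     # 1068
print(sample_size(0.95, 0.03, population=10_000))  # 965
```

At 95% confidence and a ±3 point margin, the answer is 1,068 before any correction, and the finite population correction barely moves it for large customer bases; it only drops noticeably for small populations, such as the 10,000 in the example. That assumes a genuinely random sample, which is its own discipline.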

Her presentation was clearly strategic: We need to rethink, not reinvent, data modeling. Traditional techniques aren’t going away and neither are many of the new ones. Data management people need to understand how they combine.

No surprise, that was a great transition to Lothar Henkes’ presentation. His key point is that SAP BW can now run on SAP HANA. It’s important, even if all the capital letters look like shouting. HANA is SAP’s in-memory, columnar database and their entry into the Cloud market for managing the high volumes of modern data. It’s a move to bridge the gap between the ODS and relational database arenas with one underlying infrastructure.

In such a brief webinar, it’s hard to see more than the theory, but it’s a clear move by SAP to do what Claudia Imhoff suggested, to take a fresh look at data models in order to understand how to better support the full range of data now being incorporated into business decision making.