Author Archives: David Teich

Actian at the BBBT: Hadoop Big Data for the Enterprise Mass Market?

In the mid-90s, Sybase rolled out its new database. It was a great leap forward in performance and they pushed it like crazy. Sybase’s claims were justified, but it was a new way to look at databases and Sybase loudly announced how different it was from what people were used to using. Oops. They sold almost none of it, hit a financial wall and never quite recovered.

That came to mind during yesterday’s BBBT presentation by Actian. Their technology foundation goes back to Ingres and that means they’ve been in the database market a long time. The question is whether or not they’ve learned from past case studies.

The presenters were John Santaferraro, VP of Solution and Product Marketing, and Emma McGrattan, SVP Engineering. They gave a great technical overview of Actian’s offerings. Put simply, they’re providing a platform for Big Data access. At the core is Hadoop, but they’ve taken their deep understanding of RDBMS technology and incorporated SQL access. That clearly opens up two things:

  • Better access to partners for ETL and analytics
  • The ability for the mass of business analysts to get at Hadoop data to more easily perform their jobs.

That’s a great thing and I’ll discuss later whether they’re taking that technology to the right markets. Before that, however, I should point out the main competitive point they repeatedly hit on. TPC benchmarks are public, so they went out and compared themselves to who they consider, rightly, to be their main competition: Cloudera Impala. Their results are seen in the chart below.

Actian’s TPC-DS comparison with Cloudera Impala

They returned to this time and time again. On the other hand, they discussed the full platform intelligently but only briefly.

They also covered more of the technology, and there’s a lot of it. As a Computer Associates company, they grow by acquisition. It’s not just a renamed Ingres; it has acquired VectorWise, Versant, Pervasive and ParAccel. Many companies have had trouble acquiring and integrating firms, but the initial descriptions seem to show a consolidated platform.

One caveat: We had no demo. The explanation was that the Hadoop Summit demo went so well that they’re in the middle of moving it to a new server and IT didn’t give a heads up. Believable, and I personally am not too worried. As a former field guy, I know how little emphasis to put into a short demo.

So what did I think was the key technology, if not performance? That’s next.

Hadoop meets SQL

To folks focused on the largest data sets, and to others who, as with car ownership, like speed for the pure sake of it, the performance is impressive. To me, that’s not the key. Rather, it’s the ability to bridge the Hadoop-SQL divide. As John Santaferraro pointed out, orders of magnitude more business analysts and business users know SQL than know MapReduce and the related underpinnings of Hadoop.
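To make the SQL versus MapReduce gap concrete, here is a minimal sketch of the same per-region total written both ways. It is only an illustration: Python's built-in `sqlite3` stands in for a SQL-on-Hadoop engine, the map, shuffle and reduce steps are plain Python rather than a real Hadoop job, and the `SALES` data and function names are hypothetical.

```python
import sqlite3
from collections import defaultdict

# Hypothetical sample data: (region, sale amount). Not from the presentation.
SALES = [("east", 100), ("west", 250), ("east", 50), ("west", 25), ("north", 75)]


def totals_with_sql(rows):
    """Total sales per region using one declarative SQL statement."""
    conn = sqlite3.connect(":memory:")  # in-memory stand-in for a SQL engine
    try:
        conn.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
        conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
        cursor = conn.execute(
            "SELECT region, SUM(amount) FROM sales GROUP BY region"
        )
        return dict(cursor.fetchall())
    finally:
        conn.close()


def totals_with_map_reduce(rows):
    """The same totals, spelled out as explicit map, shuffle and reduce steps."""
    # Map: emit one (key, value) pair per input record.
    mapped = [(region, amount) for region, amount in rows]
    # Shuffle: group all values that share a key.
    grouped = defaultdict(list)
    for key, value in mapped:
        grouped[key].append(value)
    # Reduce: collapse each group of values to a single total.
    return {key: sum(values) for key, values in grouped.items()}


if __name__ == "__main__":
    print(totals_with_sql(SALES))
    print(totals_with_map_reduce(SALES))
```

Both functions return the same dictionary. The difference is that the first is one statement an analyst already knows how to write, while the second makes the author spell out the processing steps.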

Actian Hadoop platform for big data

While other Big Data companies have been building bridges to ETL, data cleansing, analytics and other tools in the ecosystem, custom work to do that is time consuming. Opening the ability to use standard, existing SQL tools means you can more quickly build a stronger ecosystem.

Why does that matter?

What is the market?

During the presentation, the Actian team was asked about their sweet spot. Is it folks already playing with Hadoop who want better access to enterprise data, or is it companies who’ve heard about Hadoop but haven’t stepped in yet to try because of all the questions? Their answer was the first group. I think that’s wrong; however, I understand why they are

Another statement from John was that they are in Silicon Valley and everyone there thinks everyone uses Hadoop because everyone there does. He admitted that’s not true outside of that small region. However, sometimes it’s hard to fight the difference between what you intellectually know and what you’re used to. I’ve seen it in multiple companies, and I think it’s happening here.

The mass of global businesses hasn’t yet touched Hadoop. It’s very different from what the typically overburdened and underfunded IT organization does, and that much change is scary. Silicon Valley is full of early adopters; it attracts them. In addition, there are plenty of early adopters out there for the picking. However, there are now a lot of vendors in the BI and big data spaces and we’re getting close to a tipping point. The company that figures out how to cross the chasm first is the one that will make it big.

It’s not pure performance that will attract the mass market, it’s how to get the advantages of big data in the most affordable way with the easiest transition path. It’s the ability to quickly leverage existing IT infrastructure and to join it with the newest technology.

Once again, it’s evolution rather than revolution that will win the day.

Summary

From what I saw of the platform, it’s a great start. The issue I see is the focus on the wrong market. The technology will always be important, but critical as it is, it only exists to solve business problems. Actian seems to have a good handle on the technology and is on a path to integrate and leverage all the acquisitions into a solid platform, but will they be able to explain why that matters to the right market?

There is hope for that. One thing discussed is that their ability to bridge SQL and Hadoop means they are working on building partnerships with major vendors to extend their ecosystem. If they focus on that, they have a great chance of being very successful and being the company that brings Hadoop to the wider IT market.

Twitter: @actiancorp, @santaferraro & @emmakmcgrattan

Salient at the BBBT: The Thin, Blue Suited Line

The problems with the USA’s Veterans Administration are in the news. Many of the scheduling issues have to do with large volumes of modern data being run through decades-old systems built by systems integrators (SIs). Custom-built systems can be the choice during the early stages of a software solution category’s life cycle. However, they are very difficult to upgrade and modernize.

Last Friday’s BBBT presentation was by Salient Management Company. Salient is a consulting group with a product, and all the potential hazards of SIs came to mind. The presenters, David F. Giannetto and Jim McDermott, are both in professional services. The obvious question was how much of their solution is customized versus how much is truly a software solution that provides the ability to upgrade and adapt as needs change.

The Software-Services Balance

Modern business software is complex. Every software firm must have professional services, either internally or through partners, to help with implementation. Many software founders think their software is so wonderful that they don’t need serious professional services. Many professional services companies think every client is so different that software must always be heavily customized. How do technology executives balance the different issues of ISV software development and the need for consulting? More importantly, for this article, how does Salient management look at that?

From the cursory experience of a three-hour presentation, the balance is a strength of Salient. One slide, in particular, pointed to a logical split. They point out that the business user is not the person who has to understand the technology. That’s something everyone agrees is true, but not many companies seem to understand how to address it.

David Giannetto and Jim McDermott presented a company that claims to focus consulting on understanding the client’s business model, helping to ensure that the implementation does address business needs, while demonstrating a product that looks like a standard interface.

While that was the focus of the talk, the product demo implied other consulting. They did not cover the complexity of ETL, even when questioned, so I’m also assuming there are significant technical professional services needed to link to data sources. That assumption is backed up by the fact that Salient uses a proprietary database that wasn’t discussed in detail.

One critical point about their technology insight is that Salient began with an in-memory architecture thirty years ago. It was a risky choice, as most companies thought that the price/performance benefits of disk would grow far more than those of RAM. The drop in RAM prices and the growth of parallel computing software are providing strong backing for their initial gamble. They have a clear focus on technology and products.

Their offering seems to be a sandwich of services to understand the business and implement the data acquisition on the outside with robust software for the BI users at the heart. I can see a continued strength in the business consulting, but the robustness of some newer vendors as far as simplifying the back end, and thereby lowering those costs and shortening implementation times, is a potential risk to the Salient model.

The Interface

Jim’s demo showed both dashboards that allow management to slice & dice basics themselves and a designing interface with more capabilities for power users and analysts. While they claim that the software changes the role of IT from control to governance, I still didn’t see anything that allows the end users to integrate new sources. IT is still required for more than just governance.

There was also a very good example of how geospatial information is being integrated into the analysis to better understand demographics and logistics. In the CPG market, that can provide a crucial advantage.

One key point that some competitors might knock is that most of the charts and graphs aren’t as fancy as in some BI tools. However, my response is “so what?” First, they accomplish the same things. Second, focusing on how fancy graphs look sometimes creates overly complex displays that can slow understanding. When we’re dealing with executives who have worked with Microsoft Excel or with Crystal Reports for decades, a way of seeing new analytics clearly and simply, almost in the style they’ve already seen, can help adoption. The focus is on understanding the information and I thought the simpler graphics had the benefit of a comfort level for managers.

Summary

Overall, I was impressed by Salient. The combination of strong business consulting, a good BI interface and a history in in-memory data management means that they’re well positioned to address Global 1000 firms. Any large organization should evaluate the Salient offering for the combination of product and services.

The risk I see is that while the service/software balance is right for large companies, I don’t see them getting into the SMB market anytime soon. That might not concern them in the short term, but one word: Salesforce.com. There are new vendors coming up which are much easier to implement. If they can grab a large chunk of the SMB market, they can then move up the food chain to challenge the large companies, as Salesforce has done in their markets.

I see Salient growing; I’m just not sure if they’ll be able to grow as fast as the market is growing. Depending on their plans, that could be good or bad.

The Myth of the Data Scientist

I’ve been waiting for notification and searching on toolbox.com’s site itself. Silly me, thinking their search would work. Now that I’ve moved across the country and am getting settled in, after a few weeks of “fun,” I did a Yahoo search. Found it!

Before Ziff-Davis decided they didn’t want more IT Management track articles from a writing house I worked with, they bought two of my articles. The first is The Myth of the Data Scientist. I’ve muttered about it before and will again, but that’s a longer article. Enjoy.

TDWI and HP Webinar: Modernizing the Data Warehouse

After a couple of mediocre webinars, it was nice to see TDWI get back on track. This week’s seminar was sponsored by HP Vertica and discussed Data Warehousing Modernization. The speakers were Philip Russom, from TDWI, and Steve Sarsfield, Product Marketing Manager, HP Vertica.

Philip led with the five key reasons organizations need to modernize Enterprise Data Warehouses (EDWs):

  • Analytics
  • Scale
  • Speed
  • Productivity
  • Cost Control

He pointed out that TDWI research shows the first three to be far more of a key focus for companies than the others. One key point was that cost control should have more of an impact than it does. Mr. Russom pointed out that even if your EDW performs properly today, much of the new technology is based on open source and less expensive servers, so a rethink of your warehouse can bring clear ROI. As he put it, “Modernization is a great opportunity to rethink economics.”

Another major point was the simple fact, overlooked by many zealots, that EDWs aren’t going anywhere. Sure, there are newer technologies that allow for analytics straight from operational data stores (ODSs) and other places, but there will always be a place for the higher latency accumulation of information that is the EDW.

After that setup, Steve Sarsfield gave the expected sponsor pitch for how HP Vertica helps companies modernize. It’s also good to say that his presentation was better than most. It walked the right line, avoiding the overly-salesy and too technical extremes of many sponsor pitches.

Sarsfield’s main point is that Hadoop is great for ODSs but implementations still haven’t gotten up to speed in joins and other data manipulation capabilities seen in the mature SQL environment. He described HP Vertica as having the following key components:

 TDWI HP Vertica Secret Sauce

I think the only one that needs explanation is the last, Projections. If not, please let me know and I’ll expand on the others. Projections are, simply put, the HP method for replacing indices. Columnar databases don’t provide the index structures that standard row-based RDBMSs provide.

It was a good overview that should bring HP into the mix for anyone looking to modernize their EDW environment.

The final point that came up during Q&A was about Big Data. It’s one many folks have made, but we know how much you listen to analysts pontificating…

Philip Russom pointed out, as many have, that Big Data isn’t about the size of the data but about managing the complexity of modern data. He made that point while pitching the most recent TDWI Best Practices Report, Evolving Data Warehouse Architectures in the Age of Big Data. What Philip pointed out was that the judges regularly came back with clear opinions that complexity was more important than database size. Very large databases where people were just doing aggregations of columns weren’t interesting. It was the ability to link to multiple sources and provide advanced insight through analytics that the judges felt most reflected the power in the concept of Big Data.

All told, it was a smooth and informative presentation that hopefully helped its IT audience understand a bit more about the issues involved in modern data warehousing. It was time well spent.

GoodData at the BBBT

Today’s BBBT presentation was by GoodData and I’m still waiting. Vendor after vendor tells us that they’re very special because they’re unique when compared to “traditional BI.” They don’t seem to get that the simple response is “so what?” Traditional BI was created decades ago, when offering software in the Cloud was not reasonable. Now it is. Every young vendor has a Cloud presence and I can’t imagine there’s a “traditional” company that isn’t moving to a Cloud offering. BI is not the Cloud. I want to hear why they have a business model that differentiates them from today’s competitors, not from the ones in the 1990s. I’m still waiting.

Almost all the benefits mentioned were not about their platform; they weren’t even about BI. What was mentioned were the benefits that any application gets by moving to the Cloud. All the scalability, shareability, upgradability and other Cloud benefits do not a unique buying proposition make. Where they will matter is if GoodData implemented those techniques faster and better in the BI space than the many competitors who exist.

Serial founder Roman Stanek wants his company to provide a strong platform for BI based on Open Source technology. The presentation, however, didn’t make clear if he really had that. He had the typical NASCAR slide, but only under NDA, with only a single company mentioned as an open reference. His technological vision seems to be good, but it’s too early to say whether or not the major investments he has received will pay off.

What I question is his business model. He and his VP of Marketing, Jeff Morris, mentioned that 2/3 of their revenue comes from OEM agreements, embedding their platform into other applications. However, his focus seems to be on trying to grow the other third, the direct sales to the Fortune 2000. I’m not sure that makes sense.

Another business model issue is that the presenters were convinced that the Cloud means they can provide a single version of product to all customers. They correctly described the headaches of managing multiple versions of on-premises software (even if they avoided saying “on-premise” only a third of the time). However, the reason that exists is that people don’t want to switch from comfortable versions at the speed of the vendor. While the Cloud does allow security and other background fixes to easily update to all customers, any reasonable company will have to provide some form of versioning to allow customers a range of time to convert to major upgrades.

A couple of weeks ago, 1010data went the other direction, clearly admitting that customers prefer that. I didn’t mention that in my blog post on that presentation, even though I thought they went too far in the other direction of too many versions, but combined with GoodData’s thinking there should only be one, now’s as good a time as any to mention that. Good Cloud practices will help minimize the number of versions that need to be active for your customers, but it’s not reasonable to think that will mean a single version.

At the beginning of the presentation, Roman mentioned a company, as a negative reference: Crystal Reports. At this point, I don’t think that comparison is at all negative. Nothing that GoodData showed us led me to believe that they can really get access to the massively heterogeneous data sources in true enterprise business. He also showed nothing that indicates an ability to provide top level analysis and display as required in that market. However, providing OEM partners a quick and easy way to add basic BI functions to their products seems to be a great way to build market share and bring in revenue. While Crystal Reports seems archaic, it was the right product with the right business plan at the right time, and the product became the de facto standard for many years.

The presentation left me wondering. There seems to be a sharp team but there wasn’t enough information to see if vision and product have gelled to create a company that will succeed. The company’s been around since 2008 and just officially released the product, yet has a number of very interesting customers. That can’t be based just on the strong reputation of Mr. Stanek; there has to be meat there. How much, though, is open to question based on this presentation. If you’re considering an operational data store in the Cloud, talk with them. If you want more, get them to talk to you more than they talked to us.

Cloudera at the BBBT: The limits of Open Source as a business model

Way back, in the dawn of time, there were AT&T and BSD, with multiple flavors of each base type of Unix. A few years later, there were only Sun, IBM and HP. In a later era, there was this thing called Linux. Lots of folks took the core version, but then there were only Red Hat and a few others.

What lessons can the Hadoop market learn from that? Mission critical software does not run on freeware. While open source lowers infrastructure costs and can, in some ways, speed feature enhancements, companies are willing to pay for knowledge, stability and support. Vendors able to wrap the core of open source up in services to provide the rest make money and speed the adoption of open-source based solutions. Mission critical applications run on services agreements.

It’s important to understand that distinction when discussing such interesting companies as Cloudera, whose team presented at last Friday’s BBBT session. The company recently received a well-publicized, enormous investment based on the promise that it can create a revenue stream for a database service based on Hadoop.

The team had a good presentation, with Alan Saldich, VP Marketing, pointing out that large, distributed processing databases are providing a change from “bringing data to compute” to “bringing compute to data.” He further defined the Enterprise Data Hub (EDH) as the data repository that is created in such an environment.

Plenty of others can blog in detail about what we heard about the technology, but I’ll give it only a high level glance. The Cloudera presenters were very open about their product being an early generation and they laid out a vision that seemed to be good. They understand their advantages are the benefits of Cloud and Hadoop (discussed a little more below) but that the Open Source community is lagging in areas such as access and control to data. It’s providing such key needs to IT that will help their adoption and provide a revenue stream, and their knowing that is a good sign.

I want to spend more time addressing the business and marketing models. Cloudera does seem to be struggling to figure out how to make money, hence the need for such a large investment from Intel. Additional proof is the internal confusion of Alan saying they don’t report revenues and then showing us only bookings, while Charles Zedlewski, VP Products, had a slide claiming they’re leading their market in revenue. Really? Then show us.

They do have one advantage: the Cloud model lends itself to a pricing model based on nodes and, as Charles pointed out, that’s a “business model that’s inherently deflationary” for the customer. Nodes get more powerful so the customers regularly get more bang for the buck.
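The arithmetic behind "inherently deflationary" is simple enough to sketch. Every number here is hypothetical (the node price, starting capacity and growth rate are assumptions chosen for illustration, not Cloudera figures): if the price per node holds steady while each node's capacity grows, the cost per unit of capacity drops every year.

```python
def cost_per_unit(price_per_node, capacity_per_node):
    """Price paid for one unit of capacity on a single node."""
    return price_per_node / capacity_per_node


def cost_trend(price_per_node, initial_capacity, annual_growth, years):
    """Cost per unit of capacity for year 0 through `years`.

    Assumes a flat price per node and node capacity that compounds
    by `annual_growth` each year (both are illustrative assumptions).
    """
    return [
        cost_per_unit(price_per_node, initial_capacity * (1 + annual_growth) ** year)
        for year in range(years + 1)
    ]


if __name__ == "__main__":
    # Hypothetical: $1,000 per node, 10 units of capacity, 25% growth per year.
    for year, cost in enumerate(cost_trend(1000, 10, 0.25, 4)):
        print(f"year {year}: ${cost:.2f} per unit of capacity")
```

The customer's bill per node stays flat in this sketch. What falls is the effective price of the work each node can do.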

On the other side, I don’t know that management understands that they’re just providing a new technology, not a new data philosophy. While some parts of the presentation made clear that Cloudera doesn’t replace other data repositories except for the operational data store, different parts implied it would subsume others without giving a clear picture of how.

A very good point was the partnerships they’re making with BI vendors to help speed integration and access of their solution into the BI ecosystem.

One other confusion is that Cloudera, and the market as a whole, doesn’t seem to be clearly differentiating that the benefits of Hadoop come from multiple technologies: both the software that helps better manage unstructured data and the simple hardware/OS combination that comes from massively parallel processing, whether the servers are in the Cloud or inside a corporate firewall. Much of what was said about Hadoop had to do with the second issue, and so the presenters rightfully got pushback from analysts who saw that RDBMS technologies can benefit from those same things, which minimizes that as a differentiator.

Charles did cover an important area of both market need and Cloudera vision: Operational analytics. The ability to quickly massage and understand massive amounts of operational information to better understand processes is something that will be enhanced by the vendor’s ability to manage large datasets. The fact that they understand the importance of those analytics is a good sign for corporate vision and planning.

Open source is important, but it’s often overblown by those new to the industry or within the Open Source community. Enterprise IT knows better, as it has proved in the past. Cloudera is in the right place at the right time, with a great early product and an understanding of many of the issues that need to be addressed in the short term. The questions are only about the ability to execute on both the messaging and programming sides. Will their products meet the long term needs of business critical applications and will they be able to explain clearly how they can do so? If they can answer correctly, the company will join the names mentioned at the start.

Birst and Blue Hill Research webinar misses the message

There’s only one time to break the marketing rule to never mention your competition: When you’re behind. Avis mentioned Hertz, not the other way around. Of course, its “We’re Number 2, We Try Harder” campaign ran for years and never significantly increased market share, but it was still a reasonable attempt. If someone in the market automatically thinks about a bigger vendor, you want them to at least consider you.

That must have been the reasoning behind today’s Birst webinar comparing itself and Tableau. The problem was the presentation didn’t fit the title.

The first part of the presentation was by James Haight of Blue Hill Research. He discussed, at a very high level, a report issued and available on Birst’s site. The critical take-away was that slight differences between visualization capabilities are not on the critical decision path. What matters is the back end access and integration of multiple sources that drive the need for BI in the first place. That’s clearly important to enterprises considering modern business intelligence systems.

The problem with the webinar is that the Birst section didn’t support that message or show why the firm’s solution was any different than Tableau’s. The first part of the Birst segment was by a guy who needed to slow down and take a few more breaths. He talked the theory of why Birst designed their product with an emphasis on the backend but the slides really didn’t support it.

As for “disentangling discovery and BI,” the only thing that was said was that discovery is a subset of BI, something that many analysts (including yours truly) have said and which keeps the two tangled.

The demo? Given the key point from the Blue Hill Research paper and the mild setup, I would have expected to see a focus on how heterogeneous sources are meshed. Instead, I saw the typical fly-by of how swoopy-doopy the visualization is. At the very end, there was a quick shot and less than a minute on what seems to be a wizard to start the data integration process.

I’ve blogged before about Birst, and they seem good. However, as with many small companies, they don’t seem to be able to create a consistent and coherent message. If the intent was just about getting people to think Birst when they think Tableau, it should have been a higher level presentation. Given the title and lead-in, it should have been a deeper dive. Instead, it picked a middle ground that really didn’t clarify anything.

I’m not sure the presentation did anything to help make Birst stick in the minds of people considering Tableau.

VisualCue at BBBT: A New Paradigm for Operational Intelligence

The latest presentation to the BBBT was by Kerry Gilger, President and Founder of VisualCue™ Technologies. While I find most of the presentations interesting, this was a real eye-opener.

Let’s start with a definition of operational intelligence (OI): Tools and procedures to better understand ongoing business operations. It is a subset of BI focused on ongoing operations in manufacturing, call centers, logistics and other physical operations where the goal is not just to understand the high level success of processes but to better understand, track and correct individual instantiations of the process.

A spreadsheet with a row of data for each instantiation is a cumbersome way to quickly scan for the status of individual issues. The following image is an example of VisualCue’s solution: A mosaic of tiles that show different KPIs of the call center process, with a tile per operator, color coded for quick understanding of the KPIs.

VisualCue call center mosaic

 The KPIs include items such as call times, number of calls and sales. The team understands each element of the tile and a review shows the status of each operator. Management can quickly drill down into a tile to see specifics and take corrective actions.
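The color-coded tile idea can be sketched in a few lines. This is not VisualCue's implementation. The `Kpi` class, the green/yellow/red thresholds, the 10% warning margin and the sample numbers are all hypothetical, and targets are assumed to be positive.

```python
from dataclasses import dataclass


@dataclass
class Kpi:
    """One measured value for an operator (hypothetical structure)."""
    name: str
    value: float
    target: float                  # assumed to be positive
    higher_is_better: bool = True  # False for metrics such as call time


def kpi_color(kpi, warn_margin=0.10):
    """Map a KPI to a color: green at or past target, yellow if close, else red."""
    if kpi.target <= 0:
        raise ValueError("target must be positive")
    if kpi.higher_is_better:
        ratio = kpi.value / kpi.target
    elif kpi.value <= 0:
        ratio = float("inf")  # nothing measured against a lower-is-better goal
    else:
        ratio = kpi.target / kpi.value
    if ratio >= 1.0:
        return "green"
    if ratio >= 1.0 - warn_margin:
        return "yellow"
    return "red"


def build_tile(operator, kpis):
    """One tile per operator: the operator's name plus a color per KPI."""
    return {"operator": operator, "colors": {k.name: kpi_color(k) for k in kpis}}


if __name__ == "__main__":
    tile = build_tile("operator-17", [
        Kpi("sales", value=95, target=100),
        Kpi("calls", value=42, target=40),
        Kpi("call_time_sec", value=400, target=300, higher_is_better=False),
    ])
    print(tile)
```

A mosaic is then just a grid of these tiles, one per operator, which is what lets a manager scan for red before drilling into the details.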

The mosaic is a quick way to review all the instantiations of a given process, a new and exciting visualization method in the industry. However, they are a startup and there are issues to watch as they grow.

They have worked closely with each customer to create tiles that meet needs. They are working to make it easier to replicate industry knowledge to help new customers start faster and less expensively.

The product has also moved from full on-site code to a SaaS model to provide shared infrastructure, knowledge and more in the Cloud.

VisualCue understands operational intelligence is part of the BI space, and has begun to work with standard BI vendors to provide integration with other elements that make up a robust dashboard, including the mosaic and other informational elements. That work is rightfully in its infancy given the company’s evolutionary stage. If they keep building and expanding the relationships, there’s no problem.

However, the thing that must change to make it a full-blown system is how they access the data. It’s understandable that a startup expects a customer to figure out all its own data access issues and provide a single source database to drive the mosaics. As they grow, though, they’re going to have to work more closely with ETL and other vendors to provide a more dynamic, open data definition and access model than “give us a lump of data and we’ll work with it.”

Given where the company is right now, those caveats are more foibles than anything else. They have the time to build out the system and their time has, correctly, been spent in creating the robust visualization paradigm they demonstrated.

If Kerry Gilger and the rest of his team are able to execute the vision he’s shown, VisualCue will add a major advancement in the ability for business management to quickly understand operations in a way that can provide instant feedback that can improve performance.

TDWI Webinar on in-memory data, another miss

I’ve been skipping the last few TDWI webinars, not exactly knowing how to politely criticize some poor ones. However, I feel as if I’m performing a disservice to those who read, so I’ll have to discuss today’s.

The title was “In-Memory Computing: Expanding the Platform Horizon Beyond the Database.” The pitch was that in-memory is so good for databases we should think about doing everything from ETL to everything else in the information chain there. One word critique: Oy.

In-memory analytics has been great for very fast processing. Having the data resident in memory is obviously a great way of providing rapid response for users of reports and analytic tools. However, it’s no panacea.

Simply put, two demands are limiting the cost effectiveness and even ability to do in-memory analytics: The amount of data and the number of users.

One of the repeated refrains of in-memory proponents is “memory is cheap!” Yes it is. However, massively parallel servers with the ability to efficiently link multiple CPUs to large amounts of memory while providing coherence for multiple users aren’t. They quickly get very expensive, with high-end machines costing much more per unit of memory than commodity servers. There’s also an upper bound, and with much of the larger data analytics today, multiple servers will be needed.

The other issue is that the growth of self-service BI and mobile access to reports means that more memory is needed for non-database usage. A number of in-memory solution providers tell you that each user takes space in memory to satisfy individual needs. The more users, the more space is taken from database availability.

The growth of server farms, being created in the Cloud now, is how the blend of in-memory versus space requirements will be addressed. “Fast enough” matters more than millisecond response time. With what we are constantly learning about both data manipulation and presentation, the strongest Cloud providers will win by keeping the most used information in-memory and sharing the rest among caches on multiple servers.
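A rough sketch of the "keep the hot data in memory, fetch the rest from somewhere slower" idea follows. Note the swap: this uses least-recently-used eviction as a simple stand-in for "most used", and the `HotColdStore` class, its capacity and the dictionary playing the slower tier are all hypothetical.

```python
from collections import OrderedDict


class HotColdStore:
    """Small in-memory tier in front of a slower store (illustrative only)."""

    def __init__(self, hot_capacity, cold_store):
        self.hot = OrderedDict()      # in-memory tier, oldest entry first
        self.hot_capacity = hot_capacity
        self.cold = cold_store        # a dict standing in for disk or remote caches
        self.hits = 0
        self.misses = 0

    def get(self, key):
        if key in self.hot:
            # Served from memory: mark the entry as recently used.
            self.hot.move_to_end(key)
            self.hits += 1
            return self.hot[key]
        # Not in memory: fetch from the slower tier and promote it.
        self.misses += 1
        value = self.cold[key]
        self.hot[key] = value
        if len(self.hot) > self.hot_capacity:
            self.hot.popitem(last=False)  # evict the least recently used entry
        return value


if __name__ == "__main__":
    store = HotColdStore(hot_capacity=2, cold_store={"a": 1, "b": 2, "c": 3})
    for key in ["a", "b", "a", "c", "a"]:
        store.get(key)
    print(f"hits={store.hits} misses={store.misses} in memory={list(store.hot)}")
```

The design choice to watch is the capacity: the smaller the memory tier relative to the working set, the more requests fall through to the slower tier, which is the cost trade-off described above.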

In-memory isn’t new and needs are much different than when it was. Listen to people talk about it and pay attention: If it’s the only thing discussed, they’re not being honest; if it’s a core part of the solution with its caveats addressed, the vendor, analyst or pundit is helping you.