Data Lakes, the renamed ODS, aren’t the only solution for accessing data. Think actual need, understand supporting metadata, then build your data ingestion plan. Read my latest TechTarget column.
My latest article on TechTarget is about how Business Intelligence and the Internet of Things (IoT) overlap.
People at technology startups love to call the industry giants dinosaurs. The analogy fails for a number of reasons. The funniest is that the dinosaurs existed for many millions of years. Since the large companies exist now, are the startups saying the big companies will only disappear if we’re hit by a meteor? Companies became large by filling a need. While many might not be as nimble, their experience, especially in enterprise software, means they often see the needs of the business community while the small companies are focused too much on their “cool” technology.
This week’s Oracle webinar, hosted by the DBTA, was a good example of that. The speakers were Rich Clayton, VP Business Analytics Product Group, and Omri Traub, VP Software Development, and the subject was, no surprise, Oracle Big Data Cloud Service (OBDC. Yeah, I know. Too close to ODBC…). Before we get into the details, people need to be aware that Oracle is fully committed to the cloud, as pointed out in a recent advertorial in Forbes. Oracle is clearly competing with Amazon for enterprise cloud business. Big data is only one part of that.
Rich Clayton began the presentation by pointing towards Thomas Edison’s laboratory as an example of using the ideas from many people to not only invent things but also to figure out how to market those inventions. He brought that directly into the evolution of corporate data labs. The biggest problem, Rich stated, is that labs are usually populated only by very technical people, while they require a broader array of talents. That requirement is one of the data lab principles he defined, and one I’ve also described as the missing component of many corporate data labs.
A related problem is that most products are so complex and siloed that very technical people are needed. At this stage in business intelligence and big data, that’s the horse that needs to be addressed before the broad access cart can move.
Omri Traub then took over for the demonstration portion of the presentation. Unfortunately, he unintentionally proved the point about technical folks missing business needs by the setup he used for the demonstration. The demo was built around an enormous amount of New York City taxi data. While manipulating a billion-record data set is cool and powerful, he never presented a business message. He pointed to the large volume of data, talked about other data sources he combined, and then played with the data to show correlations.
The problem? Omri claimed we were gaining insight. Correlations aren’t insight. Understanding how those correlations might impact your business, and having ideas for how to adapt the business to what you find, is insight. Nothing in the demonstration pointed towards insight.
Fortunately, Rich Clayton had earlier given a couple of case studies showing business insight gained by early OBDC customers. It would have been much better if Mr. Traub had focused on one of those cases or something similar.
The best point of the demonstration was when Omri showed how, in the middle of playing with some relationships, he easily incorporated some analysis created by a different person. As mentioned above, collaboration is critical, and it looks like Oracle hasn’t limited that to just a marketing message but has worked to make sure that Oracle’s product helps the team. As many companies claim to do that and it was only an overview, your mileage might vary. When you talk to them, make sure to follow through and see whether the collaboration (not to mention the entire product…) meets your needs.
The final section was the Q&A. I’m a marketing person, so I have to be honest and state that it sounded like canned questions they wanted to address, as there was way too much about the full Oracle ecosystem brought into discussion at this point compared to what I’d expect from customers. Still, there was one important point.
A question was asked about what advanced analytics might be added. Mr. Traub had the perfect response. After quickly mentioning that, yes, Oracle was always looking at advanced analytics and how to add them, he made a much more important point. Collaboration is key and OBDC is designed to get business people involved. All analytics need to be added in a usable manner, in a way that is understandable and can be leveraged by more people than just the technical resources.
That is the critical viewpoint that a large, enterprise-focused company can bring to BI, the cloud and big data. That’s why it’s foolish to write off the large companies, the ones with expertise in not just technology, but in business and business relationships. They might not move as fast, but they can move to the right places with the right products and the right business messages.
The most recent TDWI webinar had a guest analyst, David Loshin of Knowledge Integrity. The presentation was sponsored by Liaison and that company’s speaker was Manish Gupta. Given that Liaison is a cloud provider of data integration, it’s no surprise that was the topic.
David Loshin gave a good overview of the basics of data integration as he talked about the growth of data volumes and the time required to manage that flow. He described three main areas to focus upon to get a handle on modern integration issues:
- Data Curation
- Data Orchestration
- Data Monitoring
Data curation is the organization and management of data. While David accurately described the necessity of organizing information for presentation, the one thing in curation that wasn’t touched upon was archiving: the ability to present a history of information and make it available for later needs. That’s something the rush to manage data streams is forgetting. Both are important, and stream management isn’t replacing archiving.
The most important part of the orchestration Mr. Loshin described was in aligning information for business requirements. How do you ensure the disparate data sources are gathered appropriately to gain actionable insight? That was also addressed in Q&A, when a question asked why there was a need to bother merging the two distinct domains of data integration and data management. David quickly pointed out that there was no way to avoid handling both, as they weren’t really separate domains. Managing data streams, he pointed out, was a great example of how the two concepts must overlap.
Data monitoring has to do with both data in motion, as in identifying real-time exceptions that need handling, and data for compliance, information that’s often more static for regulatory reporting.
The presentation then switched to Manish Gupta, who proceeded to give the standard vendor introduction. It’s necessary, but I felt his was a little too high level for a broader TDWI audience. It’s a good introduction to Liaison, but following Mr. Loshin there should have been more detail on how Liaison addresses the points brought up in the first half of the presentation. It should have worked like a sales call, where a team leads with the kind of information Mr. Gupta gave and then the salesperson discusses the products in more detail.
Both presenters had good things to say, but they didn’t mesh enough, in my view, and you can find out far more talking to each individually or reading their available materials.
Yesterday’s TDWI webinar was sponsored by Liaison Technologies, who did the same thing last year. It’s a push for another acronym. While the acronym isn’t needed, the concept is. Data Platform as a Service (dPaaS) is just using the cloud to help with data integration. Gosh, complex, eh? I think it’s the natural progression of technology and business; it’s just data management on the cloud. But forget the marketing, let’s talk about the concept.
Cloud data management
The presentation’s first half was delivered by Phillip Russom. He started with some very trivial level setting but then quickly got to a key point. If you’ve been around for a while, you remember Best of Breed. That’s when each product-focused vendor, somewhere in the information supply chain, talked about its openness and how you could piece together a solution from different vendors. That made sense at the time, since many companies were each creating early versions of parts of a full solution.
As Phillip pointed out, times have changed. We now better understand business needs, have learned more about coding the requirements and can access far better hardware than we had fifteen years ago. That means IT is looking for what they couldn’t find back then: An integrated solution from a single or a far more limited number of vendors. They want something simpler than a hodgepodge of multiple systems.
The advantages of the cloud aren’t specific to data management. One key business driver is capex versus opex, something often ignored by technical folks. It was minimized in Mr. Russom’s presentation, brought out later by Patrick Adamiak during his presentation, then revisited by both in the Q&A. Having your own hardware and data center is not just costly, it’s part of capital expenditure. Service contracts with a cloud vendor are operational expenses. That means the CxO suite and Board are often happier with the cloud, because it’s not as locked in and creates flexibility in the corporate financial picture.
One nit I had with Mr. Russom’s presentation was his statement that cloud is another architecture, like client/server or the web. The cloud and the web are both client/server; that’s not the issue. The cloud is another architecture in two other key aspects: the already mentioned capex/opex divide, and the way it changes a software vendor’s ability to manage and update its software in comparison to on-premises installations.
One caution that needed more explanation for folks new to the cloud was Mr. Russom’s advice to ask about the elasticity of the cloud implementation. For those who might not have heard the term, elasticity is the ability to grow or shrink cloud resources in order to match processing demands. In other words, if you get a big data dump from another source, can you quickly access more disk space? Or, from the Web side of the house: You’re hosting a big event or making a major announcement on your Web site. Can site resources be replicated quickly to handle the additional load, then released when no longer needed?
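For readers who want the idea in concrete form, here is a minimal sketch of the kind of rule an elastic service might apply. It is purely illustrative and not any vendor’s autoscaling API: the function name, the requests-per-second units, the 20% headroom and the 1-to-20 node bounds are all assumptions.

```python
import math

def target_node_count(current_load: float, capacity_per_node: float,
                      min_nodes: int = 1, max_nodes: int = 20,
                      headroom_pct: int = 20) -> int:
    """Return how many nodes a simple (hypothetical) elastic policy would provision.

    current_load and capacity_per_node use the same assumed unit,
    e.g. requests per second. headroom_pct keeps spare capacity so a
    sudden spike does not immediately saturate the cluster.
    """
    if capacity_per_node <= 0:
        raise ValueError("capacity_per_node must be positive")
    needed = math.ceil(current_load * (100 + headroom_pct) / (100 * capacity_per_node))
    # Grow for spikes, shrink when quiet, but stay within the allowed bounds.
    return max(min_nodes, min(max_nodes, needed))

# A normal day, a launch-day spike, and the quiet period afterwards.
for load in (300, 2500, 150):
    print(f"load={load} -> nodes={target_node_count(load, capacity_per_node=200)}")
```

With these assumptions the loop asks for 2, 15 and then 1 node: the resources grow for the spike and are given back afterwards, which is the behavior worth asking a cloud vendor about.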
I was impressed by the fact that capex was mentioned on Patrick Adamiak’s first slide. Cloud technology has multiple advantages that can be communicated to IT, but it’s the capex/opex issue that will help close the deal in an enterprise setting. Liaison seems to understand the need to blend technical and business messages.
However, most of Mr. Adamiak’s presentation seemed to be about justifying the new acronym. The main slide compared dPaaS with other supposed solutions without admitting there’s really a lot of overlap between them. The columns weren’t as different as he’d like them to be.
His company slides didn’t seem any different than those I’ve seen from the many other firms in the space. That’s forgivable, though; it was a short webinar with TDWI, so he had limited time.
The fact is that Liaison claims they are where the market is going. They are vertically integrating the information supply chain while leveraging the cloud for its business and technology advantages. For those in IT looking to simplify their world, Liaison is a company that should be investigated.
Today the BBBT held a special session. While most presentations are by companies with full products, existing sales and who typically have been around for a few years, today we had the pleasure of listening to Sherry Brown, President of AptiMap. This is a pure startup company, still tiny. She was looking for our always vocal analyst community’s opinions on her initial aim and direction. Not to surprise anyone who knows the BBBT, we gave that at full bore.
Ms. Brown’s goal is to provide a far easier way of mapping fields between source and target datasets for creating data warehouses and other data stores. It’s a great start, and she has some initial features that will help. I’ll be blunt: I’m intentionally not going to say a lot. As mentioned, they are a very early startup and the software isn’t full-fledged. That means any mention of what they have and don’t have could be inaccurate by next week. That’s not a bad thing; it’s what happens at that phase.
I will mention that the product is cloud based from the start.
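Since the product itself isn’t described here, a generic illustration of what source-to-target field mapping means may help newcomers. This is a hypothetical sketch, not AptiMap’s format or API: the field names (CUST_NO, CUST_NM, CTRY), the sample row and the apply_mapping helper are invented for the example.

```python
from typing import Any, Callable, Dict, List, Tuple

# Target field -> (source field, transformation to apply).
# Hypothetical mapping format, invented for illustration.
FieldMap = Dict[str, Tuple[str, Callable[[Any], Any]]]

def apply_mapping(rows: List[Dict[str, Any]], field_map: FieldMap) -> List[Dict[str, Any]]:
    """Turn source rows into target rows according to the field map."""
    mapped = []
    for row in rows:
        target = {}
        for target_field, (source_field, transform) in field_map.items():
            if source_field not in row:
                # Surface mapping gaps early instead of silently loading nulls.
                raise KeyError(f"{source_field!r} missing for target {target_field!r}")
            target[target_field] = transform(row[source_field])
        mapped.append(target)
    return mapped

customer_map: FieldMap = {
    "customer_id": ("CUST_NO", int),
    "customer_name": ("CUST_NM", str.strip),
    "country_code": ("CTRY", str.upper),
}

source_rows = [{"CUST_NO": "1042", "CUST_NM": "  Acme Corp ", "CTRY": "us"}]
print(apply_mapping(source_rows, customer_map))
```

Keeping the mapping as data rather than code is what lets a tool display, edit and validate it, which is where most of the effort in this kind of product goes.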
The important question about whether or not to contact AptiMap is who you are and what you need. Most of the feedback to Sherry was about that, aimed at helping focus the message. If I have correctly understood the consensus of the attendees, here are the critical things to focus upon while defining a market for the initial product:
- Aimed at IT and business analysts
- Folks currently using modeling tools or spreadsheets, to start
- Focus on standard, enterprise data sources, from spreadsheets to RDBMSs; Hadoop can wait
- Mid-sized companies integrating their first sets of systems or trying to get a handle on their existing data
- Might especially be good in the hands of consultants going into those types of companies.
- Many of the potential users are tablet users, so focus on that aspect of mobile
One final key, one that needs to be a full paragraph rather than a bullet and one that many technical startups don’t get while building their products based on user needs, is that users aren’t the only decision makers in the purchase. As mentioned, this is a cloud product and AptiMap will be expecting recurring revenue from monthly or annual fees. The business analyst is often not the person who approves those types of costs. The firm also needs to focus on the buyers, whether IT, line or consulting management, and build messages that help them understand the business benefit of providing the tool to their people.
Understanding your market matters. It will help the firm not only focus product, but also narrow down the marketing message and image to aim at the correct audience.
Too often, founders get a great technical idea, focus on a couple of users to fill out product features and then try to find a market. BI is moving too fast for that; the vision needs to be set out much more clearly, and much earlier, than was needed in software companies twenty years ago.
Finally, I mentioned the cloud model but should also mention AptiMap is offering a 30-day free trial.
AptiMap has an initial product that can help people more rapidly and accurately create mappings between data sources and targets. It’s cloud based for easy access. It is, however, very early in the product and company life cycle.
I would suggest it primarily to analysts in mid-sized organizations, or to consultants who work with SMBs, who want some quick-hit functionality to map data sources for the creation of data warehouses, ODSs and other relationally oriented data repositories.
If you want to experiment inexpensively with an early product that could help, contact them.
At today’s investor meeting, IBM execs announced a target of $40 billion in revenue for cloud, analytics, mobile, social and security software by 2018. I expect to see folks talk about dinosaurs not being able to turn fast enough and predict failure to meet that goal. I don’t know if they can do it, but to make such ardent predictions you’d have to ignore history.
Mid-sized Unix servers came along and folks talked about IBM going away.
IBM blew a chance to own the PC industry and the same predictions followed.
Linux? Freeware was going to destroy the mainframe. Oops, Linux partitions run on mainframes.
Now we see the large growth of the cloud. Much of it has been on commodity boxes. However, as data gets larger, analytics more powerful and networks more robust, there’s clearly space for a company with such a strong history in hardware, services and adapting to changes.
After all, too many people still think of IBM as a hardware company. While it’s too early for the 2014 report, you can look at page 7 of the 2013 Annual Report. Notice how tiny a percentage of the bar is hardware. Software and services split the vast majority of the revenue stream fairly evenly.
It’s a strong goal and will take a lot of pushing. How many politely phrased “re-orgs” will happen to lay off staff? Who knows? Will they succeed? No clue. All I expect is that they’ll continue to grow and nobody should count them out.
Yesterday, Tableau Software held an analyst briefing. It wasn’t a high level one, it was really just a webinar where they covered some product futures under NDA. However, it was very unclear what was NDA and what wasn’t. When they discussed things announced at the most recent Tableau Conference in Seattle, that’s not NDA, but there was plenty of future discussed, so I’ll walk a fine line.
The first news is their Third Quarter announcement from the beginning of the month. This was Tableau’s first quarter of over $100 million in recognized revenue. It’s a strong showing and they’re justifiably proud of their consistent growth.
Ajay Chandrdamouly, Analyst Relations, also said that the growth primarily results from a Land and Expand strategy, beginning with small jobs in departments or divisions, driven by business needs, then expanding into other organizations and eventually into a corporate IT account position. However, one interesting point is an expansion mentioned later in the presentation by Francois Ajenstat, Product Management, while giving the usual case studies seen in such presentations. He did a good job of showing one case study that was Land and Expand, but another began as a corporate IT account and usage was driven outward by that. It’s an indication of the maturity of both Tableau and the business intelligence (BI) market that more and more BI initiatives are being driven by IT at the start.
Francois’ main presentation was about releases, past and future. While I can’t write about the latter, I’ll mention one concern based on the former. He was very proud of the large number of frequent updates Tableau has released. That’s ok in the Cloud, where releases are quickly rolled into the product that everyone uses. However, it’s a support risk in on-premises (yes, Francois, the final S is needed) installations. How long you support products, and how you support them, is an issue. Your support team has to know a large number of variations to provide quick results, or must investigate and study each time, slowing responses and possibly angering customers. I asked about the product lifecycle and how they manage support and decide sunsetting issues, but I did not get a clear and useful answer.
The presentation Mr. Ajenstat gave listed six major focus themes for Tableau, and that’s worth mentioning here:
- Seamless Access to Data
- Analytics & Statistics for Everyone
- Visual Analytics Everywhere
- Fast, Easy, Beautiful
None of those is a surprise, nor is the fact that they’re trying to build a consistent whole from the combination of foci. The fun was the NDA preview of how they’re working on all of those in the next release. One bit of foreshadowing: they are looking at some areas that won’t minimize enterprise products but will be aimed at a non-enterprise audience. They’ll have to be careful how they balance the two, but expansion done right brings a wider audience, so it can be a good thing.
The final presenter was Ellie Fields, Product Marketing, who talked more about solution than product. Tableau Drive has nothing to do with storage or big data; it’s a poorly named but well-thought-out methodology for BI projects. Industry firms are finally admitting they need some consistency in implementation, so they are providing best practices to their implementation partners and customers to improve success rates, speed implementation and save costs. Modern software is complex, as are business issues, so BI firms have to provide a combination of products and services that help in the real world. Tableau Drive is a new attempt by the company to do just that. It’s also no surprise that it uses the word agile, since that’s the current buzzword for iterative development that was going on long before the word was applied. As I haven’t implemented BI products myself, I won’t speak to its effectiveness, but a methodology like Drive is a necessity in the marketplace, and Tableau Drive helps provide a complete solution.
The briefing was a technical analyst presentation by Tableau about the current state of the company and some of its futures. There was nothing special, no stunning revelations, but that’s not a problem. The team’s message is that the company has been growing steadily and well and that their plans for the future are set to continue that growth. They are now a mid-size company: no longer as nimble as startups, yet without the weight of the really large firms. They have to chart a careful path to continue their success. So far it seems they are doing so.
Today a webinar was hosted by Database Trends and Applications. While there are important things to talk about, I’ll start with the amusing point of the inverse relationship between company size and presenter title, found in every webinar but wonderfully on display here. The three presenters were:
- Mark Theissen, CEO, Cirro
- Peter Hoopes, VP/GM, BIRT Analytics Division, Actuate
- Amit Patel, Program Director, Data Warehouse Solutions Marketing, IBM
The topic was “Accelerating your Analytics for Faster Insights.” That is a lot to cover in less than an hour, made harder still by a tag team of three people from different companies. I must say I was pleasantly surprised with how well they integrated their messages.
Mark Theissen was up first. There were a lot of fancy names for what Cirro does, but think ETL as it’s much easier. Mark’s point is that no single repository can handle all enterprise data even if that made sense. Cirro’s goal is to provide on-demand distributed analytics, using federation to link multiple data sources in order to help businesses analyze more complete information. It’s a strong point people have forgotten in the last few years during the typical “the latest craze will solve everything” focus on Hadoop and minimizing the role of getting to multiple sources.
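To make federation concrete, here is a minimal sketch under stated assumptions: each source is queried where it lives and the results are combined in a federation layer, rather than copying everything into one repository. Two independent in-memory SQLite databases stand in for separate systems; the table names, data and the revenue_by_region function are hypothetical and say nothing about how Cirro actually works.

```python
import sqlite3
from collections import defaultdict

def make_source(script: str) -> sqlite3.Connection:
    """Create an independent in-memory database standing in for one system."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(script)
    return conn

# Two hypothetical systems that never share a database.
crm = make_source("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, region TEXT);
    INSERT INTO customers VALUES (1, 'East'), (2, 'West'), (3, 'East');
""")
orders = make_source("""
    CREATE TABLE orders (customer_id INTEGER, amount REAL);
    INSERT INTO orders VALUES (1, 100.0), (2, 50.0), (3, 25.0), (1, 75.0);
""")

def revenue_by_region(crm_conn: sqlite3.Connection,
                      orders_conn: sqlite3.Connection) -> dict:
    # Push the aggregation down to the orders source so less data moves...
    totals = dict(orders_conn.execute(
        "SELECT customer_id, SUM(amount) FROM orders GROUP BY customer_id"))
    # ...then join it with the CRM data in the federation layer.
    result = defaultdict(float)
    for cust_id, region in crm_conn.execute("SELECT id, region FROM customers"):
        result[region] += totals.get(cust_id, 0.0)
    return dict(result)

print(revenue_by_region(crm, orders))
```

The design choice that matters is pushing work down to each source and moving only summarized results, since shipping every raw row to one place is what federation is meant to avoid.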
Peter Hoopes then followed to talk about doing the analytics. One phrase he used should be discussed in more detail: “speed wins.” So many people are focused on the admittedly important area of immediate retail feedback on the web and with mobile devices. There, yes, speed can win. However, not always. Sometimes thought helps too. That’s one reason why complex analysis for high level business strategy and planning is different from putting an ad on a phone as you walk by a store. There are clear reasons for speed, even in analytics, but it should not be the only focus in a BI decision.
IBM’s Amit Patel then came on to discuss the meat of the matter: DB2 BLU. This is IBM’s foray into in-memory, columnar databases. It’s a critical addition to the product line. There are advantages to in-memory that have created a need for all major players to have an offering, and IBM does the “me too!” well; but how does IBM differentiate itself?
As someone who understands the need for integration of transaction and analytic systems and agrees both need to co-exist, I was intrigued by what Amit had to say. Transactions go into the normal DB2 environment while being shadowed into a columnar BLU environment to speed analytics. Think about it: Transactions can still be managed with the row-oriented technologies best suited for them while the information is, in parallel, moved to the analytics database that happens to be in memory. It seems to be a good way to begin to blend the technologies and let each do what works best.
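A toy sketch of the dual-format idea may help, assuming nothing about DB2 BLU’s actual internals: every write lands in a row-oriented list suited to transactions and is also shadowed into per-column lists that analytic scans can read. Here the shadowing is synchronous for simplicity; the class name, columns and sample values are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Tuple

@dataclass
class ShadowedStore:
    """Toy store: row-oriented writes, shadowed into per-column lists."""
    columns: Tuple[str, ...]
    rows: List[Tuple[Any, ...]] = field(default_factory=list, init=False)
    column_data: Dict[str, List[Any]] = field(default_factory=dict, init=False)

    def __post_init__(self) -> None:
        self.column_data = {name: [] for name in self.columns}

    def insert(self, row: Tuple[Any, ...]) -> None:
        if len(row) != len(self.columns):
            raise ValueError("row width does not match the schema")
        self.rows.append(row)  # row copy: whole records, suited to transactions
        for name, value in zip(self.columns, row):
            self.column_data[name].append(value)  # columnar shadow copy

    def column_sum(self, name: str) -> Any:
        # An analytic query touches only the one column it needs.
        return sum(self.column_data[name])

store = ShadowedStore(columns=("order_id", "store", "amount"))
store.insert((1, "NYC", 20.0))
store.insert((2, "SF", 35.5))
print(store.rows[1], store.column_sum("amount"))
```

Real systems typically shadow asynchronously and compress the columnar copy, but the split is the same: point lookups read whole rows, while aggregates scan single columns.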
For a slightly techie comment, I did like what Mr. Patel was saying about IBM’s management of memory and CPU. After all, while IBM is one of the largest software vendors in the world, too many folks forget their hardware background. One quick mention in a sentence about “hardware vendors such as Intel and IBM…” was a great touch to add a message that can help IBM differentiate its knowledge of MPP from that of pure software companies. As a marketing guy, I smiled big time at the smooth way that was brought up.
The three presenters did a good job in pointing out that the heterogeneous nature of enterprise data isn’t going away; rather, it’s expanding. Each company, in its own way, put forward how it helps address that complexity. Still, it takes three companies.
As the BI market continues to mature, the companies who manage to combine the enterprise information supply chain components most smoothly will succeed. Right now, there’s a message being presented by three players. Other competitors also partner for ETL, data storage and analytics. It sounds interesting, but the market’s still young. Look for more robust messages from single vendors to evolve.
Today’s presentation in front of the BBBT was by NuoDB’s CTO, Seth Proctor. NuoDB is a small company with big investments. What makes them so interesting? It’s the same thing as in many of the other platform presenters at the BBBT. How do we get real databases in the Cloud?
Hadoop is an interesting experiment and has clearly brought value to the understanding of massive amounts of unstructured data. The main value, though, remains that it’s cheap. The lack of SQL means it’s ok for point solutions that don’t stress its performance limitations. Bringing enterprise database support to the cloud is something else.
The main limitation is that Hadoop and other unstructured databases aren’t able to handle transactional systems while those still remain the major driver in operating businesses.
NuoDB has redesigned the database from the ground up to be able to run distributed across the internet. They’ve created a peer-to-peer structure of processes, with separate processes to manage the database and SQL front end transaction issues.
Seth pointed out that they “Have done nothing new, just things we know put together in a new way.” He also pointed out they have patents. My gripe about patents for software is an issue for another day, but that dichotomous pairing points to one reason for it (Apple’s patent on a rounded rectangle is another example of the broken patent system, but off the soap box and onwards…).
It’s clear that old-line RDBMSs were designed for major, on-premises servers. The need for a distributed system is clear, and NuoDB is at the forefront of creating one. One intriguing potential strength, one there wasn’t time to discuss in the presentation, is a statement about the object-oriented structure needed for truly distributed applications.
Mr. Proctor stated that the database schema is in object definitions, not hard coded into the database. He added that provides more flexibility on the fly. What it also could mean is that the schema isn’t restricted to purely RDBMS schemas and that future versions of their database could support columnar and even unstructured database support. For now, however, the basic ability to change even a standard row-based relational database on the fly without major impacts on performance or existing applications is a strong benefit.
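As a generic illustration of a schema held as a runtime object rather than hard-coded into storage, here is a minimal sketch. It is an assumption-laden toy, not NuoDB’s design: the Table class and its methods are hypothetical. Adding a column changes only the schema object, and older rows pick up the default when they are read, so nothing already stored has to be rewritten.

```python
from typing import Any, Dict, List

class Table:
    """Hypothetical toy table whose schema is a runtime object, not compiled into storage."""

    def __init__(self, schema: Dict[str, Any]):
        # Schema maps column name -> default value for rows that lack it.
        self.schema: Dict[str, Any] = dict(schema)
        self._rows: List[Dict[str, Any]] = []

    def insert(self, **values: Any) -> None:
        unknown = set(values) - set(self.schema)
        if unknown:
            raise KeyError(f"unknown columns: {sorted(unknown)}")
        self._rows.append(values)

    def add_column(self, name: str, default: Any = None) -> None:
        # Only the schema object changes; existing rows are not rewritten.
        self.schema[name] = default

    def rows(self) -> List[Dict[str, Any]]:
        # Fill in defaults at read time for columns added after a row was written.
        return [{col: row.get(col, default) for col, default in self.schema.items()}
                for row in self._rows]

t = Table({"id": None, "city": None})
t.insert(id=1, city="Boston")
t.add_column("country", default="US")
t.insert(id=2, city="Lyon", country="FR")
print(t.rows())
```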
As the company is young and focused on the distributed aspects of performance, it was also admitted that their system isn’t one for big data, even structured data. They’re not ready for terabytes, not to mention petabytes of data.
That’s the techie side, but what about business?
The company is focused on providing support for distributed operational systems. As such, Seth made clear they haven’t looked at implementations supporting both operational and analytical systems. That means BI is not a focus and so the product might not be the right system for providing high level business insight.
In addition, while I asked about markets, I mainly got an answer about Web sites. They seem to think the major market isn’t Global 1000 businesses looking to link distributed operational systems, but that Web commerce sites are their sweet spot. One example referred to a few times was in transactional systems for businesses selling across a country or around the world. If that’s the focus, it’s one that needs to be made more explicit on their web site, which really doesn’t discuss markets in the least.
It’s also an entry into the larger financial markets space. Finance and medical have always been two key verticals for new database technologies due to the volumes of information. That also means they need to prioritize the admitted lack of large database support or they’ll hit walls above the SMB market.
The one business thing that bothers me is their pricing model. It’s based on the number of hosts. As the product is based on processes, there’s no set number of processes per host. In addition, they mentioned shared hosting, places such as AWS, where hosts may be shared by multiple NuoDB customers or where load balancing might put your processes on one host one day and on multiple hosts the next.
Host-based pricing seems to be a remnant of on-premises database systems that Cloud vendors claim to be leaving. In a distributed, internet based setup, who cares how big the host is, where the host is, or anything else about the host? The work the customer cares about is done by the processes, the objects containing the knowledge and expertise of NuoDB, not the servers owned by the hosting firm. I would expect that Cloud companies would move from processors to process.
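A quick worked example shows why host-based pricing sits awkwardly with a process-based product. The prices ($300 per host, $100 per process) and the placements are made up for illustration and are not NuoDB’s price list; the point is only that the same six processes cost the same every day under per-process pricing, but the host-based bill triples when a load balancer spreads them out.

```python
from typing import Dict

def host_based_bill(placement: Dict[str, int], price_per_host: float) -> float:
    """Charge for every host running at least one of the customer's processes."""
    hosts_used = sum(1 for count in placement.values() if count > 0)
    return hosts_used * price_per_host

def process_based_bill(placement: Dict[str, int], price_per_process: float) -> float:
    """Charge for the processes themselves, wherever they happen to run."""
    return sum(placement.values()) * price_per_process

# The same six database processes, placed differently by a load balancer.
monday = {"host-a": 6}
tuesday = {"host-a": 2, "host-b": 2, "host-c": 2}

for day, placement in (("Monday", monday), ("Tuesday", tuesday)):
    print(day, host_based_bill(placement, 300.0), process_based_bill(placement, 100.0))
```

Under these assumptions the host-based bill goes from 300 to 900 while the process-based bill stays at 600, even though the customer’s workload never changed.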
NuoDB is a company focused on reinventing the SQL database for the Cloud. They have significant investment from the VC and business markets. However, it would be foolish to think that Oracle, IBM and other existing mainstream RDBMS vendors aren’t working on the same thing. What NuoDB described to the BBBT used most of the right words from the technology front and they’re ramping up their development based on the investments, but it’s too early to say if they understand their own products and markets enough to build a presence for the long term.
They have what looks like very interesting technology but, as I keep repeating in review after review, we know that’s not enough.