My latest Tech Target article is up at http://searchdatamanagement.techtarget.com/tip/SQL-vs-NoSQL-database-design-debate-isnt-even-a-real-fight.
This week’s TDWI hosted webinar was about engaging business and, once again, it came from the standpoint of technologists rather than from business. There were some very good things said. However, until our industry stops thinking of business knowledge workers as children to be tutored and begins to think about them as people whose knowledge is the core of what we must encapsulate, we’ll continue to miss the mark and adoption of solutions will remain slow.
The main presenter was David Loshin, President of Knowledge Integrity. He began the presentation with a slide that describes his view of the definition of “data driven,” including three main points:
- Focus on turning data into actionable knowledge that can lead to increased corporate value.
- Aware of variance that can cause inconsistent interpretation.
- Coordination among data consumers to enforce standards for utilization.
We should all clearly understand that the first item is not new and was not created by the business intelligence (BI) industry. Business has always been data driven. What we’re able to do now is access far more data than ever before so that we can provide a more robust view of the corporation.
Inconsistent Data v Inconsistent Utilization
The second bullet is a core point. Mr. Loshin used a couple of example such as sales territory and other areas where definitions are fuzzy. One clear difference to me is one I directly experienced 25 years ago, and more directly addresses the visualization side of the BI conundrum. I was working for a major systems integrator (SI) and my client was, well, let’s just say it was a large, fruit based computing company.
A different SI had created an inventory system for the client’s manufacturing facility but the system was a failure though all the right data was in the system. The problem was that the reports were great for the accounting department, not for inventory and manufacturing. We interviewed the inventory team and then rewrote reports to address and present the information from their standpoint.
Too often, technologists get lost in the detailed data definitions and matching fields across data sources. That is critical, but it loses the big picture. Even when data is matched, different business people use data differently.
Which brings us to David Loshin’s third point. No, we don’t need to enforce exact standards for utilization. We need to ensure that the data each consumer refers to is consistent, but we must do a better job in understanding that different departments can utilize the exact same data in a variety of ways.
Business Drivers and Data Governance
David did get to the key issue a bit later, on a slide titled Operationalizing Business Policies. He points out that it’s critical to ensure that “Information policies model the data requirements for business policy.” This is key and should be bubbled up higher in the mindset of our industry. While I hear it mentioned often, it seems to be honored more in the breach.
Time was spent discussing the importance of understanding different users and their varying utilization of data. As I mentioned in the introduction, the solution to the new complexities then veers from addressing business needs to ignoring history. In a previous blog post, I discussed how many in the industry seem to be ignoring the lessons to be learned from the advent of the PC. Mr. Loshin seems to be doing that when he talks about empowering the business users to set their own usability rules. He splits IT and business in the following way:
- Business data consumers are accountable for the rules asserting usability for their views of the data.
- IT becomes responsible for managing the infrastructure that empowers the business user.
The issue I have with that argument is a phrase that didn’t appear in this webinar until Linda Briggs, the moderator, mentioned it in a poll question right before Q&A: Data Governance. Corporations are increasingly liable for how they control and manage information. It does not make sense to allow each user to define their own data needs in a void. Rather than allow for massively expanded and relatively uncontrolled access to data and then later have to contract access, as corporations had to regain a handle on what was being done on scattered desktop computers, BI vendors should be positioning data governance from the start.
Whether it’s by executive fiat, a cross-functional team, or some other method, companies need to clarify data governance rules. Often, IT is the best intermediary between groups, actively participating in data governance definition as an impartial observer and facilitator. It is then the job of IT to ensure that it provides as open access as possible to business workers given their needs and the necessity of following governance rules.
There was one question, during Q&A, on the importance of data governance. I thought David Loshin again understated its importance while Harald Smith, Director of Product Management at Trillium, the webinar sponsor, had the comment that “everyone is responsible for data governance.” That is my only mention of the sponsor, as I felt his portion of the presentation was a recitation of sound bites, talking points and buzz words that didn’t provide any value to the hour.
David Loshin has a clear view of engaging the business and gets a number of key things correct. However, that view is one of a technologist looking over a self-imagined bridge separating technology and business. There’s not a bridge separating IT and business. They overlap in many critical areas and both must learn from and work well with each other.
The launch of Yellowfin DashXML included a round of global webinars mid-week. Well, not “included,” it’s more that the webinar was the entire launch. The new product feature is useful, but as I’m a marketing person I do have to question how they’ve handles the launch.
Yellowfin, as with many business intelligence (BI) vendors, is focused on visualization, providing business knowledge workers the ability to easily see information. The presentation was by John Ryan, Director of Product Marketing, and Teresa Pringle, Product Specialist. As is obvious from the title of the webinar, it was to announce the availability of the first version of DashXML, a utility within Yellowfin that allows easy integration of custom XML into dashboards and reports.
While they do sell directly to IT organizations who provide their interface to their corporate users, they also have a strong OEM business. As Mr. Ryan pointed out, “Embedding BI is a large chunk of Yellowfin’s business.” While direct label clients also want to customize user interfaces, DashXML seems much more valuable to the OEM customer base, providing an easier way to integrate standards from existing applications in order to have a more consistent interface.
The key word in that last sentence was “easier,” not “easy,” and that’s just fine for what is needed. This is XML. As Ms. Pringle explained, programmers will need to be very familiar with CSS manipulation and also with Java Script. DashXML is there to assist developers in providing customized visualizations, it is not for end users. The feature is available with a server license, providing deployment capability, and with a developer license for investigating the feature. It is not available as part of the per-user, distribution license for end users.
DashXML adds power and flexibility to Yellowfin’s offering and will better help its clients customize visualizations.
A Very Quiet Launch
As much as the presenters seemed to be working to imply DashXML is a new product, it’s really a feature of their platform. While the title of the webinar was a launch, nothing in the presentation or on their site implies it’s really a launch.
Almost the entire presentation was about the existing Yellowfin offering. Teresa Pringle’s “demo” portion of the webinar started with a whole lot of customized interfaces and only spent a few minutes showing the DashXML features in design and only for a single report in a dashboard. You could get the idea that it would make things easier, but it was also clear that’s all it did. There’s nothing really new, nothing that Yellowfin clients aren’t doing now, it’s a way to save time and money. Mind you, those are very valuable things, but the presentation didn’t focus on any ROI those savings might present.
What’s more intriguing is that they held a webinar, yet their site doesn’t reflect that knowledge. As of the writing of this blog entry (24 hours after the webinar), a few things seem to be missing:
- No DashXML item in their home page rotating banner.
- No DashXML mentioned on the rest of the home page.
- No DashXML item on their news/blog page.
- No DashXML added to their site menu, even though John presented a slide that implied DashXML was on the same level as their platform and web services offerings.
If the feature isn’t important enough to discuss on the web site, why have a webinar? After all, the purpose of a webinar is to drive interest in the product and one of the key follow ups for webinars should also be gaining information on your site to hopefully drive customer tracking and contact information as lead qualification.
DashXML is a nice addition that can help IT and OEM developers blend point-and-click development and coding to provide a customized visualization interfaces with better ROI. However, a week webinar and no content is neither a silent launch nor a strong one. Sadly, the marketing doesn’t rise to the quality of the product enhancement.
TDWI held a webinar to announce their latest major report. While there are always a lot of intriguing numbers in the reports, it’s also important to remember the TDWI audience is self-selecting. People interested in the latest information lean towards the leading edge so their numbers should be taken as higher than would be in the general IT market place. Still, the numbers as they change over time are valuable and the views of the analysts are often worth hearing.
As the webinar was pushing a major report, the full tag team was in attendance: David Stodder, TDWI Director for BI, Fern Halper, TDWI Director for Analytics, and Philip Russom, TDWI Director for Data Management.
David Stodder presented his section first, and one important point he made had nothing to do with numbers. He briefly discussed one quote and user story and it was from a government employee. Companies using Hadoop to better understand internet business and relationships tend to get almost all the press, but David pointed out the importance of data and analytics in helping governments better address the needs of their citizens.
A very intriguing set of numbers David provided was on how many responders were on current versions of software versus older versions. While you can see that some areas are more quickly adopting the SaaS model, that’s not the key the he pointed out. Only 27% of respondents said they’re on the current version of their data security software. A later slide shows that security is one reason for hesitation in the move to mobile, but Mr. Stodder rightly points out that underlying all the information channels is the basis of data security. It’s not a question of if you’ll get hacked but when, so data security should be kept updated.
The presentation was then turned over to Fern Halper. I look a bit askance at the claim that the Internet of Things (IoT) is a “trend.” Her data shows only 18% taking advantage of it today and 40% might be using in within three years. We’ve been talking about IoT for a while, and it’s clearly being slowly integrated into business, I wouldn’t say it’s as fashionable as the word trend would imply.
On the more useful side is the table she showed that’s simply titles “Analytics hits mainstream.” It not only shows that massive adoption of the last decade’s focus on dashboards and BI tools, but around 30% of respondents are using many of the newer tools and techniques and the next three years indicate a doubling in usage.
Philip Russom gave the final segment of the presentation. His first slide on the adoption of newer technologies for data warehousing showed something that many have finally admitted in the last year or no: No-SQL is an excuse made by people who don’t understand how business technology works. While the numbers show 28% of respondents using Hadoop, it also shows 22% using SQL on Hadoop. The number over the next three years are even more interesting: 36% say they’ll be using Hadoop and 38% will be using SQL on Hadoop. That means existing No-SQL folks will be moving to SQL.
The presentation ended with the team of analysts presenting their list of ten priorities for those people interested in emerging technologies. To me, the first isn’t the first among equals, it is set far above all the rest: Adopt them for their business benefits. All the other nine items are how IT addresses the challenges of new technologies, but those things are useless unless you understand how technologies will support business. Without that, you can’t provide an ROI and you can’t get business stakeholders to support you for long. That’s strategy, all the other points are just tactics.
As usual, get the report and browse it.
IBM at BBBT
A recent presentation by IBM at the BBBT was interesting. As usual, it was more interesting to me for the business information than the details. As unusual, they did a great job in a balanced presentation covering both. While many presentations lean too heavily in one direction or the other, this one covered both sides very well.
The main presenter was Harriet Fryman, VP of Marketing, IBM Analytics Platform. Adding information during the presentation were Steven Sit, Director of Product Management, Open Source Based Analytics Systems, and Steve Beier, Program Director, Spark Technology Center.
The focus of the talk was IBM’s commitment to Apache Spark. Before diving deep into the support, Ms. Fryman began by talking about business’ evolving data needs. Her key point is that “we all do data hording,” that modern technologies are allowing us to horde far more data than ever before, and that better ways are needed to get value out of the data.
She then proceeded to define three key aspects of the growth in analytics:
- Applying analytics in more parts of business.
- Understand the time value of data.
- The growth of machine learning and cognitive systems.
The second two overlap, as the ability to analyze large volumes of data in near real-time means a need to have systems do more analysis. The following slide also added to IBM’s picture of the changing focus on higher level information and analytics.
The presentation did go off on a tangent as some analysts overthought the differences in the different IBM groups for analytics and for Watson. Harriet showed great patience in saying they overlap, different people start with different things and internal organizational structures don’t impact IBM’s ability to leverage both.
The focus then turned back to Spark, which IBM sees as the unifying layer for data access. One key issue related to that is the Spark v Hadoop debate. Some people seem to think that Spark will replace Hadoop, but the IBM team expressed clear disagreement. Spark is access while Hadoop is one data structure. While Hadoop can allow for direct batch processing of large jobs, using Spark on top of Hadoop allows much more real time processing of the information that Hadoop appropriately contains.
One thing on the slide that wasn’t mentioned but links up with messages from other firms, messages which I’ve supported, is that one key component, in the upper left hand corner of the slide, is Spark SQL. Early Hadoop players were talking about no-SQL, but people are continuing to accept that SQL isn’t going anywhere.
Well, most people. At least fifteen minutes after this slide was presented, an attending analyst asked about why IBM’s description of Spark seemed to be similar to the way they talk about SQL. All three IBM’ers quickly popped up with the clear fact that the same concepts drive both.
While the team continued to discuss Spark as a key business imitative, Claudia Imhoff asked a key question on the minds of anyone who noticed huge IBM going to open source: What’s in it for them? Harriet Fryman responded that IBM sees the future of Spark and to leverage it properly for its own business it needed to be part of the community, hence moving SystemML to open source. Spark may be open source, but the breadth and skills of IBM mean that value added applications can be layered on top of it to continue the revenue stream.
Much more detail was then stated and demonstrated about Spark, but I’ll leave that to the more technical analysts and vendor who can help you.
One final note put here so it didn’t distract from the main message or clutter the summary. Harriet, please. You’re a great expert and a top marketing person. However, when you say “premise” instead of “premises,” as you did multiple times, it distracts greatly from making a clear marking message about the cloud.
IBM sees the future of data access to be Apache Spark. Its analytics group is making strides to open not only align with open source, but to be an involved player to help the evolution of Spark’s data access. To ignore IBM’s combined strength in understanding enterprise business, software and services is to not understand that it is a major player in some of the key big data changes happening today. The IBM Spark initiative isn’t a marketing ploy, it’s real. The presentation showed a combination of clear business thought and strategy alongside strong technical implementation.
I just saw an amusing presentation by SAS. Amusing because you rarely get two presenters who are both as good at presenting and as knowledgeable about their products. We heard from Mike Frost, Senior Product Manager for Data Management, and Wayne Thompson, Chief Data Scientist. They enjoy what they do and it was contagious.
It was also interesting from the perspective of time. Too many younger folks think if a firm has been around for more than five years, it’s a dinosaur. That’s usually a mistake, but the view lives on. SAS was founded almost 40 years ago, in 1976, and has always focused on analytics. They have been historically aimed at a market that is made up of serious mathematicians doing heavy statistical work. They’re very good at what they do.
The business analysis sector has been focused on less technical, higher level business number crunching and data visualization. In the last decade, computing power has meant firms can dig deeper and can start to provide analysis SAS has been doing for decades. The question is whether or not SAS can rise to the challenge. It’s still early, but the answer seems to be a qualified but strong “yes!”
Both for good and bad, SAS is the largest privately held software company, still driven by founder James Goodnight. That means a good thing in that technically focused folks plow 23% of their revenue into R&D. However, it also leaves a question mark. I’ve worked for other firms long run by founders, one a 25+ year old firm still run by brothers. The best way to refer to the risk is that of a famous public failure: Xerox and the PC. For those who might not understand what I’m saying, read “Fumbling the Future” by Smith and Alexander. The risk comes down to the people in charge knowing the company needs to change but being emotionally wedded to what’s worked for so long.
The presentation to the BBBT shows that, while it’s still early in the change, SAS seems to be mostly avoiding that risk. They’re moving towards a clean, easier to use UI and taking their first steps towards collaboration. More work needs to be done on both fronts, but Mike and Wayne were very open and honest about their understanding the need and SAS continuing to move forward.
One of the key points by Mike Frost is one I’ve also discussed. While they disagree with me and think the data scientist does exist, the SAS message is clear that he doesn’t work in a void. The statistician, the business analyst and business management must all work in concert to match technical solutions to real business information needs.
LASR, VAE and a cast of thousands
The focus of the presentation was on SAS LASR, their in-memory analytics server. While it leverages Hadoop, it doesn’t use MapReduce because that involves disk access during processing, losing the speed advantage of in-memory applications.
As Mike Frost pointed out, “It doesn’t do any good to run the right model too late.”
One point that still shows the need to think more about business, is that TCO was mentioned in passing. No slide or strong message supported the message. They’re still a bit too focused on technology, not what sells the business decision makers on business intelligence (BI).
Another issue was the large number of ancillary products in the suite, including Visual Data Explorer, Data Loader and others. The team mentioned that SAS is slowly moving through the products to give them the same interface, but I also hope they’re looking at integrating as much as possible so the users don’t have the annoyance of constantly moving between products.
One nice part of the demo was an example of discussing what SAS has termed “poorly structured data” as opposed to “unstructured data” that’s the rage in Hadoop. I prefer “loosely structured data.” Mike and Wayne showed the ability to parse the incoming file and have machine intelligence make an initial pass at suggesting fields. While this isn’t new, I worked at a company in 2000 that was doing that, it’s a key part of quickly integrating such data into the business environment. The company I reference had another founder who became involved in other things and it died. While I’m surprised it took firms so long to latch onto and use the technology, it doesn’t surprise me that SAS is one of the first to openly push this.
Another advantage brought by an older, global firm, related to the parsing is that it works in multiple languages, including right-to-left languages such as Hebrew and Arabic. Most startups focus on their own national language and it can be a while before the applications are truly global. SAS already knows the importance and supports the need.
Great, But Not Yet End-To-End
The only big marketing mistake I heard was towards the end. While Frost and Thompson are rightfully proud of their products, Wayne Thompson crowed that “We’re not XXX,” a reference to a major BI player, “We’re end-to-end.” However, they’d showed only minimal visualization choices and their collaboration, admittedly isn’t there.
Even worse for the message, only a few minutes later, based on a question, one of the presenters shows how you export predicted values so that visualization tools with more power can help display the information to business management.
I have yet to see a real end-to-end tool and there’s no reason for SAS to push this iteration as more than it is. It’s great, but it’s not yet a complete solution.
SAS is making a strong push into the front end of analytics and business intelligence. They are busy wrapping tools around their statistical engines that will allow them to move much more strongly out of academics and the very technical depths of life sciences, manufacturing, defense and other industries to challenge in the realm of BI.
They’re headed in the right direction, but the risk mentioned at the start remains. Will they keep focused on this growing market and the changes it requires, or will that large R&D expenditure focus on the existing strengths and make the BI transition too slow? I’m seeing all the right signs, they just need to stay on track.
The most recent TDWI webinar had a guest analyst, David Loshin of Knowledge Integrity. The presentation was sponsored by Liaison and that company’s speaker was Manish Gupta. Given that Liaison is a cloud provider of data integration, it’s no surprise that was the topic.
David Loshin gave a good overview of the basics of data integration as he talked about the growth of data volumes and the time required to manage that flow. He described three main areas to focus upon to get a handle on modern integration issues:
- Data Curation
- Data Orchestration
- Data Monitoring
Data curation is the organization and management of data. While David accurately described the necessity of organizing information for presentation, the one thing in curation that wasn’t touched upon was archiving. The ability to present a history of information and make it available for later needs. That’s something the rush to manage data streams is forgetting. Both are important and the later isn’t replacing the former.
The most important part of the orchestration Mr. Loshin described was in aligning information for business requirements. How do you ensure the disparate data sources are gathered appropriately to gain actionable insight? That was also addressed in Q&A, when a question asked why there was a need to bother merging the two distinct domains of data integration and data management. David quickly pointed out that there was no way not to handle both as they weren’t really separate domains. Managing data streams, he pointed out, was the great example of how the two concepts must overlap.
Data monitoring has to do with both data in motion, as in identifying real-time exceptions that need handling, and data for compliance, information that’s often more static for regulatory reporting.
The presentation then switched to Manish Gupta, who proceeded to give the standard vendor introduction. It’s necessary, but I felt his was a little too high level for a broader TDWI audience. It’s a good introduction to Liaison, but following Mr. Loshin there should have been more detail on how Liaison addresses the points brought up in the first half of the presentation – Just as in a sales presentation, a team would lead with Mr. Gupta’s information, then the salesperson would discuss the products in more detail.
Both presenters had good things to say, but they didn’t mesh enough, in my view, and you can find out far more talking to each individually or reading their available materials.
The Internet of Things (IOT) is something more and more people are considering. Wednesday’s TDWI webinar topic was “Stream Processing: Streaming Data in Real Time, in Memory,” and the event was sponsored by both SAP and Intel. Nobody from Intel took part in the presentation. Given my other recent post about too many cooks, that’s probably a good thing, but there was never a clear reason expressed for Intel’s sponsorship.
Fern Halper began with overview of how TDWI is seeing data streaming progress. She briefly described streaming as dealing with data while still in motion, as opposed to data in warehouses and other static structures. Ms. Halper then proceeded to discuss the overlap between event processing, complex event processing and stream mining. The issue I had is that she should have spent a bit more time discussing those three terms, as they’re a bit fuzzy to many. Most importantly, what’s the difference between the first two?
The primary difference is that complex event processing is when data comes from multiple sources. Some of the same things are necessary as ETL. That’s why the in-memory message was important in the presentation. You have to quickly identify, select and merge data from multiple streams and in-memory is the way to most efficiently accomplish that.
Ms. Halper presented the survey results about the growth of streaming sources. As expected, it shows strong growth should continue. I was a bit amused that it asked about three categories: real-time event streams, IOT and machine data. While might make sense to ask the different terms, as people are using multiple words, they’re really the same thing. The IoT is about connecting things, which interprets as machines. In addition, the main complex events discussed were medical and oil industry monitoring, with data coming from machines.
Jaan Leemet, Sr. VP, Technology, at Tangoe then took over. Tangoe is an SAP customer providing software and services to improve their IT expense management. Part of that is the ability to track and control network usage of computers, phones and other devices, link that usage to carrier billing and provide better cost control.
A key component of their needs isn’t just that they need stream processing, but that they need stream processing that also works with other less dynamic data to provide a full solution. That’s why they picked SAP’s Even Stream Processor – not only for the independent functionality but because it also fits in with their SAP ecosystem.
One other decision factor is important to point out, given the message Hadoop and other no-SQL folks like to give. SAP’s solution works in a SQL-like language. SQL is what IT and business analysts know, the smart bet for rapid adoption is to understand that and do what SAP did. Understand the customer and sales becomes easier. That shouldn’t be a shock, but technologists are often too enamored of themselves to notice.
Neil McGovern, Sr. Director, Marketing, at SAP gave the expected pitch. It was smart of them to have Jaan Leemet go first and it would have been better if Mr. McGovern’s presentation was even shorter so there would have been more time for questions.
Because of the three presenters, there wasn’t time for many questions. One of the few question for the panel asked if there was such a thing as too much data. Neil McGovern and Jaan Leemet spent time talking about the technology of handling lots of streaming data, but only in generalities.
Fern Halper turned it around and talked about the business concept of too much data. What data needs to be seen at what timeframe? What’s real-time? Those have different answers depending on the business need. Even with the large volume of real-time data that can be streamed and accesses, we’re talking about clustered servers, often from a cloud partner, and there’s no need to spend more money on infrastructure than necessary.
I would have liked to have heard a far more in-depth discussion about how to look at a business and decide which information truly requires streaming analysis and which doesn’t. For instance, think about a manufacturing floor. You want to quickly analyze any data that might indicate failures that would shut down the process, but the volumes of information that allow analysis of potential process improvements don’t need to be analyzed in the stream. That can be done through analysis of a resultant data store. Yet all the information can be coming across the same IoT feed because it’s a complex process. Firms need to understand their information priority and not waste time and money analyzing information in a stream for no purpose other than you can.
Dataversity hosted a webinar by Matt Allen, Product Marketing Manager at MarkLogic. Mr. Allen’s purpose was to explain to the audience the basic challenges involved in big data which can be addressed by semantic analysis. He did a good job. Too many people attempting the same spend too much time on their own product. Matt didn’t do so. Sure, when he did he had some of the same issues that many in our industry have, of over selling change; but the corporate references were minimal and the first half of the presentation was almost all basic theory and practice.
Semantics and Complexity
On a related tangent, one of the books I’m reading now is Stanley McChrystal’s “Team of Teams.” In it, he and his co-authors point to a distinction between complicated and complex. A manufacturing process can be complicated, it can have lots of steps but have a clearly delineated flow. Complex problems have many-to-many relations which aren’t set and can be very difficult to understand.
That ties clearly into the message put forward by MarkLogic. The massive amount of unstructured data is complex, with text rather than fields and which need ways of understanding potential meaning. The problems in free text are such things as:
- Different words can define the same thing.
- The same word can mean different things in different contexts.
- Depending on the person, different aspects of information are needed for the same object.
One great example that can contain all for issues was given when Matt talked about the development process. At different steps in the process, from discovery, to different development stages to product launch, there’s a lot of complexity in meanings of terms not only in development organizations but between them and all the groups in the organization with whom they have to work.
Mr. Allen then moved from discussing that complexity to talking about semantic engines. MarkLogic’s NoSQL engine has a clear market focus on semantic logic, but during this section he did well to minimize the corporate pitch and only talked about triples.
No, not baseball. Triples are a syntactical tool to link subject (person), predicate (operates), object (machine). By building those relationship, objects can be linked in a less formal and more dynamic manner. MarkLogic’s data organization is based on triples. Matt showed examples of JSON, Turtle and XML representations of triples, very neatly sliding his company’s abilities into the theory presentation – a great example of how to mention your company while giving a thought leadership presentation without being heavy handed.
Semantics, Databases and the Company
The final part of the presentation was about the database structure needed to handle semantic analytics. This is where he overlapped the theory with a stronger corporate pitch.
Without referring to a source, Mr. Allen stated that relation databases (RDBMS’) can only handle 20% of today’s data. While it’s clear that a lot of the new information is better handled in Hadoop and less structured data sources, it’s a question of performance. I’d prefer to see a focus on that.
Another error often made by folks adopting new technologies was the statement that “Relational databases aren’t solving a lot of today’s problems. That’s why people are moving to other technologies.” No, they’re extending today’s technologies with less structured databases. The RDBMS isn’t going away, as it does have its purpose. The all or nothing message creates a barrier to enterprise adoption.
The final issue is the absolutist view of companies that think they have no competitor. Mark Allen mentioned that MarkLogic is the only enterprise database using triples. That might be literally true. I’m not sure, but so what? First, triples aren’t a new concept and object oriented databases have been managing triples for decades to do semantic analysis. Second, I recently blogged about Teradata Aster and that company’s semantic analytics. While they might not use the exact same technology, they’re certainly a competitor.
Mark Allen did a very good job exposing people to why semantic analysis matters for business and then covered some of the key concepts in the arena. For folks interested in the basics to understand how the concept can help them, watch the replay or talk with folks at MarkLogic.
The only hole in the presentation is that though the high level position setting was done well, the end where MarkLogic was discussed in detail had some of the same problems I’ve seen in other smaller, still technology driven companies.
If Mr. Allen simplifies the corporate message, the necessary addition at the end of the presentation will flow better. However, that doesn’t take away from the fact that the high level overview of semantic analysis was done very well, discussing not only the concepts but also a number of real world examples from different industries to bring those concepts alive for the audience. Well done.
I’ll start by being very clear: This is a slam on bad marketing. Do not take this column as a statement that the products have problems, as we didn’t see the products.
Database Trends and Application magazine/website held a webinar. The first clue there was something wrong is that an hour long seminar had three sponsors. In a roundtable forum, that could work, and the email mentioned it was a roundtable, but it wasn’t. Three companies, three sequential presentations. No roundtable.
It was titled “The Future of Big Data: Hybrid Architectures and Best-of-Breed”. The presenters were Reiner Kappenberger, Global Product Manager, HP Security Voltage, Emma McGrattan, SVP Engineering, Actian, and Ron Huizenga, ER/Studio Product Manager, Embarcadero. They are three interesting companies, but how would the presentations fit together?
Each presenter had a few minutes to slam through a pitch, which they did with varying speeds and content. There was nothing tying them into a unified vision or strategy. That they all mentioned big data wasn’t enough and neither was the time allotted to hear significant value from any of them.
I’ll burn through each as the stand-alone presentations they were.
HP Security Voltage
Reiner Kappenberger talked about his company’s acquisition by HP earlier this year and the major renaming from Voltage Security to HP Security Voltage (yes, “major” was used tongue-in-cheek). Humor aside, this is an important acquisition for HP to fill out its portfolio.
Data security is a critical issue. Mr. Kappenberger gave a quick overview of the many levels of security needed, from disk encryption up to authentication management. The main feature focus on Reiner’s allotted time is partial tokenization, being able to encrypt parts of a full data field. For instance, disguising the first five digits of a US Social Security number while leaving the last four visible. While he also mentioned tying into Hadoop to track and encrypt data across clusters, time didn’t permit any details. For those using Hadoop for critical data, you need to find out more.
The case studies presented included a car company’s use of both live, Internet of Things feeds and recall tracking but, again, there just wasn’t enough time.
The next vendor was Actian, an analytics and business intelligence (BI) player based on Hadoop. Emma McGrattan felt rushed by the time limit and her presentation showed that. It would have been better to slow down and cover a little less. Or, well, more.
For all the verbage it was almost all fluff. “Disruption” was in the first couple of sentences. “The best,” “the fastest,” “the most,” and similar unsubstantiated phrases flowed like water. She showed an Actian built graph with product maturity and Hadoop strength on the two axis and, as if by magic, the only company in the upper right was Actian.
Unlike the presentations before and after hers, Ms. McGrattan’s was a pure sales pitch and did nothing to set a context. My understanding, from other places, is that Actian has a good product that people interested in Hadoop should evaluate, but seeing this presentation was too little said in too little time with too many words.
In Q&A, Emma McGrattan also made what I think is a mistake, one that I’ve heard many BI companies get away from in the last few years. An attendee asked about biggest concern when transitioning from EDW to Hadoop. The real response should be that Hadoop doesn’t replace the EDW. Hadoop extends the information architecture, it can even be used to put an EDW on open source, but EDWs and big data analytics typically have two different purposes. EDWs are for clean, trusted data that’s not as volatile, while big data is typically transaction oriented information that needs to be cleaned, analyzed and aggregated before it’s useful in and EDW. They are two tools in the BI toolbox. Unfortunately, Ms. McGrattan accepted the premise.
Mr. Huizenga, from Embarcadero, referred to evidence that the amount of data captured in business is doubling every 1.2 years and how the number of related jobs is also exploding. However, where most big data and Hadoop vendors would then talk about their technologies manipulating and analyzing the data, he started with a bigger issue: How do you begin to understand and model the information? After all, schema-on-write still means you need to understand the information enough to create schemas.
That led to a very smooth shift to a discussion about the concept of modeling to Embarcadero. They’ve added native support for Hive and MongoDB, they can detect embedded objects in those schemas and they can visually translate the Hadoop information into forms that enterprise IT folks are used to seeing, can understand and can add to their overall architecture models.
Big data doesn’t exist in a void, to be successful it must be integrated fully into the enterprise information architecture. For those folks already using ERwin and those who understand the need to document modeling, they are a tool that should be investigated for the world of Hadoop.
Three good companies were crammed into a tiny time slot with differing success. The title of the seminar suggested a tie that was stronger than was there. The makings existed for three good webinars, and I wish DBTA had done that. The three firms and the host could have communicated to create an overall message that integrated the three solutions, but they didn’t.
If you didn’t see the presentation, don’t bother. Whichever company interests you check it out. All three are interesting though it might have been hard to tell from this webinar.