
AptiMap at BBBT: Improving Data Mapping

Today the BBBT held a special session. While most presentations are given by companies with full products, existing sales, and a few years of history, today we had the pleasure of listening to Sherry Brown, President of AptiMap. This is a pure startup, still tiny. She was looking for our always vocal analyst community's opinions on her initial aim and direction. To the surprise of no one who knows the BBBT, we gave them at full bore.

Ms. Brown's goal is to provide a far easier way of mapping fields between source and target datasets for creating data warehouses and other data stores. It's a great start, and she has some initial features that will help. I'll be blunt: I'm intentionally not going to say a lot. As mentioned, they are a very early startup and the software isn't full-fledged. That means any mention of what they have and don't have could be inaccurate by next week. That's not a bad thing; it's what happens at that phase.

I will mention that the product is cloud based from the start.

The important question about whether or not to contact AptiMap comes down to who you are and what you need. Most of the feedback to Sherry was about that, aimed at helping her focus the message. If I have correctly understood the consensus of the attendees, here are the critical things to focus upon while defining a market for the initial product:

  • Aimed at IT and business analysts
  • Folks currently using modeling tools or spreadsheets as a start
  • Focus on standard, enterprise data sources, from spreadsheets to RDBMSs; Hadoop can wait
  • Mid-sized companies integrating their first sets of systems or trying to get a handle on their existing data
  • Might especially be good in the hands of consultants going into those types of companies
  • Many of the potential users are tablet users, so focus on that aspect of mobile

One final key, one that needs to be a full paragraph rather than a bullet and one that many technical startups don't get while building their products around user needs, is that users aren't the only decision makers in the purchase. As mentioned, this is a cloud product, and AptiMap will be expecting recurring revenue from monthly or annual fees. The business analyst is often not the person who approves those types of costs. The firm also needs to address the buyers, whether IT, line, or consulting management, with messages that help them understand the business benefit of providing the tool to their people.

Understanding your market matters. It will help the firm not only focus the product, but also narrow down the marketing message and image to aim at the correct audience.

Too often, founders get a great technical idea, focus on a couple of users to fill out product features, and only then try to find a market. BI is moving too fast for that; the vision needs to be set out much more clearly, and much earlier, than was needed in software companies twenty years ago.

Finally, I mentioned the cloud model but should also mention AptiMap is offering a 30-day free trial.

Summary

AptiMap has an initial product that can help people more rapidly and accurately create mappings between data sources and targets. It’s cloud based for easy access. It is, however, very early in the product and company life cycle.

I would suggest it primarily to analysts in mid-sized organizations, or to consultants who work with SMBs and want some quick-hit functionality to map data sources for the creation of data warehouses, ODSs, and other relationally oriented data repositories.

If you want to experiment inexpensively with an early product that could help, contact them.

Trifacta at the BBBT: Better Access and Understanding of Raw Hadoop Data

Trifacta is another business intelligence company to enter the horse race (yes, I know that reference is spelled differently…). They are focused on providing an early look at data coming out of Hadoop, to create some initial form and intelligence for business use.

Last Friday, Trifacta was the presenting company at the BBBT. Their representatives were Adam Wilson, CEO, Michael Hiskey, Interim Marketing Lead, and Wei Zheng, VP Products. The presenters were there to discuss the company's position in data wrangling. While some folks had problems with the term, as Michael Hiskey pointed out, it is a term they didn't invent. Me? I think it makes more sense than another phrase our industry uses, data lake; but that's another topic.

Simply put, Trifacta is working to more easily provide a view into Hadoop data by using intelligence to better understand and suggest field breaks, layouts and formats, helping users clean and refine the data so it becomes usable information for analysis.

Michael and the others talk about self-service data preparation, implying involvement by end business users. The problem is that the messaging is far ahead of the product. They, like lots of other companies, try too hard to imply an ease of use that isn't there. Their users are analysts, IT or business. The product is important and useful, but it's important to be clear about to whom it is useful. (Read more about self-service issues).

The Demo

While Mr. Hiskey and Mr. Wilson gave the introduction, the meat was in Ms. Zheng's presentation. As a guy who has spent years in product marketing, I have a bit of a love-hate relationship with product management. I've had some great ones and some very poor ones, and I'm sure the views of me also spread that spectrum. I'll openly say that Wei Zheng is the most impressive example of a VP of Products I've heard in a long time. She not only knows the products, she was very clear about understanding the market and working to bridge that to development. How could any product marketer not be impressed? Her demo was a great mix between product and discussions about both current usage and future strategy.

One of the keys to the product, Wei Zheng pointed out, is that the work Trifacta is doing does not include moving the data. It doesn't update the data; it works by managing metadata that describes both the data and the transformations. Yes, I said the word. Transformations. Think of Trifacta as simplified ETL for Hadoop, but with a focus on the E & T.

The Trifacta platform reads the Hadoop data, sampling from the full source, and uses analysis to suggest field breaks. Wei used a CSV file for her demo, so I can't speak to what mileage you'll experience with Hadoop data, but the logic seems clear. As someone who fifteen years ago worked for a company that was analyzing raw data without delimiters to find fields, I know it's possible to get close through automation. If you're interested, you should definitely talk with them and have them show you their platform working with your data.
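To make that concrete, here's a minimal sketch of the general idea of inferring structure from a sample of raw rows. To be clear, this is my own illustration, not Trifacta's algorithm; it simply scores a few candidate delimiters on how consistently they split the sampled lines.

```python
# Minimal sketch of delimiter/field inference from a raw sample.
# NOT Trifacta's algorithm; candidate list and scoring are illustrative.
from collections import Counter

CANDIDATES = [",", "\t", "|", ";"]

def infer_delimiter(sample_lines):
    """Pick the candidate delimiter that yields the most consistent field count."""
    best, best_score = None, -1.0
    for delim in CANDIDATES:
        counts = [line.count(delim) for line in sample_lines if line.strip()]
        if not counts or max(counts) == 0:
            continue
        _, freq = Counter(counts).most_common(1)[0]
        score = freq / len(counts)          # fraction of rows agreeing on field count
        if score > best_score:
            best, best_score = delim, score
    return best

sample = ["id,name,amount", "1,Acme,100.50", "2,Initech,75.00"]
print(infer_delimiter(sample))  # ','
```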

The product then displays a lot of detail about the overall data and the fields. It’s very useful information but, again, it’s going to be far more useful to a data analyst than to a business user.

Trifacta also has some basic data cleansing functions, such as grouping slightly different variations of the same customer company name and then changing them to something consistent. Remember, this is done in the metadata; the original data remains the same. When you review the data, the cleaned version is what displays, but the original remains untouched until you formally export to a clean data file.
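For illustration, here's a hypothetical sketch of that "transformations as metadata" approach: cleansing rules are stored as a recipe and applied only when the data is read or exported, so the source rows are never modified. The rule names and structure are mine, not Trifacta's.

```python
# Hypothetical "recipe" of cleansing rules stored as metadata and applied at read time.
source_rows = [                       # stands in for the untouched source data
    {"customer": " IBM Corp ", "amount": "100.50"},
    {"customer": "I.B.M.",     "amount": "75.00"},
]

recipe = [                            # the metadata: an ordered list of rules
    ("trim",        "customer", None),
    ("standardize", "customer", {"IBM Corp": "IBM", "I.B.M.": "IBM"}),
]

def apply_recipe(row, recipe):
    out = dict(row)                   # work on a copy; the source row stays as-is
    for rule, column, params in recipe:
        if rule == "trim":
            out[column] = out[column].strip()
        elif rule == "standardize":
            out[column] = params.get(out[column], out[column])
    return out

cleaned = [apply_recipe(r, recipe) for r in source_rows]
print(cleaned)   # cleaned view for display or export; source_rows is unchanged
```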

Finally, as the demonstration clearly shows, they aren’t trying to become a BI visualization firm. They are focused on understanding, organizing and cleaning the data before analysis can be done. They partner with visualization vendors for the end-user analytics.

Summary

Trifacta has a nifty little product for better understanding, cleaning and providing Hadoop data. Analysts should love it. The problem is that, despite what their presentation implies, Hadoop does not equal big data. They have nothing that helps link Hadoop into the wider enterprise data market. Theirs is a very useful tool for Hadoop, but unless they quickly move past that, they risk being passed by other vendors who are already looking at how to make sense of the full enterprise data world. They seem to have a great start in a product and, from my limited exposure to three people, a very good team. If you need help leveraging your Hadoop data, talk with them.

Tableau at the BBBT: Strengthening the Business in Business Intelligence

Tableau was back at the BBBT last week. Last year’s presentation was a look ahead at v8.2. The latest visit was a look back at 2014 and a focus on v9.0. Francois Ajenstat, VP Product Management, was back again to lead us through product issues. The latest marketing presenter was Adriana Gil Miner, VP Corporate Communications.

Tableau Revenues

Ms. Gil Miner opened the morning with the look back at last year. The key point was the strength of their growth. They are not only pleased with the year-over-year growth, but the chart also shows last year's revenue as a slice of revenue over Tableau's lifetime. We'll leave it simply as: they had a good year.

Another point in describing their size is that Adriana said they have 26,000 customer accounts. Some confusion with a number in a later presentation required clarification: this isn't users, or even sites. We were told that the 26k is the number of paying company accounts. There were no numbers showing median account size or how far the outliers are on either extreme, but that's a nice number for the BI space.

The final key point made by Adriana Gil Miner was localization. Modern companies almost all create products using Unicode or other methods that allow for language localization, but Tableau has made the strong push to provide localized software and data sets in multiple languages. My apologies for not listing them; there's some weird glitch with my Adobe Reader that's crashing only on their presentation while no other analyst is having the same problem, so I can't provide a list. Please refer to your local Tableau rep for details.

Francois Ajenstat then took over. It was no surprise that his focus was on v9.0. He discussed it by focusing on nine points he views as key:

  • Access to more data sources
  • Answer more questions
  • Improve the user experience
  • Support analytics at scale
  • Performance as a differentiator
  • Support for mobile
  • Tableau Public redesign
  • Coming out next quarter

If you look at those, you might question why there are that many bullets. For instance, when it comes out is just a schedule issue and doesn't rise to the level of the others. Tableau Public's redesign just seems to be the obvious end of focusing on better user experience and performance.

However, a couple sound the same but should be differentiated. Analytics at scale and performance improvements overlap but aren't identical. Francois showed what they did to improve performance on clustered servers, helping both bigger data sources and more simultaneous users, and also demonstrated some impressive optimization of basic analytics for individuals.

One of the best parts was the honesty, now that they're close enough to releasing v9.0, in admitting that early versions ran slowly. They showed quotes from beta testers talking about major performance improvements. In addition, Tableau Public is a great source for testing real-world analytics. Mr. Ajenstat pointed out that they took the 100 most accessed visualizations in Tableau Public and analyzed performance differences, seeing a 4x increase in performance on average. While it's always important to generate internal tests to stress potential use, focusing on how businesses really use the tool is even more important in ensuring performance is seen as good in day-to-day usage by knowledge workers, not only in heavy loads by analysts doing discovery.

LOD Expressions

The one thing that really caught my eye about v9.0 is the incorporation of Level of Detail (LOD) expressions. BI firms have been adding drill-down analytics for a decade. Seeing a specific level of detail and then dropping down to a lower level is critical. However, that’s not enough.

What’s needed is to be able to visually compare the lower level details with overall numbers. For instance, a sales VP regularly wants to know not just how an individual sales person is doing, but also how that compares to the region and national numbers. Only within context can you gain insight.

Among the other things LODs enable is the ability to bin aggregates. Again we can turn to sales: think about retail sales binned by category while also comparing those to total sales, or used in a trend analysis.
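As a rough analogy for readers who think in code, the calculation an LOD expression exposes to a business user looks something like the following pandas sketch, with invented sales data: a detail row compared side by side with its region and national aggregates. This is the underlying math, not Tableau's LOD syntax.

```python
# Compare detail rows to coarser-level aggregates, side by side (invented data).
import pandas as pd

sales = pd.DataFrame({
    "rep":    ["Ann", "Bob", "Cho", "Dee"],
    "region": ["West", "West", "East", "East"],
    "amount": [120.0, 80.0, 150.0, 95.0],
})

sales["region_avg"]   = sales.groupby("region")["amount"].transform("mean")
sales["national_avg"] = sales["amount"].mean()
sales["vs_region"]    = sales["amount"] - sales["region_avg"]
print(sales)   # each rep's number alongside region and national context
```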

While many companies are working to add more complex analysis, it's clear that Tableau hasn't only looked at how a very technical person can create an LOD. They've worked on an interface that, from the demo, looks simple and clean enough for business end users to use. Admittedly, that's what demos are supposed to do, but I've seen some try and fail miserably. This seems to be a good attempt to understand business intelligence with an emphasis on the first word.

Summary

Some of the very new startups make the mistake of thinking even the first-generation BI companies are too old to innovate. Those companies aren't, and they're still a threat. However, Tableau isn't even first generation, and it is nimbler still. They have their eyes on the ball and are moving forward. Even more importantly, while still focusing on their technology, as do many startups, they seem to have become mature enough to start shifting focus from IT and business analysts to the information consumers.

Understanding what business knowledge workers throughout the business hierarchy need, in data and performance, is what will drive the next growth spurt. Tableau seems to have that target in its sights.

IBM and the Cloud? Don’t write it off

At today's investor meeting, IBM execs announced a target of $40 billion in revenue for cloud, analytics, mobile, social and security software by 2018. I expect to see folks talk about dinosaurs not being able to turn fast enough and predicting failure to meet that goal. I don't know if they can do it, but to make such ardent predictions you'd have to ignore history.

Mid-sized Unix servers came along and folks talked about IBM going away.

IBM blew a chance to own the PC industry and the same predictions followed them.

Linux? Freeware was going to destroy the mainframe. Oops, Linux partitions run on mainframes.

Now we're seeing the rapid growth of the cloud. Much of it has been on commodity boxes. However, as data gets larger, analytics more powerful and networks more robust, there's clearly space for a company with such a strong history in hardware, services and adapting to change.

After all, too many people still think of IBM as a hardware company. While it's too early for the 2014 report, you can check the 2013 Annual Report, page 7. Look at what a tiny percentage of the bar is hardware. Software and services are fairly even in splitting the vast majority of the revenue stream.

It’s a strong goal and will take a lot of pushing. How many politely phrased “re-orgs” will happen to lay off staff? Who knows? Will they succeed? No clue. All I expect is that they’ll continue to grow and nobody should count them out.

ClearStory at BBBT: Good, But Still A Bit Opaque

ClearStory Data presented to the BBBT last week. The company is presenting itself as an end-to-end BI company, covering everything from data access through display. Their core is what they call Data Harmonization, an attempt to better merge multiple data stores into a coherent whole.

Data Harmonization

The presentation started with Andrew Yeung, Director of Product Marketing, giving the overview slides. The company was founded by some folks from Aster Data, has about sixty people, and its mission is to "Converge more data sources faster and enable frontline business users to do collaborative data exploration to 'answer new questions.'" A key fact Andrew brought up, from their own research, was that 74% of business users need to access four or more data sources. As I've mentioned before, the issue is more wide data than big data, and this company understands that.

If that sounds to you like ETL, you've got it. Everyone thinks they have to invent new terms, and ETL is such an old one with negative connotations that they're trying to rebrand. There's nothing wrong with ETL; even if you rebrand it as ELT or harmonization, it's still important, and the team has a good message.

The key differentiator is that they're adding some fundamental data and metadata to improve the blending of the data sources. That will help lower the amount of IT involvement in creating the links and the resulting data store. Mr. Yeung talked about how the application inferred relationships between fields, based on both data and metadata, to both link data sources and infer dimensions around the key data.
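To give a flavor of what such inference can look like, here's a deliberately simplified sketch that scores column pairs on name similarity and value overlap to propose a join key. ClearStory's actual harmonization logic is surely far richer; the data, weights and function names below are invented.

```python
# Toy field-relationship inference: score column pairs by name similarity + value overlap.
from difflib import SequenceMatcher

def join_candidates(table_a, table_b):
    scores = []
    for name_a, vals_a in table_a.items():
        for name_b, vals_b in table_b.items():
            name_score = SequenceMatcher(None, name_a.lower(), name_b.lower()).ratio()
            overlap = len(set(vals_a) & set(vals_b)) / max(len(set(vals_a)), 1)
            scores.append((round(0.4 * name_score + 0.6 * overlap, 2), name_a, name_b))
    return sorted(scores, reverse=True)

crm   = {"cust_id": [1, 2, 3], "region": ["W", "E", "E"]}
sales = {"customer_id": [2, 3, 4], "amount": [80, 150, 95]}
print(join_candidates(crm, sales)[0])  # highest-scoring pair: cust_id / customer_id
```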

Andrew ended his segment with a couple of customer stories. I'll point out that they were anonymous, always a concern in my book. When a firm trusts you enough to let you use its name, you have a certain level of confidence. The two studies were a CPG company and a grocery chain, good indications of ClearStory's ability to handle large data volumes.

Architecture

The presentation was then taken over by Kumar Srivastava, Senior Director of Product Management, for the architecture discussion.

ClearStory at BBBT - Architecture slide

ClearStory is a cloud service provider; it has access to corporate systems, but the work is done on ClearStory's servers. Mr. Srivastava started by stating that the harmonization level and higher are run in-memory on Apache Spark.

That led to the immediate question of security. Kumar gave all the right assurances on basic network security and was also quick to address, not hide, the additional security and compliance issues that might prevent some data from being moved from inside the firewall to the cloud. He also said the company suggests clients mask critical information, but ClearStory doesn't yet provide that service. He admitted they're a young firm and still working out those issues with their early customers. That's a perfectly reasonable answer; if you talk with the company, be sure to discuss your compliance needs and their progress.

Mr. Srivastava also made a big deal about collaborative analytics, but it's something everyone's working on; he said nothing really new, and the demo didn't show it. I think collaboration is now a checklist item: folks want to know a firm has it but aren't sure how to use it. There's time to grow.

The last issue discussed by Kumar was storyboards, the latest buzzword in the industry. He talked about them being different than dashboards and then showed a slide that makes them look like dashboards. During the demo, they showed as more dynamic dashboards, with more flexible drilldown and easy capture of new dashboard elements. It's very important, but the storyboard paradigm is seriously overblown.

Demo

The final presenter was Scott Anderson, Sales Engineer, for the demonstration. He started by showing they don’t have a local client but just use a browser. Everyone’s moving to HTML5 so it’s another checklist item.

While much of the demo flew by far too quickly to really judge how good the interface was, there was one clear positive element, though some analysts will disagree. Based on the data, ClearStory chooses an initial visualization. The customer can change that on the fly, but there's no need to decide at the very front what you need the data to be. Some analysts and companies claim that is bad, that you can send the user in the wrong direction. That's why some firms still make the user select an initial visualization. That, to me, is wrong. Quickly getting a visualization up helps the business worker begin to understand the data immediately and then fine-tune what is needed.
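For what it's worth, here's a toy version of the design choice I'm defending, with a heuristic entirely of my own invention rather than anything ClearStory showed: pick a sensible default chart from the shape of the data so the user sees something immediately, then let them change it.

```python
# Invented heuristic for suggesting a default chart from column types.
def suggest_chart(columns):
    """columns: dict of column name -> type ('numeric', 'categorical', 'datetime')."""
    types = list(columns.values())
    if "datetime" in types and "numeric" in types:
        return "line chart (measure over time)"
    if types.count("numeric") >= 2:
        return "scatter plot"
    if "categorical" in types and "numeric" in types:
        return "bar chart (measure by category)"
    return "table"

print(suggest_chart({"order_date": "datetime", "revenue": "numeric"}))  # line chart
```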

At the end of the presentation, two issues were discussed that have important relevance to the in-memory method of working with data. First, they're not good with changing data: if you pull in data and display results, and new data is then loaded over it, the results change with no history or provenance for the data. This is something they'll clearly have to work on to become more robust.

The other issue is the question of usability. They claim this is for end users, but the demo only showed Scott grabbing a spreadsheet off of his own computer. When you start a presentation talking about the number of data sources needed for most analysis, you need to show how the data is accessed. The odds are, this is a tool that requires IT and business analysts for anything more than the simplest information. That said, the company is young, as are many in the space.

Summary

ClearStory is a young company working to provide better access to and blending of disparate data sources. Their focus is definitely on the challenge of accessing and merging data. Their visualizations are good, but they also work with pure BI visualization tools on top of their harmonization. There was nothing that wowed me, but also nothing that came out as a huge concern. It was a generic presentation that didn't show much; it gave some promise but left a number of questions.

They are a startup that knows the right things to say, but it's most likely going to be bleeding-edge companies that experiment with them in the short term. If they can provide what they claim, they'll eventually get some named references and start moving towards the main market.

As they move forward, I hope to see more.

Datameer and Altiscale: A Tale of Two Startups

A webinar I watched was titled for the announcement of Datameer Professional, but that's not what caught my interest. I previously blogged about Datameer's presentation to the BBBT, so you can see that I think they're a good, basic company that seemed to have a better grasp on the market than most startups. Sadly, that wasn't shown today. What was most interesting was the part of the presentation that covered Altiscale, a company partnering with Datameer to provide Datameer Professional.

The Basics

The presentation was a tag team between Stefan Groschupf, CEO of Datameer, and Mike Maciag, COO of Altiscale.

It began with Stefan giving a very generic overview of the market. There was little content, and what was there wasn't a surprise. The main point is that the demand for Hadoop has moved from IT doing behind-the-scenes work to business users wanting quick analysis. The lack of technical knowledge makes staffing an issue.

That led directly to Mike Maciag talking about how Altiscale provides Hadoop as a service. They’re a cloud provider of Hadoop. With founders from Google, Yahoo and LinkedIn, they understand Hadoop, the cloud and the need of companies to quickly leverage external resources to get to Hadoop with better TCO than would be the case trying to build in-house.

The reason for that quick introduction became apparent as soon as Mike tossed back to Stefan. Datameer Professional is a cloud version of their product that runs on Altiscale in the US. During Q&A, it came out that they're using a different, unnamed provider in the EU for compliance issues over there.

One feature is that they're claiming they'll provide three sets of basic analytics with Professional, for customer analysis, operations and cyber security, in that order. However, there was the qualification from Mr. Groschupf of "over the next few months," so no clue how much is there now or when they'll really be available.

Case Studies

One thing I noticed marked a clear difference between the two companies in how well they understand business customers. The Datameer customers mentioned were never named; none had approved the use of their names. Therefore, I have to question the veracity of the claims.

The case study presented by Altiscale was a named client. Business users want to know reality and also want to confirm that a customer has enough confidence in a vendor to attach its name.

Summary

The summary combines what I heard today and what I heard at the BBBT presentation.

Datameer is a good startup, but it is still focused on technology. BI has a short life cycle and companies need to rapidly prepare for the chasm. As they showed no updates in their UI, don't have strong market messages and didn't have a referenced customer, Datameer is a company to look at for technology, but I again wonder about the long-term strategy and think the technology will be acquisition bait. Look to them for a basic interface and strong underpinnings, but they don't seem focused clearly on end-user business messages.

Altiscale is also a good startup, but they seem better focused on a business message. Perhaps that's because they are a cloud service company, so they know it is important to do so. From the little I saw, they seem to be positioning themselves well as an alternative to Amazon and other choices for helping businesses quickly take advantage of big data on Hadoop. I'd like to hear more.

Both Datameer and Altiscale are worth looking at, but for different reasons and they don’t have to be a set.

TDWI Webinar: Embedded Analytics

The latest TDWI webinar was on embedded analytics. The speakers were Fern Halper, the director of TDWI research for advanced analytics, and Mark Gamble from OpenText. For those of you who hadn’t heard, Actuate was acquired by OpenText and is being rebranded but, according to Mark, will remain an independent division for now.

Ms. Halper's main point is that embedded has a lot of different meanings for different audiences and that she wants to create a clear framework for understanding the terminology within the analytics space. She's clear that what's meant isn't just the mass-market idea of wearable software, but that analytics can be embedded in specific applications, broader systems and, yes, devices such as mobile and wearable items.

Early in the presentation she showed a two-axis image comparing structured and unstructured data with human- and machine-generated data. While I think the coloring should rotate, to emphasize that the difference between machine- and human-generated information is a bigger issue than structured versus unstructured, it's a nice way of understanding some of the data streams.

TDWI Embedded Analytics - Data Sources

That, however, was a definitional slide and discussion. The real meat of Fern Halper's presentation was the framework she described to help understand the steps of embedding analytics.

TDWI Embedded Analytics - Framework

Operationalized analytics are those that are involved in the full process of decision making. For instance, a call center employee might be talking to a prospect whose finances are flagged as a question mark. That prospect must be sent to another person to process the decision based on analytics.

Integrated analytics are those that allow the call center operator to see the analysis and immediately make decisions based upon guidelines.

Automated analytics are those that provide the operator with a decision tree response based on analytics done behind the scenes.
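To make the distinctions concrete, here is my own toy illustration of the three levels, using a single credit-risk score in a call-center flow; the scoring function, thresholds and field names are all invented, not part of Ms. Halper's material.

```python
# Invented example: the same risk score handled at each level of the framework.
def risk_score(prospect):
    return 0.72 if prospect.get("flagged") else 0.20   # stand-in for a real model

def handle(prospect, mode):
    score = risk_score(prospect)
    if mode == "operationalized":
        # analytics inform the process, but a person elsewhere makes the call
        return "route to credit desk for review"
    if mode == "integrated":
        # the operator sees the score and applies guidelines on the spot
        return f"show score {score:.2f}; operator decides per guidelines"
    if mode == "automated":
        # the system returns the decision itself
        return "decline" if score > 0.6 else "approve"

print(handle({"flagged": True}, "automated"))  # decline
```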

The only issue I take with the framework is that it doesn't necessarily mean true real time. The example discussed shows that the integrated approach can be real time in the sense humans mean within our own interactions. Meanwhile, real time might not be a necessary component of some automated decisions. Real time is a separate issue, and I think Fern's framework would be better served by eliminating that item.

Fern Halper followed the framework with the usual and interesting TDWI survey numbers. This time, the questions were focused on the adoption of analytics tied to the framework. The numbers showed the unsurprising fact that analytics adoption is still in its infancy. One of the great parts of TDWI’s numbers is they show the reality which contradicts the industry’s hype.

One set of numbers I'd like to have seen wasn't included. The responses were only from IT in general, covering who has started using what analytics. I would have loved to see one slide that clearly showed only the sub-segment of companies already using analytics tools and where those companies fall within Ms. Halper's framework. How are the bleeding-edge folks doing at moving through the framework to automated solutions?

OpenText

The rest of the program was a fast presentation by Mark Gamble, pointing to OpenText's (Actuate's) main benefit claim of enterprise scalability, among other factors. One of the phrases I liked was his statement that they "adhered to a low code methodology." It's nice to hear folks admitting that as much as we want to eliminate coding, some of it is still required. Honesty isn't a negative in marketing, and I liked that turn of phrase.

In the other direction, he mentioned there were over fourteen million downloads of BIRT and that the company "believes" they have over three million users. I'm not interested in belief; they don't seem to have a clear figure on adoption.

The main problem I had was the demo. Mark showed experimental work purporting to show live acquisition of basic automotive information, such as speed and RPM, displayed on a computer, phone and watch. It was not only not a business case, but one that seemed to go back to the misunderstanding about the meaning of embedded that Fern had addressed. Yes, it was embedded on two devices, but the demo didn't show how it might be embedded in business applications. It stuck with the flashy concept of wearables.

OpenText might have something good with their analytics portability, but I don’t think the demo presents it to a business audience. Yes, techies will understand the underpinnings that make it cool, but the business folks writing checks need to see something that justifies the expenditure and I don’t think that’s shown.

Summary

Fern Halper did another good job of putting the adoption of analytics into perspective. This time, with a framework for better understanding embedded analytics.

Mark Gamble did a passable job of presenting OpenText’s solution but I feel he must do a better job of figuring out a business message.

TDWI’s data shows the early state of adoption that exists in the market. Fern Halper’s framework will help companies better understand how to move into the arena, but only if the companies providing those solutions can better present how they’ll help solve business issues.

MapR at BBBT: Supporting Hadoop and still learning

I’ve probably used this in other columns, but that’s life. MapR’s presentation to the BBBT reminded me of Yogi Berra’s statement that it feels like déjà vu all over again. Wait, if I think I’ve done this before, am I stuck in a déjà vu loop?

The presentation was a tag team effort of Steve Wooledge, VP Product Marketing, and Tomer Shiran, VP Product Management.

The Products and Their Aim

The first part of the déjà vu was good. People love to talk about freeware, but mission-critical solutions won't be trusted to it. Even before Linux, before Unix, software came out and it took companies packaging it with service and support to provide constancy and trust for widespread IT adoption. MapR is a key company doing that with Apache Hadoop, the primary open source technology for big data applications.

They've done the job well, putting together a strong company that, quite reasonably, has attracted some great investors and customers. Of course, because Hadoop is still in its infancy, even a leading company such as MapR mentions only 700 customers, companies paying for licenses; but that's a statement about big data's still fairly limited impact in operational systems, not a knock on MapR.

Their vision statement is simple: “Empowering the As-it-happens business by speeding up the data-to-action cycle.” Note the key: Hadoop is batch oriented and all the players realize that real-time analysis matters for some key sales and marketing applications. Companies are now focusing on how fast they can get information out of the databases, not what it takes to get data in. A smart move but only half the equation.

One key part of the move to package open source into something trusted was pointed out by Steve Wooledge. When the company polled customers about why they chose MapR, the largest response was availability, the up time of the system. Better performance wasn’t far behind, but it’s clear that the company understands that availability is a critical business issue and they seem to be addressing it well.

Where the déjà vu hits in a not-so-positive way is the regular refrain of technologists still not quite getting business – even when they try. This isn’t a technology problem but an innovator’s problem. When you get so wrapped up in the cool things you’re doing, you think that you need to lead with the cool things, not necessarily what the market wants.

One example was when they were describing the complexity of the MapR packaging. Almost all the focus was on the cool buzzwords of open source. Almost lost in the mix was the mention that their software supports NFS, which was developed more than 30 years ago and helps find files on networks. That MapR helps link both the latest data and the still voluminous data in existing file systems is a key point, something that can help businesses understand that Hadoop can be integrated into existing systems and infrastructure. However, it's not cool, so the information is buried.

The final thing I’ll mention about the existing products is that MapR has built a nice three product suite, providing open source, mid-tier and full enterprise versions. That’s the perfect way to address the open source conundrum and move folks along the customer curve.

Apache Drill: Has it Bitten Off Too Much?

Sorry, couldn't help the drill bit reference. Tomer Shiran took the latter part of the presentation to show off Apache's latest data toy, Apache Drill, intended to bridge the two worlds of data. The problem I saw was one not limited to Tomer, MapR or even Apache, but common to all folks with what they think of as new technology: overhype and an addiction to revolutionary rather than evolutionary words and messages. There were far too many phrases that denigrated IT and existing technology and implied Drill would replace things that weren't needed. When questioned, Tomer admitted that it's a complement; but the unthinking words of many folks in the industry set out a pattern inimical to rapid adoption into the Global 1000's critical information paths.

Backing that up was a reply given to one questioner: "The CIO of one of the largest tech companies said they can't keep doing things the same way." Tech companies tend to be bleeding edge by nature; they do not represent the fuller business world. More importantly, a CIO saying she needs to change doesn't mean the CIO is planning on throwing out existing tools that work. It means she wants to expand and extend in a way that leverages all technology to provide better decision-making capabilities to the rest of the CxO suite.

Another area of his talk finally brought forward, through a very robust discussion, one terminology issue that many are having. Big data folks like to talk about "no schema," but that's not really true. Even when they modify the statement to "schema on read," it's missing the point.

They seem to be confusing fixed-layout, relational records with the theory of schemas. XML is a schema for data exchange. It's very flexible and can be self-defined, but it's a schema. As it came from SGML, it's not even the first iteration of flexible schemas. The example Mr. Shiran gave was just like an XML schema. Both data source and data recipient have to know some basic information, such as field names, in order to make sense of data, so there's a schema.
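A tiny example of the point, in my own words rather than anything shown in the presentation: even with "schema on read," the consuming code still has to know field names, which is to say it still carries a schema, just one applied at read time instead of load time.

```python
# "Schema on read": the reader's expectations ARE the schema (invented records).
import json

raw_records = [
    '{"cust": "Acme", "amt": 100.5}',
    '{"cust": "Initech", "amt": 75.0, "region": "East"}',   # extra field, still fine
]

for line in raw_records:
    record = json.loads(line)
    # The consuming code expects "cust" and "amt" to exist; that expectation is a schema.
    print(record["cust"], record["amt"])
```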

Flexible schemas aren't new, and they don't obviate the need for schemas. They're just another technique for managing the wide variety of data that business wishes to turn into information. The longer big data folks keep misusing a term and acting as if they have something revolutionary, the longer they'll retard their needed incursion into IT and business information.

Summary

Hadoop and big data aren’t going anywhere except forward. The question is at what speed. There are some great things happening in both the Apache open source world and MapR’s licensed support for that world, but the lack of understanding of existing IT and business is retarding adoption of the new and exciting technologies.

When statements such as “But the sales guy won’t do X” are used by folks who have never been in and don’t understand sales, they’re missing the market. Today’s sales person is looking for faster and more accurate information, and is using many tools people would have said the same thing about only ten years earlier. In the meantime, sales management and the CxO suite who provide guidance for the sales force are even more interested in big picture information coming from massaging large data sources.

The folks in the new arenas such as Hadoop need to realize that they are complementary to existing technologies and that they can help both IT and business. When I pointed that out, one of the presenters asked if that meant he should do two case studies, one with Hadoop and flexible schemas and one with old-line uses. I gave a clear no. It should be one with the new and one that shows new and existing data sources combining to give management a more holistic picture than previously possible.

Evolution is good. MapR can help. They need to do the tough part of technology and move their view from what they think is cool to what the market thinks is needed.