If you like my article, and I know you will, check out the full journal.
Read my article in the latest TDWI journal.
Yesterday’s TDWI webinar was focused on data-centric security. The tag team was Fern Halper, Research Director for Advanced Analytics, TDWI, and Jay Irwin, Director of InfoSec, Teradata. It’s always nice when the two halves of a sponsored presentation fit well. For that reason and for the content, this was a nice presentation.
Everyone in the industry knows that data breeches happen, and we all talk about the issue. I’ve seen a few articles and lists about the number of successful attacks, but Fern Halper pointed us to a nice graphic from Information is Beautiful. She also pointed to another study that showed that “In 2013, 33% of respondents said their company had a data breach. In 2014 the percentage has increased to 43%.” It’s always a race between black hats and white hats, so it’s important to minimize not only your chance of getting hacked, but also to minimize the importance and usefulness of data gained from successful hacks.
Ms. Halper than discussed four types of data security:
Each part is necessary but insufficient. Authorization is only as strong as people’s passwords. If it’s easy to steal the encryption key, encryption doesn’t matter. A robust security system leverages all the types.
One important note: Later in her presentation and throughout Jay Irwin’s section, encryption didn’t exist alone but alongside tokenization. The later is a different security technology, where characters, words, numbers and fields are replaced with other symbols, or tokens, that still look as if they’re real and can still be used in analysis. Mr. Irwin pointed out he prefers “data protection” as a rubric that covers all the techniques of data level security.
Along with that clarification, Jay Irwin also described the multiple layers as “Defense in Depth,” a concentric ring of security to ensure there’s no single point of failure. Jay also provided my favorite slide of the presentation. While it’s too wordy, it’s a pretty clear view of Teradata’s top-down approach.
An organization must start with understanding the rules and regulations that drive data security. Only then can you identify the data assets that need special attention in order to protect them from hackers.
Jay has a lot more to say in a lot more detail, and I won’t cover it all. While I blog about webinars so you don’t have to watch, this one’s an exception. If you want to get a good, broad view of core data security issues, take some time and listen to the webinar.
This week’s TDWI hosted webinar was about engaging business and, once again, it came from the standpoint of technologists rather than from business. There were some very good things said. However, until our industry stops thinking of business knowledge workers as children to be tutored and begins to think about them as people whose knowledge is the core of what we must encapsulate, we’ll continue to miss the mark and adoption of solutions will remain slow.
The main presenter was David Loshin, President of Knowledge Integrity. He began the presentation with a slide that describes his view of the definition of “data driven,” including three main points:
We should all clearly understand that the first item is not new and was not created by the business intelligence (BI) industry. Business has always been data driven. What we’re able to do now is access far more data than ever before so that we can provide a more robust view of the corporation.
The second bullet is a core point. Mr. Loshin used a couple of example such as sales territory and other areas where definitions are fuzzy. One clear difference to me is one I directly experienced 25 years ago, and more directly addresses the visualization side of the BI conundrum. I was working for a major systems integrator (SI) and my client was, well, let’s just say it was a large, fruit based computing company.
A different SI had created an inventory system for the client’s manufacturing facility but the system was a failure though all the right data was in the system. The problem was that the reports were great for the accounting department, not for inventory and manufacturing. We interviewed the inventory team and then rewrote reports to address and present the information from their standpoint.
Too often, technologists get lost in the detailed data definitions and matching fields across data sources. That is critical, but it loses the big picture. Even when data is matched, different business people use data differently.
Which brings us to David Loshin’s third point. No, we don’t need to enforce exact standards for utilization. We need to ensure that the data each consumer refers to is consistent, but we must do a better job in understanding that different departments can utilize the exact same data in a variety of ways.
David did get to the key issue a bit later, on a slide titled Operationalizing Business Policies. He points out that it’s critical to ensure that “Information policies model the data requirements for business policy.” This is key and should be bubbled up higher in the mindset of our industry. While I hear it mentioned often, it seems to be honored more in the breach.
Time was spent discussing the importance of understanding different users and their varying utilization of data. As I mentioned in the introduction, the solution to the new complexities then veers from addressing business needs to ignoring history. In a previous blog post, I discussed how many in the industry seem to be ignoring the lessons to be learned from the advent of the PC. Mr. Loshin seems to be doing that when he talks about empowering the business users to set their own usability rules. He splits IT and business in the following way:
The issue I have with that argument is a phrase that didn’t appear in this webinar until Linda Briggs, the moderator, mentioned it in a poll question right before Q&A: Data Governance. Corporations are increasingly liable for how they control and manage information. It does not make sense to allow each user to define their own data needs in a void. Rather than allow for massively expanded and relatively uncontrolled access to data and then later have to contract access, as corporations had to regain a handle on what was being done on scattered desktop computers, BI vendors should be positioning data governance from the start.
Whether it’s by executive fiat, a cross-functional team, or some other method, companies need to clarify data governance rules. Often, IT is the best intermediary between groups, actively participating in data governance definition as an impartial observer and facilitator. It is then the job of IT to ensure that it provides as open access as possible to business workers given their needs and the necessity of following governance rules.
There was one question, during Q&A, on the importance of data governance. I thought David Loshin again understated its importance while Harald Smith, Director of Product Management at Trillium, the webinar sponsor, had the comment that “everyone is responsible for data governance.” That is my only mention of the sponsor, as I felt his portion of the presentation was a recitation of sound bites, talking points and buzz words that didn’t provide any value to the hour.
David Loshin has a clear view of engaging the business and gets a number of key things correct. However, that view is one of a technologist looking over a self-imagined bridge separating technology and business. There’s not a bridge separating IT and business. They overlap in many critical areas and both must learn from and work well with each other.
TDWI held a webinar to announce their latest major report. While there are always a lot of intriguing numbers in the reports, it’s also important to remember the TDWI audience is self-selecting. People interested in the latest information lean towards the leading edge so their numbers should be taken as higher than would be in the general IT market place. Still, the numbers as they change over time are valuable and the views of the analysts are often worth hearing.
As the webinar was pushing a major report, the full tag team was in attendance: David Stodder, TDWI Director for BI, Fern Halper, TDWI Director for Analytics, and Philip Russom, TDWI Director for Data Management.
David Stodder presented his section first, and one important point he made had nothing to do with numbers. He briefly discussed one quote and user story and it was from a government employee. Companies using Hadoop to better understand internet business and relationships tend to get almost all the press, but David pointed out the importance of data and analytics in helping governments better address the needs of their citizens.
A very intriguing set of numbers David provided was on how many responders were on current versions of software versus older versions. While you can see that some areas are more quickly adopting the SaaS model, that’s not the key the he pointed out. Only 27% of respondents said they’re on the current version of their data security software. A later slide shows that security is one reason for hesitation in the move to mobile, but Mr. Stodder rightly points out that underlying all the information channels is the basis of data security. It’s not a question of if you’ll get hacked but when, so data security should be kept updated.
The presentation was then turned over to Fern Halper. I look a bit askance at the claim that the Internet of Things (IoT) is a “trend.” Her data shows only 18% taking advantage of it today and 40% might be using in within three years. We’ve been talking about IoT for a while, and it’s clearly being slowly integrated into business, I wouldn’t say it’s as fashionable as the word trend would imply.
On the more useful side is the table she showed that’s simply titles “Analytics hits mainstream.” It not only shows that massive adoption of the last decade’s focus on dashboards and BI tools, but around 30% of respondents are using many of the newer tools and techniques and the next three years indicate a doubling in usage.
Philip Russom gave the final segment of the presentation. His first slide on the adoption of newer technologies for data warehousing showed something that many have finally admitted in the last year or no: No-SQL is an excuse made by people who don’t understand how business technology works. While the numbers show 28% of respondents using Hadoop, it also shows 22% using SQL on Hadoop. The number over the next three years are even more interesting: 36% say they’ll be using Hadoop and 38% will be using SQL on Hadoop. That means existing No-SQL folks will be moving to SQL.
The presentation ended with the team of analysts presenting their list of ten priorities for those people interested in emerging technologies. To me, the first isn’t the first among equals, it is set far above all the rest: Adopt them for their business benefits. All the other nine items are how IT addresses the challenges of new technologies, but those things are useless unless you understand how technologies will support business. Without that, you can’t provide an ROI and you can’t get business stakeholders to support you for long. That’s strategy, all the other points are just tactics.
As usual, get the report and browse it.
This is more of a marketing flavored post as the recent presentation seemed to miss its own point. The title implied it was about fast decision making, but Fern Halper, TDWI Research Director for Advanced Analytics, gave a rather generic presentation about the importance of operationalizing analytics.
Fern gave a nice presentation about operationalizing analytics, but it was not significantly different than her last few. In addition, some of the survey issues discussed were clearly not well thought out. For instance, Ms. Halper listed the expected growth of predictive analytics and web/mobile analytics as if they belonged in the same discussion. The fact that web and mobile are methods of display doesn’t overlap with whether they are used to display descriptive or prescriptive analytics. The growth of those display methods also don’t move away from the use of dashboards in CRM and ERP applications, as was implied, since those applications will migrate views to the new display methods.
The best thing mentioned by both Fern Halper and the SAP presenters was the fact that there were multiple references to that need for multiple data sources. Seeing the continued refocusing of many firms on wide data rather than big data is a good thing for the industry. Big data is more of a technical issue while wide data more directly addresses complex business environments.
Now I’m hoping for more people to begin to refer to loosely structured data rather than unstructured data. Linguists, I’m sure, are constantly amused at hearing languages referred to as unstructured.
The case study was by Raj Rathee, Director, Product Management, SAP. It was an interesting project at Lufthansa, where real-time analytics were used to track flight paths and suggest alternative routes based on weather and other issues. The business key is that costs were displayed for alternate routes, helping the decision makers integrate cost and other issues as situations occur. However, that was really the only discussion of fast decision making with analytics.
The final marketing note is that the Q&A was canned but the answers didn’t always sync up. For instance, the moderator asked one question of Fern, she had a good answer, but there was no slide in the pack about her response, just the canned SAP slide referenced by Ashish Sahu, Director, Product Marketing, SAP, after Ms. Halper spoke.
I think the problem was that the presenters didn’t focus down on a tight enough message and tried to dump too much information into the presentation. The message got lost.
The most recent TDWI webinar had a guest analyst, David Loshin of Knowledge Integrity. The presentation was sponsored by Liaison and that company’s speaker was Manish Gupta. Given that Liaison is a cloud provider of data integration, it’s no surprise that was the topic.
David Loshin gave a good overview of the basics of data integration as he talked about the growth of data volumes and the time required to manage that flow. He described three main areas to focus upon to get a handle on modern integration issues:
Data curation is the organization and management of data. While David accurately described the necessity of organizing information for presentation, the one thing in curation that wasn’t touched upon was archiving. The ability to present a history of information and make it available for later needs. That’s something the rush to manage data streams is forgetting. Both are important and the later isn’t replacing the former.
The most important part of the orchestration Mr. Loshin described was in aligning information for business requirements. How do you ensure the disparate data sources are gathered appropriately to gain actionable insight? That was also addressed in Q&A, when a question asked why there was a need to bother merging the two distinct domains of data integration and data management. David quickly pointed out that there was no way not to handle both as they weren’t really separate domains. Managing data streams, he pointed out, was the great example of how the two concepts must overlap.
Data monitoring has to do with both data in motion, as in identifying real-time exceptions that need handling, and data for compliance, information that’s often more static for regulatory reporting.
The presentation then switched to Manish Gupta, who proceeded to give the standard vendor introduction. It’s necessary, but I felt his was a little too high level for a broader TDWI audience. It’s a good introduction to Liaison, but following Mr. Loshin there should have been more detail on how Liaison addresses the points brought up in the first half of the presentation – Just as in a sales presentation, a team would lead with Mr. Gupta’s information, then the salesperson would discuss the products in more detail.
Both presenters had good things to say, but they didn’t mesh enough, in my view, and you can find out far more talking to each individually or reading their available materials.
The Internet of Things (IOT) is something more and more people are considering. Wednesday’s TDWI webinar topic was “Stream Processing: Streaming Data in Real Time, in Memory,” and the event was sponsored by both SAP and Intel. Nobody from Intel took part in the presentation. Given my other recent post about too many cooks, that’s probably a good thing, but there was never a clear reason expressed for Intel’s sponsorship.
Fern Halper began with overview of how TDWI is seeing data streaming progress. She briefly described streaming as dealing with data while still in motion, as opposed to data in warehouses and other static structures. Ms. Halper then proceeded to discuss the overlap between event processing, complex event processing and stream mining. The issue I had is that she should have spent a bit more time discussing those three terms, as they’re a bit fuzzy to many. Most importantly, what’s the difference between the first two?
The primary difference is that complex event processing is when data comes from multiple sources. Some of the same things are necessary as ETL. That’s why the in-memory message was important in the presentation. You have to quickly identify, select and merge data from multiple streams and in-memory is the way to most efficiently accomplish that.
Ms. Halper presented the survey results about the growth of streaming sources. As expected, it shows strong growth should continue. I was a bit amused that it asked about three categories: real-time event streams, IOT and machine data. While might make sense to ask the different terms, as people are using multiple words, they’re really the same thing. The IoT is about connecting things, which interprets as machines. In addition, the main complex events discussed were medical and oil industry monitoring, with data coming from machines.
Jaan Leemet, Sr. VP, Technology, at Tangoe then took over. Tangoe is an SAP customer providing software and services to improve their IT expense management. Part of that is the ability to track and control network usage of computers, phones and other devices, link that usage to carrier billing and provide better cost control.
A key component of their needs isn’t just that they need stream processing, but that they need stream processing that also works with other less dynamic data to provide a full solution. That’s why they picked SAP’s Even Stream Processor – not only for the independent functionality but because it also fits in with their SAP ecosystem.
One other decision factor is important to point out, given the message Hadoop and other no-SQL folks like to give. SAP’s solution works in a SQL-like language. SQL is what IT and business analysts know, the smart bet for rapid adoption is to understand that and do what SAP did. Understand the customer and sales becomes easier. That shouldn’t be a shock, but technologists are often too enamored of themselves to notice.
Neil McGovern, Sr. Director, Marketing, at SAP gave the expected pitch. It was smart of them to have Jaan Leemet go first and it would have been better if Mr. McGovern’s presentation was even shorter so there would have been more time for questions.
Because of the three presenters, there wasn’t time for many questions. One of the few question for the panel asked if there was such a thing as too much data. Neil McGovern and Jaan Leemet spent time talking about the technology of handling lots of streaming data, but only in generalities.
Fern Halper turned it around and talked about the business concept of too much data. What data needs to be seen at what timeframe? What’s real-time? Those have different answers depending on the business need. Even with the large volume of real-time data that can be streamed and accesses, we’re talking about clustered servers, often from a cloud partner, and there’s no need to spend more money on infrastructure than necessary.
I would have liked to have heard a far more in-depth discussion about how to look at a business and decide which information truly requires streaming analysis and which doesn’t. For instance, think about a manufacturing floor. You want to quickly analyze any data that might indicate failures that would shut down the process, but the volumes of information that allow analysis of potential process improvements don’t need to be analyzed in the stream. That can be done through analysis of a resultant data store. Yet all the information can be coming across the same IoT feed because it’s a complex process. Firms need to understand their information priority and not waste time and money analyzing information in a stream for no purpose other than you can.
Tuesday’s TDWI webinar had a guest star: Claudia Imhoff. The topic was predictive analytics and the presentation was sponsored by SAP, so Pierre Leroux, Director of Product Marketing, SAP, also had his moment towards the end. Though the title was about predictive analytics, it’s best to view the presentation as an overview of the state of analytics, and there’s much more to discuss on that.
The key points revolved around a descriptive slide Ms. Imhoff presented to describe the changing analytics landscape.
Claudia Imhoff described the established EDW information supply chain as being the left half of the diagram while the newer information, with web, internet of things (IOT) and other massive data sources adding the right hand side. It’s a nice, clean way of looking at things and makes clear that the newer data can still drive rather than eliminate the EDW.
One thing I’d say is missing is a good name for the middle box. Many folks call was Ms. Imhoff terms the Date Refinery a Data Lake or other similar rationalizations. My issue is that there’s really no need to list the two parts separate. In fact, there’s a need to have them seamlessly accessible as a whole, hence the growth of SQL for Hadoop and other solutions. As I’ve expressed before, the combination of the data integration and data refinery displayed are just the next generation of the ODS. I like the data refinery label, but think it more accurately applies to the full set of data described in the middle section of the diagram.
Claudia also described, the four types of analytics:
It’s important to understand the difference in analysis because each type of report needs to have a focus and an audience. One nit I have with her discussion of these was the comment that descriptive analytics are the least valuable. Rather, they’re the least strategic. If we don’t know what happened, we can’t feed the other types of analytics, plus, reporting requirements in so much of business means that understanding and reporting what happened remains very valuable. The difference is not how valuable, but in what way. Predictive and prescriptive analytics can be more valuable in the long term, but their foundation still resides on descriptive.
My biggest complaint with our industry at large is still the obsession with the mythical data scientist. Claudia Imhoff spent a good amount of time on the subject. It’s a concept with super human requirements, with Claudia even saying that the data scientist might be the one with deep business knowledge. Nope. Not going to happen.
In Q&A, somebody brought up the point I always mention: Why does it have to be one person rather than a team. Both Claudia Imhoff and Pierre Leroux admitted that was more likely. I wish folks would start with that as it’s reasonable and logical.
I was a programmer as folks began calling themselves software engineers. I never liked that. The job wasn’t engineering but a blend of engineering and crafting. There was art. The two presenters continued to talk about the data scientist as having an art component, but still think that means the magical person is still a scientist. In addition, thirty year ago the developer was distanced much further from business, by development methods, technology and business practice. Being closer means, again, teamwork, with each person sharing expertise in math, coding, business and more to create a robust solution.
That wall has been coming down for years, but both technology and business are changing rapidly and are far more complex. The team notion is far more logical.
The other major problem I had was a later slide and words accompanying it that implied it’s up to the business people to get on board with what the technologists are doing. They must find the training, they must learn that analytics are the answer to everything.
Yes, we’re able to provide better analytics faster to management than in the past. However, they’re not yet perfect nor will they be. Models are just that. As Pierre pointed out, models will never explain 100%.
Claudia made a great point earlier about one of the benefits of big data is to eliminate sampling and look at what the entire market is doing, but markets are still complex and we can’t glean everything. Technologists must get of the high horse and realize that some of the pushback from management is because the techies too often tend to dismiss intuition and experience. What needs to happen is for the messages to change to make it clear that modern analytics will help executives and line management make better decisions, not that it will replace their decision making.
In addition, quit making overly complex visualization that have great scientific relevance but waste time. The users do not need to understand the complexities of systems. If we’re so darned smart, we can distill the visualizations to things easier to comprehend so that managers can get the information, add it to all the other information and experience and make decision.
Technologists must adapt to how business runs as much as business must adapt to leverage technology.
The title of the presentation misrepresents the content. It was a very good presentation for understanding the high level landscape of the analytics information supply chain and it’s a discussion that needs to be held more often.
You’ll notice I didn’t say much about the demo by Pierre Leroux. That’s because of technical issues between demo and webinar software. However, both he and Claudia Imhoff took questions about the industry and market and gave thoughtful answers that should help drive the conversation forward.
Yesterday’s TDWI webinar was sponsored by Liaison Technologies, who did the same thing last year. It’s a push for another acronym. While the acronym isn’t needed, the concept is. Data Platform as a Service is just using the cloud to help with data integration. Gosh, complex, ‘eh? I think it’s the natural progression of technology and business, it’s just data management on the cloud. But forget the marketing, let’s talk about the concept.
The presentation’s first half was delivered by Phillip Russom. He started with some very trivial level setting but then quickly got to a key point. If you’ve been around for a while, you remember Best of Breed. That’s when each vendor focused product company, somewhere in the information supply chain, talked about their openness and how you could piece together a solution from different vendors. That made sense at the time, since many companies were each creating the early version of parts of a full solution.
As Phillip pointed out, times have changed. We now better understand business needs, have learned more about coding the requirements and can access far better hardware than we had fifteen years ago. That means IT is looking for what they couldn’t find back then: An integrated solution from a single or a far more limited number of vendors. They want something simpler than a hodgepodge of multiple systems.
The advantages of the cloud aren’t specific to data management. One very key business driver that was minimized in Mr. Russom’s presentation but brought out later by Patrick Adamiak during his presentation then revisited by both in the Q&A is capex versus opex – something often ignored by technical folks. Having your own hardware and data center is not just costly, it’s part of capital expenditure. Service contracts with a cloud vendor are operational expenses. That means the CxO suite and Board are often happier with that because it’s not as locked it and creates flexibility in the corporate financial picture.
One nit I had with Mr. Russom’s presentation was his statement that cloud is another architecture, like client/server or the web. The cloud and web are client server, that’s not the issue. It’s another architecture in two other key aspects: The already mentioned capex/opex divide, and the way it changes a software vendor’s ability to manage and update their software in comparison to on-premises installations.
One caution he gave that needed more explanation for folks new to the cloud was when Mr. Russom mentioned that you need to ask about the elasticity of the cloud implementation. For those who might not have heard the term, elasticity is the ability to grow or shrink cloud resources in order to match processing demands. In other words, if you get a big data dump from another source, can you quickly access more disk space? Or, from the Web side of the house: You’re hosting a big event or making a major announcement on your Web site: Can site resources be replicated quickly to handle the additional load then released when no longer needed?
I was impressed by the fact that capex was mentioned on Patrick Adamiak’s first slide. Cloud technology has multiple advantages that can be communicated to IT, but it’s the capex/opex issue that will help close the deal in an enterprise setting. Liaison seems to understand the need to blend technical and business messages.
However, most of Mr. Adamiak’s presentation seemed to be about justifying the new acronym. The main slide compared dPaaS with other supposed solutions without admitting there’s really a lot of overlap between them. The columns weren’t as different as he’d like them to be.
His company slides didn’t seem any different than those I’ve seen from the many other firms in the space. Forget all of that, it was in a short webinar with TDWI, so he had limited time.
The fact is that Liaison claims they are where the market is going. They are vertically integrating the information supply chain while leveraging the cloud for its business and technology advantages. For those in IT looking to simplify their world, Liaison is a company that should be investigated.