Trifacta at the BBBT: Better Access and Understanding of Raw Hadoop Data

Trifacta is another business intelligence company to enter the horse race (yes, I know that reference is spelled differently…). They are focused on providing an early look at data coming out of Hadoop, to create some initial form and and intelligences for business use.

Last Friday, Trifacta was the presenting company at the BBBT. Their representatives were Adam Wilson, CEO, Michael Hiskey, Interim Marketing Lead, and Wei Zheng, VP Products. The presenters were there to discuss the company’s position in data wrangling. While some folks had problems with the term, as Michael Hiskey pointed out, it as term that they didn’t invent. Me? I think it makes more sense than another phrase our industry uses, data lake; but that’s another topic.

Simply put, Trifacta is working to more easily provide a view into Hadoop data by using intelligence to better understand and suggest field breaks, layouts and formats, to help users clean and refine the data in order for it to become useable information for analysis.

Michael and the others talk about self-service data preparation, implying end, business user involvement. The problem is that they’re messaging far ahead of the product. They, as lots of other companies are also doing, try too hard to imply an ease of use that isn’t there. Their users are analysts, IT or business. The product is important and useful, but it’s important to be clear about to whom it is useful. (Read more about self-service issues).

The Demo

While Michael Mr. Hiskey and Mr. Wilson gave the introduction, the meat was in Ms. Zheng’s presentation. As a guy who has spent years in product marketing, I have a bit of a love hate relationship with product management. Have had some great ones and very poor ones – and I’m sure the views of me also spread that spectrum. I’ll openly say that Wei Zheng is the most impressive example of a VP of Products I’ve heard in a long time. She not only knows the products, she was very clear about understanding the market and working to bridge that to development. How could any product marketer not be impressed? Her demo was a great mix between product and discussions about both current usage and future strategy.

One of the keys to the product Wei Zheng pointed out is that the work Trifata is doing does not include moving the data. It doesn’t update the data, it works by managing metadata that describes both data and transformations. Yes, I said the word. Transformations. Think of Trifacta as simplified ETL for Hadoop, but with a focus on the E & T.

The Trifacta platform reads the Hadoop data, sampling from the full source, and uses analysis to suggest field breaks. Wei used a csv file for her demo, so I can’t speak to what mileage you’ll experience with Hadoop data, but the logic seems clear. As someone who fifteen years ago worked for a company that was analyzing row data without delimiters to find fields, I know it’s possible to get close through automation. If you’re interesting, you should definitely talk with them and have them show you their platform working with your data.

The product then displays a lot of detail about the overall data and the fields. It’s very useful information but, again, it’s going to be far more useful to a data analyst than to a business user.

Trifacta also has some basic data cleansing functions, such as setting groups for slightly different variations of the same customer company name and then changing them to something consistent. Remember, this is done in the metadata; the original data remains the same. You can review the data and the cleaned data will show, but the original remains until you formally export to a clean data file.

Finally, as the demonstration clearly shows, they aren’t trying to become a BI visualization firm. They are focused on understanding, organizing and cleaning the data before analysis can be done. They partner with visualization vendors for the end-user analytics.

Summary

Trifacta has a nifty little product for better understanding, cleaning and providing Hadoop data. Analysts should love it. The problem is that, unlike what their presentation implies, Hadoop does not equal big data. They have nothing that helps link Hadoop into the wider enterprise data market. They are a very useful tool for Hadoop, but unless they quickly move past that, other vendors are already looking at how to make sense of the full enterprise data world. They seem to have a great start in a product and, from my limited exposure to three people, a very good team. If you need help leveraging your Hadoop data, talk with them.

Leave a Reply

Your email address will not be published.