At the recent Strata San Jose Conference, Paige Roberts of Syncsort sat down to speak with Roni Fontaine, Director of Product Marketing at Hortonworks. In the first of our two-part interview, Fontaine speaks about what’s new at Hortonworks as well as the ever-changing landscape of data.
Roberts: Can you introduce yourself for our readers?
Fontaine: Hi, I’m Roni Fontaine. I’m a director of product marketing at Hortonworks and I handle our HDP, our Cloud, and our IBM Db2 Big SQL solutions for the company.
The partnership with IBM has been really big this year.
It definitely has. We announced the partnership last June, and what’s really great about the partnership is it’s a two-way partnership. IBM is now replacing their Hadoop distribution with HDP and they’re reselling HDP and HDF, which is Hortonworks Data Flow, which is our streaming product. We’re reselling the IBM DSX, their Data Science Experience as well as IBM Db2 Big SQL. We’re doing some joint events with them, digital campaigns and that sort of thing.
So, in addition to your normal product marketing for Hortonworks, you’re doing some product marketing for IBM.
Yeah, for Db2 Big SQL. Exactly. As a matter of fact, the offering manager for Big SQL and I just completed a webinar on March 28 called “Making Enterprise Big Data Small with Ease.” There seems to be a lot of interest in SQL, Apache Hive, and Db2 Big SQL.
On the HDP side, I know Apache Hadoop 3 just came out. What do you think is the coolest thing about that?
The thing that we’re excited about is erasure coding. Right now, the 3X replication method of storing data has an overhead of about 200%. Erasure coding will reduce that overhead to about 50%. There’s also containerization, which will make it really fast and easy to roll out microservices and applications via containers. There’s NameNode Federation, which will help with scaling. There is NameNode Standby, which has to do with High Availability. And then there is GPU Support, which everyone is very excited about because that will really help with the performance and speed for all that data. I did a blog post on How Apache Hadoop 3 adds Value to Apache Hadoop.
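To make the storage arithmetic concrete, here is a minimal sketch comparing the two approaches. It assumes a Reed-Solomon layout with 6 data blocks and 3 parity blocks; the interview does not name a specific scheme, so treat those numbers and the helper name `storage_overhead_pct` as illustrative assumptions. Under that assumption, overhead falls from 200% to 50% of the original data size.

```python
def storage_overhead_pct(data_units: int, redundant_units: int) -> float:
    """Extra storage needed, as a percentage of the original data size."""
    return 100.0 * redundant_units / data_units


# 3X replication: every block is stored once plus two extra copies.
replication = storage_overhead_pct(data_units=1, redundant_units=2)

# Erasure coding, ASSUMING Reed-Solomon with 6 data blocks and 3 parity blocks.
erasure = storage_overhead_pct(data_units=6, redundant_units=3)

print(f"3X replication overhead: {replication:.0f}%")
print(f"Erasure coding overhead: {erasure:.0f}%")
```

With these assumed parameters, 1 TB of data would occupy roughly 3 TB under replication and roughly 1.5 TB under erasure coding.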
I noticed that a lot of the newer data science tools are supporting GPUs now. As far as the Strata conference, and the Big Data industry in general, what are some of the big changes that you’ve seen over the past few years?
I think that they’re trying to go beyond Hadoop and be known as a data conference because you can include more topics, such as machine learning and data science. Last year Internet of Things was really big. Another thing I’ve noticed is there’s definitely a lot more interest in Cloud. It’s a lot easier for smaller companies to migrate their workloads to the Cloud or even just start up in the Cloud, depending on how old the company is. What we’ve learned from talking to customers about moving their workloads to the Cloud is that they’re really concerned about enterprise security. Security is still huge, security and governance.
Yeah. I’m seeing a big upsurge in governance concerns, especially with all the GDPR excitement.
Exactly. So, that’s another big topic. There’s also been a change in how the software looks, and maybe this is just me noticing it, but a big change in the GUIs. There are a lot of visualizations and graphics. I was in a talk on Hadoop 3.0 today, and they showed the new UI for YARN, which is the common circles and graphs that you see in everything. It makes it easier to use.
It seems to me like the whole ecosystem of Big Data is moving towards more ease of use. I mean, the first push was, make it awesome. Make it work, make it fast, make it big, make it scale. Now the push is more to make it easier to use, make it more useful.
The rate of data growth is just exploding, so this has to be able to scale. We thought it was big before. It gets bigger every year and it seems like there’s always a need to get rid of all of these data silos. No matter what we do, it seems like we end up getting new silos and then we have to grow out our data to access all that.
The whole idea was, hey, forget these data silos, put all your data in the data lake, and now everybody has 14 data lakes, and they don’t know what to do.
Tune in for the next installment when Fontaine explains some of her experiences as a woman in the tech world.
Make sure to download our eBook, “The New Rules for Your Data Landscape,” and take a look at the rules that are transforming the relationship between business and IT.
Attending either of the DataWorks Summit events? See us in Berlin from April 16-19 and in San Jose from June 17-21!