The big theme at this year’s Strata Data Conference in San Jose was how Data Science can be a business game changer. In his keynote remarks, Eric Colson from Stitch Fix pointed out that differentiation has always been key to doing business. Data Science now offers a new opportunity for organizations to set themselves apart.
New Data Sciences Will Provide New Business Insights
If a data scientist can come up with an insight that no one else has thought of yet, it could give the organization a considerable advantage over the competition. Eric suggested that the Data Science function should report directly to the CEO. This point was echoed by Mike Olson, Chief Strategy Officer at Cloudera. In his Executive Briefing Session on Machine Learning, Mike cited a Harvard Business Review study reporting that less than 50% of all structured data and less than 1% of unstructured data collected by an organization are currently used to make business decisions. Now that the technology is available to analyze all this data, it is important that the people gaining those insights communicate directly with the business decision makers to effect real change.
Execs at Strata Point to Transformative Data Science Use Cases that Drive Revenue!
Various sessions and keynotes mentioned examples where Data Science has already transformed business. Mike Olson mentioned the State of Kentucky, whose initiative to analyze sensor and weather data to do a better job of snow removal has been so successful that other states have bought the solution Kentucky developed. Navistar, a transportation company that has used predictive maintenance models to keep its vehicles on the road, was able to increase its profits in a very competitive market by transforming its business: it can sell up-time instead of just selling vehicles. It has also identified a new revenue stream by selling its sensors.
Data Science Challenges for Humans and Machines
While there have been great advances in Data Science, there were also plenty of reminders about the remaining challenges. Computers are not yet as good at learning as humans, which can lead to disconcerting results. Hilary Mason from Cloudera Fast Forward Labs showed the results of some web searches that combined her information with that of an actress who shares her name (and little else). Janelle Shane maintains a blog of neural network training mishaps and shared some examples of misidentified images in her keynote. Humans also take their share of the blame. In his hilarious keynote, Seth Stephens-Davidowitz, author of ‘Everybody Lies’, compared how people completed the sentence “My husband is…” in two (very) different contexts: Facebook posts versus private Google searches.
Even for businesses that rely on data sets that are less “out in the wild”, such as sensor data, or customer transaction data, there are big challenges. Data needs to be fresh; models become obsolete after a certain amount of time and need to be retrained on new data. The ‘echo chamber’ effect is also hard to counter: if your data is not diversified enough, you may be drawing incorrect conclusions.
Collecting all your data in one place is still a challenge. Gwen Shapira from Confluent delivered a session on the evolution of ETL to a standing-room-only audience. She gave an example of a hotel use case for processing streaming customer data to deliver relevant promotions to elite members. The traditional approach of enriching each event with a database lookup wouldn’t scale, since there were millions of weblog events coming in. Her proposal was to cache the customer data and use a Change Data Capture solution to keep the cache fresh.
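The cache-plus-Change-Data-Capture idea can be sketched in a few lines of Python. This is only an illustration of the pattern, not the design presented in the session: the ChangeEvent and CustomerCache names, the "tier" field, and the in-memory dictionary are all assumptions made for the example. A real deployment would read both the change stream and the weblog stream from a streaming platform.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ChangeEvent:
    """Hypothetical CDC record emitted when a customer row changes."""
    op: str                      # "upsert" or "delete"
    customer_id: str
    tier: Optional[str] = None   # loyalty tier, e.g. "elite" (assumed field)


class CustomerCache:
    """In-memory copy of customer data, kept fresh by change events."""

    def __init__(self):
        self._tiers = {}

    def apply(self, change):
        # Apply one change from the source database to the local cache.
        if change.op == "delete":
            self._tiers.pop(change.customer_id, None)
        else:
            self._tiers[change.customer_id] = change.tier

    def enrich(self, weblog_event):
        # Local lookup replaces the per-event database query.
        tier = self._tiers.get(weblog_event["customer_id"])
        return {**weblog_event, "tier": tier, "send_promo": tier == "elite"}


cache = CustomerCache()
cache.apply(ChangeEvent("upsert", "c1", "elite"))
cache.apply(ChangeEvent("upsert", "c2", "standard"))
first = cache.enrich({"customer_id": "c1", "page": "/rooms"})

# A later change promotes c2; the next weblog event sees the new tier.
cache.apply(ChangeEvent("upsert", "c2", "elite"))
second = cache.enrich({"customer_id": "c2", "page": "/spa"})

# Deleting c1 removes it from the cache, so no promotion is sent.
cache.apply(ChangeEvent("delete", "c1"))
third = cache.enrich({"customer_id": "c1", "page": "/rooms"})
```

The point of the sketch is that each weblog event is enriched from local memory, while the database is only touched when a customer record actually changes.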
If an industry is regulated, the data needs to be governed. Data lineage and governance were another challenge discussed in a few sessions, such as the detailed GDPR executive briefing by Cloudera and the ODPi initiative presented by ING and the Linux Foundation. Syncsort CTO Dr. Tendü Yoğurtçu likened governance to having a farm-to-table view of your data.
Data Science Agility: Problem Infatuation versus Problem Solving
To take advantage of the business transformation promises of Data Science, it’s important to be agile. The quicker Data Scientists can get their hands on clean, trusted data, the quicker they can begin asking innovative questions. In his briefing, Mike Olson cited research indicating that data scientists still spend 80% of their time on data preparation instead of analysis.
There is no one-size-fits-all solution. Tobias Ternstrom from Microsoft suggested in his keynote that when selecting a solution, it’s important to focus on how to bring value to your business and have an objective proof of concept, instead of getting attached to a solution that sounds exciting. This ‘fall in love with the problem, not the solution’ sentiment was echoed by Ted Malaska, who shared his experiences with technology selection and cautioned against letting your passions mislead you.
Having an ETL solution that understands the Big Data ecosystem as well as the traditional enterprise business models and data sources can be a great asset in eliminating data silos, ensuring the data is clean and up-to-date, and providing governance. Syncsort DMX-h makes it easy to ingest and integrate data from all sources, including Mainframe, IBM i, relational databases, and streams such as Kafka, and to populate the data lake on any Big Data ecosystem. DMX-h supports all Hadoop distributions and works the same way on batch and streaming data, on premises and in the cloud, on MapReduce and Spark. Dr. Tendü Yoğurtçu discussed the trends complicating data governance, as well as the new Change Data Capture, data quality, and governance capabilities that Syncsort has delivered to address these evolving needs, helping customers meet these challenges and make data part of their core strategy.
Make sure to download our eBook, “The New Rules for Your Data Landscape”, and take a look at the rules that are transforming the relationship between business and IT.