New Reference Architecture Accelerates Data Integration and Analytics Pipeline for Hadoop at Scale
June 28, 2016, HADOOP SUMMIT, SAN JOSE, CA — Pentaho, a Hitachi Group Company, today announced “Filling the Data Lake”, a blueprint that helps organizations architect a flexible, scalable, and repeatable data onboarding process for ingesting big data into Hadoop data lakes. Data management professionals can now offload the drudgery of data preparation and spend more time on higher-value projects.
According to Ventana Research, big data projects require organizations to spend 46 percent of their time preparing data and 52 percent of their time checking for data quality and consistency. By following Pentaho’s “Filling the Data Lake” blueprint, organizations can manage a changing array of data sources, establish repeatable processes at scale and maintain control and governance along the way. With this capability, developers can easily scale ingestion processes and automate every step of the data pipeline.
“With disparate sources of data numbering in the thousands, hand coding transformations for each source is time-consuming and extremely difficult to manage and maintain,” said Chuck Yarbrough, Senior Director of Solutions Marketing at Pentaho, a Hitachi Group Company. “Developers and data analysts need the ability to create one process that can support many different data sources by detecting metadata on the fly and using it to dynamically generate instructions that drive transformation logic in an automated fashion.”
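To illustrate the metadata-driven pattern Yarbrough describes, here is a minimal Python sketch. It is not Pentaho's implementation, and the source names, fields, and typing rules are hypothetical; the point is that a single ingestion routine is driven by each source's metadata entry, so onboarding a new feed means adding metadata rather than writing new transformation code.

```python
# Minimal sketch of metadata-driven ingestion (illustrative only, not Pentaho's code).
# Source names, column mappings, and types below are hypothetical.
import csv
import io

# Metadata registry: each entry describes how to map a raw source into the lake schema.
SOURCE_METADATA = {
    "crm_contacts": {
        "rename": {"cust_id": "customer_id", "em": "email"},
        "types": {"customer_id": int},
    },
    "web_events": {
        "rename": {"uid": "customer_id", "ts": "event_time"},
        "types": {"customer_id": int},
    },
}

def ingest(source_name: str, raw_csv: str) -> list[dict]:
    """Transform one source using only its metadata entry; no per-source code."""
    meta = SOURCE_METADATA[source_name]
    rows = []
    for record in csv.DictReader(io.StringIO(raw_csv)):
        # Rename columns according to metadata, then coerce the declared types.
        row = {meta["rename"].get(k, k): v for k, v in record.items()}
        for field, cast in meta["types"].items():
            row[field] = cast(row[field])
        rows.append(row)
    return rows

if __name__ == "__main__":
    sample = "cust_id,em\n42,ana@example.com\n"
    print(ingest("crm_contacts", sample))
    # A new source is onboarded by adding a metadata entry, not by writing new code.
```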
According to a Forrester Consulting report commissioned by Pentaho, 52 percent of firms blend 50 or more distinct data sources to enable analytics capabilities, about a third (34 percent) blend 100 or more, and 12 percent blend 1,000 or more. While many organizations use Python or other scripting languages to code their way through these data sources, the “Filling the Data Lake” architecture reduces dependence on hard-coded data ingestion procedures, unlocking operational efficiency gains, increasing cost savings, and greatly easing the maintenance burden.
“A major challenge in today’s world of big data is filling Hadoop data lakes in a simple, automated way. Our team was passionate about identifying repeatable ways to accelerate the big data analytics pipeline and have developed an approach to drive more agile and automated big data analytics at scale,” added Yarbrough.
Pentaho has created four other blueprints to help enterprises quickly tackle and optimize their big data projects. Find out more about: Optimize Data Warehouse, Monetize My Data, Streamlined Data Refinery and Customer 360-Degree View.
About Pentaho, a Hitachi Group company
Pentaho, a Hitachi Group company, is a leading data integration and business analytics company with an enterprise-class, open source-based platform for diverse big data deployments. Pentaho’s unified data integration and analytics platform is comprehensive, completely embeddable and delivers governed data to power any analytics in any environment. Pentaho’s mission is to help organizations across multiple industries harness the value from all their data, including big data and IoT, enabling them to find new revenue streams, operate more efficiently, deliver outstanding service and minimize risk. Pentaho has over 15,000 product deployments and 1,500 commercial customers today including ABN-AMRO Clearing, BT, EMC, NASDAQ and Sears Holdings Corporation. For more information, visit www.pentaho.com.