Orchestration streamlines the machine learning workflow to help data scientists, engineers and analysts collaboratively build and deploy predictive models on big data
March 14, 2017, San Jose, CA — Pentaho, a Hitachi Group Company, today announced orchestration capabilities that streamline the entire machine learning workflow and enable teams of data scientists, engineers and analysts to train, tune, test and deploy predictive models. Pentaho's data integration and analytics platform ends the 'gridlock' associated with machine learning by enabling smooth team collaboration, maximizing limited data science resources and putting predictive models to work on big data faster - regardless of use case, industry or language, and whether models were built in R, Python, Scala or Weka.
Streamlining four areas of the machine learning workflow
With Pentaho's machine learning orchestration, teams can build and deploy advanced analytics models more efficiently. Most enterprises struggle to put predictive models to work because data professionals often operate in silos and the workflow - from data preparation to updating models - creates bottlenecks. Pentaho's platform enables collaboration and removes bottlenecks in four key areas:
- Data and feature engineering - Pentaho helps data scientists and engineers easily prepare and blend traditional sources such as ERP and EAM with big data sources such as sensors and social media. Pentaho also accelerates the notoriously difficult and costly task of feature engineering by automating data onboarding, data transformation and data validation in an easy-to-use drag and drop environment.
- Model training, tuning and testing - Data scientists often apply trial and error to strike the right balance of complexity, performance and accuracy in their models. With integrations for languages like R and Python, and for machine learning packages like Spark MLlib and Weka, Pentaho allows data scientists to seamlessly train, tune, build and test models faster.
- Model deployment and operationalization - A completely trained, tuned and tested machine learning model still needs to be deployed. Pentaho allows data professionals to easily embed models developed by the data scientist directly in a data workflow. They can leverage existing data and feature engineering efforts, significantly reducing time-to-deployment. With embeddable APIs, organizations can also include the full power of Pentaho within existing applications.
- Regular model updates - According to Ventana Research, less than a third (31%) of organizations use an automated process to update their models. With Pentaho, data engineers and scientists can re-train existing models with new data sets or make feature updates using custom execution steps for R, Python, Spark MLlib and Weka. Pre-built workflows can automatically update models and archive existing ones, as in the generic retraining sketch after this list.
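To make the final bullet concrete, the following is a minimal, generic sketch of the kind of custom Python retraining step that could run inside such a workflow. The file paths, the scikit-learn model and the archive layout are assumptions chosen for illustration; they are not Pentaho APIs or product steps.

```python
# Illustrative only: a generic Python retraining step of the kind that could be
# scheduled in a data workflow. Paths and the model choice are assumptions.
import shutil
from datetime import datetime
from pathlib import Path

import joblib
import pandas as pd
from sklearn.linear_model import LogisticRegression

MODEL_PATH = Path("models/current_model.pkl")   # hypothetical deployed model
ARCHIVE_DIR = Path("models/archive")            # hypothetical archive location


def retrain(new_data_csv: str, target_column: str) -> None:
    """Re-train the deployed model on a new data set and archive the old one."""
    # 1. Archive the existing model with a timestamp so it can be restored.
    if MODEL_PATH.exists():
        ARCHIVE_DIR.mkdir(parents=True, exist_ok=True)
        stamp = datetime.now().strftime("%Y%m%d_%H%M%S")
        shutil.copy(MODEL_PATH, ARCHIVE_DIR / f"model_{stamp}.pkl")

    # 2. Load the freshly prepared data produced by the upstream workflow.
    data = pd.read_csv(new_data_csv)
    X = data.drop(columns=[target_column])
    y = data[target_column]

    # 3. Re-train and persist the updated model for the deployment step.
    model = LogisticRegression(max_iter=1000)
    model.fit(X, y)
    MODEL_PATH.parent.mkdir(parents=True, exist_ok=True)
    joblib.dump(model, MODEL_PATH)


if __name__ == "__main__":
    retrain("prepared/new_training_data.csv", target_column="label")
```

Archiving the previous model before overwriting it mirrors the archive-and-update pattern described above and keeps a rollback path available if the new data set degrades model accuracy.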
Hitachi Rail uses Pentaho with Hitachi's Hyper Scale-Out Platform to fulfill its pioneering "Trains-as-a-Service" concept, applying advanced IoT technology across three time horizons: real-time (monitoring, fault alerting), medium-term (predictive maintenance) and long-term (big data trend analysis). With each train carrying thousands of sensors that generate huge amounts of data every day, the project's data engineers and scientists face many of the challenges associated with big data and machine learning. Although the project is not yet operational, Pentaho is already helping to deliver productivity improvements across the business.
According to Philip Hewlett, Project Manager, "Hitachi Rail conservatively estimate that Pentaho's orchestration capabilities for data preparation, engineering and machine learning have already delivered wide-reaching productivity improvements and specialized development ambitions which will translate into value-added services for our customers - and this is at a very early stage in the project."
David Menninger, SVP & Research Director, Ventana Research, commented, "According to our research, 92 percent of organizations plan to deploy more predictive analytics; however, 50 percent of organizations have difficulty integrating predictive analytics into their information architecture. Pentaho offers a robust platform to help companies take advantage of machine learning algorithms throughout their organization, helping business units and IT to work together with the common goal of making predictive analytics deliver value to the enterprise."
Wael Elrifai, Director of Worldwide Enterprise Data Science, Pentaho said, "In 2017 we're working with early adopters looking to transform their businesses with machine learning. Fortunately our early foray into big data analytics gave us the insight to solve some of the toughest challenges in this area. As part of Hitachi, which has a large team of data science experts, we will continue growing our machine learning capabilities as this market matures."
The machine learning orchestration capabilities are available in Pentaho 7.0.
Resources
- Learn more about Pentaho's machine learning orchestration capabilities
- Visit us at booth #1321 at Strata + Hadoop World San Jose the week of March 13th
- Join Pentaho for our Strata session "Five steps to a killer data lake, from ingest to machine learning" on March 15 at 1:50 pm.
About Pentaho, a Hitachi Group company
Pentaho, a Hitachi Group company, is a leading data integration and business analytics company with an enterprise-class, open source-based platform for diverse big data deployments. Pentaho's unified data integration and analytics platform is comprehensive, completely embeddable and delivers governed data to power any analytics in any environment. Pentaho's mission is to help organizations across multiple industries harness the value from all their data, including big data and IoT, enabling them to find new revenue streams, operate more efficiently, deliver outstanding service and minimize risk. Pentaho has over 15,000 product deployments and 1,500 commercial customers today including ABN-AMRO Clearing, BT, EMC, NASDAQ and Sears Holdings Corporation. For more information visit www.pentaho.com.