Have you ever been asked to develop a Machine Learning model on a huge database? Typically, the customer provides the database and asks you to make certain predictions, such as who the potential buyers will be or whether fraudulent cases can be detected early. Your task is to develop a Machine Learning algorithm that answers the customer’s query. Developing a Machine Learning algorithm from scratch is not an easy task, and why should you do it when several ready-to-use Machine Learning libraries are available?
These days, you would rather use these libraries: apply a well-tested algorithm from one of them and look at its performance. If the performance is not within acceptable limits, you either fine-tune the current algorithm or try an altogether different one.
Likewise, you may try multiple algorithms on the same dataset and then pick the one that best meets the customer’s requirements. This is where H2O comes to your rescue. It is an open source Machine Learning framework with fully tested implementations of several widely accepted ML algorithms. You just have to pick an algorithm from its large repository, which covers the most widely used statistical and ML algorithms, and apply it to your dataset.
To mention a few, it includes gradient boosted machines (GBM), generalized linear models (GLM), deep learning, and many more. It also supports AutoML functionality that ranks the performance of different algorithms on your dataset, reducing the effort of finding the best-performing model. H2O is used worldwide by more than 18,000 organizations and interfaces well with R and Python for ease of development. It is an in-memory platform designed for high performance.
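The workflow described above, trying more than one algorithm on the same dataset and keeping the better one, might look roughly like the following with H2O's Python interface. This is only a minimal sketch under several assumptions: the target is a binary column, AUC is the metric of interest, and the function names `compare_models` and `pick_best`, the 80/20 split, and the seed are illustrative choices rather than anything prescribed by H2O.

```python
def pick_best(scores):
    """Return the (name, score) pair with the highest score."""
    if not scores:
        raise ValueError("scores must not be empty")
    return max(scores.items(), key=lambda item: item[1])


def compare_models(csv_path, target):
    """Train a GBM and a GLM on the same data and return their test AUC.

    Assumes `target` names a binary column in the CSV file.
    """
    # Imported here so that pick_best stays usable without h2o installed.
    import h2o
    from h2o.estimators import (
        H2OGeneralizedLinearEstimator,
        H2OGradientBoostingEstimator,
    )

    h2o.init()  # start or connect to a local H2O cluster (requires Java)
    frame = h2o.import_file(csv_path)
    frame[target] = frame[target].asfactor()  # treat the target as categorical
    train, test = frame.split_frame(ratios=[0.8], seed=42)
    features = [col for col in frame.columns if col != target]

    candidates = {
        "GBM": H2OGradientBoostingEstimator(seed=42),
        "GLM": H2OGeneralizedLinearEstimator(family="binomial"),
    }
    scores = {}
    for name, model in candidates.items():
        model.train(x=features, y=target, training_frame=train)
        # Evaluate each model on the held-out rows only.
        scores[name] = model.model_performance(test).auc()
    return scores
```

With a hypothetical file `customers.csv` and target column `bought`, you would call `pick_best(compare_models("customers.csv", "bought"))`. H2O's AutoML (`H2OAutoML` in `h2o.automl`) automates this kind of comparison and exposes the ranking through its `leaderboard` attribute.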
In this tutorial, you will first learn to install H2O on your machine for both Python and R. We will see how to use it from the command line so that you understand how it works line by line. If you are a Python lover, you may use Jupyter or any other IDE of your choice for developing H2O applications. If you prefer R, you may use RStudio for development.
We will then work through an example to understand how to go about working with H2O. We will also learn how to change the algorithm in your program code and compare its performance with the earlier one. H2O also provides a web-based tool, called Flow, for testing different algorithms on your dataset.
The tutorial will introduce you to the use of Flow. Alongside, we will discuss the use of AutoML, which identifies the best-performing algorithm on your dataset. Are you excited to learn H2O? Keep reading!
Translated from: https://www.tutorialspoint.com/h2o/h2o_introduction.htm