Using AWS Lambda
by Daitan
Moving machine learning (ML) models from training to serving in production at scale is an open problem. Many software vendors and cloud providers are currently trying to properly address this issue.
One of the biggest challenges is that serving a model (i.e. accepting requests and returning a prediction) is only part of the problem. There is a long list of adjacent requirements. These include, for example:
In order to better advise our teams in that respect, we have set up a small, but smart and dedicated, research group here at Daitan Group.
To begin, we have established a roadmap for learning the requirements and caveats of deploying machine learning models into multiple ML pipelines and infrastructures.
The purpose of this article is to provide an overview of our methodology and the results we achieved from our baseline implementation.
Amazon SageMaker, Google Cloud ML, Seldon Core, and others promise a smooth, end-to-end pipeline from training to serving at scale. However, before working with such solutions we wanted to get our hands dirty with a manual process to create a baseline reference.
The goal was to train, export, and serve at scale an ML model in the cloud with the least possible effort.
To start, we chose TensorFlow as our ML framework and AWS Lambda as the deploying infrastructure. We used Apache JMeter and Taurus to generate load tests.
Our baseline was predicated on experimenting with the following combinations:
With some combinations, we reached ~40 predictions per second with an average response time of ~200ms. Such results would address many production use cases. And we could effortlessly scale up, if needed.
However, there were caveats, which we discuss below when we detail the test results.
Furthermore, a second experiment with a “heavier” model (image segmentation model) was completed, which will be detailed in a follow-up post.
TensorFlow is an open-source library created by Google for programming data flows across a range of tasks. We started with TF for a number of reasons:
It is currently the most popular machine learning framework on GitHub with around 110k stars and 1.6k contributors
These factors, combined, make TF a natural candidate for our clients to use when building predictive models.
AWS Lambda is Amazon’s implementation of the Function-as-a-Service (FaaS) or Serverless architecture. We have noticed a significant increase in use over the last few years.
Some of its most avid users include companies like Netflix and Coca-Cola. Recent studies (below) show that this dominance is likely to continue for the coming years.
Much of this popularity comes from the versatility and flexibility FaaS offers for deploying applications. FaaS platforms also reduce operational costs and are easy to use, because most of the complexity is hidden from the end user.
Another important point in favor of AWS Lambda is its pay-per-execution pricing. For example, in a single setup, all our tests described below only cost approximately one dollar. As a result, it can be suitable for a wide range of cost-effective, event-driven applications.
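As a rough illustration of how pay-per-execution adds up, the back-of-the-envelope calculation below uses AWS's approximate list prices at the time of writing (about $0.20 per million requests and $0.0000166667 per GB-second). The request count and duration are hypothetical figures for illustration, not measurements from our tests.

```python
# Hedged back-of-the-envelope Lambda cost estimate.
# Prices are approximate AWS list prices (USD) and may change; the workload
# numbers below are hypothetical, not figures from our experiments.
PRICE_PER_GB_SECOND = 0.0000166667
PRICE_PER_MILLION_REQUESTS = 0.20

invocations = 100_000          # hypothetical number of requests
duration_s = 0.2               # ~200 ms average response time
memory_gb = 512 / 1024         # 512 MB configuration

compute_cost = invocations * duration_s * memory_gb * PRICE_PER_GB_SECOND
request_cost = (invocations / 1_000_000) * PRICE_PER_MILLION_REQUESTS

print(f"compute: ${compute_cost:.2f}, requests: ${request_cost:.2f}")
# -> roughly $0.17 in compute plus $0.02 in requests for this hypothetical load
```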
Yet, when using AWS Lambda to serve ML models in production, you need to take care of all the steps in the total pipeline. These steps typically include packaging the model and its dependencies, provisioning the function and the endpoint in front of it, and handling monitoring, versioning, and retraining.
If not properly automated, these tasks may contribute to higher-than-planned costs and a slower pipeline.
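To make the automation concrete, the sketch below shows a minimal version of the packaging and deployment step with boto3. The bucket, key, role ARN, runtime, and function names are placeholders, and a real pipeline would also have to automate the API Gateway setup, permissions, versioning, and monitoring mentioned above.

```python
# Minimal sketch of packaging and deploying a model-serving Lambda with boto3.
# Bucket, key, role ARN, and function names are hypothetical placeholders.
import boto3

s3 = boto3.client("s3")
lam = boto3.client("lambda")

# 1. Upload the zipped deployment package (code + model artifacts) to S3.
s3.upload_file("package.zip", "my-models-bucket", "lambda/package.zip")

# 2. Create the Lambda function that serves predictions.
lam.create_function(
    FunctionName="kdd99-logreg-serving",
    Runtime="python3.6",
    Role="arn:aws:iam::123456789012:role/lambda-exec-role",  # placeholder role
    Handler="handler.predict",
    Code={"S3Bucket": "my-models-bucket", "S3Key": "lambda/package.zip"},
    MemorySize=512,
    Timeout=30,
)
```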
In a series of upcoming articles, we will explore how other ML pipelines facilitate the steps above, compared with using AWS Lambda alone.
Overall, we wanted to investigate the performance of various platforms for serving ML models at scale.
In this direction, our first experiments focused on deploying a simple machine learning model. This choice provided major benefits, especially by eliminating time spent on, and concerns about, model training and performance.
To train our model, we chose the KDD99 dataset. The dataset has a total of 567,498 records, with 2211 of those considered anomalies.
After checking for missing values, we proceeded by splitting the current corpus into training and testing sets. We then normalized the training data using the Mean Standardization technique.
Next, we trained a binary Logistic Regression model on the given problem. Finally, we assessed the model accuracy by calculating the confusion matrix:
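For reference, a minimal sketch of what such a model might look like in TensorFlow 1.x is shown below. It is not our exact training code: the x_train/y_train and x_test/y_test arrays are assumed to be the normalized KDD99 splits described above, and the learning rate and number of iterations are placeholder values.

```python
# Minimal sketch (not our exact code) of a binary logistic regression in
# TensorFlow 1.x, evaluated with a confusion matrix. x_train/y_train and
# x_test/y_test are assumed to be the normalized KDD99 splits described above.
import numpy as np
import tensorflow as tf

n_features = x_train.shape[1]

x = tf.placeholder(tf.float32, [None, n_features])
y = tf.placeholder(tf.float32, [None, 1])

w = tf.Variable(tf.zeros([n_features, 1]))
b = tf.Variable(tf.zeros([1]))

logits = tf.matmul(x, w) + b
loss = tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(labels=y, logits=logits))
train_op = tf.train.GradientDescentOptimizer(0.01).minimize(loss)
predictions = tf.cast(tf.sigmoid(logits) > 0.5, tf.int32)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for _ in range(1000):  # hypothetical number of full-batch iterations
        sess.run(train_op, feed_dict={x: x_train, y: y_train})
    y_pred = sess.run(predictions, feed_dict={x: x_test})
    # Confusion matrix: rows are true labels, columns are predicted labels.
    cm = sess.run(tf.confusion_matrix(y_test.ravel(), y_pred.ravel(), num_classes=2))
    print(cm)
```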
It is important to note that our focus lies on the infrastructure side of serving ML models. Thus, best practices for training ML algorithms are out of scope here.
To get things done more quickly, we used some popular AWS services. They included the API Gateway, Lambda functions, S3, Cloudwatch, and IAM.
The role of each, roughly, was as follows: API Gateway exposes an HTTP endpoint and forwards prediction requests to the Lambda function; the Lambda function loads the model and runs inference; S3 stores the deployment package and model artifacts; CloudWatch collects the logs and metrics we used to analyze the tests; and IAM defines the permissions that tie these services together.
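To give a concrete picture of the front of that stack, the snippet below shows how a client might call the model through the API Gateway endpoint that fronts the Lambda function. The URL and payload format are hypothetical, not our actual interface.

```python
# Hypothetical client call: API Gateway receives the HTTP request and forwards
# it to the Lambda function, which runs the model and returns a prediction.
import requests

API_URL = "https://<api-id>.execute-api.us-east-1.amazonaws.com/prod/predict"  # placeholder

features = [0.1, 0.0, 1.0, 0.5]  # hypothetical, already-normalized feature vector
response = requests.post(API_URL, json={"instances": [features]})
print(response.status_code, response.json())
```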
We designed the tests to answer questions regarding the best cost-benefit when using AWS Lambda services. In this context, our tests focused on three main components:
We used Java and Python. To make the tests more comparable, we designed both implementations to be as similar as possible.
In summary, the Python and Java code that goes inside the Lambda executes the same operations. First, it unzips the file containing the metadata about the model. Second, it gets the input parameters ready. And finally, it performs model inference.
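A simplified Python version of that handler might look like the sketch below. The archive name, model file layout, and input format are assumptions for illustration, not our exact code.

```python
# Simplified sketch of the Lambda handler (assumed file names and payload format).
import json
import os
import zipfile

import numpy as np

MODEL_DIR = "/tmp/model"
_weights = None  # cached across warm invocations of the same instance


def _load_model():
    """Unzip the bundled model artifacts once and cache the parameters."""
    global _weights
    if _weights is None:
        with zipfile.ZipFile("model.zip") as archive:  # shipped in the deployment package
            archive.extractall(MODEL_DIR)
        _weights = np.load(os.path.join(MODEL_DIR, "weights.npy"))
    return _weights


def predict(event, context):
    weights = _load_model()
    # Prepare the input parameters from the API Gateway request body.
    features = np.array(json.loads(event["body"])["instances"], dtype=np.float32)
    # Model inference: logistic regression score and binary decision.
    bias, coef = weights[0], weights[1:]
    score = 1.0 / (1.0 + np.exp(-(features @ coef + bias)))
    return {
        "statusCode": 200,
        "body": json.dumps({"anomaly": (score > 0.5).tolist()}),
    }
```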
One nice feature of AWS Lambda is that we can configure it to deploy our model with different amounts of memory. In other words, we can choose how much memory we allow the Lambda function to use. Moreover, when we increase this memory, CPU settings are also upgraded — meaning that we switch to a more powerful machine.
We chose three different configurations of memory in our load tests. For Python and Java, we performed the tests using memory sizes of 256, 512, and 1024 MB.
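Switching between those configurations is a one-line change; a minimal sketch with boto3 is shown below (the function name is a placeholder).

```python
# Sketch: re-running the load tests with a different memory size only requires
# updating the function configuration (the function name is a placeholder).
import boto3

lam = boto3.client("lambda")
for memory_mb in (256, 512, 1024):
    lam.update_function_configuration(
        FunctionName="kdd99-logreg-serving",
        MemorySize=memory_mb,
    )
    # ... run the JMeter/Taurus scenario against this configuration ...
```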
We ran each scenario using the following parameters:
In this scenario, we send a single request and wait for M minutes to send the next one. At the beginning of the test (minute 0), the load test framework sends a single request to the Lambda. Next, it waits for 6 minutes and sends a new request. Continuing, it waits 10 minutes more to issue the next request, and so on. In total, the test takes roughly one hour and requests are performed at minutes 0, 6, 16, 31, and 61.
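The scenario itself was driven by JMeter/Taurus, but conceptually it boils down to the probe below. The endpoint URL and dummy payload are hypothetical; the intervals follow the schedule just described.

```python
# Conceptual version of scenario 1: single requests separated by growing
# idle intervals, to see when a new cold-start occurs. Endpoint is hypothetical.
import time
import requests

API_URL = "https://<api-id>.execute-api.us-east-1.amazonaws.com/prod/predict"  # placeholder
intervals_min = [0, 6, 10, 15, 30]  # waits between requests -> hits at minutes 0, 6, 16, 31, 61

for wait in intervals_min:
    time.sleep(wait * 60)
    start = time.time()
    requests.post(API_URL, json={"instances": [[0.0] * 41]})  # dummy feature vector
    print(f"waited {wait} min, response time: {time.time() - start:.2f}s")
```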
The purpose is to assess how much time a Lambda function instance will be kept alive before going down. As we know, each time AWS launches a new Lambda, it needs time to setup and install dependencies (cold-start). So, we want to evaluate how often this situation occurs for a single instance.
In short, a cold-start may occur in two situations. The first is when incoming concurrent requests exceed the number of warm instances, so AWS has to spin up a new Lambda to handle the extra load.
In the second case, after an idle timeout, the Lambda goes down. As a result, subsequent requests need new cold-starts, which accounts for an increased setup latency.
In this scenario, we experimented with different numbers of users performing concurrent requests. To begin, we set the concurrency parameter to C = 9. This means that 9 different users make requests over a period of time T, where T is the sum of a ramp-up period R (in minutes) and a hold-for period H (in minutes).
The ramp-up period means that a new user will start making requests after a period of R/C time. At the end of this ramp-up time, all 9 users have joined the system and started making requests.
The hold-for period means that the 9 users (making parallel requests) will be kept for H minutes. We set R to 3 and H to 1 in the first stage.
After that, C was set to 18, meaning that 9 new users "join the system" and start making requests; R is then set to 0 and H to 1 minute.
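Again, Taurus handled the actual orchestration, but the sketch below captures the idea of the first stage: users join gradually during the ramp-up period and then keep issuing parallel requests during the hold period. The endpoint and payload are illustrative placeholders.

```python
# Conceptual version of scenario 2 (first stage): users join gradually during
# the ramp-up period and keep issuing parallel requests during the hold period.
import time
import threading
import requests

API_URL = "https://<api-id>.execute-api.us-east-1.amazonaws.com/prod/predict"  # placeholder
CONCURRENCY, RAMP_UP_S, HOLD_S = 9, 3 * 60, 1 * 60  # C = 9, R = 3 min, H = 1 min

def user(stop_at):
    # Each simulated user sends requests back-to-back until the test window ends.
    while time.time() < stop_at:
        requests.post(API_URL, json={"instances": [[0.0] * 41]})  # dummy payload

end = time.time() + RAMP_UP_S + HOLD_S
threads = []
for _ in range(CONCURRENCY):
    threads.append(threading.Thread(target=user, args=(end,)))
    threads[-1].start()
    time.sleep(RAMP_UP_S / CONCURRENCY)  # a new user joins every R/C seconds
for t in threads:
    t.join()
```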
We aimed to answer two questions with this test: what response times new users see when making requests, and what impact instantiating new Lambda instances (cold-starts) has on them.
The first graph shows the results of the first test scenario. Here, over a given period of time, we send requests to an AWS Lambda instance at different rates. Only one user, i.e. a single parallel request, is active throughout the test.
As expected, the first user/request presented a much longer response time: almost 8 seconds for the cold-start. Five minutes after the first request, we can see that AWS reused the same Lambda instance, which resulted in a much shorter response time (only 1 second).
Interestingly, even with longer intervals between requests, AWS managed to use the same instance. Specifically, requests issued 10 and 14 minutes after the previous one still used the same Lambda. However, for a time interval of 30 minutes, AWS had to spin up a new Lambda, which resulted in a longer response time due to the cold-start (nearly 6 seconds). In practice, AWS had to shut down the Lambda somewhere in the interval of 14 to 30 minutes.
Again, our goal was to assess how long AWS maintains a Lambda instance alive. In practice, AWS does not guarantee it. In fact, AWS algorithms might decide this time based on many other circumstances such as network load and so on. However, this test helped us further understand how AWS Lambda internal logic works.
For the second test scenario, we used the following settings.
For all the experiments, we display 4 types of information: max latency, mean latency, number of hits, and number of parallel users.
The set of graphs below shows the performance test comparisons between the two programming languages: Java and Python.
We increase the total number of users from 3 to 18 during a total testing time of 5 minutes (x-axis). The number of users grows in steps of 3, 6, 9, and 18. Also, note that each user performs parallel requests to the server. So, when the number of users is 18, there can be up to 18 parallel requests being issued.
For further understanding, the y-axis to the left shows the results in milliseconds. Use this axis to evaluate the max and mean latency. In contrast, the y-axis to the right displays information about the number of hits and parallel users.
Overall, the tests using Python behaved as expected. Note that the number of parallel requests (hits) grows as the number of parallel users increases. One very interesting point is the effect of cold-starts as the number of users grows.
First, the red line (max latency) shows the first spike at the beginning of every experiment. In practice, at the beginning of the experiment, AWS instantiates 3 Lambda instances. One for each user request. As we discussed, each request (from a new user), requires this setup. That is the cold-start.
Yet, we can see some other spikes in latency during the test execution. These spikes seemed to be related to the increase in the number of users (parallel requests). When we increase the number of users, AWS (again) needs to set up new Lambda functions for taking care of the new incoming requests. And as expected, subsequent requests, from the same user, are executed by the same Lambda instances. As a result, it avoids a new cold-start for each new request.
Another interesting point is the effect of the Lambda memory on the test scalability. With 256 MB, the test reaches a peak of 77 hits (with 18 parallel users) and mean hits of 33.4. This mark significantly increased when using 512 MB, but did not go any further with 1024 MB. Indeed, most of the statistics (avg response time, avg hits, and min response time) did not change much. Perhaps the most significant improvement (from 512 to 1024 MB) is the max response time, which indicates faster cold-starts. Our tests suggest that 512 MB is the best cost/benefit setup for this specific model.
For Java, with the exception of the 256 MB setting, the overall performance was very similar to Python. To begin, the 256 MB test only scored a mean of 0.55 hits, i.e. less than 1 request per second.
Actually, this test configuration was not able to scale the number of requests with the number of users. Moreover, it suffered from very high latency across all test times. The average response time stayed at 17,049 milliseconds (~17 seconds)!
One possible reason for this poor performance is the JVM memory footprint, since the other memory configurations did just fine.
For 512 and 1024 MB, the results confirmed the expected behavior. Compared with the corresponding Python results, they are very similar.
One noticeable discrepancy between Java 512 MB and Python 512 MB is the behavior of the max response time. For Java, it was much higher, which translates to longer cold-starts.
Lastly, similarly to the Python results, increasing the memory size from 512 to 1024 MB did not achieve significantly better outcomes. Even though the number of hits did not improve very much, some cold-starts improved considerably. As shown in the table, the max response time for Java with 512 MB was ~18 s, while with 1024 MB it was ~10 s.
In summary, AWS Lambda is a good choice for provisioning lightweight ML models that need to scale, with only a few caveats.
Among its main advantages for the use cases we envision are:
1. Convenience: provisioning, scaling, and most of the operational complexity are handled by AWS and hidden from the end user.
2. Cost: depending on the workload, pay-per-execution can drive the infrastructure cost down.
Among the caveats, the language chosen for the Lambda function matters: it may have significant impacts on both performance and cost.
Also, if your application relies on very low latency, AWS Lambda might not be the best choice. The cold-start penalty reduces the range of applications that can benefit from it. In general, one can extract the most from Lambdas if the application has either of these attributes.
In some situations, one can mitigate the cold-start time with approaches such as a periodic ping that keeps the Lambda functions warm.
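One common way to do that is a scheduled warm-up event that invokes the function every few minutes. The sketch below shows what this might look like with a CloudWatch Events rule created via boto3; the rule name, function ARN, and payload are placeholders, and a real setup also needs a resource-based permission so CloudWatch Events can invoke the function.

```python
# Sketch of a keep-warm strategy (names and ARNs are placeholders).
# A CloudWatch Events rule pings the function periodically so an instance stays warm.
import boto3

events = boto3.client("events")
events.put_rule(Name="keep-lambda-warm", ScheduleExpression="rate(5 minutes)")
events.put_targets(
    Rule="keep-lambda-warm",
    Targets=[{
        "Id": "warmup",
        "Arn": "arn:aws:lambda:us-east-1:123456789012:function:kdd99-logreg-serving",
        "Input": '{"warmup": true}',
    }],
)

# Handler side: return early for warm-up pings so they stay cheap, e.g.
# if event.get("warmup"):
#     return {"statusCode": 200, "body": "warm"}
```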
Another consideration is that TF models (even a simple model like the one used in this test) may not be the smallest option available in some situations.
Last but not least, any other step in the ML workflow, such as versioning and retraining, needs to be handled by us. This is expected, since AWS Lambda is only a general-purpose computing environment.
Now we will try more advanced, end-to-end ML pipelines and compare the experience. The question is, will they provide us with a streamlined process from training to serving, with the flexibility and scalability we are looking for?
Stay tuned!
Bruno Schionato, Diego Domingos, Fernando Moraes, Gustavo Rozato, Isac Souza, Marciano Nardi, Thalles Silva — Daitan Group