January 06, 2023 by Ulf Hermann
As you may know, you can compile your QML code to C++ these days. There are multiple reasons why you would do this. One of them is that it leads you to better structured code by forcing you to declare the types you're using. The most important one is that the resulting program will run faster.
In my previous posts I've been rather cautious about the actual performance numbers. This is for a reason. The Qt Quick Compiler cannot translate any old JavaScript you throw at it, and depending on the exact characteristics of your code, the resulting speedup varies greatly. We're constantly working on increasing the Qt Quick Compiler's coverage of the QML language, but it's still a long way to go.
However, today I'll go out on a limb and show you a piece of code that gets 4 times faster by compiling it to C++. Consider the following little QML program:
import QtQml

QtObject {
    id: root

    enum Parameters {
        Length = 1024,
        Iterations = 32768,
        Category0 = 0xf0f,
        Category1 = 0xf0f0,
        Category2 = 0xf0f0f,
        Maximum = 0xf0f0f0,
        Mask = 0xabcdef
    }

    function randomNumber() : int {
        return (Math.random() * Categorizer.Maximum);
    }

    property var numbers: {
        var result = [];
        for (var i = 0; i < Categorizer.Length; ++i)
            result[i] = randomNumber();
        return result;
    }

    function sum() : list<double> {
        var numbers = root.numbers;
        var cat1Sum = 0;
        var cat2Sum = 0;
        var cat3Sum = 0;
        var huge = 0;
        for (var i = 0; i < Categorizer.Iterations; ++i) {
            for (var j = 0; j < Categorizer.Length; ++j) {
                var num = numbers[j] & Categorizer.Mask;
                if (num < Categorizer.Category0)
                    cat1Sum += num;
                else if (num < Categorizer.Category1)
                    cat2Sum += num;
                else if (num < Categorizer.Category2)
                    cat3Sum += num;
                else
                    huge += num;
            }
        }
        return [cat1Sum, cat2Sum, cat3Sum, huge];
    }

    Component.onCompleted: {
        console.log("start")
        var result = sum();
        console.log("done");
        console.log("< " + Categorizer.Category0 + ":", result[0])
        console.log("< " + Categorizer.Category1 + ":", result[1])
        console.log("< " + Categorizer.Category2 + ":", result[2])
        console.log("huge:", result[3]);
    }
}
It generates some random numbers and iterates over them repeatedly, masking them with a bitmask and adding them up into 4 sums, depending on their size. While you should not write your business logic in JavaScript, a helper function like this might be employed to position visual elements relative to each other, and it might even be a bottleneck. The outer iteration is just there so that we can talk about seconds rather than milliseconds.
Save this program as a file called "Categorizer.qml" and run it with the "qml" utility, measuring the time between the "start" and "done" output. You can use the underappreciated QT_MESSAGE_PATTERN environment variable to have it print the timestamps for each message. For example:
QT_MESSAGE_PATTERN="%{time process}: %{message}" /path/to/qml Categorizer.qml
On my machine I get something like this:
0.000: start
1.842: done
1.842: < 3855: 808091648
1.842: < 61680: 29789028352
1.842: < 986895: 3433703112704
1.842: huge: 170690851176448
You need at least Qt 6.2 to run this example. The result will likely be very similar for any version of Qt that can run it, even the latest 6.5 snapshots. Using the "qml" utility you disable most benefits of Qt Quick Compiler. The code is not compiled ahead of time, no C++ code is generated, and no byte code is compiled into your (non-existing) application. If you run it more than once, on subsequent invocations it will still be able to use cache files to avoid re-compiling the document to byte code, but the compilation is not the dominant factor here.
Those numbers are rather underwhelming when you think about what the program actually does. Now let's see how it behaves when you compile it to C++. Create an example project with Qt Creator, using the Qt 6.2+ template, and add the Categorizer.qml file to it. Then have your main.cpp load Categorizer.qml rather than main.qml. Make sure to compile your application in release mode. This will make sure the QML file is compiled to C++. Now run the result. Tada: the numbers are the same, even with the latest 6.5 snapshot.
"OK, what's the point?" you may ask. Well, let's try something else. Add a type to the property declaration for "numbers":
property list<double> numbers: { ... }
You may notice that you now need Qt 6.4 to run this. Earlier versions of Qt will consider it a syntax error. The resulting performance, when running it with 6.4, or with 6.5 without compiling to C++, is actually worse than before. I see you are getting really impatient with me, but please, let's try just one last thing: Use the new code with your example project with Qt 6.5 beta1. I get:
0.000: start
0.392: done
0.392: < 3855: 607322112
0.392: < 61680: 27637481472
0.392: < 986895: 3592250556416
0.392: huge: 181336245927936
You can call me Watman now. And you can leave it at that if you like. I won't sing the NaNNaNNaN tune for you in that case. If you want to make really sure that it's the compilation to C++ that causes the speedup you can define the QML_DISABLE_DISK_CACHE environment variable:
QML_DISABLE_DISK_CACHE=1 ./my_example
This prevents the QML engine from using any byte code or native code generated for QML documents at compile time. It will then re-compile the source code at run time and interpret or JIT it.
But let me sing some NaNNaNNaN now.
If we look at this little "sum()" function through a pure JavaScript lens, we see that it retrieves a thing called "numbers" from another thing called "root". Now this "numbers" could be an array of numbers. Or it could be a URL object, a String, a dictionary keyed by combinations of "幽", "霊", "文", and "字", or just about anything else.
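To make that concrete, here is a small plain JavaScript sketch, meant to run outside of QML. The helper name maskedElement and the sample inputs are made up for illustration; only the mask value is taken from the example above. The same untyped lookup accepts an array, a string, or an arbitrary object, which is what a compiler without type information has to prepare for.

```javascript
// The mask is the value of Categorizer.Mask from the example.
const MASK = 0xabcdef;

// Hypothetical helper: without type information, this has to work
// for whatever "numbers" turns out to be at run time.
function maskedElement(numbers, index) {
    return numbers[index] & MASK;
}

// The element is a number.
const fromArray = maskedElement([300, 4], 0);
// The element is the one-character string "7".
const fromString = maskedElement("7 apples", 0);
// The element is a property of a dictionary-like object.
const fromObject = maskedElement({ "幽": 1, 0: 258 }, 0);
// The element does not exist, so the lookup yields undefined.
const fromMissing = maskedElement({}, 0);

console.log(fromArray, fromString, fromObject, fromMissing);
```

None of these calls throws. Each one silently takes a different path through the language's coercion rules, and generic code has to cover all of them.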
Then we iterate over a range of numbers. We can be pretty sure that i starts out as an integer. However, we again know nothing about the boundary condition. We simply retrieve something called "Iterations" from something called "Categorizer", and both are outside the scope of this function.
Then we retrieve something from our "numbers". This may again be just about anything, including undefined if that "j" does not exist in "numbers". The masking operation, though, is a neat little trick. The ECMAScript standard mandates that whatever you throw into an '&' operator, what you get out is an integer. Now, ECMAScript tries hard not to know about integers, but luckily it cannot completely hide them. Therefore, we could know something about the "+=" operations. Those are all numbers, after all. The worst thing that can happen there is that they overflow from integer into a real number (pun intended, and don't ask).
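If you want to convince yourself of the '&' trick, the following plain JavaScript sketch may help. The input values are arbitrary picks for illustration and the mask is Categorizer.Mask from the example; nothing here is specific to QML.

```javascript
const MASK = 0xabcdef; // Categorizer.Mask from the example

// Whatever goes into "&", a 32-bit integer comes out.
const inputs = [42.9, "1024", undefined, NaN, null, 2 ** 32 + 5, {}];
const masked = inputs.map((value) => value & MASK);

// Every result is an integer that survives a round trip through int32.
const allInt32 = masked.every((n) => Number.isInteger(n) && (n | 0) === n);

// "+=" on such values stays numeric, but it may leave the 32-bit range.
let sum = 2147483647; // largest signed 32-bit integer
sum += 1;             // no wrap-around: the result is simply a larger number

// What a plain 32-bit integer would have wrapped around to instead.
const wrapped = sum | 0;

console.log(masked, allInt32, sum, wrapped);
```

So the operands of the comparisons and additions are known to be numbers, but the sums themselves cannot be assumed to stay within the integer range.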
So, for the inner part of the loop we could generate some fairly efficient code even without knowing anything else about the rest of the file. For the rest, we have to always assume that we might have to sing the Watman tune, and generate code generic enough to do just that.
The JavaScript interpreter (and JIT) in QML, specifically, is not even capable of optimizing the inner part of the loop, though. It doesn't do any type inference, as it's optimized for simplicity and compilation speed. It does cache the lookups for the enumerations, which makes them faster on subsequent iterations, but even the cached lookups require some function calls.
It's important to note that other JIT compilers, for example the one used in V8, can do interesting heuristic optimizations on such code. By observing the occurrence of types at run time, specialized code can be generated for types that occur frequently, with deoptimization steps for unexpected type mismatches. This comes at a cost of memory and code complexity, though.
So much for JavaScript.
If we look at the whole file as a QML document, we know quite a bit more. First, the previously opaque values called "Categorizer.Iterations", "Categorizer.Length", etc., turn out to be enumeration values, which by definition can be expressed as integers. At compile time, we even know their values and we know they cannot change. Furthermore, we know what "root" is. "root" is an ID. Elements referenced by ID, in contrast to properties, cannot change. Here's the catch, though: as long as the "numbers" property is just a "var", we still don't know anything about it. The Qt Quick Compiler refuses to generate code that sings the Watman tune. Therefore, it rejects the whole "sum()" function in this case and we fall back to interpretation or JIT compiling.
If, however, the type for the "numbers" property is given, the Qt Quick Compiler as shipped with the latest 6.5 snapshot generates efficient C++ code for the "sum()" function. It knows that the result of an indexed lookup in "numbers" can only ever produce a number or undefined. It might produce undefined because you may have replaced the value of the "numbers" property with a shorter list.
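One way to picture the "number or undefined" guarantee is the following plain JavaScript sketch. The loadElement helper is hypothetical and only models the idea of a bounds-checked read from a list of numbers; it is not what the compiler emits.

```javascript
// Hypothetical helper modelling a read from a list of numbers:
// the result is either one of the stored numbers or undefined.
function loadElement(list, index) {
    if (!Number.isInteger(index) || index < 0 || index >= list.length)
        return undefined;
    return list[index];
}

let numbers = [10.5, 20.25, 30, 40];
const before = loadElement(numbers, 3); // index 3 is in range

numbers = [10.5, 20.25]; // the property got replaced by a shorter list
const after = loadElement(numbers, 3);  // index 3 is now out of range

// Masking turns the undefined case into a plain integer again.
const maskedAfter = after & 0xabcdef;

console.log(before, after, maskedAfter);
```

With only two possible outcomes per lookup, the code for the rest of the loop body can stay simple.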
With that information alone the Qt Quick Compiler could generate code for the comparisons and the "+=" operations, using QJSPrimitiveValue. The result would be somewhat slower, but still faster than interpretation. Using the "&" operator, we further constrain the type, so that the Qt Quick Compiler can generate straight C++ arithmetic using int and double. It also just pastes the numeric values of the enum entries into the generated code, saving us the lookups we need to do in JavaScript.
The generated C++ code looks like this:
// var num = numbers[j] & Categorizer.Mask;
// generate_MoveReg
r17_1 = r14_1;
// generate_LoadReg
r2_3 = r12_1;
// generate_LoadElement
if (!QJSNumberCoercion::isInteger(r2_3))
    r2_5 = QJSPrimitiveValue();
else if (r2_3 >= 0 && r2_3 < r17_1.size())
    r2_5 = QJSPrimitiveValue(r17_1.at(r2_3));
else
    r2_5 = QJSPrimitiveValue();
// generate_StoreReg
r18_1 = r2_5;
// generate_GetLookup
r2_6 = 11259375;
// generate_BitAnd
r2_3 = double((r18_1.toInteger() & r2_6));
// generate_StoreReg
r13_1 = r2_3;

// if (num < Categorizer.Category0)
// generate_StoreReg
r17_2 = r2_3;
// generate_GetLookup
{
    int retrieved;
    retrieved = 3855;
    r2_3 = double(retrieved);
}
// generate_CmpLt
r2_4 = r17_2 < r2_3;
// generate_JumpFalse
if (!r2_4) {
    goto label_4;
}
;

// cat1Sum += num;
// generate_MoveReg
r18_2 = r7_1;
// generate_LoadReg
r2_3 = r13_1;
// generate_Add
r2_3 = (r18_2 + r2_3);
// generate_StoreReg
r7_1 = r2_3;
// generate_Jump
{
    goto label_5;
}

[...]
There are a lot of unnecessary renames in there, but a C++ compiler worth its salt should be able to eliminate them. So, this is almost as fast as we can get with the inner loop without violating the ECMAScript standard. We might still perform a range analysis on the loop counters to find out that they can never be anything but integers, but so far we don't.
Clearly, the above example is carefully tailored for maximum W{at|ow} effect. However, this is where QML is going. As the Qt Quick Compiler's language support improves, such examples will become more common.