Deep Learning Compiler Middleware NNVM (14): NNVM Source Code Reading 3

阎博易

2023-12-01

References

Reference 1 starts from the outermost function, nnvm.compiler.build, and works its way down into the implementation details of NNVM.

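The execution of nnvm.compiler.build can be summarized in the following steps:

1. Correct the layout
2. Initialize the passes (specify the shape)
3. Initialize all variables (_all_var_init)
4. Apply optimizations
5. Precompute pruning
6. Fuse adjacent operations and generate the final .so
7. Save the initial values of the variables to the params file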

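Analysis of step 6: fuse adjacent operations and generate the final .so

Perform the lowering operation, producing an array of LoweredFunc
Call nnvm.compiler.build_target to generate the final .so, which is implemented by calling tvm.build.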

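Two points in the DoLower function are particularly important:
1. The GetScheduleArgs function generates the schedule arguments
2. GetPackedFunc("nnvm.compiler.lower") calls back into TVM's Python interface

GetScheduleArgs uses FTVMCompute and FTVMSchedule to connect the NNVM Graph with topi's Python interface.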

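Note:

topi defines the operators commonly used in deep learning, along with the schedule settings for those operators.

tvm.build, in turn, uses LLVM and HalideIR to compile the previously generated LoweredFunc array into binary executable code for the target hardware platform.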
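The flow described above (look up each operator's compute and schedule functions, lower the result, then hand everything to a build step) can be sketched in plain Python. This is a toy model under stated assumptions, not NNVM code: `COMPUTE`, `SCHEDULE`, `lower_node` and `build_module` are hypothetical names, and strings stand in for tensors, schedules and LoweredFunc objects.

```python
# Toy model of step 6. Every name here is a hypothetical stand-in.

# Per-operator tables, playing the role of the FTVMCompute / FTVMSchedule attributes.
COMPUTE = {
    "dense": lambda inputs: "dense(" + ", ".join(inputs) + ")",
    "relu": lambda inputs: "relu(" + ", ".join(inputs) + ")",
}
SCHEDULE = {
    "dense": lambda out: "schedule_dense[" + out + "]",
    "relu": lambda out: "schedule_injective[" + out + "]",
}


def lower_node(op_name, inputs):
    """Look up compute and schedule for one node; return a fake lowered function."""
    out = COMPUTE[op_name](inputs)
    sched = SCHEDULE[op_name](out)
    return {"name": op_name, "body": sched}


def build_module(lowered_funcs, target):
    """Stand-in for the final build call: bundle lowered functions for a target."""
    return {"target": target, "functions": [f["name"] for f in lowered_funcs]}


# A two-node graph: dense followed by relu.
graph = [("dense", ["data", "weight", "bias"]), ("relu", ["dense_out"])]
lowered = [lower_node(op, ins) for op, ins in graph]
module = build_module(lowered, "llvm")
```

The point of the sketch is the separation of concerns: the graph only carries operator names, while what to compute and how to schedule it are looked up by name at lowering time.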

From the summary above, we get an initial picture of how several important NNVM components call one another and fit together:

NNVM Compiler
NNVM Frontend
NNVM Graph Pass
NNVM Top (tensor operators)
TVM Codegen
TVM Pass
TVM Runtime
TOPI
HalideIR

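The following documents in this series cover these components in turn; this one begins with the NNVM Top component.

The code for the NNVM Top component lives in:

src/top: mainly responsible for the parameter settings of common deep learning operators
python/nnvm/top: mainly responsible for the association with topi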

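C++ code of the Top component

The C++ namespace used is nnvm::top

The code is located in

include/nnvm/top/*.h
src/top/*.cc

The dense operator serves as the example here; the other operators follow essentially the same approach.

First, here is part of include/nnvm/top/nn.h, which defines the DenseParam parameter struct: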

struct DenseParam : public dmlc::Parameter<DenseParam> {
    int units;
    bool use_bias;

    DMLC_DECLARE_PARAMETER(DenseParam) {
        DMLC_DECLARE_FIELD(units).set_lower_bound(1)
        .describe("Number of hidden units of the dense transformation.");
        DMLC_DECLARE_FIELD(use_bias).set_default(true)
        .describe("Whether to use bias parameter");
    }
    // constants
    static const constexpr int kData = 0;
    static const constexpr int kWeight = 1;
    static const constexpr int kBias = 2;
};
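To make the declaration above concrete, here is a minimal Python sketch of what such a parameter struct amounts to: a required `units` field with a lower bound of 1 and a `use_bias` field that defaults to true, filled in from string attributes. `parse_dense_param` is a hypothetical helper written for illustration, and the accepted boolean spellings are an assumption; the real parsing is done by dmlc::Parameter.

```python
from dataclasses import dataclass


@dataclass
class DenseParam:
    """Python mirror of the C++ DenseParam struct (illustrative only)."""
    units: int
    use_bias: bool = True

    # Input positions, mirroring kData / kWeight / kBias.
    K_DATA = 0
    K_WEIGHT = 1
    K_BIAS = 2


def parse_dense_param(attrs):
    """Parse a dict of string attributes into a DenseParam (hypothetical helper)."""
    if "units" not in attrs:
        # units has no default in the declaration, so it is required.
        raise ValueError("required field 'units' is missing")
    units = int(attrs["units"])
    if units < 1:
        # Corresponds to set_lower_bound(1).
        raise ValueError("units must be >= 1")
    # Corresponds to set_default(true); accepted spellings are an assumption.
    use_bias = str(attrs.get("use_bias", "true")).lower() in ("true", "1")
    return DenseParam(units=units, use_bias=use_bias)


param = parse_dense_param({"units": "128"})
```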

Next, here is part of src/top/nn.cc, which shows how DenseParam is registered and how the dense operator is registered.

// Parameter registration
DMLC_REGISTER_PARAMETER(DenseParam);

inline bool DenseInferShape(const nnvm::NodeAttrs& attrs,
                            std::vector<TShape>* in_shape,
                            std::vector<TShape>* out_shape) {
    const DenseParam& param = nnvm::get<DenseParam>(attrs.parsed);
    if (param.use_bias) {
        CHECK_EQ(in_shape->size(), 3U) << "Input:[data, weight, bias]";
    } else {
        CHECK_EQ(in_shape->size(), 2U) << "Input:[data, weight]";
    }
    CHECK_EQ(out_shape->size(), 1U);
    // reverse infer
    if ((*out_shape)[0].ndim() != 0) {
        TShape dshape = (*out_shape)[0];
        dshape[dshape.ndim() - 1] = 0;
        NNVM_ASSIGN_INPUT_SHAPE(attrs, *in_shape, DenseParam::kData, dshape);
    }
    dim_t num_inputs = 0;
    if ((*in_shape)[DenseParam::kData].ndim() != 0) {
        TShape oshape = (*in_shape)[DenseParam::kData];
        num_inputs = oshape[oshape.ndim() - 1];
        oshape[oshape.ndim() - 1] = param.units;
        NNVM_ASSIGN_OUTPUT_SHAPE(attrs, *out_shape, 0, oshape);
    }
    NNVM_ASSIGN_INPUT_SHAPE(attrs, *in_shape, DenseParam::kWeight,
                            TShape({param.units, num_inputs}));
    if (param.use_bias) {
        NNVM_ASSIGN_INPUT_SHAPE(attrs, *in_shape, DenseParam::kBias, TShape({param.units}));
    }
    return true;
}

// Operator registration
NNVM_REGISTER_OP(dense)
.describe(R"code(Applies a linear transformation: :math:`Y = XW^T + b`.

- **data**: `(x1, x2, ..., xn, input_dim)`
- **weight**: `(units, input_dim)`
- **bias**: `(units,)`
- **out**: `(x1, x2, ..., xn, units)`

The learnable parameters include both ``weight`` and ``bias``.

If ``use_bias`` is set to be false, then the ``bias`` term is ignored.

)code" NNVM_ADD_FILELINE)
.add_argument("data", "nD Tensor", "Input data.")
.add_argument("weight", "2D Tensor", "Weight matrix.")
.add_argument("bias", "1D Tensor", "Bias parameter.")
.add_arguments(DenseParam::__FIELDS__())
.set_attr_parser(ParamParser<DenseParam>)
.set_attr<FGetAttrDict>("FGetAttrDict", ParamGetAttrDict<DenseParam>)
.set_num_outputs(1)
.set_num_inputs(UseBiasNumInputs<DenseParam>)
.set_attr<FListInputNames>("FListInputNames", UseBiasListInputNames<DenseParam>)
.set_attr<FInferShape>("FInferShape", DenseInferShape)
.set_attr<FInferType>("FInferType", ElemwiseType<-1, 1>)
// leave weight & bias layout undefined
.set_attr<FCorrectLayout>("FCorrectLayout", ElemwiseFixedLayoutCopyToOut<1, 1>)
.set_attr<FGradient>(
    "FGradient", [](const NodePtr& n,
                    const std::vector<NodeEntry>& ograds) {
        const DenseParam& param = nnvm::get<DenseParam>(n->attrs.parsed);

        NodeEntry data_grad = MakeNode("matmul",
                                       n->attrs.name + "_data_grad",
                                       {ograds[0], n->inputs[DenseParam::kWeight]});
        NodeEntry w_grad_sub = MakeNode("matmul",
                                        n->attrs.name + "_weight_grad_sub0",
                                        {ograds[0], n->inputs[DenseParam::kData]},
                                        {{"transpose_a", "true"}});
        TShape w_reduce_axis = {0, -1};
        std::ostringstream w_oss; w_oss << w_reduce_axis;
        NodeEntry w_grad = MakeNode("sum", n->attrs.name + "_weight_grad",
                                    {w_grad_sub},
                                    {{"axis", w_oss.str()}, {"exclude", "true"}});
        std::vector<NodeEntry> grads = {data_grad, w_grad};

        if (param.use_bias) {
            TShape axis = {-1};
            std::ostringstream b_oss; b_oss << axis;
            grads.push_back(MakeNode("sum", n->attrs.name + "_bias_grad",
                            {ograds[0]},
                            {{"axis", b_oss.str()}, {"exclude", "true"}}));
        }
        return grads;
    })
.set_support_level(1);

The C++ code above shows how a deep learning operator and its parameters are registered.
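The logic of DenseInferShape is easier to follow when restated in Python. The sketch below mirrors the C++ control flow step by step, assuming that `None` stands for a completely unknown shape (ndim() == 0) and that a dimension of 0 means "unknown", as the reverse-inference branch suggests. `assign_shape` is a hypothetical stand-in for the NNVM_ASSIGN_INPUT_SHAPE / NNVM_ASSIGN_OUTPUT_SHAPE macros, and its merging rule is an assumption.

```python
DATA, WEIGHT, BIAS = 0, 1, 2


def assign_shape(shapes, index, new_shape):
    """Merge new_shape into shapes[index]; a dimension of 0 is treated as unknown."""
    old = shapes[index]
    if old is None:
        shapes[index] = tuple(new_shape)
        return
    if len(old) != len(new_shape):
        raise ValueError("incompatible rank at index %d" % index)
    merged = []
    for o, n in zip(old, new_shape):
        if o != 0 and n != 0 and o != n:
            raise ValueError("incompatible shape at index %d" % index)
        merged.append(o if o != 0 else n)
    shapes[index] = tuple(merged)


def dense_infer_shape(units, use_bias, in_shapes, out_shapes):
    """Fill in unknown shapes for dense, following the C++ function's steps."""
    expected = 3 if use_bias else 2
    if len(in_shapes) != expected or len(out_shapes) != 1:
        raise ValueError("wrong number of inputs or outputs")
    # Reverse inference: a known output fixes all data dims except the last.
    if out_shapes[0] is not None:
        dshape = list(out_shapes[0])
        dshape[-1] = 0
        assign_shape(in_shapes, DATA, dshape)
    num_inputs = 0
    # Forward inference: the output is the data shape with the last dim set to units.
    if in_shapes[DATA] is not None:
        oshape = list(in_shapes[DATA])
        num_inputs = oshape[-1]
        oshape[-1] = units
        assign_shape(out_shapes, 0, oshape)
    assign_shape(in_shapes, WEIGHT, (units, num_inputs))
    if use_bias:
        assign_shape(in_shapes, BIAS, (units,))
    return True


# Forward case: only the data shape is known.
in_shapes = [(32, 100), None, None]
out_shapes = [None]
dense_infer_shape(10, True, in_shapes, out_shapes)
```

After the call, `in_shapes` holds the data, weight and bias shapes and `out_shapes` holds the output shape. Calling the function with only the output shape known exercises the reverse branch, which leaves the input dimension as 0 because it cannot be recovered from the output alone.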

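Python code of the Top component

The Python code for the NNVM Top component lives in:

python/nnvm/top/*.py

The main job of Top's Python interface is to define the compute pattern and the schedule function of every deep learning operator by connecting to TVM's topi component.

We start with registry.py, the file that defines the registration functions: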

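def register_compute(op_name, f=None, level=10): ...    # register the operator's actual compute function
def register_schedule(op_name, f=None, level=10): ...   # register the operator's actual schedule function
def register_pattern(op_name, pattern, level=10): ...   # register the operator's compute pattern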
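The `f=None` default in these signatures is what lets the same function work both as a plain call and as a decorator. The following self-contained sketch shows that pattern with an in-memory table. `_REGISTRY`, `_register` and `lookup_compute` are hypothetical names, and the rule that a higher `level` overrides a lower one is an assumption about the priority semantics rather than something stated in this article.

```python
_REGISTRY = {}  # (attr_name, op_name) -> (level, function)


def _register(attr_name, op_name, f, level):
    """Store f unless an entry with a higher level already exists."""
    key = (attr_name, op_name)
    if key in _REGISTRY:
        old_level = _REGISTRY[key][0]
        if old_level == level:
            raise ValueError("%s already registered at level %d" % (op_name, level))
        if old_level > level:
            return f  # keep the higher-priority entry
    _REGISTRY[key] = (level, f)
    return f


def register_compute(op_name, f=None, level=10):
    """Usable as register_compute(name, f) or as @register_compute(name)."""
    def register(myf):
        return _register("FTVMCompute", op_name, myf, level)
    return register(f) if f else register


def lookup_compute(op_name):
    """Return the compute function currently registered for op_name."""
    return _REGISTRY[("FTVMCompute", op_name)][1]


# Decorator form, registered at the default level.
@register_compute("dense")
def compute_dense(attrs, inputs, out_info):
    return "generic dense"


# Direct-call form at a higher level, which replaces the entry above.
def compute_dense_tuned(attrs, inputs, out_info):
    return "tuned dense"


register_compute("dense", compute_dense_tuned, level=15)
```

With both registrations in place, `lookup_compute("dense")` returns `compute_dense_tuned`, which is how a more specific implementation could take precedence over a generic one under this assumed rule.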

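Using the dense operator as the example again, here is how the compute function and the schedule function of a deep learning operator are defined.

python/nnvm/top/nn.py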

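# Imports this excerpt relies on
import tvm
import topi
from . import registry as reg
from .registry import OpPattern

# The code below uses Python's decorator syntax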
@reg.register_compute("dense")
def compute_dense(attrs, inputs, _):
    """Compute definition of dense"""
    if attrs.get_bool("use_bias"):
        return topi.nn.dense(inputs[0], inputs[1], bias=inputs[2])
    return topi.nn.dense(inputs[0], inputs[1])

@reg.register_schedule("dense")
def schedule_dense(_, outs, target):
    """Schedule definition of dense"""
    with tvm.target.create(target):
        return topi.generic.schedule_dense(outs)

reg.register_pattern("dense", OpPattern.OUT_ELEMWISE_FUSABLE)
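The registered pattern is what the fusion step consults when deciding which adjacent operations can be merged. As a rough illustration, the sketch below fuses a linear chain of operators using a single simplified rule: an elementwise operator may join the group that precedes it unless that group starts with an opaque operator. This is not the NNVM fusion pass. The local `OpPattern` enum lists only three patterns with illustrative numeric values, and the patterns assigned to `relu` and `softmax` are assumptions; only the pattern for `dense` comes from the code above.

```python
from enum import IntEnum


class OpPattern(IntEnum):
    """Local stand-in for the registry's OpPattern; values are illustrative."""
    ELEMWISE = 0
    OUT_ELEMWISE_FUSABLE = 4
    OPAQUE = 8


# Pattern table: dense matches the article; the other entries are assumptions.
PATTERNS = {
    "dense": OpPattern.OUT_ELEMWISE_FUSABLE,
    "relu": OpPattern.ELEMWISE,
    "softmax": OpPattern.OPAQUE,
}


def fuse_chain(ops):
    """Group a linear chain of op names into fused groups (simplified rule)."""
    groups = []
    for op in ops:
        pattern = PATTERNS[op]
        can_join = (
            bool(groups)
            and pattern == OpPattern.ELEMWISE
            and PATTERNS[groups[-1][0]] != OpPattern.OPAQUE
        )
        if can_join:
            groups[-1].append(op)  # fuse into the preceding group
        else:
            groups.append([op])    # start a new group
    return groups


fused = fuse_chain(["dense", "relu", "softmax", "relu"])
```

Here the first `relu` is absorbed into the `dense` group, which is the behavior the name OUT_ELEMWISE_FUSABLE describes, while the `relu` after `softmax` stays in its own group.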

This concludes the walkthrough of the NNVM Top component. The next document will cover the NNVM Frontend component.
