pycuda CompileError: nvcc compilation failed

上官淮晨
2023-12-01

Environment: Windows 10, Visual Studio 2019, pycuda 2019.02.

Write a piece of CUDA C code and pass it to a constructor.

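import pycuda.driver as cuda
import pycuda.autoinit  # initializes the CUDA driver and creates a context on the default GPU
from pycuda.compiler import SourceModule

import numpy as np
a = np.random.randn(4,4)
a = a.astype(np.float32)  # the kernel below expects 32-bit floats

a_gpu = cuda.mem_alloc(a.nbytes)  # allocate device memory of the same size as a
cuda.memcpy_htod(a_gpu, a)        # copy the host array to the device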
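Why the astype(np.float32) step matters: np.random.randn returns float64 data, while the doublify kernel takes a float * (32-bit) argument. A minimal NumPy-only sketch (it runs without a GPU) of the dtypes and byte counts involved:

```python
import numpy as np

a = np.random.randn(4, 4)     # NumPy produces float64 by default
print(a.dtype, a.nbytes)      # float64 128  (16 elements x 8 bytes)

a32 = a.astype(np.float32)    # matches the kernel's `float *a` parameter
print(a32.dtype, a32.nbytes)  # float32 64   (16 elements x 4 bytes)

# cuda.mem_alloc(a.nbytes) and cuda.memcpy_htod should use the float32 array:
# copying the float64 buffer instead would hand the kernel bytes that it
# reinterprets as 32-bit floats, giving meaningless values.
```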

Running a kernel

Now write code that multiplies every value of the array stored in the device memory a_gpu by 2. To do this, we write a piece of CUDA C code and pass it to a constructor, here pycuda.compiler.SourceModule:

mod = SourceModule("""
    __global__ void doublify(float *a)
    {
        // row-major offset of element (row threadIdx.y, column threadIdx.x) in the 4x4 array
        int idx = threadIdx.x + threadIdx.y*4;
        a[idx] *= 2;
    }
    """)
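To see why a 4x4 block combined with idx = threadIdx.x + threadIdx.y*4 covers the whole array, here is a CPU-only emulation. It is a sketch for illustration: doublify_on_host is a hypothetical helper in which each loop iteration plays the role of one CUDA thread in a single block.

```python
import numpy as np


def doublify_on_host(a, block=(4, 4, 1)):
    """CPU emulation of the doublify kernel for a single thread block.

    Hypothetical helper for illustration: each loop iteration plays the
    role of one CUDA thread with the given (threadIdx.x, threadIdx.y).
    Returns the doubled array and the sorted list of indices touched.
    """
    flat = a.astype(np.float32).ravel()  # fresh contiguous copy, like the device buffer
    touched = []
    for ty in range(block[1]):           # threadIdx.y
        for tx in range(block[0]):       # threadIdx.x
            idx = tx + ty * 4            # same index formula as the kernel
            flat[idx] *= 2
            touched.append(idx)
    return flat.reshape(a.shape), sorted(touched)


a = np.arange(16, dtype=np.float32).reshape(4, 4)
doubled, touched = doublify_on_host(a)
print(touched == list(range(16)))      # True: all 16 elements, each exactly once
print(np.array_equal(doubled, 2 * a))  # True
```

Because threadIdx.x moves along a row and threadIdx.y selects the row, thread (x, y) updates a[y, x]. A smaller block such as (2, 2, 1) would only touch indices 0, 1, 4 and 5 and leave the rest of the array unchanged.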

pycuda CompileError: nvcc compilation failed

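If this step fails with CompileError: nvcc compilation of C:\Users\KANGNI~1\AppData\Local\Temp\tmpbregy28e\kernel.cu failed, the cause is probably the environment setup: on Windows, nvcc needs the MSVC host compiler cl.exe, and cl.exe may not be on PATH. Try adding the following snippet to your code: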

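import os
# Folder containing cl.exe; adjust this to your own Visual Studio / MSVC installation
_path = r"D:\Microsoft Visual Studio\2019\Community\VC\Tools\MSVC\14.25.28610\bin\Hostx64\x64"

if os.system("cl.exe"):  # a non-zero exit status is treated as "cl.exe not found on PATH"
    os.environ['PATH'] += ';' + _path
if os.system("cl.exe"):  # still failing after extending PATH
    raise RuntimeError("cl.exe still not found, path probably incorrect")
mod = SourceModule("""
    __global__ void doublify(float *a)
    {
        int idx = threadIdx.x + threadIdx.y*4;
        a[idx] *= 2;
    }
    """)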
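A variant of the same PATH check, sketched under the assumption that you only need to know whether cl.exe can be located: shutil.which searches PATH without launching cl.exe, and os.pathsep is ';' on Windows. The helper name ensure_on_path is hypothetical.

```python
import os
import shutil


def ensure_on_path(exe, extra_dir):
    """Append extra_dir to PATH if `exe` is not already found on it.

    Hypothetical helper: same idea as the os.system("cl.exe") check,
    but using shutil.which so that no process is started.
    Returns True if `exe` can be located afterwards.
    """
    if shutil.which(exe) is None:
        os.environ["PATH"] = os.environ.get("PATH", "") + os.pathsep + extra_dir
    return shutil.which(exe) is not None
```

With the _path from the snippet above, usage would mirror the original check: if not ensure_on_path("cl.exe", _path), raise the same RuntimeError before calling SourceModule.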

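If this step runs without errors, the code has been compiled and loaded onto the GPU. Next, get a reference to the kernel as a pycuda.driver.Function with mod.get_function and call it, passing the device array a_gpu as the argument and setting the block size to 4x4: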


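func = mod.get_function("doublify")  # look up the compiled kernel by name
func(a_gpu, block=(4,4,1))           # launch one 4x4x1 block: 16 threads, one per element

a_doubled = np.empty_like(a)
cuda.memcpy_dtoh(a_doubled, a_gpu)   # copy the result from the device back to the host
print(a_doubled)
print(a)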

Result:

[[ 1.4142118   1.48865     3.0958736  -0.64879215]
 [ 0.5829473   1.684329    0.21461935 -2.3383026 ]
 [ 1.2626396  -0.7566854   1.8427325  -0.52471924]
 [-0.80164     1.3886924  -3.5787368   0.72956717]]
[[ 1.4142118   1.48865     3.0958736  -0.64879215]
 [ 0.5829473   1.684329    0.21461935 -2.3383026 ]
 [ 1.2626396  -0.7566854   1.8427325  -0.52471924]
 [-0.80164     1.3886924  -3.5787368   0.72956717]]
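Note that the two arrays printed above are identical; if doublify had taken effect, every entry of a_doubled would be exactly twice the matching entry of a. A small NumPy check makes this kind of mismatch easy to catch instead of comparing printouts by eye. This is a sketch; is_doubled is a hypothetical helper name.

```python
import numpy as np


def is_doubled(original, result):
    """Return True if `result` has the same shape as `original` and equals 2 * original."""
    return original.shape == result.shape and bool(np.allclose(result, 2 * original))


a = np.array([[1.5, -2.0], [0.25, 3.0]], dtype=np.float32)
print(is_doubled(a, 2 * a))     # True
print(is_doubled(a, a.copy()))  # False: an unchanged copy, like the output above
```

In the pycuda script, this would be called right after cuda.memcpy_dtoh as is_doubled(a, a_doubled).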

ref: http://www.voidcn.com/article/p-zoelqetb-bvv.html
