No compiled kernel found.
Compiling kernels : C:\Users\admin.cache\huggingface\modules\transformers_modules\local\quantization_kernels_parallel.c
Compiling gcc -O3 -fPIC -pthread -fopenmp -std=c99 C:\Users\admin.cache\huggingface\modules\transformers_modules\local\quantization_kernels_parallel.c -shared -o C:\Users\admin.cache\huggingface\modules\transformers_modules\local\quantization_kernels_parallel.so
d:/mingw/bin/…/lib/gcc/mingw32/6.3.0/…/…/…/…/mingw32/bin/ld.exe: cannot find -lpthread
collect2.exe: error: ld returned 1 exit status
Compile failed, using default cpu kernel code.
Compiling gcc -O3 -fPIC -std=c99 C:\Users\admin.cache\huggingface\modules\transformers_modules\local\quantization_kernels.c -shared -o C:\Users\admin.cache\huggingface\modules\transformers_modules\local\quantization_kernels.so
Kernels compiled : C:\Users\admin.cache\huggingface\modules\transformers_modules\local\quantization_kernels.so
Cannot load cpu kernel, don’t use quantized model on cpu.
(1) Compile the kernels manually, from inside the model directory:
gcc -fPIC -pthread -fopenmp -std=c99 quantization_kernels_parallel.c -shared -o quantization_kernels_parallel.so
gcc -fPIC -pthread -fopenmp -std=c99 quantization_kernels.c -shared -o quantization_kernels.so
(2) Then, after loading the model as before, manually load the kernel you just compiled:
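from transformers import AutoModel
model = AutoModel.from_pretrained("THUDM/chatglm-6b-int4", trust_remote_code=True).float()
# Replace "Your Kernel Path" with the path to the .so file built in step (1)
model = model.quantize(bits=4, kernel_file="Your Kernel Path")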
The compile error is still reported at startup, but the model is now usable.
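
Since the log above ends with "Cannot load cpu kernel" even though gcc reported success, it may help to check whether the compiled .so can be loaded at all before passing it to quantize(). The following is a minimal sketch, assuming the kernel is an ordinary shared library that ctypes can open; the helper name can_load_kernel and the example path are hypothetical and not part of the model code. One possible cause of a failed load is a 32-bit gcc (the log shows mingw32) producing a library that a 64-bit Python cannot load, though this is not confirmed here.

```python
import ctypes
import os


def can_load_kernel(kernel_path):
    """Return True if the shared library at kernel_path can be loaded via ctypes.

    Hypothetical helper for illustration only; it is not part of the model code.
    """
    if not os.path.isfile(kernel_path):
        return False
    try:
        # Use an absolute path so the OS loader does not search system paths.
        ctypes.cdll.LoadLibrary(os.path.abspath(kernel_path))
    except OSError:
        # Raised for a missing dependency, a wrong architecture, or a corrupt file.
        return False
    return True


if __name__ == "__main__":
    # Assumed location: the .so built in step (1), in the current directory.
    kernel_path = "quantization_kernels.so"
    print(kernel_path, "loadable:", can_load_kernel(kernel_path))
```

If the check returns False for a file that exists, the compiler toolchain is the likely place to look, rather than the quantize() call itself.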