Why is my GPU slower than my CPU in matrix operations?

CPU: i7-9750 @2.6GHz (with 16 GB DDR4 RAM); GPU: Nvidia Geforce GTX 1600 TI (6 GB); OS: Windows 10, 64-bit

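I wanted to see how much faster the GPU is than the CPU at basic matrix operations, and I basically followed this article: https://towardsdatascience.com/heres-how-to-use-cupy-to-make-numpy-700x-faster-4b920dda1f56. Here is my very simple code: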

import numpy as np

import cupy as cp

import time

### Numpy and CPU

s = time.time()

A = np.random.random([10000,10000]); B = np.random.random([10000,10000])

CPU = np.matmul(A,B); CPU *= 5

e = time.time()

print(f'CPU time: {e - s: .2f}')

### CuPy and GPU

s = time.time()

C= cp.random.random([10000,10000]); D = cp.random.random([10000,10000])

GPU = cp.matmul(C,D); GPU *= 5

cp.cuda.Stream.null.synchronize()

# to let the code finish executing on the GPU before calculating the time

e = time.time()

print(f'GPU time: {e - s: .2f}')

Ironically, it reports CPU time: 11.74 and GPU time: 12.56.

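This really confuses me. How can the GPU be slower than the CPU on large matrix operations? Note that I have not explicitly enabled any parallel computing (I am a beginner, and I am not sure whether the system turns it on for me). I did check similar questions, such as "Why is my CPU doing matrix operations faster than GPU instead?", but here I am using cupy rather than mxnet (cupy is newer and designed for GPU computing).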

Can anyone help? I would really appreciate it!
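One way to narrow this down is sketched below. It is not a definitive benchmark. The timings above include random-number generation, and both `np.random.random` and `cp.random.random` return float64 arrays, a precision at which consumer GeForce cards generally have much lower throughput than at float32. The sketch builds the inputs before the timer starts, does one untimed warm-up call, and compares float32 with float64. The helper name `time_matmul`, the size `n = 2000`, and the fallback to NumPy-only when CuPy or a GPU is unavailable are all assumptions made for illustration.

import time
import numpy as np

try:
    import cupy as cp                      # optional; needs a CUDA-capable GPU
    cp.cuda.runtime.getDeviceCount()       # raises if no usable device/driver
except Exception:                          # assumption: treat any failure as "no GPU"
    cp = None


def time_matmul(xp, n, dtype, repeats=3):
    """Best wall-clock time (seconds) of an n x n matmul, excluding setup."""
    # Build inputs before the timer starts so data generation is not measured.
    a = xp.random.random((n, n)).astype(dtype)
    b = xp.random.random((n, n)).astype(dtype)

    def sync():
        # GPU kernels launch asynchronously; wait for them before reading the clock.
        if cp is not None and xp is cp:
            cp.cuda.Stream.null.synchronize()

    xp.matmul(a, b)                        # warm-up run, not timed
    sync()

    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        c = xp.matmul(a, b)
        c *= 5
        sync()
        best = min(best, time.perf_counter() - start)
    return best


if __name__ == "__main__":
    n = 2000  # hypothetical size, smaller than the 10000 above to keep the run short
    for dtype in (np.float32, np.float64):
        name = np.dtype(dtype).name
        print(f"CPU {name}: {time_matmul(np, n, dtype):.3f} s")
        if cp is not None:
            print(f"GPU {name}: {time_matmul(cp, n, dtype):.3f} s")

If the GPU comes out well ahead at float32 but not at float64, that would point to double-precision throughput rather than cupy itself as the reason for the numbers above. Raising `n` toward 10000 makes the comparison closer to the original test.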