After reading the essays about recent research, the main parallelization schemes for optimization on Cell Processor are: optimize the DMA transfer (single buffering, double buffering and multi-way buffering), optimize the implementation models (how to divide the whole data block into different piece and how to connect each program structures) and optimize the get/put methods (using some prefetch mechanism such as pseudo-result, asynchronous unsafe cache and software cache).
通过几天阅读的论文情况,基本上发现了几种加速并行处理的方式:优化DMA传输方式,缓冲,双缓冲和多重缓冲;优化实现的模型,拆分程序流到不同的SPE,不同的计算模型,会得到不同的实现效果;优化数据读取的方式,比如利用预读取,虚拟结果,和异步多流水线缓存和程序缓存的方式。
继续阅读 →