Category archive: IT World

CUDA Study Notes (1): The CUDA Programming Model

Reposted from Sina Blog: http://blog.sina.com.cn/s/blog_48b9e1f90100fm56.html

1. The CUDA Programming Model

2009-10-21

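CUDA code is split into two parts. One part runs on the host (CPU) and is ordinary C code; the other part runs on the device (GPU), is parallel code called a kernel, and is compiled by nvcc.

All the threads generated by a kernel are collectively called a grid. When the parallel part finishes, the program returns to the serial part, that is, to running on the host.

In CUDA, the host and the device have separate memory spaces. So to execute a kernel on the device, the programmer has to copy the data from host memory into allocated device memory. After the device has finished, the results have to be copied from the device back to the host, and the device memory has to be freed. The CUDA runtime system provides an API for the programmer to do all of this.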
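
The host/device workflow described above can be sketched roughly as follows. This is a minimal illustration, not code from the original post: the kernel name `vecAdd`, the host helper `addOnDevice`, and the choice of 256 threads per block are all assumptions made for the example. The runtime calls used (`cudaMalloc`, `cudaMemcpy`, `cudaGetLastError`, `cudaFree`) are the standard CUDA runtime API functions for these steps.

```cuda
#include <cuda_runtime.h>

// Kernel (hypothetical example): runs on the device.
// Each thread of the grid adds one pair of elements.
__global__ void vecAdd(const float* a, const float* b, float* c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // global index of this thread in the grid
    if (i < n) {
        c[i] = a[i] + b[i];
    }
}

// Host helper (hypothetical name): allocate device memory, copy inputs in,
// launch the kernel, copy the result back, and free device memory.
// Returns the first CUDA error encountered, or cudaSuccess.
cudaError_t addOnDevice(const float* hA, const float* hB, float* hC, int n) {
    float* dA = nullptr;
    float* dB = nullptr;
    float* dC = nullptr;
    size_t bytes = static_cast<size_t>(n) * sizeof(float);

    // 1. Allocate device memory.
    cudaError_t err = cudaMalloc((void**)&dA, bytes);
    if (err == cudaSuccess) err = cudaMalloc((void**)&dB, bytes);
    if (err == cudaSuccess) err = cudaMalloc((void**)&dC, bytes);

    // 2. Copy the inputs from host memory to device memory.
    if (err == cudaSuccess) err = cudaMemcpy(dA, hA, bytes, cudaMemcpyHostToDevice);
    if (err == cudaSuccess) err = cudaMemcpy(dB, hB, bytes, cudaMemcpyHostToDevice);

    // 3. Launch the kernel; all threads it creates together form the grid.
    if (err == cudaSuccess) {
        int threads = 256;                         // assumed block size
        int blocks = (n + threads - 1) / threads;  // enough blocks to cover n elements
        vecAdd<<<blocks, threads>>>(dA, dB, dC, n);
        err = cudaGetLastError();                  // reports launch errors
    }

    // 4. Copy the result back to the host (waits for the kernel to finish).
    if (err == cudaSuccess) err = cudaMemcpy(hC, dC, bytes, cudaMemcpyDeviceToHost);

    // 5. Free device memory (cudaFree on a null pointer is a no-op).
    cudaFree(dA);
    cudaFree(dB);
    cudaFree(dC);
    return err;
}
```

A caller on the host would pass ordinary arrays, for example `addOnDevice(a, b, c, n)`, and check that the return value is `cudaSuccess` before reading `c`. A file like this would typically be saved with a `.cu` extension and compiled with nvcc.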


DATA-DRIVEN MACHINE

Basic knowledge
LISP: LISt Processor. It is a functional language built around list processing.
Data-driven machine: unlike conventional machines, it executes according to the flow of data; an operation runs when its operands become available.
Pseudo-result: not an actual-result, but it can be used by the next function as a semi-result.
Processing element: the basic unit of processing. It is often abbreviated as PE.
Lazy evaluation: the evaluation of a computation is delayed until a following computation requires the actual argument values.
A new control mechanism: use a data-driven architecture (one kind of non-von Neumann computer) to exhibit the full potential for parallelism in both hardware and software.
Parallel: divide a program into pieces and execute them on several processing elements at the same time. Doing so speeds up processing and makes better use of time and space.
Semi-result: a cons operation includes a pseudo-result or a semi-result.
Actual-result: the result holding the real data after execution.
Packet-oriented architecture: data is transferred in packets, in a pipelined manner, between the sections within a PE as well as between PEs.
Pseudo-result lifetime: the time interval from when a new pseudo-result is created to when its value becomes an actual-result.

Details of it

The first section describes the organization of the data-driven machine.
Function evaluation scheme –
to achieve eager evaluation with pseudo-results, which allows some degree of overlapping of computation.
Machine organization –
a multiprocessing system with a number of identical PEs, in which each PE is connected via a packet communication network.

IBM Cell SDK Installation Summary

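First, a sigh about setting up the programming environment for a new technology: it is the biggest hurdle to getting started, and getting it configured properly really is not easy. It took me 2-3 days, on and off, before this parallel programming environment was more or less working.
I won't ramble on about what exactly this thing is; just google IBM CELL SDK. Below is a summary of what I learned from the setup:
1. A new computer is not necessarily a good one. On the desktop I tried CentOS 5, Fedora 9, and Fedora 12 in turn, and none of them worked. First the system itself had problems: one moment it could not detect the sound card, the next it could not detect the network card. Once those were solved and I started installing the Cell SDK, there were all kinds of dependency problems and missing packages. To get one package I had to download another, which in turn needed package N+1. Exhausting.
2. The tutorials online were presumably all written under ideal conditions? No bugs, no obstacles, everything done in one go; I am a little impressed... In any case, I never once fully succeeded by following a tutorial. I only more or less figured things out after going through countless guide PDFs and forums.
3. It turns out the PS3 can be used for programming too. Following the tutorial (the official PDF document, which is quite good), I installed Yellow Dog Linux on it, and it runs fairly smoothly. Perhaps because it is an old PS3, it is not very fast, but it is acceptable, especially on the command line.
4. Keeping versions consistent is a serious problem. There are all kinds of tutorials, versions, and test code online, and it gets dizzying. What you installed following version A may not run code B, and you end up thinking something is broken. On top of that, the machines differ, so what gets installed differs too: the laptop is 32-bit x86, the desktop is 64-bit, and the PS3 counts as PPC. A strange mix of everything.
5. Linux is still something you have to learn. Over these few days of configuring the environment I learned to use yum, a very powerful command, along with all sorts of other common commands. Sigh.
That's all for now; I'll write more when I have time.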

Programming with PS3

Now I have a desktop, a laptop, and a Sony PlayStation 3, with 3 mice and 3 keyboards.

I am programming on Fedora 9 and Yellow Dog Linux with the Cell BE SDK and the IBM Cell simulator.

The main theme is to develop a waiting mechanism for a data-driven machine and evaluate the mechanism.

I want to write a formal essay and publish it before I go back home.

I need to work hard now.

I am back

One year ago, I registered a host for a personal blog. But it was not very fast; in other words, it was sometimes very slow. So I had to cancel it. Later on, I opened a new blog on sina.com.cn and moved all my MSN and Qzone articles into it. It seems to run very well.

So today I’m back on this blog again, trying to figure this out. This is just another world of nobita gu, who is at Tsukuba University right now.

Hope things go well this time.

Good night.