1. First, analyze the algorithm's parts and steps. Ex: the point-of-interest algorithm consists of five parts, which we divide into 9 steps.
2. Program the SPE. Each SPE holds the code for the whole job and runs the step selected by the worknum sent to it. Ex: a single CPU can run the same code, as in the x86 model.
3. Define the partition and size of the work. Ex: here we split the work into 8 parts, so 8*9 steps need to be processed.
4. The PPU initializes the data from the image, sets aside an area for storage, and initializes the work dependencies. Ex: we read the image, save the data into the img array, and set up the status array.
5. The PPU loops, checking whether there is work to be done; it sends data to an SPU and updates the work dependency list. Ex: use a while loop; if the status has not reached the end, work remains, so assign work.
6. The PPU gets the result from the SPU and updates the result cache list, which will be used later. Ex: the send result function.
7. Collect all the results and save them into files or produce a final result. Ex: the PPU uses the corner array to generate the corner picture.
All of these steps are simulated on a single CPU.
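The PPU loop in steps 4 to 6 can be sketched roughly as follows. This is only a minimal single-CPU sketch, not the actual program: `spe_run`, `ppu_main` and the per-part dependency rule (step s of a part needs step s-1 of the same part) are assumptions made for illustration, and the real steps of the algorithm are replaced by a placeholder.

```python
NUM_PARTS = 8   # partitions of the work, as in the text
NUM_STEPS = 9   # steps of the algorithm, as in the text


def spe_run(worknum, part, data):
    """Stand-in for the SPE program: run the step selected by worknum.

    The real steps are not given in these notes, so each step only
    records that it ran (hypothetical placeholder).
    """
    return data + [(part, worknum)]


def ppu_main():
    # status[p] is the next step to run for part p; NUM_STEPS means done.
    status = [0] * NUM_PARTS
    # Result cache, one entry per part, reused by later steps.
    results = [[] for _ in range(NUM_PARTS)]
    dispatched = 0

    # Keep looping while any part has not reached the end.
    while any(s < NUM_STEPS for s in status):
        for part in range(NUM_PARTS):
            if status[part] < NUM_STEPS:
                # Assumed dependency: a step only needs the previous
                # step of the same part, which is already finished here.
                results[part] = spe_run(status[part], part, results[part])
                status[part] += 1  # update the status array
                dispatched += 1
    return status, results, dispatched
```

Calling `ppu_main()` runs all 8*9 work items one after another; on the real machine the call to `spe_run` would be replaced by sending the work to an SPE.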
When moving to the real Cell parallel machine, several things need to be considered.
1. Almost the same.
2. Set up the SPE program and choose the right worknum to run.
3. The partition and size can be chosen depending on the number of SPEs and the total memory. We can try several combinations.
4. Almost the same, but the dependency lists vary as the algorithm differs.
5. Send data to the LS with DMA transfers, and choose a free SPE, taking locality into account. The LS may be limited, so some of the data in it may have to be replaced.
6. Update the result list. We don’t need this many lists; just one is enough, and it can store different data.
7. Organize the data and give out the final result.
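Step 5 above (choosing a free SPE with locality in mind, and replacing data when the LS is full) could look roughly like this. It is a sketch under assumptions: the function names `choose_spe` and `load_block`, the modelling of a local store as a short list of block ids, and the oldest-first replacement policy are all hypothetical, and the DMA transfer itself is not performed here.

```python
def choose_spe(busy, resident, block):
    """Pick a free SPE, preferring one whose LS already holds block.

    busy:     list of bool, one per SPE.
    resident: list of lists, the block ids held in each SPE's LS.
    Returns an SPE index, or None if every SPE is busy.
    """
    free = [i for i, b in enumerate(busy) if not b]
    if not free:
        return None
    for i in free:
        if block in resident[i]:
            return i        # locality hit: no transfer needed
    return free[0]          # otherwise take the first free SPE


def load_block(ls_blocks, block, capacity):
    """Make sure block is in an LS modelled as a list, oldest first.

    Returns True if a (simulated) DMA transfer was needed.
    """
    if block in ls_blocks:
        return False
    if len(ls_blocks) >= capacity:
        ls_blocks.pop(0)    # assumed policy: replace the oldest block
    ls_blocks.append(block)
    return True
```

For example, with `busy = [True, False, False]` and `resident = [[1], [2], [3]]`, a request for block 3 goes to SPE 2, while a request for block 1 falls back to SPE 1 because SPE 0 is busy.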
Smith-Waterman algorithm:
1. Divide the work into several blocks.
2. Choose one block to start with; send its east and south edge data back to be stored, and store the other data into main memory (MM).
3. Dynamically choose SPEs until all blocks are finished.
4. The PPU organizes the result.
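The block scheme above can be sketched on a single CPU as follows. This is an illustration under assumptions, not the planned implementation: the scoring values (match +2, mismatch -1, linear gap -1), the names `sw_block` and `sw_blocked`, and the anti-diagonal order of blocks are chosen here for the example. Each block only needs the south edge of the block above it and the east edge of the block to its left, which is why only those edges are sent back and kept.

```python
MATCH, MISMATCH, GAP = 2, -1, -1  # assumed scoring, not from the notes


def sw_block(a, b, top, left):
    """Compute one block of the score matrix (the work of one SPE).

    a, b: the row and column characters covered by this block.
    top:  len(b)+1 values from the row above; top[0] is the corner.
    left: len(a) values from the column to the left.
    Returns (south, east, best): the block's last row, its last
    column, and the highest score found inside the block.
    """
    prev, east, best = list(top), [], 0
    for i, ca in enumerate(a):
        cur = [left[i]]
        for j, cb in enumerate(b, start=1):
            diag = prev[j - 1] + (MATCH if ca == cb else MISMATCH)
            cur.append(max(0, diag, prev[j] + GAP, cur[j - 1] + GAP))
        best = max(best, max(cur[1:]))
        east.append(cur[-1])
        prev = cur
    return prev[1:], east, best


def sw_blocked(s, t, bs):
    """Best local alignment score, computed block by block."""
    nbr = len(range(0, len(s), bs))  # number of block rows
    nbc = len(range(0, len(t), bs))  # number of block columns
    south, east = {}, {}             # edges kept by the PPU
    best = 0
    # Blocks on the same anti-diagonal do not depend on each other,
    # so on the real machine they could go to different SPEs.
    for d in range(nbr + nbc - 1):
        for bi in range(max(0, d - nbc + 1), min(nbr, d + 1)):
            bj = d - bi
            a = s[bi * bs:(bi + 1) * bs]
            b = t[bj * bs:(bj + 1) * bs]
            if bi == 0:
                top = [0] * (len(b) + 1)
            else:
                corner = south[(bi - 1, bj - 1)][-1] if bj > 0 else 0
                top = [corner] + south[(bi - 1, bj)]
            left = east[(bi, bj - 1)] if bj > 0 else [0] * len(a)
            south[(bi, bj)], east[(bi, bj)], blk = sw_block(a, b, top, left)
            best = max(best, blk)
    return best
```

With a block size at least as large as both sequences, `sw_blocked` reduces to a single call of `sw_block`, so the blocked and unblocked results can be compared directly when testing the dependency handling.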
=========================================
At the meeting, we discussed what I’ve done and compared these two algorithms. The Smith-Waterman algorithm may be better for evaluating this kind of waiting mechanism, because it has more dependencies.
So next, I will implement it and simulate it on a single CPU, and later port it to the real Cell.
After that, I may continue with the POI algorithm.