<div><div class="gmail_quote">On 4 November 2010 14:07, E Chalaron <span dir="ltr"><<a href="mailto:e.chalaron@xtra.co.nz">e.chalaron@xtra.co.nz</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;">
Hi Tim<br>
<br>
Well I don't even know where to start ...<br>
Let's say ideally 1 would have a disk with a complete system and the<br>
soft I am using the other 4 just CPU power.<br>
I still have a question before getting any further : how efficient is<br>
clustering for this type of jobs, e.g. munching numbers non stop.<br>
Is there a relation between the amount of boxes working together versus<br>
1 big CPU like a i7?<br></blockquote><div><br></div>Hrm... parallel programming does not scale linearly with the number of processes: there is added complexity in splitting jobs up and reassembling the results. It's hard to generalise in a single email; Wikipedia is a good resource[1]. However, parallel processing can perform much better than a single CPU if three broad features exist in the data:</div>
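<div class="gmail_quote"><br></div><div class="gmail_quote">To give a feel for the "amount of boxes versus one big CPU" trade-off, Amdahl's law is the usual back-of-the-envelope tool: if only a fraction p of the job can run in parallel, n boxes give a speedup of 1 / ((1 - p) + p/n). A quick sketch in Python (the 0.9 parallel fraction is just an illustrative assumption):</div>

```python
# Amdahl's law: speedup from n workers when only a fraction p
# of the job can run in parallel (the rest stays serial).
def amdahl_speedup(p, n):
    return 1.0 / ((1.0 - p) + p / n)

# assume 90% of the job parallelises; vary the number of boxes
for n in (1, 2, 4, 8):
    print(n, "boxes:", round(amdahl_speedup(0.9, n), 2), "x")
```

<div class="gmail_quote">With 90% of the work parallelisable, 8 boxes get you under a 5x speedup, so a single fast i7 can beat a small cluster unless the job splits very cleanly.</div>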
<div class="gmail_quote"><br></div><div class="gmail_quote"> - massive amounts of data are required to be processed</div><div class="gmail_quote"> - the job can be split into many smaller jobs</div><div class="gmail_quote">
- shared state is not required between the smaller jobs</div><div class="gmail_quote"><div><br></div><div>If you have lots of numbers to crunch, and some time to learn the API, I recommend taking a look at using the GPU to process some of this data. GPU programming is similar to map/reduce: as long as you don't need shared state, you can run massive computations in parallel. With the right type of problem you can see around a 40x performance increase.</div>
<div><br></div><div>One option would be to load a system like Puppy Linux into RAM from an external hard drive / CD and then unmount the disk. You then need to run some event listener on each box that is ready to receive tasks. If you're into Python, a very simple map/reduce framework is mincemeat.py[2]. A more featureful package is mrjob[3], which is designed for Hadoop. Otherwise you could roll your own processing system with execnet[4].</div>
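<div><br></div><div>The "event listener ready to receive tasks" part can be surprisingly small. This is not mincemeat.py's or execnet's actual API, just a toy worker using only the standard library, where each connection delivers one line of numbers to crunch (the port and the sum-of-squares job are made-up placeholders):</div>

```python
import socketserver

def crunch(numbers):
    """The worker's actual job; sum of squares as a stand-in."""
    return sum(n * n for n in numbers)

class TaskHandler(socketserver.StreamRequestHandler):
    def handle(self):
        # one task per connection: a whitespace-separated line of numbers
        line = self.rfile.readline().decode().strip()
        result = crunch(float(x) for x in line.split())
        self.wfile.write(f"{result}\n".encode())

def serve(port=9999):
    # each diskless box runs this and sits waiting for tasks from the master
    with socketserver.TCPServer(("0.0.0.0", port), TaskHandler) as server:
        server.serve_forever()
```

<div>A real framework adds the important parts on top of this: job tracking, retries when a box dies, and collecting the results back at the master.</div>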
<div><br></div><div>I don't know if I can provide anything more specific without knowing more details of your problem. Parallel programming is unfamiliar territory for almost everyone. Martin gave a very good talk at RailsConf this year about this issue[5]. Although our hardware has improved by something like a factor of 10^25 in 50 years, software descended from Fortran has been tailored for single-core processing.</div>
<div><br></div><div>&lt;bigger tangent&gt; In general, I think you will see a resurgence of interest in functional languages. My picks are JavaScript, which is derived from Scheme, and Erlang, due to its communications heritage.</div>
<div><br></div><div>Tim</div><div><br></div><div>[1] <a href="http://en.wikipedia.org/wiki/Parallel_computing">http://en.wikipedia.org/wiki/Parallel_computing</a></div>
<div>[2] <a href="http://remembersaurus.com/mincemeatpy/">http://remembersaurus.com/mincemeatpy/</a></div><div>[3] <a href="http://packages.python.org/mrjob/writing-and-running.html">http://packages.python.org/mrjob/writing-and-running.html</a></div>
<div>[4] <a href="http://codespeak.net/execnet/">http://codespeak.net/execnet/</a></div><div>[5] <a href="http://www.youtube.com/watch?v=mslMLp5bQD0">http://www.youtube.com/watch?v=mslMLp5bQD0</a></div>
</div>