[wellylug] diskless headless
Tim McNamara
paperless at timmcnamara.co.nz
Thu Nov 4 14:31:38 NZDT 2010
On 4 November 2010 14:07, E Chalaron <e.chalaron at xtra.co.nz> wrote:
> Hi Tim
>
> Well I don't even know where to start ...
> Let's say ideally one box would have a disk with a complete system and
> the software I am using; the other four would provide just CPU power.
> I still have a question before getting any further: how efficient is
> clustering for this type of job, e.g. munching numbers non-stop?
> Is there a relation between the number of boxes working together versus
> one big CPU like an i7?
>
Hrm... parallel programming does not scale linearly with the number of
processes. There is some added complexity in splitting jobs up and
reassembling them. It's hard to generalise in a single email; Wikipedia is a
good resource[1]. However, parallel processing can perform much better than a
single CPU if three broad conditions hold for the workload:
- massive amounts of data need to be processed
- the job can be split into many smaller jobs
- shared state is not required between the smaller jobs
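If you want a feel for this on a single box first, Python's standard-library
multiprocessing module shows the shape of it. A minimal sketch, where crunch()
is just a placeholder for whatever numbers you actually munch:

    from multiprocessing import Pool

    def crunch(chunk):
        # placeholder work: sum of squares over one independent chunk
        return sum(n * n for n in chunk)

    if __name__ == "__main__":
        data = range(1000000)
        # split the data into chunks that need no shared state
        chunks = [data[i:i + 100000] for i in range(0, len(data), 100000)]
        pool = Pool()                        # one worker per CPU core by default
        partials = pool.map(crunch, chunks)  # the "map" step, run in parallel
        total = sum(partials)                # the "reduce" step, back in one process
        print(total)

The splitting and reassembly here is exactly the added complexity mentioned
above, and it's also why the speed-up isn't linear.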
If you have lots of numbers to crunch, and some time to learn the API, I
recommend taking a look at using the GPU for some of this processing. GPU
programming is similar to map/reduce. As long as you don't need shared
state, you can run massive computations in parallel. You can get around
a 40x performance increase with the right type of problem.
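If you do go down the GPU path from Python, PyCUDA is one way in. A rough
sketch only, assuming an NVIDIA card and PyCUDA installed; the elementwise
square is the "map" step and the sum is the "reduce":

    import numpy
    import pycuda.autoinit            # sets up a CUDA context on import
    import pycuda.gpuarray as gpuarray

    # some numbers to crunch, copied onto the GPU
    a = numpy.random.randn(4 * 1024 * 1024).astype(numpy.float32)
    a_gpu = gpuarray.to_gpu(a)

    squared = a_gpu * a_gpu               # "map": square every element in parallel
    total = gpuarray.sum(squared).get()   # "reduce": sum and pull the result back
    print(total)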
One option would be to load a system like Puppy Linux into RAM from an
external hard drive / CD and then unmount the disk. You then need to run
some event listener that is ready to receive tasks. If you're into Python, a
very simple map/reduce framework is mincemeat.py[2]. A more featureful
package is mrjob[3], which is designed for Hadoop. Otherwise you could roll
your own processing system with execnet[4].
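For what it's worth, the usual word-count demo for mincemeat.py looks roughly
like this (written from memory, so treat the exact API as an assumption and
check the project page[2]); you would swap mapfn/reducefn for your own
crunching:

    import mincemeat

    # toy input: in your case, the chunks of numbers to munch
    data = ["humpty dumpty sat on a wall",
            "humpty dumpty had a great fall"]

    def mapfn(key, value):
        for word in value.split():
            yield word, 1

    def reducefn(key, values):
        return sum(values)

    server = mincemeat.Server()
    server.datasource = dict(enumerate(data))
    server.mapfn = mapfn
    server.reducefn = reducefn

    # each diskless box joins in with:  python mincemeat.py -p changeme <server-ip>
    results = server.run_server(password="changeme")
    print(results)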
I don't know if I can provide anything more specific without knowing more
details of your problem. Parallel programming is unfamiliar to everyone.
Martin gave a very good talk at RailsConf this year about this issue[5].
Although our hardware has progressed by a factor of something like 10^25 in
50 years, all software derived from Fortran has been tailored for single-core
processing.
<bigger tangent> In general, I think you will see a resurgence in interest
in functional languages. My picks are JavaScript, which is derived from
Scheme, and Erlang due to its communications heritage.
Tim
[1] http://en.wikipedia.org/wiki/Parallel_computing
[2] http://remembersaurus.com/mincemeatpy/
[3] http://packages.python.org/mrjob/writing-and-running.html
[4] http://codespeak.net/execnet/
[5] http://www.youtube.com/watch?v=mslMLp5bQD0