FlashMob Computing is a fascinating concept and the first attempt on April 3 was at least partially successful:
FlashMob I was very successful and a lot of fun. Over 700 computers came into the gym and we were able to hook up 669 to the network. Our best Linpack result was a peak rate of 180 Gflops using 256 computers; however, a node failed 75% of the way through the computation. Our best completed result was 77 Gflops using 150 computers. The biggest challenge was identifying flaky computers and determining the best configuration for running the benchmark. Each of the 669 computers ran Linpack at some point in the day.
Today, supercomputing is controlled largely by governmental organizations, academic research institutions, animation studios, and, more recently, biotech companies. This means that the problems that get solved by supercomputers are narrow in scope and tightly controlled. We want to change that.
We think that a group of folks should be able to get together and study whatever they want, and they should be able to use a supercomputer to help them. So if a high school science class wanted to study the ozone hole using a supercomputer model, they could create a FlashMob supercomputer in a few hours and start running their model today. If a group of neighbors were worried about how a local gas station’s underground gas tank might leak into the drinking water if the tank ever cracked, they could use Flash Mob Computing to model the scenario. In short, we hope Flash Mob Computing will democratize supercomputing. That is to say, it will make supercomputing accessible to everyone. To us, that’s a very exciting idea.
…. Just what are Flash Mob Computing and FlashMob I?
A Flash Mob supercomputer is hundreds or even thousands of computers connected together via a LAN working together as a single supercomputer. A Flash Mob computer, unlike an ordinary cluster, is temporary and organized on-the-fly for the purpose of working on a single problem. Flash Mob I is the first of its kind. By bringing hundreds of people like you together in one room, we will have enough computing power to become one of the fastest supercomputers on the planet.
…. FlashMob I is the brainchild of a group of graduate students at USF studying supercomputers. Our hope at the beginning of the semester was to build a supercomputer that would make the Top 500 list of supercomputers. After some back-of-the-envelope calculations, we concluded that we were about 100 computers short of having a good shot. Someone raised their hand and said: “We could post a message on Craigslist and get a hundred people to just show up.” Thus the idea of FlashMob Computing was born.
How is FlashMob Computing different?
Today, supercomputing can be divided into two categories: Big Iron and Grid Computing.
Big Iron
“Big Iron” supercomputing dates back to World War II. Historically, supercomputers like the old Cray supercomputers, or the current reigning champion, Japan’s Earth Simulator, are hideously expensive custom machines that use custom parts and are constructed by PhDs to do very specific things. Recently Apple and Virginia Tech made headlines by networking 1100 Apple G5s together and creating the third-fastest supercomputer for the low-low price of $5 million (which is an impressively big step down from the estimated $1 B-B-Billion the Earth Simulator cost). But still, unless you have a couple of million lying around, supercomputers of this kind are pretty much out of reach.
Grid Computing
Then there’s Grid Computing. Grid Computing is based on the idea that most computers are idle most of the time. So, instead of a screen saver with flying toasters, let’s put the computer to work. SETI@home is the best-known example of a grid computer. Users install a free program that downloads and analyzes radio telescope data, looking for patterns that might be signs of extraterrestrial life (i.e., the Search for Extraterrestrial Intelligence, AKA SETI). SETI@home is pretty awesome, but you need a **lot** of computers (currently SETI@home has roughly half a million active computers at any one time), and grid computing is only good for certain kinds of problems. (More on that later.)
FlashMob Computing
FlashMob I is something new in the world of supercomputers. FlashMob I is an ad-hoc supercomputer created on-the-fly using ordinary PCs interconnected via a well-organized LAN. A FlashMob computer has no permanent infrastructure; it’s designed to be run in a gymnasium or a warehouse. There are no cooling towers, no expensive T-3s connected to the Internet, no custom hardware. The primary cost of a FlashMob computer is people’s time.
Comparing FlashMob Computing to Grid and Big Iron
Before you can compare the three, you have to understand what makes a computer a “supercomputer” and a little about the kinds of problems supercomputers solve.
A supercomputer is a computer that has a lot of CPUs working in parallel on a single problem. The smallest supercomputers typically have more than 64 CPUs; the Earth Simulator has 5120 CPUs. So your dual-CPU gaming machine does not count as a supercomputer. Sorry. Moreover, if you brought 64 friends over for a LAN party to play Quake, you still don’t have a supercomputer because the 64 machines are not all working on solving one single problem. Sorry again. It’s both the parallelism and the singularity of purpose that define a supercomputer.
So what can I do with a supercomputer?
There are essentially two types of problems supercomputers can solve: problems where the CPUs have to talk to each other only occasionally, and problems where the CPUs have to talk to each other all the time. Here are two examples:
Shrek. Computer-animated movies like Shrek are created frame by frame by a computer. The computer has to calculate the color and brightness of each pixel on the screen. There are a lot of pixels in a movie frame and twenty-four frames per second of movie. So a ninety-minute movie like Shrek or Toy Story or Finding Nemo takes a lot of computing power. The good news is that the pixel on the upper left of the screen doesn’t have much to do with the pixel on the lower right. So if you wanted to speed up the process of drawing one frame, you could assign one computer to draw the left side of the screen and another to draw the right side; this would cut the time it takes to draw one frame in half. If you cut the screen into four areas you could put four computers on the job, then eight, and so on. Because the computers drawing the various parts of the screen don’t have to talk to each other, the computers can be far apart, and if one of the computers fails it’s no big deal because the other computers aren’t waiting for each other’s results. The failed computer’s job can just be resubmitted at the end.
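The Shrek case can be sketched in a few lines of Python. This is only an illustration, not how any studio or FlashMob I actually renders: the frame size, the shade function, and the "flaky" tile are all hypothetical stand-ins, and threads stand in for separate computers. It shows the two properties described above: tiles are computed without talking to each other, and a failed tile is simply resubmitted at the end.

```python
from concurrent.futures import ThreadPoolExecutor

WIDTH, HEIGHT = 8, 4  # a tiny, hypothetical "frame"

def shade(x, y):
    """Stand-in for real rendering: a pixel depends only on its own coordinates."""
    return (x * 31 + y * 17) % 256

def render_tile(tile, flaky=frozenset()):
    """Render columns [x0, x1) of the frame; raise if this tile's worker is 'flaky'."""
    x0, x1 = tile
    if tile in flaky:
        raise RuntimeError(f"worker for tile {tile} failed")
    return {(x, y): shade(x, y) for x in range(x0, x1) for y in range(HEIGHT)}

def render_frame(n_tiles, flaky=frozenset()):
    """Split the frame into n_tiles vertical strips (n_tiles must divide WIDTH)."""
    step = WIDTH // n_tiles
    tiles = [(i * step, (i + 1) * step) for i in range(n_tiles)]
    frame, failed = {}, []
    with ThreadPoolExecutor(max_workers=n_tiles) as pool:
        # Every tile is submitted up front; no worker waits on another.
        futures = {tile: pool.submit(render_tile, tile, flaky) for tile in tiles}
        for tile, future in futures.items():
            try:
                frame.update(future.result())
            except RuntimeError:
                failed.append(tile)  # note the failure and keep going
    # Resubmit the failed tiles at the end, this time on a healthy worker.
    for tile in failed:
        frame.update(render_tile(tile))
    return frame, failed

if __name__ == "__main__":
    frame, failed = render_frame(4, flaky=frozenset({(2, 4)}))
    print(len(frame), "pixels rendered; resubmitted tiles:", failed)
```

The finished frame is the same whether it is drawn as one tile, four tiles, or four tiles with one failure, which is exactly why this kind of problem tolerates distant and unreliable machines.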
Billiard Balls. Now imagine you wanted a computer to calculate where all the balls would end up after the initial break in a game of eight-ball. The result depends on how all the balls hit each other and the walls of the pool table. Unlike Shrek, the movement of one ball is very much dependent on the movements of all the other balls. Imagine you assigned one computer to calculate the low balls and one computer to calculate the high balls. As soon as a low ball hit a high ball, the two computers would have to exchange data to determine how to change the balls’ paths based on their impact. Since lots of high balls hit lots of low balls on a break, the two computers have to talk constantly. Moreover, if one computer failed, the other computer couldn’t finish its job because it would be waiting for the failed computer. Consequently, the computers need to be close together so they can talk very quickly, and they have to be very reliable because they’re counting on each other. The billiard ball example is simple; there are only fifteen balls. But now imagine there were 1000 balls, or a million balls, or the balls weren’t balls at all but instead molecules of water flowing in a river. In this case the computers would spend more time talking than computing. So the network is very important, and so is the reliability of the nodes. If one computer is slow or one fails, everyone waits.
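The "everyone waits" effect can be put into a rough model. The sketch below is an idealized assumption, not a measurement from FlashMob I: it ignores network time entirely and only compares a lockstep computation, where every node must finish each step before anyone can start the next, with independent Shrek-style tasks pulled from a shared queue. The function names and the per-step timings are hypothetical.

```python
import math

def lockstep_time(node_step_times, n_steps):
    """Billiard-ball style: all nodes exchange data after every step,
    so each step takes as long as the slowest node.
    Use math.inf as a node's step time to model a failed node."""
    return n_steps * max(node_step_times)

def independent_time(node_step_times, n_steps):
    """Shrek style: the same amount of work (n_steps tasks per node) is
    handed out from a shared queue, so node throughputs simply add up.
    A failed node (math.inf) contributes zero throughput."""
    total_tasks = n_steps * len(node_step_times)
    throughput = sum(1.0 / t for t in node_step_times)  # tasks per second
    return total_tasks / throughput

if __name__ == "__main__":
    healthy = [1, 1, 1, 1]          # seconds per step on each node
    one_slow = [1, 1, 1, 4]         # one flaky, slow machine
    one_dead = [1, 1, 1, math.inf]  # one machine that never answers
    for label, nodes in [("healthy", healthy), ("one slow", one_slow), ("one dead", one_dead)]:
        print(label, lockstep_time(nodes, 100), round(independent_time(nodes, 100), 1))
```

Under these assumptions a single node that is four times slower makes the lockstep run four times longer, and a dead node means it never finishes, while the independent version only loses that one node's share of the throughput.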
Big Iron is good at both. But a lot of the cost of Big Iron goes into the networking, so putting Big Iron to work on Shrek wastes a lot of expensive networking equipment, and these machines are generally not used this way. Big Iron really shines in problems analogous to the Billiard Balls (like modeling the weather, or how air flows around a jet fighter’s wings, or how far a nuclear mushroom cloud will billow given the winds).
Grid Computers are very good for problems like Shrek, but not so good for Billiard Ball type problems, simply because the worst computer on a grid is usually pretty bad and nodes on a grid fail often, so there’s lots of waiting and lots of redos.
FlashMob computers have an advantage over Big Iron in that they are inexpensive and an advantage over grid computers in that they can solve Billiard Ball type problems. Unlike Big Iron, FlashMob computers can be sized to fit the problem at hand by adding or removing nodes. FlashMob Computing nodes are also dedicated to the problem at hand, so individual nodes are much more reliable than grid computer nodes.
Amazing, and the ramifications are dramatic: the barriers to entry for tackling supercomputer-related questions are greatly reduced.