Cloud Computing Parallelism
The challenge for IT is figuring out what to do with all that computing power. That means harnessing parallelism.
Supercomputers exploited parallelism by using the power of mathematics to break down matrix-oriented problems into multiple subproblems that could be worked on simultaneously. Relational DBMS vendors, taking advantage of the relational model's mathematical foundation, have long been able to decompose queries into sets of operations that could be performed in parallel on independent processors.
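The decomposition idea can be illustrated with a minimal sketch. Multiplying a matrix by a vector breaks naturally into one dot product per row, and each dot product is an independent subproblem. The sketch below uses Python's thread pool purely to show the structure; a real supercomputer or cluster would spread these subproblems across separate processors or machines.

```python
from concurrent.futures import ThreadPoolExecutor

def dot(row, vector):
    # One independent subproblem: the dot product of a single matrix
    # row with the vector. No subproblem depends on any other.
    return sum(a * b for a, b in zip(row, vector))

def parallel_matvec(matrix, vector):
    # Fan the subproblems out to a pool of workers and collect the
    # results in row order. On a cluster, each row (or block of rows)
    # would go to a different node instead of a local thread.
    with ThreadPoolExecutor() as pool:
        return list(pool.map(dot, matrix, [vector] * len(matrix)))

result = parallel_matvec([[1, 2], [3, 4]], [5, 6])
# result == [17, 39]
```

Because the subproblems share no intermediate state, the speedup scales (up to overhead) with the number of workers, which is exactly what makes matrix-oriented problems such a good fit for parallel hardware.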
It was Google, however, that most successfully figured out how to employ high-performance parallel programming techniques to power its search engine, connecting a million cheap, PC-like servers into what is effectively the world's largest supercomputer. The Google cloud helps ferret out answers to billions of queries in a fraction of a second.
Traditionally, supercomputers have been used mainly by research labs owned by the military, government intelligence agencies, universities and very large companies. The problems they've historically tackled have generally involved enormously complex calculations for such tasks as simulating nuclear explosions, predicting climate change, or designing airplanes.
Cloud computing aims to apply supercomputer power -- measured in the tens of trillions of computations per second -- in a way that users can tap through the Web by spreading data-processing chores across large groups of networked servers.
"Google and the Wisdom of Clouds" describes how Google, teamed with IBM, is introducing students, researchers, and entrepreneurs to the immense power of Google-style computing.
Unlike traditional supercomputers, Google's system never ages. When its individual pieces die, usually after about three years, engineers pluck them out and replace them with new, faster boxes. This means the cloud regenerates as it grows, almost like a living thing.
A move towards clouds signals a fundamental shift in how we handle information. At the most basic level, it's the computing equivalent of the evolution in electricity a century ago when farms and businesses shut down their own generators and bought power instead from efficient industrial utilities.
The software at the heart of Google computing is called "MapReduce." MapReduce delivers Google's speed and industrial heft. It divides each job into hundreds, or even thousands, of smaller tasks and distributes them to legions of computers. As each one comes back with its nugget of information, MapReduce assembles the responses into an answer, all in a fraction of a second.
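The split-distribute-reassemble pattern described above can be sketched with the classic word-count example. This is a simplified, single-machine illustration of the MapReduce programming model, not Google's implementation: the `map_phase`, `shuffle`, and `reduce_phase` names are illustrative, and in the real system each phase runs across thousands of machines rather than in one process.

```python
from collections import defaultdict

def map_phase(document):
    # Map: each worker scans one shard of the input and emits
    # intermediate (key, value) pairs -- here, (word, 1).
    return [(word, 1) for word in document.split()]

def shuffle(mapped_outputs):
    # Shuffle: group all intermediate values by key so that every
    # reducer sees the complete set of values for its keys.
    groups = defaultdict(list)
    for pairs in mapped_outputs:
        for key, value in pairs:
            groups[key].append(value)
    return groups

def reduce_phase(key, values):
    # Reduce: combine all values for one key into a final result.
    return key, sum(values)

def map_reduce(documents):
    # In a real cluster, the map calls below would run simultaneously
    # on many machines; here they run sequentially for clarity.
    mapped = [map_phase(doc) for doc in documents]
    grouped = shuffle(mapped)
    return dict(reduce_phase(k, v) for k, v in grouped.items())

counts = map_reduce(["the cloud", "the web", "cloud computing"])
# counts == {'the': 2, 'cloud': 2, 'web': 1, 'computing': 1}
```

The key design point is that the map tasks never communicate with one another, so adding more machines adds more throughput, which is how "legions of computers" can answer a query in a fraction of a second.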
There's an open-source implementation of the MapReduce architecture of cloud computing called "Hadoop." The team that developed Hadoop belonged to a company, Nutch, that got acquired. Oddly, they are now working within the walls of Yahoo, which was counting on the MapReduce offspring to give its own computers a touch of Google magic. Hadoop, though, remains open source.
What will computing clouds look like? They'll function as huge virtual laboratories "curating" troves of data. All sorts of business models are sure to evolve. Google's CEO, Eric Schmidt, likes to compare the cloud-based supercomputer data centers to the prohibitively expensive particle accelerators known as cyclotrons. "There are only a few cyclotrons in physics," he says. "And every one of them is important, because if you're a top-flight physicist you need to be at the lab where that cyclotron is being run. That's where history's going to be made; that's where the inventions are going to come." As Mark Dean, head of IBM's research operation in Almaden, Calif., says, in the future using these new cloud computing labs, "you may win the Nobel prize by analyzing data assembled by someone else."
As Google, IBM, Microsoft, Yahoo!, and Amazon lead the world in building massive cloud computing data centers with massively parallel processing capabilities, the only constraint may be finding enough electricity to power their truly amazing infrastructures.