Twenty petabytes (20,000 terabytes) per day is a tremendous amount of data processing and a key contributor to Google's continued market dominance. Competing search storage and processing systems at Microsoft (Dryad) and Yahoo! (Hadoop) are still playing catch-up to Google's suite of GFS, MapReduce, and BigTable.
MapReduce statistics for different months
|Statistic||Aug. 2004||Mar. 2006||Sep. 2007|
|Number of jobs (1000s)||29||171||2,217|
|Avg. completion time (secs)||634||874||395|
|Machine years used||217||2,002||11,081|
|Avg. machines per job||157||268||394|
Google processes its data on standard cluster nodes, each consisting of two 2 GHz Intel Xeon processors with Hyper-Threading enabled, 4 GB of memory, two 160 GB IDE hard drives, and a gigabit Ethernet link. A machine of this type costs approximately $2,400 through providers such as Penguin Computing or Dell, or approximately $900 a month through a managed hosting provider such as Verio (for startup comparisons).
The average MapReduce job in September 2007 ran across 394 such machines, or roughly $1 million of hardware at about $2,400 per machine, not including bandwidth fees, datacenter costs, or staffing.
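The arithmetic behind that estimate can be sketched in a few lines of Python. The machine counts and prices come from the figures above; the function and constant names are made up for illustration. Applying the same per-machine price to the 2004 and 2006 rows is an assumption, since the prices quoted are only approximate and not tied to those dates.

```python
# Back-of-the-envelope cluster cost, using figures quoted in the text.
# All names here are illustrative; prices are the approximate ones cited above.

AVG_MACHINES_PER_JOB = {"Aug. 2004": 157, "Mar. 2006": 268, "Sep. 2007": 394}
PURCHASE_PRICE_USD = 2400          # approximate price per machine when bought
HOSTED_PRICE_USD_PER_MONTH = 900   # approximate managed-hosting price per machine


def purchase_cost(machines: int) -> int:
    """Hardware cost of buying `machines` nodes outright."""
    return machines * PURCHASE_PRICE_USD


def hosted_cost_per_month(machines: int) -> int:
    """Monthly cost of renting `machines` nodes from a managed host."""
    return machines * HOSTED_PRICE_USD_PER_MONTH


if __name__ == "__main__":
    # Assumption: the same per-machine prices are applied to every month.
    for month, machines in AVG_MACHINES_PER_JOB.items():
        print(
            f"{month}: {machines} machines, "
            f"${purchase_cost(machines):,} to buy, "
            f"${hosted_cost_per_month(machines):,}/month hosted"
        )
```

For September 2007 this gives $945,600 in purchased hardware, which is where the rounded $1 million figure comes from, or $354,600 a month at the managed hosting rate.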