Friday, May 23, 2008

Large Scale Data Munging

I don't believe I've ever posted a link to one of my favorite blogs, Datawocky. Great article today on super large scale data munging:

Yet the Map Reduce paradigm has its limitations. The biggest problem is that it involves writing code for each analysis. This limits the number of companies and people that can use this paradigm. The second problem is that joins of different data sets is hard. The third problem is that Map Reduce works on files and produces files; after a while the number of files multiplies and it becomes difficult to keep track of things. What's lacking is a metadata layer, such as the catalog in database systems. Don't get me wrong; I love Map Reduce, and there are applications that don't need these things, but increasingly there are applications that do.

-Why the World Needs a New Database System

0 Comments:

Post a Comment

<< Home


[/home] [blog home]

07/01/2002 - 08/01/2002 08/01/2002 - 09/01/2002 09/01/2002 - 10/01/2002 10/01/2002 - 11/01/2002 11/01/2002 - 12/01/2002 02/01/2005 - 03/01/2005 03/01/2005 - 04/01/2005 04/01/2005 - 05/01/2005 05/01/2005 - 06/01/2005 06/01/2005 - 07/01/2005 07/01/2005 - 08/01/2005 08/01/2005 - 09/01/2005 09/01/2005 - 10/01/2005 11/01/2005 - 12/01/2005 12/01/2005 - 01/01/2006 01/01/2006 - 02/01/2006 02/01/2006 - 03/01/2006 03/01/2006 - 04/01/2006 04/01/2006 - 05/01/2006 05/01/2006 - 06/01/2006 06/01/2006 - 07/01/2006 07/01/2006 - 08/01/2006 08/01/2006 - 09/01/2006 09/01/2006 - 10/01/2006 10/01/2006 - 11/01/2006 11/01/2006 - 12/01/2006 12/01/2006 - 01/01/2007 01/01/2007 - 02/01/2007 02/01/2007 - 03/01/2007 03/01/2007 - 04/01/2007 05/01/2007 - 06/01/2007 06/01/2007 - 07/01/2007 07/01/2007 - 08/01/2007 08/01/2007 - 09/01/2007 09/01/2007 - 10/01/2007 10/01/2007 - 11/01/2007 11/01/2007 - 12/01/2007 12/01/2007 - 01/01/2008 01/01/2008 - 02/01/2008 02/01/2008 - 03/01/2008 03/01/2008 - 04/01/2008 04/01/2008 - 05/01/2008 05/01/2008 - 06/01/2008 06/01/2008 - 07/01/2008