MapReduce and its discontents

I had read this presentation some time ago.

But it wasn't until the course I'm taking this week that it became this clear to me.

Hadoop is no doubt useful in many scenarios, but is it the universal solution for today's Big Data landscape? I already have an opinion, but it may be too early to put it in writing. For now I'll stick with some of @deanwampler's opinions, specifically this slide. I felt exactly the same thing today! To run even a trivial MapReduce example you need to have every step written out!
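To make "every step" concrete, here is a minimal sketch of what even Word Count asks you to spell out: a map step, a shuffle (group by key) step, and a reduce step. It is plain Java and deliberately uses no Hadoop classes; `WordCountSketch` and its method names are made up for illustration. A real Hadoop job would add Mapper and Reducer classes, a job driver, and configuration on top of this.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Illustrative only: mimics the MapReduce phases in memory, without Hadoop.
class WordCountSketch {

    // Map phase: emit a (word, 1) pair for every token in a line.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String token : line.toLowerCase().split("\\W+")) {
            if (!token.isEmpty()) {
                pairs.add(new SimpleEntry<>(token, 1));
            }
        }
        return pairs;
    }

    // Shuffle phase: group every emitted value under its key (sorted by key).
    static Map<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
        }
        return grouped;
    }

    // Reduce phase: sum all the counts collected for one word.
    static int reduce(List<Integer> counts) {
        int sum = 0;
        for (int count : counts) {
            sum += count;
        }
        return sum;
    }

    // Driver: wire the three phases together.
    static Map<String, Integer> wordCount(List<String> lines) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String line : lines) {
            pairs.addAll(map(line));
        }
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> entry : shuffle(pairs).entrySet()) {
            result.put(entry.getKey(), reduce(entry.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        // prints {be=2, do=1, not=1, or=1, to=3}
        System.out.println(wordCount(List.of("to be or not to be", "To do")));
    }
}
```

Roughly fifty lines to count words, and that is before any framework plumbing.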

A line worth carving in stone!

“I worked with EJBs a decade ago. Like EJBs, Hadoop has an invasive API that obscures your business logic and reusability. There were too many configuration options in XML files.

The framework “paradigm” is a poor fit for most problems (like soft real time systems and most algorithms beyond Word Count). Internally, EJB implementations were inefficient and hard to optimize, because they relied on poorly considered object boundaries that muddled more natural boundaries and created large-scale, monolithic modules with few abstractions for extension and optimization points.

I’ve also argued in other presentations and my “FP for Java Devs” book that OOP is a poor modularity tool…

The fact is, Hadoop reminds me of EJBs in almost every way. It works okay and people do get stuff done, but just as the Spring Framework brought an essential rethinking to Enterprise Java, I think there is an essential rethink that needs to happen in Big Data. The FP community is well positioned to create the next generation.”

“The “next gen.”, V2.0 Hadoop fixes some problems, but I think this first-generation infrastructure has too many flaws to be dominant for a long time (at least outside large enterprises that always stick with suboptimal solutions sold by big-name players). The Java-OO eccentricities, the overly-large and bloated modules, the premature optimization (and missing optimizations that wouldn’t be premature), lead me to believe that Hadoop will be displaced the same way that the Spring Framework displaced EJBs.”

And in a world that is more familiar to me, Spring Data already includes a project for working with Hadoop: Spring Data Apache Hadoop. It is still at M2, and it certainly does simplify the configuration.

But will it be enough? Or is Google right once again, now that it is already moving from MapReduce to Pregel? We'll keep investigating 😉
