Integrating Apache Hadoop into Existing Architectures
This section collects ideas on what we think are the best ways to integrate Hadoop into existing architectures.
How to integrate existing data with Hadoop?
- Make the import of data into the DFS transparent
- Use periodic batch imports
- Save data to the DFS asynchronously
- Take data from XML, CSV and databases
- Pentaho: desktop ETL application using Hive, scheduled with OS jobs (cron). The full version includes importing from SAP repositories
- Use the Hadoop Java API to upload files to the file system asynchronously
- Use the Hadoop Distributed File System's commands to import data asynchronously
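As a rough illustration of the last idea, the sketch below starts one `hadoop fs -put` per file on a thread pool, so the calling application does not block while files are copied into HDFS. It assumes the `hadoop` command is on the PATH. The function names, the `runner` parameter and the example paths are hypothetical and are here only for illustration.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def build_put_command(local_path, hdfs_dir):
    """Return the argv list that copies one local file into an HDFS directory."""
    return ["hadoop", "fs", "-put", local_path, hdfs_dir]


def upload_async(local_paths, hdfs_dir, runner=subprocess.call, max_workers=4):
    """Start one upload per file in the background.

    Returns a dict mapping each local path to a Future whose result is the
    exit code reported by `runner` (0 means the copy succeeded).
    `runner` is a hypothetical hook so the sketch can be tried without a cluster.
    """
    executor = ThreadPoolExecutor(max_workers=max_workers)
    futures = {
        path: executor.submit(runner, build_put_command(path, hdfs_dir))
        for path in local_paths
    }
    # Do not wait here: already submitted uploads keep running in the background.
    executor.shutdown(wait=False)
    return futures


if __name__ == "__main__":
    # Hypothetical paths; this part needs a working Hadoop client.
    pending = upload_async(["/tmp/export/orders.csv"], "/data/incoming")
    for path, future in pending.items():
        print(path, "exit code:", future.result())
```

The same script could be run from cron to cover the periodic batch import idea as well.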
Integration with DB
- SQL-Sqoop-Hive
http://www.cloudera.com/blog/2011/06/biodiversity-indexing-migration-from-mysql-to-hadoop/
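A minimal sketch of the SQL to Sqoop to Hive path, assuming Sqoop is installed and on the PATH: it only assembles a `sqoop import` command with `--hive-import`, which asks Sqoop to load the imported table into Hive. The function name, JDBC URL, table and user below are hypothetical placeholders.

```python
import subprocess


def build_sqoop_import(jdbc_url, table, username, hive_table=None):
    """Return the argv list for importing one SQL table into Hive through Sqoop."""
    cmd = [
        "sqoop", "import",
        "--connect", jdbc_url,
        "--username", username,
        "-P",                # prompt for the password instead of putting it on the command line
        "--table", table,
        "--hive-import",     # load the imported data into Hive
    ]
    if hive_table:
        cmd += ["--hive-table", hive_table]
    return cmd


if __name__ == "__main__":
    # Hypothetical connection details; this part needs Sqoop and a reachable database.
    command = build_sqoop_import("jdbc:mysql://dbhost/sales", "orders", "etl_user")
    subprocess.call(command)
```

Because `-P` prompts interactively, an unattended batch job would need a different way of supplying credentials, which this sketch leaves out.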
Log file processing
- Mount HDFS with FUSE
https://ccp.cloudera.com/display/CDHDOC/Mountable+HDFS
http://wiki.apache.org/hadoop/MountableHDFS
- Flume https://github.com/cloudera/flume
- Chukwa for analyzing log files http://wiki.apache.org/hadoop/Chukwa
Visualization
- Build an open source tool
- Flashless tools: http://raphaeljs.com/ (it does not include export to PDF!)
- Incorporate graphics into Globant's Rocketui. Rocketui includes widgets built on Yahoo UI, Prototype and jQuery
- Google Visualization: there is a tool that returns a static image, like a screenshot, when a URL is called
- Adobe Labs project to export Flash to HTML5
- FusionCharts http://www.fusioncharts.com/
- Info Build Toolkit
- GWT: data models with canvas. GWT Exporter can be used so that the result can be referenced from a JS application
Visualization Tools Comparison
http://sixrevisions.com/javascript/20-fresh-javascript-data-visualization-libraries/
How to generate a dashboard once the output is ready?
- Hive plus an intermediate database: load the results into a MySQL DB, then extract your reports from it
- Export results to CSV
- Fusion Tables
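For the CSV option, one possible sketch is shown below. It assumes the Hive results were written as plain text files with Hive's default field delimiter (Ctrl-A, `\x01`) and turns those lines into CSV that a spreadsheet, a MySQL load or a charting tool can read. The function name and column names are hypothetical, and Hive's `\N` marker for NULL values is not handled.

```python
import csv
import io

# Default field delimiter in Hive text output (Ctrl-A).
HIVE_FIELD_DELIMITER = "\x01"


def hive_rows_to_csv(lines, header=None):
    """Convert Ctrl-A delimited Hive output lines into a CSV string."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    if header:
        writer.writerow(header)
    for line in lines:
        # Drop the trailing newline, then split on the Hive delimiter.
        writer.writerow(line.rstrip("\n").split(HIVE_FIELD_DELIMITER))
    return out.getvalue()


if __name__ == "__main__":
    # Hypothetical sample rows standing in for a Hive result file.
    sample = ["ar\x0142\n", "us\x0117\n"]
    print(hive_rows_to_csv(sample, header=["country", "hits"]))
```

Fields that contain commas are quoted by the `csv` module, so the output stays valid even when the Hive values include the CSV separator.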
Pentaho functionality
Navigation@globant
Hadoop@Facebook
Hadoop@Twitter
Cassandra
http://www.datastax.com/dev/tutorials
http://www.datastax.com/docs/0.8/introduction/index#getting-started-with-cassandra
http://www.datastax.com/dev




