This section collects ideas on what we think is the best way to integrate Hadoop into existing architectures.
How to integrate existing data with Hadoop?
- Make the data import into the DFS transparent
- Use periodic batch imports
- Save data to the DFS asynchronously
- Take data from XML, CSV, and databases
- Pentaho: desktop ETL application using Hive, scheduled through OS jobs (cron). The full version includes importing from SAP repositories
- Use the Hadoop Java API to asynchronously upload files to the file system
- Use the Hadoop Distributed File System (HDFS) shell commands to import data asynchronously
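
The asynchronous, command-based import mentioned above can be sketched as follows. This is only a minimal illustration, assuming the `hadoop` command-line client is installed and on the PATH; the helper names `build_put_command` and `upload_async` and all paths are hypothetical, not part of any Hadoop API.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def build_put_command(local_path, dfs_dir):
    """Return the argv that copies one local file into a DFS directory."""
    return ["hadoop", "fs", "-put", local_path, dfs_dir]


def upload_async(local_paths, dfs_dir, runner=subprocess.call, max_workers=4):
    """Start one upload per file on a thread pool and return the futures.

    The caller keeps working while the uploads run. Calling .result() on a
    future blocks until that upload finishes and yields its exit code.
    `runner` is injectable so the function can be tried without a cluster.
    """
    pool = ThreadPoolExecutor(max_workers=max_workers)
    futures = {
        path: pool.submit(runner, build_put_command(path, dfs_dir))
        for path in local_paths
    }
    # Submitted uploads still run to completion; we just do not wait here.
    pool.shutdown(wait=False)
    return futures
```

A periodic batch import could call `upload_async(["orders.csv", "clients.xml"], "/data/incoming")` from a scheduled job and check the exit codes later. The file names and target directory here are made up for the example.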
Integration with DB
Log files processing
- Mount HDFS through FUSE
- Flume https://github.com/cloudera/flume
- Chukwa for analyzing log files http://wiki.apache.org/hadoop/Chukwa
- Build an open source tool
- Flash-free tools: http://raphaeljs.com/ (it does not include export to PDF)
- Incorporate graphics into Globant's Rocketui. Rocketui includes widgets built on Yahoo UI, Prototype and jQuery
- Google Visualization: there is a tool that returns a static image, like a screenshot, when you call a URL
- Adobe Labs project to export Flash to HTML5
- FusionCharts http://www.fusioncharts.com/
- Info Build Toolkit
- GWT: data models with canvas. GWT Exporter can be used so that the GWT code can be referenced from a JS application
Visualization Tools Comparison
How to generate a dashboard once the output is ready?
- Hive plus an intermediate database: you can then extract your reports from a MySQL DB
- Export results to CSV
- Fusion Tables
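
As a rough sketch of the CSV export option, the snippet below converts Hive text output into CSV that a reporting tool or an intermediate database can load. It assumes the job wrote plain text with Hive's default delimiters (Ctrl-A between fields, `\N` for NULL). The function name `hive_text_to_csv` and the column names in the usage note are hypothetical.

```python
import csv
import io

HIVE_FIELD_SEP = "\x01"  # Hive's default field delimiter in text output (Ctrl-A)
HIVE_NULL = "\\N"        # how Hive's default text format writes NULL


def hive_text_to_csv(lines, out, field_sep=HIVE_FIELD_SEP, header=None):
    """Write Hive text rows to `out` as CSV and return the number of rows.

    `lines` is any iterable of strings (for example an open result file),
    `out` is a writable text stream, and `header` is an optional list of
    column names to emit as the first CSV row.
    """
    writer = csv.writer(out, lineterminator="\n")
    if header:
        writer.writerow(header)
    count = 0
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        # Map Hive NULL markers to empty CSV cells.
        fields = ["" if f == HIVE_NULL else f for f in line.split(field_sep)]
        writer.writerow(fields)
        count += 1
    return count
```

For instance, passing an open result file as `lines`, an `io.StringIO()` as `out`, and `header=["user", "hits"]` yields CSV text that can be saved or bulk-loaded into the intermediate MySQL database.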