The focus of the Mahout project is to develop commercially friendly, scalable machine learning algorithms, such as classification, clustering, regression, and dimensionality reduction, under the Apache brand and on top of Hadoop.
The need for machine-learning techniques like clustering, collaborative filtering, and categorization has never been greater, be it for finding commonalities among large groups of people or automatically tagging large volumes of Web content. The Apache Mahout project aims to make building intelligent applications easier and faster.
Although relatively young by open source standards, Mahout already offers a large amount of functionality, especially for clustering and collaborative filtering. Mahout's primary features are:
- Taste, an open source collaborative filtering engine integrated into Mahout.
- Several MapReduce-enabled clustering implementations, including k-Means, fuzzy k-Means, Canopy, Dirichlet, and Mean-Shift.
- Distributed Naive Bayes and Complementary Naive Bayes classification implementations.
- Distributed fitness function capabilities for evolutionary programming.
- Matrix and vector libraries.
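The clustering implementations listed above share the same iterate-assign-update core; Mahout's contribution is running those steps as MapReduce jobs over large datasets. To make the underlying idea concrete, here is a minimal single-machine sketch of k-Means. This is not Mahout's API; the class and method names are purely illustrative.

```java
import java.util.Arrays;

// Minimal k-Means sketch. Mahout's distributed version runs the same
// assign/recompute loop as a series of MapReduce jobs: mappers assign
// points to their nearest centroid, reducers average each cluster.
public class KMeansSketch {

    // Repeatedly assign points to the nearest centroid and recompute
    // each centroid as the mean of its assigned points, until stable.
    static double[][] kMeans(double[][] points, double[][] centroids, int maxIter) {
        int k = centroids.length, dims = points[0].length;
        for (int iter = 0; iter < maxIter; iter++) {
            double[][] sums = new double[k][dims];
            int[] counts = new int[k];
            for (double[] p : points) {
                int best = nearest(p, centroids);
                counts[best]++;
                for (int d = 0; d < dims; d++) sums[best][d] += p[d];
            }
            boolean changed = false;
            for (int c = 0; c < k; c++) {
                if (counts[c] == 0) continue;   // empty cluster: keep old centroid
                for (int d = 0; d < dims; d++) {
                    double mean = sums[c][d] / counts[c];
                    if (mean != centroids[c][d]) { centroids[c][d] = mean; changed = true; }
                }
            }
            if (!changed) break;                // converged
        }
        return centroids;
    }

    // Index of the centroid closest to p (squared Euclidean distance).
    static int nearest(double[] p, double[][] centroids) {
        int best = 0;
        double bestDist = Double.MAX_VALUE;
        for (int c = 0; c < centroids.length; c++) {
            double dist = 0;
            for (int d = 0; d < p.length; d++) {
                double diff = p[d] - centroids[c][d];
                dist += diff * diff;
            }
            if (dist < bestDist) { bestDist = dist; best = c; }
        }
        return best;
    }

    public static void main(String[] args) {
        double[][] points = {{1, 1}, {1.5, 2}, {8, 8}, {9, 9}, {0.5, 1}, {8.5, 9.5}};
        double[][] centroids = {{1, 1}, {8, 8}};   // initial guesses
        System.out.println(Arrays.deepToString(kMeans(points, centroids, 10)));
    }
}
```

The fuzzy k-Means variant in the list differs only in the assignment step: each point contributes to every centroid, weighted by a membership degree, rather than belonging to exactly one cluster.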
Why use Mahout?
Companies can use Mahout to:
- Assess, identify and describe their relative position in the market in terms of business maturity, performance and health.
- Identify, evaluate and choose courses of action that will accelerate growth and development.
- Monitor their own progress over time and continuously evaluate strategies, plans and actions.
Use cases of Mahout
- Collaborative filtering: recommending items to a user based on the preferences of similar users.
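To illustrate the collaborative filtering use case, here is a minimal user-based sketch: a user's unknown rating for an item is predicted as the similarity-weighted average of other users' ratings for that item. This is not Mahout's Taste API; all names here are illustrative, and cosine similarity stands in for the various similarity measures a real recommender would offer.

```java
// Minimal user-based collaborative filtering sketch (illustrative only).
public class UserCF {

    // Cosine similarity over items both users have rated (0 = unrated).
    static double similarity(double[] a, double[] b) {
        double dot = 0, na = 0, nb = 0;
        for (int i = 0; i < a.length; i++) {
            if (a[i] == 0 || b[i] == 0) continue;
            dot += a[i] * b[i];
            na += a[i] * a[i];
            nb += b[i] * b[i];
        }
        return (na == 0 || nb == 0) ? 0 : dot / (Math.sqrt(na) * Math.sqrt(nb));
    }

    // Predict user u's rating for item as the similarity-weighted
    // average of other users' ratings for that item.
    static double predict(double[][] ratings, int u, int item) {
        double num = 0, den = 0;
        for (int v = 0; v < ratings.length; v++) {
            if (v == u || ratings[v][item] == 0) continue;
            double sim = similarity(ratings[u], ratings[v]);
            num += sim * ratings[v][item];
            den += Math.abs(sim);
        }
        return den == 0 ? 0 : num / den;
    }

    public static void main(String[] args) {
        // Rows are users, columns are items; 0 means unrated.
        double[][] ratings = {
            {5, 3, 0, 1},
            {4, 0, 4, 1},
            {1, 1, 5, 5},
        };
        // Predict user 0's rating for item 2.
        System.out.printf("predicted rating: %.2f%n", predict(ratings, 0, 2));
    }
}
```

The weighting means the prediction leans toward users whose past ratings agree with user 0's; a production system would also restrict the sum to a neighborhood of the most similar users rather than all of them.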