Carrot² is an open source search results clustering engine. It can automatically cluster small collections of documents, e.g. search results or document abstracts, into thematic categories. Carrot² is written in Java and distributed under the BSD license.
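To make the idea of grouping search results into labeled thematic categories concrete, the following is a minimal, self-contained sketch. It is not Carrot² code and does not implement STC or Lingo. It uses a much simpler swapped-in technique: every non-stop-word term shared by at least a given number of snippets becomes a cluster label. The class name `ToyLabelClustering`, the method `cluster`, the stop-word list and the sample snippets are all hypothetical and exist only for this illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

class ToyLabelClustering {
    // Hypothetical stop-word list, used only for this sketch.
    static final Set<String> STOP_WORDS =
        Set.of("a", "an", "the", "of", "in", "for", "and", "to", "is", "on");

    // Maps each term that occurs in at least minDocs documents to the
    // indices of the documents containing it. The term acts as the label.
    static Map<String, List<Integer>> cluster(List<String> docs, int minDocs) {
        Map<String, List<Integer>> byTerm = new TreeMap<>();
        for (int i = 0; i < docs.size(); i++) {
            // Lowercase, split on non-alphanumeric runs, de-duplicate per document.
            Set<String> terms = new HashSet<>(
                Arrays.asList(docs.get(i).toLowerCase().split("[^a-z0-9]+")));
            for (String term : terms) {
                if (term.isEmpty() || STOP_WORDS.contains(term)) {
                    continue;
                }
                byTerm.computeIfAbsent(term, k -> new ArrayList<>()).add(i);
            }
        }
        // Drop terms that are too rare to describe a group.
        byTerm.values().removeIf(ids -> ids.size() < minDocs);
        return byTerm;
    }

    public static void main(String[] args) {
        List<String> snippets = List.of(
            "Jaguar is a large cat native to the Americas",
            "Jaguar cars: luxury vehicles and sports cars",
            "Apache Solr search server",
            "Search results clustering for Solr");
        cluster(snippets, 2).forEach(
            (label, ids) -> System.out.println(label + " -> " + ids));
    }
}
```

With the sample snippets, the sketch groups the first two snippets under "jaguar" and the last two under both "search" and "solr". The overlapping groups and single-word labels hint at why dedicated algorithms put so much effort into selecting readable, non-redundant cluster labels.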
The initial version of Carrot² was implemented in 2001 by Dawid Weiss as part of his MSc thesis to validate the applicability of the STC clustering algorithm to clustering search results in Polish. In 2003, a number of other search results clustering algorithms were added, including Lingo, a novel text clustering algorithm designed specifically for clustering search results. Although the source code of Carrot² had been available since 2002, version 1.0 was not officially released until 2006. In the same year, version 2.0 was released with an improved user interface and an extended tool set. In 2009, version 3.0 brought significant improvements in clustering quality, a simplified API and a new GUI application, based on the Eclipse Rich Client Platform, for tuning clustering. In 2020, version 4.0.0 brought further simplification of the API, code cleanups and removal of the desktop Workbench. Version 4.1.0 brought back the Workbench as a web-based application.
Carrot² 4.0 is predominantly a Java programming library with public APIs for management of language-specific resources, algorithm configuration and execution. An HTTP/REST component (document clustering server) is provided for interoperability with other languages.
Carrot² offers a few document clustering algorithms that place emphasis on the quality of cluster labels:
Carrot Search, a commercial spin-off of the Carrot² project, works on further development of Carrot² and offers a real-time text clustering algorithm compliant with the Carrot² framework, as well as text mining consulting services based on open source and proprietary software.
Carrot² gave rise to a number of independent open source projects released under the umbrella of Carrot Search Labs. The following projects are or were published as part of this initiative:
Discontinued projects: