Back to index

BIRCH: An Efficient Data Clustering Method for Very Large Databases

Tian Zhang, Raghu Ramakrishnan, Miron Livny, UW Madison Summary by: Armando Fox and Steve Gribble

One-line summary: Scan data and build balanced trees representing data clusters; each point is inserted into a cluster based on a local decision, and may cause the cluster to split/merge (like GiST); when done, do a little polishing, toss some outliers (all using heuristics with no statistical basis), and voila, you have data clustering.

Overview/Main Points


Single-pass, sort-of-linear time algorithm that results in a sort-of-clustering of large number of data points, with most outliers sort-of-thrown-out.


Back to index