## Abstract

Hierarchical clustering trees are widely accepted as a useful way to cluster data, and they have been adopted in many fields, including phylogenetics, image analysis, and bioinformatics. Recently, Dasgupta (STOC '16) initiated the analysis of these types of algorithms through the lens of approximation. Later, Moseley and Wang (NIPS '17) considered the dual problem, dubbing it the Revenue goal function. In this problem, given a nonnegative weight $w_{ij}$ for each pair $i, j \in [n] = \{1, 2, \ldots, n\}$, the objective is to find a tree $T$ whose set of leaves is $[n]$ that maximizes the function

$$\sum_{i<j \in [n]} w_{ij}\,\bigl(n - |T_{ij}|\bigr),$$

where $|T_{ij}|$ is the number of leaves in the subtree rooted at the least common ancestor of $i$ and $j$.
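To make the objective concrete, the sketch below evaluates it for a given tree. It is our own illustration of the definition, not code from the paper, and the tree encoding (ints for leaves, tuples for internal nodes), the `frozenset`-keyed weight dictionary, and the names `leaves` and `revenue` are all hypothetical choices. It relies on the fact that a pair split between two different children of a node has its least common ancestor exactly at that node.

```python
from itertools import combinations

# Hypothetical representation (not from the paper): a leaf is an int in [n],
# an internal node is a tuple of child subtrees. Weights w_ij are stored in a
# dict keyed by frozenset({i, j}); missing pairs have weight 0.


def leaves(t):
    """Return the list of leaf labels in the subtree t."""
    if isinstance(t, int):
        return [t]
    return [leaf for child in t for leaf in leaves(child)]


def revenue(t, w, n):
    """Sum of w_ij * (n - |T_ij|) over all pairs of leaves of t."""
    if isinstance(t, int):
        return 0.0
    child_leaves = [leaves(child) for child in t]
    size = sum(len(ls) for ls in child_leaves)  # number of leaves under t
    total = 0.0
    # Pairs split between two different children have their LCA exactly at t,
    # so |T_ij| = size for every such pair.
    for a, b in combinations(child_leaves, 2):
        for i in a:
            for j in b:
                total += w.get(frozenset((i, j)), 0.0) * (n - size)
    # Pairs inside a single child are counted deeper in the recursion.
    return total + sum(revenue(child, w, n) for child in t)


w = {frozenset((1, 2)): 1.0, frozenset((3, 4)): 1.0, frozenset((1, 3)): 0.5}
print(revenue(((1, 2), (3, 4)), w, 4))    # 4.0
print(revenue((((1, 2), 3), 4), w, 4))    # 2.5
```

In the bisection `((1, 2), (3, 4))`, the pair {1, 3} meets only at the root and earns nothing; the caterpillar `(((1, 2), 3), 4)` recovers 0.5 from {1, 3} but loses the 2.0 previously earned by {3, 4}.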

In our work we consider the revenue goal function and prove the following results. First, we prove the existence of a bisection (i.e., a tree of depth 2 in which the root has two children, each being a parent of $n/2$ leaves) which approximates the general optimal tree solution up to a factor of $\frac{1}{2}$ (which is tight). Second, we apply this result to prove a $\frac{2}{3}p$ approximation for the general revenue problem, where $p$ is defined as the approximation ratio of the MAX-UNCUT BISECTION problem. Since $p$ is known to be at least 0.8776 (Austrin et al., 2016; Wu et al., 2015), we get a 0.585-approximation algorithm for the revenue problem. This improves on a sequence of earlier results which culminated in a 0.4246-approximation guarantee (Ahmadian et al., 2019).
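Why MAX-UNCUT BISECTION appears: in a bisection, a pair on the same side has its least common ancestor at a child with $n/2$ leaves and earns $w_{ij}\cdot n/2$, while a split pair meets at the root ($|T_{ij}| = n$) and earns nothing. So, on our reading of the definitions, a bisection's revenue is $n/2$ times its uncut weight. The sketch below checks this on a toy instance by exhaustive search. It is not the paper's algorithm: brute force is exponential, whereas the result above plugs in a polynomial-time MAX-UNCUT BISECTION approximation. The names `bisection_revenue`, `best_bisection`, and `RHO` are hypothetical.

```python
from itertools import combinations

# Hypothetical names throughout; weights are keyed by frozenset({i, j}) with i != j.
RHO = 0.8776  # lower bound on the MAX-UNCUT BISECTION ratio quoted in the abstract


def bisection_revenue(side_a, n, w):
    """Revenue of the depth-2 tree whose root children are side_a and [n] minus side_a."""
    assert n % 2 == 0 and len(set(side_a)) == n // 2, "not a bisection of [n]"
    side_a = set(side_a)
    uncut = 0.0
    for pair, weight in w.items():
        i, j = tuple(pair)
        if (i in side_a) == (j in side_a):
            # Same half: the LCA is that child, so |T_ij| = n/2.
            uncut += weight
        # Different halves: the LCA is the root, |T_ij| = n, contribution 0.
    return (n // 2) * uncut


def best_bisection(n, w):
    """Exhaustive MAX-UNCUT BISECTION over [n]: exponential, toy sizes only."""
    best = max(combinations(range(1, n + 1), n // 2),
               key=lambda side: bisection_revenue(side, n, w))
    return set(best), bisection_revenue(best, n, w)


w = {frozenset((1, 2)): 1.0, frozenset((3, 4)): 1.0, frozenset((1, 3)): 0.5}
print(best_bisection(4, w))    # ({1, 2}, 4.0)
print(round(2 / 3 * RHO, 3))   # 0.585
```

The last line reproduces the arithmetic behind the headline ratio: $\frac{2}{3} \cdot 0.8776 \approx 0.585$.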


Original language | English |
---|---|
Title of host publication | Proceedings of Machine Learning Research |
Subtitle of host publication | Proceedings of Thirty Third Conference on Learning Theory, PMLR |
Editors | Jacob Abernethy, Shivani Agarwal |
Pages | 153-162 |
Number of pages | 10 |
Volume | 125 |
State | Published - 2020 |
Event | Conference on Learning Theory, COLT 2020, Graz, Austria. Duration: 9 Jul 2020 → 12 Jul 2020. http://proceedings.mlr.press/v125/ |

### Conference

Conference | Conference on Learning Theory, COLT 2020 |
---|---|
Abbreviated title | colt2020 |
Country/Territory | Austria |
City | Graz |
Period | 9/07/20 → 12/07/20 |
Internet address | |