TY - GEN

T1 - Improving constrained search results by data melioration

AU - Guy, Ido

AU - Milo, Tova

AU - Novgorodov, Slava

AU - Youngmann, Brit

N1 - Publisher Copyright:
© 2021 IEEE.

PY - 2021/4

Y1 - 2021/4

N2 - The problem of finding an item-set of maximal aggregated utility that satisfies a set of constraints is at the cornerstone of many search applications. Its classical definition assumes that all the information needed to verify the constraints is explicitly given. However, in real-world databases, the data available on items is often partial. Hence, adequately answering constrained search queries requires the completion of this missing information. A common approach to complete missing data is to employ Machine Learning (ML)-based inference. However, such methods are naturally error-prone. More accurate data can be obtained by asking humans to complete missing information. But, as the number of items in the repository is vast, limiting human effort is crucial. To this end, we introduce the Probabilistic Constrained Search (PCS) problem, which identifies a bounded-size item-set whose data completion is likely to be highly beneficial, as these items are expected to belong to the result set of the constrained search queries in question. We prove PCS to be hard to approximate, and consequently propose a best-effort PTIME heuristic to solve it. We demonstrate the effectiveness and efficiency of our algorithm over real-world datasets and scenarios, showing that our algorithm significantly improves the result sets of constrained search queries, in terms of both utility and constraints satisfaction probability.

AB - The problem of finding an item-set of maximal aggregated utility that satisfies a set of constraints is at the cornerstone of many search applications. Its classical definition assumes that all the information needed to verify the constraints is explicitly given. However, in real-world databases, the data available on items is often partial. Hence, adequately answering constrained search queries requires the completion of this missing information. A common approach to complete missing data is to employ Machine Learning (ML)-based inference. However, such methods are naturally error-prone. More accurate data can be obtained by asking humans to complete missing information. But, as the number of items in the repository is vast, limiting human effort is crucial. To this end, we introduce the Probabilistic Constrained Search (PCS) problem, which identifies a bounded-size item-set whose data completion is likely to be highly beneficial, as these items are expected to belong to the result set of the constrained search queries in question. We prove PCS to be hard to approximate, and consequently propose a best-effort PTIME heuristic to solve it. We demonstrate the effectiveness and efficiency of our algorithm over real-world datasets and scenarios, showing that our algorithm significantly improves the result sets of constrained search queries, in terms of both utility and constraints satisfaction probability.

UR - http://www.scopus.com/inward/record.url?scp=85112867708&partnerID=8YFLogxK

U2 - 10.1109/ICDE51399.2021.00147

DO - 10.1109/ICDE51399.2021.00147

M3 - ???researchoutput.researchoutputtypes.contributiontobookanthology.conference???

AN - SCOPUS:85112867708

T3 - Proceedings - International Conference on Data Engineering

SP - 1667

EP - 1678

BT - Proceedings - 2021 IEEE 37th International Conference on Data Engineering, ICDE 2021

PB - IEEE Computer Society

Y2 - 19 April 2021 through 22 April 2021

ER -