A machine learning method for estimating the probability of presence using presence-background data

Yan Wang*, Chathuri L. Samarasekara, Lewi Stone

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

1 Scopus citation


Estimating the prevalence or the absolute probability of presence of a species from presence-background data has become a controversial topic in species distribution modelling. In this paper, we propose a new method that combines statistical and machine learning algorithms and helps overcome some of the known problems. We also revisit the popular but highly controversial Lele and Keim (LK) method by evaluating its performance and assessing the resource selection probability function (RSPF) condition it relies on. Simulations show that the LK method under the RSPF assumptions yields fragile estimates and predictions of the desired probabilities. Instead, we propose the local knowledge condition, which relaxes the predetermined population prevalence condition used in much of the existing literature. Simulations demonstrate that the new method, using the local knowledge assumption, successfully estimates the probability of presence. The local knowledge condition extends the local certainty (or prototypical presence location) assumption, and has significant implications for establishing the necessary condition for identifying the absolute (rather than relative) probability of presence from presence-background data, without absence data, in species distribution modelling.

Original language: English
Article number: e8998
Journal: Ecology and Evolution
Issue number: 6
State: Published - Jun 2022
Externally published: Yes


Funder: Australian Research Council
Funder number: DP190100613


    • RSPF
    • constrained LK method
    • local certainty
    • local knowledge
    • presence-background
    • prevalence
    • probability of presence

