Optimizing AI for Mobile Malware Detection by Self-Built-Dataset GAN Oversampling and LGBM

Ortal Dayan, Lior Wolf, Fang Wang, Yaniv Harel

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-review

Abstract

The cyber detection industry focuses on analyzing the behavior of threats in order to develop IOCs and triggers. This process keeps detection perpetually behind the attackers, because analysis time elapses between the launch of an attack tool and the ability to detect it. To address this challenge, a dedicated Sandbox environment was built and thousands of mobile device samples were tested, resulting in an up-to-date training dataset that is not based on attack analysis. With this dataset, the research focused on optimizing the AI methodology to achieve the best detection rates for a compromised mobile device. A CupolaGAN was implemented to oversample the dataset, and LGBM models trained on the original imbalanced dataset were compared with models trained on the oversampled dataset. Classification scores on the oversampled data increase by a maximum of 0.47+/-0.37%. The model fine-tuned with Optuna on the balanced data reaches 99.36+/-0.19% accuracy.
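The balancing step described in the abstract can be illustrated with a minimal sketch. It is not the authors' method: the paper synthesizes new samples with a GAN and trains LGBM models, whereas this sketch swaps in plain random oversampling (duplicating minority rows with replacement) and trains no model. The function names, the toy data, and the use of the sample standard deviation for a "mean +/- std" summary are all assumptions made for illustration; the abstract does not state how its +/- values were computed.

```python
import random
import statistics
from collections import Counter


def oversample_to_balance(rows, labels, seed=0):
    """Duplicate minority-class rows (with replacement) until every class
    has as many rows as the largest class.

    Stand-in for GAN-based oversampling: it copies existing rows instead
    of synthesizing new ones.
    """
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        pool = [r for r, y in zip(rows, labels) if y == cls]
        for _ in range(target - n):
            out_rows.append(rng.choice(pool))
            out_labels.append(cls)
    return out_rows, out_labels


def mean_std(scores):
    """Return (mean, sample standard deviation) over repeated runs."""
    return statistics.mean(scores), statistics.stdev(scores)


# Hypothetical toy data: 8 benign (label 0) and 2 compromised (label 1) samples.
rows = [[i, i % 3] for i in range(10)]
labels = [0] * 8 + [1] * 2

balanced_rows, balanced_labels = oversample_to_balance(rows, labels)
class_counts = Counter(balanced_labels)
```

In a fuller pipeline, a classifier would be trained once on `rows` and once on `balanced_rows`, and `mean_std` would summarize each score over repeated runs. Oversampling should be applied to the training split only, so that duplicated or synthetic rows never leak into the evaluation data.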

Original language: English
Title of host publication: Proceedings of the 2023 IEEE International Conference on Cyber Security and Resilience, CSR 2023
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 60-65
Number of pages: 6
ISBN (Electronic): 9798350311709
DOIs
State: Published - 2023
Event: 3rd IEEE International Conference on Cyber Security and Resilience, CSR 2023 - Hybrid, Venice, Italy
Duration: 31 Jul 2023 → 2 Aug 2023

Publication series

Name: Proceedings of the 2023 IEEE International Conference on Cyber Security and Resilience, CSR 2023

Conference

Conference: 3rd IEEE International Conference on Cyber Security and Resilience, CSR 2023
Country/Territory: Italy
City: Hybrid, Venice
Period: 31/07/23 → 2/08/23

Keywords

  • CupolaGAN
  • LightGBM
  • Sandbox
  • cybersecurity
  • malware detection
  • oversampling
