Website categorization via design attribute learning

Doron Cohen*, Or Naim, Eran Toch, Irad Ben-Gal

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

6 Scopus citations

Abstract

Malicious software (malware) is a challenging cybersecurity threat, as it is often bundled with legitimate software and downloaded by naïve users. A significant source of malware downloads is crack websites, which are used to circumvent copyright protection mechanisms. Crack websites often change URLs and IP addresses to avoid automatic detection; however, in many cases, they preserve specific visual designs that signal the website's function to potential users (such as particular colors, text fonts, shapes, and sizes). Website design features are numerous and high-dimensional, and they interact in complicated ways, which makes categorization challenging. This study shows that straightforward machine learning models for categorizing crack and malicious websites can benefit considerably from design features. We report on two experiments based on unbalanced datasets and show that classification using design features can reach a categorization accuracy of over 90%, with an F1-score over 77% in some instances. Finally, we discuss the results in the context of developing intelligent security mechanisms.
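On unbalanced datasets, accuracy and F1-score can diverge sharply, which is why the abstract reports both. The sketch below illustrates this with purely hypothetical confusion-matrix counts (not the paper's data); the `Confusion` class and the `accuracy` and `f1` helpers are illustrative names, not part of the study. A trivial "always benign" baseline reaches 90% accuracy on the same hypothetical set yet has an F1-score of zero, because it never detects a single crack site.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    """Binary confusion matrix; 'positive' means crack/malicious site."""
    tp: int
    fp: int
    fn: int
    tn: int


def accuracy(c: Confusion) -> float:
    # Fraction of all sites classified correctly.
    return (c.tp + c.tn) / (c.tp + c.fp + c.fn + c.tn)


def f1(c: Confusion) -> float:
    # F1 = 2TP / (2TP + FP + FN), the harmonic mean of precision and recall.
    # Guard against division by zero when there are no positives and no positive predictions.
    denom = 2 * c.tp + c.fp + c.fn
    return 2 * c.tp / denom if denom else 0.0


# Hypothetical unbalanced test set: 100 crack sites, 900 benign sites.
classifier = Confusion(tp=75, fp=20, fn=25, tn=880)
majority_baseline = Confusion(tp=0, fp=0, fn=100, tn=900)  # always predicts "benign"

for name, c in [("classifier", classifier), ("always-benign", majority_baseline)]:
    print(f"{name}: accuracy={accuracy(c):.3f}, F1={f1(c):.3f}")
```

Here the hypothetical classifier scores about 0.955 accuracy but only about 0.769 F1, while the baseline scores 0.900 accuracy and 0.000 F1. In this example, F1 is the more informative metric for the minority class.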

Original language: English
Article number: 102312
Journal: Computers and Security
Volume: 107
State: Published - Aug 2021

Funding

Funders:
• ICRC
• Koret Foundation

Keywords

• Crack websites
• Cyber security
• Human computer interaction
• Malware
• Online learning
• Website categorization
• Website design elements
