Exploring the Application of Transfer Learning in Malware Detection by Fine-tuning Pre-Trained Models on Binary Classification to New Datasets on Multi-class Classification


Ajayi, Bamidele; Barakat, Basel; McGarry, Ken and Abukeshek, Mays. 2024. 'Exploring the Application of Transfer Learning in Malware Detection by Fine-tuning Pre-Trained Models on Binary Classification to New Datasets on Multi-class Classification'. In: 29th International Conference on Automation and Computing (ICAC). Sunderland, United Kingdom 28 - 30 August 2024. [Conference or Workshop Item]

Full text (PDF, 726kB): Accepted Version

Official URL: https://doi.org/10.1109/ICAC61394.2024.10718851

Abstract or Description

This research presents a method for classifying binary files as malicious or benign using Convolutional Neural Networks (CNNs), and then extends it from binary to multiclass classification. Three commonly used datasets were tested: EMBER, BODMAS, and MALIMG, with EMBER and BODMAS serving as the training and testing sets for the base model. Data from these datasets is converted into image representations and analyzed by CNN models, achieving an accuracy of 98%. A transfer learning model is then developed that reuses the knowledge learned from EMBER and BODMAS. This model significantly reduces training time and achieves 97% accuracy across 25 malware families after only 5 epochs with a batch size of 25, with an average AUC of 1.00. An average AUC of 1.00 indicates essentially perfect discrimination between positive and negative classes in terms of ranking; it does not by itself mean that every prediction is correct, as the 97% accuracy shows. These results underscore the robustness of the method.
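The abstract reports both accuracy and an average AUC for the 25-family task, and the two measure different things. The sketch below assumes the average is a macro (unweighted) one-vs-rest mean over classes; the abstract does not state which averaging was used. The function names (`binary_auc`, `macro_ovr_auc`, `accuracy`) and the toy scores are hypothetical illustrations, not code or data from the paper. The example shows that every class can have an AUC of 1.0 while argmax accuracy stays below 100%, because AUC only checks ranking within each class's score column, whereas accuracy depends on which column wins for each sample.

```python
from typing import List, Sequence


def binary_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC for one binary problem: the probability that a randomly chosen
    positive scores higher than a randomly chosen negative (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def macro_ovr_auc(probs: List[List[float]], y_true: List[int]) -> float:
    """Macro-averaged one-vs-rest AUC (assumed averaging scheme):
    score each class against all others, then take the unweighted mean."""
    n_classes = len(probs[0])
    aucs = []
    for c in range(n_classes):
        column = [row[c] for row in probs]             # scores for class c
        is_c = [1 if y == c else 0 for y in y_true]    # class c vs the rest
        aucs.append(binary_auc(column, is_c))
    return sum(aucs) / n_classes


def accuracy(probs: List[List[float]], y_true: List[int]) -> float:
    """Fraction of samples whose highest-scoring class is the true class."""
    preds = [max(range(len(row)), key=row.__getitem__) for row in probs]
    return sum(p == y for p, y in zip(preds, y_true)) / len(y_true)


if __name__ == "__main__":
    # Hypothetical softmax outputs for three samples and three classes.
    probs = [
        [0.45, 0.50, 0.05],  # true class 0, but argmax picks class 1
        [0.05, 0.90, 0.05],  # true class 1, predicted correctly
        [0.10, 0.10, 0.80],  # true class 2, predicted correctly
    ]
    y_true = [0, 1, 2]
    print(macro_ovr_auc(probs, y_true))        # 1.0
    print(round(accuracy(probs, y_true), 3))   # 0.667
```

In this toy case each class's true sample outranks every other sample in that class's column, so each one-vs-rest AUC is 1.0, yet the first sample is still misclassified when labels are chosen by argmax.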

Item Type:

Conference or Workshop Item (Paper)

Identification Number (DOI):

https://doi.org/10.1109/ICAC61394.2024.10718851

Additional Information:

“© 2024 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.”

Keywords:

Training, Adaptation models, Accuracy, Computational modeling, Scalability, Transfer learning, Malware, Convolutional neural networks, Computer security, Testing

Departments, Centres and Research Units:

Computing

Dates: