IMPROVING PHISHING URL DETECTION ACROSS DOMAINS USING TRANSFORMER MODELS AND GRADIENT BOOSTING MACHINES
DOI: https://doi.org/10.63878/cjssr.v3i4.1473
Abstract
Phishing remains one of the most notorious forms of cybercrime and plays a part in most data breaches. It is a form of online fraud that exploits victims' psychological vulnerabilities. Although various other defences are available, URL-based detection is the most successful method for preventing phishing attacks, because it enables users to identify harmful intent from the content and form of URLs. Machine learning and deep learning models for this task already exist. Their ability to generalize is a concern, however, particularly in phishing scenarios where URLs are short-lived and campaigns typically use newly created domains that escape detection. In addition, the precise structure and encoding of URLs can differ from one network system to another, so datasets acquired from different entities may differ in these characteristics. To address these challenges, this work describes a novel model based on Unsupervised Domain Adaptation (UDA) whose goal is to increase the generalization capacity of phishing detection algorithms across domains. The work focuses mainly on early attack detection using transformer models and gradient boosting machines. Three main algorithms are used for attack detection: BERT, LSTM and Gradient Boosting Machine. I used a benchmark dataset containing 600,000 labelled URL samples. These URLs point to websites that are both accessible and legitimate: when web crawlers were used to reach them, an HTTP status code of 200 was returned. I split the dataset into training, validation and testing sets. Finally, I compared the proposed approach with previous studies; it achieved the highest accuracy, 96%, surpassing the results of earlier work.
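As a rough illustration of the data preparation mentioned in the abstract, the sketch below splits labelled URLs into training, validation and testing sets and derives a few lexical features of the kind a gradient boosting model could consume. The abstract gives neither the split ratios nor the feature set, so the 80/10/10 split, the feature names, the helper functions `lexical_features` and `split_dataset`, and the toy URLs are all assumptions made for illustration, not the author's pipeline.

```python
import random
from urllib.parse import urlsplit


def lexical_features(url: str) -> dict:
    """Hypothetical hand-crafted features; the abstract does not list any."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    return {
        "length": len(url),                          # total URL length
        "num_dots": host.count("."),                 # rough subdomain depth
        "num_digits": sum(ch.isdigit() for ch in url),
        "has_at": "@" in url,                        # '@' anywhere in the URL
        "uses_https": parts.scheme == "https",
    }


def split_dataset(samples, train=0.8, val=0.1, seed=0):
    """Shuffle and split into train/validation/test (ratios are assumed)."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)            # seeded for repeatability
    n_train = int(len(shuffled) * train)
    n_val = int(len(shuffled) * val)
    return (
        shuffled[:n_train],
        shuffled[n_train:n_train + n_val],
        shuffled[n_train + n_val:],                  # remainder is the test set
    )


# Toy placeholder data, not the 600,000-URL benchmark from the abstract.
samples = [(f"https://example.com/page{i}", 0) for i in range(50)] + [
    (f"http://login-verify{i}.example.net/@secure", 1) for i in range(50)
]
train_set, val_set, test_set = split_dataset(samples)
train_features = [lexical_features(url) for url, _ in train_set]
```

Fixing the shuffle seed keeps the three sets reproducible between runs, which matters when results are compared against earlier studies. A transformer such as BERT would normally take the raw URL string rather than hand-crafted features, so the feature step applies only to the gradient boosting branch in this sketch.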
License
Copyright (c) 2025 Contemporary Journal of Social Science Review

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
