Abstract
The Internet's pervasive reach makes it easy for malware to propagate, yet protecting against it remains challenging. Machine learning-based malware detection models can help, but their detection rates vary with the malware they encounter and how they classify it. In addition, the effectiveness of different machine learning algorithms for malware detection depends on the adequacy of their classifiers, even when an appropriate training dataset is used. This research proposes a method for identifying malicious software. The method combines an isolation forest with a machine learning approach to distinguish malicious software from benign files. The study also proposes voting methods for reaching the final decision. The isolation forest searches for outliers in the data using many decision trees. Because it is unsupervised, it does not need labeled data for model training. Each tree selects a feature at random and then splits the data at a random value between that feature's minimum and maximum. The tree repeats this procedure until every possible split in the data has been made or a maximum number of splits has been reached. Because anomalies and outliers tend to be isolated after only a few splits, points that are separated early can be flagged and sorted out of the data. The proposed method is tested on the KISA CISC2017 dataset. In an experiment using 96,724 out-of-the-ordinary samples, was