Research
High-dimensional datasets frequently occur in social media, text processing, image recognition, and bioinformatics. Such datasets can contain tens of thousands of features while only hundreds of samples, often fewer than a hundred, are available. High dimensionality can hurt classifier performance by increasing the risk of overfitting and the computational cost. Feature selection is therefore a significant part of many machine learning applications that deal with small-sample, high-dimensional data, and choosing the most important features is essential for knowledge discovery in many areas.
We propose new feature selection methods and evaluate existing ones from several points of view, such as stability, the ability to correctly identify relevant features, and the influence on classification accuracy.
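To illustrate one of these criteria, here is a minimal sketch of how selection stability can be measured: a feature selector (a simple correlation filter is used here purely as a stand-in) is run on bootstrap resamples of the data, and the average pairwise Jaccard similarity of the selected subsets is reported. The scoring function and parameter names are illustrative assumptions, not our actual methodology.

```python
import numpy as np

def top_k_by_correlation(X, y, k):
    # Illustrative filter: score each feature by the absolute Pearson
    # correlation with the labels and return the indices of the k best.
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    denom = np.linalg.norm(Xc, axis=0) * np.linalg.norm(yc) + 1e-12
    scores = np.abs(Xc.T @ yc) / denom
    return set(np.argsort(scores)[-k:].tolist())

def selection_stability(X, y, k, runs=20, seed=0):
    # Average pairwise Jaccard similarity of the subsets selected on
    # bootstrap resamples: 1.0 means a perfectly stable selection.
    rng = np.random.default_rng(seed)
    n = len(y)
    subsets = []
    for _ in range(runs):
        idx = rng.integers(0, n, size=n)  # bootstrap sample with replacement
        subsets.append(top_k_by_correlation(X[idx], y[idx], k))
    sims = [len(a & b) / len(a | b)
            for i, a in enumerate(subsets) for b in subsets[i + 1:]]
    return float(np.mean(sims))
```

On data with a few strongly informative features, the stability score stays close to 1; on pure noise it drops sharply, which is what makes it a useful diagnostic for small-sample settings.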
Feel free to try out our Weighted nearest neighbors method for feature selection.
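As a rough intuition for how nearest-neighbor-based feature weighting can work, below is a minimal Relief-style sketch: features that differ on the nearest neighbor of the opposite class are rewarded, and features that differ on the nearest neighbor of the same class are penalized. This is a generic illustration of the idea, not the implementation of our method.

```python
import numpy as np

def relief_weights(X, y, seed=0):
    # Relief-style feature weighting (illustrative, not our method):
    # for each instance, reward features that separate it from its
    # nearest miss (other class) and penalize features that separate
    # it from its nearest hit (same class).
    rng = np.random.default_rng(seed)
    n, d = X.shape
    # Scale features to [0, 1] so per-feature differences are comparable.
    span = X.max(axis=0) - X.min(axis=0) + 1e-12
    Xs = (X - X.min(axis=0)) / span
    w = np.zeros(d)
    for i in rng.permutation(n):
        dist = np.abs(Xs - Xs[i]).sum(axis=1)  # Manhattan distances
        dist[i] = np.inf                       # exclude the instance itself
        same = y == y[i]
        hit = np.argmin(np.where(same, dist, np.inf))
        miss = np.argmin(np.where(~same, dist, np.inf))
        w += np.abs(Xs[miss] - Xs[i]) - np.abs(Xs[hit] - Xs[i])
    return w / n
```

Features with large positive weights are the ones that consistently discriminate between the classes in local neighborhoods; ranking by these weights gives a simple feature selection criterion.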