Research
High-dimensional datasets frequently occur in social media, text processing, image recognition, and bioinformatics. Such datasets can contain tens of thousands of features while only hundreds of samples, often fewer than a hundred, are available. High dimensionality can hurt classifier performance by increasing the risk of overfitting and the computational cost. Feature selection is therefore a significant part of many machine learning applications that deal with small-sample, high-dimensional data, and choosing the most important features is essential for knowledge discovery in many areas.
We propose new feature selection methods and evaluate existing ones from several points of view, such as stability, the ability to correctly identify relevant features, and the influence on classification accuracy.
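To illustrate one of these criteria, here is a minimal sketch of how selection stability can be measured: a feature selector (a simple correlation filter is used here purely as a stand-in) is run on bootstrap resamples of the data, and the average pairwise Jaccard similarity of the selected subsets is reported. The scoring function and parameter names are illustrative assumptions, not our actual methodology.

```python
import numpy as np

def top_k_by_correlation(X, y, k):
    # Illustrative filter: score each feature by the absolute Pearson
    # correlation with the labels and return the indices of the k best.
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    denom = np.linalg.norm(Xc, axis=0) * np.linalg.norm(yc) + 1e-12
    scores = np.abs(Xc.T @ yc) / denom
    return set(np.argsort(scores)[-k:].tolist())

def selection_stability(X, y, k, runs=20, seed=0):
    # Average pairwise Jaccard similarity of the subsets selected on
    # bootstrap resamples: 1.0 means a perfectly stable selection.
    rng = np.random.default_rng(seed)
    n = len(y)
    subsets = []
    for _ in range(runs):
        idx = rng.integers(0, n, size=n)  # bootstrap sample with replacement
        subsets.append(top_k_by_correlation(X[idx], y[idx], k))
    sims = [len(a & b) / len(a | b)
            for i, a in enumerate(subsets) for b in subsets[i + 1:]]
    return float(np.mean(sims))
```

On data with a few strongly informative features, the stability score stays close to 1; on pure noise it drops sharply, which is what makes it a useful diagnostic for small-sample settings.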
Feel free to try out our Weighted nearest neighbors method for feature selection.
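As a rough intuition for how nearest-neighbor-based feature weighting can work, below is a minimal Relief-style sketch: features that differ on the nearest neighbor of the opposite class are rewarded, and features that differ on the nearest neighbor of the same class are penalized. This is a generic illustration of the idea, not the implementation of our method.

```python
import numpy as np

def relief_weights(X, y, seed=0):
    # Relief-style feature weighting (illustrative, not our method):
    # for each instance, reward features that separate it from its
    # nearest miss (other class) and penalize features that separate
    # it from its nearest hit (same class).
    rng = np.random.default_rng(seed)
    n, d = X.shape
    # Scale features to [0, 1] so per-feature differences are comparable.
    span = X.max(axis=0) - X.min(axis=0) + 1e-12
    Xs = (X - X.min(axis=0)) / span
    w = np.zeros(d)
    for i in rng.permutation(n):
        dist = np.abs(Xs - Xs[i]).sum(axis=1)  # Manhattan distances
        dist[i] = np.inf                       # exclude the instance itself
        same = y == y[i]
        hit = np.argmin(np.where(same, dist, np.inf))
        miss = np.argmin(np.where(~same, dist, np.inf))
        w += np.abs(Xs[miss] - Xs[i]) - np.abs(Xs[hit] - Xs[i])
    return w / n
```

Features with large positive weights are the ones that consistently discriminate between the classes in local neighborhoods; ranking by these weights gives a simple feature selection criterion.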