Comparative study of clustering algorithms on textual databases
Abstract: Revision with unchanged content. The large collection of Brazilian researchers' curricula vitae held on the Lattes Platform provides a formidable base for discovering information about people's skills, abilities and knowledge, which are referred to as competencies. Results from the analysis of competencies can be applied in human resources, project management and the planning of on-the-job training, in terms of how to foster collaboration, how to build or modify teams and how to direct resources. The study builds on Knowledge Discovery in Textual Databases (KDT) to analyze each step of the process, from data selection, term extraction and weighting to clustering and the interpretation of results for knowledge management. To divide an input dataset into competency-based groups that are not known a priori, two clustering algorithms have been implemented, namely the k-means algorithm and Kohonen Self-Organizing Maps (Kohonen-SOM). Two illustration techniques for competency-based groups are presented, together with decision rules for applying the two algorithms.
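The term weighting and clustering steps named in the abstract can be illustrated with a minimal sketch. It assumes TF-IDF as the weighting scheme and a deterministic farthest-first initialisation for k-means; the abstract does not say which weighting or initialisation the study used. The four toy documents and all function names are hypothetical, and the Kohonen-SOM variant is not covered here.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return (vocabulary, L2-normalised TF-IDF vectors), one vector per document."""
    tokenised = [doc.lower().split() for doc in docs]
    vocab = sorted({t for toks in tokenised for t in toks})
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(t for toks in tokenised for t in set(toks))
    idf = {t: math.log(n_docs / df[t]) + 1.0 for t in vocab}
    vectors = []
    for toks in tokenised:
        tf = Counter(toks)
        vec = [tf[t] * idf[t] for t in vocab]
        norm = math.sqrt(sum(x * x for x in vec)) or 1.0
        vectors.append([x / norm for x in vec])
    return vocab, vectors


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def farthest_first(vectors, k):
    """Assumed initialisation: start at the first vector, then repeatedly
    add the vector farthest from the centroids chosen so far."""
    centroids = [list(vectors[0])]
    while len(centroids) < k:
        nxt = max(vectors, key=lambda v: min(sq_dist(v, c) for c in centroids))
        centroids.append(list(nxt))
    return centroids


def kmeans(vectors, k, max_iter=50):
    """Plain k-means: alternate assignment and centroid update until stable."""
    centroids = farthest_first(vectors, k)
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(k), key=lambda c: sq_dist(v, centroids[c])) for v in vectors
        ]
        if new_labels == labels:
            break  # assignments no longer change
        labels = new_labels
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Hypothetical toy "competency" texts, not data from the study.
docs = [
    "machine learning neural networks",
    "neural networks deep learning",
    "organic chemistry polymer synthesis",
    "polymer synthesis catalysis chemistry",
]
vocab, vectors = tfidf_vectors(docs)
labels = kmeans(vectors, k=2)
print(labels)
```

On this toy input the two computing-related texts fall into one cluster and the two chemistry-related texts into the other. With real curricula the number of groups is not known in advance, so `k` would have to be chosen or varied by the analyst.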