Design of Efficient K-Means Clustering Algorithm with Improved Initial Centroids

Issue: Vol.5 No.1

Authors:

Afzali Maedeh (Manav Rachna International University, Faridabad)

Kumar Suresh (Manav Rachna International University, Faridabad)

Keywords: Data mining, Data analysis, Cluster analysis, Clustering algorithms, K-means clustering algorithm, Initial centroids.

Abstract: 

Data mining is the process of analyzing data from different perspectives and summarizing it into useful information. Clustering, one of the established data mining techniques, groups a set of items into classes of similar objects. A cluster is a group of data elements that are similar to one another and dissimilar to the elements in other clusters. One of the most common clustering algorithms is the k-means algorithm, which groups data with similar characteristics or features together. The k-means algorithm is computationally expensive, and because its initial centroids are selected arbitrarily, it does not produce the same clustering result across multiple runs on the same input. Several attempts have been made in the literature to improve the efficiency of the k-means algorithm. This paper proposes an efficient method for finding better initial centroids. The proposed algorithm produces more accurate and unique clustering results.
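
The dependence on initial centroids described in the abstract can be illustrated with a minimal sketch of standard (Lloyd-style) k-means in Python. This is not the initialization method proposed in the paper, which the abstract does not detail. The function names (`kmeans`, `sse`), the toy dataset, and the two sets of starting centroids are all assumptions chosen for illustration.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points: List[Point], initial_centroids: List[Point],
           max_iter: int = 100) -> Tuple[List[Point], List[int]]:
    """Standard k-means starting from caller-supplied initial centroids.

    Returns the final centroids and the cluster label of each point.
    """
    centroids = [tuple(c) for c in initial_centroids]
    labels: List[int] = []
    for _ in range(max_iter):
        # Assignment step: attach each point to its nearest centroid.
        new_labels = [
            min(range(len(centroids)),
                key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:  # no point changed cluster: converged
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = tuple(sum(c) / len(members)
                                     for c in zip(*members))
    return centroids, labels


def sse(points: List[Point], centroids: List[Point],
        labels: List[int]) -> float:
    """Sum of squared distances from each point to its assigned centroid."""
    return sum(squared_distance(p, centroids[lab])
               for p, lab in zip(points, labels))


# Toy data (assumed): four corners of a 4 x 1 rectangle, clustered with k = 2.
points = [(0, 0), (0, 1), (4, 0), (4, 1)]

# Two different starting choices on the same input.
centroids_a, labels_a = kmeans(points, [(0, 0), (4, 0)])
centroids_b, labels_b = kmeans(points, [(0, 0), (0, 1)])

print(labels_a, sse(points, centroids_a, labels_a))  # [0, 0, 1, 1] 1.0
print(labels_b, sse(points, centroids_b, labels_b))  # [0, 1, 0, 1] 16.0
```

On this input the first start converges to the compact left/right split, while the second gets stuck in a worse top/bottom split. With randomly drawn starting centroids, either outcome could occur from run to run, which is why a deterministic, well-chosen initialization yields repeatable results.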
