Similarity Matrix Based Session Clustering by Sequence Alignment Using Dynamic Programming

K. Duraiswamy, V. Valli Mayil


With the rapidly increasing popularity of the WWW, websites play a crucial role in conveying knowledge to end users. Every request to a website, or transaction on the server, is recorded in a file called the server log file. Providing Web administrators with meaningful information about user access behavior (also called click stream data) has become a necessity for improving the quality of Web information and service performance. As such, the hidden knowledge obtained from mining web server traffic data and user access patterns (called Web Usage Mining) can be used directly for marketing and management of E-business, E-services, E-searching, E-education and so on.

Categorizing visitors or users based on their interaction with a website is a key problem in web usage mining. The click streams generated by different users often follow distinct patterns. Clustering these access patterns yields knowledge that may help in tasks such as finding the learning patterns of users in an E-learning recommender system, finding groups of visitors with similar interests, providing customized content in a site manager, and categorizing customers in E-shopping.

Given session information, this paper focuses on a method for finding session similarity by sequence alignment using dynamic programming, and proposes a similarity matrix as a model for representing session similarity measures. The work follows the Agglomerative Hierarchical Clustering method to cluster the similarity matrix in order to group similar sessions, and the clustering process is depicted in a dendrogram.
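
The pipeline summarized above (alignment by dynamic programming, a session similarity matrix, then agglomerative clustering) can be illustrated with a minimal sketch. It is not the authors' implementation: the global alignment recurrence, the scoring values (MATCH, MISMATCH, GAP), the normalization of scores to the range 0 to 1, the average-linkage rule, and all function names are assumptions chosen for illustration, since the abstract does not specify them.

```python
from itertools import combinations

# Assumed scoring scheme (not taken from the paper).
MATCH, MISMATCH, GAP = 2, -1, -1


def alignment_score(a, b):
    """Global alignment score of two page sequences via dynamic programming."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):          # aligning a prefix of a against nothing
        score[i][0] = i * GAP
    for j in range(1, cols):          # aligning a prefix of b against nothing
        score[0][j] = j * GAP
    for i in range(1, rows):
        for j in range(1, cols):
            pair = MATCH if a[i - 1] == b[j - 1] else MISMATCH
            score[i][j] = max(score[i - 1][j - 1] + pair,   # align the two pages
                              score[i - 1][j] + GAP,        # gap in b
                              score[i][j - 1] + GAP)        # gap in a
    return score[-1][-1]


def similarity(a, b):
    """Assumed normalization: score / best possible score, clamped at 0."""
    best = MATCH * max(len(a), len(b))
    if best == 0:
        return 1.0
    return max(0.0, alignment_score(a, b) / best)


def similarity_matrix(sessions):
    """Symmetric matrix of pairwise session similarities."""
    return [[similarity(s, t) for t in sessions] for s in sessions]


def agglomerative(sim):
    """Average-linkage clustering; returns the merge history (the dendrogram)."""
    def link(c1, c2):
        return sum(sim[i][j] for i in c1 for j in c2) / (len(c1) * len(c2))

    clusters = [(i,) for i in range(len(sim))]
    merges = []
    while len(clusters) > 1:
        # Merge the two clusters with the highest average similarity.
        x, y = max(combinations(clusters, 2), key=lambda p: link(*p))
        merges.append((x, y, link(x, y)))
        clusters = [c for c in clusters if c not in (x, y)] + [x + y]
    return merges


# Hypothetical sessions: each is the ordered list of pages a visitor requested.
sessions = [list("ABCD"), list("ABCE"), list("XYZ"), list("XYW")]
merges = agglomerative(similarity_matrix(sessions))
```

Each entry of `merges` records the two clusters joined and the similarity at which they were joined, which is the information a dendrogram draws. With the hypothetical sessions above, sessions 0 and 1 merge first, then sessions 2 and 3, and the two resulting groups merge last.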


Computer and Information Science   ISSN 1913-8989 (Print)   ISSN 1913-8997 (Online)
Copyright © Canadian Center of Science and Education
