Fuzzy Clustering of Students' Data Repository for At-risks Students Identification and Monitoring

In educational data mining, identifying the academic courses that contribute significantly to students' class of degree and predicting students' performance can guide the choice and improvement of intervention and support services for students who perform poorly. Experience shows that graduates with a weak class of degree find it difficult to gain employment; hence the need to identify and group these at-risk students at an early stage of their academic career and then develop a plan to improve their performance. This paper identifies academic courses with a significant contribution to academic performance and predicts students' graduating class of degree. 11Ants Model Builder provided a means for course rank analysis while MATLAB was the system development tool. The Fuzzy c-Means (FCM) algorithm was used to partition students into weak, average and good clusters. Four (4) natural clusters of at-risk students were automatically identified with the k-means algorithm. Results show that a Sugeno-type inference system is best suited to providing initial parameters for Adaptive Neuro-Fuzzy Inference System (ANFIS) training on the students' dataset. The results also demonstrate the effectiveness of combining FCM, k-means and ANFIS in the classification of students based on academic performance and at-risk levels. The results will help educational managers monitor groups of students at the same level of performance, and those at the boundary of two classes of degree, for the provision of informed counseling and intervention plans to improve academic performance.


Introduction
The increasing demand for information from educational planners has led to the existence of huge educational data repositories for use in the extraction of vital information. Chandra and Nandhini (2010) describe a students' result repository as a large data bank of students' raw scores and grades in the different courses enrolled for during their years of attendance in an institution. The data, which can be personal or academic, can be used to understand students' behaviour, assist instructors, improve teaching, evaluate and improve e-learning systems, and provide many other benefits (Romero & Ventura, 2007). As huge amounts of data are being collected and stored in databases, traditional statistical and database management techniques are no longer adequate for analyzing them (Kumar & Chadha, 2011). Although statistical approaches quantify the inherent uncertainty that results when one tries to infer general patterns from a particular sample or an overall population, they lack the ability to handle large, complex, noisy and vague datasets. Furthermore, these approaches perform a limited search during pattern extraction from databases, thereby producing incomplete and unreliable results, which cannot support effective decision-making (Fayyad, Piatetsky-Shapiro, & Smyth, 1996; Miller & Han, 2001). This has given rise to improved techniques and tools for automatic and intelligent analysis of huge data sets. One such technique is Data Mining (DM), which has attracted attention from researchers in many different fields such as database design, statistics, pattern recognition, machine learning, data visualization, biology, web applications, electronic commerce and so on (Piatetsky-Shapiro, 2007).
Cluster analysis is an unsupervised learning technique of DM which, through the examination of relationships existing between data, describes and groups them into classes or clusters (Yang, 1993; Rao & Vidyavathi, 2010; Shovon & Haque, 2012). It is a process of grouping data objects into disjoint clusters so that the data in the same cluster are similar while data belonging to different clusters are different (Kumari, Sharma, & Gaur, 2012).
Many clustering algorithms have been introduced, all based on the principle of maximizing the similarity between objects in the same cluster and maximizing the dissimilarity between objects of different clusters (Han, Kamber, & Tung, 2001). Baboo and Priya (2013) identify two broad categories of clustering algorithms: hard and fuzzy. Hard clustering (for example, k-means) partitions the data into a specified number of mutually exclusive subsets. It forces data points, in any sample space, that have attributes of more than one cluster into a particular cluster. Fuzzy clustering, however, allows objects to belong to more than one cluster simultaneously, with different degrees of membership. Objects on the boundaries between several classes are not forced to fully belong to only one of the classes, but rather are assigned membership degrees in the range [0, 1] indicating their partial membership in each class (Kumar, Verma, & Sharma, 2010; Baboo & Priya, 2013; Zuviria & Deepa, 2013). Fuzzy C-Means (FCM), Possibilistic C-Means (PCM), Fuzzy Possibilistic C-Means (FPCM) and Possibilistic Fuzzy C-Means (PFCM) are some of the fuzzy clustering techniques. Cluster analysis is a method of data description which provides a means of data analysis in various fields like machine learning, data mining, pattern recognition, image analysis and bio-informatics (Ghosh & Dubey, 2013). According to Yadav and Singh (2012), FCM clustering is the best model for students' academic performance modeling and serves as a good benchmark to monitor the progression of students in the educational domain.
Educational Data Mining is concerned with the extraction of useful, previously unknown patterns from educational databases for better understanding, improved educational performance and assessment of students' learning process (Chan, Chow, & Cheung, 2008; Romero, Ventura, & Garcia, 2008; Ngor, 2007; Castro, Nebot, & Mugica, 2007). Students' academic performance monitoring plays a vital role in higher institutions of learning. Grade Point Average (GPA) is a common factor used by academic planners to evaluate and monitor the progression of students (Sansgiry, Bhosle, & Sail, 2006; Oyelade, Oladipupo, & Obagbuwa, 2010; Yadav & Singh, 2012; Shovon & Haque, 2012). The academic performance of students during their first year at university is a turning point in the path of their educational performance and usually influences their Cumulative Grade Point Average (CGPA) significantly, which in turn affects their class of degree (Shovon & Haque, 2012). Because of this, grouping students into different clusters according to their performance has become a useful but complicated task (Oyelade et al., 2010). Yadav and Singh (2012) state that grouping by intelligence level is essential for maintaining the homogeneity of the group; otherwise it would be difficult to provide good educational services to a student population with highly diverse characteristics.
The main goal of fuzzy clustering analysis is to partition students into homogeneous groups according to their characteristics and abilities (Kifaya, 2009). Oyelade et al. (2010) demonstrated the combination of the k-means algorithm and the deterministic model provided in (Omolehin, Oyelade, Ojeniyi, & Rauf, 2005) in the prediction of students' academic performance. The system provided clusters of students and the overall performance of each cluster by cluster size. However, the system proposed in (Oyelade et al., 2010) lacked the facility to handle the uncertainty characterizing academic performance. Yadav and Singh (2012) proposed a rule-based Fuzzy Expert system for students' academic performance evaluation based on the FCM clustering algorithm, Fuzzy Logic (FL) and a regression analysis model. FL models lack self-learning and adaptive capabilities (Muzzammil, 2010), while regression models are effective only where there is no multicollinearity among the predictor variables (Petraitis, Dunham, & Niewiarowski, 1996; Giovanis, 2010). On the other hand, the Adaptive Neuro-Fuzzy Inference System (ANFIS) provides an intelligent way of reasoning and prediction, and performs better than regression models and other conventional statistical techniques (Aali, 2009; Muzzammil, 2010; Bisht & Jangid, 2011; Mahdavi & Khademi, 2012; Giovanis, 2010). ANFIS is a widely used model in studies of classification, estimation and prediction (Yayar, Hekim, Yilmaz, & Bakirci, 2011). This paper proposes a methodology based on a hybrid of the FCM and k-means algorithms and ANFIS for the prediction of students' performance and the classification of students based on performance and at-risk levels.

Overview of Fuzzy c-Means (FCM) Algorithm
The FCM algorithm is an iterative algorithm that generalizes the hard c-means algorithm to allow any point to belong partially to multiple clusters (Kumar et al., 2010). The aim of FCM is to find cluster centers that minimize a dissimilarity function and then partition a finite collection of elements, $X = \{x_1, x_2, x_3, \ldots, x_n\}$, into a collection of fuzzy clusters, $C = \{c_1, c_2, \ldots, c_p\}$, with respect to some given criterion (Ekong, Onibere & Imianvan, 2011). The algorithm is implemented in the following steps (Inyang & Akinyokun, 2006; Kumar et al., 2010; Imianvan & Obi, 2011; Zuviria & Deepa, 2013).

i. Set m, c, and ε, and initialize the partition matrix U subject to the membership constraints,

where,
U - partition matrix
$u_{ij}$ - degree of membership of $x_i$ in cluster j
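The iteration that these steps describe can be illustrated in code. What follows is a minimal pure-Python sketch of the standard textbook FCM loop for one-dimensional data such as GPAs, not the authors' MATLAB implementation. It assumes the usual constraints (each membership lies in [0, 1] and each point's memberships sum to 1), a random initial partition matrix, and a stopping rule based on the largest change in any membership. The function name `fcm_1d` and its parameter names are hypothetical.

```python
import random


def fcm_1d(data, c=3, m=2.0, eps=0.01, max_iter=200, seed=0):
    """Standard fuzzy c-means on scalar data (illustrative sketch).

    Returns (centres, U), where U[i][j] is the membership of data[i]
    in cluster j. Requires m > 1.
    """
    rng = random.Random(seed)
    # Assumed initialisation: random memberships, each row normalised to 1.
    U = []
    for _ in data:
        row = [rng.random() + 1e-9 for _ in range(c)]
        total = sum(row)
        U.append([u / total for u in row])

    centres = []
    for _ in range(max_iter):
        # Centre update: mean of the data weighted by u_ij ** m.
        centres = []
        for j in range(c):
            weights = [U[i][j] ** m for i in range(len(data))]
            centres.append(sum(w * x for w, x in zip(weights, data)) / sum(weights))

        # Membership update from the distances to every centre.
        new_U = []
        for x in data:
            dists = [abs(x - v) for v in centres]
            if 0.0 in dists:
                # The point sits exactly on a centre: full membership there.
                hits = [1.0 if d == 0.0 else 0.0 for d in dists]
                row = [h / sum(hits) for h in hits]
            else:
                inv = [d ** (-2.0 / (m - 1.0)) for d in dists]
                total = sum(inv)
                row = [v / total for v in inv]
            new_U.append(row)

        # Stop once no membership changes by eps or more.
        change = max(abs(a - b)
                     for new_row, old_row in zip(new_U, U)
                     for a, b in zip(new_row, old_row))
        U = new_U
        if change < eps:
            break
    return centres, U


if __name__ == "__main__":
    gpas = [1.0, 1.2, 1.4, 2.9, 3.0, 3.1, 4.4, 4.6, 4.8]  # made-up GPAs
    centres, U = fcm_1d(gpas)
    print(sorted(round(v, 2) for v in centres))
```

Each row of `U` can be read as one student's degrees of membership in the clusters, and the cluster with the largest value is the student's most likely group.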

Dataset Selection, Description and Preprocessing
The students' dataset used for the training and analysis consists of six sets of Bachelor of Science, Computer Science graduands. The input variables of interest are the performances of students in first year courses, while the target variable is the students' graduating class of degree. Each student's performance in a course, measured by the score in the course, is an aggregation of the Continuous Assessment (CA) score and the examination score. The CA score constitutes 30% of the total score while the examination is 70%. The various grades associated with scores are described in Equation 5. Each grade is assigned a numerical value or point; grade 'F' attracts 0 points, 'E' is weighted 1, 'C' attracts 3 points while an 'A' grade is weighted 5 points. The product of the point and the Course Unit (Credit Hours) yields the Quality Point (QP) earned by a student in the course. The academic standing of any student is based on GPA, determined by dividing the total QP earned by a student in a semester by the total credit hours of courses registered for during the semester. The class of degree of any student depends on the graduating CGPA, which is in the range [0.00, 5.00].
Experience shows that students who perform below second-class lower division are regarded as having performed poorly; those with second-class lower division are considered average students, while students with second-class upper division and first class are rated as very intelligent. In this work, three clusters of students, Weak (Third Class and Pass Degrees), Average (Second Class Lower Division) and Good (Second Class Upper and First Class Honours), were considered. A summary of the clusters and their respective CGPA is presented in Table 1. Students who failed to complete their studies on account of voluntary withdrawal and those who had missing result(s) in any of the courses were excluded from the dataset. The research relied on 496 complete records of Bachelor of Science (B.Sc.), Computer Science students admitted from 1999 to 2005 in the University of Uyo, Nigeria. The sequence of steps adopted in the clustering of the students' data repository is presented in Figure 1.
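The GPA arithmetic described above can be made concrete with a short worked example. This Python sketch is illustrative only: the points for 'F', 'E', 'C' and 'A' come from the text, while the points for 'B' and 'D' are assumed to complete the 5-point scale, and the semester results are made up.

```python
# Points for F, E, C and A are stated in the text. B = 4 and D = 2 are
# assumptions made here to complete the usual 5-point scale.
GRADE_POINTS = {"A": 5, "B": 4, "C": 3, "D": 2, "E": 1, "F": 0}


def gpa(results):
    """Compute a semester GPA.

    results: list of (grade, credit_units) pairs, one per registered course.
    """
    # Quality Point (QP) per course = grade point * credit units.
    quality_points = sum(GRADE_POINTS[grade] * units for grade, units in results)
    total_units = sum(units for _, units in results)
    return quality_points / total_units


if __name__ == "__main__":
    # Hypothetical semester: QP = 15 + 9 + 2 + 0 = 26 over 10 credit units.
    semester = [("A", 3), ("C", 3), ("E", 2), ("F", 2)]
    print(round(gpa(semester), 2))  # prints 2.6
```

A failed course contributes no quality points but still counts in the credit-unit total, which is why a single 'F' pulls the GPA down noticeably.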

Input Variables Rank Analysis
An attribute importance analysis is necessary to rank attributes based on their contribution to the target values and to reduce the number of input variables for prediction. It also helps to increase the speed and accuracy of models in the prediction task (Paris, Affendey, & Mustapha, 2010; Inyang, Njungbwen, & Inyang, 2009; Saltelli, 2002). In the domain of educational data mining, identifying dominant courses can help improve the intervention and support services, at an early stage, for students who perform poorly in their studies. It will also help instructors to concentrate on the teaching of courses that have a high rate of failure and a significant contribution to academic performance (Chamillard, 2006). 11Ants offers a straightforward and effective means of estimating and testing a very large number of models and ensembles within the convenient and familiar framework of MS Excel. 11Ants also has strengths in terms of a wider range of algorithms, easy data preparation tools and support for very large datasets (Inyang, 2011). 11Ants Model Builder was used to determine each course's influence on the class of degree. A rank of courses based on their importance is presented in Table 2.

Fuzzy c-Means System and Results
This paper considers a method by which fuzzy membership functions are created for clusters of students based on the GPA derived from the significant first year courses identified in Section 3.2 and the vector of degrees of membership of students in the weak students cluster. The system was developed in Matrix Laboratory (MATLAB) with MS Excel as the backend engine. The major components of the system are the Knowledge Base (KB) and the Inference Engine. The KB contains students' raw scores, grades and GPAs. It also contains fuzzy rules, linguistic values and fuzzy membership values. Two clustering algorithms, FCM and k-means, together with the Adaptive Neuro-Fuzzy Inference System (ANFIS), drive the Inference Engine. The FCM parameters were set as follows: m=2, ε=0.01, and k=200. The arithmetic mean of all the data points was the initial cluster center. The performance of the FCM objective function while partitioning the training dataset into three clusters is depicted in Figure 2. Figure 3 shows that 126 students have their highest membership degrees in the Weak students cluster, 170 students are in the Average students cluster, while the Good students cluster has 150 students. The large symbols represent the final cluster centres obtained from the training. A summary of the cluster centres and numbers of students is given in Table 3.
Fuzzy Inference System (FIS) structures were generated for ANFIS. Sugeno-type (fismat1) and Mamdani-type (fismat2) FISs were built by extracting rules that model the behaviour of the students' dataset, using membership functions for the rules' antecedent and consequent parts. The third FIS (fismat3), a Sugeno-type, was generated using subtractive clustering to determine the number of rules and antecedent membership functions, and the linear least squares estimation method to determine each rule's consequent. The summary of the performances of these FISs on the dataset is presented in Table 4. The performance metric used was the Root Mean Squared Error (RMSE) for training (trnRMSE) and testing (chkRMSE) of the dataset, as in (Hossain & Ahmad, 2012). The results show that the Sugeno-type FIS has a lower RMSE for both training and testing sessions, and is therefore better than the Mamdani FIS. However, trnRMSE for fismat1 is slightly higher than that of fismat3, though with a lower chkRMSE of 0.2842. Fismat3 was chosen for the provision of initial parameters for training ANFIS. The properties of the ANFIS model are presented in Table 5. The ANFIS training errors depicted in Figure 4 show that the error settles at the 100th epoch as the ANFIS attempts to minimize the error.
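For reference, the RMSE metric used to compare the FISs can be written in a few lines. This is a generic Python sketch of the standard definition, not the authors' code; the function name `rmse` and the sample values are hypothetical.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


if __name__ == "__main__":
    # Made-up target and model outputs, for illustration only.
    target = [2.0, 3.0, 4.0, 1.5]
    output = [2.2, 2.7, 3.9, 1.9]
    print(round(rmse(target, output), 4))
```

Computing this once on the training records and once on the held-out records gives the two figures the text calls trnRMSE and chkRMSE.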

Model Validation and Evaluation
The system was validated and evaluated using the test dataset. Figure 5 shows the plot of test data and predicted values, while Figure 6 is the graphical representation of the testing error. The circles in Figure 5 represent the predicted output while the lines represent the test dataset. The graph in Figure 6 shows that the smallest value of the testing data error occurs at the 58th epoch, after which it increases slightly, with an average step size value of 0.00042, even as ANFIS continues to minimize the error against the training data up to the 200th epoch. The RMSE of the model on the test data is 0.2819; this shows that the system's performance is satisfactory and suitable for the prediction and clustering of students based on performance level. The results of FCM clustering of the test dataset are presented in Table 6 and Figure 7. They show that student number 1 belongs to the weak students cluster with a degree of membership of 0.79, while student number 2 has a 90% likelihood of graduating with a second class lower degree (Average students cluster), and 2% and 8% likelihood of belonging to the weak and good students clusters respectively. In addition, students number 4, 5, 7, 14, 19, 27, 42, 48 and 49 are at risk of attrition, with weak students' membership values greater than 90%. These students may be advised to withdraw from the programme. Students number 17, 22, 34, 36, 43, 45 and 47 have competing degrees of membership in two clusters. A plan for these students should be provided to either move them from a lower cluster to a better one or sustain them in the better cluster. For example, student number 34 has a membership value of 0.46 in the weak students cluster and 0.47 in the average students cluster, and therefore should be monitored for the improvement required to sustain the student in the average cluster. Furthermore, student number 37 shows 0.25, 0.53 and 0.24 probability of having a weak, average and good class of degree respectively. In this case, monitoring is required to minimize poor performance, either to sustain him/her in the average students cluster or to move him/her to the good students cluster.
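The way these membership values are read can be sketched as a simple rule. The Python below is illustrative only. The 0.90 attrition threshold follows the text, but the `margin` used to call two memberships "competing" is a hypothetical choice, as is the function name. The good-cluster value of 0.07 for student 34 is not stated in the text and is assumed here so that the three memberships sum to 1.

```python
def review_student(weak, average, good, at_risk=0.90, margin=0.10):
    """Return (most likely cluster, status) from FCM membership degrees.

    at_risk follows the text's "greater than 90%" rule for the weak cluster.
    margin is an assumed cut-off for calling two memberships competing.
    """
    memberships = {"weak": weak, "average": average, "good": good}
    ranked = sorted(memberships, key=memberships.get, reverse=True)
    top, runner_up = ranked[0], ranked[1]
    if weak > at_risk:
        status = "at risk of attrition"
    elif memberships[top] - memberships[runner_up] <= margin:
        status = "borderline between " + runner_up + " and " + top
    else:
        status = "clearly " + top
    return top, status


if __name__ == "__main__":
    # Student 2 from the text: 2% weak, 90% average, 8% good.
    print(review_student(0.02, 0.90, 0.08))
    # Student 34 from the text: 0.46 weak, 0.47 average (0.07 good assumed).
    print(review_student(0.46, 0.47, 0.07))
```

A rule like this only flags students for attention; the choice of intervention remains a judgment for the counsellor.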

Clustering of At-Risks Students
The students' degrees of membership in the weak students' cluster, shown in Table 6, formed the vector used in determining the natural number of clusters and in the actual clustering of at-risk students via the k-means algorithm. The silhouette plot presented in Figure 8 shows that the optimal number of clusters for the dataset is four (4). As shown in Figure 8, most points in all the clusters have a large silhouette value, greater than 0.61, indicating that the clusters are well separated from each other. In addition, the mean silhouette value of 0.70 supports the choice of four clusters. The result of the classification of at-risk students into clusters, with their silhouette values, is presented in Table 7. As shown in Table 7, out of the 50 students in the test dataset, 23 students are in cluster 1, with no risk of performing poorly or having a weak class of degree. These students have very high chances of graduating with a minimum of second class lower; monitoring of this category of students is therefore required to sustain them in their respective clusters. The second cluster comprises students whose degree of membership in the weak students cluster is in the range [0.23, 0.51]. These students have competing degrees of membership in the Weak and Average students clusters, and hence require intervention plans to reduce their chances of belonging to the weak students cluster. The linguistic value indicating the at-risk level of this cluster is Slightly Risky. In cluster 3, students have a high probability of poor performance and are likely to spend more than the stipulated minimum duration of the programme; these students should be closely monitored for possible improvement to reduce their chances of retrogressing into the fourth cluster. The at-risk level of cluster 4 is Very Risky, with the likelihood of having a pass class of degree above 88%. This category of students is at risk of attrition and may be advised to change or withdraw from the programme. An approach similar to that described in (Ekong, Inyang, & Onibere, 2012) guided the choice of a triangular membership function. The fuzzy membership function for the at-risk students clusters is presented in Equation 6.
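The k-means and silhouette steps used in this section can be sketched as follows. This pure-Python sketch is not the authors' MATLAB code. It assumes a deterministic start from evenly spaced order statistics and uses absolute distance on the scalar membership values, so the silhouette values it produces need not match those of a tool that uses a different distance. All function names and the sample values are hypothetical.

```python
def kmeans_1d(values, k, max_iter=100):
    """Lloyd's k-means for scalar data; returns (centres, labels)."""
    ordered = sorted(values)
    n = len(ordered)
    # Assumed initialisation: k evenly spaced order statistics.
    centres = [ordered[(2 * j + 1) * n // (2 * k)] for j in range(k)]
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda j: abs(v - centres[j]))
                      for v in values]
        if new_labels == labels:
            break  # assignments are stable
        labels = new_labels
        for j in range(k):
            members = [v for v, lab in zip(values, labels) if lab == j]
            if members:  # keep the old centre if a cluster empties
                centres[j] = sum(members) / len(members)
    return centres, labels


def silhouette_values(values, labels):
    """Silhouette s = (b - a) / max(a, b) for every point."""
    clusters = {}
    for v, lab in zip(values, labels):
        clusters.setdefault(lab, []).append(v)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")
    scores = []
    for v, lab in zip(values, labels):
        own = clusters[lab]
        if len(own) == 1:
            scores.append(0.0)  # usual convention for singleton clusters
            continue
        # a: mean distance to the other members of the point's own cluster.
        a = sum(abs(v - w) for w in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster.
        b = min(sum(abs(v - w) for w in other) / len(other)
                for other_lab, other in clusters.items() if other_lab != lab)
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return scores


def mean_silhouette(values, k):
    """Mean silhouette value of the k-means partition with k clusters."""
    _, labels = kmeans_1d(values, k)
    scores = silhouette_values(values, labels)
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Made-up degrees of membership in the weak students cluster.
    weak = [0.02, 0.05, 0.08, 0.30, 0.35, 0.40,
            0.62, 0.66, 0.70, 0.91, 0.95, 0.97]
    for k in range(2, 7):
        print(k, round(mean_silhouette(weak, k), 3))
```

Comparing the mean silhouette across candidate values of k, as the loop at the end does, is one common way to choose the number of clusters.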

Conclusion
Predicting students' performance is useful in identifying students who are likely to perform poorly in their studies. The fuzzy clustering technique has been used to perform important analysis in the educational environment in support of decisions that enhance educational standards. Significant first year courses, based on their percentage contribution to the graduating class of degree, were identified and used for the experiment. The proposed system provides a grouping of students based on their level of performance and discovers the degree of membership of students in each cluster. Four (4) natural clusters of at-risk students were automatically identified with the k-means algorithm. Results show that a Sugeno-type inference system is best suited to providing initial parameters for ANFIS training on the students' dataset. In addition, the FCM and ANFIS models provide an intelligent way of predicting and classifying students' academic performance. The results also demonstrate the effectiveness of combining FCM, the k-means algorithm and ANFIS in the classification of students based on academic performance and at-risk levels. As further work, the fuzzy membership functions and ANFIS rules could be used in the knowledge base of expert systems in the domain of students' performance estimation and at-risk level prediction. These applications will help instructors, educational managers and students to improve the quality of performance by discovering at-risk students at an early stage of their academic career and then developing a plan for minimizing poor performance and maximizing opportunities for excellent performance.

Figure 1. Methodology for fuzzy clustering of students' data repository

Figure 2. Graph of objective function values for students' dataset clustering

Figure 4. Graph of training error of FCM clustering system

Figure 5. Graph of test dataset output and model output

Figure 6. Graph of model testing error

Figure 7. Graph of FCM Degree of Membership of Students in each cluster

Figure 8. Silhouette plot for at-risk students clusters

Table 1. Clusters and clustering criteria of students

Table 2. Course importance value and rank

The results depicted in Table 2 show that core departmental courses like MTH 121 (General Mathematics II), MTH 111 (General Mathematics I), CSC 111 (Introduction to Computer Science) and CSC 121 (Introduction to Computer Programming) have a stronger correlation with class of degree than university-wide courses like GST 111 (Use of English II) and GST 112 (Nigerian People and Culture). GST 122 (Introduction to Philosophy and Logic), a university-wide course, also proved significant. Courses that have an importance value greater than 45% have a cumulative effect of 79% on the class of degree, and hence are significant. Courses with a weight less than 0.46 were noisy and insignificant, and were therefore pruned from the dataset. The dataset was randomly split into two: 446 records as the training dataset and 50 as the test dataset. Students' GPAs derived from the identified significant first year courses were used for the training and clustering of students. In this work, at-risk students are those students who perform poorly and are likely to graduate with a weak class of degree. Each student's at-risk level is determined by his/her degree of membership in the weak students cluster.
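The pruning and splitting steps described here can be sketched briefly. The Python below is illustrative only: the course codes come from the text, but the importance values attached to them are made up because Table 2 is not reproduced here, and the function names are hypothetical. The 0.46 cut-off and the 446/50 split sizes follow the text.

```python
import random


def prune_courses(importance, threshold=0.46):
    """Keep courses whose importance is at least threshold, highest first."""
    kept = [course for course, weight in importance.items() if weight >= threshold]
    return sorted(kept, key=lambda course: importance[course], reverse=True)


def split_records(records, n_test=50, seed=0):
    """Randomly split records into (training, test) lists."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    return shuffled[n_test:], shuffled[:n_test]


if __name__ == "__main__":
    # Hypothetical importance values; the real ones are in Table 2.
    importance = {"MTH 121": 0.81, "CSC 111": 0.64, "GST 122": 0.52, "GST 111": 0.30}
    print(prune_courses(importance))

    student_ids = list(range(496))  # stand-ins for the 496 complete records
    training, test = split_records(student_ids)
    print(len(training), len(test))
```

Fixing the random seed makes the split reproducible, which is useful when comparing several models on the same held-out records.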

Table 3. Description of performance level clusters of students

Table 5. Description of training parameters for ANFIS

Table 6. FCM degree of membership of students in each cluster

Table 7. At-Risk level clustering of students
* Degree of Membership of Students in the Weak Students Cluster.