Study on the Data Mining Algorithm Based on Positive and Negative Association Rules

In this article, we systematically analyze association rule mining, review the typical association rule mining algorithms and their basic principles, and objectively compare the differences among them. We use correlation to measure the relations among item sets, give the computation of the support level and confidence level of negative association rules on the basis of traditional association rules, and analyze the operating principle and implementation of the resulting algorithm. An experimental test indicates that the algorithm is effective.


Introduction
Traditional database applications emphasize online transaction processing (OLTP), whose main task is to execute online transactions and query processing. The rapid growth of commercial OLTP databases requires that data mining technology, which can supply supporting information for decision-making, develop quickly as well, so that online analytical processing (OLAP) can acquire and utilize information from the database.
At present, domestic research on data mining concentrates mainly on the optimization and improvement of algorithms. Building on earlier research results, we study association rules from another angle, namely the negative association rule, and combine it with the traditional association rule to form positive and negative association rules, which makes the theory of association rules more complete.

Basic concept of association rule
The association rule is one of the most important kinds of rules that data mining can discover. It is used to find interesting associations among item sets in large amounts of data; for example, association rule mining can be used to find associations among different commodities (items) in a transaction database. Suppose I = {i1, i2, …, im} is the set of items, and the task-relevant data D is the set of transactions in the database, where each transaction T is a set of items with T ⊆ I. Each transaction has an identifier called its TID.
Suppose X is a set of items in I, called an item set; a transaction T contains X if and only if X ⊆ T. An association rule is an implication of the form X ⇒ Y, where X ⊆ I, Y ⊆ I and X ∩ Y = ∅. The rule X ⇒ Y holds in the transaction set D with support level s (support) if and only if s is the proportion of transactions in D that contain X ∪ Y, i.e.
s = support(X ⇒ Y) = P(X ∪ Y) = |{T | X ∪ Y ⊆ T ∧ T ∈ D}| / |D|.
The rule X ⇒ Y holds in D with confidence level c (confidence) if and only if c is the proportion of the transactions in D containing X that also contain Y, i.e.
c = confidence(X ⇒ Y) = P(Y | X) = support(X ∪ Y) / support(X).
An item set that contains k items is called a k-item set; for example, {printer, computer} is a 2-item set. The occurrence frequency of an item set is the number of transactions that contain it, also called the frequency, the support count, or simply the count of the item set. An item set satisfies the minimum support min_sup if and only if its occurrence frequency is greater than or equal to the product of min_sup and the number of transactions in D. An item set that satisfies min_sup is called a frequent item set, and the set of frequent k-item sets is generally denoted L_k. Association rules that satisfy both min_sup and min_conf (the minimum confidence level) are called strong rules.
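The definitions above can be illustrated with a minimal Python sketch. The toy database D and the helper names (support_count, support, confidence, is_frequent, is_strong) are assumptions made for illustration and are not taken from the original algorithm.

```python
# Hypothetical toy transaction database D; each transaction is a set of items.
D = [
    {"printer", "computer", "paper"},
    {"computer", "paper"},
    {"printer", "computer"},
    {"paper"},
    {"printer", "computer", "ink"},
]

def support_count(itemset, transactions):
    """Number of transactions T with itemset ⊆ T (the support count)."""
    return sum(1 for t in transactions if itemset <= t)

def support(itemset, transactions):
    """Proportion of transactions that contain the item set."""
    return support_count(itemset, transactions) / len(transactions)

def confidence(x, y, transactions):
    """confidence(X ⇒ Y) = support(X ∪ Y) / support(X).

    Assumes X occurs in at least one transaction; otherwise the ratio is undefined.
    """
    return support_count(x | y, transactions) / support_count(x, transactions)

def is_frequent(itemset, transactions, min_sup):
    """Count must reach min_sup times the number of transactions."""
    return support_count(itemset, transactions) >= min_sup * len(transactions)

def is_strong(x, y, transactions, min_sup, min_conf):
    """A rule is strong when it satisfies both min_sup and min_conf."""
    return (is_frequent(x | y, transactions, min_sup)
            and confidence(x, y, transactions) >= min_conf)
```

In this toy data, {printer, computer} occurs in three of the five transactions, so the rule computer ⇒ printer has support 3/5 and confidence 3/4.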

Mining algorithm of positive and negative association rules
The traditional association rule (AR) has the form A ⇒ B and is used to mine associations among item sets in customer transaction databases. It was first proposed by R. Agrawal et al. in 1993, and a fast mining algorithm followed in 1994. As an important complement to rules of the form A ⇒ B, this article studies association rules of the three forms A ⇒ ¬B, ¬A ⇒ B and ¬A ⇒ ¬B, which are called negative association rules (NAR). We give a simple and effective method for computing the support level and confidence level of an NAR using only information about the corresponding positive association rule, together with an algorithm that mines positive and negative association rules simultaneously. Unlike existing algorithms, the algorithm in this article not only mines positive and negative association rules from the frequent item sets simultaneously, but also detects and deletes inconsistent rules.

Computations of support level and confidence level in negative association rule
The confidence level c of the rule A ⇒ B in the transaction database D is the ratio of the number of transactions containing both A and B to the number of transactions containing A, i.e. c(A ⇒ B) = supp(A ∪ B) / supp(A). A negative association rule contains absent items such as ¬A and ¬B, and because it is difficult to compute their support level and confidence level directly, we give the following theorems and computation methods.
Theorem 1. Suppose A, B ⊂ I and A ∩ B = ∅. Then the support levels of the item sets involving ¬A or ¬B can be computed from supp(A), supp(B) and supp(A ∪ B) alone. To prove this theorem, we restate the support level and the confidence level in terms of sets, i.e. we replace set operations on item sets with set operations on transaction sets, which makes it easier to apply known theorems and properties and easier to understand.
Suppose As denotes the set of transactions containing the item set A, and its cardinality |As| is the number of transactions in As.
In the same way, suppose Bs denotes the set of transactions containing the item set B, and its cardinality |Bs| is the number of transactions in Bs. The database D is the set of all transactions, i.e. the universal set, and its cardinality |D| is the total number of transactions, so the corresponding conversions are supp(A) = |As| / |D|, supp(A ∪ B) = |As ∩ Bs| / |D| and conf(A ⇒ B) = |As ∩ Bs| / |As|. From Theorem 1 and the definition of the confidence level, we can easily prove Corollary 1, which can be used to compute the confidence level of a negative association rule.
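The equations of Theorem 1 and its corollary are not reproduced in this text. Assuming they are the usual inclusion-exclusion identities, namely supp(A ∪ ¬B) = supp(A) − supp(A ∪ B), supp(¬A ∪ B) = supp(B) − supp(A ∪ B) and supp(¬A ∪ ¬B) = 1 − supp(A) − supp(B) + supp(A ∪ B), the following sketch computes negative supports and confidences from positive supports only and checks them against direct counting on the transaction sets. The toy data and all function names are hypothetical.

```python
from fractions import Fraction

# Hypothetical toy transaction database.
D = [
    {"printer", "computer", "paper"},
    {"computer", "paper"},
    {"printer", "computer"},
    {"paper"},
    {"printer", "computer", "ink"},
]

def supp(itemset, transactions):
    """supp(X) = |Xs| / |D|, as an exact fraction."""
    return Fraction(sum(1 for t in transactions if itemset <= t), len(transactions))

def direct_support(transactions, a, b, a_present, b_present):
    """Count transactions directly: A contained (or not) and B contained (or not)."""
    hits = sum(1 for t in transactions
               if (a <= t) == a_present and (b <= t) == b_present)
    return Fraction(hits, len(transactions))

def negative_supports(supp_a, supp_b, supp_ab):
    """Negative supports from positive ones (assumed form of Theorem 1)."""
    return {
        ("A", "notB"): supp_a - supp_ab,
        ("notA", "B"): supp_b - supp_ab,
        ("notA", "notB"): 1 - supp_a - supp_b + supp_ab,
    }

def negative_confidences(supp_a, supp_b, supp_ab):
    """Confidence = support of the rule / support of its antecedent.

    Assumes 0 < supp_a < 1 so that both A and notA occur as antecedents.
    """
    s = negative_supports(supp_a, supp_b, supp_ab)
    return {
        "A=>notB": s[("A", "notB")] / supp_a,
        "notA=>B": s[("notA", "B")] / (1 - supp_a),
        "notA=>notB": s[("notA", "notB")] / (1 - supp_a),
    }

A, B = {"printer"}, {"paper"}
supp_a, supp_b, supp_ab = supp(A, D), supp(B, D), supp(A | B, D)
neg = negative_supports(supp_a, supp_b, supp_ab)
conf = negative_confidences(supp_a, supp_b, supp_ab)
```

Exact fractions are used so that the identities can be compared with the direct counts without rounding error.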

Algorithm of mining positive and negative association rules
The algorithm assumes that the frequent item sets have already been found and stored in the set L.
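Because the set L is taken as given, a minimal sketch of how L could be produced by a level-wise (Apriori-style) search may be useful. The function name frequent_item_sets, the dictionary representation of L and the toy data are assumptions; this is not the implementation used in the experiment.

```python
from itertools import combinations

# Hypothetical toy transaction database.
D = [
    {"printer", "computer", "paper"},
    {"computer", "paper"},
    {"printer", "computer"},
    {"paper"},
    {"printer", "computer", "ink"},
]

def frequent_item_sets(transactions, min_sup):
    """Return {frozenset: support} for every item set whose count >= min_sup * |D|."""
    n = len(transactions)
    min_count = min_sup * n

    # Level 1: count single items.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s: c for s, c in counts.items() if c >= min_count}

    frequent = {}
    k = 1
    while current:
        frequent.update({s: c / n for s, c in current.items()})
        k += 1
        # Join step: union two frequent (k-1)-item sets into a k-item set.
        # Prune step: keep it only if every (k-1)-subset is frequent.
        candidates = set()
        for a, b in combinations(list(current), 2):
            union = a | b
            if len(union) == k and all(
                frozenset(sub) in current for sub in combinations(union, k - 1)
            ):
                candidates.add(union)
        # Count step: one pass over the transactions per level.
        counts = {cand: sum(1 for t in transactions if cand <= t)
                  for cand in candidates}
        current = {s: c for s, c in counts.items() if c >= min_count}
    return frequent

L = frequent_item_sets(D, 0.5)
```

With min_sup = 0.5 on this toy data, L holds three frequent 1-item sets and the single frequent 2-item set {printer, computer}.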
To verify the validity of Algorithm 1, we run a test on synthetic data in the following environment: Celeron 2.5, 256RAM, WIN2000, VC++. The experimental data contain 200 transactions, and the maximum item set number is 5. With min_sup set to 0.20 and min_conf set to 0.40, Table 1 lists the experimental results of the two algorithms for comparison.
As Table 1 shows, Algorithm 1 obtains significantly fewer positive association rules than the traditional Apriori algorithm, which indicates that some inconsistent rules have been deleted, and it also mines many negative association rules, which indicates that Algorithm 1 is effective.

Research of P-S interest in positive and negative association rules
A rule A ⇒ B is interesting only if it satisfies supp(A ∪ B) − supp(A)supp(B) ≥ mininterest > 0. For a negative association rule, however, supp(A ∪ B) − supp(A)supp(B) may be less than 0, so we use its absolute value in the condition, i.e. |supp(A ∪ B) − supp(A)supp(B)| ≥ mininterest. Theorem 2 indicates that if mininterest is set reasonably, uninteresting rules can be avoided effectively, and all four forms of association rule can be constrained by a single minimum interest. When positive and negative association rules are studied together, the problem conf(¬A ⇒ B) > conf(A ⇒ B) > min_conf may occur, and applying correlation is an effective way to solve this problem.
The correlation of an association rule can be measured by supp(A ∪ B) / (supp(A)supp(B)), where supp(A) ≠ 0 and supp(B) ≠ 0. In fact, with a slight modification the P-S interest can also be used to judge the correlation of association rules, i.e. correlation is measured by corr(A,B) = supp(A ∪ B) − supp(A)supp(B).
There are three possible cases for corr(A,B).
(1) If corr(A,B) > 0, A and B are positively correlated, i.e. the more often A occurs, the more often B occurs.
(2) If corr(A,B) = 0, A and B are independent of each other, i.e. the occurrence of B does not depend on the occurrence of A.
(3) If corr(A,B) < 0, A and B are negatively correlated, i.e. the more often A occurs, the less often B occurs.
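The two correlation measures and the case analysis can be sketched as follows. The function names are hypothetical, and exact fractions are assumed as inputs so that the test corr(A,B) = 0 is meaningful; with floating-point supports a small tolerance would be needed instead.

```python
from fractions import Fraction

def corr(supp_a, supp_b, supp_ab):
    """Difference form: supp(A ∪ B) − supp(A)supp(B)."""
    return supp_ab - supp_a * supp_b

def ratio(supp_a, supp_b, supp_ab):
    """Ratio form: supp(A ∪ B) / (supp(A)supp(B)), defined only for non-zero supports."""
    if supp_a == 0 or supp_b == 0:
        raise ValueError("supp(A) and supp(B) must be non-zero")
    return supp_ab / (supp_a * supp_b)

def correlation_case(supp_a, supp_b, supp_ab):
    """Classify A and B by the sign of corr(A,B)."""
    c = corr(supp_a, supp_b, supp_ab)
    if c > 0:
        return "positive"
    if c < 0:
        return "negative"
    return "independent"

# Hypothetical supports: supp(A) = 3/5, supp(B) = 4/5, supp(A ∪ B) = 3/5.
example = (Fraction(3, 5), Fraction(4, 5), Fraction(3, 5))
```

When both supports are non-zero, the two forms agree in sign: the ratio exceeds 1 exactly when the difference is positive, so either one can drive the case analysis.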
Theorem 3 indicates that the rules A ⇒ B (or ¬A ⇒ ¬B) and A ⇒ ¬B (or ¬A ⇒ B) cannot be valid rules simultaneously, so inconsistent rules are effectively prevented.
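One way this result could be used when generating rules is sketched below: the sign of corr(A,B) decides which pair of rule forms is examined at all, so a rule and its contradiction are never emitted together. This is an illustrative arrangement under stated assumptions, not necessarily Algorithm 1; the function name select_rule_forms, the tuple output format and the negative-support identities from inclusion-exclusion are all assumptions.

```python
from fractions import Fraction

def select_rule_forms(supp_a, supp_b, supp_ab, min_conf, min_interest):
    """Return (antecedent, consequent, confidence) for the rule forms that pass.

    corr > 0: only A => B and notA => notB are considered.
    corr < 0: only A => notB and notA => B are considered.
    corr == 0 or |corr| < min_interest: nothing is returned.
    """
    c = supp_ab - supp_a * supp_b
    if c == 0 or abs(c) < min_interest:
        return []

    # Supports of the four combinations, assuming inclusion-exclusion.
    rule_supp = {
        ("A", "B"): supp_ab,
        ("A", "notB"): supp_a - supp_ab,
        ("notA", "B"): supp_b - supp_ab,
        ("notA", "notB"): 1 - supp_a - supp_b + supp_ab,
    }
    antecedent_supp = {"A": supp_a, "notA": 1 - supp_a}

    if c > 0:
        forms = [("A", "B"), ("notA", "notB")]
    else:
        forms = [("A", "notB"), ("notA", "B")]

    rules = []
    for x, y in forms:
        if antecedent_supp[x] > 0:  # confidence is undefined otherwise
            conf = rule_supp[(x, y)] / antecedent_supp[x]
            if conf >= min_conf:
                rules.append((x, y, conf))
    return rules

# Hypothetical thresholds: min_conf = 2/5, min_interest = 1/10.
negative_case = select_rule_forms(
    Fraction(3, 5), Fraction(3, 5), Fraction(1, 5), Fraction(2, 5), Fraction(1, 10)
)
positive_case = select_rule_forms(
    Fraction(3, 5), Fraction(4, 5), Fraction(3, 5), Fraction(2, 5), Fraction(1, 10)
)
```

In the negatively correlated example only A ⇒ ¬B and ¬A ⇒ B are produced, and in the positively correlated example only A ⇒ B and ¬A ⇒ ¬B, so the two groups never appear in the same result.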