A New Approach to Discover Periodic Frequent Patterns

Mining frequent patterns in a transaction database TD has been studied extensively in data mining research. However, most existing frequent pattern mining algorithms do not consider the time stamps associated with the transactions. Temporal periodicity of pattern appearance can be regarded as an important criterion for measuring the interestingness of frequent patterns in several applications. In this paper, we extend the existing frequent pattern mining framework to take the time stamp into account as periodicity, i.e., time stamps from January to June form the First Period and those from July to December form the Second Period, and we discover frequent patterns for each period. We propose an efficient tree-based data structure, called the periodic-frequent pattern tree, that captures the database TD in a highly compact manner and enables a pattern-growth mining technique to generate the complete set of periodic-frequent patterns. An example illustrating the proposed approach is given, and the characteristics of the algorithm are discussed.


Introduction
A transaction database TD usually consists of a set of time-stamped transactions. Mining frequent patterns or itemsets from a transaction database is one of the fundamental and essential operations in many data mining applications, such as discovering association rules, strong rules, correlations and many other important discovery tasks. The problem of mining frequent itemsets is formulated as finding all the itemsets that satisfy a user-specified support threshold. An important criterion for identifying the interestingness of frequent patterns might be the shape of their occurrence, i.e., whether they occur periodically, irregularly or mostly in a specific time interval in the database.
In a retail market, among all frequently sold products, the user may be interested only in the periodically sold products. As for the stock market, the set of higher stock indices that rise periodically may be of special interest to companies and individuals. We define a frequent pattern that appears in a period/interval of a transaction database as a periodic-frequent pattern. Most existing pattern mining algorithms do not consider the time stamps. In this paper, we extend the traditional frequent pattern mining framework to take the time stamp into account, i.e., in periods.
For example, suppose a transaction database TD has 16 transactions over 8 items. Let us focus on two patterns, P1P2 and P1P3, without considering time information. P1P2 and P1P3 have the same significance in the traditional frequent pattern framework, since they may have the same frequency of 62.50%. However, interesting differences between these two patterns can be found when we consider the time information. For simplicity, consider one transaction per month. From January to June, pattern P1P2 occurs frequently, and from July to December, pattern P1P3 occurs frequently every month.
The above observation shows that frequent patterns discovered by a standard frequent pattern mining algorithm may not be frequent throughout the entire year. Such patterns are considered to be periodic patterns. The objective of the research presented in this paper is to distinguish such frequent patterns.
The rest of the paper is organized as follows. Section 2 reviews related work. Section 3 gives the problem statement. Section 4 presents the frequent pattern generation algorithm. Section 5 gives an example of the proposed algorithm. Section 6 shows the experimental results on the performance of the algorithm. Section 7 gives concluding remarks.

Related Work
Since it was introduced in (R. Agrawal, T. Imielinski and A.N. Swami, 1993), the problem of frequent itemset mining has been studied extensively by many researchers. As a result, a large number of algorithms have been developed to solve the problem efficiently (R. Agrawal, R. Srikant, 1994; J. Han, J. Pei, Y. Yin, 2000). In practice, the number of frequent patterns generated from a dataset can often become excessively large, and most of them are useless or simply redundant. Thus there has been recent interest in discovering new classes of patterns, including maximal frequent itemsets (R.J. Bayardo, 1998). The work presented here differs from the related work in the following aspects. First, a frequent pattern tree (J. Han, J. Pei, Y. Yin, 2000) is generated for the first and second periods. Second, mining of frequent patterns from the tree is done in parallel for both periods.

Problem statement
The problem of mining association rules was introduced in (R. Agrawal, T. Imielinski and A.N. Swami, 1993). There are two steps in association rule mining: the first step is to find frequent itemsets, and the second step is to generate association rules. We focus on the first step, i.e., finding frequent itemsets. Let $I = \{i_1, i_2, i_3, \ldots, i_m\}$ be a set of $m$ items. A $k$-itemset is an itemset that contains $k$ items. Let $TD = \{T_1, T_2, T_3, \ldots, T_n\}$ be a set of $n$ transactions, called a transaction database TD, where each transaction $T_j$ ($j \in \{1, 2, 3, \ldots, n\}$) is a set of items such that $T_j \subseteq I$. Each transaction is associated with a unique identifier, called its TID. A transaction $T_j$ contains an itemset $X$ if and only if $X \subseteq T_j$. The support of an itemset $X$ is calculated as $Sup_{TD}(X)/N$, where the support count $Sup_{TD}(X)$ is the number of transactions in TD containing the itemset $X$ and $N$ is the total number of transactions in the database.
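The support definition above can be illustrated with a minimal Python sketch. The function names and the four-transaction toy database are hypothetical and chosen only for illustration; they are not taken from the paper's Table 1.

```python
from typing import Iterable, List, Set


def support_count(transactions: List[Set[str]], itemset: Iterable[str]) -> int:
    """Number of transactions T_j with X as a subset of T_j."""
    target = set(itemset)
    return sum(1 for t in transactions if target <= t)


def support(transactions: List[Set[str]], itemset: Iterable[str]) -> float:
    """Support count divided by N, the total number of transactions."""
    return support_count(transactions, itemset) / len(transactions)


# Hypothetical toy database (not the paper's data).
td = [
    {"P1", "P2"},
    {"P1", "P2", "P3"},
    {"P1", "P3"},
    {"P2", "P4"},
]

print(support_count(td, {"P1", "P2"}))  # 2
print(support(td, {"P1", "P2"}))        # 0.5
```

An itemset is then called frequent when its support (or support count) reaches the user-specified threshold.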
The objective of periodic-frequent pattern mining is to distinguish frequent patterns from different periods, which cannot be discovered by the approach of (R. Agrawal, T. Imielinski and A.N. Swami, 1993). In this work, an algorithm, PFP-tree, is proposed to find the frequent patterns for different periods. More specifically, the input is a transaction database TD, a minimum support and the periods, i.e., the time stamps converted into periods.
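As a rough sketch of how time stamps might be converted into periods, the Python fragment below uses the split stated in the abstract (January to June as period 1, July to December as period 2) and computes the support of an itemset separately in each period. The function names, the use of `datetime.date` for time stamps and the toy transactions are assumptions made for illustration only.

```python
from datetime import date
from typing import Dict, Iterable, List, Set, Tuple

TimedTransaction = Tuple[date, Set[str]]


def period_of(stamp: date) -> int:
    """Return 1 for January to June and 2 for July to December."""
    return 1 if stamp.month <= 6 else 2


def periodic_support(timed: List[TimedTransaction],
                     itemset: Iterable[str]) -> Dict[int, float]:
    """Support of `itemset` computed separately within each period."""
    target = set(itemset)
    by_period: Dict[int, List[Set[str]]] = {1: [], 2: []}
    for stamp, items in timed:
        by_period[period_of(stamp)].append(items)
    return {
        p: (sum(1 for t in ts if target <= t) / len(ts)) if ts else 0.0
        for p, ts in by_period.items()
    }


# Hypothetical toy data: P1P2 appears only in period 1, P1P3 only in period 2.
timed_td = [
    (date(2010, 2, 1), {"P1", "P2"}),
    (date(2010, 5, 1), {"P1", "P2"}),
    (date(2010, 8, 1), {"P1", "P3"}),
    (date(2010, 11, 1), {"P1", "P3"}),
]

print(periodic_support(timed_td, {"P1", "P2"}))  # {1: 1.0, 2: 0.0}
print(periodic_support(timed_td, {"P1", "P3"}))  # {1: 0.0, 2: 1.0}
```

Over the whole toy database both patterns have the same support of 0.5, yet the per-period view separates them, which is the distinction the problem statement aims at.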

Proposed Algorithm
Algorithm PFP-tree
Input:
1. Transaction database TD converted with periods
2. Minimum support
Output: Periodic-frequent pattern tree, i.e., PFP-tree
1. Scan TD once; generate a frequency list (F) of 1-itemsets and their counts.
2. Generate an ordered frequency list (OL) by filtering out infrequent items (items that do not pass the minimum support).
3. Sort the list (OL) in descending order of frequency as OL1.

These ordered lists are used to build header tables. In the example, the user-specified minimum support count is 4, and every itemset that does not satisfy this minimum support count is pruned. In Table 3, itemsets P7 and P8 are pruned. The PFP-tree is then constructed using the proposed algorithm for periods 1 and 2 in the same tree.
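Steps 1 to 3 above can be sketched in Python as follows. This is only an illustration under stated assumptions: counts are taken over the whole database rather than per period, an item "passes" when its count is at least the minimum support count, and ties are broken by item name. None of these details is specified in the text, and the toy transactions are hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Set, Tuple


def ordered_frequent_items(transactions: Iterable[Set[str]],
                           min_support_count: int) -> List[Tuple[str, int]]:
    # Step 1: one scan to count each 1-itemset (list F).
    counts = Counter(item for t in transactions for item in t)
    # Step 2: filter out items below the minimum support count (list OL).
    frequent = [(item, c) for item, c in counts.items() if c >= min_support_count]
    # Step 3: sort by descending count; ties broken by name (assumption) -> OL1.
    return sorted(frequent, key=lambda pair: (-pair[1], pair[0]))


# Hypothetical toy database (not the paper's Table 1).
td = [
    {"P1", "P2", "P3"},
    {"P1", "P2"},
    {"P1", "P3"},
    {"P1", "P2", "P7"},
    {"P2", "P3"},
]

print(ordered_frequent_items(td, 3))  # [('P1', 4), ('P2', 4), ('P3', 3)]
```

Here P7 occurs only once and is pruned, in the same way that P7 and P8 are pruned in the paper's example.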

Header Table
Each node is divided into three parts: the first part contains the item name, the second part contains the support count for period 1, and the third part contains the support count for period 2, so that frequent itemsets can be mined for both periods using the FPF-Growth mining technique. In period 1, P5 has the paths P2,P1:1 and P6,P4,P2,P1:1; in period 2, P5 has the paths P6,P4,P3,P2,P1:1, P4,P3,P1:1, P6,P4:1 and P3,P1:1. The conditional pattern base and the conditional pattern tree are constructed for P5, and finally the frequent patterns for P5 are generated. The procedure continues likewise for the remaining items.
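To make the P5 example concrete, the sketch below tallies how often each item occurs in the conditional pattern base of P5, separately for the two periods. It assumes the mining step follows the usual FP-growth procedure, in which such per-item tallies decide which items enter the conditional pattern tree; the paper does not say whether the minimum support is applied per period, so no threshold is applied here. The function name is hypothetical; the paths and counts are those listed in the text.

```python
from collections import Counter
from typing import List, Tuple

PatternBase = List[Tuple[List[str], int]]


def item_counts(pattern_base: PatternBase) -> Counter:
    """Sum the path counts of every item appearing in the pattern base."""
    counts: Counter = Counter()
    for path, count in pattern_base:
        for item in path:
            counts[item] += count
    return counts


# Conditional pattern base of P5 as given in the text.
p5_period_1 = [(["P2", "P1"], 1), (["P6", "P4", "P2", "P1"], 1)]
p5_period_2 = [
    (["P6", "P4", "P3", "P2", "P1"], 1),
    (["P4", "P3", "P1"], 1),
    (["P6", "P4"], 1),
    (["P3", "P1"], 1),
]

print(sorted(item_counts(p5_period_1).items()))
# [('P1', 2), ('P2', 2), ('P4', 1), ('P6', 1)]
print(sorted(item_counts(p5_period_2).items()))
# [('P1', 3), ('P2', 1), ('P3', 3), ('P4', 3), ('P6', 2)]
```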

Analysis
This section analyses some of the characteristics of the proposed algorithm. The first characteristic is its effect on running time: the database is scanned only twice, and the frequent itemsets for both periods are mined in parallel. The second characteristic is that the data structure stores both periods efficiently.

Conclusion
In this work, frequent itemsets are discovered for different periods in parallel, and the algorithm for the proposed work is presented. The proposed algorithm generates the itemsets automatically. An example illustrating the proposed work is given, and the characteristics of the algorithm are analyzed.

Algorithm PFP-tree (continued)
4. Create the root of the PFP-tree T with label "Null".
5. For each transaction trans in TD, do the following:
6. Select and sort the frequent items in trans according to OL1.
7. Let the sorted item list in trans be [p|P], where p is the first element and P is the remaining list.
8. Call Insert_tree([p|P], T).
9. End for
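Steps 4 to 9 can be sketched in Python roughly as follows, with each node holding an item name and one support count per period, as described in the text. The class and function names, the representation of a transaction as a (period, items) pair and the toy data are assumptions for illustration; header tables and node links are omitted, and the behaviour of Insert_tree is assumed to follow the usual FP-tree insertion.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set, Tuple


@dataclass
class Node:
    item: Optional[str]                                          # part 1: item name
    counts: List[int] = field(default_factory=lambda: [0, 0])    # parts 2 and 3
    children: Dict[str, "Node"] = field(default_factory=dict)


def insert_tree(items: List[str], node: Node, period: int) -> None:
    """Insert the sorted list [p|P] below `node`, counting it for `period`."""
    if not items:
        return
    p, rest = items[0], items[1:]
    child = node.children.get(p)
    if child is None:                      # no child labelled p yet: create it
        child = Node(p)
        node.children[p] = child
    child.counts[period - 1] += 1          # increment only this period's count
    insert_tree(rest, child, period)


def build_pfp_tree(timed: List[Tuple[int, Set[str]]], ol1: List[str]) -> Node:
    rank = {item: i for i, item in enumerate(ol1)}
    root = Node(None)                                    # step 4: root "Null"
    for period, items in timed:                          # step 5
        ordered = sorted((i for i in items if i in rank),
                         key=lambda i: rank[i])          # step 6
        insert_tree(ordered, root, period)               # steps 7 and 8
    return root


# Hypothetical toy input: (period, items) pairs and an ordered list OL1.
td = [(1, {"P1", "P2"}), (1, {"P1", "P2", "P3"}),
      (2, {"P1", "P3"}), (2, {"P1", "P2", "P9"})]
root = build_pfp_tree(td, ["P1", "P2", "P3"])
print(root.children["P1"].counts)                 # [2, 2]
print(root.children["P1"].children["P2"].counts)  # [2, 1]
```

Keeping both counts in one node lets transactions from both periods share prefixes, so a single tree serves the mining of either period.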

Table 1. Transaction Database TD

Table 3. Pruned Frequent 1-Itemset Table (OL1)
Intervals are assigned. Time stamps from January to June are considered period 1, and those from July to December are considered period 2.