To Construct Search Engine Analyzer for Electrical Enterprises Based on Lucene


  •  Kehe Wu    
  •  Xia He    
  •  Tingshun Li    
  •  Hongyu Tao    

Abstract

Electrical enterprises use a large professional vocabulary, and existing analyzers cannot meet the needs of a search engine built for them. In this article, taking the operation system of electrical enterprises as the background, we propose a dictionary-based word segmentation algorithm and use it to design a search engine analyzer suited to electrical enterprises. The analyzer is built on an electrical professional dictionary and resolves many of the shortcomings of existing analyzers. We also adopt a word-tree approach: when the vocabulary is loaded, a tree of words and expressions is first constructed in memory, and at segmentation time a word is segmented simply by traversing the tree. This removes the limitation of the usual maximum matching algorithm, which requires a maximum word length to be fixed in advance, greatly improves segmentation efficiency, and avoids meaningless matching attempts. Finally, we compare the analyzer with two of Lucene's built-in analyzers. The results indicate that, for the application system of an electrical enterprise, the analyzer outperforms the built-in Lucene analyzers in both time and word segmentation efficiency, which shows that it can meet the requirements of constructing a search engine for electrical enterprises.
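
The word-tree idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation and does not use any Lucene API: it assumes a character-level tree (a trie) built once when the vocabulary is loaded, and a forward longest-match traversal at segmentation time. The names `TrieNode`, `build_word_tree` and `segment`, the sample dictionary entries, and the fallback of emitting unknown characters one at a time are all assumptions made for illustration.

```python
from typing import Dict, Iterable, List


class TrieNode:
    """One node of the in-memory word tree (hypothetical structure)."""

    def __init__(self) -> None:
        self.children: Dict[str, "TrieNode"] = {}
        # True when the path from the root to this node spells a dictionary entry.
        self.is_word = False


def build_word_tree(vocabulary: Iterable[str]) -> TrieNode:
    """Build the tree once, when the vocabulary is loaded."""
    root = TrieNode()
    for word in vocabulary:
        if not word:
            continue  # ignore empty entries
        node = root
        for ch in word:
            node = node.children.setdefault(ch, TrieNode())
        node.is_word = True
    return root


def segment(text: str, root: TrieNode) -> List[str]:
    """Forward longest-match segmentation by walking the tree.

    No maximum word length is configured: the walk stops on its own
    as soon as the next character has no child in the tree.
    """
    tokens: List[str] = []
    i = 0
    while i < len(text):
        node = root
        longest_end = -1  # end index of the longest dictionary word found so far
        j = i
        while j < len(text) and text[j] in node.children:
            node = node.children[text[j]]
            j += 1
            if node.is_word:
                longest_end = j
        if longest_end == -1:
            # Assumption: characters not covered by the dictionary become single tokens.
            tokens.append(text[i])
            i += 1
        else:
            tokens.append(text[i:longest_end])
            i = longest_end
    return tokens


# Hypothetical dictionary entries: power transformation, substation,
# transformer, circuit breaker.
vocabulary = ["变电", "变电站", "变压器", "断路器"]
tree = build_word_tree(vocabulary)
print(segment("变电站变压器检修", tree))
# prints ['变电站', '变压器', '检', '修']
```

In this sketch each starting position is examined with a single walk down the tree, so no candidate substring is tested against the dictionary unless its prefix already matches an entry. That is one way to read the abstract's claim about avoiding meaningless matching.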



This work is licensed under a Creative Commons Attribution 4.0 License.
  • ISSN(Print): 1913-8989
  • ISSN(Online): 1913-8997
  • Started: 2008
  • Frequency: semiannual
