A Method of Semantic Hidden Reduction Based on Collocation


  •  Licai Zhu    

Abstract

Semantic hiding is a technique that uses semantic knowledge to embed secret information in a text carrier. Among the many semantic hiding methods, synonym substitution has attracted increasing attention. Its main idea is to hide the secret information by replacing words in the text with synonyms, so that the original meaning is retained as far as possible. To restore the hidden information effectively, the positions of the synonym replacements must be located as accurately as possible, which makes recognizing word collocations very important. So far, however, natural language processing offers no effective way to identify and match collocations; that is, it is very difficult to tell exactly whether a word in the text has been replaced.
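The idea of synonym substitution can be illustrated with a minimal sketch. The synonym sets, the one-bit-per-word encoding, and the function names `embed` and `extract` below are assumptions made for illustration only; they are not taken from TLEX or from the method of this paper.

```python
# Toy synonym sets (hypothetical, for illustration only).
# The position of a word inside its set encodes one bit: index 0 -> 0, index 1 -> 1.
SYNONYM_SETS = [
    ("big", "large"),
    ("quick", "fast"),
    ("begin", "start"),
]

# Map every known word to (its synonym set, its index in that set).
LOOKUP = {word: (synset, idx)
          for synset in SYNONYM_SETS
          for idx, word in enumerate(synset)}


def embed(text, bits):
    """Hide `bits` by choosing, for each replaceable word, the synonym whose index equals the bit."""
    remaining = list(bits)
    out = []
    for word in text.split():
        if word in LOOKUP and remaining:
            synset, _ = LOOKUP[word]
            out.append(synset[remaining.pop(0)])
        else:
            out.append(word)
    if remaining:
        raise ValueError("cover text has too few replaceable words")
    return " ".join(out)


def extract(text, n_bits):
    """Read the bits back from the indices of the replaceable words, in order."""
    bits = [LOOKUP[word][1] for word in text.split() if word in LOOKUP]
    return bits[:n_bits]


cover = "the big dog made a quick start"
stego = embed(cover, [1, 0, 1])
recovered = extract(stego, 3)
```

In this toy setting the receiver knows every replaceable position in advance. The difficulty described above arises in real text, where it must first be decided which words were actually replaced.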

In this paper, a hidden-information restoration ("hidden reduction") method based on collocation is proposed. By analyzing the characteristics of synonyms and their collocations, we treat the relation between a synonym and its collocate as the relation between paired samples in the statistical sense. Based on the properties of the corresponding statistics, we design several decision features for identifying collocations. In addition, we introduce pointwise mutual information from information theory as a feature that quantifies the independence of word pairs. To recognize word collocations effectively, we combine these features and use a genetic algorithm to obtain the recognition degree (weight) of each feature. We then design a replacement recognition system based on an immune anomaly detection mechanism, in which synonyms that fit their collocations are regarded as "normal" and substituted words are regarded as "anomalies". The experimental samples are generated with the semantic hiding software TLEX. To better demonstrate the restoration process, we modified TLEX by adding a key selection module.
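As a rough illustration of the mutual-information feature, the sketch below computes the standard pointwise mutual information, PMI(x, y) = log2( p(x, y) / (p(x) p(y)) ), for adjacent word pairs. The abstract does not state how the probabilities are estimated, so the counting scheme (adjacent pairs in a tiny tokenized corpus), the function name `pmi_table`, and the example corpus are all assumptions.

```python
import math
from collections import Counter


def pmi_table(sentences):
    """Return PMI (base 2) for every adjacent word pair seen in `sentences`.

    `sentences` is a list of token lists. Probabilities are simple relative
    frequencies; this estimation scheme is an assumption of the sketch.
    """
    word_counts = Counter()
    pair_counts = Counter()
    for tokens in sentences:
        word_counts.update(tokens)
        pair_counts.update(zip(tokens, tokens[1:]))  # adjacent pairs only

    n_words = sum(word_counts.values())
    n_pairs = sum(pair_counts.values())

    table = {}
    for (x, y), count in pair_counts.items():
        p_xy = count / n_pairs
        p_x = word_counts[x] / n_words
        p_y = word_counts[y] / n_words
        # PMI is 0 when x and y are independent, positive when they
        # co-occur more often than independence would predict.
        table[(x, y)] = math.log2(p_xy / (p_x * p_y))
    return table


# Hypothetical toy corpus: "strong tea" is a more typical collocation than "strong car".
corpus = [
    ["strong", "tea"],
    ["strong", "tea"],
    ["powerful", "car"],
    ["strong", "car"],
]
pmi = pmi_table(corpus)
```

A pair that was never observed, such as ("powerful", "tea") here, has no entry in the table; a recognition system could treat such missing or low-scoring pairs as candidates for substituted words, although how the paper combines this feature with the others is not specified in the abstract.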

 



This work is licensed under a Creative Commons Attribution 4.0 License.
  • ISSN(Print): 1913-8989
  • ISSN(Online): 1913-8997
  • Started: 2008
  • Frequency: quarterly
