Journal of Jianghan University (Natural Science Edition), 2013, Vol. 41, Issue 2: 47-52.


Encoding Scheme Based on Words

CHENG Yuan-bin   

  1. School of Mathematics and Computer Science, Jianghan University, Wuhan 430056, Hubei, China
  Online: 2013-04-12; Published: 2014-01-07

Abstract: Language is the main tool of thought, and words are its basic units. Computer information processing, however, currently encodes text character by character, and as information processing develops further, the drawbacks of character encoding are becoming increasingly apparent. Starting from the basic needs of information processing and the basic characteristics of words, this paper proposes a unified, word-oriented encoding scheme that takes both words and characters into account. The scheme is built on the existing UTF-16 coding standard: it keeps the existing character encodings and adds codes for words. Word codes are organized logically in a concept space tree that carries some semantic information and semantic relationships, and the space is organized according to two principles: support for cluster retrieval and support for code conversion between two languages. Finally, the paper points out several problems that need further study.
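The abstract does not say where word codes sit in the UTF-16 code space or how the concept space tree assigns them, so the following is only one possible reading, not the author's scheme. It assumes (hypothetically) that word codes are drawn from Supplementary Private Use Area-A (U+F0000 to U+FFFFD), so they travel as ordinary UTF-16 surrogate pairs next to unchanged character codes, and that codes are assigned in pre-order over the tree so each subtree occupies one contiguous range. Under those assumptions, cluster retrieval reduces to a range check, and conversion between two languages reduces to reading the same code's label in another language. The names `Concept`, `assign_codes`, `cluster`, `index_by_code` and `translate` are illustrative only.

```python
from dataclasses import dataclass, field

# ASSUMPTION (not from the paper): word codes occupy Supplementary Private
# Use Area-A, U+F0000..U+FFFFD, so existing character codes are untouched.
WORD_BASE, WORD_LIMIT = 0xF0000, 0xFFFFD


@dataclass
class Concept:
    """One node of a hypothetical concept space tree."""
    labels: dict[str, str]                       # language tag -> word
    children: list["Concept"] = field(default_factory=list)
    code: int = 0                                # this node's word code
    last: int = 0                                # largest code in its subtree


def assign_codes(node: Concept, start: int = WORD_BASE) -> int:
    """Number nodes in pre-order so every subtree gets one contiguous
    code range [node.code, node.last]; returns the next unused code."""
    if start > WORD_LIMIT:
        raise OverflowError("word code space exhausted")
    node.code = start
    nxt = start + 1
    for child in node.children:
        nxt = assign_codes(child, nxt)
    node.last = nxt - 1
    return nxt


def to_utf16_units(cp: int) -> list[int]:
    """Standard UTF-16: one unit below U+10000, otherwise a surrogate pair."""
    if cp < 0x10000:
        return [cp]
    v = cp - 0x10000
    return [0xD800 | (v >> 10), 0xDC00 | (v & 0x3FF)]


def cluster(node: Concept, codes: list[int]) -> list[int]:
    """Cluster retrieval as a range check: codes inside node's subtree."""
    return [c for c in codes if node.code <= c <= node.last]


def index_by_code(node: Concept) -> dict[int, Concept]:
    """Map every word code in the tree back to its node."""
    table = {node.code: node}
    for child in node.children:
        table.update(index_by_code(child))
    return table


def translate(code: int, table: dict[int, Concept], lang: str) -> str:
    """Language conversion: read the same word code in another language."""
    return table[code].labels[lang]


# Usage with a toy tree (labels are illustrative).
apple = Concept({"en": "apple", "zh": "苹果"})
pear = Concept({"en": "pear", "zh": "梨"})
fruit = Concept({"en": "fruit", "zh": "水果"}, [apple, pear])
dog = Concept({"en": "dog", "zh": "狗"})
root = Concept({"en": "thing", "zh": "事物"}, [fruit, dog])
assign_codes(root)

print([hex(c) for c in cluster(fruit, [apple.code, dog.code, pear.code])])
# ['0xf0002', '0xf0003']
print([hex(u) for u in to_utf16_units(ord("A")) + to_utf16_units(apple.code)])
# ['0x41', '0xdb80', '0xdc02']
print(translate(pear.code, index_by_code(root), "zh"))  # 梨
```

Pre-order numbering is used here only because it turns every subtree into a single interval; a practical scheme would also have to reserve gaps for words added later, which this sketch ignores.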

Key words: word encoding, UTF-16, cluster retrieval, concept space tree, natural language processing
