The Past, Present and Future of Machine Translation in China: A Visualization Study Based on CNKI Literature (1959 − 2021)

Machine translation, a product of technological progress, is advancing the field of translation. Using visualization analysis techniques, this study examines the machine translation literature published in the CNKI database from 1959 to 2021 and analyzes the history, key issues, arguments, and future directions of related research. It then proposes that, in light of the trend of technological advancement, translation research and education should take on the disciplinary duties associated with the goal of “collaborations between humans and machine and use of machines for humans.” The study also emphasizes the importance of improving students’ technological competence, supporting the localization of language services, and integrating information technology into language and translation instruction.


Research Background
Driven by advances in technology, machine translation has a long history and has grown quickly. According to Lulu Gao and Wen Zhao (2020), its development has passed through the following phases. (1) The period of initial development (1949−1961): In 1949, Warren Weaver proposed the use of computers for translation, arguing that in order to avoid "word-for-word" translation, it was essential to consider how context affects lexical meaning, linguistic logic and reasoning, the translation and decoding process, and the universal characteristics of language (Poibeau, 2017). (2) The period of sluggish development (1960−1967): A number of machine translation models based on syntactic analysis were established. In 1958, Bar-Hillel drew attention to the semantic barrier that automatic machine translation systems faced. He proposed that it be overcome through machine-assisted translation, which gives translators the aid they need during the pre- and post-translation processes (Bar-Hillel, 1958). The Automatic Language Processing Advisory Committee (ALPAC) was founded around this time to evaluate funding for the advancement of machine translation. (3) The period of slow development (1967−1990): Although machine translation as a whole developed slowly, there was significant potential in some nations and regions, including Canada, France, and the European Union. For instance, the European Union collaborated with Systran, one of the first machine translation companies, founded by Peter Toma in 1968, to realize automatic machine translation between EU member languages. (4) The period of new development (1990−): The International Conference on Computational Linguistics in Finland in 1990 launched the era of statistical machine translation based on massively parallel corpora, which was followed by the emergence of neural machine translation based on "deep learning" (Hinton et al., 2012).
In recent years, continuous breakthroughs in machine translation technology have been making headlines. For example, on September 27, 2016, the Google Brain team announced the launch of a new neural machine translation system, GNMT (Google Neural Machine Translation system) (Note 1), which can reduce translation errors by 55%−85% on key language pairs compared with more classic machine translation models. On December 22, 2016, the research team rode the wave of success to launch the zero-shot multilingual neural machine translation system, which enables translation between language pairs that the system has not previously been trained to translate directly (Note 2). As research on big data, deep learning, and neural network machine translation continues to gain momentum, machine translation is bound to [...]

[Figures and accompanying text damaged in extraction. The recoverable fragments concern CNKI papers whose titles contain "machine translation," the 20 most frequently appearing authors, and keywords with citation bursts between 1959 and 2021, including "statistical machine translation," "computational linguistics," "neural machine translation," "neural network," and "post-translation editing."]

Review of Representative Studies
With the overall trend of machine translation outlined, the following review is organized into five categories: literature reviews on machine translation, studies on machine translation based on text types, comparative studies on human translation and machine translation, interdisciplinary studies on machine translation and linguistics, and challenges and applications of machine translation.

Literature Reviews on Machine Translation
Studies reviewing the research on machine translation include Zhiwei Feng (2018), Hanji Li (2018), Xiangling Wang (2019), and Lulu Gao and Wen Zhao (2020). Zhiwei Feng (2018) demonstrated the close connection between machine translation and human translation. Analyzing the prevailing situation, the study also concluded that although machine translation predates artificial intelligence, both are immature and still in the development stage. Hanji Li (2018) and Xiangling Wang (2019) provided overviews of the relevant research trends using the visualization tool CiteSpace. Hanji Li (2018) discussed the papers related to "machine translation" published in CSSCI and Chinese core journals in CNKI from 2007 to 2016, reported the hot areas and keywords of machine translation research in China, and proposed that the linguistic community and the computer science community should cooperate to improve machine translation.
Reviewing 12 core foreign language journals included in CNKI, Xiangling Wang (2019) surveyed the history of machine translation research by domestic translation scholars over the past 60 years and endorsed the operation model of "machine translation + machine-assisted tools + post-translation editing," pointing out that cooperation and exchange among scholars, as well as between scholars and industry, should be further strengthened. The study also suggested that it is crucial to integrate training in translation technology into traditional translation curricula. Lulu Gao and Wen Zhao (2020) outlined the technical characteristics, development trends, advantages, and limitations of machine translation at various stages, as well as the opportunities and challenges facing its development. The study also pointed out that, because machine translation depends on statistics drawn from corpora, a combination of machine translation and human translation can be chosen for literary and philosophical texts, for which less corpus data is available.
Regarding corpus construction, Zhiwei Feng (2020) proposed that parallel corpora are the basis of both statistical machine translation and neural machine translation. He emphasized the importance of large-scale, authentic corpora and pointed out that the lack of linguistic data resources is currently a challenge for neural machine translation. For corpus-based machine translation, Jie Zhu (2019) compared the results of two approaches, rule-based and corpus-based, and concluded that the rule-based approach is a rationalist approach with rigid grammatical rules, low quality, and high labor cost, whereas the corpus-based approach is an empiricist approach with high accuracy whose output is easy to understand.
ijel.ccsenet.org International Journal of English Linguistics Vol. 12, No. 6;
As to the research on statistical machine translation and neural machine translation, Yunlu He et al. (2019) outlined the principles of rule-based machine translation, statistical machine translation, and neural machine translation, and analyzed the advantages and disadvantages of machine translation. Jinsong Su et al. (2020) investigated machine translation at the discourse level and introduced the existing research on neural machine translation from three aspects: context modeling (single encoder, multiple encoders, and multiple decoders), model training, and model analysis. They then analyzed the main difficulties currently faced by neural machine translation and claimed that future work should focus more on building massively parallel data, designing more reasonable automatic evaluation indexes for translations, and simplifying discourse-level neural machine translation models.
Regarding research-based teaching, Ying Qin (2018) discussed the quality assessment of machine translation based on neural networks and its impact on translation teaching. She proposed that in the teaching of translation, the division of labor between human and machine should be clarified, and that traditional teaching methods should be improved by introducing post-translation editing courses.

Studies on Machine Translation Based on Text Types
The applicability of machine translation to diverse types of texts is also a focus of related research because text types and features directly affect the quality of machine translation. On the machine translation of legal texts, Falian Zhang (2020) compared the translation capabilities of four machine translation tools, namely Google, Bing, Oulu, and Tencent Translate, and found issues with language comprehension and logical judgment in their output. He proposed that machine translation of legal texts still requires human translators with professional knowledge for post-translation editing and reviewing.
On the machine translation of medical texts, Bin Tang and Shuo Chen (2020) compared the performance of seven online machine translation tools (Baidu, Sogou, Google, Bing, Jinshan, Youdao, and Tencent) on medical texts through data processing and quantitative analysis. They found differences among the seven tools, and none of them could fully produce translations that met medical standards. The study also suggested that combining machine translation with human post-translation editing is a critical way to ensure the quality of medical text translation at present and in the future.
On the machine translation of subtitles, Weiqing Xiao and Jiahui Gao (2020) took TED talks as research materials and used the FAR model (functional equivalence, acceptability, readability) to compare the quality of English-to-Chinese subtitles automatically generated by NetEase's "Wangyi jianwai" (网易见外) with that of NetEase Open Class subtitles. The study reported a significant difference between the two, with machine translation subtitles containing more mistakes in semantic segmentation and selection. It recommended concentrating on advancing translation technology and semantic segmentation in this area of translation while cultivating the corresponding post-translation editing skills.

Comparative Studies on Human Translation and Machine Translation
Machine translation has unique advantages. Jun Yang (2018) identified two crucial technologies in the revolution of machine translation: neural machine translation and speech recognition. Humans and machines have distinctive characteristics and uses; their labor can be divided while they cooperate. Machine translation has great market prospects in daily communication, which does not require exceptionally high quality. Wenhong Huang (2018) proposed that machine translation currently has three main functions (query, speed-up, and reference) and three main application methods: direct translation (low cost and high efficiency), auxiliary translation (with human translation as the main body), and modular translation (in which translators and translation software interact well).
On the other hand, machine translation has inherent defects, the most important of which is that it lacks the human intelligence needed to understand human language. Starting from the subjectivity of translators, Wei Chen (2020) discussed the essential difference between machine translation and human translation and emphasized that however advanced "deep neural network" technology becomes, it still lacks what the human brain has, namely the social attributes of human beings.
Quality assessment of machine translation is at the core of this research. Qing Wang and Xiao Ma (2020) analyzed the linguistic problems (lexical polysemy, differences in syntactic structures, word segmentation, logic of meaning, context) and the quality assessment of machine translation. They proposed a set of quality assessment methods, covering techniques such as statistics, cosine similarity, and text sentiment analysis, that examine translation quality from both a macroscopic perspective of fidelity and accuracy (vocabulary, sentence, semantics, pragmatics) and a microscopic perspective (word frequency, word collocation, average sentence length, syntactic analysis, text similarity, subject matter, text sentiment, and tendency). The study discussed three aspects of assessing translation quality: linguistic knowledge as a parameter, computer technology as an aid, and translation fidelity as a criterion. It also discussed various instruments for assessment: mutual information methods to calculate relevance (with formulas and cases), word frequency statistics, the Stanford Parser tool (for syntactic analysis), and cosine similarity and latent Dirichlet allocation (LDA) models to analyze text similarity and keyword matching at the semantic level. Similar studies are also available, such as Han Na (2018), which selected general texts, tourism texts, and science and technology texts as test data and compared the Google and Youdao translation systems with human translation to analyze the differences at the lexical, syntactic, and discourse levels. It also discussed the METEOR evaluation metric for machine translation, stating that error correction done by humans provides a corpus for training machines, which helps to optimize machine translation technology.
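To make two of the instruments named above more concrete, the following is a minimal Python sketch of cosine similarity over word-frequency vectors and of pointwise mutual information computed from sentence-level co-occurrence. It is not the implementation used by Qing Wang and Xiao Ma (2020); the function names, the whitespace tokenization, and the sentence-level definition of co-occurrence are assumptions made for illustration (Chinese text would first need word segmentation).

```python
import math
from collections import Counter


def term_frequencies(text):
    """Lowercase the text, split on whitespace (an assumption), and count tokens."""
    return Counter(text.lower().split())


def cosine_similarity(text_a, text_b):
    """Cosine of the angle between the word-frequency vectors of two texts."""
    freq_a, freq_b = term_frequencies(text_a), term_frequencies(text_b)
    shared = freq_a.keys() & freq_b.keys()
    dot = sum(freq_a[w] * freq_b[w] for w in shared)
    norm_a = math.sqrt(sum(c * c for c in freq_a.values()))
    norm_b = math.sqrt(sum(c * c for c in freq_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text has no direction to compare
    return dot / (norm_a * norm_b)


def pointwise_mutual_information(sentences, word_x, word_y):
    """PMI = log2(p(x, y) / (p(x) * p(y))), with probabilities estimated as
    the share of sentences containing the word(s)."""
    n = len(sentences)
    token_sets = [set(s.lower().split()) for s in sentences]
    count_x = sum(word_x in t for t in token_sets)
    count_y = sum(word_y in t for t in token_sets)
    count_xy = sum(word_x in t and word_y in t for t in token_sets)
    if count_xy == 0:
        return float("-inf")  # the two words never co-occur
    return math.log2((count_xy / n) / ((count_x / n) * (count_y / n)))


if __name__ == "__main__":
    human = "the contract shall be governed by the laws of china"
    machine = "the contract is governed by chinese law"
    print(round(cosine_similarity(human, machine), 3))
```

In such a setup, a machine translation whose word-frequency vector points in nearly the same direction as a human reference scores close to 1, while PMI indicates whether two words co-occur more often than chance would predict. Neither measure captures word order or meaning on its own, which is consistent with the study's combination of several instruments.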
The tug-of-war between humans and machines revolves around two main questions: whether humans will be replaced by machines, and how humans and machines can work together. The prevailing view in current research is that machine translation cannot wholly replace human translation. Research on the future relationship between machine translation and human translation can be found in Jiangbo Tian (2018), Chaowei Zhu (2018), Shi-Zhen Song (2019), Ying-Yu Pang (2019), Gulipina·Jiadi (2020), and Li Li (2020), among others. Chaowei Zhu (2018) argued that machine translation is unlikely to replace humans, especially since it will never be competent in areas such as literary translation. The study cited characteristics of literary works such as imagery, emotion, fiction, metaphor, and polysemy, all of which are difficulties that machine translation has to overcome. In addition, the study suggested that in the context of quickly evolving web technology, machine-assisted translation systems such as Trados can be used to achieve human-machine cooperation and to develop well-rounded translators. In terms of translation education and teaching, the study advised that the "whole-person education" of translators should be strengthened by addressing the relationship between translation ability and humanistic literacy, the relationship between specialists and generalists, and the relationship between national standards and talents' qualities. Since there is no substitute for human translators, the best solution is cooperation. Shi-Zhen Song (2019), Benjin Zou (2018), and Rong Li (2019) have offered solutions, such as combining technology with humans and implementing different human-machine cooperation models for different texts, including machine-led (e.g., processing texts such as WeChat messages and email), human-led (e.g., processing literary and artistic texts), and human-machine cooperation (e.g., dealing with news and technology texts).
The study of specific models of human-machine cooperation is the key to implementing this vision, with discussions such as those by Xiangling Wang and Tingting Wang (2019) and Mei-Lin Yuan (2019). Xiangling Wang and Tingting Wang (2019) conducted an empirical study comparing human translation with machine translation followed by post-translation editing (the human-machine combination). Thirty-one MTI students were recruited to compare the speed, quality, and attitude involved in translating scientific and technical texts from English to Chinese, and the MQM (Multidimensional Quality Metrics) method was used to measure translation quality. The findings demonstrate that machine translation combined with human post-editing is superior to human translation alone. The ability of the translators, their attitude (positive or negative) toward post-editing, and the caliber of the machine translation all affect the quality of the final translation. The best results are achieved when a translator approaches post-editing with a positive mindset. The study recommended introducing post-editing into translation teaching and cultivating students' post-editing ability to make them more competitive in the translation market.
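The passage above does not state how the MQM annotations were turned into a score. As a rough illustration only, the Python sketch below assumes a common penalty-based formulation in which each annotated error is weighted by severity and the total penalty is normalized by word count. The severity weights (1, 5, 10), the class name `ErrorAnnotation`, and the function name `mqm_style_score` are hypothetical and are not taken from Xiangling Wang and Tingting Wang (2019).

```python
from dataclasses import dataclass

# Assumed severity weights; real MQM scorecards let each project define its own.
SEVERITY_WEIGHTS = {"minor": 1, "major": 5, "critical": 10}


@dataclass
class ErrorAnnotation:
    category: str  # e.g. "accuracy", "terminology", "fluency"
    severity: str  # one of the keys of SEVERITY_WEIGHTS


def mqm_style_score(errors, word_count, weights=SEVERITY_WEIGHTS):
    """Return 100 * (1 - total weighted penalty / word count).

    A score of 100 means no annotated errors; the score can fall below 0
    for very poor translations.
    """
    if word_count <= 0:
        raise ValueError("word_count must be positive")
    penalty = sum(weights[e.severity] for e in errors)
    return 100.0 * (1.0 - penalty / word_count)


if __name__ == "__main__":
    post_edited = [ErrorAnnotation("terminology", "minor")]
    human_only = [
        ErrorAnnotation("accuracy", "major"),
        ErrorAnnotation("fluency", "minor"),
    ]
    print(mqm_style_score(post_edited, word_count=200))
    print(mqm_style_score(human_only, word_count=200))
```

Normalizing by word count makes translations of different lengths comparable, which matters when the same annotators score both post-edited and human-only output.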

Interdisciplinary Studies on Machine Translation and Linguistics
The development of linguistic research is essential for machine translation breakthroughs. Huijun Zhao and Guobin Lin (2020) explored the linguistic path toward intelligent machine translation, pointing out that current machine translation lacks the guidance of linguistic principles and relies entirely on mathematical models and algorithms. The study further proposed that fully intelligent machine translation needs a sufficient corpus and scientific learning methods, i.e., solutions in the direction of linguistic studies.
In terms of the application of specific linguistic theories, Xi Wang and Yang Chen (2019) examined machine-translated texts from a functional linguistic perspective. The study compared the translations supplied by Google Translate and Baidu Translate with the official translations. It focused on evaluating the translations in terms of formal equivalence, meaning equivalence, and functional equivalence in systemic functional linguistics. According to this study, humaneness is one of the evident reasons why machine translation cannot replace human translation. As a result, it is crucial in translation research and teaching to use AI research and machine translation as aids and to help students build their skills in evaluating, deciphering, editing, and amending machine translation output.
In terms of the translation processing of specific language difficulties, several studies have explored lexical, semantic, and syntactic perspectives. First, at the lexical and semantic level, Chi Yang and Xianze Yang (2020) combined linguistics and computer science research to discuss the processing of words. For example, polysemous words can be adequately treated in different specialized contexts, and a machine translation system can be used to build a dictionary of synonyms. Also on polysemous words, Shuying Zhu (2019) examined the mistranslation brought on by multiple meanings in Chinese-English online machine translation, analyzing five sentences treated differently by "Google Translate" and "Youdao Translation" with the aid of NLPIR-ICTCLAS, a tool developed by the Chinese Academy of Sciences. The study concluded that the translation errors are due to machine translation's inability to segment words effectively and its misjudgment of the original text as ungrammatical; as a fix, it suggested enlarging the corpus and enhancing the treatment of polysemous words. Second, at the syntactic level, long sentences have always been a focus of translation research. Jing  proposed a method of English long sentence segmentation based on dependency parsing and sequence labeling, which showed that machine translation based on long sentence segmentation could achieve better results.
The processing of chunks is also an important topic at the syntactic level. Fumao Hu and Keliang Zhang (2018) studied bilingual chunk correspondence for machine translation. They proposed a theoretical model for chunk composition analysis (i.e., triangular graphs with constructions, chunks, and words) and an analysis of chunk characteristics within the framework of construction grammar. In addition, the study presented a theoretical basis and empirical support for establishing an English-Chinese construction database of business letters, exploring a new way to improve the accuracy of machine translation.
The subject-verb-object structure is a core category in syntactic structure. Using 50 English news articles as the corpus, Yuan Chen (2020b) analyzed the lexical and syntactic errors produced by machine translation in the subject-verb-object structure, concluding that grammatical rules must be added to the machine translation system's foundation in order to improve the quality.
Chinese is a topic-prominent language, and arguments of verbs can be omitted when the context is clear, thus also posing difficulties for linguistic analysis and cross-linguistic translation. Taking Google online translation as an example, Le Chang and Jun Gao (2020) summarized two types of subject mistranslation in Chinese-English machine translation, i.e., linguistic mistranslation and pragmatic mistranslation. They proposed post-translation editing strategies of complementary subject translation and subject adjustment.
Special and complex sentences have cross-linguistic characteristics and are naturally an important topic in linguistic research. English and Chinese differ significantly in their use of passive sentences. Yiwei Ji (2019) took the translations of Google Translate and Youdao as examples to evaluate the quality of machine translations of Chinese passive sentences and found that Youdao outperformed Google in translation quality, thus illustrating the importance of building a sufficient text database to improve translation. Relative clauses have also traditionally been a focus of linguistic research. Taking Google machine translation as an example, Wenzhao He and Defeng Li (2019) combined quantitative analysis and qualitative research to evaluate Chinese translations of English relative clauses. They discovered significant differences between machine-translated relative clauses and human-translated relative clauses. According to this study, formal features are strongly correlated with the quality of translation.

Studies on Challenges and Applications of Machine Translation
The role of machine translation in cross-cultural communication is also a recurring topic in previous studies. Luo Huifang and Caiqi Ren (2018) discussed localization and machine translation technologies as significant development drivers. Their paper argued for the advancement of AI and the integration of machine translation with human translation, stressing the importance of machine translation and localization in enhancing the nation's cultural reputation while connecting with foreign audiences. Chunfang Zhou (2019) examined the problems of avoiding machine translation and of depending too heavily on it, the latter of which causes people to overlook the cultural components of "intercultural" conversation. The study emphasized that culture and thought are human attributes that cannot be substituted by machines. Hence, machine translation cannot resolve every issue in cross-cultural communication.
Machine translation shows a continuing upward trend, and ethical issues will become increasingly prominent in translation practice. Wen Ren (2019) examined the changes that machine translation brings to the field of translation today and proposed that machine translation challenges traditional translation ethics in the following main aspects: the principle of importance (translation vs. original), the principle of responsibility (who is responsible for the translation), the principle of loyalty (interpersonal, client, translator), the principle of fairness and justice (the interests of the translator), the principle of harmonious ethical relations (stakeholders), and the potential risks of other non-ethical behavior (to linguistic diversity and creative expression). The study offered solutions to address these issues, advocating adherence to traditional ethical norms of translation (e.g., fidelity and loyalty) and the optimization of ethical statutes of creative responsibility as guidelines for machine translation activities in the technological age.

Concluding Remarks
The current research on machine translation reflects the trend toward human-machine cooperation and language services. The advantage of machines lies in their speed, cost, and uniformity, while their disadvantage lies in their inability to think like humans. In the face of these technological trends and advancements, the study and teaching of languages can provide language services in the context of "collaborations between humans and machine and use of machines for humans," mainly through the following initiatives. First, it is crucial to strengthen interdisciplinary research in linguistics, literature, and culture in the service of machine translation.
In terms of the overall research status, the distribution of keywords in the current relevant journal papers, as shown in Figure 2, includes "neural machine translation," "machine translation system," "machine translation research," "human translation," "statistical machine translation," "post-translation editing," and other related keywords. Studies involving these keywords must all return to the study of human cognition and the expression of emotions in language, and all must be built on large-scale corpora. It follows naturally that linguistic research is essential. The overall trends of the published results, as shown in Figures 3−5, indicate that language research is still relatively underrepresented in the scholarly output and that more needs to be done to promote interdisciplinary study between language research and other disciplines.
In terms of the difficulties to be overcome, the discussion in Part 4 shows that most of the errors found in machine translation in current research are traditional topics of linguistic analysis, and some are areas where linguistic research needs to be expanded, such as missing subjects, confusion between singular and plural, misuse of articles, tense errors, lexical ambiguity, misuse of passive voice, misuse of relative clauses, and implied meaning. Therefore, research on topics that have not yet been systematically investigated needs to target cross-linguistic differences, human cognition, cultural characteristics, and so on. All of these require close interdisciplinary cooperation between linguists and computer scientists to combine the humanities and the sciences.
Second, the training of localized foreign language services should be strengthened to enhance students' ability to use technology and innovate with it. As seen in the discussion in Part 4, there have already been some studies on translation education under the guidelines of human-computer cooperation, such as Ying Qin (2018) (2019), etc. However, in response to the needs of the new era, more focused and organized studies are required for the educational reform of translation as well as the development of language services. On the premise of consolidating fundamental language abilities and literary and cultural skills, technical course content needs to be increased, and teachers and students need to be guided to develop an awareness of language service, the capacity for technical learning, and sensitivity to market demand.