Linguistic-Based Detection of Fake News in Social Media

The rapid growth and impact of fake news has drawn public attention, threatened public safety, and become an active research field in recent years. A wide range of methods has been developed to detect fake content, both human-based and machine-based; however, both have shown inadequacies and limitations, especially the fully automatic approaches. The purpose of this analytic study of media news language is to investigate and identify linguistic features and their contribution to detecting, filtering, and differentiating between fake and authentic news texts. This study outlines promising uses of linguistic indicators and adds a rather unconventional outlook to prior literature. It uses qualitative and quantitative data analysis to identify systematic nuances between fake and factual news by detecting and comparing 16 attributes, assigned manually to news texts, under three main categories of linguistic features (lexical, grammatical, and syntactic). The datasets consist of publicly available authentic documents from the Politi-fact website and a raw (test) dataset collected randomly from news posts on Facebook pages. The results show that linguistic features, especially grammatical features, help identify untrustworthy texts, and that most of the test news articles tend to be unreliable.

hide information, which causes behavior changes and, consequently, changes in verbal and written texts. They attempt to change their writing style to fabricate individual facts for specific purposes. This involves changes in linguistic features, and by investigating these features, one can reveal false texts. That challenge encourages researchers to explore several methods for detecting deceptive texts. Rao and Rohatgi (2000) showed that linguistic analysis could identify anonymous sci.crypt authors by comparing their texts with documents associated with the RFC database and the IPSec mailing list. Thus, the linguistic construction of news articles can help fact-checkers identify hoaxes and deliberate misinformation.
We can study fake news from three perspectives: (I) style: how fake news is written, (II) propagation: how fake news spreads, and (III) users: how users participate in fake news and the role they can play in all these perspectives (Zhou & Zafarani, 2018).
Hence, there is an urgent need to develop approaches for detecting fake news based on its content. In linguistic methods, the content of false texts is extracted and analyzed to relate language patterns to deception (Conroy et al., 2015). In this paper, the authors propose a linguistic-based fake news detection method. The method focuses empirically on analyzing the linguistic characteristics of news articles' content structure and style as a foundation for inferring news credibility. It attempts to differentiate between fake and real news and to assess the truth value of fake texts. Relying on social and psychological theories as the systematic framework of the study, the authors examine explainable, manually assigned linguistic attributes of authentic texts and their contribution to detecting fake news. These theories identify linguistic cues that appear when a person lies compared with when he or she tells the truth. Fake news tends to be less complicated to comprehend because deceivers use simpler sentences, fewer long sentences, and shorter words than truth-tellers (Burgoon et al., 2003). The Undeutsch hypothesis states that fake statements differ in writing style and quality from factual statements (Undeutsch, 1967).
Based on these attributes, this study introduces qualitative and quantitative analytic research on the language of two types of news articles in the context of fake news detection. First, the authors examine and identify the linguistic features of real articles obtained from the Politi-fact site, then compare them with the linguistic features of a set of news articles chosen from Facebook to assess their trustworthiness.
The rest of this paper is organized as follows: Section 2 presents the literature review. Section 4 introduces fake news definitions, section 3 describes data collection, and section 4 describes the study's methodology and model. Section 5 presents the results, section 6 discusses them, and section 7 concludes the article and suggests possible future studies.

Significance of the Study
In this paper, the authors propose a linguistic-based fake news detection method. The method analyzes the content structure and style of news articles, based on the texts' linguistic characteristics, to differentiate between fake and real news as a foundation for inferring news credibility and assessing the truth value of fake texts.
Based on a set of linguistic features and attributes, this study introduces qualitative and quantitative analytic research on the language of two types of news articles in the context of fake news detection. The authors compare the language of a set of news articles with authentic articles obtained from politi-fact.com to identify the linguistic features of deceptive news text and to classify that set of news articles.

Related Works
Although fake news detection is an active research area, fake news itself is not a new phenomenon. Many works have studied fake news in terms of its content, the way it spreads, or its writing style (Zheng et al., 2006). Markowitz and Hancock (2014) demonstrated how linguistic patterns related to discourse dimensions could be used as cues to differentiate between the fraudulent and genuine publications of the social psychologist Diederik Stapel. Golbeck et al. (2018) used a word-based classification approach built on the multinomial Naive Bayes algorithm to identify the linguistic nuances between fake and satirical articles. Levi et al. (2019) proposed a machine learning method using semantic representation to identify the nuances between fake news and satire. They used the Coh-Metrix tool to produce linguistic and discourse measures of texts and attempted to address the challenge of distinguishing fake news from satire. They stated that the language of satire seems to be more sophisticated than that of fake articles. Newman et al. (2003) used linguistic cues such as self-references and positive and negative words to distinguish truth-tellers from liars. Other work used documents' latent embeddings to detect false news. Wang (2017) classified fake news content with a convolutional neural network (CNN). Qin et al. (2005) and other work focused on analyzing self-references, the number of words and sentences, affect, and spatial and temporal information associated with deceptive content. Ruchansky et al. (2017) stated that people widely use social media to express their feelings and emotions, and that these posts can help with feature detection. They used social media posts to extract differences in temporal engagement patterns between real and fake news.
Burfoot and Baldwin (2009) used a support vector machine (SVM) to automatically classify content by lexical and semantic features and differentiate between real and satirical content. Ott et al. (2011), Shafqat et al. (2016), Zhang and Guan (2008), Warkentin et al. (2010), and Toma and Hancock (2010) attempted automatic detection of deceptive content in domains such as online dating, crowdfunding platforms, consumer review websites, and online advertising. Rubin et al. (2016) tried to distinguish satirical news from real news using an SVM-based algorithm with five predictive features (Absurdity, Humor, Grammar, Negative Affect, and Punctuation). Their results revealed that the best-performing feature combination (Absurdity, Grammar, and Punctuation) detects satirical news with 90% precision and 84% recall. Bessi et al. (2014) studied the spread of false news on social media, focusing on the attention given to false news on Facebook. Their study proposed that users who interact across different social media are more likely to use false information. Shao et al. (2016) introduced the Hoaxy platform for automatic tracking of online misinformation and the related fact-checking, relying on the efforts of fact-checkers such as snopes.com. Zhou et al. (2019) used a theory-driven model in their proposed method for early fake news detection. This method investigates news content at different linguistic levels, relying on well-established theories in social and forensic psychology.

Definition of Fake News
The term fake news is not new. It dates back to the early days of the printing press. As a term, it appeared in the Oxford Dictionary in 2017. Fake news is a fictitious article deliberately fabricated to deceive readers. It is a means of increasing readership or waging psychological warfare. Although there are many studies of fake news, there is no agreed definition of the term. Many studies connect fake news with other terms such as false news, rumor, misinformation, and maliciously false news. According to Allcott and Gentzkow (2017), fake news consists of news articles that are deliberately and verifiably false and could mislead readers. Conroy et al. (2015) treat fake news as deceptive news, including heavy fabrications, hoaxes, and satire. Balmas (2014) stated that fake news refers to satirical news because it contains false content. Unlike fake news, satirical news is by nature entertainment-oriented.

Methodology
A reliable methodology for identifying fake news remains a challenge for researchers; however, some linguistic attributes are used to explore the relationships between different language categories. This section introduces the methodology followed in this study. The researchers downloaded twenty factual articles from the Politi-fact website and twenty news article posts from Facebook to be analyzed against a set of linguistic characteristics, which then assisted in classifying the news texts as either true or false. They then cleaned the obtained texts of all "stop" list items such as posters, digits, times, and dates. They used a QDA (Qualitative Data Analysis) tool to process the collected datasets; the tool offers data annotation with evaluation metrics for text mining. It can analyze news, survey interviews, spreadsheets, online content, videos, pictures, and audio files. The analysis of the collected articles' content structure and writing style is based on a bundle of discriminating linguistic features and attributes, chiefly stylistic features used in natural language analysis.
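As a rough illustration of the cleaning step and of the kind of stylistic measurement discussed in this study, the following Python sketch strips dates, clock times, and digits from a text and computes three simple measures. The regular expressions, the function names, and the three measures are assumptions chosen for illustration; they are not the study's 16 manually assigned attributes, and the study itself relied on a QDA tool rather than on custom code.

```python
import re

# Hypothetical cleaning patterns; the study does not list its exact rules.
DATE_RE = re.compile(r"\b\d{1,2}[/-]\d{1,2}[/-]\d{2,4}\b")
TIME_RE = re.compile(r"\b\d{1,2}:\d{2}(?:\s?[AaPp][Mm])?\b")
DIGIT_RE = re.compile(r"\d+")


def clean_text(text: str) -> str:
    """Remove dates, clock times, and remaining digits, then collapse whitespace."""
    for pattern in (DATE_RE, TIME_RE, DIGIT_RE):
        text = pattern.sub(" ", text)
    return re.sub(r"\s+", " ", text).strip()


def stylistic_features(text: str) -> dict:
    """Compute three illustrative measures of sentence and word complexity."""
    # Naive sentence split on terminal punctuation; abbreviations are not handled.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        return {
            "avg_sentence_length": 0.0,
            "avg_word_length": 0.0,
            "type_token_ratio": 0.0,
        }
    return {
        # Mean number of words per sentence.
        "avg_sentence_length": len(words) / len(sentences),
        # Mean number of characters per word.
        "avg_word_length": sum(len(w) for w in words) / len(words),
        # Share of distinct (case-insensitive) words among all words.
        "type_token_ratio": len({w.lower() for w in words}) / len(words),
    }
```

Calling `stylistic_features(clean_text(article))` on each article would give one small feature record per text. Shorter sentences and shorter words are the sort of cue that Burgoon et al. (2003) associate with deceptive writing, which is why these measures were picked for the sketch.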

Data Collection
The first step in this study is data construction. For this study, the authors obtained two datasets from social media websites as follows:
• Dataset 1: The first dataset includes 20 authentic texts downloaded from the Politi-fact website (a fact-checking website led by Tampa Bay Times journalists that validates statements by elected officials and others on its Truth-O-Meter). The unique advantage of Politi-fact is that every quote is rated on a 6-point scale, ranging from "True" (factual) to "Pants on Fire" (absurdly false).
• Dataset 2: The second dataset contains 20 news reports chosen randomly from different Facebook pages to be assessed against the real news articles in dataset 1.
The obtained datasets were collected in the form of texts and processed.

[...] significant linguistic indicators used for news classification and counterfeit content detection. This new perspective uses qualitative and quantitative analysis as an effective method that investigates and provides a computational representation of the discriminating linguistic features of content structure and style in textual data. More importantly, the study highlights the noticeable linguistic differences between authentic and fake news content, thus making the line between them less blurry.
In this study, the authors analyzed two datasets linguistically. When the linguistic characteristics of dataset 2 were compared with those of the authentic texts downloaded from the Politi-fact website, the results showed that dataset 2 tends to be fake rather than authentic. Another promising research line is identifying sets of lexical, grammatical, and syntactic features of fake news. In future work, the authors plan to explore more linguistic indicators, specifically semantic and pragmatic features.
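The comparison between the two datasets can be pictured with a short Python sketch that averages each attribute over a reference set and a test set and reports the relative difference. The attribute names and counts below are hypothetical placeholders, not values from this study, and the relative-difference measure is an assumption made for illustration rather than the authors' procedure.

```python
from statistics import mean


def feature_means(rows: list) -> dict:
    """Average each attribute over a list of per-article count records."""
    return {key: mean(row[key] for row in rows) for key in rows[0]}


def relative_differences(reference: list, test: list) -> dict:
    """Relative change of each test-set mean against the reference-set mean."""
    ref_means = feature_means(reference)
    test_means = feature_means(test)
    return {
        key: (test_means[key] - ref_means[key]) / ref_means[key]
        if ref_means[key]
        else float("nan")  # undefined when the reference mean is zero
        for key in ref_means
    }


# Hypothetical per-article attribute counts, for illustration only.
reference_counts = [
    {"conjunctions": 10, "to_infinitives": 4},
    {"conjunctions": 14, "to_infinitives": 6},
]
test_counts = [
    {"conjunctions": 6, "to_infinitives": 5},
    {"conjunctions": 6, "to_infinitives": 5},
]

differences = relative_differences(reference_counts, test_counts)
```

A large relative difference on an attribute would mark it as a candidate indicator worth inspecting; with only 20 articles per dataset, such differences would be suggestive rather than conclusive.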