Applying News and Media Sentiment Analysis for Generating Forex Trading Signals

The objective of this research is to examine how sentiment analysis can be employed to generate trading signals for the Foreign Exchange (Forex) market. The author assessed sentiment in social media posts and news articles pertaining to the United States Dollar (USD) using a combination of two methods: lexicon-based analysis and the Naïve Bayes machine learning algorithm. The findings indicate that sentiment analysis is valuable in forecasting market movements and devising trading signals, and that its effectiveness is consistent across different market conditions. The author concludes that by analyzing sentiment expressed in news and social media, traders can gain insight into prevailing market sentiment towards the USD and other pertinent currencies, thereby aiding trading decisions. This study underscores the importance of integrating sentiment analysis into trading strategies as a tool for predicting market dynamics.


Introduction
The Forex market, also known as the Foreign Exchange market, is a marketplace where currencies are traded in pairs [1]. Traders buy and sell currencies, aiming to profit from changes in their exchange rates. This market is highly dynamic and volatile, influenced by factors such as news, social media posts and political events. To stay ahead, traders need to keep up with the news and events that can affect currency exchange rates [2]. The ability to accurately predict when to buy or sell is a valuable tool for traders and investors.
In this paper, we aim to explore the potential of sentiment analysis for generating Forex trading signals. While studies have been conducted on using news data to predict stock prices, there has been limited research on applying sentiment analysis to Forex trading.
Sentiment analysis has emerged as a tool for understanding market sentiment and forecasting currency pair movements. Our main objective is to apply sentiment analysis techniques to both news data and social media information to generate trading signals.

Literature review
The prediction of currency exchange rates has long been a puzzle and an area of interest for researchers, economists and financial policymakers. Sentiment analysis, which has historically played a role in fields such as media analysis, product reviews and financial markets, has now gained prominence as a forecasting tool. In the realm of trading, sentiment analysis has become an element in predicting market movements. This literature review explores research that uses sentiment analysis in trading.
One pioneering study conducted in 2011 combined Google Trends data with sentiment analysis of news articles to forecast stock market movements. Its methodology showed an impact on predicting stock market trends, laying the groundwork for later researchers [3].
Turning to the practicality of sentiment analysis, the research conducted by Chen et al. (2014) deserves attention [4]. Focusing on sentiment in news articles, they employed a support vector machine (SVM) classifier to identify whether sentiments were neutral, positive or negative. Notably, their findings emphasized the superiority of their approach over baseline methods in predicting stock returns.
Shifting towards social media sentiment and newer methods and datasets, the work carried out by Zhang and colleagues in 2019 [5] stands out. They explored Twitter data using a combination of long short-term memory (LSTM) and convolutional neural network (CNN) architectures. Their main goal was to classify the sentiments expressed in Twitter posts into sentiment categories. The results they obtained were compelling, indicating that sentiment analysis on real-time platforms such as Twitter has an impact when it comes to predicting Forex exchange rates.
Most studies adopt a narrow perspective, focusing on well-known currency pairs without exploring the intricacies and potential trading opportunities offered by lesser-known pairs. There is also a lack of research on how the impact of sentiment analysis differs across currency pairs, and a need for more research on integrating sentiment analysis with technical indicators to improve Forex trading strategies.

Problem formulation and research questions
This paper aims to use sentiment analysis of both news articles and social media data to generate Forex trading signals. Forex trading involves buying and selling currency pairs based on their relative value, with the aim of profiting from exchange rate fluctuations. Traders employ tools and strategies to analyze market trends and make informed trading decisions.
A crucial factor influencing market trends is the impact of news and events on the economies of the countries whose currencies are being traded. Sentiment analysis plays a role in examining text data from news articles and social media posts to uncover sentiments [6][7][8]. It entails identifying and categorizing the polarity of the sentiment expressed in a text as positive, negative or neutral.

rbes.fa.ru
To develop a system that generates trading signals based on sentiment analysis results, we use both lexicon-based and machine-learning-based approaches. The lexicon-based approach employs the VADER (Valence Aware Dictionary and Sentiment Reasoner) tool to determine the polarity of news articles. The machine learning approach uses the Naïve Bayes algorithm for sentiment classification of these news articles [9]. Additionally, we explore incorporating technical indicators as confirmation signals to enhance the accuracy of the trading signals.
To accomplish this objective, a dataset of news articles pertaining to the USD and other major currencies in the market is used. The dataset contains text data as well as metadata such as the publication date and the source of each article. It was collected from reputable news sources such as Reuters and Bloomberg, and from Twitter posts.
The main research questions to address in this paper are presented in Table 1.
By answering these research questions, this paper aims to provide insights into the potential of sentiment analysis in Forex trading and its impact on market trends.

Technical design
The technical design of this sentiment-analysis-based Forex trading system involves several components: data pre-processing, sentiment analysis, and trading signal generation. This section describes the methodology and techniques used for each component.

Data pre-processing
News articles and posts related to the United States Dollar (USD) were collected from financial news websites and social media platforms. The USD is one of the major currencies used in the Forex market, and it is considered the world's most dominant reserve currency. It is widely accepted in international transactions. The data were pre-processed to remove noise and irrelevant information, such as stop words and punctuation. Text normalization was also performed; this involves converting all text to lowercase and removing any special characters and symbols [10,11].
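The pre-processing steps above can be sketched in a few lines of Python. This is a minimal illustration rather than the pipeline actually used in the study: the function name `preprocess`, the tiny `STOP_WORDS` set, and the sample post are assumptions made for the example, and the removal of links, mentions, and hashtags follows the noise-removal step described later in the experimental setup.

```python
import re
from typing import List

# Hypothetical, deliberately tiny stop-word list; a real pipeline would use a fuller one.
STOP_WORDS = {"a", "an", "the", "is", "are", "of", "to", "and", "in", "on", "for"}


def preprocess(text: str) -> List[str]:
    """Lowercase a text, strip noise, and return the remaining tokens."""
    text = text.lower()
    # Drop links, @mentions and #hashtags.
    text = re.sub(r"https?://\S+|[@#]\w+", " ", text)
    # Drop punctuation, special characters and symbols.
    text = re.sub(r"[^a-z0-9\s]", " ", text)
    # Drop stop words.
    return [tok for tok in text.split() if tok not in STOP_WORDS]


tokens = preprocess("The Fed RAISES rates! $USD rallies... https://example.com #forex")
print(tokens)
```

Returning a token list rather than a cleaned string keeps the output directly usable by a word-frequency classifier.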

Sentiment analysis techniques
Two main techniques were used: lexicon-based analysis and Naïve Bayes.

Lexicon-based analysis
After the texts were pre-processed, the VADER (Valence Aware Dictionary and Sentiment Reasoner) tool was applied. VADER uses a predefined sentiment lexicon to assign a polarity score to each text document in the dataset. The lexicon contains a list of words and their associated neutral, positive, and negative scores. VADER assigns a score between -1 and +1 to each text document, where -1 indicates highly negative sentiment, +1 indicates highly positive sentiment, and 0 indicates neutral sentiment.

Table 1 (recovered fragment): "To evaluate the accuracy of the signals generated using sentiment analysis." Source: Developed by the author.

Review of Business and Economics Studies
To obtain an overall sentiment score for each currency pair, the polarity scores of all the news articles related to that currency pair were aggregated.The aggregation was done using a weighted average, where the weight of each polarity score was determined based on the relevance of the news article to the currency pair.

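\[
\text{Overall sentiment score} = \frac{\sum_{i} \left( \text{Polarity score}_i \times \text{Weight}_i \right)}{\sum_{i} \text{Weight}_i}
\]
where Polarity score_i is the sentiment polarity score assigned by VADER to the i-th news article or social media post, and Weight_i is the relevance weight assigned to that item.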
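The relevance-weighted aggregation described above can be illustrated with a short Python sketch. The function name `aggregate_sentiment` and the sample scores and weights are hypothetical; how the relevance weights themselves are derived is not specified here, so they are simply passed in.

```python
from typing import Sequence


def aggregate_sentiment(polarities: Sequence[float], weights: Sequence[float]) -> float:
    """Relevance-weighted average of per-document polarity scores."""
    if len(polarities) != len(weights):
        raise ValueError("polarities and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        # Assumption for this sketch: with no relevant documents, report neutral sentiment.
        return 0.0
    return sum(p * w for p, w in zip(polarities, weights)) / total_weight


# Three hypothetical articles about one currency pair.
score = aggregate_sentiment([0.6, -0.2, 0.1], [1.0, 0.5, 0.5])
print(score)
```

Because each polarity lies in [-1, +1] and the weights are non-negative, the aggregate stays in the same range and can be compared directly with a single-document score.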
VADER itself normalizes the summed valence scores of a text as
\[
\text{score} = \frac{x}{\sqrt{x^{2} + \alpha}}
\]
where x is the sum of the valence scores, i.e., the values assigned to each word in the sentiment lexicon indicating its positivity or negativity, and alpha is a smoothing factor used to normalize the score. This algorithm was selected because it does not focus solely on individual words; it also considers the context in which they are used. This enables it to discern the sentiment behind nuanced statements commonly found on social media platforms. Furthermore, it is tuned to grasp the sentiment conveyed by slang, acronyms, and prevalent internet terms. Compared with heavier models such as deep learning approaches, VADER strikes a balance between computational efficiency and accuracy.
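The normalization step, which maps a sum of valence scores into the interval from -1 to +1 using the smoothing factor alpha, can be sketched as follows. The function name `normalize` is illustrative, and the default `alpha = 15` is an assumption taken from the commonly distributed VADER reference implementation; it is not stated in this paper.

```python
import math


def normalize(valence_sum: float, alpha: float = 15.0) -> float:
    """Squash an unbounded sum of valence scores into [-1, 1]."""
    score = valence_sum / math.sqrt(valence_sum * valence_sum + alpha)
    # Clamp defensively against floating-point overshoot.
    return max(-1.0, min(1.0, score))


for x in (-8.0, -1.0, 0.0, 1.0, 8.0):
    print(x, round(normalize(x), 4))
```

A larger alpha flattens the curve, so more strongly worded text is needed before the score approaches the extremes.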

Naïve Bayes
Naïve Bayes is a probabilistic machine learning algorithm used for classification. It works on the assumption that each feature is independent of all other features. To use it for sentiment analysis, the model needs to be trained on a dataset of news articles and social media posts. In this case, the texts were labeled as neutral, positive, or negative.
After training, the model was used to classify the sentiment of the news articles and social media posts related to the USD. The model computed the probability that a given news article or social media post belonged to each sentiment class (positive, negative, or neutral) based on the frequency with which the words in the text appeared in each class of the training data.
Applying Bayes' theorem to a class variable y with three possible outcomes (neutral, positive, or negative) and a dependent feature vector x = (x_1, ..., x_n) gives:
\[
P(y \mid x) = \frac{P(x \mid y)\, P(y)}{P(x)}
\]
where:
• P(y|x): The probability that a given piece of text falls under sentiment class y based on the word frequencies represented by vector x.
• P(x|y): Likelihood of encountering the word frequencies in vector x in texts classified under sentiment class y.
• P(y): Base probability of a text being categorized under sentiment class y, without considering its word frequencies.
• P(x): Likelihood of the word frequencies denoted by vector x appearing across all sentiment classes.
If one assumes that all features are independent, this relationship can be further simplified to:
\[
P(y \mid x_1, \dots, x_n) = \frac{P(y) \prod_{i=1}^{n} P(x_i \mid y)}{P(x)}
\]
where P(x_i|y) is the probability of word x_i appearing in texts of sentiment class y, given the assumption that words appear independently.
Using the above equation, we deduce the most probable sentiment class for each text. A rejection model variant of the classifier is used here: for a classification to be accepted, the probability of the predicted class needs to be greater than a specific threshold. If this threshold is not met, the model declines to classify the sample. The Naïve Bayes classifier is useful for sentiment analysis because it is computationally lightweight, resilient to over-fitting, and suitable for use with large datasets of news articles and social media posts [5].
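The rejection rule can be sketched as a small function over the class probabilities. The function name `classify_with_rejection` and the threshold value of 0.6 are assumptions for illustration; the threshold actually used in the experiments is not reported.

```python
from typing import Dict, Optional


def classify_with_rejection(posteriors: Dict[str, float], threshold: float = 0.6) -> Optional[str]:
    """Return the most probable class, or None when the model is not confident enough."""
    label, prob = max(posteriors.items(), key=lambda item: item[1])
    return label if prob > threshold else None


confident = classify_with_rejection({"positive": 0.80, "neutral": 0.15, "negative": 0.05})
uncertain = classify_with_rejection({"positive": 0.40, "neutral": 0.35, "negative": 0.25})
print(confident, uncertain)
```

Rejected samples contribute no sentiment score, which trades coverage for fewer low-confidence signals.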
To further illustrate the Naïve Bayes procedure, consider the following example. Suppose a news article is titled "Federal Reserve announces interest rate hike, boosting dollar". After the noise is removed from this article, the probability that its words appear in each sentiment class (neutral, positive, or negative) is calculated from the training data. The word "boosting", for example, appears more often in positive texts than in negative or neutral texts.
The probability that the text belongs to each sentiment class is then calculated from the probabilities of the individual words [8]. By applying this procedure to a large dataset of news articles and social media posts, trading signals were generated based on the prevailing sentiment towards the USD.
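The worked example can be reproduced on a toy scale with the sketch below. It is an illustration under stated assumptions, not the study's model: it uses word counts with add-one (Laplace) smoothing instead of the Gaussian estimate mentioned in the experimental setup, it is written in plain Python rather than scikit-learn, and the four training documents and the names `train` and `posteriors` are invented for the example.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, List, Tuple


def train(docs: List[Tuple[List[str], str]]):
    """Count documents per class and word occurrences per class."""
    class_counts = Counter(label for _, label in docs)
    word_counts = defaultdict(Counter)
    for tokens, label in docs:
        word_counts[label].update(tokens)
    vocab = {word for tokens, _ in docs for word in tokens}
    return class_counts, word_counts, vocab


def posteriors(tokens: List[str], model) -> Dict[str, float]:
    """P(y | tokens) for every class y, using log-probabilities for stability."""
    class_counts, word_counts, vocab = model
    n_docs = sum(class_counts.values())
    log_scores = {}
    for label, n in class_counts.items():
        total_words = sum(word_counts[label].values())
        log_p = math.log(n / n_docs)  # prior P(y)
        for word in tokens:
            if word in vocab:  # words never seen in training are ignored
                # Add-one smoothing avoids zero probabilities.
                log_p += math.log((word_counts[label][word] + 1) / (total_words + len(vocab)))
        log_scores[label] = log_p
    # Normalize so the class probabilities sum to 1 (this divides out P(x)).
    peak = max(log_scores.values())
    unnormalized = {label: math.exp(s - peak) for label, s in log_scores.items()}
    z = sum(unnormalized.values())
    return {label: value / z for label, value in unnormalized.items()}


# Invented training data, for illustration only.
training = [
    (["rate", "hike", "boosting", "dollar"], "positive"),
    (["dollar", "rallies", "boosting", "confidence"], "positive"),
    (["dollar", "slumps", "weak", "data"], "negative"),
    (["fed", "meeting", "scheduled"], "neutral"),
]
model = train(training)
probs = posteriors(["boosting", "dollar"], model)
best = max(probs, key=probs.get)
print(best, probs)
```

On this toy data the word "boosting" occurs only in positive documents, so the positive class receives the highest probability.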
This approach was chosen because of its simplicity and speed, making it well suited for generating real-time Forex trading signals. While other machine learning methods, such as support vector machines (SVM) or decision trees, offer nuanced decision boundaries, the probabilistic nature of Naïve Bayes facilitates sentiment estimation based on the provided features. Its inherent capability to manage high-dimensional feature spaces, which are common in text data, grants it a distinct advantage.

Trading signal generation process
Once the sentiment scores for the news articles and social media data are obtained, the next step is to use them to generate Forex trading signals. Forex trading signals are indications that tell traders when to buy, sell or hold a particular currency pair based on market analysis.
To generate these signals, the sentiment analysis results were combined with technical analysis indicators. Technical analysis is a market evaluation process that helps traders predict the next direction of currency pair values based on past price movements and chart patterns. According to technical traders, markets usually move in predictable patterns that can be observed and identified on trading charts.
Numerous technical analysis indicators are used in Forex trading, and they can be analyzed using data mining techniques to generate trading signals. Here, these indicators are used to confirm the signals generated by the sentiment analysis. Common technical indicators include the moving average convergence divergence (MACD), which helps traders identify emerging upward or downward price trends; Bollinger Bands, which lay out trend lines two standard deviations away from the simple moving average price of a financial instrument; and the Stochastic Oscillator, which compares a currency pair's closing price to its price range over a specific period.
In this work, two indicators were used. The moving average (MA) is the average price of a currency pair over a set period. The relative strength index (RSI) measures the strength and weakness of a currency pair's price action; it ranges from 0 to 100 and identifies when a currency is overbought or oversold. The selection of the RSI and MA indicators stems from their established efficacy in pinpointing price momentum and trends. These indicators enhance clarity and furnish decisive signals when evaluating sentiment within Forex trading.
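The two indicators can be computed as in the sketch below. The function names and the sample prices are illustrative. The RSI shown is the simple-average variant, chosen for brevity; many charting platforms use Wilder's smoothed averages instead, and the variant used in the study is not stated.

```python
from typing import Sequence


def simple_moving_average(prices: Sequence[float], window: int) -> float:
    """Average of the most recent `window` prices."""
    if window <= 0 or len(prices) < window:
        raise ValueError("need at least `window` prices")
    return sum(prices[-window:]) / window


def relative_strength_index(prices: Sequence[float], period: int = 14) -> float:
    """RSI in [0, 100] from the last `period` price changes (simple-average variant)."""
    if period <= 0 or len(prices) < period + 1:
        raise ValueError("need at least `period + 1` prices")
    recent = prices[-(period + 1):]
    changes = [recent[i + 1] - recent[i] for i in range(period)]
    avg_gain = sum(c for c in changes if c > 0) / period
    avg_loss = sum(-c for c in changes if c < 0) / period
    if avg_gain == 0 and avg_loss == 0:
        return 50.0   # flat prices: neither overbought nor oversold
    if avg_loss == 0:
        return 100.0  # only gains in the window
    relative_strength = avg_gain / avg_loss
    return 100.0 - 100.0 / (1.0 + relative_strength)


closes = [1.0, 2.0, 1.0, 2.0, 1.0]  # hypothetical closing prices
print(simple_moving_average(closes, 2), relative_strength_index(closes, period=4))
```

Readings far above 50 indicate dominant upward moves and readings far below 50 indicate dominant downward moves, which is why 50 serves as a natural midline for confirmation.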
When the sentiment analysis score and these two technical analysis indicators align, a trading signal is generated. For example, if the sentiment analysis indicates a positive score and the technical indicators confirm a bullish trend, a "buy" signal is generated.
If the sentiment analysis result was positive, the currency price was above its 50-day moving average (MA), and the relative strength index (RSI) was above 50, we generated a stronger buy signal than if only the sentiment analysis result was positive. Conversely, if the sentiment analysis indicates a negative score and the technical indicators also indicate a bearish trend, a "sell" signal is generated.
If the sentiment analysis and the technical indicators do not align, no signal is generated. Forex trading involves risk, so traders should wait and exercise caution in such cases.
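The alignment rule described above can be written as a small decision function. This is a sketch under assumptions: the name `generate_signal` is hypothetical, the buy condition follows the text (positive sentiment, price above the moving average, RSI above 50), and the sell condition is assumed to be its mirror image, which the text implies but does not spell out.

```python
def generate_signal(sentiment: float, price: float, moving_average: float,
                    rsi_value: float, rsi_midline: float = 50.0) -> str:
    """Combine a sentiment score with MA and RSI confirmation."""
    bullish = price > moving_average and rsi_value > rsi_midline
    bearish = price < moving_average and rsi_value < rsi_midline  # assumed mirror rule
    if sentiment > 0 and bullish:
        return "buy"
    if sentiment < 0 and bearish:
        return "sell"
    return "no signal"  # sentiment and indicators disagree, so stay out


print(generate_signal(0.4, price=1.10, moving_average=1.08, rsi_value=62.0))
print(generate_signal(-0.3, price=1.05, moving_average=1.08, rsi_value=41.0))
print(generate_signal(0.4, price=1.05, moving_average=1.08, rsi_value=41.0))
```

Requiring both indicators to agree with the sentiment makes the rule conservative: it produces fewer signals, but each one has independent confirmation from price data.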

Experimental analysis
This section shows the experimental setup and methodology used in this research, followed by a presentation of the results and an analysis of the findings.

Experimental setup and methodology
The sentiment analysis models were implemented in the Python programming language using the scikit-learn library. The dataset consists of news articles from financial news websites and social media posts related to the USD. The dataset was pre-processed by removing noise, i.e., irrelevant information such as mentions, links, and hashtags, and was then split into a training set and a testing set. The training set was used to train the sentiment analysis models, and the testing set was used to evaluate their performance.
Two sentiment analysis models were used in this research: lexicon-based analysis and Naïve Bayes. The lexicon-based model classifies news articles and social media posts using a pre-built sentiment lexicon, while the Naïve Bayes model was trained using a Gaussian assumption on the dependent variable to estimate P(y) and P(x_i|y).
The scores generated by the sentiment analysis models were used to generate Forex trading signals. To confirm the generated signals, we used two technical indicators: the moving average (MA) and the relative strength index (RSI). The RSI measures the speed and magnitude of recent price changes and flags overbought or oversold conditions that may precede trend changes. The MA identifies trends as well as support and resistance levels.

Duration and long-term trends
The experiment was based on historical data from April 1st, 2023, to July 26th, 2023. During this timeframe, there was notable market volatility. The US dollar strengthened against various currencies due to several factors. These included the Federal Reserve's decision to aggressively raise interest rates to tackle inflation, the ongoing conflict in Ukraine and its impact on the global economy, and China's economic slowdown. Among the currencies hit hardest were the euro (EUR) and pound sterling (GBP), which declined by 10% and 15% against the US dollar, respectively, during this timeframe. The Japanese yen (JPY) also depreciated by around 17% against the USD.

Timeframe
For this research, the H4 (4-hour) timeframe was chosen for analysis. This timeframe strikes a balance between the short-term fluctuations observed in the M15 (15-minute) or H1 (1-hour) timeframes and the longer-term trends seen in weekly timeframes. Since the news articles and social media posts used for sentiment analysis may not capture immediate market reactions, the H4 timeframe provides a broader window in which to observe how sentiment translates into price action.
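One way to line sentiment scores up with H4 candles is to group scored posts into 4-hour buckets, as sketched below. The function names and sample posts are hypothetical, and the sketch assumes candles that start at 00:00, 04:00, 08:00 and so on in the timestamps' own time zone; brokers differ on this alignment.

```python
from collections import defaultdict
from datetime import datetime
from typing import Dict, Iterable, Tuple


def h4_bucket(ts: datetime) -> datetime:
    """Start time of the 4-hour candle that contains `ts`."""
    return ts.replace(hour=ts.hour - ts.hour % 4, minute=0, second=0, microsecond=0)


def mean_sentiment_per_h4(scored_posts: Iterable[Tuple[datetime, float]]) -> Dict[datetime, float]:
    """Average the sentiment scores of all posts that fall in the same H4 candle."""
    buckets = defaultdict(list)
    for ts, score in scored_posts:
        buckets[h4_bucket(ts)].append(score)
    return {start: sum(scores) / len(scores) for start, scores in sorted(buckets.items())}


posts = [
    (datetime(2023, 4, 3, 9, 37), 0.4),
    (datetime(2023, 4, 3, 11, 59), 0.2),
    (datetime(2023, 4, 3, 12, 0), -0.5),
]
per_candle = mean_sentiment_per_h4(posts)
for start, score in per_candle.items():
    print(start, round(score, 2))
```

Each candle then has a single sentiment value that can be compared with the MA and RSI computed on the same candle.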

Volatility and slippage
Between April 1st and July 26th, 2023, the Forex market experienced significant volatility. Due to this heightened volatility, a discrepancy was observed between the price at which orders were placed and the price at which they were executed, a phenomenon commonly referred to as slippage. This slippage was especially pronounced in short timeframes such as M1 and M5, where algorithmic trading systems respond instantly to information. In the H4 timeframe, however, these algorithms react with a cushion against news events and sentiment scores, helping to mitigate the impact of slippage.
Source: Adapted by the author from Investing.com.

Results and analysis
The generated Forex trading signals were evaluated using historical Forex data. With confirmation from the MA and RSI, the signals generated by the Naïve Bayes model yielded a profit of over 12% over the testing period, while the signals from the lexicon-based model, with the same confirmation, yielded a profit of 5%.
Based on the evaluation metrics, Naïve Bayes was the more effective of the two models at analyzing sentiment. Its accuracy indicates that it correctly classified 85% of the news articles and social media posts as neutral, negative, or positive. The lexicon-based model has lower accuracy, which suggests that it may not be as effective as Naïve Bayes at identifying the correct sentiment in Forex trading (Table 2).
The lexicon-based model has a precision of 0.72, a recall of 0.70 and an F1 score of 0.71. The Naïve Bayes model has a precision of 0.87, a recall of 0.85 and an F1 score of 0.86. These results suggest that 87% of the Naïve Bayes model's assignments of a text to a given sentiment class (neutral, positive, or negative) were correct, and that it correctly identified 85% of the texts that truly belonged to a given class. The F1 score, the harmonic mean of precision and recall, shows that the Naïve Bayes model strikes a good balance between the two.
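As a consistency check, the reported F1 scores can be recomputed from the reported precision and recall values, since F1 is their harmonic mean. The function name `f1_score` below is illustrative; it is a plain-Python helper, not a library call.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


naive_bayes_f1 = round(f1_score(0.87, 0.85), 2)
lexicon_f1 = round(f1_score(0.72, 0.70), 2)
print(naive_bayes_f1, lexicon_f1)  # 0.86 0.71
```

Both values agree with the F1 scores reported for the two models.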
In terms of using technical analysis indicators for confirmation, the analysis showed that the moving average (MA) and relative strength index (RSI) are effective in confirming the trading signals generated by the sentiment analysis models. When the sentiment analysis generated a positive score and the MA and RSI confirmed the buy signal, the result was a profitable trade. Similarly, when the models generated a negative score and the MA and RSI confirmed the sell signal, the result was a profitable trade (Tables 3-5).

Limitations and potential improvements
Sentiment analysis can sometimes be biased, and this system relies on it. One potential improvement would be to use more advanced sentiment analysis techniques, such as deep learning approaches, which can be more effective in capturing nuances in language.
This paper also shows that sentiment analysis is good for signal generation in the Forex market and can improve traders' performance.

Conclusion and future work
The significance of this research lies in its potential to improve the generation of profitable Forex trading signals and to help Forex traders make decisions. This research project investigated the potential of sentiment analysis to generate Forex trading signals. It proposed and implemented a methodology based on lexicon-based analysis and Naïve Bayes that assigns a neutral, positive, or negative score to the sentiment of news articles and social media posts related to the United States Dollar (USD), and then combines this score with technical analysis indicators to generate a signal.
As for future work, there are several areas where further research can be conducted. Since factors other than financial news articles and social media posts affect the Forex market, advanced sentiment analysis techniques such as deep learning and neural networks could be applied to text about political events, economic data releases, and natural disasters, among others, to provide traders with further insights for generating profitable Forex trading signals.
Exploring the application of sentiment analysis in other financial markets beyond Forex, such as stocks, commodities, and crypto trading, could also be an interesting avenue for future research.

Acknowledgement
The author is thankful for the opportunity to have been taught by Professor Carl Yang and Professor Fei Liu in the Department of Computer Science. Their distinct teaching methods and dedication to excellence have significantly contributed to the author's understanding of Data Mining, which was instrumental for the author's research.

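\prod_{i=1}^{n} P(x_i \mid y): This term combines, as a product, the likelihoods of all words in vector x showing up in texts under sentiment class y.
Given a constant P(x) for the input, classification is determined by:
\[
\hat{y} = \arg\max_{y} \; P(y) \prod_{i=1}^{n} P(x_i \mid y)
\]
This function pinpoints the sentiment class y which has the highest overall probability when considering the words in vector x. For estimating P(y) and P(x_i|y), a Gaussian assumption is leveraged on the dependent variable through Maximum A Posteriori (MAP) estimation:
\[
P(x_i \mid y) = \frac{1}{\sqrt{2\pi\sigma_y^{2}}} \exp\left( -\frac{(x_i - \mu_y)^{2}}{2\sigma_y^{2}} \right)
\]
where \mu_y and \sigma_y^{2} are the mean and variance of feature x_i within class y.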

Table 1
Research Questions and Objectives

Table 2
Precision, recall and F1 score

Model            Precision   Recall   F1 score
Lexicon-based    0.72        0.70     0.71
Naïve Bayes      0.87        0.85     0.86

Source: Developed by the author.