How and Why Large Language Models Rewrite Text: A Study in Media Bias Mitigation

Student: Neel Iyer
Table: COMP2
Experimentation location: Home
Regulated Research (Form 1c): No
Project continuation (Form 7): No


Abstract:

Bibliography/Citations:

Nakov, Preslav. ‘Computational Linguistics for Subjectivity’. In Creating a More Transparent Internet, edited by Antske Fokkens and Piek Vossen, 31–54. Studies in Natural Language Processing. Cambridge: Cambridge University Press, 2022. https://doi.org/10.1017/9781108641104.003.
Grieco, Elizabeth. ‘Americans’ Main Sources for Political News Vary by Party and Age’. Pew Research Center (blog), 2020. https://www.pewresearch.org/short-reads/2020/04/01/americans-main-sources-for-political-news-vary-by-party-and-age/.
Haller, Patrick, Ansar Aynetdinov, and Alan Akbik. ‘OpinionGPT: Modelling Explicit Biases in Instruction-Tuned LLMs’. arXiv, 7 September 2023. https://doi.org/10.48550/arXiv.2309.03876.
Lin, Luyang, Lingzhi Wang, Xiaoyan Zhao, Jing Li, and Kam-Fai Wong. ‘IndiVec: An Exploration of Leveraging Large Language Models for Media Bias Detection with Fine-Grained Bias Indicators’. arXiv, 1 February 2024. https://doi.org/10.48550/arXiv.2402.00345.
Neumann, Terrence, Sooyong Lee, Maria De-Arteaga, Sina Fazelpour, and Matthew Lease. ‘Diverse, but Divisive: LLMs Can Exaggerate Gender Differences in Opinion Related to Harms of Misinformation’. arXiv, 29 January 2024. https://doi.org/10.48550/arXiv.2401.16558.
Radivojevic, Kristina, Nicholas Clark, and Paul Brenner. ‘LLMs Among Us: Generative AI Participating in Digital Discourse’. arXiv, 8 February 2024. https://doi.org/10.48550/arXiv.2402.07940.
Recasens, Marta, Cristian Danescu-Niculescu-Mizil, and Dan Jurafsky. ‘Linguistic Models for Analyzing and Detecting Biased Language’. In Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics, 1650–59. ACL, 2013. https://pure.mpg.de/pubman/faces/ViewItemOverviewPage.jsp?itemId=item_2173554.
Rodrigo-Ginés, Francisco-Javier, Jorge Carrillo-de-Albornoz, and Laura Plaza. ‘A Systematic Review on Media Bias Detection: What Is Media Bias, How It Is Expressed, and How to Detect It’. Expert Systems with Applications 237 (1 March 2024): 121641. https://doi.org/10.1016/j.eswa.2023.121641.
Rozado, David. ‘The Political Preferences of LLMs’, n.d.
Wiebe, Janyce M., Rebecca F. Bruce, and Thomas P. O’Hara. ‘Development and Use of a Gold-Standard Data Set for Subjectivity Classifications’. In Proceedings of the 37th Annual Meeting of the Association for Computational Linguistics, 246–53. College Park, Maryland, USA: Association for Computational Linguistics, 1999. https://doi.org/10.3115/1034678.1034721.
Wiebe, Janyce, Theresa Wilson, Rebecca Bruce, Matthew Bell, and Melanie Martin. ‘Learning Subjective Language’. Computational Linguistics 30, no. 3 (1 September 2004): 277–308. https://doi.org/10.1162/0891201041850885.

Additional Project Information

Research paper:
Additional Resources: -- No resources provided --
Project files:

Research Plan:

Rationale

Past research has concluded that news plays an essential role in fostering democracy. Bias in news, however, is a perennial issue and manifests in many ways. Moreover, according to the NIH, “exposure to biased information can lead to negative societal outcomes, including group polarization, intolerance of dissent, and political segregation.” These outcomes can have severe consequences for society, and it is imperative to address them. To analyze bias within a concrete framework, I use subjectivity: research concludes that subjectivity is not only the concept most closely linked to bias but also an inherent manifestation of it, and it can be mathematically quantified. This project seeks to gather articles on at least 10 topics, with a minimum dataset size of 500 articles, and to evaluate how generative large language models can not only shift article tone but rewrite articles entirely. It then seeks to explain how these LLMs perform the rewriting and which elements of the text are primarily changed.

 

Research Questions

Can large language models (LLMs) be used to rewrite text to lower subjectivity?

Does the text rewritten with the aid of LLMs effectively maintain the content of the original text?

How does the LLM rewrite text?

Is subjectivity consistent across media? Are there particular topics where it is concentrated?

Does an article's initial subjectivity correlate with the reduction in subjectivity after rewriting?

 

Procedure

Gather a dataset of 500+ articles spanning a total of 12 topics: affirmative action, censorship, criminal justice, education, foreign aid, healthcare, income inequality, LGBTQ, marijuana, net neutrality, police reform, vaccination

Evaluate subjectivity for each article using TextBlob and its sentiment pipeline (a minimal scoring sketch appears after this list)

Design an LLM prompt to rewrite each article, and feed all collected articles through the LLM

Compare the benchmarked subjectivity to the post-rewrite subjectivity and run a paired t-test to determine whether the null hypothesis (no change in subjectivity) can be rejected

Compute word embeddings for each article and calculate cosine similarity between the original and rewritten versions to check whether the content is preserved

Compare old vs. new sentence length and sentence-level subjectivity to investigate how the LLM removes subjective content

Perform k-means clustering, using a knee (elbow) test to choose the number of clusters, and examine the properties of each cluster

Run a correlation analysis between original subjectivity and subjectivity percent decrease.
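
Below is a minimal sketch of the subjectivity-scoring step, assuming article texts are already loaded as Python strings; TextBlob's sentiment property exposes a subjectivity score in [0, 1], which is the value this project benchmarks before and after rewriting. The sample texts are placeholders, not project data.

from textblob import TextBlob

def subjectivity_score(text):
    # TextBlob scores subjectivity per word from its lexicon, adjusts it with
    # intensifier modifiers, and aggregates the scores over the text.
    return TextBlob(text).sentiment.subjectivity

# Hypothetical usage on a list of scraped article texts.
articles = [
    "Officials announced the policy on Tuesday.",
    "This reckless policy is an obvious disaster for everyone.",
]
baseline_scores = [subjectivity_score(a) for a in articles]
# Higher values indicate more subjective language.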

 

Risk and Safety

There are no inherent risks, and no special safety measures are required. Gemini (the LLM used for this project) includes automatic safeguards that prevent the processing of inappropriate and/or offensive text, which mitigates concerns about the propagation of stereotypes and bias. Moreover, all experimentation will be conducted at home over the internet. To avoid overloading any servers, API requests will not exceed 60 requests per minute (a minimal sketch of this throttling appears below). This ensures safe and stable access to diverse resources without placing undue burden on external services.
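
A minimal sketch of that throttling, assuming a fixed delay between API calls; the function and names here are illustrative, not the project's actual code.

import time

REQUESTS_PER_MINUTE = 60
DELAY_SECONDS = 60.0 / REQUESTS_PER_MINUTE  # one request per second

def throttled_map(call_fn, items):
    # Apply call_fn to each item, sleeping between calls so the request
    # rate never exceeds 60 requests per minute.
    results = []
    for item in items:
        results.append(call_fn(item))
        time.sleep(DELAY_SECONDS)
    return results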

Questions and Answers

1. What was the major objective of your project and what was your plan to achieve it?

The major objective of this project was to transform news text from subjective slant into objective facts. To achieve this, my plan was to create a dataset of news articles, rewrite them using large language models (LLMs), and compare subjectivity before and after. I then worked toward model interpretability by examining cosine similarity, length, and clustering to better understand how a model rewrites text and what changes it makes.

 

a. Was that goal the result of any specific situation, experience, or problem you encountered?

After the events of January 6th, I noticed a growth in political polarization. People tended to associate different media sources with different political viewpoints. While that comes naturally with opinion articles, news itself should be objective. My teacher showed me a graph with a spectrum of news sources organized by political leaning. I thought that effectively removing subjectivity, which is a manifestation of bias, could make such spectrums unnecessary and drive better news coverage.

 

b. Were you trying to solve a problem, answer a question, or test a hypothesis?

Under the broader problem of bias in media, this project aimed to test a hypothesis. I hypothesized that, given the correct prompt, an LLM would produce a decrease in subjectivity when an article was fed through it. After running a paired t-test and finding a p-value below .05, I was able to reject the null hypothesis (a minimal sketch of this test appears below). The rest of the study focused on model interpretability and on ensuring that the rewritten text closely matched the original.
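
A minimal sketch of that paired t-test, assuming the baseline and rewritten subjectivity scores are stored as aligned lists; the numbers below are placeholders, not project data. SciPy's ttest_rel performs the related-samples test.

from scipy.stats import ttest_rel

# One entry per article: subjectivity before and after rewriting (placeholder values).
original = [0.62, 0.48, 0.55, 0.71, 0.40]
rewritten = [0.50, 0.41, 0.49, 0.60, 0.38]

t_stat, p_value = ttest_rel(original, rewritten)
if p_value < 0.05:
    print(f"Reject the null hypothesis (t = {t_stat:.2f}, p = {p_value:.4f})")
else:
    print(f"Fail to reject the null hypothesis (p = {p_value:.4f})")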

 

2. What were the major tasks you had to perform in order to complete your project?

 

First, I selected 12 topics as media queries that would yield articles with contrasting views. I used Google News as my media aggregator to promote both relevance and public traction. Aggregating media from 50+ sources, I scraped a total of 1098 articles across these 12 topics, with a roughly equal distribution per topic (a sketch of this collection step appears below).
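
A minimal sketch of how such collection could look, assuming articles are discovered through Google News RSS search feeds and parsed with the feedparser library; the query construction and limits are illustrative, not the exact pipeline used.

import feedparser

TOPICS = ["affirmative action", "healthcare", "net neutrality"]  # subset of the 12 topics

def fetch_entries(topic, limit=100):
    # Each Google News RSS search entry exposes a title and link;
    # full article text would still need to be scraped from each link.
    url = "https://news.google.com/rss/search?q=" + topic.replace(" ", "+")
    feed = feedparser.parse(url)
    return [(entry.title, entry.link) for entry in feed.entries[:limit]]

dataset = {topic: fetch_entries(topic) for topic in TOPICS}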

Next, I had to create a benchmarking mechanism. Using the sentiment pipeline from TextBlob, a Python library, I extracted the subjectivity of each article. Here, subjectivity is calculated at the word level and multiplied by an intensity factor determined by word modifiers. After benchmarking each article, I used GPT-3.5 to help engineer a prompt for rewriting articles to promote objectivity and tested it on sample cases. My final prompt was: “Rewrite text to be concise, fact driven, objective, neutral statements that aren't lengthy. They shouldn't hold any political affiliation, but be fact driven and short. You should have a formal, neutral tone, and not express any one view, but present both sides equally and fairly.”

We used Gemini, one of the leading available LLMs, to rewrite all collected articles. However, due to API errors, we were left with a dataset of 768 successfully rewritten articles. We evaluated each rewritten article with the same subjectivity methodology as before (a sketch of the rewriting step appears below).
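
A minimal sketch of the rewriting step, assuming the google-generativeai Python client and an available API key; the model name and error handling here are assumptions, not the project's exact configuration. The prompt is the one quoted above.

import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")  # placeholder key
model = genai.GenerativeModel("gemini-pro")  # assumed model name

PROMPT = ("Rewrite text to be concise, fact driven, objective, neutral statements "
          "that aren't lengthy. They shouldn't hold any political affiliation, but be "
          "fact driven and short. You should have a formal, neutral tone, and not "
          "express any one view, but present both sides equally and fairly.")

def rewrite(article_text):
    # Return the rewritten article, or None if the API call fails;
    # failed calls account for the dataset shrinking from 1098 to 768 articles.
    try:
        response = model.generate_content(PROMPT + "\n\n" + article_text)
        return response.text
    except Exception:
        return None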

For our conclusions, we first ran a paired t-test to confirm that our results were statistically significant (they were). On average, we observed a 10% decrease in subjectivity in the LLM-rewritten text. We then created word embeddings for each article using spaCy, a Python library, and performed a cosine similarity analysis between the original article content and the rewritten text, finding an average cosine similarity of 0.84, indicating that the rewritten text was highly similar to and preserved the original content. Second, we compared response lengths, finding that the model decreased both sentence length and average sentence subjectivity, indicating that it removed the most subjective and unnecessary content. Lastly, we clustered our articles using k-means clustering and found that, for certain clusters, there was a moderately positive correlation between initial subjectivity and the efficacy of the model's rewriting, showing that the approach works best on highly polarized or subjective articles. (Minimal sketches of the similarity and clustering steps appear below.)
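
Minimal sketches of the similarity and clustering analyses, assuming spaCy's large English model for document vectors and scikit-learn for k-means; the feature matrix and cluster-count search are illustrative, not the project's exact code.

import spacy
from sklearn.cluster import KMeans

nlp = spacy.load("en_core_web_lg")

def doc_similarity(original, rewritten):
    # Cosine similarity between the averaged word-vector representations
    # of the original and rewritten articles.
    return nlp(original).similarity(nlp(rewritten))

def kmeans_inertias(features, max_k=10):
    # Fit k-means for k = 1..max_k; the "knee" of the inertia curve
    # suggests how many clusters to keep before examining each one.
    inertias = []
    for k in range(1, max_k + 1):
        km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(features)
        inertias.append(km.inertia_)
    return inertias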

 

3. What is new or novel about your project?

 

a. Is there some aspect of your project's objective, or how you achieved it, that you haven't done before?

While I have done preliminary analysis with zero-shot learning, using pretrained models for inference and classification projects, using generative AI not to classify but to rewrite text is new for me.

 

b. Is your project's objective, or the way you implemented it, different from anything you have seen?

Previous work has applied language models to media subjectivity. However, this project's objective (using generative LLMs to effectively remove bias from news media) has not, to my knowledge, been attempted before. Previous attempts used models such as BERT (a transformer architecture) to detect, through contextualized embeddings, words that contribute heavily to subjectivity measures and then substitute adjacent, neutral words. However, as detailed further in question 3c, the methodology in this project does not just eliminate subjective words; it changes the tone of the article to remove subjective “fluff” and prevent “slant” in news.

 

c. If you believe your work to be unique in some way, what research have you done to confirm that it is?

To confirm that my work is novel, I performed a thorough literature review of past papers spanning broader information-systems research on media, media subjectivity, the link between media subjectivity and media bias, large language models (LLMs), and the utility of LLMs for this issue. I found that media bias is commonly associated with a “slanting” of facts toward a certain perspective, and that this is most strongly correlated with subjectivity. However, methods to remove this subjectivity are not well implemented: previous papers only remove subjectivity at the word level and do not modify the overall tone of the article. That is why I believe my methodology of using LLMs to rewrite the text differs significantly from previous work and stands as a unique contribution to the field.

 

4. What was the most challenging part of completing your project?

 

The most challenging part of completing my project was creating a dataset that did not have biases in it (sampling, distribution, etc.). I originally attempted to compile a database of 15 topics, but due to sampling and distribution issues, I had to reduce it to 12. This required me to statistically analyze whether I was overrepresenting certain categories and to work out how to address that.

 

a. What problems did you encounter, and how did you overcome them?

First, I encountered many LLM API issues. Requests through the Gemini API often returned no response or produced errors. I tackled this by broadening data collection within each topic, so that even if only 80-85% of responses came back, the dataset would still be large enough to support statistical inference. Second, when I was trying to analyze whether the rewritten text closely matched the original text using cosine similarity, I ran into issues with representing the text numerically. I overcame this by using spaCy's large English model to generate a word embedding for each word and comparing the texts document by document.

 

b. What did you learn from overcoming these problems?

Overcoming these problems taught me how to think statistically about sampling. When I faced an issue whose root cause was out of my control, I learned to adapt so that it would not have a significant impact on the broader scope of my project. Overcoming these problems also introduced me to libraries and modules for specific, niche tasks I had not used before. This deepened my understanding of ML and AI and will let me build more advanced and rigorous projects in the future.

 

5. If you were going to do this project again, are there any things you would do differently the next time?

I would also like to run this project as a time-based study over a longer period. This would allow more “media-affecting” events to take place during scraping, letting me generate a larger and more cohesive dataset. It would also allow me to visualize the fluctuation of subjectivity over time and get a stronger picture of how media subjectivity is affected and contextualized by socio-political events.

 

6. Did working on this project give you any ideas for other projects? 

 

Understanding whether and how large language models (LLMs) can mitigate bias has inspired me to pursue other LLM research as well. Whether it is analyzing the role of LLMs in social media and whether they can prevent echo chambers by improving recommendation systems (extending this study to quantify bias with zero-shot learning), or investigating real-time rewriting of speeches and debates and contrasting them with the originals to promote electoral integrity, I want to learn how not just LLMs, but technology as a whole, can impact the society we live in. In this project, I asked: “Can LLMs mitigate bias?” The project has also inspired me to ask: “What biases do LLMs contain themselves? How can they be conscious of those biases and mitigate them?” Through upcoming research with Professor Pradhan at NJIT, I seek to conduct independent studies on the implicit biases encoded in large language models that arise from disparities along axes of power and oppression.

 

7. How did COVID-19 affect the completion of your project?

 

COVID-19 didn’t have any direct impacts on the methodology of the project and the proposed hypothesis. However, the pandemic had a significant impact on the news industry. Whether it be smaller news publishers being forced to close down, or journalists losing jobs, COVID-19 could have resulted in a monopolisation of news content and the loss of certain journalistic diversity. This then leaves us with the question of how subjectivity across media would have differed without it.