Using Large Language Models to Analyze the Motives Behind the Gifts to the United Nations

MehtA+
9 min read · Jul 13, 2024


By Shiming H, Aditi J, Ileana L, Hannah T — MehtA+ AI/Machine Learning Research Bootcamp students

In a project in partnership with Prof. Elizabeth Macaulay of CUNY, high school students in the MehtA+ AI/Machine Learning Research Bootcamp were provided with a United Nations Gifts Dataset and tasked with using AI to understand why these gifts were given. In part 1 of a seven-part series, students explore ways in which AI can help us better understand archaeological gifts.

If you would like to learn more about MehtA+ AI/Machine Learning Research Bootcamp, check out https://mehtaplustutoring.com/ai-ml-research-bootcamp/.

*******************

Since the inception of the United Nations, many artifacts have been gifted to the organization by countries around the world. However, few gifts have a clear, documented rationale behind them. To address this gap, we employed Large Language Models (LLMs) to investigate the circumstances that led a country to donate a particular gift to the UN at a particular time.

Problem Statement: “What were the motives behind a particular gift given to the UN by a specific country in a given year?”

Initially, we considered two approaches:

  1. Why was a particular gift deemed special enough for presentation?
  2. Why was the gift given in a specific year?

After conducting preliminary data exploration with LLM apps, manually inputting queries based on UN reports and Wikipedia articles, we decided to combine the two approaches and investigate the reasoning behind both a gift's timing and its motive. We therefore shifted our focus to using LLMs to explore how a country's circumstances at the time of its gift to the UN influenced its decision to send that gift.

It was also during this initial phase that our group chose Google Gemini-1.5 Flash as the LLM for the pipeline, because of its ease of integration and our familiarity with Google Gemini at the time. Gemini-1.5 Flash offers a good balance of speed, cost, and response quality, which made it a practical choice for our project.

Hallucination and Vagueness

Our first step in addressing our problem statement was to use a variety of Large Language Models (through their online apps) to identify why a gift was given to the United Nations. However, we soon realized that the answers provided by all the LLMs were too general and vague, even when we supplied specific information about the gift from the UN website. Additionally, the LLMs sometimes gave incorrect answers or invented information, an issue known as hallucination. These hallucinations make LLMs unreliable for questions that require logical reasoning but lack detailed supporting data.

This issue has two major solutions.

  1. Fine-tuning the LLM: This involves training the model on data from a specific field to improve its accuracy, and it can be a rigorous process. Given the extensive pre-training LLMs undergo on vast datasets, additional fine-tuning might not drastically alter performance without substantial, high-quality data, which often isn't readily accessible.
  2. RAG (Retrieval-Augmented Generation): This improves the efficacy of an LLM by incorporating external data sources. Instead of relying solely on pre-trained knowledge, the model retrieves relevant information from predefined datasets, improving the accuracy and relevance of its responses.

Our group chose RAG as our solution, since we could not obtain enough data to fine-tune the LLM. In our case, RAG would provide the LLM with external information, from sources such as UN reports and Wikipedia articles, about the circumstances in which the gift was given, enabling it to give more specific answers to our problem statement. At first, we manually chose which information to provide the LLM after judging that it added value to a potential answer; however, this would not be achievable in an automated pipeline, so we began developing a general search method.
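
To make the idea concrete, here is a minimal sketch of this kind of retrieval-augmented call, using the google-generativeai Python SDK. The passages and question are placeholders rather than our actual pipeline code, and the API key handling is an assumption.

```python
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")  # assumption: key supplied by the user
model = genai.GenerativeModel("gemini-1.5-flash")

def answer_with_context(question: str, passages: list[str]) -> str:
    """Minimal RAG step: prepend retrieved passages to the question."""
    context = "\n\n".join(passages)
    prompt = (
        "Using ONLY the context below, answer the question.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )
    return model.generate_content(prompt).text

# Hypothetical usage with hand-picked snippets about a gift
print(answer_with_context(
    "Why might Greece have donated the Poseidon of Artemision to the UN?",
    ["Snippet from a UN report...", "Snippet from a Wikipedia article..."],
))
```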

Web Scraping Prompt Generation

We first looked at possible methods of using static search prompts (e.g., "Country X's diplomatic relationship with the UN in year Y"). However, this approach did not cover all potential motives involved in gifting to the United Nations. To resolve this, our group used RAG, providing the LLM with the gift description from the UN website so it could create targeted, relevant search prompts.

Before writing the queries that generate search prompts, our group first looked into ways of increasing the accuracy and credibility of the information those prompts would collect. We initially chose Google Scholar as the main information source, but after attempting to scrape Google Scholar results, the limitations of that approach became apparent and we abandoned it. Instead, we shifted to ordinary Google searches while prioritizing certain credible websites. The web scraper was later enhanced to run multiple Google searches using prompts generated by Google Gemini.
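
For reference, a single Google search through SerpApi looks roughly like the sketch below (using the google-search-results package). The domain list is an illustrative stand-in for the credible websites we prioritized, not our exact list.

```python
from serpapi import GoogleSearch

PRIORITY_DOMAINS = ("un.org", "wikipedia.org", "britannica.com", ".gov")  # illustrative

def search_links(prompt: str, api_key: str, n: int = 10) -> list[str]:
    """Run one Google search via SerpApi and return the result URLs."""
    results = GoogleSearch({"q": prompt, "num": n, "api_key": api_key}).get_dict()
    return [r["link"] for r in results.get("organic_results", [])]

links = search_links("Greece diplomatic relations with the UN", api_key="YOUR_KEY")
credible = [url for url in links if any(d in url for d in PRIORITY_DOMAINS)]
```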

At first, our RAG used three queries to make Google Gemini generate Google search prompts. This combination of queries later proved problematic; in particular, the query asking for search prompts related to promoting "peace and prosperity" in the donor country led to misleading prompts.

To address this, we reduced the number of queries to two and modified them so that Gemini itself generates the likely donation reasons underlying the search prompts, making the prompts more fitting for each situation.

Another problem with the older versions of our queries was that Google Gemini produced large numbers of Google search prompts (10+). The extra prompts had no noticeable impact on performance; they simply consumed more SerpApi Google searches and made the pipeline take longer to run. We solved this in the final version of the queries by limiting Google Gemini to five search prompts, as in the sketch below.
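
A sketch of what this query-generation step could look like follows; the exact wording of our two queries differed, but the hard cap of five prompts is the key constraint. The model argument is a genai.GenerativeModel as in the earlier sketch.

```python
def generate_search_prompts(model, gift_description: str, country: str, year: str) -> list[str]:
    """Ask Gemini for at most five Google search prompts about the gift's context."""
    query = (
        f"{country} gave the United Nations this gift in {year}: {gift_description}\n"
        "Suggest likely reasons for the donation, then write AT MOST FIVE Google "
        "search prompts to investigate them. Output one prompt per line, with no "
        "numbering or extra text."
    )
    lines = model.generate_content(query).text.strip().splitlines()
    return [line.strip() for line in lines if line.strip()][:5]  # enforce the cap
```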

Final Response RAG

To answer our problem statement, our group once again used RAG with Google Gemini. The queries in this section had two parts: the first asked Google Gemini to answer the final question using the information scraped from the collected websites, and the second summarized those answers into a final response.

As we could not use all of the collected paragraphs to address the first part of the queries (doing so would make the pipeline take an impractically long time to run), we needed to decide what to compare the paragraphs against to determine their relevance. Our first approach was to compare the paragraphs to the final question. Unfortunately, the final responses from this method were quite vague. To improve this, we tested different numbers of relevant paragraphs provided to Google Gemini, but this gave minimal improvement. At this point, we made two changes: we compared the paragraphs to the Google search prompts from earlier in the pipeline instead of to the final question, and we changed the final question from a near-restatement of our problem statement to a query asking Google Gemini to hypothesize possible answers. These changes substantially improved our responses, as they allowed Google Gemini to draw on events around the time period that have no explicit connection with the donation but are likely important contributing factors.
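
In outline, this two-part querying behaves like a map-and-summarize loop, as in the simplified sketch below. Here retrieve stands for a paragraph lookup keyed on a search prompt (a ChromaDB version is sketched in the next section), and the prompt wording is hypothetical.

```python
def final_response(model, search_prompts: list[str], retrieve, gift_info: str) -> str:
    """Map: hypothesize motives from paragraphs relevant to each search prompt.
    Reduce: summarize the partial answers into one final response."""
    partial_answers = []
    for prompt in search_prompts:
        context = "\n\n".join(retrieve(prompt))  # paragraphs most similar to this prompt
        partial_answers.append(model.generate_content(
            f"Context:\n{context}\n\nGift: {gift_info}\n"
            "Based on the context, hypothesize possible motives behind this donation."
        ).text)
    return model.generate_content(
        "Summarize these hypotheses into one coherent answer:\n\n"
        + "\n\n".join(partial_answers)
    ).text
```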

Flowchart

After much consideration, we arrived at the final version of the pipeline. The process begins with inputting the link to the relevant gift page on the UN website. A Beautiful Soup scraper then retrieves the gift's description, donation date, and donor country. Using RAG, that information, together with our finalized queries (optimized to procure the most relevant resources), is used to have Google Gemini generate search prompts for finding relevant websites on Google. SerpApi and Beautiful Soup then collect and scrape those websites, and the scraped information is sorted into two collections: Priority and Non-Priority.
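
The first scraping step might look like the following. The CSS selectors here are hypothetical placeholders; the actual class names have to be read off the UN gift pages' HTML.

```python
import requests
from bs4 import BeautifulSoup

def scrape_gift_page(url: str) -> dict[str, str]:
    """Pull description, donation date, and donor country from a UN gift page."""
    soup = BeautifulSoup(requests.get(url, timeout=30).text, "html.parser")
    # NOTE: ".gift-description" etc. are placeholder selectors, not the real ones
    return {
        "description": soup.select_one(".gift-description").get_text(strip=True),
        "date": soup.select_one(".donation-date").get_text(strip=True),
        "country": soup.select_one(".donor-country").get_text(strip=True),
    }
```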

The Priority collection contains all resources we deemed reliable (PDFs, Wikipedia and Britannica articles, and UN and government websites), whereas the Non-Priority collection includes all other resources. The information most relevant to the Google search prompts is then retrieved from the Priority collection using ChromaDB and given to Google Gemini via RAG to answer our problem statement: "What were the motives behind a particular gift given to the UN by a specific country in a given year?" We ran the pipeline on the Poseidon of Artemision gifted by Greece and the Tajik Scalloped Crown gifted by Tajikistan; those responses are discussed in the next section.
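
The split into collections and the relevance lookup could be implemented with ChromaDB roughly as sketched below. The domain list, paragraph IDs, and top-k value are illustrative assumptions.

```python
import chromadb

client = chromadb.Client()
priority = client.create_collection("priority")
non_priority = client.create_collection("non_priority")

PRIORITY_DOMAINS = ("un.org", "wikipedia.org", "britannica.com", ".gov", ".pdf")

def store(paragraphs: list[str], source_url: str) -> None:
    """File each scraped paragraph under Priority or Non-Priority."""
    target = priority if any(d in source_url for d in PRIORITY_DOMAINS) else non_priority
    target.add(
        documents=paragraphs,
        ids=[f"{source_url}#{i}" for i in range(len(paragraphs))],
    )

def retrieve(search_prompt: str, k: int = 5) -> list[str]:
    """Return the k Priority paragraphs most similar to a search prompt."""
    return priority.query(query_texts=[search_prompt], n_results=k)["documents"][0]
```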

Accuracy Metric

We did not have a concrete basis for devising an accuracy metric, so we used a simple mean: the responses for several different gifts were each rated on a five-point scale, based on our understanding of the topic and supplemented by relevant Google research. After averaging our scores, we determined the accuracy of our model to be 76.5%.

In the example of the Poseidon of Artemision statue, the pipeline provided several strong conjectures about possible reasons behind the donation, including the Greek Civil War, Greece's relationship with the United States, and Cold War tensions, all of which the LLM supported with well-written explanations. The response highlights Greece's connection with the United States, implying that the donation was a gesture of gratitude, especially considering the U.S.'s strong influence over the United Nations at the time. This level of detail, which a person could not easily replicate, is why we deem it a great response to our problem statement. The Tajik Scalloped Crown example, by contrast, did not yield such specific conjectures; most of the response was general information. Even so, it was grounded in the actual circumstances of Tajikistan in the early 2020s, making it an average response. Together, these two examples show that the pipeline produces a range of great and average responses, which accounts for the 76.5% accuracy score.
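
Concretely, the score is just the mean of the five-point ratings rescaled to a percentage. With hypothetical ratings (not our actual per-gift scores):

```python
ratings = [4.5, 3.5, 4.0, 3.0, 4.125]  # hypothetical per-gift ratings out of 5
accuracy = sum(ratings) / len(ratings) / 5 * 100
print(f"{accuracy:.1f}%")  # a mean rating of 3.825/5 gives 76.5%
```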

Conclusion

This project aims to provide insights for researchers studying diplomatic relations among UN member countries. By examining the circumstances of the donor country during the time the gift was donated, we can inform and inspire a deeper understanding of the dynamics and motivations that shape international diplomacy.

Future developments of this project would include:

  • Developing a method to remove non-sentence text from the output.
  • Compiling information from both priority and non-priority collections.
  • Filtering out inaccurate information.
  • Providing the Google Gemini LLM with relevant data to get more precise results.
  • Implementing a machine-learning model to further identify patterns in gift-giving across countries and historical periods, which would complement the results of this project.

All of these improvements would help increase the quality of the LLM's responses, providing a better understanding of the motivations behind the donation of a gift.

Finally, the method shown in this project's pipeline could also benefit other areas of humanities research, where this use of RAG with an LLM can generate quality responses that serve as a starting point for future research.

Link to the Pipeline

https://github.com/MehtaPlusTutoring/studentprojects/blob/main/aimlresearchbootcamp/2024/midterm/UN_Gift_Motive_Analysis_Pipeline.ipynb

Written by MehtA+

MehtA+ is founded and composed of a team of MIT, Stanford and Ivy League alumni. We provide technical bootcamps and college consulting services.
