Language Model Query Language and Retrieval Augmented Generation¶
By Chris Swart
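async def wikipedia(q):
    """Return the first 500 characters of the intro of the Wikipedia page titled q."""
    from lmql.http import fetch
    try:
        # TERM arrives with stray whitespace and the closing quote; strip them.
        q = q.strip("\n '.").strip()
        pages = await fetch(f"https://en.wikipedia.org/w/api.php?format=json&action=query&prop=extracts&exintro&explaintext&redirects=1&titles={q}&origin=*", "query.pages")
        return list(pages.values())[0]["extract"][:500]
    except Exception:
        # Missing pages, network errors and malformed responses all end up here.
        return "No results"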
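To make the `"query.pages"` path and the `[:500]` truncation concrete, here is a minimal offline sketch of what `wikipedia` does with the API response. The helper names `build_url` and `first_extract` and the two sample payloads are illustrative assumptions, not part of LMQL or real API responses. The sketch also URL-encodes the title, which the function above does not do.

```python
from urllib.parse import quote

API = "https://en.wikipedia.org/w/api.php"


def build_url(title: str) -> str:
    """Build the same extract query as wikipedia(), with the title URL-encoded."""
    params = "format=json&action=query&prop=extracts&exintro&explaintext&redirects=1"
    return f"{API}?{params}&titles={quote(title)}&origin=*"


def first_extract(payload: dict, limit: int = 500) -> str:
    """Follow the "query.pages" path and return the first page's intro, truncated."""
    try:
        pages = payload["query"]["pages"]
        return list(pages.values())[0]["extract"][:limit]
    except (KeyError, IndexError, TypeError):
        # A page that does not exist carries no "extract" key and lands here.
        return "No results"


# Hand-written payloads in the general shape of the API's JSON (assumed, not recorded).
found = {"query": {"pages": {"123": {"title": "Norse", "extract": "Norse is a demonym."}}}}
missing = {"query": {"pages": {"-1": {"title": "Nope", "missing": ""}}}}

print(build_url("Ancient Greece"))
print(first_extract(found))
print(first_extract(missing))
```

A lookup only succeeds when `titles=` names an existing page or a redirect to one, which matters for the later experiments.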
import lmql

@lmql.query(model="gpt-4")
async def norse_origins():
    '''lmql
    "Q: From which countries did the Norse originate?\n"
    "Action: Let us search Wikipedia for the term '[TERM]\n" where STOPS_AT(TERM, "'")
    wiki_result = await wikipedia(TERM)
    "Result: {wiki_result}\n"
    "Final Answer:[ANSWER]"
    '''

result = await norse_origins()
pretty_html(result)
Action: Let us search Wikipedia for the term ' Norse'
Result: Norse is a demonym for Norsemen, a Medieval North Germanic ethnolinguistic group ancestral to modern Scandinavians, defined as speakers of Old Norse from about the 9th to the 13th centuries.
Norse may also refer to:
Final Answer: The Norse originated from the North Germanic region, which is modern Scandinavia.
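The trace shows the captured term as ` Norse'`, with a leading space and the closing quote: `STOPS_AT` ends generation once the stop phrase appears, and the phrase stays in the variable. The plain-Python sketch below imitates that behaviour to show why `wikipedia` strips its argument before building the URL. `simulate_stops_at` is a hypothetical stand-in written for this illustration, not an LMQL function.

```python
def simulate_stops_at(generated: str, stop: str) -> str:
    """Cut text at the first occurrence of stop, keeping the stop phrase itself."""
    idx = generated.find(stop)
    return generated if idx == -1 else generated[: idx + len(stop)]


def clean_term(term: str) -> str:
    """Same normalisation as the first line of wikipedia()."""
    return term.strip("\n '.").strip()


# What the model might emit after "...for the term '" (made-up continuation).
raw = simulate_stops_at(" Norse'\nResult: ...", "'")
print(repr(raw))
print(repr(clean_term(raw)))
```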
@lmql.query()
async def rhetoric():
    '''lmql
    "Q: How was rhetoric taught in Ancient Greece?\n"
    "Action: Let us search Wikipedia for the term '[TERM]\n" where STOPS_AT(TERM, "'")
    wiki_result = await wikipedia(TERM)
    "Result: {wiki_result}\n"
    "Final Answer:[ANSWER]"
    '''

result = await rhetoric()
pretty_html(result)
Action: Let us search Wikipedia for the term 'rhetoric'
Result: Rhetoric () is the art of persuasion. It is one of the three ancient arts of discourse (trivium) along with grammar and logic/dialectic. As an academic discipline within the humanities, rhetoric aims to study the techniques that speakers or writers use to inform, persuade, and motivate their audiences. Rhetoric also provides heuristics for understanding, discovering, and developing arguments for particular situations.
Aristotle defined rhetoric as "the faculty of observing in any given case the
Final Answer: According to Wikipedia, rhetoric was taught as one of the three ancient arts of discourse (along with grammar and logic/dialectic) and aimed to study the techniques of persuasion. It was also seen as a way to understand and develop arguments for specific situations. Aristotle defined rhetoric as "the faculty of observing in any given case the available means of persuasion."
@lmql.query(temperature=1.3)
async def rhetoric_v2():
    '''lmql
    "Q: How was rhetoric taught in Ancient Greece?\n"
    for i in range(4):
        "Action: Let us search Wikipedia for the wikipedia keyword '[TERM]\n" where STOPS_AT(TERM, "'")
        print(TERM)
        wiki_result = await wikipedia(TERM)
        "Result for '{TERM}: {wiki_result}\n"
    "If some of the results are relevant try to answer the question otherwise say NOT ENOUGH INFORMATION and explain why.\n"
    "Final Answer:[ANSWER]"
    '''

result = await rhetoric_v2()
rhetoric in Ancient Greece'
rhetoric education in Ancient Greece'
Ancient Greek education'
Ancient Greek schooling'
pretty_html(result)
Action: Let us search Wikipedia for the wikipedia keyword 'rhetoric in Ancient Greece'
Result for 'rhetoric in Ancient Greece': No results
Action: Let us search Wikipedia for the wikipedia keyword 'rhetoric education in Ancient Greece'
Result for 'rhetoric education in Ancient Greece': No results
Action: Let us search Wikipedia for the wikipedia keyword 'Ancient Greek education'
Result for 'Ancient Greek education': No results
Action: Let us search Wikipedia for the wikipedia keyword 'Ancient Greek schooling'
Result for 'Ancient Greek schooling': No results
If some of the results are relevant try to answer the question otherwise say NOT ENOUGH INFORMATION and explain why.
Final Answer: NOT ENOUGH INFORMATION. Although rhetoric and education were important components of Ancient Greece's culture and society, it seems like there is limited information about specific ways in which rhetoric was taught in Ancient Greece. More research on primary sources and other academic articles may yield more information on the subject.
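All four lookups came back empty, most likely because `titles=` expects a page title (or a redirect) rather than a free-text phrase. One possible remedy, sketched here under assumptions, is to resolve the phrase to a title first through the MediaWiki search module (`list=search` with `srsearch`). The helper names `search_url` and `best_title` and the sample payload are hypothetical; the payload is hand-written and is not a recorded response.

```python
from urllib.parse import urlencode

API = "https://en.wikipedia.org/w/api.php"


def search_url(phrase: str, limit: int = 1) -> str:
    """Build a full-text search request for a free-text phrase."""
    params = {
        "action": "query",
        "list": "search",
        "srsearch": phrase,
        "srlimit": limit,
        "format": "json",
        "origin": "*",
    }
    return f"{API}?{urlencode(params)}"


def best_title(payload: dict):
    """Return the title of the top search hit, or None when there are no hits."""
    hits = payload.get("query", {}).get("search", [])
    return hits[0]["title"] if hits else None


# Hand-written payload in the general shape of a search response (assumed).
sample = {"query": {"search": [{"title": "Example page title", "snippet": "..."}]}}

print(search_url("Ancient Greek schooling"))
print(best_title(sample))
print(best_title({"query": {"search": []}}))
```

The resolved title could then be passed to `wikipedia` in place of the raw keyword.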
@lmql.query
def items_list(n: int):
    '''lmql
    for i in range(n):
        "- '[ITEM]" where STOPS_AT(ITEM, "'")
    '''
@lmql.query
async def rhetoric_v3():
    '''lmql
    "Q: How was rhetoric taught in Ancient Greece?\n"
    "Relevant wikipedia search keywords:\n[TERMS: items_list(3)]\n"
    keywords = [e.split("'")[1] for e in TERMS.split("-")[1:]]
    for kw in keywords:
        wiki_result = await wikipedia(kw)
        "Result for '{kw}': {wiki_result}\n"
    "Final Answer:[ANSWER]"
    '''

result = await rhetoric_v3()
pretty_html(result)
Relevant wikipedia search keywords:
- 'rhetoric'- 'Ancient Greece'- 'education'
Result for 'rhetoric': Rhetoric () is the art of persuasion. It is one of the three ancient arts of discourse (trivium) along with grammar and logic/dialectic. As an academic discipline within the humanities, rhetoric aims to study the techniques that speakers or writers use to inform, persuade, and motivate their audiences. Rhetoric also provides heuristics for understanding, discovering, and developing arguments for particular situations.
Aristotle defined rhetoric as "the faculty of observing in any given case the
Result for 'Ancient Greece': Ancient Greece (Ancient Greek: Ἑλλάς, romanized: Hellás) was a northeastern Mediterranean civilization, existing from the Greek Dark Ages of the 12th–9th centuries BC to the end of classical antiquity (c. 600 AD), that comprised a loose collection of culturally and linguistically related city-states and other territories. Prior to the Roman period, most of these regions were officially unified once under the Kingdom of Macedon from 338 to 323 BC. In Western history, the era of classical antiquit
Result for 'education': Education is the transmission of knowledge, skills, and character traits and manifests in various forms. Formal education occurs within a structured institutional framework, such as public schools, following a curriculum. Non-formal education also follows a structured approach but occurs outside the formal schooling system, while informal education entails unstructured learning through daily experiences. Formal and non-formal education are categorized into levels, including early childhood educa
Final Answer: Rhetoric was taught in Ancient Greece as one of the three ancient arts of discourse, along with grammar and logic/dialectic. It was considered an important academic discipline within the humanities and aimed to study the techniques of persuasion and argumentation. The philosopher Aristotle defined rhetoric as the "faculty of observing in any given case the available means of persuasion." Education in Ancient Greece included both formal and non-formal methods, with a focus on developing knowledge, skills, and character traits.
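The keyword extraction in `rhetoric_v3` splits `TERMS` on `-`, so a hyphenated keyword would be cut in two. A regular expression over the quoted spans is one more tolerant alternative. This is a sketch: `parse_keywords` is a hypothetical helper name, and `split_on_dash` reproduces the expression from the query only for comparison.

```python
import re


def split_on_dash(terms: str) -> list[str]:
    """The extraction used in rhetoric_v3, reproduced for comparison."""
    return [e.split("'")[1] for e in terms.split("-")[1:]]


def parse_keywords(terms: str) -> list[str]:
    """Return every single-quoted span, regardless of hyphens or line breaks."""
    return re.findall(r"'([^']*)'", terms)


terms = "- 'rhetoric'- 'Ancient Greece'- 'education'"
hyphenated = "- 'Greco-Roman rhetoric'"

print(split_on_dash(terms))
print(parse_keywords(terms))
print(split_on_dash(hyphenated))
print(parse_keywords(hyphenated))
```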
print(get_stats())
OpenAI API Stats: 9 requests, 0 errors, 1489 tokens, 1.0 average batch size, reserved capacity 0/32000
Evaluating RAG¶
Context Recall¶
Prompt used
Given a context, and an answer, analyze each sentence in the answer and classify if the sentence can be attributed to the given context or not. Use only 'Yes' (1) or 'No' (0) as a binary classification. Output json with reason.
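The prompt yields one binary verdict per sentence, and the score is then the share of sentences marked attributable. A minimal sketch of that aggregation follows. The JSON field names (`statement`, `reason`, `attributed`) are assumptions made for illustration; the exact schema Ragas parses is not shown in this notebook.

```python
import json


def context_recall(verdicts_json: str) -> float:
    """Fraction of sentences whose verdict is 1 (attributable to the context)."""
    verdicts = json.loads(verdicts_json)
    if not verdicts:
        return 0.0
    return sum(v["attributed"] for v in verdicts) / len(verdicts)


# Made-up classifier output for a three-sentence answer.
sample = json.dumps([
    {"statement": "Sentence one.", "reason": "Stated in the context.", "attributed": 1},
    {"statement": "Sentence two.", "reason": "Not mentioned anywhere.", "attributed": 0},
    {"statement": "Sentence three.", "reason": "Paraphrases the context.", "attributed": 1},
])

print(round(context_recall(sample), 4))
```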
Factual Correctness¶
Where:
TP (True Positives): Statements in both Ground Truth and Generated Answer
FP (False Positives): Statements in Generated Answer but not in Ground Truth
FN (False Negatives): Statements in Ground Truth but not in Generated Answer
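These counts feed the usual precision, recall and F1 formulas: precision = TP / (TP + FP), recall = TP / (TP + FN), and F1 = 2 * precision * recall / (precision + recall). The sketch below computes all three from two sets of statements. Treating statements as exactly matching strings is a simplifying assumption; Ragas relies on an LLM to break answers into claims and compare them, and `factual_correctness` is a hypothetical function name.

```python
def factual_correctness(reference: set, generated: set) -> dict:
    """Compute precision, recall and F1 from statement overlap."""
    tp = len(reference & generated)   # statements in both
    fp = len(generated - reference)   # statements only in the generated answer
    fn = len(reference - generated)   # statements only in the ground truth
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"tp": tp, "fp": fp, "fn": fn, "precision": precision, "recall": recall, "f1": f1}


# Placeholder statements; real ones would be sentences or claims.
scores = factual_correctness(reference={"a", "b", "c"}, generated={"a", "b", "d", "e"})
print(scores)
```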
from IPython.display import HTML
HTML(html_content)
Factual Correctness
Faithfulness
from datasets import load_dataset
dataset = load_dataset("explodinggradients/amnesty_qa","english_v3")
Repo card metadata block was not found. Setting CardData to empty.
Generating eval split: 20 examples [00:00, 1412.13 examples/s]
@lmql.query(model="gpt-4")
async def infer_response(user_input: str):
    '''lmql
    "Q: {user_input}\n"
    for i in range(4):
        "Action: Let's search Wikipedia for the wikipedia keyword '[TERM]\n" where STOPS_AT(TERM, "'")
        result = await wikipedia(TERM)
        "Result: {result}\n"
    "Final Answer:[ANSWER]"
    '''

inferred_result = await infer_response(user_input='When did the government of Qatar start repealing restrictions on migrant workers?')
from typing import List, Dict, Any

from ragas import EvaluationDataset

async def modify_dataset(dataset: EvaluationDataset) -> List[Dict[str, Any]]:
    """Run infer_response on every sample and collect the fields Ragas evaluates."""
    modified_samples = []
    for sample in dataset:
        inferred_result = await infer_response(sample.user_input)
        print(inferred_result)
        modified_sample = {
            "user_input": sample.user_input,
            # extract_results: helper that is not defined in the cells shown
            "retrieved_contexts": extract_results(inferred_result.prompt),
            "reference_contexts": sample.reference_contexts,
            "response": inferred_result.variables["ANSWER"],
            "multi_responses": sample.multi_responses,
            "reference": sample.reference,
            "rubric": sample.rubric
        }
        modified_samples.append(modified_sample)
    return modified_samples
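`extract_results` is not defined in the cells shown. Since `infer_response` writes each lookup into the prompt as a `Result: ...` segment, a plausible reconstruction is to collect those segments from `inferred_result.prompt`. The version below is a guess at the helper, not the author's code; it assumes every result is followed by either the next `Action: ` line or the `Final Answer:` line.

```python
import re

# A result runs from "Result: " to the line break before the next step,
# so multi-line Wikipedia extracts are captured whole.
_RESULT = re.compile(
    r"^Result: (.*?)\n(?=Action: |Final Answer:)",
    re.DOTALL | re.MULTILINE,
)


def extract_results(prompt: str) -> list[str]:
    """Return the text of every 'Result: ...' segment in a finished prompt."""
    return [segment.strip() for segment in _RESULT.findall(prompt)]


# Made-up prompt in the format produced by infer_response.
demo_prompt = (
    "Q: some question?\n"
    "Action: Let's search Wikipedia for the wikipedia keyword 'a'\n"
    "Result: first line\nsecond line\n"
    "Action: Let's search Wikipedia for the wikipedia keyword 'b'\n"
    "Result: No results\n"
    "Final Answer: an answer"
)

print(extract_results(demo_prompt))
```

The list returned by `modify_dataset` could then presumably be wrapped with `EvaluationDataset.from_list(...)` to obtain the `new_eval_dataset` used later, though that cell is not shown either.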
from langchain_openai import ChatOpenAI
from ragas import evaluate
from ragas.llms import LangchainLLMWrapper
from ragas.metrics import LLMContextRecall, FactualCorrectness, Faithfulness

evaluator_llm = LangchainLLMWrapper(ChatOpenAI(model="gpt-4"))
metrics = [LLMContextRecall(), FactualCorrectness(), Faithfulness()]
results = evaluate(dataset=new_eval_dataset, metrics=metrics, llm=evaluator_llm)
Evaluating: 0%| | 0/60 [00:00<?, ?it/s]
Task was destroyed but it is pending! task: <Task cancelling name='check_done_task' coro=<Event.wait() running at /opt/homebrew/Cellar/python@3.11/3.11.10/Frameworks/Python.framework/Versions/3.11/lib/python3.11/asyncio/locks.py:213> wait_for=<Future cancelled>>
Exception raised in Job[N]: TimeoutError() (26 of the 60 jobs: N = 0, 1, 3, 4, 10, 13, 16, 17, 19, 20, 22, 23, 25, 31, 32, 33, 34, 35, 36, 38, 45, 46, 49, 53, 55, 56)
Exception raised in Job[2]: RateLimitError(Error code: 429 - {'error': {'message': 'Rate limit reached for gpt-4 in organization org-alTt4mnHFE6PcR3DO6rZGbTd on tokens per min (TPM): Limit 10000, Used 9667, Requested 1303. Please try again in 5.82s. Visit https://platform.openai.com/account/rate-limits to learn more.', 'type': 'tokens', 'param': None, 'code': 'rate_limit_exceeded'}})
Evaluating: 100%|█████████████████████████████████| 60/60 [08:50<00:00, 8.84s/it]
results
{'context_recall': 0.0667, 'factual_correctness': 0.1900, 'faithfulness': 0.0909}
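The log for this run shows 26 of the 60 evaluation jobs ending in `TimeoutError` and one in a 429 rate-limit error, so these scores should be read with that in mind. Transient failures of this kind are commonly handled by retrying with exponential backoff. The helper below is a generic asyncio sketch of that idea, not something Ragas or LMQL provides; `with_retries` and `flaky` are hypothetical names, and Ragas has its own run settings for timeouts and retries that are worth checking in its documentation.

```python
import asyncio
import random


async def with_retries(make_call, attempts=4, base_delay=0.5, retry_on=(TimeoutError,)):
    """Await make_call(), retrying with exponential backoff plus jitter."""
    for attempt in range(attempts):
        try:
            return await make_call()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            delay = base_delay * 2 ** attempt + random.uniform(0, base_delay)
            await asyncio.sleep(delay)


# Stand-in for an LLM call that times out twice before succeeding.
calls = {"n": 0}


async def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise TimeoutError()
    return "ok"


print(asyncio.run(with_retries(flaky, base_delay=0.01)))
print(calls["n"])
```

Inside a notebook, where an event loop is already running, the call would be awaited directly instead of going through `asyncio.run`.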
First example¶
i = 0
print(f"""
QUESTION:
{new_eval_dataset[i].user_input}
EXPECTED ANSWER:
{new_eval_dataset[i].reference}
""")
QUESTION:
What are the global implications of the USA Supreme Court ruling on abortion?
EXPECTED ANSWER:
The global implications of the USA Supreme Court ruling on abortion are significant. The ruling has led to limited or no access to abortion for one in three women and girls of reproductive age in states where abortion access is restricted. These states also have weaker maternal health support, higher maternal death rates, and higher child poverty rates. Additionally, the ruling has had an impact beyond national borders due to the USA's geopolitical and cultural influence globally. Organizations and activists worldwide are concerned that the ruling may inspire anti-abortion legislative and policy attacks in other countries. The ruling has also hindered progressive law reform and the implementation of abortion guidelines in certain African countries. Furthermore, the ruling has created a chilling effect in international policy spaces, empowering anti-abortion actors to undermine human rights protections.
i = 0
print(f"""
GENERATED ANSWER:
{new_eval_dataset[i].response}
RETRIEVED CONTEXTS
{[e for e in new_eval_dataset[i].retrieved_contexts if e != 'No results']}""")
GENERATED ANSWER:
The information available does not provide specific details on the global implications of the USA Supreme Court ruling on abortion. However, the Roe v. Wade decision in 1973 by the U.S. Supreme Court generally protected a right to have an abortion, sparking an ongoing debate in the United States about the legality and morality of abortion. This ruling may have influenced discussions and legislation on abortion in other countries, but specific global implications are not detailed in the available information.
RETRIEVED CONTEXTS
['Roe v. Wade, 410 U.S. 113 (1973), was a landmark decision of the U.S. Supreme Court in which the Court ruled that the Constitution of the United States generally protected a right to have an abortion. The decision struck down many abortion laws, and caused an ongoing abortion debate in the United States about whether, or to what extent, abortion should be legal, who should decide the legality of abortion, and what the role of moral and religious views in the political sphere should be. The decis']
Second example¶
i = 6
print(f"""
QUESTION:
{new_eval_dataset[i].user_input}
EXPECTED ANSWER:
{new_eval_dataset[i].reference}
""")
QUESTION:
Which right guarantees access to comprehensive information about past human rights violations, including the identities of the perpetrators and the fate of the victims, as well as the circumstances surrounding the violations?
EXPECTED ANSWER:
The right that guarantees access to comprehensive information about past human rights violations, including the identities of the perpetrators and the fate of the victims, as well as the circumstances surrounding the violations, is the right to know the truth.
i = 6
print(f"""
GENERATED ANSWER:
{new_eval_dataset[i].response}
RETRIEVED CONTEXTS
{[e for e in new_eval_dataset[i].retrieved_contexts if e != 'No results']}""")
GENERATED ANSWER:
The right that guarantees access to comprehensive information about past human rights violations, including the identities of the perpetrators and the fate of the victims, as well as the circumstances surrounding the violations, is known as the "Right to Truth". This right is especially relevant to transitional justice in dealing with past abuses of human rights.
RETRIEVED CONTEXTS
["Right to truth is the right, in the case of grave violations of human rights, for the victims and their families or societies to have access to the truth of what happened. The right to truth is closely related to, but distinct from, the state obligation to investigate and prosecute serious state violations of human rights. Right to truth is a form of victims' rights; it is especially relevant to transitional justice in dealing with past abuses of human rights. In 2006, Yasmin Naqvi concluded tha"]
Third example¶
i = 11
print(f"""
QUESTION:
{new_eval_dataset[i].user_input}
EXPECTED ANSWER:
{new_eval_dataset[i].reference}""")
QUESTION:
What conditions designate wetlands as Ramsar sites?
EXPECTED ANSWER:
The conditions that designate wetlands as Ramsar sites are when they fulfill the criteria for identifying wetlands of international importance, as established under the Convention on Wetlands.
i = 11
print(f"""GENERATED ANSWER:
{new_eval_dataset[i].response}
RETRIEVED CONTEXTS
{[e for e in new_eval_dataset[i].retrieved_contexts if e != 'No results']}""")
GENERATED ANSWER:
A Ramsar site is a wetland site designated to be of international importance under the Ramsar Convention. This convention is an international environmental treaty signed in 1971 in Ramsar, Iran, under the auspices of UNESCO. It provides for national action and international cooperation regarding the conservation of wetlands, and wise sustainable use of their resources. The specific criteria for designation as a Ramsar site are not specified in the search results.
RETRIEVED CONTEXTS
['A Ramsar site is a wetland site designated to be of international importance under the Ramsar Convention, also known as "The Convention on Wetlands", an international environmental treaty signed on 2 February 1971 in Ramsar, Iran, under the auspices of UNESCO. It came into force on 21 December 1975, when it was ratified by a sufficient number of nations. It provides for national action and international cooperation regarding the conservation of wetlands, and wise sustainable use of their resour', 'The Ramsar Convention on Wetlands of International Importance Especially as Waterfowl Habitat is an international treaty for the conservation and sustainable use of Ramsar sites (wetlands). It is also known as the Convention on Wetlands. It is named after the city of Ramsar in Iran, where the convention was signed in 1971.']