INFORM February 2025 Volume 36 (2)


The higher the number of “poisoned” images in the training data, the greater the disruption. Because of how generative AI works, the damage from “poisoned” images also affects related prompt keywords. For example, if a “poisoned” image of a Ferrari is used in training data, prompt results for other car brands and for related terms, such as vehicle and automobile, can also be affected. Nightshade’s developer hopes the tool will make big tech companies more respectful of copyright, but it is also possible that users could abuse the tool and intentionally upload “poisoned” images to generators to try to disrupt their services.

Is there an antidote?

In response, stakeholders have proposed a range of technological and human solutions. The most obvious is paying greater attention to where input data come from and how they can be used, which would mean less indiscriminate data harvesting. This approach challenges a common belief among computer scientists: that data found online can be used for any purpose they see fit.

Other technological fixes include “ensemble modeling,” in which different models are trained on many different subsets of the data and compared to locate specific outliers; a rough sketch of the idea appears below. This approach can be used not only during training but also to detect and discard suspected “poisoned” images.

Audits are another option. One audit approach involves developing a “test battery”: a small, highly curated, and well-labelled dataset built from “hold-out” data that are never used for training. This dataset can then be used to examine the model’s accuracy.
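
As a rough illustration of that ensemble idea, the sketch below trains several classifiers on random subsets of already feature-extracted training data and flags the examples the ensemble disagrees on most. The model choice, subset size, and threshold are illustrative assumptions, not a prescribed recipe.

```python
# Illustrative sketch: flag suspect training examples by ensemble disagreement.
# Assumes image features have already been extracted into a numpy array X of shape
# [n_samples, n_features] with integer class labels y; all parameters are arbitrary.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def flag_suspect_examples(X, y, n_models=10, subset_frac=0.6, threshold=0.5, seed=0):
    rng = np.random.default_rng(seed)
    n = len(X)
    votes = np.zeros((n_models, n), dtype=int)
    for m in range(n_models):
        # Train each model on a different random subset of the data.
        idx = rng.choice(n, size=int(subset_frac * n), replace=False)
        clf = RandomForestClassifier(n_estimators=100, random_state=m)
        clf.fit(X[idx], y[idx])
        votes[m] = clf.predict(X)
    # Disagreement score: fraction of models whose prediction differs from the
    # per-example majority vote. Poisoned or mislabelled images tend to score high.
    majority = np.apply_along_axis(lambda v: np.bincount(v).argmax(), 0, votes)
    disagreement = (votes != majority).mean(axis=0)
    return np.where(disagreement > threshold)[0], disagreement
```

The “test battery” audit described above is even simpler to express: score each candidate model against a small, curated hold-out set that is never touched during training.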

PROVING A MODEL IS TRUSTWORTHY

Machine-learning models can make mistakes and be difficult to use, so scientists have developed explanation methods to help users understand when and how they should trust a model’s predictions. These explanations are often complex, however, perhaps containing information about hundreds of model features. And they are sometimes presented as multifaceted visualizations that can be difficult to fully comprehend for users who lack machine-learning expertise.

To help people make sense of AI explanations, MIT researchers used large language models (LLMs) to transform plot-based explanations into plain language. They developed a two-part system that converts a machine-learning explanation into a paragraph of human-readable text and then automatically evaluates the quality of the narrative, so an end user knows whether to trust it. By prompting the system with a few example explanations, the researchers can customize its narrative descriptions to meet the preferences of users or the requirements of specific applications. In the long run, the researchers hope to build on this technique by enabling users to ask a model follow-up questions about how it came up with predictions in real-world settings.
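
The details of that system are in the researchers’ paper; purely as an illustration, the sketch below shows what a two-part “narrate, then grade” pipeline could look like, assuming a placeholder generate(prompt) function for whatever language model is available and a simple list of feature contributions as the explanation. The prompt wording and rating scale are assumptions, not the authors’ design.

```python
# Illustrative two-part pipeline: narrate an explanation, then grade the narrative.
# `generate(prompt)` is a placeholder for any text-generation call; the few-shot
# example, prompt wording, and 1-5 rating scale are assumptions for illustration.

FEW_SHOT = (
    "Explanation: income (+0.42), debt_ratio (-0.31), age (+0.05)\n"
    "Narrative: The applicant's income raised the approval score the most, "
    "while a high debt ratio pushed it down; age had little effect.\n"
)

def _format(contributions):
    # contributions: list of (feature_name, signed_importance) pairs.
    return ", ".join(f"{name} ({value:+.2f})" for name, value in contributions)

def narrate_explanation(contributions, generate):
    """Part 1: turn a feature-importance explanation into plain language."""
    prompt = f"{FEW_SHOT}\nExplanation: {_format(contributions)}\nNarrative:"
    return generate(prompt).strip()

def grade_narrative(contributions, narrative, generate):
    """Part 2: rate how faithfully and clearly the narrative reflects the explanation."""
    prompt = (
        "Rate from 1 (poor) to 5 (excellent) how accurately and clearly the "
        "narrative describes the explanation.\n"
        f"Explanation: {_format(contributions)}\nNarrative: {narrative}\nRating:"
    )
    return generate(prompt).strip()
```

Swapping in different few-shot examples is how the narrative style would be tailored to a particular audience or application, and the grading step is what lets an end user decide whether the narrative itself can be trusted.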

“Our goal with this research was to take the first step toward allowing users to have full-blown conversations with machine-learning models about the reasons they made certain predictions, so they can make better decisions about whether to listen to the model,” says Alexandra Zytek, an electrical engineering and computer science (EECS) graduate student and lead author of a paper on this technique.

The cause of the problem

Chatbots can wear a lot of proverbial hats: dictionary, therapist, poet, all-knowing friend. The artificial intelligence models that power these systems appear exceptionally skilled and efficient at providing answers, clarifying concepts, and distilling information. But to establish the trustworthiness of content generated by such models, how can we really know whether a particular statement is factual, a hallucination, or just a plain misunderstanding?

In many cases, AI systems gather external information to use as context when answering a particular query. For example, to answer a question about a medical condition, the system might reference recent research papers on the topic. Even with this relevant context, models can make mistakes with what feels like high doses of confidence. To help tackle this obstacle, MIT Computer Science and Artificial Intelligence Laboratory (CSAIL) researchers created ContextCite, a tool that can identify the parts of the external context used to generate any particular statement, improving trust by helping users easily verify that statement.

“AI assistants can be very helpful for synthesizing information, but they still make mistakes,” says Ben Cohen-Wang, an MIT doctoral student in electrical engineering and computer science, CSAIL affiliate, and lead author of a new paper. “Say that I ask an AI assistant how many parameters GPT-4o has. It might start with a Google search, finding an article that says that GPT-4—an older, larger model with a similar name—has 1 trillion parameters. Using this article as its context, it might then mistakenly state that GPT-4o has 1 trillion parameters.” Existing AI assistants often provide source links, but users would have to tediously review the article themselves to spot any mistakes. “ContextCite can help directly find the specific sentence that a model used, making it easier to verify claims and detect mistakes.”

When a user queries a model, ContextCite highlights the specific sources from the external context that the AI relied upon for that answer. If the AI generates an inaccurate fact, users can trace the error back to its original source and understand the model’s reasoning. If the AI hallucinates an answer, ContextCite can indicate that the information didn’t come from any real source at all. You can imagine a tool like this being especially valuable in industries that demand high levels of accuracy, such as health care, law, and education.
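
ContextCite’s own method is more sophisticated than this, but a crude way to get the same kind of signal is leave-one-out ablation: drop each context sentence in turn, regenerate the answer, and see which removals change it most. The sketch below assumes a placeholder answer(question, context_sentences) function and measures change with simple string similarity; it illustrates the idea rather than the tool itself.

```python
# Illustrative leave-one-out attribution (not ContextCite's actual algorithm).
# `answer(question, context_sentences)` is a placeholder for any model call that
# produces a response from a question plus a list of context sentences.
from difflib import SequenceMatcher

def attribute_sources(question, context_sentences, answer):
    baseline = answer(question, context_sentences)
    scores = []
    for i, sentence in enumerate(context_sentences):
        # Remove one sentence and regenerate; a big change means it mattered.
        ablated = context_sentences[:i] + context_sentences[i + 1:]
        changed = answer(question, ablated)
        influence = 1.0 - SequenceMatcher(None, baseline, changed).ratio()
        scores.append((influence, sentence))
    # Highest-influence sentences are the likeliest sources of the statement.
    return sorted(scores, reverse=True)
```

If no removal changes the answer much, the statement is probably not grounded in the provided context at all, which is the hallucination case described above.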

Pruning irrelevant context and detecting poisoning attacks

Beyond tracing sources, ContextCite can also help improve the quality of AI responses by identifying and pruning irrelevant context. Long or complex input contexts, like lengthy