A researcher has just finished writing a scientific paper. She knows her work could benefit from another perspective. Did she overlook something? Or perhaps there's an application of her research she hadn't thought of. A second set of eyes would be great, but even the friendliest of collaborators might not be able to spare the time to read all the required background publications to catch up.
Rapid advances in AI and ML have given rise to programs that can generate creative text and useful software code. These general-purpose chatbots have recently captured the public imagination. However, existing chatbots, which are based on large, diverse language models, lack detailed knowledge of scientific sub-domains.
By leveraging a document-retrieval method, the chatbot built by Kevin Yager of the Center for Functional Nanomaterials (CFN) is knowledgeable in areas of nanomaterial science that other bots are not. The details of this project and how other scientists can leverage this AI colleague for their own work have recently been published in Digital Discovery.
Rise of the robots
"CFN has been looking into new ways to leverage AI/ML to accelerate nanomaterial discovery for a long time. Currently, it's helping us quickly identify, catalog, and choose samples, automate experiments, control equipment, and discover new materials. Esther Tsai, a scientist in the electronic nanomaterials group at CFN, is developing an AI companion to help speed up materials research experiments at the National Synchrotron Light Source II (NSLS-II)." NSLS-II is another DOE Office of Science User Facility at Brookhaven Lab.
At CFN, there has been a lot of work on AI/ML that can help drive experiments through automation, controls, robotics, and analysis, but a program adept at working with scientific text had received far less attention. Being able to quickly document, understand, and convey information about an experiment can help in a number of ways, from breaking down language barriers to saving time by summarizing larger pieces of work.
Watching your language
To build a specialized chatbot, the program required domain-specific text—language taken from areas the bot is intended to focus on. In this case, the text is scientific publications. Domain-specific text helps the AI model understand new terminology and definitions and introduces it to frontier scientific concepts. Most importantly, this curated set of documents enables the AI model to ground its reasoning using trusted facts.
To emulate natural human language, AI models are trained on existing text, enabling them to learn the structure of language, memorize various facts, and develop a primitive sort of reasoning. Rather than laboriously retrain the AI model on nanoscience text, Yager gave it the ability to look up relevant information in a curated set of publications. Providing it with a library of relevant data was only half of the battle. To use this text accurately and effectively, the bot would need a way to decipher the correct context.
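Building a searchable library usually starts by splitting each publication into smaller passages. The sketch below shows one common way to do this: cutting a document into overlapping word-based chunks so that a sentence near a boundary still appears intact in at least one chunk. The function name `chunk_text` and the default chunk size and overlap are illustrative assumptions, not settings taken from Yager's chatbot.

```python
def chunk_text(text: str, chunk_words: int = 200, overlap: int = 50) -> list[str]:
    """Split a document into overlapping chunks of about `chunk_words` words.

    Hypothetical helper for illustration; sizes are assumptions, not CFN settings.
    """
    if not 0 <= overlap < chunk_words:
        raise ValueError("overlap must be non-negative and smaller than chunk_words")
    words = text.split()
    step = chunk_words - overlap  # how far each new chunk advances
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_words]))
        if start + chunk_words >= len(words):  # this chunk reached the end
            break
    return chunks


paper = " ".join(f"w{i}" for i in range(10))
print(chunk_text(paper, chunk_words=4, overlap=1))
# ['w0 w1 w2 w3', 'w3 w4 w5 w6', 'w6 w7 w8 w9']
```

Real pipelines often split on sentence or paragraph boundaries instead of raw word counts; the overlap trades a little extra storage for less chance of cutting a relevant passage in half.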
"A challenge that's common with language models is that sometimes they 'hallucinate' plausible sounding but untrue things," explained Yager. "This has been a core issue to resolve for a chatbot used in research as opposed to one doing something like writing poetry. We don't want it to fabricate facts or citations. This needed to be addressed. The solution for this was something we call 'embedding,' a way of categorizing and linking information quickly behind the scenes."
Embedding is a process that transforms words and phrases into numerical values. The resulting "embedding vector" quantifies the meaning of the text, so passages with similar meanings end up with similar vectors. When a user asks the chatbot a question, the question is also sent to the ML embedding model to calculate its vector. This vector is used to search through a pre-computed database of text chunks from scientific papers that were embedded the same way. The bot then uses the snippets it finds that are semantically related to the question to build a more complete understanding of the context.
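The retrieval step can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: `embed` here is a toy hashed bag-of-words vector standing in for a real ML embedding model, and the names `embed`, `cosine`, `top_k`, and `DIM` are hypothetical. Only the overall pattern (embed the question, compare it against pre-computed chunk vectors, keep the closest matches) reflects the process described above.

```python
import math
import re
import zlib

DIM = 256  # hypothetical vector size for this toy example


def embed(text: str) -> list[float]:
    """Toy stand-in for an ML embedding model: a hashed bag-of-words
    vector scaled to unit length. A real system would call a trained model."""
    vec = [0.0] * DIM
    for word in re.findall(r"[a-z0-9]+", text.lower()):
        vec[zlib.crc32(word.encode()) % DIM] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec


def cosine(a: list[float], b: list[float]) -> float:
    # Both vectors are unit length, so the dot product equals cosine similarity.
    return sum(x * y for x, y in zip(a, b))


def top_k(question: str, database: list[tuple[str, list[float]]], k: int = 2) -> list[str]:
    """Return the k stored chunks whose vectors are closest to the question's."""
    q = embed(question)
    ranked = sorted(database, key=lambda item: cosine(q, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:k]]


chunks = [
    "Block copolymers self-assemble into nanoscale patterns.",
    "Synchrotron X-ray scattering probes thin film structure.",
    "The cafeteria serves lunch at noon.",
]
database = [(c, embed(c)) for c in chunks]  # pre-computed once, ahead of any questions
print(top_k("How do block copolymers self-assemble?", database, k=1))
# ['Block copolymers self-assemble into nanoscale patterns.']
```

Cosine similarity is a common way to compare embedding vectors because it depends on direction rather than length. A trained embedding model would also relate different but connected words (for example, "polymer" and "copolymer"), which this toy version treats as unrelated.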
The user's query and the text snippets are combined into a "prompt" that is sent to a large language model (an expansive program that creates text modeled on natural human language), which generates the final response. The embedding ensures that the retrieved text is relevant to the user's question. Because each answer is built from text chunks drawn from a body of trusted documents, the chatbot's responses are grounded in, and can cite, those sources.
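A minimal sketch of the prompt-assembly step follows. The template wording and the `build_prompt` name are hypothetical, not taken from the CFN chatbot; the point is only to show how retrieved snippets and the user's question can be merged into a single piece of text for the language model. Numbering the snippets gives the model a simple way to cite which source supports each statement.

```python
def build_prompt(question: str, snippets: list[str]) -> str:
    """Merge retrieved snippets and the user's question into one LLM prompt.

    The instruction wording is a hypothetical example, not the CFN template.
    """
    # Number each snippet so the model can refer to it as [1], [2], ...
    context = "\n\n".join(f"[{i}] {s}" for i, s in enumerate(snippets, start=1))
    return (
        "Answer the question using only the numbered sources below, "
        "and cite them by number. If the sources do not contain the answer, say so.\n\n"
        f"Sources:\n{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )


prompt = build_prompt(
    "How do block copolymers self-assemble?",
    ["Block copolymers self-assemble into nanoscale patterns."],
)
print(prompt)
```

The resulting string would then be passed to whichever large language model the system uses; that call is omitted here because it depends on the model provider's API.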
"The program needs to be like a reference librarian," said Yager. "It needs to heavily rely on the documents to provide sourced answers. It needs to be able to accurately interpret what people are asking and be able to effectively piece together the context of those questions to retrieve the most relevant information. While the responses may not be perfect yet, it's already able to answer challenging questions and trigger some interesting thoughts while planning new projects and research."
Bots empowering humans
CFN is developing AI/ML systems as tools that can liberate human researchers to work on more challenging and interesting problems and to get more out of their limited time while computers automate repetitive tasks in the background. There are still many unknowns about this new way of working, but these questions are the start of important discussions scientists are having right now to ensure AI/ML use is safe and ethical.
"There are a number of tasks that a domain-specific chatbot like this could clear from a scientist's workload. Classifying and organizing documents, summarizing publications, pointing out relevant info, and getting up to speed in a new topical area are just a few potential applications," remarked Yager. "I'm excited to see where all of this will go, though. We never could have imagined where we are now three years ago, and I'm looking forward to where we'll be three years from now."
For researchers interested in trying this software out for themselves, the source code for CFN's chatbot and associated tools is available in a GitHub repository.
More information: Kevin G. Yager, Domain-specific chatbots for science using embeddings, Digital Discovery (2023). DOI: 10.1039/D3DD00112A