A researcher has just finished writing a scientific paper. She knows her work could benefit from another perspective. Did she overlook something? Or perhaps there’s an application of her research she hadn’t thought of. A second set of eyes would be great, but even the friendliest of collaborators might not be able to spare the time to read all the required background publications to catch up.
Rapid advances in artificial intelligence (AI) and machine learning (ML) have given rise to programs that can generate creative text and useful software code. These general-purpose chatbots have recently captured the public imagination. However, existing chatbots, which are based on large, diverse language models, lack detailed knowledge of scientific sub-domains.
By leveraging a document-retrieval method, a chatbot developed by Kevin Yager at the Center for Functional Nanomaterials (CFN) is knowledgeable in areas of nanomaterial science that other bots are not. The details of this project, and how other scientists can use this AI colleague in their own work, have recently been published in Digital Discovery.
Rise of the robots
“CFN has been looking into new ways to leverage AI/ML to accelerate nanomaterial discovery for a long time. Currently, it’s helping us quickly identify, catalog, and choose samples, automate experiments, control equipment, and discover new materials. Esther Tsai, a scientist in the electronic nanomaterials group at CFN, is developing an AI companion to help speed up materials research experiments at the National Synchrotron Light Source II (NSLS-II).” NSLS-II is another DOE Office of Science User Facility at Brookhaven Lab.
At CFN, there has been a lot of work on AI/ML that can help drive experiments through the use of automation, controls, robotics, and analysis, but having a program that was adept with scientific text was something that researchers hadn’t explored as deeply. Being able to quickly document, understand, and convey information about an experiment can help in a number of ways—from breaking down language barriers to saving time by summarizing larger pieces of work.

Watching your language
Building a specialized chatbot required domain-specific text: language taken from the areas the bot is intended to focus on. In this case, the text is scientific publications. Domain-specific text helps the AI model understand new terminology and definitions and introduces it to frontier scientific concepts. Most importantly, this curated set of documents enables the AI model to ground its reasoning in trusted facts.
To emulate natural human language, AI models are trained on existing text, enabling them to learn the structure of language, memorize various facts, and develop a primitive sort of reasoning. Rather than laboriously retrain the AI model on nanoscience text, Yager gave it the ability to look up relevant information in a curated set of publications. Providing it with a library of relevant data was only half of the battle. To use this text accurately and effectively, the bot would need a way to decipher the correct context.
“A challenge that’s common with language models is that sometimes they ‘hallucinate’ plausible-sounding but untrue things,” explained Yager. “This has been a core issue to resolve for a chatbot used in research as opposed to one doing something like writing poetry. We don’t want it to fabricate facts or citations. This needed to be addressed. The solution for this was something we call ‘embedding,’ a way of categorizing and linking information quickly behind the scenes.”
Embedding is a process that transforms words and phrases into numerical values. The resulting “embedding vector” quantifies the meaning of the text. When a user asks the chatbot a question, it’s also sent to the ML embedding model to calculate its vector value. This vector is used to search through a pre-computed database of text chunks from scientific papers that were similarly embedded. The bot then uses text snippets it finds that are semantically related to the question to get a more complete understanding of the context.
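The lookup described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code from the published project: `toy_embed` is a hypothetical stand-in that counts words, whereas the real chatbot uses an ML embedding model, and the three text chunks are invented example sentences rather than excerpts from actual papers. The function names `build_index` and `search` are likewise made up for this sketch.

```python
import math
import re
from collections import Counter


def toy_embed(text):
    """Hypothetical stand-in for an ML embedding model.

    Returns a sparse word-count vector. A real embedding model would
    return a dense vector that captures meaning, not just shared words.
    """
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse vectors (0.0 if either is empty)."""
    dot = sum(a[word] * b[word] for word in a if word in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def build_index(chunks):
    """Pre-compute one vector per text chunk, as done ahead of time for the papers."""
    return [(chunk, toy_embed(chunk)) for chunk in chunks]


def search(index, question, top_k=2):
    """Embed the question and return the top_k most similar chunks."""
    question_vec = toy_embed(question)
    scored = [(cosine_similarity(question_vec, vec), chunk) for chunk, vec in index]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [chunk for _, chunk in scored[:top_k]]


# Invented example chunks, standing in for passages from scientific papers.
chunks = [
    "Block copolymers self-assemble into ordered nanoscale patterns.",
    "X-ray scattering measures the structure of thin films.",
    "The cafeteria is open on weekdays.",
]
index = build_index(chunks)
top = search(index, "How do block copolymers self-assemble?", top_k=1)
print(top)
```

Because the chunk vectors are computed once and stored, answering a question only requires embedding the question itself and comparing it against the stored vectors.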
The user’s query and the text snippets are combined into a “prompt” that is sent to a large language model, an expansive program that creates text modeled on natural human language. The language model then generates the final response. The embedding step ensures that the text being pulled is relevant in the context of the user’s question. By drawing on text chunks from the body of trusted documents, the chatbot generates answers that are factual and sourced.
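As a rough illustration of how a query and retrieved snippets might be combined into a single prompt, consider the sketch below. The instruction wording, the numbered-excerpt layout, and the function name `build_prompt` are all assumptions made for this example. The article does not give the actual prompt template used by the chatbot, and the snippets shown are invented. The call to the language model itself is deliberately left out.

```python
def build_prompt(question, snippets):
    """Combine a user question with retrieved text snippets into one prompt.

    The wording and layout are illustrative assumptions, not the
    template used by the published chatbot.
    """
    numbered = "\n".join(
        f"[{i}] {snippet}" for i, snippet in enumerate(snippets, start=1)
    )
    return (
        "Answer the question using only the numbered excerpts below, "
        "and cite the excerpt numbers you relied on. "
        "If the excerpts do not contain the answer, say so.\n\n"
        f"Excerpts:\n{numbered}\n\n"
        f"Question: {question}\n"
    )


# Invented snippets, standing in for chunks retrieved from trusted papers.
snippets = [
    "Block copolymers self-assemble into ordered nanoscale patterns.",
    "Annealing time affects the degree of ordering in thin films.",
]
prompt = build_prompt("What controls ordering in block copolymer films?", snippets)
print(prompt)
```

Numbering the excerpts gives the model something concrete to cite, which is one plausible way to encourage sourced answers rather than fabricated ones.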
“The program needs to be like a reference librarian,” said Yager. “It needs to heavily rely on the documents to provide sourced answers. It needs to be able to accurately interpret what people are asking and be able to effectively piece together the context of those questions to retrieve the most relevant information. While the responses may not be perfect yet, it’s already able to answer challenging questions and trigger some interesting thoughts while planning new projects and research.”

Bots empowering humans
CFN is developing AI/ML systems as tools that can liberate human researchers to work on more challenging and interesting problems and to get more out of their limited time while computers automate repetitive tasks in the background. There are still many unknowns about this new way of working, but these questions are the start of important discussions scientists are having right now to ensure AI/ML use is safe and ethical.
“There are a number of tasks that a domain-specific chatbot like this could clear from a scientist’s workload. Classifying and organizing documents, summarizing publications, pointing out relevant info, and getting up to speed in a new topical area are just a few potential applications,” remarked Yager. “I’m excited to see where all of this will go, though. We never could have imagined where we are now three years ago, and I’m looking forward to where we’ll be three years from now.”
For researchers interested in trying this software out for themselves, the source code for CFN’s chatbot and associated tools can be found in this GitHub repository.
More information: Kevin G. Yager, Domain-specific chatbots for science using embeddings, Digital Discovery (2023). DOI: 10.1039/D3DD00112A