A researcher has just finished writing a scientific paper. She knows her work could benefit from another perspective. Did she overlook something? Or perhaps there’s an application of her research she hadn’t thought of. A second set of eyes would be great, but even the friendliest of collaborators might not be able to spare the time to read all the required background publications to catch up.
Rapid advances in AI and ML have given rise to programs that can generate creative text and useful software code. These general-purpose chatbots have recently captured the public imagination. However, existing chatbots, which are based on large, diverse language models, lack detailed knowledge of scientific sub-domains.
By leveraging a document-retrieval method, a chatbot developed by Kevin Yager of the Center for Functional Nanomaterials (CFN) at Brookhaven Lab is knowledgeable in areas of nanomaterial science that other bots are not. The details of this project, and how other scientists can use this AI colleague in their own work, have recently been published in Digital Discovery.
Rise of the robots
“CFN has been looking into new ways to leverage AI/ML to accelerate nanomaterial discovery for a long time. Currently, it’s helping us quickly identify, catalog, and choose samples, automate experiments, control equipment, and discover new materials. Esther Tsai, a scientist in the electronic nanomaterials group at CFN, is developing an AI companion to help speed up materials research experiments at the National Synchrotron Light Source II (NSLS-II).” NSLS-II is another DOE Office of Science User Facility at Brookhaven Lab.
At CFN, there has been a lot of work on AI/ML that can help drive experiments through the use of automation, controls, robotics, and analysis, but having a program that was adept with scientific text was something that researchers hadn’t explored as deeply. Being able to quickly document, understand, and convey information about an experiment can help in a number of ways—from breaking down language barriers to saving time by summarizing larger pieces of work.

Watching your language
Building a specialized chatbot required domain-specific text: language taken from the areas the bot is intended to focus on. In this case, the text is scientific publications. Domain-specific text helps the AI model understand new terminology and definitions and introduces it to frontier scientific concepts. Most importantly, this curated set of documents enables the AI model to ground its reasoning in trusted facts.
To emulate natural human language, AI models are trained on existing text, enabling them to learn the structure of language, memorize various facts, and develop a primitive sort of reasoning. Rather than laboriously retrain the AI model on nanoscience text, Yager gave it the ability to look up relevant information in a curated set of publications. Providing it with a library of relevant data was only half of the battle. To use this text accurately and effectively, the bot would need a way to decipher the correct context.
“A challenge that’s common with language models is that sometimes they ‘hallucinate’ plausible sounding but untrue things,” explained Yager. “This has been a core issue to resolve for a chatbot used in research as opposed to one doing something like writing poetry. We don’t want it to fabricate facts or citations. This needed to be addressed. The solution for this was something we call ‘embedding,’ a way of categorizing and linking information quickly behind the scenes.”
Embedding is a process that transforms words and phrases into numerical values. The resulting “embedding vector” quantifies the meaning of the text. When a user asks the chatbot a question, the question is also sent to the ML embedding model, which calculates its embedding vector. This vector is used to search a pre-computed database of text chunks from scientific papers that were embedded in the same way. The bot then uses the text snippets it finds that are semantically related to the question to get a more complete understanding of the context.
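The lookup described above can be illustrated with a minimal sketch. It is not the code from the published project: the `toy_embed` function is a hypothetical word-count stand-in for a real ML embedding model, and the three text chunks are invented for illustration. Only the overall flow (embed the chunks ahead of time, embed the question, rank chunks by vector similarity) follows the description.

```python
import math
import re


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_vocab(chunks):
    """Map every word seen in the chunks to a vector position."""
    words = sorted({w for chunk in chunks for w in tokenize(chunk)})
    return {w: i for i, w in enumerate(words)}


def toy_embed(text, vocab):
    """Hypothetical stand-in for an ML embedding model: a word-count vector.

    A real embedding model captures meaning, not just shared words.
    """
    vec = [0.0] * len(vocab)
    for w in tokenize(text):
        if w in vocab:
            vec[vocab[w]] += 1.0
    return vec


def cosine(a, b):
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_chunks(question, chunks, chunk_vectors, vocab, k=2):
    """Return the k chunks whose vectors are most similar to the question."""
    q_vec = toy_embed(question, vocab)
    ranked = sorted(
        zip(chunks, chunk_vectors),
        key=lambda pair: cosine(q_vec, pair[1]),
        reverse=True,
    )
    return [chunk for chunk, _ in ranked[:k]]


# Invented example chunks; a real system would use passages from papers.
chunks = [
    "Block copolymers self-assemble into ordered nanoscale patterns.",
    "X-ray scattering measures the structure of thin films.",
    "The synchrotron beamline schedule is posted every week.",
]
vocab = build_vocab(chunks)
chunk_vectors = [toy_embed(c, vocab) for c in chunks]  # pre-computed once

best = top_chunks("How do block copolymers self-assemble?",
                  chunks, chunk_vectors, vocab, k=1)
print(best)
```

Cosine similarity is one common way to compare embedding vectors; the source does not say which similarity measure the chatbot uses, so that choice is an assumption here.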
The user’s query and the text snippets are combined into a “prompt” that is sent to a large language model (an expansive program that creates text modeled on natural human language), which generates the final response. The embedding ensures that the text being pulled is relevant in the context of the user’s question. By providing text chunks from the body of trusted documents, the chatbot generates answers that are factual and sourced.
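As a rough illustration of how a query and retrieved snippets might be combined into a prompt, consider the sketch below. The instruction wording, the numbered-excerpt layout, and the source labels are all assumptions made for this example, not the template used by the actual chatbot. The call to the language model itself is left out.

```python
def build_prompt(question, snippets):
    """Combine a question with retrieved snippets into one prompt string.

    snippets is a list of (source_label, text) pairs. The layout below is
    a hypothetical format chosen for illustration.
    """
    lines = [
        "Answer the question using only the excerpts below.",
        "Cite excerpts by their bracketed number.",
        "",
    ]
    for i, (source, text) in enumerate(snippets, start=1):
        # Numbering each excerpt lets the answer point back to its source.
        lines.append(f"[{i}] ({source}) {text}")
    lines += ["", f"Question: {question}", "Answer:"]
    return "\n".join(lines)


# Invented snippets standing in for retrieved passages from trusted papers.
snippets = [
    ("hypothetical-paper-A",
     "Block copolymers self-assemble into ordered nanoscale patterns."),
    ("hypothetical-paper-B",
     "X-ray scattering measures the structure of thin films."),
]
prompt = build_prompt("How do block copolymers self-assemble?", snippets)
print(prompt)
```

Keeping a source label next to every excerpt is what would let the model's answer be traced back to a specific document, in the spirit of the "sourced" answers described above.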
“The program needs to be like a reference librarian,” said Yager. “It needs to heavily rely on the documents to provide sourced answers. It needs to be able to accurately interpret what people are asking and be able to effectively piece together the context of those questions to retrieve the most relevant information. While the responses may not be perfect yet, it’s already able to answer challenging questions and trigger some interesting thoughts while planning new projects and research.”

Bots empowering humans
CFN is developing AI/ML systems as tools that can liberate human researchers to work on more challenging and interesting problems and to get more out of their limited time while computers automate repetitive tasks in the background. There are still many unknowns about this new way of working, but these questions are the start of important discussions scientists are having right now to ensure AI/ML use is safe and ethical.
“There are a number of tasks that a domain-specific chatbot like this could clear from a scientist’s workload. Classifying and organizing documents, summarizing publications, pointing out relevant info, and getting up to speed in a new topical area are just a few potential applications,” remarked Yager. “I’m excited to see where all of this will go, though. We never could have imagined where we are now three years ago, and I’m looking forward to where we’ll be three years from now.”
For researchers interested in trying this software out for themselves, the source code for CFN’s chatbot and associated tools can be found in this GitHub repository.
More information: Kevin G. Yager, Domain-specific chatbots for science using embeddings, Digital Discovery (2023). DOI: 10.1039/D3DD00112A