A researcher has just finished writing a scientific paper. She knows her work could benefit from another perspective. Did she overlook something? Or perhaps there’s an application of her research she hadn’t thought of. A second set of eyes would be great, but even the friendliest of collaborators might not be able to spare the time to read all the required background publications to catch up.
Rapid advances in AI and ML have given rise to programs that can generate creative text and useful software code. These general-purpose chatbots have recently captured the public imagination. However, existing chatbots, which are based on large, diverse language models, lack detailed knowledge of scientific sub-domains.
By using a document-retrieval method, the chatbot built by Kevin Yager at the Center for Functional Nanomaterials (CFN) is knowledgeable in areas of nanomaterial science that other bots are not. The details of this project, and how other scientists can leverage this AI colleague for their own work, have recently been published in Digital Discovery.
Rise of the robots
“CFN has been looking into new ways to leverage AI/ML to accelerate nanomaterial discovery for a long time. Currently, it’s helping us quickly identify, catalog, and choose samples, automate experiments, control equipment, and discover new materials.” Esther Tsai, a scientist in the electronic nanomaterials group at CFN, is developing an AI companion to help speed up materials research experiments at the National Synchrotron Light Source II (NSLS-II). NSLS-II is another DOE Office of Science User Facility at Brookhaven Lab.
At CFN, there has been a lot of work on AI/ML that can help drive experiments through the use of automation, controls, robotics, and analysis, but having a program that was adept with scientific text was something that researchers hadn’t explored as deeply. Being able to quickly document, understand, and convey information about an experiment can help in a number of ways—from breaking down language barriers to saving time by summarizing larger pieces of work.
Watching your language
To build a specialized chatbot, the program required domain-specific text—language taken from areas the bot is intended to focus on. In this case, the text is scientific publications. Domain-specific text helps the AI model understand new terminology and definitions and introduces it to frontier scientific concepts. Most importantly, this curated set of documents enables the AI model to ground its reasoning using trusted facts.
To emulate natural human language, AI models are trained on existing text, enabling them to learn the structure of language, memorize various facts, and develop a primitive sort of reasoning. Rather than laboriously retrain the AI model on nanoscience text, Yager gave it the ability to look up relevant information in a curated set of publications. Providing it with a library of relevant data was only half of the battle. To use this text accurately and effectively, the bot would need a way to decipher the correct context.
“A challenge that’s common with language models is that sometimes they ‘hallucinate’ plausible-sounding but untrue things,” explained Yager. “This has been a core issue to resolve for a chatbot used in research as opposed to one doing something like writing poetry. We don’t want it to fabricate facts or citations. This needed to be addressed. The solution for this was something we call ‘embedding,’ a way of categorizing and linking information quickly behind the scenes.”
Embedding is a process that transforms words and phrases into numerical values. The resulting “embedding vector” quantifies the meaning of the text. When a user asks the chatbot a question, it’s also sent to the ML embedding model to calculate its vector value. This vector is used to search through a pre-computed database of text chunks from scientific papers that were similarly embedded. The bot then uses text snippets it finds that are semantically related to the question to get a more complete understanding of the context.
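The lookup described above can be illustrated with a minimal Python sketch. It is not the code from CFN's chatbot. The `toy_embed` function is a hypothetical stand-in that only counts words, so it captures word overlap rather than meaning, whereas a real ML embedding model produces dense vectors that reflect semantics. The function names, the sample text chunks, and the choice of cosine similarity as the comparison measure are all assumptions made for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def toy_embed(text):
    """Hypothetical stand-in for an ML embedding model: a sparse word-count vector."""
    return Counter(tokenize(text))


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse vectors (0.0 if either is empty)."""
    dot = sum(a[token] * b[token] for token in a if token in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def build_index(chunks):
    """Pre-compute one vector per text chunk, as is done ahead of time for the papers."""
    return [(chunk, toy_embed(chunk)) for chunk in chunks]


def search(index, question, top_k=2):
    """Embed the question and return the top_k chunks with nonzero similarity."""
    query_vec = toy_embed(question)
    scored = [(cosine_similarity(query_vec, vec), chunk) for chunk, vec in index]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [chunk for score, chunk in scored[:top_k] if score > 0]


# Made-up example chunks, not taken from any real publication.
chunks = [
    "Block copolymers self-assemble into ordered nanoscale patterns.",
    "X-ray scattering measures the structure of thin films.",
    "Gold nanoparticles absorb visible light strongly.",
]
index = build_index(chunks)
results = search(index, "How do block copolymers self-assemble?", top_k=1)
print(results)
```

In a real system the word-count vectors would be replaced by the output of an embedding model, and the linear scan over the index would typically be replaced by a vector database, but the flow of embedding the question, comparing vectors, and keeping the closest chunks stays the same.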
The user’s query and the text snippets are combined into a “prompt” that is sent to a large language model, an expansive program that creates text modeled on natural human language, which then generates the final response. The embedding ensures that the text being pulled is relevant in the context of the user’s question. By providing text chunks from the body of trusted documents, the chatbot generates answers that are factual and sourced.
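As a rough illustration of how a query and retrieved snippets might be combined into a single prompt, consider the Python sketch below. The article does not give the actual prompt template, so the instruction wording, the numbered-excerpt layout, and the names `build_prompt` and `answer` are all assumptions. The `llm` argument is a placeholder for whatever function sends text to a language model.

```python
def build_prompt(question, snippets):
    """Combine retrieved snippets and the user's question into one prompt string."""
    # Number each snippet so the model can point back to its sources.
    numbered = [f"[{i}] {text}" for i, text in enumerate(snippets, start=1)]
    context = "\n".join(numbered)
    # The instruction wording here is illustrative, not the real template.
    return (
        "Answer the question using only the numbered excerpts below, "
        "and cite the excerpt numbers you rely on. "
        "If the excerpts do not contain the answer, say so.\n\n"
        f"Excerpts:\n{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )


def answer(question, snippets, llm):
    """Send the assembled prompt to `llm`, any callable mapping prompt text to a reply."""
    return llm(build_prompt(question, snippets))


# Made-up snippets standing in for text retrieved from trusted papers.
example_snippets = [
    "Block copolymers self-assemble into ordered nanoscale patterns.",
    "X-ray scattering measures the structure of thin films.",
]
print(build_prompt("How are thin-film structures measured?", example_snippets))
```

Asking the model to rely only on the supplied excerpts and to cite them is one common way to discourage fabricated facts, in the spirit of the “reference librarian” behavior described in this article.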
“The program needs to be like a reference librarian,” said Yager. “It needs to heavily rely on the documents to provide sourced answers. It needs to be able to accurately interpret what people are asking and be able to effectively piece together the context of those questions to retrieve the most relevant information. While the responses may not be perfect yet, it’s already able to answer challenging questions and trigger some interesting thoughts while planning new projects and research.”
Bots empowering humans
CFN is developing AI/ML systems as tools that can liberate human researchers to work on more challenging and interesting problems and to get more out of their limited time while computers automate repetitive tasks in the background. There are still many unknowns about this new way of working, but these questions are the start of important discussions scientists are having right now to ensure AI/ML use is safe and ethical.
“There are a number of tasks that a domain-specific chatbot like this could clear from a scientist’s workload. Classifying and organizing documents, summarizing publications, pointing out relevant info, and getting up to speed in a new topical area are just a few potential applications,” remarked Yager. “I’m excited to see where all of this will go, though. We never could have imagined where we are now three years ago, and I’m looking forward to where we’ll be three years from now.”
For researchers interested in trying this software out for themselves, the source code for CFN’s chatbot and associated tools can be found in this GitHub repository.
More information: Kevin G. Yager, Domain-specific chatbots for science using embeddings, Digital Discovery (2023). DOI: 10.1039/D3DD00112A
