A researcher has just finished writing a scientific paper. She knows her work could benefit from another perspective. Did she overlook something? Or perhaps there’s an application of her research she hadn’t thought of. A second set of eyes would be great, but even the friendliest of collaborators might not be able to spare the time to read all the required background publications to catch up.
Rapid advances in artificial intelligence (AI) and machine learning (ML) have given rise to programs that can generate creative text and useful software code. These general-purpose chatbots have recently captured the public imagination. However, existing chatbots, which are based on large, diverse language models, lack detailed knowledge of scientific sub-domains.
By leveraging a document-retrieval method, the bot developed by Kevin Yager at the Center for Functional Nanomaterials (CFN) is knowledgeable in areas of nanomaterial science that other bots are not. The details of this project, and how other scientists can use this AI colleague in their own work, have recently been published in Digital Discovery.
Rise of the robots
“CFN has been looking into new ways to leverage AI/ML to accelerate nanomaterial discovery for a long time. Currently, it’s helping us quickly identify, catalog, and choose samples, automate experiments, control equipment, and discover new materials. Esther Tsai, a scientist in the electronic nanomaterials group at CFN, is developing an AI companion to help speed up materials research experiments at the National Synchrotron Light Source II (NSLS-II).” NSLS-II is another DOE Office of Science User Facility at Brookhaven Lab.
At CFN, there has been a lot of work on AI/ML that can help drive experiments through the use of automation, controls, robotics, and analysis, but having a program that was adept with scientific text was something that researchers hadn’t explored as deeply. Being able to quickly document, understand, and convey information about an experiment can help in a number of ways—from breaking down language barriers to saving time by summarizing larger pieces of work.
Watching your language
Building a specialized chatbot required domain-specific text: language taken from the areas the bot is intended to focus on. In this case, the text is scientific publications. Domain-specific text helps the AI model understand new terminology and definitions and introduces it to frontier scientific concepts. Most importantly, this curated set of documents enables the AI model to ground its reasoning in trusted facts.
To emulate natural human language, AI models are trained on existing text, enabling them to learn the structure of language, memorize various facts, and develop a primitive sort of reasoning. Rather than laboriously retrain the AI model on nanoscience text, Yager gave it the ability to look up relevant information in a curated set of publications. Providing it with a library of relevant data was only half of the battle. To use this text accurately and effectively, the bot would need a way to decipher the correct context.
“A challenge that’s common with language models is that sometimes they ‘hallucinate’ plausible sounding but untrue things,” explained Yager. “This has been a core issue to resolve for a chatbot used in research as opposed to one doing something like writing poetry. We don’t want it to fabricate facts or citations. This needed to be addressed. The solution for this was something we call ‘embedding,’ a way of categorizing and linking information quickly behind the scenes.”
Embedding is a process that transforms words and phrases into numerical values. The resulting “embedding vector” quantifies the meaning of the text. When a user asks the chatbot a question, it’s also sent to the ML embedding model to calculate its vector value. This vector is used to search through a pre-computed database of text chunks from scientific papers that were similarly embedded. The bot then uses text snippets it finds that are semantically related to the question to get a more complete understanding of the context.
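The lookup described above can be illustrated with a minimal sketch. It is not the CFN implementation: the `embed` function here is a toy word-count vector standing in for a learned ML embedding model (so it captures word overlap rather than true semantic meaning), and the example text chunks, the `retrieve` helper, and the `top_k` parameter are all hypothetical.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy stand-in for an ML embedding model: a sparse word-count vector.

    A real system would use a learned model that returns a dense vector.
    """
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse vectors (0.0 if either is empty)."""
    dot = sum(a[word] * b[word] for word in a if word in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical text chunks; in practice these come from scientific papers.
chunks = [
    "Block copolymers self-assemble into ordered nanoscale patterns.",
    "X-ray scattering measures the structure of nanomaterials.",
    "The cafeteria serves lunch from noon until two.",
]

# Pre-compute the embedding of every chunk once, ahead of any question.
index = [(chunk, embed(chunk)) for chunk in chunks]


def retrieve(question, index, top_k=2):
    """Return the top_k chunks whose vectors are closest to the question's."""
    question_vector = embed(question)
    ranked = sorted(
        index,
        key=lambda item: cosine_similarity(question_vector, item[1]),
        reverse=True,
    )
    return [chunk for chunk, _ in ranked[:top_k]]


best = retrieve("How do block copolymers self-assemble?", index, top_k=1)
print(best[0])
```

Swapping the toy `embed` for a real embedding model would leave the rest of the sketch unchanged: the question and the chunks only need to be embedded by the same model so their vectors are comparable.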
The user’s query and the text snippets are combined into a “prompt” that is sent to a large language model, an expansive program that creates text modeled on natural human language. The language model then generates the final response. The embedding ensures that the text being pulled is relevant in the context of the user’s question. By providing text chunks from the body of trusted documents, the chatbot generates answers that are factual and sourced.
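As a rough illustration of how a query and retrieved snippets might be combined, here is a minimal sketch. The `build_prompt` function, the instruction wording, and the example snippets are assumptions made for illustration, not the prompt used by the CFN chatbot, and the call to the language model itself is omitted.

```python
def build_prompt(question, snippets):
    """Combine retrieved snippets and the user's question into one prompt string.

    The instruction wording is illustrative only; a real system would tune it.
    """
    numbered = "\n".join(
        f"[{number}] {snippet}" for number, snippet in enumerate(snippets, start=1)
    )
    return (
        "Answer the question using only the numbered excerpts below. "
        "Cite excerpts by number, and say so if they do not contain the answer.\n\n"
        f"Excerpts:\n{numbered}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )


# Hypothetical snippets, as if returned by the embedding search.
snippets = [
    "Block copolymers self-assemble into ordered nanoscale patterns.",
    "X-ray scattering measures the structure of nanomaterials.",
]
question = "How can the structure of a self-assembled film be measured?"

prompt = build_prompt(question, snippets)
print(prompt)
```

Numbering the excerpts gives the language model a simple way to point back to its sources, which is one way to support the sourced answers the article describes.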
“The program needs to be like a reference librarian,” said Yager. “It needs to heavily rely on the documents to provide sourced answers. It needs to be able to accurately interpret what people are asking and be able to effectively piece together the context of those questions to retrieve the most relevant information. While the responses may not be perfect yet, it’s already able to answer challenging questions and trigger some interesting thoughts while planning new projects and research.”
Bots empowering humans
CFN is developing AI/ML systems as tools that can liberate human researchers to work on more challenging and interesting problems and to get more out of their limited time while computers automate repetitive tasks in the background. There are still many unknowns about this new way of working, but these questions are the start of important discussions scientists are having right now to ensure AI/ML use is safe and ethical.
“There are a number of tasks that a domain-specific chatbot like this could clear from a scientist’s workload. Classifying and organizing documents, summarizing publications, pointing out relevant info, and getting up to speed in a new topical area are just a few potential applications,” remarked Yager. “I’m excited to see where all of this will go, though. We never could have imagined where we are now three years ago, and I’m looking forward to where we’ll be three years from now.”
For researchers interested in trying this software out for themselves, the source code for CFN’s chatbot and associated tools can be found in this GitHub repository.
More information: Kevin G. Yager, Domain-specific chatbots for science using embeddings, Digital Discovery (2023). DOI: 10.1039/D3DD00112A