Optimizing a Machine Learning System for Materials Discovery

Date
2016
Department
Haverford College. Department of Computer Science
Type
Thesis
Language
eng
Access Restrictions
Open Access
Abstract
Advanced functional materials are crucial for addressing numerous challenges in medicine, communications, and energy. As highlighted by the White House Materials Genome Initiative, computational tools are critical for improving the materials discovery process. Many of the most promising materials are inorganic-organic hybrid materials. This broad class of compounds exhibits extraordinary structural diversity, which has made these compounds of interest for applications including energy storage, catalysis, photovoltaics, optical engineering, and gas sorption. The Norquist lab focuses on the exploratory synthesis of organically templated metal oxides. While computational techniques have been applied extensively to predicting material properties, the Dark Reactions Project (DRP) takes a different, underexplored approach: it uses machine learning to improve the synthesis process itself. This system is already in use in the Norquist lab, where it has increased reaction success rates, but further improvement could make it a more reliable source of new reaction suggestions in unexplored chemical space. This thesis enhances the DRP by constructing machine learning models of several types and using them to investigate different ways of describing a chemical reaction. It investigates automated feature selection, descriptor sets modified through reasoned changes, and variations of the models' cost functions that better reflect the actual use case. Many models in this work improve on the previous models, and the best achieves an average accuracy of 80% in predicting unseen reactions. The Matthews correlation coefficient, a more robust measure of performance that indicates the correlation between predicted and actual outcomes, increases from 0.25 to 0.43. When incorporated into the recommendation pipeline, these models should yield improved reaction recommendations and chemical hypotheses from the DRP.
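The abstract reports both accuracy and the Matthews correlation coefficient (MCC). The following is a minimal, hypothetical sketch, not code from the DRP. It assumes binary outcomes with 1 marking a successful reaction and 0 a failed one, and it computes both metrics from the confusion-matrix counts using the standard formula MCC = (TP·TN − FP·FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)). The invented example data show why MCC is the more robust measure when one outcome dominates: two predictors with identical accuracy can have very different MCC.

```python
import math


def confusion_counts(y_true, y_pred):
    """Count TP, TN, FP, FN for binary labels (assumed: 1 = successful reaction)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def accuracy(y_true, y_pred):
    """Fraction of reactions whose outcome was predicted correctly."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    return (tp + tn) / (tp + tn + fp + fn)


def matthews_corrcoef(y_true, y_pred):
    """Correlation between predicted and actual outcomes, in [-1, 1]."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Common convention: report 0 when any row or column total is zero.
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


# Hypothetical outcomes: 8 failed reactions followed by 2 successful ones.
y_true = [0] * 8 + [1] * 2

# Predictor A always predicts failure.
always_fail = [0] * 10
print(accuracy(y_true, always_fail))           # 0.8
print(matthews_corrcoef(y_true, always_fail))  # 0.0

# Predictor B finds one success but also raises one false alarm.
mixed = [0] * 7 + [1, 1, 0]
print(accuracy(y_true, mixed))                 # 0.8
print(matthews_corrcoef(y_true, mixed))        # 0.375
```

Both predictors score 80% accuracy. However, only the second one carries information about which reactions succeed, and MCC reflects that. This is why a rise in MCC is a stronger signal of improvement than accuracy alone when the classes are imbalanced.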