Graph Grammars for Molecular Substructure Search

There are many chemical description languages that are used currently, such as the Sybyl Line Notation (SLN) or the SMILES Arbitrary Target Specification (SMARTS). All of those describe molecular structures in the form of strings. These languages tend to be very complex and require significant training efforts before they can be routinely used. Furthermore, these languages do not support some relevant queries, such as the requirement that two substructures in the molecule are arbitrary but identical. To make matters worse, parsing and matching such patterns against databases of molecules is a very complex and time-consuming task. The related search problem is NP-complete. These description languages often suffer from the lack of a formal definition of their semantics. To circumvent these problems, we want to use graph grammars to describe substructures in a more direct and hence more intuitive way. Even graph grammars of very simple form allow a high expressive power, which almost reaches that of SMARTS. Graph grammars allow to specify some interesting substructures beyond the expressive power of SMARTS. Although the related search problem remains NP-complete, as they have almost the same expressive power. We are implementing a matching algorithm based on backward substitution. We will be able to answer many queries of the practitioners and get an early feedback on the relevant results in order to appropriately define the generality of the transformation rules. Deriving efficient algorithms for the most interesting formal languages will be the main work in this project.