Assigning Unique Keys to Chemical Compounds for Data Integration: Some Interesting Counter Examples
- Cite this paper as:
- Neglur G., Grossman R.L., Liu B. (2005) Assigning Unique Keys to Chemical Compounds for Data Integration: Some Interesting Counter Examples. In: Ludäscher B., Raschid L. (eds) Data Integration in the Life Sciences. DILS 2005. Lecture Notes in Computer Science, vol 3615. Springer, Berlin, Heidelberg
Integrating data involving chemical structures is simpler when unique identifiers (UIDs) can be associated with those structures; for example, the identifiers can serve as database keys. One common approach is to use the Unique SMILES notation. Unique SMILES views a chemical structure as a graph with atoms as nodes and bonds as edges, and uses a depth-first traversal of the graph to generate the SMILES string. The algorithm establishes a node ordering by using certain symmetry properties of the graph. In this paper, we present molecular graphs for which the algorithm fails to generate UIDs. Indeed, we show that different graphs in the same symmetry class employed by the Unique SMILES algorithm have different Unique SMILES IDs. We tested the algorithm on the National Cancer Institute (NCI) database and found several molecular structures for which the algorithm also failed. We have also written a Python script that generates molecular graphs for which the algorithm fails.
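To make the two ingredients named in the abstract concrete (a symmetry-based node ordering followed by a depth-first traversal), here is a minimal Python sketch. It is a simplified stand-in and not the Unique SMILES algorithm studied in the paper: the helpers `symmetry_classes` and `dfs_string` are hypothetical names, the initial invariant is just the node degree, ties are broken by input index, and the traversal handles only acyclic graphs with single bonds and implicit hydrogens. The example molecule (2-methylbutane) is chosen for illustration.

```python
def _compress(keys):
    """Map arbitrary sortable keys to dense integer ranks 0..k-1."""
    order = {k: i for i, k in enumerate(sorted(set(keys.values())))}
    return {node: order[k] for node, k in keys.items()}


def symmetry_classes(adjacency):
    """Partition nodes by iteratively refining ranks with neighbor ranks.

    Assumption: the starting invariant is the node degree only; real
    canonicalizers use richer atom invariants.
    """
    ranks = _compress({n: len(nbrs) for n, nbrs in adjacency.items()})
    while True:
        # New key: own rank plus the sorted ranks of all neighbors.
        keys = {
            n: (ranks[n], tuple(sorted(ranks[m] for m in adjacency[n])))
            for n in adjacency
        }
        new_ranks = _compress(keys)
        # Stop once refinement no longer splits any class.
        if len(set(new_ranks.values())) == len(set(ranks.values())):
            return new_ranks
        ranks = new_ranks


def dfs_string(adjacency, labels, ranks, node, parent=None):
    """Depth-first SMILES-like string for an acyclic graph (no ring closures)."""
    # Visit lower-ranked neighbors first; ties fall back to the input index,
    # which is an arbitrary choice made for this sketch.
    children = sorted(
        (m for m in adjacency[node] if m != parent),
        key=lambda m: (ranks[m], m),
    )
    out = labels[node]
    for i, child in enumerate(children):
        sub = dfs_string(adjacency, labels, ranks, child, node)
        # Every branch except the last is wrapped in parentheses.
        out += sub if i == len(children) - 1 else f"({sub})"
    return out


# 2-methylbutane carbon skeleton: atom 1 is the branch point.
adjacency = {0: [1], 1: [0, 2, 4], 2: [1, 3], 3: [2], 4: [1]}
labels = {n: "C" for n in adjacency}

ranks = symmetry_classes(adjacency)
start = min(adjacency, key=lambda n: (ranks[n], n))
smiles_like = dfs_string(adjacency, labels, ranks, start)
print(ranks)
print(smiles_like)
```

In this sketch the two methyl carbons on the branch point (atoms 0 and 4) end up in the same symmetry class, so the traversal has to pick one of them by some further rule. How such ties are resolved is the kind of step where a canonical-labeling scheme can become sensitive to the input numbering.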