Materials Cartography

As high-throughput approaches proliferate in materials science and the volume of data grows, the gap between accumulated information and derived knowledge widens. We address the issue of scientific discovery in materials databases by introducing novel analytic approaches based on structural and electronic materials fingerprints. The framework is employed to:


  1. Query large databases of materials using similarity concepts.
  2. Map the connectivity of materials space (i.e., as materials cartograms) to rapidly identify regions with unique organizations/properties.
  3. Develop predictive Quantitative Materials Structure–Property Relationship models for guiding materials design.


We have introduced novel materials descriptors that encode band structures (B-fingerprints), density of states (D-fingerprints), as well as crystallographic and constitutional information of materials. We employed materials fingerprints to visualize large collections of materials as contact networks, or Materials Cartograms. In a cartogram, the nodes of the network are individual materials, and similar materials are connected by an edge. Two materials (nodes) are connected only when the similarity of their fingerprints is above a certain threshold.
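The edge rule described above can be illustrated with a minimal sketch. It assumes fingerprints are plain numeric vectors and uses the standard continuous Tanimoto coefficient as the similarity measure; the function names (`tanimoto`, `build_cartogram`), the toy vectors, and the threshold value of 0.7 are all hypothetical choices for illustration, not the values or code used in the paper.

```python
from itertools import combinations


def tanimoto(a, b):
    """Continuous Tanimoto coefficient: a.b / (|a|^2 + |b|^2 - a.b)."""
    dot = sum(x * y for x, y in zip(a, b))
    denom = sum(x * x for x in a) + sum(y * y for y in b) - dot
    # Two all-zero vectors carry no information; treat them as dissimilar.
    return dot / denom if denom else 0.0


def build_cartogram(fingerprints, threshold):
    """Return the set of edges (name_a, name_b) with similarity above threshold.

    `fingerprints` maps a material name to its fingerprint vector.
    Every unordered pair is compared once, so the cost grows quadratically
    with the number of materials.
    """
    edges = set()
    items = sorted(fingerprints.items())
    for (name_a, fp_a), (name_b, fp_b) in combinations(items, 2):
        if tanimoto(fp_a, fp_b) > threshold:
            edges.add((name_a, name_b))
    return edges


# Toy fingerprints with made-up numbers (assumption, for illustration only).
toy = {
    "A": [1.0, 0.0, 2.0],
    "B": [1.0, 0.5, 2.0],
    "C": [0.0, 3.0, 0.0],
}
edges = build_cartogram(toy, threshold=0.7)
print(edges)
```

In this toy case only A and B end up connected, because C shares almost no weight with either of them. For collections of tens of thousands of materials, the all-pairs loop would likely need to be replaced with a vectorized or indexed similarity search.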


Materials Cartography: Representing and Mining Materials Space Using Structural and Electronic Fingerprints

Chem. Mater. 2015, 27 (3), 735–743. DOI: 10.1021/cm503507h

Notably, we have established that in the cartogram, materials with similar properties (e.g., superconductors) tend to cluster together (side image). We color-coded all materials on the cartogram according to their critical temperature, Tc. All high-Tc superconductors were localized in a relatively compact region centered on a tight group of Ba2Cu3XO7 compounds (so-called Y123, where X = lanthanides). Their close grouping revealed a significant superconductivity hot-spot of materials with similar fingerprints. Thus, this representation can identify regions with distinct physical and chemical properties, a key step in searching for interesting yet unknown compounds.
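One simple way to look for such hot-spots in a network is to group connected materials and rank the groups by their average property value. The sketch below does this with plain connected components, which is a simpler stand-in for a proper community analysis and is not claimed to be the method used in the paper. The node names, edges, and Tc values are made-up placeholders, and the helper names are hypothetical.

```python
from collections import defaultdict


def connected_components(nodes, edges):
    """Group nodes into connected components with an iterative depth-first search."""
    adjacency = defaultdict(set)
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)

    seen = set()
    components = []
    for start in nodes:
        if start in seen:
            continue
        seen.add(start)
        stack, group = [start], set()
        while stack:
            node = stack.pop()
            group.add(node)
            for neighbor in adjacency[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    stack.append(neighbor)
        components.append(group)
    return components


def rank_groups_by_property(components, values):
    """Return (mean value, sorted members) per group, highest mean first.

    Members without a known value are ignored when averaging; groups with
    no known values at all are skipped.
    """
    ranked = []
    for group in components:
        known = [values[name] for name in group if name in values]
        if known:
            ranked.append((sum(known) / len(known), sorted(group)))
    return sorted(ranked, reverse=True)


# Placeholder network and critical temperatures in kelvin (illustrative only).
nodes = ["m1", "m2", "m3", "m4", "m5"]
edges = [("m1", "m2"), ("m2", "m3"), ("m4", "m5")]
tc = {"m1": 90.0, "m2": 92.0, "m3": 88.0, "m4": 4.0, "m5": 6.0}

hot_spots = rank_groups_by_property(connected_components(nodes, edges), tc)
for mean_tc, members in hot_spots:
    print(f"{mean_tc:6.1f} K  {members}")
```

The group listed first is the candidate hot-spot. On a real cartogram a single giant component is likely, in which case a community-detection algorithm would be a more informative way to split the network.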

Topology of the D-fingerprint network representation for ~20000 inorganic materials.


Mapping band gaps of materials on B-fingerprint network. Points colored in deep blue are metals; insulators are colored according to the band gap value. Four large communities are outlined.

Press coverage:

This paper is a tour de force for computational materials science. The authors, from the Tropsha and Curtarolo groups, have applied state-of-the-art tools from chemoinformatics and machine learning to the challenging problem of materials design. In this paper, led by Olexandr Isayev, the authors employ chemical descriptors based on electronic properties such as the density of states, as well as local properties such as the modified simplex approach to catalog materials and their properties. They apply the methods to the large datasets compiled by the Curtarolo group and find very interesting domains of material space once they apply the Tanimoto similarity metric.

The authors generate networks that they call materials cartograms, where the nodes are compounds and the connections are the similarities between them. It is nice to see that naturally, regions of similar physicochemical properties emerge from their analysis.


Prof. Alan Aspuru-Guzik, Harvard University, Computational Chemistry Highlights, 2015