CIFStore is an application for creating and managing a database of CIF files without hassle. It uses MongoDB, an open-source, non-relational database system. mmCIF files from the PDB are difficult to store in a classical relational database because the schema is not easy to keep up to date and the data is often quite messy. MongoDB, as a document-oriented database, stores whole documents (obviously) instead of tables with columns. The great advantage here is that it does not require a predefined schema with table and column definitions. In addition, CIF files can easily be converted into JSON documents, the format MongoDB works with.

So what does CIFStore do? The application parses CIF files into JSON documents and inserts them into the MongoDB database. By default, a new collection is created for each directory, and indexes are created automatically if they are defined in the configuration. The example configuration is ready to create collections for PDB structure and chemical component CIF files.
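To give a feel for why CIF maps so naturally onto JSON, here is a minimal sketch of the idea. It is not CIFStore's actual parser: the function name `cif_items_to_document` is hypothetical, and it only handles simple single-line `_category.item value` pairs, skipping `loop_` tables and multi-line values entirely.

```python
import json


def cif_items_to_document(text):
    """Turn simple `_category.item value` lines into a nested dict.

    Simplified illustration only: loop_ tables and multi-line values
    are skipped, so this is not a complete CIF parser.
    """
    document = {}
    for line in text.splitlines():
        line = line.strip()
        # Data names start with an underscore; skip everything else
        # (blank lines, comments, data_ headers, loop_ rows).
        if not line.startswith("_"):
            continue
        parts = line.split(None, 1)
        if len(parts) != 2:
            continue  # a bare data name, e.g. a column header inside loop_
        name, value = parts
        category, _, item = name[1:].partition(".")
        if not item:
            continue
        value = value.strip()
        # Remove matching surrounding quotes from quoted values.
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "'\"":
            value = value[1:-1]
        document.setdefault(category, {})[item] = value
    return document


# A tiny hand-written sample, not a real PDB entry.
sample = """\
data_2P33
_entry.id 2P33
_struct.title 'Example title'
_exptl.method 'X-RAY DIFFRACTION'
"""

doc = cif_items_to_document(sample)
print(json.dumps(doc, indent=2))
```

Each CIF category becomes a sub-document and each item a key within it, which is why nested attributes can later be queried with dotted names.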
The structures collection containing mmCIF PDB structures can be created with:
Querying the database with Python
MongoDB has drivers for a range of programming languages. The default language is JavaScript, which should make it very easy to access CIFStore directly from within a web application.
Accessing CIFStore in Python is straightforward with PyMongo (the driver).
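A minimal connection sketch might look like the following. The server URI and the database name `cifstore` are assumptions made for illustration, so adjust them to match your own configuration. The collection name `structures` is the one mentioned above.

```python
# Assumed defaults; adjust them to match your own CIFStore configuration.
MONGO_URI = "mongodb://localhost:27017"  # assumption: local server, default port
DB_NAME = "cifstore"                     # hypothetical database name
COLLECTION_NAME = "structures"           # collection mentioned in the text


def get_structures(uri=MONGO_URI):
    """Return a PyMongo handle to the structures collection."""
    # Imported inside the function so this module still loads
    # on machines where PyMongo is not installed.
    from pymongo import MongoClient

    # Fail after two seconds instead of the much longer default
    # if no server can be reached.
    client = MongoClient(uri, serverSelectionTimeoutMS=2000)
    return client[DB_NAME][COLLECTION_NAME]
```

With a MongoDB server running, `get_structures().estimated_document_count()` gives a quick check that the collection is populated.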
Launching queries is also very easy. The following query ‘finds’ the entry 2P33
and returns the comp_id of all non-polymers. Note how the queried (nested) attributes
directly correspond to the CIF format (e.g. pdbx_entity_nonpoly.comp_id).
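A query of this kind could be sketched as below. It assumes that each entry is stored under its PDB ID in the `_id` field, which may not match how CIFStore actually keys its documents, and that `structures` is a PyMongo collection object for the structures collection. The helper names are hypothetical.

```python
def nonpolymer_query(entry_id):
    """Build the filter and projection for one entry's non-polymers."""
    # Assumption: documents are keyed by upper-case PDB ID in `_id`.
    query = {"_id": entry_id.upper()}
    # Return only the nested comp_id attribute and hide `_id`.
    projection = {"pdbx_entity_nonpoly.comp_id": 1, "_id": 0}
    return query, projection


def find_nonpolymers(structures, entry_id):
    """Run the query against a PyMongo collection.

    Returns the matching document, or None if the entry is not found.
    """
    query, projection = nonpolymer_query(entry_id)
    return structures.find_one(query, projection)
```

Calling `find_nonpolymers(structures, "2P33")` would then return a document containing only the `pdbx_entity_nonpoly` category with its `comp_id` values.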
The next query finds all entries that contain the non-polymer STI (Imatinib) and returns UniProt accessions of all the polymer entities in the structure (if any).
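One way to sketch such a query is shown below. It assumes the UniProt cross-references are stored under the mmCIF `struct_ref` category, with `db_name` equal to `UNP` for UniProt rows, and that each category is stored either as a list of row sub-documents or as a single sub-document. The actual layout in CIFStore may differ, and the helper names are hypothetical.

```python
def ligand_query(comp_id):
    """Build the filter and projection for entries containing a ligand."""
    # Dot notation also matches when pdbx_entity_nonpoly is a list of rows.
    query = {"pdbx_entity_nonpoly.comp_id": comp_id}
    projection = {"struct_ref.db_name": 1, "struct_ref.pdbx_db_accession": 1}
    return query, projection


def uniprot_accessions(document):
    """Extract UniProt accessions from one returned document."""
    refs = document.get("struct_ref", [])
    if isinstance(refs, dict):
        # Assumption: a single-row category may be stored as one sub-document.
        refs = [refs]
    return [
        ref["pdbx_db_accession"]
        for ref in refs
        if ref.get("db_name") == "UNP" and "pdbx_db_accession" in ref
    ]


def find_ligand_targets(structures, comp_id):
    """Map each matching entry's `_id` to its UniProt accessions."""
    query, projection = ligand_query(comp_id)
    return {
        doc["_id"]: uniprot_accessions(doc)
        for doc in structures.find(query, projection)
    }
```

Here `find_ligand_targets(structures, "STI")` would return a dictionary of entries containing Imatinib, with an empty list for any entry that has no UniProt cross-reference.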
Much more sophisticated queries are possible as well, including MapReduce. Please refer to the MongoDB/PyMongo documentation for more information.
Obtaining CIFStore
CIFStore is released under the MIT license, and a development version can be downloaded from Bitbucket.
