iGEM Database Search Tool
This tool was developed to give future iGEM teams an improved way to search the Biobricks Database. To download the files required to set up the tool, along with the documentation, please click the icon below:
Click on the icon to access the Software Package
Basic instructions for setting up the tool:
- To use the tool, Python 3 is required. Python 3 can be downloaded from here.
- Download the iGEM parts xml dump from the iGEM registry API, save it as a .txt file and place it in the folder that can be downloaded by clicking the icon above.
- Run the build.py file to clean up the xml dump. This will create an up-to-date xml file of the iGEM parts database.
- Alternatively, we have provided a copy of the cleaned-up xml file that you would obtain from the above steps, accessible here. Note that the file is large (200 MB+), so opening it in a text editor may take some time.
- The tool can now be used by calling the search.py file with various arguments in command line.
What does this software do?
The search tool allows for simple searching of the iGEM database based on various features of each part. We felt that the iGEM database was not easy to navigate, and wanted to create a tool that lets teams easily find parts meeting multiple specific criteria. The tool supports searches by part type, sample status, part ID, sequence, sequence motif, part results, author, and more. We also incorporated negative searching, which returns parts that do not meet a given criterion. We felt that negative searching would be invaluable to iGEM teams, as part of the medal criteria in the iGEM competition involves improving pre-existing parts. We envisage teams using the tool to search for parts that can be improved. For example, one could use the tool to look for all ‘Reporter’ parts whose results are classified as “Issues” and that are missing a stop codon. The tool rapidly filters the 38,000-part database to return a small list of parts that meet the criteria.
What have we contributed?
Along with the search tool, we provide a method for generating a clean xml file of the iGEM database. While building the software tool, we found that the database dump available from the registry API contains errors and encoding issues, so the xml dump could not be parsed easily. We provide a ‘cleaned’ up xml file for use with the search tool, along with the scripts used to build it. Teams who wish to build tools that operate on the database therefore no longer have to spend time processing the data dump from the registry API. The build method also allows the cleaned xml to be regenerated every year, so the tool can operate on the newest version of the database.
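As an illustration of the kind of cleanup such a dump may need, here is a minimal sketch. It is not the contents of build.py: the function name clean_xml_text, the sample part name BBa_EXAMPLE, and the description element are hypothetical, and the sketch assumes the problems are undecodable bytes and control characters that XML 1.0 forbids. It uses only the Python standard library.

```python
import re
import xml.etree.ElementTree as ET

# Control characters that XML 1.0 does not allow
# (everything below 0x20 except tab, newline and carriage return).
_ILLEGAL_XML_CHARS = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f]")


def clean_xml_text(raw_bytes):
    """Decode a raw dump leniently and drop characters XML 1.0 forbids.

    Hypothetical helper: bytes that are not valid UTF-8 are replaced
    with U+FFFD instead of raising an error.
    """
    text = raw_bytes.decode("utf-8", errors="replace")
    return _ILLEGAL_XML_CHARS.sub("", text)


# A tiny made-up fragment containing a vertical tab (\x0b) and an
# invalid UTF-8 byte (\xff), standing in for the real dump.
raw = (
    b"<parts><part><part_name>BBa_EXAMPLE</part_name>"
    b"<description>bad\x0bbyte \xff</description></part></parts>"
)

cleaned = clean_xml_text(raw)
root = ET.fromstring(cleaned)  # parses once the bad characters are gone
print(root.find("part/part_name").text)
```

The real dump may have other problems, such as unescaped markup inside fields, that this sketch does not address.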
What could be improved?
The tool has been implemented as a command-line tool because of our inexperience with web development and with hosting large databases online. A command-line interface may be unfamiliar to some users; however, we have provided extensive documentation explaining exactly how the tool is used. In future, we would like to implement the tool on a website, allowing teams to access the search features from a familiar web environment using a GUI.
How does the tool work?
The tool uses the Python library lxml and its XPath support to navigate the cleaned xml file as a tree structure. The general layout of the iGEM database as a tree structure is displayed in Figure 1.
Figure 1: The layout of iGEM's part database as a tree structure. Each part contains various nodes such as part_name, and part_type.
Using xpath, various nodes on the tree, and their content, can be accessed rapidly. We built generalized expressions that take in command-line arguments, and return nodes that meet the argument’s values. Unless the structure of the database xml file changes drastically, the tool should have good longevity.
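The idea of building an expression from search arguments can be sketched as follows. This is an illustrative sketch, not the code in search.py: it uses the standard library's xml.etree.ElementTree, which supports a subset of XPath, in place of lxml so that it runs without extra installs. The part_name and part_type nodes come from Figure 1; the sequence node, the search function, its parameters, and the sample parts are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

# Made-up miniature database following the layout in Figure 1.
SAMPLE = """
<parts>
  <part><part_name>BBa_EX1</part_name><part_type>RBS</part_type><sequence>aaagaggagaaa</sequence></part>
  <part><part_name>BBa_EX2</part_name><part_type>RBS</part_type><sequence>aaatgaggag</sequence></part>
  <part><part_name>BBa_EX3</part_name><part_type>Reporter</part_type><sequence>atgcgtaaa</sequence></part>
</parts>
"""


def search(root, part_type=None, motif=None, motif_present=True):
    """Return the names of parts matching the given criteria.

    part_type     -- keep only parts whose part_type node equals this value
    motif         -- sequence motif to look for (case-insensitive)
    motif_present -- False turns the motif test into a negative search
    """
    # Build the path expression from the supplied arguments.
    xpath = "./part"
    if part_type is not None:
        xpath += "[part_type='{}']".format(part_type)

    matches = set()
    for part in root.findall(xpath):
        if motif is not None:
            seq = (part.findtext("sequence") or "").lower()
            if (motif.lower() in seq) != motif_present:
                continue  # fails the positive or negative motif test
        matches.add(part.findtext("part_name"))
    return matches


root = ET.fromstring(SAMPLE)
# RBS parts that do NOT contain a start codon.
result = search(root, part_type="RBS", motif="atg", motif_present=False)
print(result)
```

With lxml, the motif test could also be pushed into the XPath expression itself using contains() and not(); the Python-side filter is used here only because ElementTree does not support those functions.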
Example Use Cases
Find all the parts in the database that are workable, are categorized as Ribosome binding sequences, and do not contain a start codon (atg).
User Input:
-r Fails F -t RBS -m atg F
Program Output:
235 parts met your search criteria. Find them in Results.txt
In this case, the 38,000-part database was reduced to 235 parts that met the user's search criteria. To make the search more specific and return only a handful of parts, we will also search by author.
Find all the parts made by Yanting Xue, that do not fail.
User Input:
-r Fails F -a Yanting
Program Output:
{'BBa_K082004', 'BBa_K082006', 'BBa_K082025', 'BBa_K082009', 'BBa_K082003', 'BBa_K082030', 'BBa_K082024', 'BBa_K082001', 'BBa_K082026', 'BBa_K082029', 'BBa_K082000', 'BBa_K082028', 'BBa_K082015', 'BBa_K082022', 'BBa_K082016', 'BBa_K082005', 'BBa_K082002', 'BBa_K082034', 'BBa_K082018'}
19 parts met your search criteria
When the number of parts returned is small, the part IDs are printed directly to the screen, and can then be searched by ID in the iGEM parts registry for more information.
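The two output behaviours shown in the examples (printing IDs for a short list, writing to Results.txt for a long one) could be sketched as below. The report function and the threshold of 25 are assumptions for illustration; the actual cut-off used by the tool is not stated here.

```python
def report(part_ids, threshold=25, path="Results.txt"):
    """Return the message to show the user for a set of matching part IDs.

    Hypothetical helper: short result lists are included in the message,
    longer ones are written to a file, one ID per line.
    """
    ids = sorted(part_ids)
    if len(ids) <= threshold:
        # Small result: show the IDs directly.
        return "{}\n{} parts met your search criteria".format(ids, len(ids))
    # Large result: write the IDs to a file and point the user to it.
    with open(path, "w") as fh:
        fh.write("\n".join(ids))
    return "{} parts met your search criteria. Find them in {}".format(
        len(ids), path
    )


print(report({"BBa_EX2", "BBa_EX1"}))
```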