Latest revision as of 03:06, 8 December 2018

Team:Calgary/Software - 2018.igem.org

SOFTWARE

SARA: Software Aggregating Research Assistant

To increase the utility of our outreach and to help the iGEM community, we set out to create a tool that will remain useful for years to come. Each year, many teams come to the iGEM competition with software that they developed alongside their research. These tools cater directly to lab work and synthetic biology, and can be essential to a team's success. At the start of our term, we looked for a tool that could either be applied directly to our work or be built upon to suit our needs. As we searched, we found it difficult to sort through the iGEM wikis because of the sheer number of projects, and the software descriptions were hard to pick out from the rest of the wiki content.

Inspired by the opportunity to develop a database of iGEM software, we created SARA, the Software Aggregating Research Assistant, to simplify searching and managing past iGEM software projects. SARA gives teams the opportunity to update and improve older software so that it stays current, and reduces the likelihood that teams will build redundant software.

As the figure below shows, the number of software tools developed by teams has increased drastically. This trend suggests that keeping track of the ever-growing number of tools requires an organized, cohesive, searchable storage system.

Figure 1. The number of iGEM-created software tools per year

How does SARA work?

SARA uses a web scraper that finds software on past wikis by combining the standardized address for iGEM wiki pages, the desired year, the list of teams, and the Software suffix. The scraper visits the software page of every team in a given year, determines whether the page has content, and records the content if it is recognized as software. A similar algorithm scrapes the description pages of teams in the software track. A short description of each page is then generated and stored in an Excel file.

To generate an accurate description of the software, we attempted three strategies. The first was to take the first 500 words on the software page. However, inconsistencies in how pages format and present information meant that the description usually consisted only of background information or did not fully capture the purpose or abilities of the software. The second was to use various machine learning algorithms to extract important sentences from the pages, but the extracted sentences were often taken out of context and did not form a cohesive or coherent narrative. The last was to write descriptions by manually reading and paraphrasing the information on the software pages. This was the most accurate and complete method, since it also let us record the name of the software and any GitHub or download links, but it was also the most time-consuming. Ultimately, we found the manual approach to be the most reliable and robust, which was needed for a tool intended for use by future teams.
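The address construction and the first description strategy described above can be sketched in a few lines of Python. This is only an illustration, not SARA's actual code: the function names, the `min_words` threshold used to decide whether a page "has content", and the use of the standard-library `html.parser` are all assumptions. The URL pattern follows the form of this page's own address (`2018.igem.org/Team:Calgary/Software`).

```python
from html.parser import HTMLParser


def software_page_url(year: int, team: str) -> str:
    """Combine the year, team name, and Software suffix into a wiki address."""
    return f"https://{year}.igem.org/Team:{team}/Software"


class _TextExtractor(HTMLParser):
    """Collect visible text, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def visible_text(html: str) -> str:
    """Return the page's visible text with whitespace collapsed."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(" ".join(parser.chunks).split())


def has_content(text: str, min_words: int = 50) -> bool:
    """Hypothetical emptiness check: treat very short pages as having no content."""
    return len(text.split()) >= min_words


def first_words(text: str, limit: int = 500) -> str:
    """First strategy from the text: keep only the first `limit` words."""
    return " ".join(text.split()[:limit])
```

In a full scraper, the HTML passed to `visible_text` would come from fetching `software_page_url(year, team)` for each team in the list, for example with `urllib.request.urlopen`. As the text notes, truncating to the first 500 words tends to capture background material rather than the purpose of the software, which is why this approach was eventually set aside.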

How can I access SARA?

We chose to distribute our database in two ways. A local application with web scraping, parsing, and database capabilities was created, which can be used after downloading it as an .exe. Two online versions of the database and scraper were also created. To start, we explored online hosting and database services and found two viable options. The first was Caspio, a free service for creating online databases, which allowed us to upload a database and customize its presentation on any webpage. However, we found that its search functions and the integration of the web scraper were limited. The second was independent hosting, which allowed complete control of the database, its appearance, and its capabilities, but was potentially expensive and difficult to implement. We ultimately chose independent hosting, as we felt it offered the most usefulness and customizability.

In both the online and local versions of SARA, each entry includes the team name, the year the team competed, the name of the software, an accurate description of the software, and a link to a GitHub repository or downloadable files. Users can search by team, year, or keyword. The database can be updated each year by running the web scraper with the latest year supplied as a parameter. Edits or additions can be submitted manually and reviewed for accuracy by other users or administrators.
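To make the entry fields and the three search options concrete, here is a minimal in-memory sketch in Python. It is not how SARA stores or queries its data: the `SoftwareEntry` class, the `search` function, and the case-insensitive substring matching are assumptions chosen for illustration, and the second sample entry is a made-up placeholder.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class SoftwareEntry:
    """One database entry with the five fields listed in the text."""
    team: str
    year: int
    name: str
    description: str
    link: str


def search(
    entries: Iterable[SoftwareEntry],
    team: Optional[str] = None,
    year: Optional[int] = None,
    keyword: Optional[str] = None,
) -> List[SoftwareEntry]:
    """Return entries matching every filter that was supplied."""
    results = []
    for entry in entries:
        if team is not None and entry.team.lower() != team.lower():
            continue
        if year is not None and entry.year != year:
            continue
        if keyword is not None:
            # Assumed behavior: match the keyword in the name or description.
            haystack = f"{entry.name} {entry.description}".lower()
            if keyword.lower() not in haystack:
                continue
        results.append(entry)
    return results


# Sample data: the first entry describes SARA itself, the second is a placeholder.
entries = [
    SoftwareEntry("Calgary", 2018, "SARA",
                  "A searchable database of past iGEM software projects.",
                  "https://igemcalgary.ca/sara"),
    SoftwareEntry("ExampleTeam", 2017, "ExampleTool",
                  "A placeholder entry used only for illustration.",
                  "https://example.org/tool"),
]

for hit in search(entries, keyword="database"):
    print(hit.team, hit.year, hit.name)
```

Filters left as `None` are ignored, so the same function covers searching by team, by year, by keyword, or by any combination of the three.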

Click HERE (https://igemcalgary.ca/sara) to access SARA.

Figure 2. The home page of SARA

Figure 3. The software list generated by SARA

Figure 4. Example of software entry

Figure 5. The editing view for entries

Ultimately, SARA makes it easier for teams to find iGEM software to use and build upon. We hope that, as teams build on existing software and improve its functionality, iGEM software will be maintained and become more useful. Improved access to existing software through SARA will also reduce teams’ workload, as they will be able to use these tools effectively.

WORKS CITED

Luhn, H. P. (1958). The Automatic Creation of Literature Abstracts. IBM Journal of Research and Development, 2(2), 159-165. doi:10.1147/rd.22.0159

@@ Line 6: / Line 6: @@
      <title>Team:Calgary/Software - 2018.igem.org</title>
      <!-- CSS -->
+    <link rel="stylesheet" href="https://2018.igem.org/Team:Calgary/info_css?action=raw&ctype=text/css">
      <style>
-         html body {
+         .info-link {
-            color: rgba(0, 0, 0, 0.9);
+             padding: 0 !important;
-        }
-        .maincontent {
-            background-color: #eeedf0;
-            margin-top: 210px;
-            margin-bottom: 210px;
-            padding: 0;
-        }
-        a:link {
-             color: #f38f8f;
-        }
-        a:visited {
-            color: #f38f8f;
-        }
-        a:hover {
-            color: #f38f8f;
-        }
-        a:active {
-            color: #f38f8f;
-        }
-        h1 {
-            font-family: 'Josefin Sans', sans-serif !important;
-            font-weight: normal !important;
-            text-align: center;
-        }
-        h2 {
-            text-align: center;
-        }
-        h3 {
-            color: #7ccfb8 !important;
-        }
-        h5 {
-            color: rgba(0, 0, 0, 0.9);
-            padding-bottom: 10px;
-        }
-        p {
-            font-family: 'Josefin Sans', sans-serif !important;
-            line-height: 1.8;
-            text-align: left !important;
-        }
-        .infotitle {
-            text-align: left;
-        }
-        .infosection {
-            padding-bottom: 100px;
-            padding-left: 100px;
-            padding-right: 100px;
-        }
-        .infosubtitle {
-            text-align: left;
-        }
-        .info-img {
-            padding-left: 10%;
-            padding-right: 10%;
-        }
-        @media only screen and (max-width: 991px) {
-            .maincontent {
-                padding-left: 5%;
-                padding-right: 5%;
-                background-color: #FFFFFF;
-            }
-            .infotitle {
-                margin-top: 20%;
-                text-align: center;
-            }
-            .bottom-left {
-                display: none;
-            }
-            .infosection {
-                padding-bottom: 10px;
-                padding-left: 10px;
-                padding-right: 10px;
-                width: 100%
-            }
          }
      </style>
@@ Line 117: / Line 27: @@
                  imperative to the success of a team. At the start of our term, we looked for a tool that could either be
                  directly applied to our work, or could be further built upon to suit our needs. As we searched for software,
-                 we found it difficult to sort through the iGEM wiki’s due to the sheer amount of projects, and found that
+                 we found it difficult to sort through the iGEM wikis due to the sheer number of projects, and found that
                  the software descriptions were difficult to discern amongst the wiki content.</p>
              <br>
@@ Line 125: / Line 35: @@
                  the likelihood that teams will create redundant software.</p>
              <br>
-             <img class="info-img"src="https://static.igem.org/mediawiki/2018/5/5c/T--Calgary--SARANumSoftware.jpeg">
+          	<p style="text-indent: 0px">As can be seen from the figure below, the number of software tools developed by teams have drastically increased. This trend
+              suggests that in order to keep track of the evergrowing number of tools, an organized, cohesive, searchable storage system is required.</p>
+            <br>
+             <img class="info-img"src="https://static.igem.org/mediawiki/2018/0/03/T--Calgary--SARAGraph.png">
+          <p class = "caption"> Figure 1. The number of iGEM created software tools per year </p>
              <br>
-             <h3 class="infosubtitle">Why CRISPR?</h3>
+             <h3 class="infosubtitle">How does SARA work?</h3>
              <br>
-             <p style="text-indent: 0px">Lorem ipsum dolor sit amet consectetur adipisicing elit. Consequuntur, totam laudantium, dolor porro laboriosam
+             <p style="text-indent: 0px">SARA utilizes a web scraper that finds software from past wikis using the standardized address for iGEM wiki pages, the desired year, the list of teams, and the Software suffix. The Scraper visits the software page of all teams in a given year, identifies if the page has content, and records the content if it is recognized as software. A similar algorithm is used to scrape the description pages of teams in the software track. A short description from the desired page is generated and stored in an excel file. To generate an accurate description of the software we attempted three strategies. The first strategy involved grabbing the first 500 words on the software page. However, inconsistencies in the format and presentation of information on the page meant that the description usually consisted of only background information or didn't fully capture the purpose or abilities of the software. The second strategy was to use various machine learning algorithms to extract important sentences from the pages, but this did not create a cohesive or coherent narrative as the sentences were often taken out of context. Our last strategy was to generate descriptions by manually reading and paraphrasing the information on the software pages. This was the most accurate and complete method, as we were also able to record the name of the software and any github or download links, but also the most time consuming.  Ultimately, we found that the manual approach was the most reliable and robust, which was needed for a tool intended to be used by future teams. </p>
-                illo tenetur velit nulla corrupti quasi non eum amet quod dolores, doloremque eius ad temporibus perferendis!
-                Lorem ipsum dolor sit, amet consectetur adipisicing elit. Sit explicabo suscipit similique id expedita cum
-                consequatur voluptatibus consectetur adipisci beatae unde, cupiditate inventore. Quis officiis quam porro
-                a expedita non.</p>
              <br>
-            <p style="text-indent: 0px">Lorem ipsum dolor sit amet consectetur adipisicing elit. Temporibus rerum vel eius ut dolore, ab obcaecati officiis
+           <h3 class="infosubtitle">How can I access SARA?</h3>
-                modi porro, sunt deleniti, consequatur assumenda asperiores aliquid recusandae tenetur neque quae suscipit!
-                Lorem ipsum dolor sit amet consectetur adipisicing elit. Optio a quam iusto quo, nesciunt odit fuga, similique
-                aspernatur veritatis nemo commodi libero nobis magnam necessitatibus, quidem maiores error debitis minima.
-                Lorem ipsum dolor sit amet consectetur adipisicing elit. Quam ipsum consequatur, deserunt assumenda odio
-                natus. Quis, ea dolor! Voluptas dolore, facere cum illo sunt consectetur nam a soluta optio perferendis.</p>
              <br>
-             <p style="text-indent: 0px">Lorem ipsum dolor sit, amet consectetur adipisicing elit. Rerum sit perferendis eum delectus odit vero saepe,
+             <p style="text-indent: 0px">We chose to distribute our database in two ways. A local application was created with  web scraping, parsing, and database capabilities, which can be used after downloading as an .exe. Two online versions of the database and scraper were also created. To start, online hosting and database services were explored, with two viable options. The first was Caspio, a free service for creating online databases, which allowed for the uploading of a database and its customized presentation on any webpage. However we found that search functions and the integration of the web scraper were limited. Option two was independent hosting, which allowed for complete control of the database, its appearance, and its capabilities, however it was possibly expensive and difficult to implement. We ultimately went with the independent hosting option, as we felt it allowed for maximum usefulness and customizability. For both the online and local versions of SARA, each entry includes the team name, the year that the team competed, the name of the software, an accurate description of the software, and a link to a github or downloadable files. Users can search by team, year, or a keyword. The database can be updated each year by running the web scraper with the latest year inputted as a parameter. Edits or additions can be manually submitted and reviewed for accuracy by other users or administration.
-                dignissimos aspernatur et libero quisquam minima soluta a suscipit tempora dolores non aliquid ratione? Lorem
+              <br>
-                ipsum dolor, sit amet consectetur adipisicing elit. Sunt excepturi quod, doloribus et asperiores similique
+          <h5><center>Click <a class="info-link" href="https://igemcalgary.ca/sara" target="_blank">HERE</a> to access SARA.
-                 tempora, mollitia possimus doloremque officia deleniti eius aut dolorum fuga reiciendis adipisci esse quisquam
+            </center></h5>
-                 quae.
+          <br>
-            </p>
+          <img class="info-img"src="https://static.igem.org/mediawiki/2018/c/c6/T--Calgary--SARAHome.png">
-        </div>
+           <p class = "caption"> Figure 2. The home page of SARA </p>
+          <br>
+          <img class="info-img"src="https://static.igem.org/mediawiki/2018/7/7a/T--Calgary--SARASoftwareList.png">
+           <p class = "caption"> Figure 3. The software list generated by SARA </p>
+          <br>
+          <img class="info-img"src="https://static.igem.org/mediawiki/2018/9/93/T--Calgary--SARADetails.png">
+           <p class = "caption"> Figure 4. Example of software entry </p>
+          <br>
+          <img class="info-img"src="https://static.igem.org/mediawiki/2018/9/96/T--Calgary--SARAEdit.png">
+           <p class = "caption"> Figure 5. The editing view for entries </p>
+          <br>
+          <p style="text-indent: 0px">Ultimately, SARA makes it easier for teams to find iGEM software to use and to build upon. Hopefully by building on existing software and improving its functionality, iGEM software will be maintained and increase in usefulness. Improved access to existing software through SARA will also reduce teams’ workload as they will be able to effectively utilize these tools.
+          </p>
+      </div>
+      <div class="apa-reference">
+                 <h4 style="text-align: center">WORKS CITED</h4>
+                 <br>
+                <p>Luhn, H. P. (1958). The Automatic Creation of Literature Abstracts. IBM Journal of Research and Development, 2(2), 159-165. doi:10.1147/rd.22.0159</p>
+            </div>
+      <br>
+      <br>
+      <br>
      </div>
 </body>
 </html>