Here, you can view all my project work and the accompanying reports. A brief description of each project is given below. These projects mainly deal with NGS data analysis. The statistical tools used were SAS, JMP Pro, Tableau and Excel. Scripting was done in UNIX shell, R, Python (with Biopython) and MySQL. The data were retrieved from databases such as DEG, EcoCyc, NCBI, UniProt, PDB, KEGG, Swiss-Prot and BRENDA.

Other bioinformatics tools used are Accelrys Discovery Studio, BLAST, Clustal, iTOL, LigandScout and phyloT. The biotechnology procedures include ELISA, HPLC, TLC, GC, X-ray diffraction, Western blot and immunofluorescence assay, performed following GLPs and SOPs.

Evolution of Protein Complexes in Bacteria                                                                                

Jan 2017 – May 2017

  • Compared 894 distinct bacterial genomes to measure fractional conservation and the limits of co-conservation among components of protein complexes.

  • The analysis showed no clear relationship between gene essentiality and protein complex conservation, as even poorly conserved complexes contain a significant number of essential proteins.

  • Identified 183 complexes with well-conserved components and uncharacterized proteins as targets for experimental studies. The majority of protein complexes are not conserved, demonstrating an unexpected evolutionary flexibility.

  • Currently researching broader trends in protein complex conservation in genome-reduced species with minimal sets of protein complexes.

  • Also, using the essential subunits listed in the DEG database, we are performing a mapping analysis to identify the essential proteins across these bacterial species and to relate the presence or absence of subunits to the functionality of the protein complexes.

  • Presented a poster based on this research at the Graduate Symposium 2017. Team Member - Wedad Albalawi
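
As a rough illustration of the conservation measure mentioned above, the sketch below computes, for one complex, the fraction of its components that are present in each genome. The definition, the function name and the toy identifiers are assumptions made for illustration, not the method used in the study.

```python
def fractional_conservation(complex_components, genome_genes):
    """Fraction of a complex's components found in one genome (assumed definition)."""
    components = set(complex_components)
    if not components:
        raise ValueError("complex has no components")
    return len(components & set(genome_genes)) / len(components)


# Hypothetical toy data: identifiers are placeholders, not real genes.
complex_abc = ["compA", "compB", "compC", "compD"]
genomes = {
    "genome_1": {"compA", "compB", "compC", "compD", "geneX"},
    "genome_2": {"compA", "compC"},
    "genome_3": {"geneX"},
}

# One conservation value per genome for this complex.
profile = {name: fractional_conservation(complex_abc, genes)
           for name, genes in genomes.items()}
```

Repeating this over many complexes and genomes would give a conservation matrix from which co-conservation between components could be explored.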

A Case Study: Monsanto and Monarch Butterfly    

Jan 2017 – Mar 2017

 

  • A team case study of the Harvard Business Press case, Monsanto and the Monarch Butterfly.

  • Studied the company Monsanto overall, including its growth and fall, and the effect of genetically modified food on animals, specifically monarch butterflies in this case.

  • Performed business analysis using STEPS, SWOT and situation analysis techniques.

  • Finally, gave an oral presentation and submitted a report summarizing the facts of the case. Team Members - Prudhvi Meda and Wedad Albalawi

Functional Mapping Program to study the Gene Architecture

Nov 2016 

 

  • Wrote a Python program to study genetic architecture. The program maps a series of genes to a segment of 3 genes from M. genitalium.

  • The code used Biopython methods and regular expressions to concatenate sequences and to find the occurrences of a read within the 3 M. genitalium genes stored in FASTA files.

  • Analysis of the results provided a useful quantitative and testable framework for assessing the interplay between gene actions or interactions and developmental changes.
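
The read-mapping step described above can be sketched roughly as follows. This is a minimal standard-library version (the original program used Biopython); the function names and the toy sequences are hypothetical and are not real M. genitalium genes.

```python
import re


def parse_fasta(text):
    """Parse FASTA-formatted text into {record_name: sequence}."""
    records, name = {}, None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]
            records[name] = []
        elif name is not None:
            records[name].append(line.upper())
    return {key: "".join(parts) for key, parts in records.items()}


def find_read(read, genes):
    """Return {gene: [0-based start positions]} for every hit, overlaps included."""
    # A lookahead lets the regex engine report overlapping occurrences.
    pattern = re.compile("(?=" + re.escape(read.upper()) + ")")
    return {name: [m.start() for m in pattern.finditer(seq)]
            for name, seq in genes.items()}


# Hypothetical toy input standing in for the 3-gene segment.
fasta_text = """>gene1
ATGACGTACG
TACGTTAA
>gene2
ATGCCCGGGTAA
>gene3
ATGTACGTACGA
"""

genes = parse_fasta(fasta_text)
hits = find_read("TACG", genes)
```

Here `hits` lists where the read occurs in each gene, with an empty list for genes that do not contain it.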

3D secondary protein structure prediction                                                                                    

Jan 2016

 

  • Predicted the secondary and 3D structure of the H8FDZ3 protein sequence using GOR and SWISS-MODEL.

  • PROSESS, SCOP, CATH and RC-plot were used to verify the structural class and quality.

  • Finally, created an interactive visualization tool using HTML, CSS and Jmol that lets the user view the protein molecule, rotate it with the cursor and inspect it at a particular amino acid.

BlastN and BlastP

Oct 2016 

 

  • Modified Python code that works as a simplified version of BlastN and BlastP.

  • The program prompts the user for the input files to align and asks whether the alignment should be protein or nucleotide.

  • If the user gives a nucleotide sequence as the input and chooses the BlastP alignment, the code translates the nucleotide sequence into an amino acid sequence.

  • The program was tested against NCBI BLAST results for the same input sequences.
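
The translation step described above might look like the sketch below. It assumes the standard genetic code and translation in the first reading frame only; the function names and the simple nucleotide check are illustrative assumptions, not the original program.

```python
from itertools import product

# Standard genetic code, with codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def looks_like_nucleotide(seq):
    """Crude check: only nucleotide letters. Short proteins could be misjudged."""
    return set(seq.upper()) <= set("ACGTUN")


def translate(seq):
    """Translate frame 1 of a nucleotide sequence; unknown codons become 'X'."""
    seq = seq.upper().replace("U", "T")
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X")
                   for i in range(0, len(seq) - 2, 3))


def prepare_query(seq, mode):
    """Translate nucleotide input when a protein (blastp) alignment is requested."""
    if mode == "blastp" and looks_like_nucleotide(seq):
        return translate(seq)
    return seq.upper()
```

For example, `prepare_query("atggcctaa", "blastp")` yields the translated sequence, with `*` marking the stop codon, while the same input in `"blastn"` mode is only upper-cased.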

PROJECTS

Genomic comparison to identify pathogenic genes                                                                                

September 2017

  • Compared two sets of genomes to look for genes responsible for pathogenesis. The aim of this study was to gain an understanding of the basis of pathogenesis by the deadly E. coli O104:H4 strain.

  • The data was retrieved from the NCBI protein dataset. The total complement of proteins of the pathogenic strain was compared with that of the non-pathogenic strain, E. coli K12.

  • The comparison result was scanned by a program coded in Python to extract the useful information from the output file. Analysis of the result led to the identification of pathogenic functions within the harmful sequence.

SHWETHA HARA SRIDHAR

Address: Richmond, VA 23219

E-mail: sridharsh@vcu.edu

Identifying DNA foreign to a genome
Nov 2016 – Dec 2016

 

  • The purpose of this study was to adapt a technique that uses the information within genes to identify foreign genes.

  • Wrote Python code to extract a sufficient amount of DNA from a single gene for the characteristics of foreign genes to rise reliably above random variation.

  • Detected pathogenicity islands in order to detect individual foreign genes.

  • Coded in Python a FASTA-reading program with modules that use iterable objects and build a Markov model from a set of DNA sequences.

  • Used a training set of DNA from bona fide genes of Synechocystis PCC 6803, together with all protein-encoding genes from Synechocystis PCC 6803.

  • Coded in Python a program that builds a Markov model from the text in an input file, generates pseudotext from it, and scores FASTA-format DNA using a Markov model.

  • Results identified all genes in the sequenced genome of a bacterium that have foreign origins.
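
The Markov-model scoring described above can be sketched as follows. This is a minimal illustration that assumes a first-order model with add-one smoothing and an average log-probability score; the order, the smoothing, the function names and the toy sequences are all assumptions, not the original code or data.

```python
import math
from collections import defaultdict

ALPHABET = "ACGT"


def train_markov(sequences, order=1, pseudocount=1.0):
    """Return a function prob(context, base) estimated from training DNA."""
    counts = defaultdict(lambda: defaultdict(float))
    for seq in sequences:
        seq = seq.upper()
        for i in range(order, len(seq)):
            context, base = seq[i - order:i], seq[i]
            if base in ALPHABET and set(context) <= set(ALPHABET):
                counts[context][base] += 1

    def prob(context, base):
        # Smoothing keeps unseen transitions from having zero probability.
        total = sum(counts[context].values()) + pseudocount * len(ALPHABET)
        return (counts[context][base] + pseudocount) / total

    return prob


def score(seq, prob, order=1):
    """Average log-probability per base; lower values look less like the training set."""
    seq = seq.upper()
    logs = [math.log(prob(seq[i - order:i], seq[i])) for i in range(order, len(seq))]
    if not logs:
        raise ValueError("sequence is shorter than the model order")
    return sum(logs) / len(logs)


# Hypothetical toy data: an AT-rich "genome" and a GC-rich candidate.
model = train_markov(["ATATATATATAT", "TATATATATA"])
native_score = score("ATATATAT", model)
foreign_score = score("GCGCGCGC", model)
```

A gene whose score falls well below the scores of typical genes from the same genome would be flagged as a candidate of foreign origin; where to set that threshold is left open here.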

 

Applying structural information from one protein to another
Sep 2016 – Nov 2016

  • The purpose of the project was to find a way to produce an enzyme in E. coli for an industrial process; at high levels of expression, the protein precipitates in an inactive form.

  • PDB (Protein Data Bank) files were used to retrieve the protein structural information and to facilitate visualization of macromolecular structures.

  • A PDB-formatted file (1DLI) containing coordinates for UDPGD from S. pyogenes was downloaded from the RCSB Protein Data Bank.

  • Jmol, a Java applet, was used for the visualisation of proteins in three dimensions. iMol was used as an alternative molecular visualisation package for MacOS X. FirstGlance in Jmol was used as the web-based interface for Jmol.

  • 1GZX was the PDB ID of the file containing the coordinates for the 3D structure of oxygenated human hemoglobin.

  • Predicted the three-dimensional structure of the protein and which amino acid should be changed to prevent precipitation.

  • Compared the similarities between the two proteins: a moderately similar protein whose structure is available in PDB, and the protein of unknown structure.

  • Results helped to predict the unknown three-dimensional structure of the protein.

  • Points of similarity between a protein with known structure and one whose structure is not known constrain the positioning of the dissimilar regions and permit the structure of the unknown protein to be approximated.

  • Coded in Python to read FASTA files, to interconvert between amino acid naming conventions, and to superimpose one protein sequence on the structure of another.

  • Sequence alignment of the UDPGD sequences from Streptococcus pyogenes and M. loti was done using the Clustal package (ClustalW and Clustal X), and the amino acid residues of mutant UDP glucose dehydrogenase were identified from M. loti.

  • Prealigned Streptococcus and Mesorhizobium sequences from ClustalW and Clustal X were used to test the program.
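
Two of the programming steps above, converting between amino acid naming conventions and placing one sequence on the residue numbering of another, can be sketched as below. The function names, the alignment format (two equal-length gapped strings) and the toy alignment are assumptions for illustration only.

```python
THREE_TO_ONE = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
}
ONE_TO_THREE = {one: three for three, one in THREE_TO_ONE.items()}


def to_one_letter(residues):
    """Convert three-letter residue names (as in PDB files) to a one-letter string."""
    return "".join(THREE_TO_ONE[name.upper()] for name in residues)


def map_onto_structure(aligned_known, aligned_unknown, first_residue_number=1):
    """Pair each residue of the unknown protein with the residue number of the
    known structure in the same alignment column (None where the structure has a gap)."""
    if len(aligned_known) != len(aligned_unknown):
        raise ValueError("aligned sequences must have equal length")
    mapping = []
    number = first_residue_number
    for known, unknown in zip(aligned_known, aligned_unknown):
        if unknown != "-":
            mapping.append((unknown, number if known != "-" else None))
        if known != "-":
            number += 1
    return mapping


# Hypothetical toy alignment, not the real UDPGD sequences.
mapping = map_onto_structure("MK-VL", "MRAV-")
```

Residues mapped to `None` fall in regions with no structural counterpart; these are the dissimilar regions whose position is only constrained by their aligned neighbours.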

Biochemical pathway analysis for drug targeting
November 2016

 

  • The study focused on the glycolytic pathway of the organism that causes sleeping sickness.

  • The purpose of this project is to model glycolysis computationally, so that the effects of inhibiting each enzyme can be predicted, thus directing scarce resources to the most likely targets.

  • Performed metabolic modeling to calculate the rate equations for enzymatically catalyzed reactions.

  • Implemented the concepts of Eisenthal & Cornish-Bowden in code and introduced inhibition into the model.

  • Calculated the rate equations for different types of inhibition of catalyzed reactions by metabolic modeling.

  • Wrote programs to model a simple zero-order reaction, to model glycolysis and to plot functions of a single variable.
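
The kind of rate equations involved can be sketched with the textbook Michaelis-Menten forms below. These are standard single-substrate equations offered as an illustration; they are not necessarily the exact equations or parameters used in the glycolysis model, and the function names are assumptions.

```python
def michaelis_menten(s, vmax, km):
    """Uninhibited rate: v = Vmax*S / (Km + S)."""
    return vmax * s / (km + s)


def competitive(s, i, vmax, km, ki):
    """Competitive inhibition raises the apparent Km."""
    return vmax * s / (km * (1 + i / ki) + s)


def uncompetitive(s, i, vmax, km, ki):
    """Uncompetitive inhibition scales the substrate term in the denominator."""
    return vmax * s / (km + s * (1 + i / ki))


def noncompetitive(s, i, vmax, km, ki):
    """Pure noncompetitive inhibition lowers the apparent Vmax."""
    return vmax * s / ((km + s) * (1 + i / ki))


def zero_order(c0, k, t):
    """Concentration after time t for a zero-order reaction, floored at zero."""
    return max(c0 - k * t, 0.0)


# With no inhibitor, every inhibited form reduces to the uninhibited rate.
v0 = michaelis_menten(2.0, 10.0, 2.0)
v_comp = competitive(2.0, 1.0, 10.0, 2.0, 1.0)
v_noncomp = noncompetitive(2.0, 1.0, 10.0, 2.0, 1.0)
```

Comparing the inhibited and uninhibited rates for each enzyme in turn is one simple way to rank candidate targets by how strongly inhibition affects the pathway.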

 

 

Identification of genes turned on by a transcription factor cascade during a developmental process
Oct 2016 – Nov 2016

 

  • The purpose of this project was to study the developmental process that leads to sporulation in Bacillus subtilis, an organism that is closely related to the bacterium that causes anthrax, and to learn which genes are expressed during different stages of sporulation.

  • The data used for this project came from Bacillus subtilis and mouse microarrays. The microarray data was in the PCL or CDT formats and viewed in Java TreeView.

  • Compared supervised and unsupervised cluster analysis for the exploration of microarray datasets.

  • Coded a program in Python that accomplishes part of the functionality of Eisen's Cluster, namely a working implementation of average linkage clustering.

  • Produced output compatible with Eisen's TreeView. 

  • Large datasets, such as real microarray data files from Bacillus subtilis and mouse microarrays, were used to test and explore the Cluster program.
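
Average linkage clustering, as mentioned above, can be sketched as follows. This minimal version assumes Euclidean distance between expression profiles (other similarity measures, such as correlation, are equally possible) and returns the merge history rather than TreeView-compatible files; names and data are hypothetical.

```python
import math
from itertools import combinations


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def average_distance(profiles, c1, c2):
    """Mean pairwise distance between the members of two clusters."""
    total = sum(euclidean(profiles[i], profiles[j]) for i in c1 for j in c2)
    return total / (len(c1) * len(c2))


def average_linkage(profiles):
    """Agglomerative clustering; returns merges as (cluster_1, cluster_2, distance)."""
    clusters = [(i,) for i in range(len(profiles))]
    merges = []
    while len(clusters) > 1:
        # Join the two clusters with the smallest average pairwise distance.
        c1, c2 = min(combinations(clusters, 2),
                     key=lambda pair: average_distance(profiles, *pair))
        merges.append((c1, c2, average_distance(profiles, c1, c2)))
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 + c2]
    return merges


# Hypothetical toy expression profiles (rows = genes, columns = conditions).
profiles = [[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [5.0, 7.0]]
merges = average_linkage(profiles)
```

The merge history is enough to draw a dendrogram; this brute-force search is fine for small examples but would need a cached distance matrix for real microarray data.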

Search for set of genes with common regulation
October 2016

  • The purpose of the project was to extract information from the known binding sequences in the genome of the cyanobacterium Anabaena.

  • Aligned the sequences using CLUSTAL Omega and CLUSTAL X.

  • Identified positions in sequence alignments that carry the most information and used frequencies at those positions to characterize aligned motifs.

  • Constructed Position Specific Scoring Matrices from the aligned sequences, scanned the genome with them, and produced a list of the most plausible motifs.

  • Used the positions to scan the genome of cyanobacterium Anabaena to highlight the binding sites.
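
The steps above (per-position information, PSSM construction and genome scanning) can be sketched as below. The sketch assumes gap-free aligned sites, a uniform background of 0.25 per base and a pseudocount of 1; these choices, the function names and the toy sequences are assumptions, not the original program or Anabaena data.

```python
import math

ALPHABET = "ACGT"


def information_content(aligned_sites):
    """Bits of information per column: 2 + sum(p * log2(p)) for DNA."""
    result = []
    for pos in range(len(aligned_sites[0])):
        column = [site[pos].upper() for site in aligned_sites]
        freqs = [column.count(b) / len(column) for b in ALPHABET]
        result.append(2 + sum(p * math.log2(p) for p in freqs if p > 0))
    return result


def build_pssm(aligned_sites, pseudocount=1.0, background=0.25):
    """Log-odds score (base 2) for each base at each motif position."""
    pssm = []
    for pos in range(len(aligned_sites[0])):
        column = [site[pos].upper() for site in aligned_sites]
        total = len(column) + pseudocount * len(ALPHABET)
        pssm.append({b: math.log2(((column.count(b) + pseudocount) / total) / background)
                     for b in ALPHABET})
    return pssm


def scan(genome, pssm, top=3):
    """Score every window of the genome; return the best (score, start, window) hits."""
    genome = genome.upper()
    width = len(pssm)
    hits = []
    for start in range(len(genome) - width + 1):
        window = genome[start:start + width]
        if set(window) <= set(ALPHABET):
            hits.append((sum(col[b] for col, b in zip(pssm, window)), start, window))
    return sorted(hits, reverse=True)[:top]


# Hypothetical toy binding sites and genome fragment.
sites = ["TGTAC", "TGTAA", "TGTGC", "TGAAC"]
pssm = build_pssm(sites)
best_hits = scan("CCCTGTACGGGAAATTT", pssm)
```

Columns with high information content contribute most to the score, so windows that match the conserved positions rise to the top of the list.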

Sequence alignment: comparison of mystery sequence to anthrax toxin
October 2016

  • Analyzed a DNA sequence from the gene encoding the lethal factor of the toxin from Bacillus anthracis, using BLAST to confirm the assessment.

  • Coded in Python to perform local pairwise sequence alignment: the Smith-Waterman algorithm for exact alignments and a modified Smith-Waterman algorithm (BLAST) for fast, approximate alignments.

  • A dissection of BLASTN was coded to investigate how BLAST works using different sequences and to find similarities between sequences or sets of sequences.

  • Used empirical scoring schemes for nucleotide sequence alignment and substitution matrices for protein sequence alignment.

  • The program was tested with Dotmatrix with historical alignment, Smith-Waterman algorithm with gaps disallowed and Smith-Waterman algorithm with gaps allowed.
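
A minimal sketch of Smith-Waterman local alignment, with gaps either allowed or disallowed as in the tests above, is given below. It assumes a simple match/mismatch score and a linear gap penalty; the scoring values, the function name and the toy sequences are illustrative assumptions rather than the original program.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Return (best_score, aligned_a, aligned_b) for the best local alignment.

    Pass gap=None to disallow gaps.
    """
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            candidates = [0, diag]
            if gap is not None:
                candidates += [H[i - 1][j] + gap, H[i][j - 1] + gap]
            H[i][j] = max(candidates)
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    # Trace back from the best cell until a zero is reached.
    i, j = best_pos
    out_a, out_b = [], []
    while i > 0 and j > 0 and H[i][j] > 0:
        step = match if a[i - 1] == b[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + step:
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif gap is not None and H[i][j] == H[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))


# Hypothetical toy sequences sharing the core "ACGT".
local = smith_waterman("GGACGTCC", "TTACGTAA")
gapped = smith_waterman("AAACCCGGG", "AAACCGGG")
ungapped = smith_waterman("AAACCCGGG", "AAACCGGG", gap=None)
```

Running the same pair with and without gaps shows how a single gap can outscore the best gap-free alignment when one sequence has an extra residue.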

 

 


Identification of a possible regulatory site in genomic DNA
September 2016

  • The purpose of this project was to study the regulation of gene expression by finding a sequence upstream from an interesting gene that looks suspiciously like a regulatory sequence.

  • Conducted a simple simulation by making up random sequences and counting the number of times a regulatory sequence arises.

  • Pattern recognition was done by scanning the entire genome and counting the sequences that satisfy the criteria for a regulatory sequence.

  • The sequence was searched for throughout the entire genome.

  • The program was tested first with a small part of the sequence and then with the entire sequence of the cyanobacterium Anabaena chromosome.
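
The random-sequence simulation described above can be sketched as follows. It assumes equal base frequencies and uses a made-up motif pattern purely for illustration; the function names, motif, sequence length and number of trials are all assumptions.

```python
import random
import re


def count_matches(pattern, sequence):
    """Count matches of a regular-expression pattern, overlaps included."""
    return len(re.findall("(?=" + pattern + ")", sequence))


def simulate(pattern, length, trials=100, seed=0):
    """Count motif matches in `trials` random DNA sequences of a given length."""
    rng = random.Random(seed)
    counts = []
    for _ in range(trials):
        seq = "".join(rng.choice("ACGT") for _ in range(length))
        counts.append(count_matches(pattern, seq))
    return counts


# Hypothetical motif: two fixed 3-base halves separated by any 3 bases.
motif = "TGT[ACGT]{3}ACA"
counts = simulate(motif, length=10000, trials=20)
mean_random = sum(counts) / len(counts)
```

If the count observed in the real genome is far above `mean_random` for sequences of the same length, the motif is unlikely to have arisen by chance alone.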

Comparison of genomes to look for genes responsible for pathogenesis
August 2016 

  • The data used for this study was retrieved from GenBank.

  • Wrote Python code to investigate the basis of pathogenesis by E. coli O104:H4.

  • Compared the total complement of proteins of E. coli O104:H4 with that of the nonpathogenic strain E. coli K12.

  • Pattern matching was done by extracting strings with regular expressions, similar to BLAST, to find and parse the similarities between sequences or sets of sequences.
