The Penn Bioinformatics Core was established to provide compute resources, support and training to the University of Pennsylvania basic and biomedical research community. Our goal is to ensure that scientists across the University are able to effectively exploit existing and emerging computational technologies, especially those related to genomics-scale datasets. We provide the following services to reach this goal:
Consultation: Penn Bioinformatics Core personnel provide consulting to enable investigators to frame research questions in genomics terms and identify bioinformatics solutions to those questions. Core personnel are then available to facilitate these solutions, which may involve developing databases and software tools as well as providing training and support in the use of these tools.
Grant preparation: Penn Bioinformatics Core personnel enable grant submission by helping to identify genomic and bioinformatics approaches to investigators’ research problems, framing the questions in genomics terms and providing letters of support that indicate sufficient expertise and tools exist to successfully complete the research indicated.
Data Integration: The Penn Bioinformatics Core maintains a large Oracle database server through which we have access to the GUS database developed in the Computational Biology and Informatics Laboratory here at UPenn. This database integrates much of the sequence and functional data available in the public domain, making use of controlled vocabularies whenever available. We provide a separate instance of the GUS schema in which Penn investigators can store and integrate the data generated in their laboratory research. We also write scripts as necessary to facilitate the integration of data of different types and sources.
Data Analysis: Whenever possible, data analysis is approached as a training opportunity. However, there are times when this is undesirable, either due to the complexity of the data (and analysis options) or the wishes of the investigator. In these instances, we perform the data analysis but request the participation of the investigator (particularly in the case of microarray data), because an appropriate analysis requires a fundamental understanding of the biology and the experiment that only the person who did the work can supply.
Identify and make resources available: As bioinformatics/genomics needs arise in the research community at Penn or in individual laboratories, core personnel will help identify and make available resources that will address these needs. We will also train persons to use these resources effectively.
Provide web space for biological databases: PBiC provides the resources (servers, database and supervision) to facilitate setting up laboratory-specific web pages and databases. Alternatively, core personnel will build the resource and train lab users in its use. A couple of example databases have been developed and are described at the end of this document. We have developed build systems so that databases with this basic functionality (add, edit, delete and query data securely) can be easily set up, thus keeping the cost low.
Writing Scripts: Laboratories often have large or repetitive tasks that can be streamlined by writing scripts to automate much of the work. We will help identify ways that these processes can be streamlined and write the necessary scripts. Examples might include running BLAST on or masking large numbers of sequences, reformatting sequences or data, and integrating and summarizing information.
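As an illustration of the kind of reformatting and summarizing script described above, here is a minimal sketch in Python. It is not a script used by the Core; the function names (parse_fasta, wrap_fasta, summarize) are hypothetical, and it assumes plain FASTA input with one defline per record.

```python
from typing import Dict, Iterable, Tuple


def parse_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA-formatted lines into {defline: sequence}.

    Assumption: deflines are unique; blank lines are ignored.
    """
    records: Dict[str, str] = {}
    name = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:]
            records[name] = ""
        elif name is not None:
            # Sequence data may span several lines; join them.
            records[name] += line
    return records


def wrap_fasta(records: Dict[str, str], width: int = 60) -> str:
    """Rewrite records as FASTA text with sequence lines of a fixed width."""
    out = []
    for name, seq in records.items():
        out.append(">" + name)
        for i in range(0, len(seq), width):
            out.append(seq[i:i + width])
    return "\n".join(out) + "\n"


def summarize(records: Dict[str, str]) -> Tuple[int, int, float]:
    """Return (number of sequences, total length, mean length)."""
    lengths = [len(seq) for seq in records.values()]
    total = sum(lengths)
    mean = total / len(lengths) if lengths else 0.0
    return len(lengths), total, mean
```

A script like this would typically read its lines from an open file and write the output of wrap_fasta to a new file, so the same reformatting step can be repeated over many input files.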
Supervised research: Many research projects are ongoing and/or exceed the capacity of the Core to provide solutions. In these instances, we will supervise a person from your laboratory to enable them to do the bioinformatics necessary to address the question at hand and enable your research.
Education and Training: Core personnel provide training primarily in one-on-one or small group (from a specific lab for example) settings as this has been determined empirically to be the most effective. We also provide monthly workshops for specific bioinformatics packages or approaches to larger groups as needs are identified that can be addressed in this format. Talks have been and continue to be presented to departments (and faculty meetings) to inform investigators of the value of bioinformatics to their research and the services that are provided by the Bioinformatics Facility. Please contact John Tobias if you would like someone to talk to your department.
Other needs...contact us! We are always happy to discuss science and in particular genomic approaches to research questions!! We can help with many things not outlined on this page but we can't tell you unless you ask. If we are unable to help you with your specific problem, we can usually refer you to someone who can.
Consultation: John Tobias provides a high level of consulting support for initial experimental design and determining the best analytical approaches given the data set. Letters of support for grant applications and help with grant sections are also provided.
Application support: John provides application support for the analysis of microarray datasets on a variety of tools available in the core facility. These include GeneSpring, Spotfire, Partek Genomics Suite, SAM and Ingenuity Pathways Analysis.
Data Analysis: We will provide data analysis in cases where the investigator desires this service. We ask that the investigator be present during the analysis to address questions as they arise. Generally these questions cannot be anticipated ahead of time, as they are analysis-dependent. We feel strongly that this interaction between the investigator and the person doing the analysis gives the most reliable and best analysis possible.
Shared computers: Two computers are available in the core facility (users reserve blocks of time) on which users can do data analysis using a variety of tools. GeneSpring, Partek Genomics Suite, Spotfire, GenePix, SAM, R, and Bioconductor are available on these machines.
Distributed resources: We provide a number of packages in our microarray tools subscription so that users can run these packages in their own laboratories. Click here for additional information.
Consultation: Consulting to help frame research questions in genomics terms and identify bioinformatics approaches to biological problems is available. Please contact us for more information.
Application support: Support for running tools and analyses on genomics data is provided. We do this primarily one-on-one but also offer workshops on different applications and analysis options as we identify specific needs.
Data Analysis: PBiC will also provide data analysis services in cases where the analysis is beyond the scope of the investigator. This includes data integration and summarization, high throughput analyses on large data sets (repetitive tasks), and complex or novel analyses for which desktop tools are not available.
Desktop Software: We provide access to a variety of desktop sequence analysis packages for a single subscription fee. Click here for more information.
Sequencher (Mac and PC): The tool of choice for contig assembly on your desktop. Also includes some analysis tools and the ability to connect to the NCBI database.
MacVector (Mac) and AccelrysGene (PC): Complete desktop sequence analysis package provided by Accelrys. Provides many analysis tools and can connect to the NCBI database for text and BLAST queries.
LaserGene (Mac and PC): Complete cross-platform sequence analysis suite.
Vector NTI (Mac and PC): Alternative complete package provided by Invitrogen. Now available at no cost from the user community pages at Invitrogen (although there is no support from Invitrogen other than the user community forums).
Server Software: A variety of server-based tools are provided that can be accessed via a web browser or terminal connection. These tools tend to be more powerful than the desktop packages but are also more difficult to use in many cases. Click here for more information.
BioTeam InQuiry web tools: Includes many bioinformatics algorithms, including the EMBOSS suite, that are accessible via a web browser. Anyone with a PennName can use this resource.
GCG sequence analysis package: No longer supported by PBiC. GCG is being phased out in favor of the InQuiry package above.
Database searching/comparison: BLAST, FASTA, SIM4, BLAT, cross_match, etc.
Sequence assembly/SNP analysis: PHRED/PHRAP, consed, polyphred.
Mapping: GHT (GeneHunter Two Locus).
Repeat Detection: RepeatMasker and Dust.
Gene Finding: GenScan, GeneWise (wise tools).
EMBOSS package: Similar to GCG in that it includes many different programs for sequence analysis. UNIX command line access and web access via the InQuiry tools above.
Statistical analyses: R with the Bioconductor libraries is installed.
Promoter analyses: BioProspector, AlignAce and MEME for novel motif detection. PRIMA is used to identify over-represented known motifs (TRANSFAC) in a sequence dataset.
Liniac Compute Cluster: Used for very compute-intensive applications such as GHT, or for many iterative tasks such as running BLAST or RepeatMasker on many sequences. Consists of 256 processors, >2 terabytes of storage and associated servers.
BLAST databases: Public databases (from NCBI) are updated weekly; others are updated as needed.
ORACLE: An Oracle instance is available for users who need a relational database. This includes access to a GUS database where users can securely store their data using this powerful schema for data integration (facilitated by SQL query access to the public AllGenes resource).
Lab specific databases and tools: We will develop (or provide oversight to develop) databases and interfaces for the needs of specific laboratories or groups. A couple of examples have been developed and are indicated below. We have developed build systems so that databases with this basic functionality (add, edit, delete and query data securely) can be easily set up, thus keeping the cost low.
Sequence Annotation Database: Users are provided forms to submit sequences in batch, annotate their sequences and query their database using text or sequence. Upon submission, vector is removed, sequences are masked and blasted against a set of user-specified databases, and all information is stored. Users may then retrieve, say, all sequences that match a protein with “kinase” in the defline with a P value better than 1e-20, and then go on to display entries and manually add annotation to them if desired.
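The example query above (hits better than 1e-20 to a protein with "kinase" in the defline) can be sketched as a simple filter over stored hits. This is an illustration only, not the database's actual query interface: the Hit record, its field names and the select_hits function are hypothetical, and "better than" is taken to mean a strictly smaller P value.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Hit:
    """One stored similarity hit (hypothetical record layout)."""
    query_id: str         # identifier of the submitted sequence
    subject_defline: str  # defline of the matched database entry
    p_value: float        # significance reported for the match


def select_hits(hits: List[Hit], keyword: str, max_p: float = 1e-20) -> List[Hit]:
    """Keep hits with a P value strictly below max_p whose subject
    defline contains keyword (case-insensitive)."""
    needle = keyword.lower()
    return [
        hit for hit in hits
        if hit.p_value < max_p and needle in hit.subject_defline.lower()
    ]
```

In a relational database the same selection would normally be expressed as a query with a numeric condition on the P value and a text match on the defline, so the filtering happens on the server rather than in a script.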
Resource Database: Databases for resources (such as mouse strains, Taqman assays, primers, etc.) that allow laboratories, groups or the entire University to add, edit, delete and query data. Secure access can be restricted using PennKeys and/or specific user names. Two example databases that are open to UPenn researchers are a database for mouse strains at Penn and a database of expertise among Penn faculty.
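The basic add, edit, delete and query functionality of such a resource database can be sketched as follows. This is a minimal illustration under stated assumptions, not the Core's build system: it uses Python's built-in sqlite3 module with an in-memory database instead of Oracle, omits access control entirely, and the strain table, its columns and all function names are hypothetical.

```python
import sqlite3
from typing import List, Tuple


def open_resource_db() -> sqlite3.Connection:
    """Create an in-memory database with one hypothetical resource table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE strain ("
        "id INTEGER PRIMARY KEY, name TEXT NOT NULL, lab TEXT NOT NULL)"
    )
    return conn


def add_strain(conn: sqlite3.Connection, name: str, lab: str) -> int:
    """Add a record and return its generated id."""
    cur = conn.execute("INSERT INTO strain (name, lab) VALUES (?, ?)", (name, lab))
    conn.commit()
    return cur.lastrowid


def edit_strain(conn: sqlite3.Connection, strain_id: int, lab: str) -> None:
    """Change the lab recorded for an existing strain."""
    conn.execute("UPDATE strain SET lab = ? WHERE id = ?", (lab, strain_id))
    conn.commit()


def delete_strain(conn: sqlite3.Connection, strain_id: int) -> None:
    """Remove a record by id."""
    conn.execute("DELETE FROM strain WHERE id = ?", (strain_id,))
    conn.commit()


def find_strains(conn: sqlite3.Connection, text: str) -> List[Tuple[str, str]]:
    """Return (name, lab) rows whose name contains text, sorted by name."""
    rows = conn.execute(
        "SELECT name, lab FROM strain WHERE name LIKE ? ORDER BY name",
        ("%" + text + "%",),
    )
    return rows.fetchall()
```

All values are passed as bound parameters rather than pasted into the SQL text, which is the usual way to keep user-supplied form input from altering the query.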