Indiana University

KnowledgeBase

The Research Database Complex (RDC)

System overview


The Indiana University Research Database Complex (rdc.uits.iu.edu) supports research-related databases and data-intensive applications that require databases. The RDC supports Oracle and MySQL databases, and provides an environment (rdcweb.uits.iu.edu) for database-driven web applications focusing on research.

The system runs Red Hat Enterprise Linux 5. User home directories reside on a Network-Attached Storage (NAS) device, with disk storage of 10 GB per user. This space is shared by your Big Red, Quarry, and Mason accounts, if you have accounts on those systems.

The RDC currently supports several important research projects:

Note: The RDC is strictly devoted to supporting research; it is not an instructional or classroom environment. If you need to use Oracle or Microsoft SQL Server in an instructional environment, see Instructional database and web server accounts.

Databases

The RDC offers Oracle 11g Release 2 (version 11.2.0.2) and MySQL Enterprise Server (version 5.5.8 Advanced) database accounts, with a full suite of Oracle components that support:

Content: Oracle components on the RDC that support content include:

  • Advanced Security Option (ASO): Provides data encryption and strong authentication services to the Oracle database

  • Application Express: Offers development and deployment of secure applications through a rapid, web-application development tool for use with Oracle databases

  • Large objects (LOBs): Lets you store and manipulate large blocks of unstructured data, such as text, graphic images, video clips, and sound waveforms, in binary or character format

  • Oracle Multimedia (formerly Oracle interMedia): Provides a platform for a wide range of multimedia-intensive applications

  • Oracle Text: Indexes any document or textual content to add fast, accurate retrieval of information to Internet content management applications, e-business catalogs, news services, and job postings; indexes content stored in file systems, databases, or on the web

  • Oracle XML Database: Treats XML as a native datatype in the database

  • Oracle XDK: Contains the basic building blocks for reading, manipulating, transforming, and viewing XML documents, whether on a file system or stored in a database

  • Partitioning: Lets you split large tables and indexes into smaller, manageable components, without requiring changes to underlying applications

Analysis: The RDC provides these Oracle components that support analysis:

  • Oracle Data Mining: Provides a way to access information buried in the data by creating models to find hidden patterns in large, complex collections of data; embeds data mining within the Oracle database; algorithms operate natively on relational tables or views, eliminating the need to extract and transfer data into other tools, applications, or servers

  • OLAP: Offers in-database, advanced multidimensional analytic capabilities

Java: The RDC offers these Oracle components that support Java:

  • JServer Java Virtual Machine: A Java Virtual Machine (VM) that runs within the Oracle database server's address space

  • Oracle Database Java Packages: Classes for relational database management system (RDBMS) features

System information


System configuration (aggregate figures, with per-node figures where applicable):

  • Machine type: Research database system
  • Operating system: Red Hat Enterprise Linux 5
  • Memory model: Distributed and shared
  • Processor cores: 36
  • CPUs: Intel Xeon E5620 2.40 GHz (HP); Intel Xeon Quad Core 1.6 GHz (Dell)
  • Nodes: 3 Hewlett Packard DL 180 G6 Oracle servers; 1 Hewlett Packard DL 180 G6 MySQL server; 1 Dell 2950 Database Driven Web Services
  • RAM: 288 GB aggregate; per node, 72 GB (HP) and 8 GB (Dell)
  • Local storage: Hewlett Packard StorageWorks 2000fc Modular Smart Array, approximately 48 TB of usable storage
  • RPeak: 307.2 gigaflops aggregate; 76.8 gigaflops per node

Storage information (aggregate figures, with per-node figures where applicable):

  • File systems: RDC home directory disk space is allocated on a Network-Attached Storage (NAS) storage device. Other file systems are /scr (local), /tmp (local), and /N/dc/scratch/username (Data Capacitor scratch space).
  • Total disk space: 48 TB
  • Total scratch space: /N/dc/scratch/username (Data Capacitor); /tmp (1 GB); /scr (10 GB)
  • Quotas: 15 MB per user
  • Backup and purge policies: Incremental backups of the RDC Oracle databases occur at various times between 1am and 6am, Sunday through Friday. Full backup occurs between 1am and 5am every Saturday. Backups are retained for 30 days. Backups for MySQL database servers on the RDC are the responsibility of the user.
  • Availability scope: Access to the RDC is available to all IU graduate students, faculty, and staff. Undergraduates and non-IU collaborators must have IU faculty sponsors.

Note: Indiana University will soon replace its current Data Capacitor with Data Capacitor II, a high-speed, high-capacity storage facility for very large data sets. With 5 PB of storage, Data Capacitor II will support big data applications used in computational research. IU partnered with DataDirect Networks, Inc. (DDN) to develop Data Capacitor II, which is scheduled to be installed in the IU Data Center in spring 2013. For more about Data Capacitor II, see the November 8, 2012, press release. If you have questions about how the change to Data Capacitor II will affect your research, email the High Performance File Systems group.

System access

Requesting an account

Access to the RDC is available to all IU graduate students, faculty, and staff. Undergraduates and non-IU collaborators must have IU faculty sponsors. To request an account:

Note: If you don't see RDC as an option in AMS, email IU Account Administration.

You will receive a confirmation email message once your RDC account is created.

Your database login

When you receive the email message confirming your RDC account is created, you must complete the process by requesting a database login. The confirmation email message will direct you to the online RDC Database and Web Services Account Application.

Database group accounts, where a username is shared by more than one researcher, are available on the RDC. You can request a database group account on the RDC Database and Web Services Account Application. To request a database group account, your group must have an existing IU Network ID, and you must provide the Network ID usernames of everyone who will be using the database group account. Whoever requests the group database account will be considered the responsible party for the account, and is responsible for communicating with the group database account's users regarding system downtime and other information. For more about IU group accounts, see Requesting a departmental or group account.

When your database login has been created, you will receive another confirmation email message containing your login credentials, and information about connecting to your database.

To request an Oracle or MySQL database, email the UITS High Performance Systems (HPS) team. After the HPS team creates your database, you will receive a welcome message in email containing information about your database username and password.

Note: If you already have an Oracle or MySQL database login, but don't remember the database password, refer to your database welcome message. If you need help, email HPS.

Connecting to your Oracle or MySQL database

Oracle: For instructions on connecting to your Oracle database on the RDC, see:

MySQL: For instructions on connecting to your MySQL database on the RDC, see:

Computing environment

Unix shell

The shell is the primary method of interacting with the RDC. The command line interface provided by the shell lets users run built-in commands, utilities installed on the system, and even short ad hoc programs.

The RDC supports the Bourne-again (bash), TC (tcsh), C (csh), Korn (ksh), and Bourne (sh) shells. New user accounts are assigned the bash shell by default. For more on bash, see the Bash Reference Manual and the Bash (Unix shell) Wikipedia page.

To change your shell on the RDC, use the changeshell command.

Note: The changeshell command prompts you with the shells available on the system, and changes your login shell system-wide within 15 minutes. Running chsh instead changes your shell only on the node on which you run it, and leaves the other nodes of the cluster unchanged.

Environment variables: The shell uses environment variables primarily to modify shell behavior and the operation of certain commands. A good example is the PATH variable.

When the shell parses a command you have entered (i.e., after you hit Enter or Return), it interprets certain words you've typed as program files that should be executed. The shell then searches various directories on the system to locate these files. The PATH variable determines which directories are searched, and the order in which they are searched. In the bash shell, the PATH variable is a string of directories separated by colons (e.g., /bin:/usr/bin:/usr/local/bin). The shell searches for an executable file in the /bin directory, then the /usr/bin directory, and finally the /usr/local/bin directory. If files of the same name (e.g., foo) exist in all three directories, /bin/foo will be run, because the shell will find it first.
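The search order described above can be checked with a small experiment. The following is a minimal sketch, assuming a bash or other POSIX shell and a temporary directory that allows files to be executed; the directory names first, second, and third and the variable names are invented for the illustration and stand in for /bin, /usr/bin, and /usr/local/bin.

```shell
# Create three throwaway directories, each holding an executable named foo.
demo=$(mktemp -d)
for d in first second third; do
  mkdir -p "$demo/$d"
  printf '#!/bin/sh\necho "foo from %s"\n' "$d" > "$demo/$d/foo"
  chmod +x "$demo/$d/foo"
done

# Inside a subshell, put the three directories at the front of PATH
# (in that order) and run foo. The subshell keeps the change to PATH
# from leaking into the rest of the session.
path_demo_result=$(
  PATH="$demo/first:$demo/second:$demo/third:$PATH"
  foo
)

# The copy in the first directory listed is the one that ran.
echo "$path_demo_result"    # prints: foo from first

# Remove the throwaway directories.
rm -rf "$demo"
```

Reordering the directories in the PATH assignment changes which copy of foo runs, which is a quick way to diagnose cases where an unexpected version of a program is being picked up.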

In the bash shell, use echo to display the value of an environment variable:

echo $VARNAME

To change the value of an environment variable:

export VARNAME=VALUE
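A plain assignment and an export behave differently once another program is started from the shell. The sketch below illustrates this under the assumption of a bash or other POSIX shell; the variable names RDC_DEMO_LOCAL and RDC_DEMO_EXPORTED are made up for the example and are assumed not to be set already in your environment.

```shell
# A plain assignment creates a variable that only the current shell can see.
RDC_DEMO_LOCAL=scratch

# export also places the variable in the environment of child processes.
export RDC_DEMO_EXPORTED=shared

# Start a child shell and ask it what it can see.
child_view=$(sh -c 'echo "local=[$RDC_DEMO_LOCAL] exported=[$RDC_DEMO_EXPORTED]"')
echo "$child_view"    # prints: local=[] exported=[shared]
```

Programs launched from the shell, including database clients, receive only exported variables, so settings they depend on need export rather than a bare assignment.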

Startup scripts: Shells offer much flexibility in terms of startup configuration. On login, bash by default reads and executes commands from the following files, in this order:

  • /etc/profile
  • ~/.bash_profile
  • ~/.bashrc (read at login only when ~/.bash_profile sources it, which is a common default)

Note: The ~ (tilde) represents your home directory (e.g., ~/.bash_profile is the .bash_profile file in your home directory).

On logout, the shell reads and executes ~/.bash_logout. For more on bash startup files, see the "Bash Startup Files" section of the Bash Reference Manual.
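Startup files are a common place to adjust PATH. Because a startup file may be read more than once in a session, it helps to make such changes idempotent. The following is one possible sketch, not an RDC-provided utility: the function name path_prepend and the $HOME/bin directory are assumptions chosen for illustration.

```shell
# Prepend a directory to PATH unless it is already listed, so that
# re-reading the startup file does not make PATH grow with duplicates.
path_prepend() {
  case ":$PATH:" in
    *":$1:"*) ;;                  # already present: leave PATH alone
    *) PATH="$1:$PATH" ;;         # otherwise put the directory first
  esac
}

# Example: give personal scripts in ~/bin priority over system copies.
path_prepend "$HOME/bin"
export PATH
```

Lines like these could go in ~/.bash_profile (or in ~/.bashrc, if your ~/.bash_profile sources it); after editing, log in again or run source on the file for the change to take effect.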

Transferring your files to the RDC

The RDC supports SCP and SFTP for transferring files. SCP is a command line utility included with OpenSSH. Basic use is:

scp [[user@]host1:]file1 [[user@]host2:]file2

For example, to copy foo.txt from the current directory on your computer to your home directory on the RDC, use (replacing username with your Network ID username):

scp foo.txt username@rdc.uits.iu.edu:foo.txt

You may specify absolute paths or paths relative to your home directory:

scp foo.txt username@rdc.uits.iu.edu:some/path/for/data/foo.txt

You also may leave the destination filename unspecified, in which case it will become the same as the source filename. For more, see In Unix, how do I use SCP to securely transfer files between two computers?
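If you upload files often, the scp invocations above can be wrapped in a small shell function. This is only a sketch: the function name rdc_put and the variables RDC_USER, RDC_HOST, and RDC_DRY_RUN are hypothetical names invented here, and the destination directory is assumed to exist already on the RDC. With RDC_DRY_RUN set, the function prints the scp command instead of running it, which lets you check the destination before transferring anything.

```shell
# Hypothetical settings; replace username with your Network ID username.
RDC_USER=username
RDC_HOST=rdc.uits.iu.edu

# usage: rdc_put REMOTE_DIR FILE...
# Copies the named files into REMOTE_DIR, a path relative to your RDC
# home directory (or an absolute path). The destination filenames are
# left unspecified, so they match the source filenames.
rdc_put() {
  remote_dir=$1
  shift
  # When RDC_DRY_RUN is non-empty, "echo" is placed in front of the
  # command, so the scp command line is printed rather than executed.
  ${RDC_DRY_RUN:+echo} scp "$@" "$RDC_USER@$RDC_HOST:$remote_dir/"
}

# Dry run: show what would be executed without contacting the RDC.
RDC_DRY_RUN=1
rdc_put some/path/for/data foo.txt bar.txt
```

Setting RDC_DRY_RUN to an empty value (RDC_DRY_RUN=) and calling rdc_put again performs the actual transfer and prompts for your password as usual.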

The SSH File Transfer Protocol (SFTP) provides file access, transfer, and management, and offers client functionality much like FTP. For example, from a computer with a command line SFTP client (e.g., a Linux or Mac OS X workstation), you could transfer files as follows:

$ sftp username@rdc.uits.iu.edu
username@rdc.uits.iu.edu's password:
Connected to rdc.uits.iu.edu.
sftp> ls -l
-rw------- 1 username group 113 May 19 2011 loadit.pbs.e897
-rw------- 1 username group 695 May 19 2011 loadit.pbs.o897
-rw-r--r-- 1 username group 693 May 19 2011 local_limits
sftp> put foo.txt
Uploading foo.txt to /N/hd02/username/RDC/foo.txt
foo.txt 100% 43MB 76.9KB/s 09:39
sftp> exit
$

Graphical SFTP clients are also available for many systems. For more, see Transferring files with SFTP.

Application development

Web Services: In addition to providing a home for research databases, the RDC provides an environment for database-driven web applications with a research focus. This environment is composed of a Dell 2950 with a 1.6 GHz Quad-core Intel Xeon processor and 8 GB of memory. This system (rdcweb.uits.iu.edu) runs Red Hat Enterprise Linux 5. User home directories reside on the IBM N5500 NAS storage device, with disk quotas of 10 GB per user. For details, see Web Services on the IU Research Database Complex.

Reference

Oracle: For further documentation, see the Oracle Database Documentation Library, 11g Release 2 (11.2), and the following online guides:

MySQL: For further documentation, see the MySQL Reference Manual.

Policies

Accounts

Access to the RDC is provided to all IU graduate students, faculty, and staff. Access is also provided to undergraduate students and non-IU collaborators, if they have IU faculty sponsors. For information about user responsibilities and security issues, see Research Database Complex (RDC) usage policies.

The RDC is strictly devoted to supporting research. The RDC is not an instructional, classroom environment. If you are not doing research and wish to use a database, such as Oracle or Microsoft SQL Server, see Database and web server access for instruction.

Accounts remain valid only while the account holder is a registered IU student, or an IU faculty or staff member. On Big Red, Quarry, and the RDC, accounts are disabled during the semester following the account holder's departure from IU, and then are purged within six months. To request that your research systems account be exempt from disabling, email IU Account Administration. If the request is approved, the account will remain activated for one calendar year beyond the user's departure from IU, and then, at the end of the year, the account will be purged. Extensions beyond one year for research accounts are granted only for accounts involved in funded research and having an IU faculty sponsor, or with approval of the Dean or Director of Research and Academic Computing.

By submitting the RDC and Web Services Account Application, you affirm that:

  • You understand use of the database is reserved for research purposes only.
  • You will acknowledge use of IU's high-performance systems in publications resulting from your research.
  • You will provide periodic listings of citations of those publications upon request.

Database group accounts

To request a database group account, your group must have an existing IU Network ID, and you must provide the Network ID usernames of everyone who will be using the database group account. Whoever requests the group database account will be considered the responsible party for the account, and is responsible for communicating with the group database account's users regarding system downtime and other information. For more about IU group accounts, see Requesting a departmental or group account.

Responsibilities

As owner of a database account, you are also responsible for:

  • Creating and managing your schema objects (e.g., tables, views, procedures, triggers, and schema privileges)
  • Changing datatypes
  • Any data processes, such as data imports, deletes, modifications, transformations, and retrievals
  • Creating and maintaining copies of scripts
  • Emailing the HPS group about changes in space or database administration, or if you no longer need access to the research database
  • Adapting your schema and data as required during system and database upgrades
  • Adapting client applications and tools to system and database versions
  • Monitoring HIPAA-required audit logs, if auditing is enabled

If you need help with the above, submit a request for RDC database consulting services by emailing the HPS group.

RDC database administrators are responsible for:

  • Backing up the database
  • Managing space allocation
  • Managing database and tablespace creation
  • Monitoring and reporting database performance
  • Monitoring and reporting invalid schema objects
  • Installing database and system upgrades and patches

Database security

  • HIPAA:

    Many of the technology services provided by the UITS Advanced Biomedical IT Core, Research Technologies, and Enterprise Infrastructure divisions are formally aligned with the federal Health Insurance Portability and Accountability Act (HIPAA). See About IU's research systems and services and HIPAA alignment.

  • Passwords:

    Once your RDC Database and Web Services Account Application is processed, you will receive a confirmation email message that describes how to access your Oracle database, and how to reset your initial database password. Your initial database password will be sent in a second email message.

    It is important to employ methods that do not transmit passwords across the Internet in plain-text format. If you use SQL*Plus to access an Oracle database, invoke it without the password. The following example shows how to connect to Oracle from rdc.uits.iu.edu:

    doe@RDC:~> sqlplus joeuser@iugp.iu.edu

    Provide the password when prompted:

    SQL*Plus: Release 10.2.0.1.0 - Production on Fri Mar 14 13:49:03 2008

    Copyright (c) 1982, 2005, Oracle. All rights reserved.

    Enter password:

    To connect to MySQL from rdc.uits.iu.edu, enter:

    doe@RDC:~> mysql --defaults-file=/N/u/<username>/RDC/.my.cnf -u root -p

    Replace <username> with your username. When prompted, enter your password.

Database backup and recovery

UITS performs incremental backups of the RDC Oracle databases at various times between 1am and 6am, Sunday through Friday, depending on the instance. A full backup occurs between 1am and 5am every Saturday. Backups are retained for 30 days. In the event of system failure, research databases can be restored to the point of the last good backup, which is usually from that morning. Data recovery for individual accounts is not guaranteed if data loss is the result of user error.

Recovery of a table is typically not part of the database recovery process. Dropped tables can often be recovered using Oracle's recyclebin feature; see In Oracle 10g and later, how do I recover a dropped database table?

Backups for MySQL database servers on the RDC are the responsibility of the user. Use the following command:

mysql_instance backup

For more about using the mysql_instance command, see On the RDC at IU, how do I stop or start my MySQL database?

Scheduled downtime

The RDC maintenance window is the first Tuesday of each month, 8am-5pm. Notice of any emergency downtime will be posted at IT Notices.

Disk space for data loading and other applications

If you need space on the RDC for staging data, or for data-related packages and applications, email the HPS group, which will evaluate requests on a case-by-case basis.

Mail usage

Production mail service is not provided on the RDC.

Reporting problems

Support staff are available from approximately 8am-5pm Monday-Friday. Email the HPS group to report problems with the RDC. Be sure to include:

  • The name of the database server to which you were connecting
  • A description of what you were trying to do
  • A description of the problem
  • The name and version of the tool you were using to connect to the Oracle database server
  • Your computer's operating system
  • The error message number and text, if applicable
  • The time the problem occurred

Support

The UITS High Performance Systems (HPS) group supports and administers the RDC; if you have questions about the system, email HPS. If you have questions about compilers, programming, scientific or numerical libraries, or debuggers, email UITS Scientific Applications and Performance Tuning. If you have questions about statistical and mathematical software, email Research Analytics.

This is document amuw in domain all.
Last modified on March 14, 2013.
