Grid References & How-Tos

Launching Batch Jobs

It is assumed that you've already installed PuTTY, WinSCP and know how to access your RC Grid account. If this is not the case then please email Paul Michaud (michaud@umich.edu) and he will send you the necessary information. Some information regarding how to configure and use WinSCP is given below just in case.

Launching Stata batch jobs:

  • stata -b do stataJob: For Stata IC
  • stata-se -b do stataJob: For Stata SE
  • Input job name stataJob is assumed to be stataJob.do
  • Output file name is stataJob.log.
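
If you launch Stata batch jobs from a script, the rules above can be captured in a small helper. This is only a sketch: the function name stata_batch_command and its edition argument are made up for illustration, and it only builds the command line and the expected log name (it does not launch anything).

```python
def stata_batch_command(job, edition="ic"):
    """Return (argv, log_name) for a Stata batch run of the given job."""
    # Executable names as listed above: stata for IC, stata-se for SE.
    executables = {"ic": "stata", "se": "stata-se"}
    if edition not in executables:
        raise ValueError(f"edition must be one of {sorted(executables)}")
    # The job name is assumed to refer to <job>.do, so strip the suffix if given.
    name = job[:-3] if job.endswith(".do") else job
    argv = [executables[edition], "-b", "do", name]
    # The output log is named after the job (stataJob.log).
    return argv, name + ".log"


if __name__ == "__main__":
    argv, log_name = stata_batch_command("stataJob.do", edition="se")
    print(" ".join(argv))
    print(log_name)
```

On the RC Grid the returned list could be handed to subprocess.run to start the job.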

Launching Matlab batch jobs:

  • Matlab iFile oFile Parms
  • Where:
    • iFile : Input command file
    • oFile : Output results file
    • Parms : Optional Matlab specific parameters
      • such as -nosplash, -nodesktop, etc.

Launching SAS batch jobs:

  • sas cmdFile
  • Input job name cmdFile is assumed to be cmdFile.sas
  • Two output files are generated:
    • cmdFile.log : Contains a log of the execution of the SAS commands in cmdFile
    • cmdFile.lst : Contains any output displayed by PROCs executed in cmdFile

Monitoring Batch jobs

  • bjobs -aw
  • Displays all of your RC Grid batch jobs which are either running or have completed within the last hour.

Killing Batch jobs

  • bkill JobID
  • Where JobID is the job identifier assigned to the batch job. The JobID is displayed as the first column (JOBID) in the bjobs output.

SAS Primer

SAS Usage:

Batch mode: sas cmdFile.sas (where cmdFile.sas is the name of the file containing the SAS commands you want executed).

  • Two output files are generated:
    • cmdFile.log, which contains information about the execution of the SAS job
    • cmdFile.lst, which contains any command/proc output that you did not explicitly direct to be output to some other file.
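
After a batch run it is worth checking cmdFile.log before trusting cmdFile.lst. SAS flags problems with log lines that begin with ERROR or WARNING, so a quick scan can be sketched as below. The function name scan_sas_log and the sample log text are made up for illustration.

```python
import re

# SAS problem lines start with "ERROR:" / "WARNING:" or a numbered form
# such as "ERROR 22-322:".
PROBLEM_LINE = re.compile(r"^(ERROR|WARNING)\b")


def scan_sas_log(log_text):
    """Return a dict mapping 'ERROR' and 'WARNING' to the matching log lines."""
    found = {"ERROR": [], "WARNING": []}
    for line in log_text.splitlines():
        match = PROBLEM_LINE.match(line)
        if match:
            found[match.group(1)].append(line.rstrip())
    return found


if __name__ == "__main__":
    # Made-up log text, used only to exercise the function.
    sample = (
        "NOTE: The data set WORK.A has 10 observations.\n"
        "ERROR: File WORK.B.DATA does not exist.\n"
        "WARNING: The data set WORK.C may be incomplete.\n"
    )
    problems = scan_sas_log(sample)
    print(len(problems["ERROR"]), "error(s),", len(problems["WARNING"]), "warning(s)")
```

For a real job you would read the text from cmdFile.log in the directory where the job was launched.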

GUI mode: sas

  • You need to have an X-Windows display server running on your Windows PC for the GUI to come up (we recommend using Xming; look for details about Xming in the New User email).

Documentation:

Stata Primer

First-time RC Grid Stata users are typically familiar with the Stata GUI (from using Stata in either a Windows or Mac environment). Therefore, for most, that is the best place to start. Details on how to do this are given below.

However, there is one potentially serious concern with using the Stata (or any) GUI on the RC Grid--the network connection. Accessing the RC Grid is done remotely (via an ssh tool, typically PuTTY). Any Stata job launched via a GUI session will die if the network connection is disrupted or terminated. Whereas this isn't much of an issue if you are connecting to the RC Grid from within the Ross School, it can be a significant problem when accessing from elsewhere, especially when running long jobs. Therefore, the safest way to run long Stata jobs is via batch mode. Details on how to do that are given below.

Opening a RC Grid session and launching the Stata GUI:

  1. Make sure that Xming is executing. Note that the first time you launch Xming you will probably get a Windows Firewall dialog box. If so, then select "Allow access." This should only happen once.
  2. Launch PuTTY (you will get the "PuTTY Configuration" window)
  3. In PuTTY, under Session, select "RC Grid" and then Load
  4. Select Open. (The first time you do this you will get a Security Alert dialog box. Select Yes.)
  5. At the "login as:" prompt, type your UniqueName and press Enter, then type your password and press Enter
  6. At the prompt enter xstata (for IC) or xstata-se (for SE)
  7. A few lines about connecting will pass by, then (after some propagation delay) the Stata GUI will appear

Launching a Stata batch job from the RC Grid:

  • stata -b do stataJob: For Stata IC
  • stata-se -b do stataJob: For Stata SE
  • Input job name stataJob is assumed to be stataJob.do
  • Output file name is stataJob.log.
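
Because a batch job runs unattended, the log is the only place an error will show up. Stata reports a failed command with a return-code line of the form r(601); so the log can be checked with a few lines of Python. This is a sketch only: the function name stata_error_codes and the sample log text are made up for illustration.

```python
import re

# Stata reports an error as a line of its own, e.g. "r(601);".
RETURN_CODE = re.compile(r"^r\((\d+)\);$")


def stata_error_codes(log_text):
    """Return the list of Stata return codes found in the log text."""
    codes = []
    for line in log_text.splitlines():
        match = RETURN_CODE.match(line.strip())
        if match:
            codes.append(int(match.group(1)))
    return codes


if __name__ == "__main__":
    # Made-up log excerpt, used only to exercise the function.
    sample = ". use mydata\nfile mydata.dta not found\nr(601);\n"
    codes = stata_error_codes(sample)
    print("errors:", codes if codes else "none")
```

An empty list suggests the do-file ran to completion; it is still worth reading the end of stataJob.log to confirm.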

Documentation

  • Official online documentation doesn't exist but there is a Wiki Book at http://en.wikibooks.org/wiki/Stata/Documentation.
  • The Stata application contains a complete set of documentation. To get help on any particular Stata command type "help command-name" at the command prompt. To see the full set of documentation (you need to be running xstata) select Help (from the top bar) then PDF Documentation.
  • Books on Stata can be found at http://stata.com/bookstore/books-on-stata/
  • Workshops on both SAS and Stata are offered by UM's CSCAR (Center for Statistical Consulting and Research).

Submitting Jobs to the Grid (for Use with User-Written C/C++, Perl, Python, etc. Scripts)

Jobs which use standard third-party applications (such as SAS, Matlab, Stata, R, Mathematica and Gauss) are automatically submitted to the Grid. Nothing additional need be done for jobs of this type.

Jobs which are developed via various compiled or scripting languages (such as C/C++, Perl, Python, csh, bash, awk, etc.) must be explicitly submitted to the Grid. Otherwise they will execute on the RC Headnode. Running these programs on the Headnode isn't a problem as long as they are of short duration and are not CPU-intensive (anything requiring 80+% of a CPU on a consistent basis is decidedly CPU-intensive!).

bsub

The command to submit a job explicitly to the RC Grid is bsub. The basic use of bsub is as follows:

  • bsub -e job.err cFile1a.pl p1 p2 p3 p4

This command submits the Perl script cFile1a.pl for execution on the RC Grid (p1-p4 are parameters input to the Perl script).

The "-e job.err" parameter will cause an error log to be generated to job.err in case of an abnormal termination.

Upon job completion an email will be sent to your-unique-name@umich.edu. If you don't want an email to be sent upon job completion then use the following form of bsub:

  • bsub -e job.err -o /dev/null cFile1a.pl p1 p2 p3 p4
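
If you submit many jobs from a driver script, the two forms of bsub shown above can be generated by one helper. The sketch below only builds the argument list; the function name bsub_command and its mail_on_completion argument are made up for illustration, and only the -e and -o options described above are used.

```python
import shlex


def bsub_command(script, args=(), err_file="job.err", mail_on_completion=True):
    """Return the argument list for submitting a script to the Grid with bsub."""
    # -e names the error log that is written on abnormal termination.
    argv = ["bsub", "-e", err_file]
    if not mail_on_completion:
        # Sending the job output to /dev/null suppresses the completion email.
        argv += ["-o", "/dev/null"]
    argv.append(script)
    argv.extend(str(arg) for arg in args)
    return argv


if __name__ == "__main__":
    params = ["p1", "p2", "p3", "p4"]
    print(shlex.join(bsub_command("cFile1a.pl", params)))
    print(shlex.join(bsub_command("cFile1a.pl", params, mail_on_completion=False)))
```

On the RC Grid the returned list could be passed to subprocess.run(argv, check=True) to perform the actual submission.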

bjobs

To monitor the status of any jobs you have submitted to the RC Grid use the bjobs command. The basic use of bjobs is as follows (for full details use the command "man bjobs"):

  • bjobs -aw

The -a parameter will cause information to be displayed about all jobs currently running and those which have completed within the last 60 minutes.

The -w parameter will cause the full job name(s) to be output, otherwise you will see the job name in an abbreviated form.
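
If you want to check on jobs from a script, the bjobs output can be parsed. The sketch below assumes the usual layout in which the first three columns are JOBID, USER and STAT, and it reads only those columns because the job name may itself contain spaces. The function name parse_bjobs is made up, and the sample output (user, queue and host names included) is invented for illustration.

```python
def parse_bjobs(output):
    """Return a list of (job_id, user, status) tuples from bjobs text output."""
    jobs = []
    for line in output.splitlines():
        fields = line.split(None, 3)
        # Skip blank lines, the header row, and any line that does not
        # start with a numeric job ID.
        if len(fields) < 3 or not fields[0].isdigit():
            continue
        jobs.append((int(fields[0]), fields[1], fields[2]))
    return jobs


if __name__ == "__main__":
    # Invented sample output, used only to exercise the parser.
    sample = (
        "JOBID USER     STAT QUEUE  FROM_HOST EXEC_HOST JOB_NAME SUBMIT_TIME\n"
        "1234  uniqname RUN  normal head      node01    cFile1a.pl p1 p2 p3 p4 Oct 12 10:15\n"
        "1230  uniqname DONE normal head      node02    cFile1a.pl p1 p2 p3 p4 Oct 12 09:40\n"
    )
    for job_id, user, status in parse_bjobs(sample):
        print(job_id, user, status)
```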

bkill

To terminate any jobs you have submitted to the RC Grid use the bkill command. The basic use of bkill is as follows:

  • bkill JobID

The JobID is a field output by the bjobs command. There is one per job. Enter the JobID of the job which you want terminated (needless to say, you can only terminate jobs launched by your user ID).
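
When bkill is called from a script it is worth validating the JobID first, so that a stray header word or empty string never reaches the command line. A minimal sketch follows; the function name bkill_command is made up for illustration and it only builds the argument list.

```python
def bkill_command(job_id):
    """Return the argument list for terminating one job by its JobID."""
    # int() raises ValueError for anything that is not a whole number.
    number = int(job_id)
    if number <= 0:
        raise ValueError("JobID must be a positive integer")
    return ["bkill", str(number)]


if __name__ == "__main__":
    print(" ".join(bkill_command("1234")))
```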

Datastream Basic Usage

Using Datastream's AFO (Advance For Office) Excel 2010 plug-in:

To begin:

  • Launch Excel
  • Select the Datastream tab

There are essentially two types of searches:

1.) Static

Select "Static Request"

This is used to find data which either do not change over time or change only infrequently (for example, name, ISIN, location, etc.). For those items which change *infrequently* over time there will be multiple entries within a single static variable entry. To my knowledge, no list exists of the variables which exhibit the "change infrequently" characteristic.

2.) Time Series

Select "Time Series"

This is used to find data which changes regularly over time (typically all income statement, balance sheet items, etc.).

For each type of search you must identify the sample universe

This can be done in two ways.

  1. Select a previously defined list. Select the icon just beneath the "Find Series" button (it shows a magnifying glass). This displays the names of all previously user-defined lists. Double-click on one to select it.
  2. Identify a sample using the Navigator: Select "Find Series," then select Criteria Search. This brings up the Navigator proper.

First, select a Data Category (selectable list is to the immediate right)

Then there are a variety of ways to identify firms. Most are self-explanatory. Unfortunately there doesn't exist any documentation which describes all of the available search options (this according to the Datastream support folks).

Determining how best to get at your firms of interest can be extremely tricky. However, once you have made your search criterion selection, click on "Search Now". The results of your search will now be displayed. Here is where things get really interesting. No matter how many results are displayed, you can only select 100 at a time to include in your actual search. Assuming that your sample universe consists of more than 100 firms, you have the choice of (a) manually selecting 100, processing them, then manually selecting the next 100, etc., (b) shrinking the list size by judicious use of the Name search criteria (this too is obnoxious: no wildcard scheme exists, so you cannot identify ranges but rather must explicitly identify starting letters/numbers), or (c) creating a list (which also has its problems).
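
For option (a) it helps to know up front how many passes you are in for. The sketch below splits a sample universe into groups of at most 100, matching the selection limit described above. The function name batches and the firm identifiers are made up for illustration.

```python
def batches(items, size=100):
    """Split a list into consecutive groups of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    return [items[start:start + size] for start in range(0, len(items), size)]


if __name__ == "__main__":
    # Hypothetical identifiers standing in for a 250-firm sample universe.
    firms = [f"FIRM{number:04d}" for number in range(1, 251)]
    groups = batches(firms)
    print(len(groups), [len(group) for group in groups])  # prints: 3 [100, 100, 50]
```

A 250-firm universe therefore takes three manual passes, which is usually the point at which creating a list becomes the less painful choice.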

To create a list:

  1. Bring up the Navigator
  2. Select appropriate search criterion
  3. Select Search Now
  4. Select the icon on the far right (next to the "Displayed Results" box) which contains a tiny Excel symbol. A dialog-box will appear. Note that you are restricted to exporting a maximum of 8000 firms. If you've got more than 2500 in your sample universe, then you're going to have to do some of the pruning that is suggested in (b) above.
  5. Choose which variables you want to export to Excel (probably doesn't hurt to export all of them, which is the default)
  6. Select Transfer to Excel
  7. Select the down-arrow next to the Save button
  8. Select Save-As
  9. Close dialog-box
  10. Open the newly created Excel file
  11. Select the Datastream tab
  12. Select "Create List (From Range)"
  13. Choose the column which corresponds to the DS Code (which is the Datastream proprietary ID)
  14. Enter a meaningful name under "List Description" as this is what will appear when you go to select a previously defined list (see above).
  15. Enter a file name under "List File Name". Do not change the "LLT" suffix!
  16. Select OK
  17. Close the Excel file

Now, if you go back to step 1 given above (selecting a previously defined list), the list you just created will be displayed using the information you provided in the "List Description" field.

Identify the data items of interest

  • Select Datatypes
  • Select the appropriate Data Category (should match the Data Category you used to identify your sample firms)

Within a given data category, there are many, many data types from which to choose. You can drill down by opening and closing and selecting sub-categories (works like the Windows Explorer tool).

Use the Datatype bar options to restrict the displayed variables to the type of search you are doing. Either Static or Time Series. If you are conducting a Time Series search and select some Static variables then you will obtain no output for those static variables. The same is true for selecting Time Series variables in a Static search.

Also useful is the Find bar (it appears just above the Datatype bar and just below the Data Category selection line). This will allow you to filter the displayed variables based upon various elements of their names.

Once you've identified an appropriate sub-set of potential variables, select the ones you want by clicking on the empty box to the right of the variable name. A check-mark will appear (this is a toggle). Once you have clicked on all variables of interest then click on "Use Selected" (which appears in the Variable box header to the right of Name).

The procedure from this point depends upon the type of search you are conducting.

Static search: select Submit. The firms will be listed vertically starting in the column in which the cursor was located when you started the Datastream search process. Each data item will occupy a separate column.

Time Series search: Enter a Start Date, enter an End Date, enter a data frequency, select Transpose Data, select Submit. The firms will be listed vertically starting in the column in which the cursor was located when you started the Datastream search process. There will be one line per firm per data variable. Each date will occupy a separate column. This display would be reversed if you did not opt to select "Transpose Data" above. However, most folks seem to find it easier to post-process firm/variables as rows and dates as columns.
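
As an illustration of that post-processing, the sketch below turns the transposed layout (one row per firm per variable, one column per date) into one record per firm, variable and date. It assumes the sheet has been saved as CSV with a header row whose first two columns identify the firm and the variable and whose remaining columns are dates. That layout, the function name wide_to_long and the sample values are all assumptions made for illustration.

```python
import csv
import io


def wide_to_long(rows):
    """Convert rows of [firm, variable, value1, value2, ...] into
    (firm, variable, date, value) tuples, taking dates from the header row."""
    rows = iter(rows)
    header = next(rows)
    dates = header[2:]
    long_rows = []
    for row in rows:
        firm, variable, values = row[0], row[1], row[2:]
        for date, value in zip(dates, values):
            long_rows.append((firm, variable, date, value))
    return long_rows


if __name__ == "__main__":
    # Made-up data in the assumed layout.
    sample = (
        "Firm,Variable,2009-03-31,2009-06-30\n"
        "AAA,SALES,10,12\n"
        "AAA,ASSETS,100,105\n"
    )
    for record in wide_to_long(csv.reader(io.StringIO(sample))):
        print(record)
```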

Dealscan Basic Usage

On the RC Grid all available Dealscan data (which is through 4Q2009) is located in /home/databases/dealscan.

There are two documents which comprise the entirety of the documentation provided by the Dealscan folks:

1) UniversityFeedSchema.jpg

This is an image of the layout of the various Dealscan tables and how they are linked.

2) UniversityFeedDictionary.xls

This spreadsheet contains all the variable definitions for all the tables (Sheet 1). The use of the data in Sheets 2 and 3 is not clear. Table names in Sheet 2 don't align with table names in the Schema. Given that the Dealscan we were dealing with got gobbled up by Thomson-Reuters just as we were acquiring the data (back in 2010) and since we don't subscribe to the Thomson-Reuters Dealscan data, we don't have access to anyone who could provide further explanations.

On the RC Grid, in the /home/databases/dealscan directory, is a set of 22 SAS data sets which correspond to the tables found in the Schema. Those tables are also available in Stata format and can be found in the stata subdirectory. The variable names in the SAS/Stata data sets should be the same as those found in their corresponding tables on the Schema (this can be easily confirmed in SAS via a Proc Contents).
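
One way to run that Proc Contents check over every data set at once is to generate a short SAS program and run it in batch mode. The sketch below writes out the program text; the function name proc_contents_program, the libref dscan and the suggested file name are made up for illustration.

```python
import re


def proc_contents_program(directory="/home/databases/dealscan", libref="dscan"):
    """Return SAS code that lists the variables of every data set in a directory."""
    # A SAS libref is at most 8 characters: a letter or underscore followed
    # by letters, digits or underscores.
    if not re.fullmatch(r"[A-Za-z_][A-Za-z0-9_]{0,7}", libref):
        raise ValueError("invalid SAS libref: " + libref)
    return (
        f'libname {libref} "{directory}" access=readonly;\n'
        f"proc contents data={libref}._all_;\n"
        "run;\n"
    )


if __name__ == "__main__":
    print(proc_contents_program(), end="")
```

Saving the printed text as, say, dealscan_contents.sas and running sas dealscan_contents should leave the variable listings in dealscan_contents.lst.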

There is a README file in /home/databases/dealscan that essentially outlines what is described herein.

Finding WRDS SAS Data Set Layouts & Variable Information

How to find WRDS SAS data set layouts

  1. Log on to the WRDS web site
  2. Select Support
  3. Select Dataset List (under the Data Contents section)
  4. Scroll down until you find the name of the Database of interest and select it
  5. Scroll through the list of available tables and select the one of interest. A table will be displayed which describes the contents of the SAS data set plus a typically not very insightful description of the variables.

How to find additional variable description information

  1. Log on to the WRDS web site
  2. Select Support
  3. Select Data Vendor Manuals (under the Data Contents section)
  4. Select the appropriate data set. The value of the manual and the ease of use varies across vendors.

An example of using Compustat:

  • Select Compustat
  • Select Compustat Online Manual
  • Select the Search tab, type in the variable name found in the table layout, press ENTER. This will give you a list of all references within the Compustat manual for that variable name. Since you are entering a specific variable name this usually resolves to a single item, the selection of which takes you to the page which describes the variable. Do this for each variable of interest.