Friday, 18 March 2011

How To Ease The Pain of SAS in Production - Part 2

In the first part we looked at the some of the data issues that I've found using SAS in production. Here are some of the coding issues.

SAS code is programming. Use descriptive names for variables and tables. Indent. Comment.
Since you’re going to delete all your working tables, don’t re-use them in a proc sort’s, data’s and sql’s. I call the de-duped result of proc sort on table X, “sorted_X”. Then I know where I am and so do you when you read the code.

You should not put usernames and passwords in code. And passwords expire. Use an input form to get those – make friends with the %Window macro.

Unless everyone in the team has the same SAS database modules use CONNECT TO ODBC (PROMPT) to access external databases.

On Mondays and Wednesdays I prefer PROC SQL to DATA steps, and on Tuesdays and Thursdays I prefer DATA steps. On Friday, I have no opinion. DATA steps are usually faster, PROC SQL is brain-space saving cross-use of a skill between SAS and the outside world.

SAS encourages you to fix problems in a production run by hacking at the original script: a quick resubmit here to run that little bit, a change of dates there, a little extra code… Do NOT do this: by the time you’ve finished the whole thing, you’re too tired and confused to un-hack the script. Next time you use it, it will not work. Fix the script. Re-run it from the start EVERY TIME. Then when it works, you know it will work next time.

SAS is too clever for its own good with dates. If it sees that a destination field in another database is a DATE field, it will assume the source field is a SAS date and translate that date into the destination database’s date format. This means you need to get Oracle dates like this: DATE - to_date('1960-01-01','yyyy-mm-dd').

If you’re starting by taking data from a database into a SAS table, do as much processing of the data in the SQL as possible, even data type conversions. I use a lot of pass-through queries to an Oracle database, and I find it a lot easier to write Oracle SQL than work out how to do the same thing in SAS. Also, lots of data manipulation is SAS makes for really scrappy looking code.

Sure, when SAS has a powerful bit of functionality it is really powerful (DATA / UPDATE combines a SQL append and update query in one simple bit of code), but when it doesn't, it hurts. This is how you do a string concatenation that references a couple of variables (startdate and enddate) in VBA…

Dim let dateConditionString as string
dateConditionString = "and DATE_B is null and DATE_A BETWEEN to_date('” & startdate & “,'yyyy-mm-dd') AND to_date('” & enddate &”,'yyyy-mm-dd')”

and this is how you do it in SAS…

%let dateConditionString = "and DATE_B is null and DATE_A BETWEEN to_date('&startdate','yyyy-mm-dd') AND to_date('&enddate','yyyy-mm-dd')";
data _null_;
call symput("dateConditionString", compress(&dateConditionString, '"'));
run;

This is not even scripting. It's DOS-level batch coding.

No comments:

Post a Comment