Time-to-event processing

[last updated - 17 August 2003]

The easiest way to do time-to-event processing is to "flatten" your data so that you have only one observation per "by group" (usually defined by a single variable such as "subject"). Once all the data for a by group sits in one observation, together with a count of how many observations that by group originally had, you can use array processing to loop through the values. Since several variables usually need to be flattened, I wrote a macro named flatten that does all the "proc transposes" on these variables, counts the number of observations per "by group" and adds that count to the output dataset.
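To make the idea concrete, here is a rough hand-written sketch of the same kind of transformation for two variables. It is not the source of the flatten macro; the dataset names (demo, dt_wide, val_wide, counts, demo_flat) are made up for illustration and the macro itself may well work differently.

```sas
/* Hypothetical input: two subjects with a different number of visits each */
data demo;
  subj=1; dt='01jan03'd; val=10; output;
  subj=1; dt='01feb03'd; val=20; output;
  subj=2; dt='01jan03'd; val=30; output;
  format dt date7.;
run;

/* PROC TRANSPOSE with a BY statement needs the data sorted by that variable */
proc sort data=demo;
  by subj;
run;

/* One transpose per variable; prefix= names the new columns dt1, dt2, ... */
proc transpose data=demo out=dt_wide(drop=_name_) prefix=dt;
  by subj;
  var dt;
run;

proc transpose data=demo out=val_wide(drop=_name_) prefix=val;
  by subj;
  var val;
run;

/* Count the observations in each by group */
data counts(keep=subj nobs);
  set demo;
  by subj;
  if first.subj then nobs=0;
  nobs+1;
  if last.subj then output;
run;

/* Put the pieces together: one observation per subj */
data demo_flat;
  merge counts dt_wide val_wide;
  by subj;
run;
```

With more than a couple of variables this gets repetitive quickly, which is the reason for wrapping it in a macro.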

Below is an extremely simple example of code to find the date at which a value rose above 1000. This is far simpler than anything you will need to do but serves to show how the data is transformed with the flatten macro and to show how to loop through the data.

data test;
  subj=1234;
  dt='01jan03'd;val=0;output;
  dt='01feb03'd;val=500;output;
  dt='01mar03'd;val=1005;output;
  dt='01apr03'd;val=2005;output;
  subj=2345;
  dt='01jan03'd;val=100;output;
  dt='01feb03'd;val=100;output;
  dt='01mar03'd;val=100;output;
  format dt date7.;
run;

%flatten(dsin=test,bygroup=subj,vars=dt val)
%put ********* _maxn_=&_maxn_;

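data t2event(keep=subj date);
  /* test now holds one observation per subj (see the log below) */
  set test;
  /* the colon picks up every variable whose name starts with dt or val */
  array dt {*} dt:;
  array val {*} val:;
  put (_all_) (=);
  /* only loop over the values that exist for this subj */
  do i=1 to nobs;
    if val(i)>1000 then do;
      date=dt(i);
      output;
      /* force the loop to end after the first value above 1000 */
      i=nobs;
    end;
  end;
  format date date7.;
run;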

data _null_;
  set t2event;
  put (_all_) (=);
run;

And here is some of the log output.

426  %put ********* _maxn_=&_maxn_;
********* _maxn_=4



subj=1234 nobs=4 dt1=01JAN03 dt2=01FEB03 dt3=01MAR03 dt4=01APR03 val1=0 val2=500 val3=1005 val4=2005
subj=2345 nobs=3 dt1=01JAN03 dt2=01FEB03 dt3=01MAR03 dt4=. val1=100 val2=100 val3=100 val4=.
NOTE: There were 2 observations read from the data set WORK.TEST.
NOTE: The data set WORK.T2EVENT has 1 observations and 2 variables.
NOTE: DATA statement used:
      real time           0.05 seconds



subj=1234 date=01MAR03
NOTE: There were 1 observations read from the data set WORK.T2EVENT.
NOTE: DATA statement used:
      real time           0.00 seconds

Some notes on the code. The largest number of observations found in any by group is written to the global macro variable _maxn_. You can use this in your array statement like this:

array dt {*} dt1-dt&_maxn_;

...but you usually do not need to refer to it as you can refer to a list of variables using a colon trailer as was done in the code.
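One case where the explicit range may be the safer choice is when the dataset holds other variables that share the prefix, since a colon list such as "dt:" matches every variable whose name starts with those letters. The sketch below shows the explicit form in a complete step. The %let is only a stand-in for the value the macro would set, the flat dataset is typed in by hand to mirror the log output above, and the names flat, firsthit, dts and vals are made up for this illustration. It also uses the LEAVE statement as an alternative way of ending the loop.

```sas
/* Stand-in for the value the flatten macro would set; assumed so the step runs alone */
%let _maxn_=4;

/* Hypothetical flattened data typed in by hand, mirroring the log output above */
data flat;
  subj=1234; nobs=4;
  dt1='01jan03'd; dt2='01feb03'd; dt3='01mar03'd; dt4='01apr03'd;
  val1=0; val2=500; val3=1005; val4=2005;
  output;
  subj=2345; nobs=3;
  dt4=.;
  val1=100; val2=100; val3=100; val4=.;
  output;
  format dt1-dt4 date7.;
run;

data firsthit(keep=subj date);
  set flat;
  /* Explicit ranges pick up exactly dt1-dt4 and val1-val4 and nothing else */
  array dts  {*} dt1-dt&_maxn_;
  array vals {*} val1-val&_maxn_;
  do i=1 to nobs;
    if vals(i)>1000 then do;
      date=dts(i);
      output;
      leave;  /* stop at the first value above 1000 */
    end;
  end;
  format date date7.;
run;
```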

Note that I am using a form of "put _all_" that you might not be familiar with. I have used "put (_all_) (=)" to avoid writing out the automatic variables _N_ and _ERROR_. You can read more about this on the SAS web site.
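If you want to see the difference for yourself, a small step like the following should do it. The variables and values are made up for illustration.

```sas
data _null_;
  subj=1234;
  val=500;
  put _all_;        /* writes subj and val plus the automatic variables _ERROR_ and _N_ */
  put (_all_) (=);  /* writes only subj and val, each as name=value */
run;
```

The first PUT line in the log should include _ERROR_ and _N_ while the second should not.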
