Other Clinical Issues

(Last updated: 05 September 2003)

Introduction

In other parts of this site you will find many macros, nearly all of which were written in support of clinical reporting programming. On the page dedicated to Clinical reporting macros I have included information about how to structure your clinical reporting. The page you are reading now takes that discussion further and fills in the smaller details. The information in these pages is intended to get you thinking about your own reporting environment.

Data cleaning

From my own experience as a SAS clinical programmer, most of my time gets used up resolving data issues. Data cleaning is, of course, the job of the Data Management section and will be done in accordance with their "Data Validation" document (or whatever it is named). But it is one of the responsibilities of statisticians to have input into that document and to define the checks to be done on the data, and this is where good programmer input is needed so that the checks actually catch data problems. This is often not done thoroughly enough by the programmers, or perhaps they are not even invited to contribute, so it is no wonder they are plagued by data problems at a later date.

A bit more organization here, to build up a battery of programmed data checks that could be run on the data regularly and early enough in the collection of data, would go a long way towards reducing programmer problems with data at the late and busy stage of clinical reporting. Time spent after all the data has come in is expensive, because it delays bringing a drug to market. There are huge cost benefits to be gained from keeping the time from "last CRF page in" to "reports produced" to a minimum, whereas time before the last CRF page comes in is relatively cheap. It is much better to do as much work as you can up front, especially in the area of data cleaning, where problems are well known to cause delays.

Pharmaceutical companies put great emphasis on keeping the time from database freeze to reports delivered to a minimum, but they fail to spot that the database freeze itself can be delayed by a week or two, or even longer, by data problems. If the checking is well organized and done early then the freeze will occur sooner, and this is every bit as important as keeping the time from database freeze to reports delivered short. Because there is usually a clear demarcation between the Data Management function and the Stats and Reporting function, this delay to database freeze never gets addressed. And yet it is usually caused by programmers not putting sufficient input into the data validation document, or not having the chance to. It is the whole of the time from last CRF page in to statistical reports delivered to regulatory authorities that needs to be minimized to realize the cost benefits.
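To give a flavor of what I mean by a programmed data check, here is a minimal sketch. The data set and variable names (vitals, subject, visit, sbp) and the range limits are invented for the illustration; a battery of checks like this could be rerun every time new data arrives, with any non-empty output passed back to Data Management.

    * A minimal sketch of a programmed data check. The data set, ;
    * variable names and limits are invented for illustration.   ;
    proc sql;
      create table sbpchk as
      select subject, visit, sbp
      from vitals
      where sbp is not null and (sbp lt 60 or sbp gt 250)
      order by subject, visit;
    quit;

    proc print data=sbpchk noobs;
      title1 "Systolic BP outside the plausible range (60-250)";
    run;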

The Sharing of Skills

I am not a Head of Biostatistics nor am I a statistician. I am just a programmer, but one with many years of experience with SAS. I used to be in mainstream computing, programming in Cobol as my main language with Microsoft Basic as my second language. In those days they trained you as a programmer. They expected very little from you for the first six months - just promising signs that you would develop in the right direction. One year to eighteen months was a more realistic plan for turning out a good programmer. You worked for a supervisor who monitored your progress as a programmer - a very experienced person you could learn from. Programming styles can clash, but you became a good programmer by drawing on somebody else's style and experience - if not your supervisor's then somebody else's. Eventually you develop your own style. But the point I am trying to stress is that the most important part of becoming a good programmer is making use of the experience of skilled programmers. I do not see how a programmer can become a good one without going through this process, although occasionally it happens.

Most SAS programmers have not had the benefit of this sort of transfer of skills and knowledge and, compared to the rest of the IT world, SAS programmers are well behind their colleagues in terms of expertise. To me this is purely due to the lack of transfer of skills. The SAS Institute has perhaps recognized this and offers certification courses that people can take. But the SAS language is just a tool. A tool-maker should not be telling the builder how to build, and if the builder is relying on the tool-maker to tell them how to build then something is wrong.

So what I am suggesting is that this transfer of skills is not happening among SAS programmers and the situation needs to be remedied. What is needed is some of the old-fashioned transfer of skills and passing on of knowledge that happens in mainstream IT. But because good skills are thin on the ground, an organized effort is needed to pass skills on to other SAS programmers - not only within organizations but across organizations as well, because these skills are so thinly distributed. This will not happen through good intentions alone; it will require a driving and steering force. What is needed are meetings between pharmaceutical companies at which SAS programmers share their skills. The programmers should be allowed the time to develop their talks. People need to be appointed within these organizations to galvanize these activities and ensure that skills do indeed get transferred in a measurable way, such that the whole programming effort is changed for the better. If this is done then clinical reporting SAS programmers could be brought up to speed with other parts of the IT industry.

I hope that the macros I have provided on this web site will go some way towards initiating this proposed process of sharing skills. But first, SAS programmers will need to be able to find this web site. Do they have access to the Internet at work? Will they be able to allocate any time to reading what is here? Will they even be allowed to? Are they encouraged in this sort of activity, or discouraged? Are they so tied up in project work that they have no spare time - in which case they might be spending days developing something that is already on this web site? And don't forget, everything on this web site has been donated to the public domain. There are more than 100 macros webbed here. Do programmers know what those macros do? I very much doubt it, since there are so many of them.

The point I am trying to make is that time needs to be allocated to enable the process of sharing skills - time that cannot be charged to a specific project and that, initially, will have no measurable benefit. Perhaps it would be a good idea to appoint a senior programmer to oversee and coordinate the sharing of knowledge both within and outside the organization, such that this person is temporarily not linked to any project work. They could then identify each group of programmers and negotiate with the project manager a suitable date to commence the pooling and sharing of code within that group, and also ensure that there is at least some representation from that group at regular programmers' meetings whose sole purpose is to increase awareness of skills and code-sharing issues. These meetings should stay unadulterated by other programming or company issues, since I feel this process could easily go off the rails. If done like this, and if the process stays on the rails, then there should come a point where there is a measurable increase in efficiency, as programmers use good examples of pooled code (and know where to find it) and SAS skills are passed on through internal training or a popular newsletter. But I would stress that, in my opinion, it will not happen by itself or through good intentions alone.

Reporting Systems designed by Committee and the 80% rule

You decide you want a new reporting system in your organization. You appoint twenty people or more to bring together ideas about it and come up with a proposal that gets circulated to every staff member connected with it in the organization. They all have their say and feed back their thoughts. At some stage, when all the ideas are gathered, you put out the final proposal for acceptance or rejection. It gets accepted. Now, since everybody has had plenty of time to think about it carefully and everybody has had their say, you can't go wrong. Right?

Wrong! This is a system designed by committee. It is designed in a way that aims to keep too many people happy. It will most likely be a very complex system that requires huge resources to write and is expensive and difficult to maintain. You can't keep everybody happy and still keep things simple. Ten people may be perfectly happy with a report template that only one person has to code. If this turns out to be extremely difficult to code and a potential nightmare to maintain, is it right just because ten people have agreed to it and only one programmer is struggling with it?

If you want a simple and maintainable system that makes reporting efficient then you should accept that you will be sacrificing some functionality in the process. It is not possible to keep everybody happy and keep your reporting efficient; some things will have to go. You should be aiming at a reporting system that can do 80% of what people have asked for. If a template proves extremely difficult to code then you should be looking at a redesign. Maybe some of those useful extra pieces of information will have to be shown on a different report, so that you can use "proc report" or "proc tabulate" in a more normal way and write the report in 30 lines of code rather than 3000. The "committee" should not have the final say on the design. They should be made to understand from the beginning that the final design may be only 80% of what they want. In gathering ideas there should be a distinction between what is essential to have and what is merely desirable. If split like that, the resulting system will be better and simpler. Much of what is desirable but not essential may have to be dropped, and even some of the essential things may need to be thought around and reworked to achieve simplicity while still delivering functionality.

Intermediate data sets

The biggest favor you can do yourselves is to report from intermediate "value added" data sets. These data sets will have all calculated values present in them. But since these pages are for discussion, I will give my reasons and the thinking behind this recommendation. It might not be right for your site.

For me, "simple is best". If your code is simple it will be easier to maintain and it will last longer. Complex systems need a base of skilled people to write them in the first place and another base of skilled people to maintain them thereafter. Programmers who create complex systems are creative programmers and may not fit comfortably into a maintenance role. They will move on, and you will need a base of equally skilled people to maintain the system. So it is possible to support a complex system, provided you have a good base of very skilled programmers. They will be the ones who make the amendments and "validate" them. Efficiencies can be achieved this way, as well as an assurance of quality. But we live in competitive times. The work that was done almost exclusively by programmers within pharmaceutical organizations is now going out to CROs and software houses where labor is cheap. Accountants in pharmaceutical companies are trying to reduce costs. I would say that the days of complex systems and the base of skilled programmers within organizations are numbered. They are too expensive. If we can sweep away the complexity and replace it with a "simplicity" we can trust, then we are on the way to becoming more efficient.

Listing and table code would be a lot simpler if listings only listed and tables only tabulated. If you can avoid deriving variables within your listing and table code then your code will be shorter and simpler. It is also unwise to derive values in your listing and table code, since the same value might end up being derived in different ways, leading to different values in some circumstances. So if you need to report on a derived variable it is better to put it in the data set itself - that is, to report from derived data sets with all the important values added. That way there will be consistency in the values as well as simpler table and listing code. There is less code in coding something once rather than many times. And if your code is in the creation of your "intermediate" data sets (or whatever else you choose to call them), then this code will be relatively stable from study to study, and reusable.
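As a minimal sketch of what I mean (the libnames, data set and variable names are all invented for the illustration), the derivation is done once in the intermediate data set and the listing code then only lists:

    * Derive the value once, in the intermediate data set... ;
    data derived.vitals;
      set raw.vitals;
      chgsbp=sbp-basesbp;   * change from baseline, derived here only ;
    run;

    * ...so that the listing code only lists. ;
    proc report data=derived.vitals nowd headline;
      columns subject visit sbp basesbp chgsbp;
      define subject / order   "Subject";
      define visit   / order   "Visit";
      define sbp     / display "Systolic BP";
      define basesbp / display "Baseline";
      define chgsbp  / display "Change";
    run;

If a second report also needs the change from baseline, it reads the same intermediate data set, so the value can never disagree between outputs.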

So to summarize: it is better to report from intermediate "value added" data sets with all derived values present, and to reduce your listings and table code to "listings only list, tables only tabulate". That way the code stays simple, maintainable and reusable. Allow very few exceptions to that rule.

The Population data set

You will find it very useful to have a data set somewhere that records which treatment arm each subject is in and which populations they belong to, such as ITT, Per-Protocol etc. But it is better to keep this as a data set separate from those that hold information about each subject. The reason is that you should allow for a subject to be in more than one treatment arm for the safety population. Very rarely, a subject will be given a drug from a different treatment arm, or will share a randomized medication with another subject. This means that if they experience an AE whilst on the "wrong" medication, you may want to represent that person in more than one treatment arm when doing your safety reporting. It is complicated and it is a nuisance, but it is sometimes a fact of life. If you have a separate population data set then, when these rare situations occur, you can add a duplicate entry for the subject in the other treatment arm, with only the safety population flag set.

One more thing about the population data set: it keeps the code a lot neater if you use numeric population flags, with 0 to indicate "not true" and 1 to indicate "true", and give them names like "itt", "perpro", "safety" and "all". In other words, give these variables the same names as the values you would typically pass into a macro as a population parameter.
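A population data set along these lines might look as follows (the subject numbers, arms and flag values are invented; note the duplicate entry for subject 003, with only the safety flag set, to cover the rare situation described above):

    * A sketch of a population data set using 0/1 numeric flags. ;
    data popds;
      input subject $ tmtarm $ itt perpro safety all;
    datalines;
    001 A 1 1 1 1
    002 A 1 0 1 1
    003 B 1 1 1 1
    003 C 0 0 1 0
    ;

With the flags held as 0/1 numerics, selecting a population is then just "where itt" or "where safety".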

tmtord vs. tmtarm

The variable that identifies the treatment arm often has alphabetic values such as "A", "B", "C" etc. What I would suggest is setting up a derived numeric treatment variable, which you might like to call "tmtord", such that 1 corresponds to the first treatment arm you want to appear on the report (either the first column or the first page), 2 corresponds to the second, and so on - and then using this tmtord in all your reports and never the tmtarm (or whatever the corresponding variable is named).
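A sketch of how tmtord might be derived, assuming for the illustration that arm "B" is the one you want shown first:

    * Derive a numeric treatment order variable from the alphabetic ;
    * treatment arm (the arm values are assumed for illustration).  ;
    data popds;
      set popds;
      select (tmtarm);
        when ("B") tmtord=1;   * shown in the first column ;
        when ("A") tmtord=2;
        when ("C") tmtord=3;
        otherwise  tmtord=.;
      end;
    run;

If the sponsor later wants the columns in a different order, only this one assignment changes and none of the reporting code needs to be touched.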

Other treatment variables

There is no reason why you should limit yourself to one treatment order variable. You may want to divide up the analysis, and there may be some advantage in adding other treatment variables based on tmtord and other things such as age range. You could then assign numbers to these to match the order you want them to appear in the reports. Each treatment variable you create should have its own format so that it is clear what each value represents. And if it has its own format then you can use the popfmt macro to give you population totals, so long as these extra treatment variables exist in the population data set.
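For instance (a sketch only - the age split, the values, the format name and the assumption that age is held in the population data set are all invented for the illustration), a second treatment variable that subdivides the active arm by age range might be set up like this, with its own format:

    * A second treatment order variable splitting the active arm ;
    * by age range, with its own format (names invented).        ;
    proc format;
      value tmtagef
        1="Active (<65 years)"
        2="Active (>=65 years)"
        3="Placebo";
    run;

    data popds;
      set popds;
      if tmtord=1 then do;
        if age lt 65 then tmtage=1;
        else tmtage=2;
      end;
      else if tmtord=2 then tmtage=3;
      format tmtage tmtagef.;
    run;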

A Code Repository

One aspect of achieving greater efficiency should be more sharing of code and techniques and the elimination of needless repetition of work. Nobody should be writing code for the same listing or table twice. There should be one good example of this code kept somewhere, indexed in such a way that programmers know where to look for it. To this end, I would recommend setting up a code repository where all the table and listing code members have meaningful names that explain what they do, and maybe another library where the first detailed page of each output is shown so people can see at a glance whether it is suitable for use. This library will need to be maintained by a skilled programmer so that new, well-written members can be added. The code to build the value-added data sets could also be stored there - maybe in different versions to cope with different styles of input data, together with an explanation. For statistical programs there could be a separate code library maintained by statisticians and worked over by a skilled programmer to make sure the code is understandable and efficient.

The ultimate form of this idea would be a library of code so complete that you could pick out the code members you want, map them to new names if required, and have them automatically copied down to your study programs library with the names changed and maybe even with the listing and table titles generated. But if somebody were to write such a system then I would recommend keeping it simple.

Making Code Independent of Treatment Variable and Population

Obviously, somewhere in your code, you will be selecting on population and possibly on treatment variable, if you use more than one of these. Suppose each code member starts with a project macro that does things like setting up global macro variables and calling the jobinfo macro to set up important global macro variables to do with the job. You could pass the population and treatment to this project macro and have it do some useful extra work for you. It could set up a global macro variable called _pop_ that contains the name of the numeric population variable, and also _tmtvar_ and _tmtfmt_ set to the treatment variable you will be using and its format. Once these are set, you can use the expression "where &_pop_" in your code, and all references to the treatment variable can be to &_tmtvar_ instead. That way the same code member can work for different populations and treatment groups.
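A minimal sketch of how the project macro might set these up. The macro name "projstart", its internals and the format name "tmtordf." are hypothetical; your own project macro, and where it calls jobinfo, will differ:

    %macro projstart(pop=,tmtvar=,tmtfmt=);
      %global _pop_ _tmtvar_ _tmtfmt_;
      %let _pop_=&pop;
      %let _tmtvar_=&tmtvar;
      %let _tmtfmt_=&tmtfmt;
      %*- other project setup and the call to jobinfo would go here -;
    %mend projstart;

    %projstart(pop=safety,tmtvar=tmtord,tmtfmt=tmtordf.);

    *- the reporting code is then independent of both choices -;
    data report1;
      set popds(where=(&_pop_));
      format &_tmtvar_ &_tmtfmt_;
    run;

Rerunning the same code member for the ITT population is then just a matter of calling the project macro with pop=itt; nothing below it changes.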
