Is SEND the key to unlocking Historical Control Data?

“You wait all day for a bus and then three come along at once.” It’s a phrase I used to hear a lot in my younger days when I would often ride public transport. There’s been some of that going on this week, though not with buses. For me, this week it’s been the role of SEND in Historical Control Data (HCD) systems.

It must have been a year or two since this topic last cropped up, but this week, completely coincidentally, it’s been raised several times by different people in completely different contexts. When that topic is raised, immediately people start considering the role of standardized data, and the possibilities SEND brings.

Any potential HCD system has three components:

  1. A large wealth of data to draw on
  2. A way of harmonizing those data, particularly when the data come from different sources utilizing different structures and terms
  3. A tool to query, aggregate and visualize the data

As an organization that for many years was simply a software vendor, tool development and data harmonization would have sat well within our comfort zone. Yet without being able to draw on a significant volume of electronic data, there wasn’t much value in developing such tools.

The SEND standard is now opening that up as a possibility. This is because more data are becoming available since CROs are no longer just supplying the PDF study report, but also providing standardized electronic data.

Standardized data means consistent data, regardless of CRO or data collection system. That’s the idea that really opens up the possibilities for HCD systems. That final stumbling block to the value of a system is now overcome.

As well as some organizations drawing on their own data, others are considering the possibility of pooling their control data. It’s an intriguing possibility. Some SEND tools, like Instem’s SEND Explorer, already have built-in visualizations for querying Historical Control Ranges. These would provide far more value when hooked up to such a vast database. This, in turn, raised the question of whether there’s a need for independent curation of the data and maintenance of the database.
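To make the idea concrete, here is a minimal sketch of the kind of aggregation such a tool performs: pooling control-group results from several studies and summarising them as a range. The study IDs, values, and the mean ± 2 SD summary are all illustrative assumptions; a real HCD system would draw on harmonized SEND datasets and stratify by species, strain, sex, age, route, and so on.

```python
import statistics

# Hypothetical pooled terminal body weights (g) from the control
# groups of three studies; purely illustrative data.
pooled_control_weights = {
    "STUDY-001": [410.2, 398.5, 422.1, 405.0],
    "STUDY-002": [415.7, 401.3, 418.9],
    "STUDY-003": [395.8, 409.4, 412.6, 400.1],
}

def control_range(pooled):
    """Summarise pooled control values as a simple mean +/- 2 SD range."""
    values = [v for study in pooled.values() for v in study]
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return {"n": len(values), "mean": round(mean, 1),
            "low": round(mean - 2 * sd, 1), "high": round(mean + 2 * sd, 1)}

print(control_range(pooled_control_weights))
```

The value of standardized data is visible even here: the pooling step only works because every study supplies the same endpoint in the same units and structure.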

Anyway, having not thought about HCD for a while, I was asked about it in the context of our own data collection systems, then received a query about SEND Explorer’s functionality, and then a question about the possibility of data curation. “Just like buses, three come along at once.”

Till next time,


Is the latest Technical Conformance Guide update the most important to date?

It was late 2020, during the FDA public webinar, as part of the CDISC face-to-face meeting, that the agency made the simplest of statements, which seemed to turn the world of SEND upside down:

The placement of a study into the eCTD format does not determine the SEND requirement

It was just one little line sitting innocuously amongst many others. Online chat lit up. For years, the CDISC fraternity had proclaimed the eCTD section, together with the study start date, as the key foundations for determining, absolutely, if a study required SEND datasets or not.

On that October morning, without fanfare or forewarning, the agency let us all know that simply was not the case. What followed was a period of questions and, honestly, confusion. Then, last week, the FDA’s Technical Conformance Guide v4.8 was published. Three additional pages of text were added under a new section called “Scope of SEND.”

Well, what does it say? It opens with the line, “The following is the Agency’s current thinking of the scope of SEND…”. So, everything stated is just the ‘current thinking’, and this target is moving.

It then broadly states that SEND datasets are required for as many studies as possible, an idea probably best captured in the line, “If the nonclinical pharmacology or toxicology study is required to support a regulatory decision by the Agency… then the nonclinical study would require SEND.”

Within this sweeping statement, the document then deals with some of the specifics, particularly focusing on areas that we’ve heard questioned over the past year. What happens when one study type incorporates endpoints from another study type? The document uses the example of general toxicity studies that incorporate cardiovascular safety pharmacology or genetic toxicity. It states that the study is still in scope for SEND, with the expectation that any data which could be rendered in SEND should be rendered in SEND.

The guide states that the age of the subject doesn’t impact the decision as to whether or not SEND is required. It states that SEND is required for juvenile studies as long as the study does not “…include multiple phases [which] cannot currently be modelled…”

It further states that “The requirement for SEND is not limited to the drug substance,” nor is it limited by “…study report status or the finalization of the study report…” Regarding GLP status, it says, “As both GLP and non-GLP toxicity studies may be submitted to the FDA to support clinical safety, the decision for inclusion of SEND is independent of GLP status.”

Over the course of three pages, the agency is attempting to be as clear and helpful as it can: if a study informs a regulatory decision about the clinical safety of the drug, and its data can be rendered in SEND, then those study data should be submitted in SEND.

However, the text is littered with caveats and exceptions, and so appears both explicit and vague at the same time. For this reason, it encourages sponsors to enter into “discussion with the review division when there is any ambiguity on the SEND requirement…”. We have to appreciate that this is an effort to clarify the expectations, and though it seems to leave many questions unanswered, one thing is certain: the agency is encouraging the submission of SEND datasets.

So, does your study require SEND? That question just got a whole lot harder to answer, but certainly the answer is now more likely to be, “Yes.”

Till next time,


Why Define-XML files give me the Happy Mondays

During my formative years, there was a band from the north of England, not far from where I was growing up, called the Happy Mondays. Probably the most notable thing about them was one bandmember called ‘Bez’. His contribution was somewhere between cheerleading and performance art, as all he appeared to do was dance like a maniac while shaking maracas like his life depended on it. Nobody really knew why, yet he was a key member of the band. We couldn’t hear the maracas, so he wasn’t adding anything musically, but somehow the band would not have been the same without him.

Bez is the best metaphor I can think of to describe how most of our community views that strange little XML file that accompanies the SEND .xpt files as part of the study package: the Define-XML file. It doesn’t contain data, just information about the data. We are not quite sure what it’s adding; we just know the package wouldn’t be complete without it.

I’ve heard it said that the FDA does not use the Define files. I’ve also heard it said that they do use them, and they are necessary for loading the data. I have seen FDA feedback provided to sponsors which remarks on the Define file. Still, I think there’s some confusion across the industry regarding what the FDA actually do with the Define file.

Most commercial SEND solutions produce a default Define file populated with some basic information, usually based on a generic template and often not very study-specific. The file follows the Define-XML standard, but it isn’t really adding any value. Usually, the intention is that such files are a starting point, and it’s assumed that the organization will complete them manually to tailor them to the individual study. From what I have seen, some organizations are doing exactly this and manually editing the XML. However, some organizations are not, and instead simply supply the default file that was auto-generated.
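For readers who have never opened one, the sketch below gives a flavour of what “information about the data” means. The XML fragment is a deliberately simplified, hypothetical illustration in the spirit of Define-XML, not the real schema (the actual standard is an ODM extension with namespaces and far richer metadata); the point is simply that the file describes each dataset rather than containing results.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily simplified fragment: two dataset descriptions.
# Real Define-XML carries variable-level metadata, codelists, origins,
# comments, and more.
DEFINE_FRAGMENT = """
<MetaDataVersion Name="Study ABC-123 SEND 3.1">
  <ItemGroupDef Name="BW" Label="Body Weights" Structure="One record per measurement"/>
  <ItemGroupDef Name="LB" Label="Laboratory Test Results" Structure="One record per result"/>
</MetaDataVersion>
"""

def list_datasets(xml_text):
    """Return (name, label) pairs for each dataset the file describes."""
    root = ET.fromstring(xml_text)
    return [(ig.get("Name"), ig.get("Label")) for ig in root.iter("ItemGroupDef")]

for name, label in list_datasets(DEFINE_FRAGMENT):
    print(f"{name}: {label}")
```

A well-formed Define file, tailored to the study, lets a consumer discover what the package contains before opening a single .xpt file; a generic template adds nothing beyond the dataset names.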

The three reasons why some organizations only supply the default file are quite clear:

  • They are unsure of the value of a well-formed Define file
  • They do not have the necessary tools to produce a well-formed Define file
  • They do not have the necessary expertise

This has been the way for some time now, but I’m starting to see a change in the tide here. As mentioned earlier, the FDA often include discussion of the quality of the Define file in their sponsor feedback. Also, tools will continue to be developed and improved to accommodate this.

The Define-XML standard is something separate from the SEND standard. The standard is the same for both clinical datasets (SDTM) and nonclinical (SEND). For this reason, this week I’ve been learning a little about how Define files are both produced and used in the clinical world. It appears that the clinical world has tapped into the value and purpose of the Define file, yet in nonclinical, we often still view it in a similar way to how my teenage self viewed that maraca-shaking curio.

Maybe in a future post, we can discuss some of the opportunities afforded by having well-formed Define files.

Till next time,


Can SEND datasets be fully compliant…but still wrong?

In a recent post, we discussed how there’s quite a bit of emphasis at the moment on ensuring SEND datasets are compliant with the SEND standard. Obviously, the main driver here is the activation of the FDA’s Technical Rejection Criteria, which will result in the agency automatically rejecting applications that do not meet the required criteria. We’ve also seen CDISC’s move into automated compliance checking with the initiation of the ‘CORE’ project. In fact, the whole topic of ‘compliance’ seems to be the biggest talking point in the world of SEND right now.

It should go without saying that no matter how compliant the datasets are, they also need to be a correct representation of the study data. However, it’s clear that some organizations are putting effort into compliance but forgetting the more basic principle of making sure the data are correct. I mean, there’s no point having beautifully rendered SEND datasets, with all variables populated and formatted in accordance with the standard, if the actual values are incorrect.

For clarification, I’m referring to occasions when a result in a SEND dataset doesn’t match the value in the study report. Typical issues include result data, say body weights, reported in SEND for dates long after the subject was terminated, or negative lab results that should never be negative. All are examples of data which can be rendered in a compliant manner, but which are still just obviously ‘wrong’.

Some of us may struggle to believe that could really occur, but it’s surprisingly easy to see how that can happen. Some organizations will collect data, complete the study, produce the PDF tables, review them and any such errors found would be corrected in the tables directly. At some point later, the SEND datasets are produced from an export from the data collection system. At this point, any corrections in the tables are not reflected in the data collection system and therefore not in the SEND Datasets.
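A first pass at catching the two obvious errors mentioned above can actually be automated. The sketch below uses simplified, hypothetical records loosely modelled on SEND’s BW and LB domains (real datasets are SAS transport files with many more variables, and the “never negative” test list is made up for illustration):

```python
from datetime import date

# Hypothetical subject disposition: subject -> termination date.
disposition = {"S001": date(2021, 3, 1)}

bw_records = [
    {"USUBJID": "S001", "BWDTC": date(2021, 2, 15), "BWSTRESN": 412.0},
    {"USUBJID": "S001", "BWDTC": date(2021, 4, 2), "BWSTRESN": 405.0},  # after termination!
]

lb_records = [
    {"USUBJID": "S001", "LBTESTCD": "ALT", "LBSTRESN": -3.2},  # negative!
    {"USUBJID": "S001", "LBTESTCD": "ALT", "LBSTRESN": 41.0},
]

def flag_posthumous_weights(records, disposition):
    """Flag body weights recorded after the subject was terminated."""
    return [r for r in records if r["BWDTC"] > disposition[r["USUBJID"]]]

def flag_negative_results(records, never_negative=frozenset({"ALT", "AST", "GLUC"})):
    """Flag results for tests that should never be below zero."""
    return [r for r in records
            if r["LBTESTCD"] in never_negative and r["LBSTRESN"] < 0]

print(flag_posthumous_weights(bw_records, disposition))
print(flag_negative_results(lb_records))
```

Checks like these catch the internally inconsistent cases; as discussed below, matching every value back to the PDF tables is the part that still needs a human.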

Other practices and processes which don’t consider SEND from the outset can result in similar issues. For this reason, I continually quote what I consider to be one of the most important, and most often overlooked, statements in the FDA’s Technical Conformance Guide (Section: General Considerations): “The ideal time to implement SEND is prior to the conduct of the study as it is very important that the results presented in the accompanying study report be traceable back to the original data collected.”

So, the first issue is that without proper forethought, incorrect results can occur. The second issue then is that they may be difficult and expensive to detect.

To a large degree, automated tools can be developed to check conformance, particularly with the publication of CDISC conformance rules and FDA validation rules. However, checking that a result in a SEND dataset matches the corresponding PDF table is something that still requires a human touch.
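To give a flavour of what an automated conformance check looks like, here is a toy rule that verifies a variable only contains values from its controlled terminology. The codelist and records are illustrative assumptions, not the official CDISC codelist or a real rule implementation:

```python
# Illustrative codelist only; the real CDISC SEX codelist is defined
# in published SEND Controlled Terminology.
SEX_CT = {"M", "F", "U"}

def check_ct(records, variable, allowed):
    """Return a finding for each value that falls outside the codelist."""
    return [f"record {i}: {variable}={r[variable]!r} not in CT"
            for i, r in enumerate(records)
            if r.get(variable) not in allowed]

dm = [{"USUBJID": "S001", "SEX": "M"},
      {"USUBJID": "S002", "SEX": "Male"}]  # free text rather than CT

print(check_ct(dm, "SEX", SEX_CT))
```

This kind of rule is mechanical and cheap to run at scale; comparing a SEND value against the corresponding number in a PDF table is not.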

So yes, we need to ensure conformance to the standard, but how much more important is it to ensure that results themselves are correct?

Till next time,


Did you see the recent paper from the JPMA SEND Taskforce Team?

Okay, first – some context…

Without the FDA requiring SEND datasets, we would not have seen the industry-wide adoption and implementation of the standard. The change made by the industry continues to fascinate me, in terms of both speed and scale.

This drive for submission provides us with a well-defined standard, and one that is well suited to single-study review. However, it is becoming increasingly apparent that there are some shortcomings for cross-study analysis and data mining. The reason is that, while SEND allows for accurate representation of a study’s results in electronic, machine-readable form, it also allows for significant variability from study to study.

Now that I have set the scene, I’d like to discuss a recent paper by the Japan Pharmaceutical Manufacturers Association (JPMA) SEND Taskforce Team. This describes their analysis of multiple SEND packages from a variety of suppliers. It details key areas of variability in how data are represented from study to study. If data mining and cross-study analysis get you as giddy as a kiddy on Christmas morning, then I’d highly recommend that you take a deep dive into the paper for yourself. It contains very detailed results, calling out things like the specific variables that are most prone to variation between providers.

One of the key areas that the paper discusses is the scope and application of SEND Controlled Terminology (CT). It will not come as a surprise to anyone routinely working with SEND datasets that many key variables do not have CT defined for them; they allow for a free-text description. The paper calls out many examples, including Clinical Signs, where even variables like the test name and the severity are not controlled.

Stepping away from Clinical Signs, the discussion on CT reminded me of work being conducted by PHUSE regarding the lack of CT for the vehicle being used on the study. While a free-text description is perfectly adequate for single-study analysis, when it comes to data mining, the lack of CT proves problematic. For this specific issue, PHUSE are recommending a particular structure, format and nomenclature be used to describe the vehicle.
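To see why free text frustrates data mining, consider the harmonization step a cross-study tool would need. The synonym table and target terms below are made up for illustration; they are not the PHUSE-recommended structure or nomenclature:

```python
# Toy synonym table: many free-text spellings, one harmonized term.
# Both sides are invented for illustration.
VEHICLE_SYNONYMS = {
    "0.5% methylcellulose": "METHYLCELLULOSE 0.5%",
    "0.5% mc": "METHYLCELLULOSE 0.5%",
    "methocel 0.5%": "METHYLCELLULOSE 0.5%",
    "saline": "SODIUM CHLORIDE 0.9%",
    "0.9% nacl": "SODIUM CHLORIDE 0.9%",
}

def harmonize_vehicle(free_text):
    """Map a free-text vehicle description to a single harmonized term."""
    return VEHICLE_SYNONYMS.get(free_text.strip().lower(),
                                "UNMAPPED: " + free_text)

print(harmonize_vehicle("Saline"))
print(harmonize_vehicle("0.5% MC"))
print(harmonize_vehicle("corn oil"))  # falls through as unmapped
```

Every unmapped spelling needs human curation, which is exactly the cost that an agreed structure and nomenclature, or proper CT, would avoid.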

Such recommendations, enforcing supplementary rules and standardization – essentially, further CT in addition to the regular SEND CT – add complexity to the creation of SEND datasets. That complexity will then increase the time and cost to produce SEND datasets. That discussion opens up another debate, which I’ll leave for a different day.

Suffice to say, SEND provides an accurate representation of a study’s results in electronic form, well suited to single-study review. There are shortcomings relating to multi-study usage, but these can be overcome. The JPMA paper does a very good job of calling out the issues to address.

As usual, drop me a note if you’d like to discuss this further.

Till next time,


There’s a theme developing here

Human beings have an inherent ability to see patterns in everyday objects, like recognizing shapes and faces in clouds. While that might seem ridiculous, pattern recognition is vital for us. Without it, we would not be able to do things as varied as being able to recognize faces; find the answers in a Word Search; or even appreciate music. However, this week no advanced pattern recognition skills were needed to see a common thread emerging.

My last blog post discussed the details of the FDA’s Technical Rejection Criteria (TRC). With a little over one month to go (September 15, 2021) before the agency activates the TRC and begins rejecting submissions, it’s unsurprising that it’s proven to be my most popular post. Since it went live two weeks ago, there’s been a Federal Register Notice to confirm the September effective date for the TRC.

In addition, this week we’ve seen CDISC launch the CORE project. This is a joint CDISC and Microsoft project to automate the application of the Conformance Rules. In the launch presentation, we were introduced to the vision and timeline for an exciting new technology that will automatically check for conformance to CDISC standards. CDISC started by creating standards, including SEND, and then added a written set of rules to ensure compliance to those standards. However, different tools may interpret the standard, and even those rules, in subtly, or possibly dramatically, different ways. Regular readers of my blog will be well aware of my opinion on the difficulties caused by having a flexible, subjective standard that is open to interpretation. CORE sees CDISC take back ownership of the application of the rules, to ensure that conformance is anything but subjective. Into this project they are also adding the execution of non-CDISC rules, including the FDA Business Rules. CORE is something I’ll be watching and reporting back on in my blog as it takes shape.

This past week also saw CDISC publish SEND Conformance Rules v4.0 for evaluating the conformance of SENDIG-DART v1.1, SENDIG v3.0, SENDIG v3.1, and SENDIG v3.1.1 datasets.

Throughout the week, I’ve been poring over the FDA’s feedback on a few different studies, in order to assist some of our customers. When the FDA receive a SEND study, they will respond privately to the sponsor with feedback about their SEND datasets. This private feedback is one of the main methods employed by the FDA to continually improve the quality of SEND datasets being produced.

At first, these things – the TRC, CDISC CORE, new SEND Conformance Rules, and the act of reviewing FDA feedback for several studies – may seem like a disparate set of unrelated topics. However, standing back and reflecting at the end of the week, a clear pattern starts to emerge. Independently, and seemingly coincidentally, the quality of SEND datasets is being questioned and the bar is being raised. To that end, technology is being developed to ensure that we are all compliant, and that compliance is no longer a matter of opinion.

Till next time,


How to avoid rejection

It’s a basic human need: we want our work accepted and valued. Nobody wants to see their work rejected. It’s so obvious, it almost goes without saying. Worse still would be rejection by a cold, heartless, automated computer system. That would be soul-destroying. However, with the FDA’s latest deadline, automated rejections will start to occur.

In this post I’d like to discuss how to avoid rejection of your electronic study data – especially since we are less than two months (September 15, 2021) away from the enforcement of the Technical Rejection Criteria (TRC).

Up to now, failure of the TRC has not resulted in rejection; instead, the FDA have simply issued warnings alerting the submitting sponsor to the issue. At the FDA’s recent SBIA (Small Business Industry Assistance) webcast, the agency presented data showing over 200 studies had been issued such warnings in a single month (March 15, 2021, to April 17, 2021). From September, all such studies will result in a rejection instead of a warning.

I’ve included a link to the TRC at the end of this post and at first look, the rules can seem quite complicated. There’s mention of a TS domain, certain eCTD sections where the TRC does not apply, discussion about the study tagging file and so on.

At Instem, we are well versed in helping our customers navigate the intricacies of the TRC, and for certain studies this can get quite complicated. However, in its simplest form, the principle is first to check whether the eCTD section is applicable, and then to check the study start date.

Certain sections of the eCTD are exempt from the TRC. These sections are listed in the TRC, so we can easily check if a particular study would be exempt. If the study is to be placed in an applicable section, then the next things to consider are the study start date, submission type and study type.

For an IND submission:

  • Single Dose Toxicity, Repeat Dose Toxicity, and Carcinogenicity Studies starting after December 16, 2017 require a full SEND package
  • Cardiovascular and Respiratory Safety Pharmacology Studies starting after March 15, 2020 also require a full SEND package.

The NDA/BLA requirement is exactly the same as for an IND, except that in each case the cutoff is a year earlier – so December 16, 2016, for Repeat Dose Toxicity, for example.

However, even if the study started before these dates, it is not fully exempt if it falls within an applicable eCTD section. In that case, a ‘Simplified ts.xpt’ file is still required. This is an electronic data file stating the study ID and the study start date. While not a SEND file itself, its format generally follows the SEND standard for the TS domain, albeit with just a single record.
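The decision steps described above can be sketched as a small function. This is a simplified illustration using only the dates listed in this post: the eCTD exemption check is reduced to a flag, the study-type names are my own, and the authoritative rules are in the TRC document itself.

```python
from datetime import date

# IND cutoff dates as described above; NDA/BLA cutoffs are one year
# earlier. Study-type keys are invented labels for this sketch.
IND_CUTOFFS = {
    "single-dose-tox": date(2017, 12, 16),
    "repeat-dose-tox": date(2017, 12, 16),
    "carcinogenicity": date(2017, 12, 16),
    "cv-safety-pharm": date(2020, 3, 15),
    "resp-safety-pharm": date(2020, 3, 15),
}

def send_requirement(study_type, start_date, submission="IND", ectd_exempt=False):
    """Return 'exempt', 'simplified-ts', or 'full-send' (simplified sketch)."""
    if ectd_exempt:
        return "exempt"                 # study placed in an exempt eCTD section
    cutoff = IND_CUTOFFS.get(study_type)
    if cutoff is None:
        return "simplified-ts"          # study type not covered by the TRC dates
    if submission in ("NDA", "BLA"):
        cutoff = cutoff.replace(year=cutoff.year - 1)
    if start_date > cutoff:
        return "full-send"              # full SEND package required
    return "simplified-ts"              # started before the cutoff

print(send_requirement("repeat-dose-tox", date(2019, 5, 1)))          # full-send
print(send_requirement("repeat-dose-tox", date(2017, 1, 1)))          # simplified-ts
print(send_requirement("repeat-dose-tox", date(2017, 6, 1), "NDA"))   # full-send
```

The last example shows why submission type matters: a study started in mid-2017 needs only a simplified ts.xpt in an IND, but a full SEND package in an NDA.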

As I said when I started this post, nobody likes being rejected, so it’s vital that sponsors understand the TRC requirement as it becomes enforceable in September. The TRC is just the first step in the process employed by the agency to ensure they are receiving good-quality, usable SEND datasets. Maybe in a future post we can discuss some of the further measures they employ.

Till next time,


Access the full TRC here:

A unique insight into SEND

This week I caught up with Debra Oetzman from our SEND Services team. As one of the authors of the SEND standard and an active CDISC and PHUSE volunteer, she may well have interacted with many of you already. At Instem, Debra performs verifications of datasets from suppliers right across the industry.

We got talking about the blog, and some of the issues it raises and she said to me, “One of the things you’ve mentioned in your blog previously, Marc, is how flexible the SEND standard is and how that is both good and challenging at the same time.  When doing a verification, this is actually one of things that makes it more difficult…and more interesting…at the same time!

“Our Verification service provides an opportunity to see datasets created for an array of study types by a fair number of providers.  Certainly, when reviewing a v3.1 dataset, if you see ‘NORMAL’ in MISTRESC instead of ‘UNREMARKABLE,’ you can look at the SEND IG and determine ‘that is incorrect.’  However, if you look at some of the Trial Designs…well, that is a gray area.  Does the setup represent the study?  Would it cause an issue with analysis?  Does the presentation cause (or increase) ambiguity…or maybe it is unusual, but it decreases ambiguity?  Does it align with the study report and protocol?  Are there errata, best practices, TCG statements, or other documentation that may influence a recommendation one way or the other?  When reviewing a dataset package, we try to put ourselves in the shoes of the reviewer…if we, who live and breathe SEND every day, are struggling to understand the package, perhaps we can provide some suggestions to make it easier once it is submitted.”

I also asked Debra about the quality of the SEND datasets she sees and she said “That is something else you have touched on in your previous blogs…getting to the point of having a large volume of good quality datasets.  We have been fortunate that the FDA has been willing to feed back to industry some of the pain points they have encountered when reviewing datasets.  Their Webinars are so useful to the industry and to the CDISC organization when working on the next version of the IG or, proactively, when creating standards for a new domain…making sure not to add the same pain point!

“We’ve heard people say that they don’t believe the datasets are being used at the agency.  Certainly, there are many of us who have been in enough CDISC meetings with FDA representatives to KNOW that SEND datasets are being used whenever possible…perhaps not all of them, or all parts of all of them, and certainly that comes back, at least in part, to quality.  If the datasets do not follow the standard, if they do not align with the study report, or if out of three single-dose tox studies each has been presented differently, that makes it much more difficult to get reviewers trained and buying into the fact that ‘standardized’ data will make their lives easier.

“I would say that, overall, the quality of the datasets we review has improved over time.  We can tell when vendors have made improvements to their systems (both collection systems and SEND generation systems) that have been implemented at different facilities; we can tell when processes have been adapted to make ‘good SEND’ easier to produce; we can tell when protocol and report templates have been improved to include some of the metadata that makes the dataset more traceable.  There is certainly still room for improvement, but the direction is very encouraging.”

I really appreciated getting her point of view and I thought you might find it insightful too.

Till next time


Is SEND really that exciting?!

It was one of the finest moments of my career, but it didn’t exactly get off to a great start. At the Safety Pharmacology Society, I was invited to speak at the DSI Data Blast (think Safety Pharmacology meets WrestleMania). I introduced myself and explained that I was going to be talking about the Standard for the Exchange of Nonclinical Data. They booed. Actually booed. As anyone familiar with the Data Blast will know, it was a forum where booing wasn’t exactly discouraged, but still, this seemed more vehement than usual, and I swear I heard someone shout “No! We hate Standards!”. Actually, putting that in writing makes it sound far worse than it really was. In truth, it was more jovial than aggressive. I guess you needed to be there.

Still, the idea of ‘standards’ doesn’t exactly set the world alight. SEND is a data standard. A formal set of rules dictating how data are to be represented. Can you think of anything more dull, dry and less inspiring?

Why is it, then, that this week I heard someone say, “SEND is almost a religion to some people; they get so passionate about it,” and it was greeted with knowing smiles and affectionate nods? I thought, “Yes, to some people it is, and you are addressing a few of us right now.”

In my first blog post, I admitted I was a total geek for SEND. That shows no signs of dissipating, and I know I’m not alone. It’s true, some people get really passionate about SEND. What, to some people would seem a dry, dull topic; can get some of us fully animated and soapbox preaching for hours.

For the uninitiated, it is a strange phenomenon to behold, but those of us up to our elbows in SEND really, really care about it.  We will give chapter and verse on the correct population of a variable, how that variable is to be used and the implications of the data representation.

Yes, we like feeling good about knowing how SEND helps FDA reviewers be far more efficient, enabling them to spend more time evaluating drugs than manipulating PDFs. Yes, we get excited by the possibilities of SEND: the opportunities it brings for data mining, cross-study analysis and Historical Control Data. Yes, we’ll get enthusiastic about the benefits of being able to exchange clear, accurate electronic data. But more than that, we are completely obsessive about the standard itself. Yes, we believe that this is our standard, and so we are completely zealous in our dedication to ensuring that our variables are used and never misused.

Of all the things we could be so emotionally invested in, who would have thought that our obsession would be the standardization of the exchange of nonclinical data?

I started this post recalling the story of the most fervent display of resistance to SEND that I’ve ever witnessed. I fully accept that opinion, as I’m sure some readers of this blog share it. Yet I’m also certain that there’ll be readers who can fully relate to my efforts to capture in words just how energized we can get about SEND.

Till next time


Proudly, I still have the T-shirt from the wonderful Data Blast!

The pros and cons of having a flexible standard

I feel I need to make a confession. I need to admit that being a vendor of SEND software and services drives a strong bias in how I believe the SEND standard should be defined and implemented.

In the last couple of postings, we’ve been discussing how flexible, or ‘subjective’, the SEND standard is. You may have noticed from my tone that I have a particular dislike for such flexibility. I’ve tried to hide it, but unfortunately my own bias seeps through.

To explain how being a vendor drives that bias, I thought we’d have a quick look at a few of the pros and cons of having such a flexible standard.


Flexibility was meant to ensure that the SEND Standard was easier to adopt. Due to the vast differences in how some data are collected and reported, the SEND standard couldn’t be too prescriptive in what it expected of data. It needed to be loose enough that if the study had a particular piece of data or metadata associated with a result, then there was a place for it, but if it wasn’t captured, then that was ok too.

That has the advantage that all organizations can embrace SEND, regardless of whether or not they capture a particular piece of information. The same principle applies to timing variables. Looking at the study data, if there are only calendar dates, then that’s fine. If there are dates and times, then that’s fine too. However, if there are just study day numbers instead, then that’s equally acceptable. Just populate what timing information there is and don’t worry about what’s missing.

Taking this approach means we have a far more inclusive standard that should allow for a wider range of studies and collection methods.
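The “populate what timing information there is” principle can be sketched as follows. The variable names follow SEND’s --DTC/--DY pattern for a body-weight record, but the function itself is a hypothetical illustration, not how any real SEND generator works:

```python
# Sketch of flexible timing-variable population: whatever timing data
# the study captured is written out; whatever is absent is simply omitted.
def timing_variables(prefix, collection_date=None, collection_time=None, study_day=None):
    """Build only the timing variables the study actually captured."""
    record = {}
    if collection_date:
        dtc = collection_date                 # ISO 8601 calendar date
        if collection_time:
            dtc += "T" + collection_time      # append time when known
        record[prefix + "DTC"] = dtc
    if study_day is not None:
        record[prefix + "DY"] = study_day     # nominal study day number
    return record

print(timing_variables("BW", collection_date="2021-02-15"))                            # date only
print(timing_variables("BW", collection_date="2021-02-15", collection_time="09:30"))   # date and time
print(timing_variables("BW", study_day=15))                                            # study day only
```

All three outputs are valid under the standard, which is precisely the flexibility at issue: a consuming tool cannot assume any one of these shapes.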

That sounds great, right? So, what’s the problem?


For many organizations consuming SEND datasets, the lack of consistency from study to study presents various issues, the obvious one being a hamstringing of cross-study analysis. How can we compare two similar studies when the data are rendered so differently? This is a common struggle for anyone trying to develop such tools; it is difficult when we can’t rely on variables being populated in a consistent manner. As an example, just last week we were debating the Nominal Label variable, which, as well as being able to contain either a day or week number, may or may not also include timepoint information, categorical information, or scheduling information. All of these are acceptable uses, but what does that mean for the tool consuming the data?

Another difficulty is that each organization has developed their own interpretation of the standard. That makes things difficult for the humble vendor who needs to produce tools to both create and consume SEND in a range of different ways to suit the various organizations. So, obviously my bias is coming through strong here as I’d much prefer it if we could all just do SEND the same way. No options, no flexibility.

I started this post with the confession that, as a vendor, I have a poorly concealed bias against allowing SEND to be so flexible, and by the end, the post has laid that bias out for all to see. As well as making things much easier for the tool developers, I also think the whole industry would benefit from SEND being more restrictive. Whether you agree or disagree, I’d love to know your opinion. Emails to the usual address.

Till next time