Preparing Analysis Data Model (ADaM) Data Sets and Related Files for FDA
Submission with SAS®
Sandra Minjoe, Accenture Life Sciences; John Troxell, Accenture Life Sciences
ABSTRACT
This paper compiles information from documents produced by the U.S. Food and Drug Administration
(FDA), the Clinical Data Interchange Standards Consortium (CDISC), and Computational Sciences
Symposium (CSS) workgroups to identify what analysis data and other documentation is to be included in
submissions and where it all needs to go. It not only describes requirements, but also includes
recommendations for things that aren't so cut-and-dried. It focuses on New Drug Application (NDA)
submissions and a subset of Biologic License Application (BLA) submissions that are covered by the FDA
binding guidance documents. Where applicable, SAS® tools are described and examples given.
INTRODUCTION
The purpose of this paper is to describe how to assemble analysis data and related files for the
submission of NDAs and most BLAs to FDA CDER and CBER. The deliverables discussed are analysis
datasets, other files related to analysis datasets, analysis programs, data definition files (define.xml) and
the Analysis Data Reviewers Guide (ADRG).
The material included here is based on requirements described in the two December 2014 FDA Binding
Guidance documents:
Providing Regulatory Submissions in Electronic Format — Submissions Under Section 745A(a) of
the Federal Food, Drug, and Cosmetic Act
Providing Regulatory Submissions In Electronic Format — Standardized Study Data
Three other FDA documents that are related to these binding guidance documents and contain material
relevant to this paper are:
Data Standards Catalog v4.5.1 (08-31-2016)
Study Data Technical Conformance Guide v3.2 (October 2016)
Technical Rejection Criteria for Study Data (Revised 11142016)
Additional documents used to compile this paper are published by the Clinical Data Interchange
Standards Consortium (CDISC), the Computational Sciences Symposium (CSS) workgroups, and the
Japan Pharmaceuticals and Medical Devices Agency (PMDA).
The References section of this paper contains links to websites where all of these documents can be
downloaded.
(2) follows the ADaM fundamental principles defined in the ADaM model document and
adheres as closely as possible to the ADaMIG variable naming and other conventions.
Non-ADaM analysis dataset – A non-ADaM analysis dataset is an analysis dataset that is not an
ADaM dataset. Examples of non-ADaM analysis datasets include:
• an analysis dataset created according to a legacy company standard
• an analysis dataset that does not follow the ADaM fundamental principles.
This same document includes a figure showing the relationships of these types of datasets:
Basically, an analysis dataset is either an ADaM dataset or a non-ADaM analysis dataset. There are
three standard structural classes of ADaM datasets:
• ADSL (Subject-Level Analysis Dataset)
• BDS (Basic Data Structure)
• OCCDS (Occurrence Data Structure) if using ADaMIG v1.1; or ADAE (Adverse Event Analysis
Dataset) if using ADaMIG v1.0.
Occasionally, there may be an analysis need which no standard structure can address. For example, no
standard structure enables generation of a correlation matrix of time-varying dependent variables. In that
case, the unmet analysis need can be addressed by designing a dataset with a non-standard structure.
Such a dataset is an ADaM dataset only if it follows all of the ADaM fundamental principles and other
ADaM conventions. These true ADaM datasets that cannot follow a standard ADaM structure are
considered to be members of the ADaM Other class of ADaM datasets.
A non-ADaM analysis dataset is any analysis dataset that is not compliant with ADaM. Non-ADaM
analysis datasets are not broken down into structures or classes the way ADaM datasets are.
STANDARDS ACCEPTED BY FDA
The FDA Data Standards Catalog v4.5.1 (08-31-2016) lists all supported and required standards. For
analysis data, the only standards included are ADaM v2.1 and ADaMIG v1.0.
The FDA Study Data Technical Conformance Guide (SDTCG) v3.2 states that FDA will also accept
standards described in the following CDISC Therapeutic Area User Guides (TAUGs):
• Chronic Hepatitis C
• Dyslipidemia
• Diabetes
• QT Studies
• Tuberculosis
These TAUG standards are developed quickly and are often finalized before ADaM documents can be
updated.
To ensure that no data or formatting is lost when creating the SAS transport file, consider using a
validation process such as:
(1) Create a SAS dataset
(2) Create a SAS v5 transport file from the SAS dataset using SAS PROC COPY or the DATA step.
For example:
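/* Source library holding the original SAS dataset */
libname adam "C:\desktop\data\adam";
/* The XPORT engine writes a SAS v5 transport file */
libname xptfile xport "C:\desktop\data\xport\adsl.xpt";
data xptfile.adsl;
  set adam.adsl;
run;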
(3) Convert the SAS v5 transport file into a new SAS dataset. For example:
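/* Read the transport file back through the XPORT engine */
libname xptfile xport "C:\desktop\data\xport\adsl.xpt";
/* Target library for the round-tripped dataset */
libname new "C:\desktop\data\new\";
data new.adsl;
  set xptfile.adsl;
run;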
(4) Use SAS PROC COMPARE to compare the new dataset with the original version to check for
discrepancies. For example:
libname adam "C:\desktop\data\adam";
libname new "C:\desktop\data\new\";
proc compare base=adam.adsl compare=new.adsl; /* assumed call; not shown in the source */
run;
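Step (2) names PROC COPY as an alternative to the DATA step. A minimal sketch of that route, reusing the same illustrative paths as the DATA step example, might look like the following. The SELECT statement is an assumption added here so that only ADSL is written to the transport file.

```sas
/* Illustrative paths, matching the DATA step example */
libname adam "C:\desktop\data\adam";
libname xptfile xport "C:\desktop\data\xport\adsl.xpt";

/* Copy only ADSL from the source library into the transport file */
proc copy in=adam out=xptfile memtype=data;
  select adsl;
run;
```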
variables with the same name vary across datasets. Also, in some cases, it may pay to anticipate future
uses such as data integration when setting variable lengths.
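One way to spot variables whose lengths differ across datasets is to query the SAS dictionary tables. The sketch below assumes the analysis datasets are assigned to a libref named ADAM (the path is illustrative). It lists each variable name that occurs with more than one length.

```sas
/* Illustrative path; the libref name ADAM is an assumption */
libname adam "C:\desktop\data\adam";

proc sql;
  /* Report variables that appear with more than one length in the library */
  select varname,
         count(distinct length) as n_lengths,
         min(length) as min_length,
         max(length) as max_length
    from (select upcase(name) as varname, length
            from dictionary.columns
            where libname = 'ADAM' and memtype = 'DATA')
    group by varname
    having count(distinct length) > 1;
quit;
```

A variable listed here is not necessarily a problem, but it is a candidate for review before datasets are integrated.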
The FDA split rule described above was put in place to handle the CDISC Study Data Tabulation Model
(SDTM) data requirement that all data of the same type be put into a single dataset. For example, all
laboratory tests are required to be part of domain LB, even if that means the dataset will be larger than 5
GB.
In ADaM, there is no requirement that all data of the same type be put into a single dataset. Not only do
smaller datasets not require splitting at submission time, they are nimbler and can reduce program run
times. When it makes sense, consider creating multiple smaller, focused datasets rather than fewer large,
cumbersome ones.
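To see which analysis datasets are growing large, file sizes can be listed from the dictionary tables. This is a rough sketch that assumes a libref named ADAM and that the FILESIZE column is available in DICTIONARY.TABLES in your SAS release. The size reported is for the native SAS dataset, which may differ from the size of the transport file created from it.

```sas
/* Illustrative path; the libref name ADAM is an assumption */
libname adam "C:\desktop\data\adam";

proc sql;
  /* List datasets from largest to smallest, with size expressed in GB */
  select memname,
         nobs,
         filesize / 1024**3 as size_gb format=8.2
    from dictionary.tables
    where libname = 'ADAM' and memtype = 'DATA'
    order by filesize desc;
quit;
```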
Figure 2: Copy of “FDA SDTCG v3.2 Figure 1: Folder Structure for Study Datasets”
Additionally, ADaMIG v1.1 states that, for ease of use with the define file and in the eCTD folder
structure, all analysis datasets for a study should be kept in a single folder, either adam or legacy, using
the following rules:
• If a set of analysis datasets includes an ADaM-compliant ADSL dataset (as required for a CDISC-
conformant submission), then the whole set of analysis datasets for that study belongs in the
adam folder
• If not, the whole set of analysis datasets for that study belongs in the legacy folder.
Figure 3: Analysis Dataset Submission Folders
MISCELLANEOUS DATA
Figure 2 includes a folder called misc. The FDA SDTCG v3.2 specifies that miscellaneous datasets,
which don’t qualify as analysis, profile, or tabulation datasets, should be put in this folder.
Although not specified in the SDTCG, miscellaneous datasets would include any data not captured in
SDTM but used to create ADaM datasets. Look-up tables, such as a list of prohibited concomitant
medications, and deviations collected somewhere other than on the CRF are examples of this
miscellaneous data.
ANALYSIS PROGRAMS
Recall that all the analysis datasets for a study are placed in either the adam or legacy datasets folder.
Within each of these folders, at the same level as the datasets folder, is a programs folder. The FDA
SDTCG states that the programs folder is where to put programs used to create analysis datasets,
tables, and figures associated with primary and secondary efficacy.
Figure 4: Analysis Programs Submission Folders
The FDA SDTCG document states that the purpose of these programs is to help reviewers understand the
process and confirm analysis algorithms. This implies that the programs are not expected to be run
directly on the FDA system. The SDTCG requires submitted programs to be ASCII text files (*.txt) or
PDF files (*.pdf).
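Since programs are typically developed as .sas files, one simple way to produce a .txt copy for submission is a DATA step that reads each line and writes it back out unchanged. The sketch below is only an illustration: the file paths and the filerefs SRC and DST are hypothetical.

```sas
/* Hypothetical source program and submission copy */
filename src "C:\desktop\programs\adsl.sas";
filename dst "C:\desktop\submit\programs\adsl.txt";

data _null_;
  infile src lrecl=32767;   /* read each program line as-is */
  file dst lrecl=32767;     /* write to the .txt copy */
  input;                    /* load the next line into the input buffer */
  put _infile_;             /* write the buffer out unchanged */
run;
```

The same step can be wrapped in a macro loop if many programs need to be converted.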
DEFINE CONTENT
CDISC has useful document packages on define.xml that can be downloaded for free. In addition to
robust specifications, these document packages each include examples of how to lay out a define.xml
file. The Analysis Results Metadata Specification v1.0 for Define-XML v2 (Jan 2015) contains examples
and instructions for creating all the metadata needed for an analysis dataset submission:
• Dataset-level Metadata
• Variable-level Metadata
• Parameter Value-level Metadata, when appropriate
   o Note that Value-Level Metadata is essential for describing ADaM Basic Data Structure
     datasets containing metadata that vary according to analysis parameter
• Results-level Metadata (recommended for critical analyses)
• Controlled terminology and codes
• Links to other documents, such as
   o Statistical Analysis Plan (SAP)
   o Analysis Data Reviewers Guide (ADRG)
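As a starting point for the dataset-level and variable-level metadata, the attributes already stored in the datasets can be inventoried with PROC CONTENTS. This sketch does not create a define.xml file. It only gathers attributes that could be reviewed against the specifications, and the libref ADAM and the WORK dataset names are assumptions.

```sas
/* Illustrative path; the libref name ADAM is an assumption */
libname adam "C:\desktop\data\adam";

/* Collect attributes of every variable in every dataset in the library */
proc contents data=adam._all_ out=work.varmeta noprint;
run;

/* Keep the attributes most relevant to dataset- and variable-level metadata */
proc sort data=work.varmeta
          out=work.varmeta_sorted(keep=memname memlabel varnum name label
                                       type length format);
  by memname varnum;
run;
```

In the PROC CONTENTS output dataset, TYPE is numeric (1 for numeric variables, 2 for character variables).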
DEFINE VERSION
The FDA Data Standards Catalog v4.5.1 lists define.xml v1.0 and define.xml v2.0. The define.pdf is not
included in the Data Standards Catalog v4.5.1, but it was a former standard and might be allowed via a
waiver.
The SDTCG recommends using the standard define.xml v2.0. One reason for this recommendation is that
version 2.0 allows printing of the define.xml file, something reviewers regularly need to do. Additionally,
define.xml v1.0 only included dataset-level and variable-level metadata, because it was written before any
of the current ADaM documents and designed specifically for the submission of SDTM data. The
define.xml v2.0 added value-level metadata. The Analysis Results Metadata Specification v1.0 for Define-
XML v2 (Jan 2015) added results-level metadata, and is the best option to accompany ADaM datasets.
Below is an example of some typical define files. Note that they are shown here along with the ADaM
datasets.
Figure 6: Folder for Analysis Data Definition file
ADRG PURPOSE
The introduction of the CSS ADRG Completion Guideline states that the purpose of the submitted
ADRG is to provide “FDA Reviewers with additional context for analysis datasets (AD) received as part of
a regulatory submission.” It goes on to state that the “ADRG purposefully duplicates limited information
found in other submission documentation (e.g., the protocol, statistical analysis plan, clinical study report,
define.xml) in order to provide FDA Reviewers with a single point of orientation to the analysis datasets.”
It also notes that “submission of a reviewer guide does not obviate the requirement to submit a complete
and informative define.xml document to accompany the analysis datasets.”
The SDTCG states “The ADRG provides FDA reviewers with context for analysis datasets and
terminology, received as part of a regulatory product submission, additional to what is presented within
the data definition file (i.e., define.xml).” and also “It should be noted that the submission of an ADRG
does not eliminate the requirement to submit a complete and informative define.xml file corresponding to
the analysis datasets.”
The Analysis Data Reviewers Guide (ADRG) package was created by the Computational Sciences
Symposium (CSS). A zip file with a template, guidelines for completion, and examples can be
downloaded from phusewiki.org, and the CDISC Analysis Results Metadata Specification v1.0 for Define-
XML v2 also contains an example ADRG.
ADRG CONTENT
The ADRG is set up with standard sections and leading questions that prompt the author on what to include.
The section on Dataset Processing is a good place to explain any complex data flows. For example, the
figure below shows the dependencies for a suite of ADaM datasets. Here ADAE, ADLB, and ADTR are
used to create ADTTE; then ADTTE and ADBASE are used to create ADEFF:
[Figure: dependency diagram for the ADaM datasets; surviving labels: ADSL, ADTTE, ADBASE, ADEFF]
SUMMARY
For ADaM data, the following figure summarizes what to submit where:
[Figure: summary of what to submit where; surviving labels: Datasets (SAS v5 transport), ADRG, Define files]
REFERENCES
United States Food and Drug Administration. 2017. “Study Data Standards Resources.” Accessed
January 30, 2017. http://www.fda.gov/ForIndustry/DataStandards/StudyDataStandards/default.htm.
This site contains all the FDA documents referenced in this paper. It is also where you’ll find
email addresses to ask questions to CDER/CBER.
Clinical Data Interchange Standards Consortium. 2017. “Analysis Data Model (ADaM).” Accessed
January 30, 2017. https://www.cdisc.org/standards/foundational/adam.
This site contains all the CDISC ADaM documents, including the Analysis Results Metadata.
Clinical Data Interchange Standards Consortium. 2017. “Define-XML.” Accessed January 30, 2017.
https://www.cdisc.org/standards/foundational/define-xml.
This site contains all the CDISC define.xml documents.
PhUSE wiki. 2017. “Optimizing the Use of Data Standards.” Accessed January 30, 2017.
http://www.phusewiki.org/wiki/index.php?title=Optimizing_the_Use_of_Data_Standards.
This site contains the ADRG package.
Japan Pharmaceuticals and Medical Devices Agency. 2017. “Notification No. 0427001.” Accessed
February 25, 2017. https://www.pmda.go.jp/files/000206449.pdf.
This site contains the English translation of the PMDA Technical Conformance Guide.
CONTACT INFORMATION
Your comments and questions are valued and encouraged. Contact the authors at:
Sandra Minjoe John Troxell
Accenture Life Sciences Accenture Life Sciences
[email protected] [email protected]
SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of
SAS Institute Inc. in the USA and other countries. ® indicates USA registration.
Oracle is a registered trademark of Oracle Corporation and/or its affiliates.
Other brand and product names are trademarks of their respective companies.