PDX FINDER DATA FLOW

A summary of the PDX Finder data flow.

(1) Submission
  • PDX producers submit data via data access end-points (APIs) or by directly uploading a collection of files.
  • PDX producers can also contact the PDX Finder helpdesk by email to arrange data submission.
(2) Standardization and harmonization
  • PDX data is stored in a secure Neo4j graph database.
  • Automated integration processes periodically consume data sources and ensure standardization (adherence to
    PDX-MI) and cross-center data harmonization.
(3) Data Distribution
  • PDX Finder supports multiple search attributes, allowing various points of entry into the data (e.g. tumor
    type, diagnosis, molecular markers).
  • Search results are displayed in a table to allow easy visualization, further filtering and selection.
  • Links allow users to drill down into selected PDX model data to discover additional patient, tumor and model
    attributes and associated studies (e.g. dose response studies, ‘OMIC’ studies).
  • Links to the data sources the models originate from are provided (5).

PDX data submission

Many academic and commercial sources of PDX models have emerged in recent years. These resources vary in size and in the processes they use to create and characterize PDX models and to manage the related data (e.g. a custom database or a collection of individual files/tables). To account for this variability, data from multiple sources is submitted through bespoke automatic ingest pipelines. PDX producers can submit data via direct upload of a collection of files or via data access end-points (APIs). An interface for submitting PDX model metadata and experimental data will be implemented in the future. PDX producers can also contact the PDX Finder helpdesk by email to submit their data to the resource.
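
The idea of one ingest pipeline per source can be sketched as follows. This is a minimal illustration in Python, not the PDX Finder implementation: the loader functions, the column and key names ("Model ID", "histology", and so on) and the common record shape are all assumptions made for the example.

```python
import csv
import io
import json
from typing import Callable, Dict, List

def load_csv_source(text: str, source: str) -> List[dict]:
    """Loader for a producer that uploads CSV files (column names are assumed)."""
    rows = csv.DictReader(io.StringIO(text))
    return [
        {"model_id": r["Model ID"], "diagnosis": r["Diagnosis"], "source": source}
        for r in rows
    ]

def load_json_source(text: str, source: str) -> List[dict]:
    """Loader for a producer exposing a JSON API payload (key names are assumed)."""
    return [
        {"model_id": m["id"], "diagnosis": m["histology"], "source": source}
        for m in json.loads(text)
    ]

# One bespoke loader per submission format; every loader returns the same record shape.
LOADERS: Dict[str, Callable[[str, str], List[dict]]] = {
    "csv": load_csv_source,
    "json": load_json_source,
}

def ingest(kind: str, text: str, source: str) -> List[dict]:
    """Dispatch a submission to the loader registered for its format."""
    return LOADERS[kind](text, source)
```

Each loader hides the quirks of one source, so the later standardization steps only ever see records with the same keys.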

PDX data standardization and harmonization

PDX data is stored in a secure Neo4j graph database, which provides a powerful framework for storing, querying and visualizing biological data. Following data submission, data standardization (adherence to the PDX Minimal Information standard, PDX-MI) and harmonization are necessary to ensure cross-source data integration. More specifically, PDX model metadata includes clinical, chemical and experimental factors that can be described in a number of ways. Most producers of PDX models adhere to an internal vocabulary partially based on community-developed standards. For example, cancer diagnoses are represented at different resources with the NCI Thesaurus, MeSH, and the Disease Ontology. Other descriptors (e.g. drugs) use free text or a mix of generic, commercial and chemical labels.
Expert knowledge is necessary to map between differing standards in order to achieve data harmonization. As this is a major issue across bioinformatics, ontology tools such as ZOOMA have been developed to semi-automate the mapping between standards, producing unified indexes that facilitate data query and discovery. Following database deposition, PDX data from multiple sources is standardized and harmonized in a two-step process: automated mapping of ontology terms with ZOOMA, followed by manual curation of the remaining anomalous cases. The automated standardization process will also check data compliance with PDX-MI and skip and report records that do not conform.
The standardized and harmonized records will then be integrated into a cohesive ontological data model that supports consistent searching across sources. Ontological annotation will assist users in their search by representing the data and the relationships within it. For example, searching for ‘Breast Cancer’ related PDX models will allow users to uncover all subclasses of breast cancer models in a single query, without having to look for each subtype individually.
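
The two-step process described above (a compliance check, automated mapping, then a manual curation queue for whatever is left) might look roughly like the sketch below. It is written under stated assumptions: the required fields are a made-up subset of PDX-MI, the lookup table merely stands in for a mapping service such as ZOOMA, and the "ONTO:" labels are placeholders rather than real ontology identifiers.

```python
from typing import Dict, List, Optional, Tuple

# Assumed subset of required fields; the real PDX-MI standard defines its own list.
REQUIRED_FIELDS = ("model_id", "diagnosis", "host_strain")

# Stand-in for an automated mapping service such as ZOOMA: a local lookup table.
# The "ONTO:" labels are placeholders, not real ontology identifiers.
AUTOMATED_MAPPINGS: Dict[str, str] = {
    "breast carcinoma": "ONTO:breast_carcinoma",
    "invasive ductal carcinoma": "ONTO:invasive_ductal_carcinoma",
}

def is_compliant(record: dict) -> bool:
    """A record conforms if every required field is present and non-empty."""
    return all(record.get(field) for field in REQUIRED_FIELDS)

def map_term(label: str) -> Optional[str]:
    """Step 1: automated mapping of a source label to an ontology term."""
    return AUTOMATED_MAPPINGS.get(label.strip().lower())

def harmonize(records: List[dict]) -> Tuple[List[dict], List[dict], List[dict]]:
    """Split records into mapped, awaiting manual curation, and rejected."""
    accepted, curation_queue, rejected = [], [], []
    for record in records:
        if not is_compliant(record):
            rejected.append(record)        # skipped and reported
            continue
        term = map_term(record["diagnosis"])
        if term is None:
            curation_queue.append(record)  # step 2: left for a human curator
        else:
            accepted.append({**record, "ontology_term": term})
    return accepted, curation_queue, rejected
```

Keeping the three outcomes separate means non-conforming records can be reported back to the source, while unmapped terms reach a curator without blocking the records that mapped cleanly.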
Rather than imposing a limited set of terms to describe a given minimal information attribute, we will accept a source’s internal standards and generate unified semantic indexes to describe the models. We will work with the PDX sources and local experts to ensure the quality of the generated mappings and to provide feedback to the developers of ontology tools.
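
To illustrate why ontological annotation lets a single ‘Breast Cancer’ query return every subtype, here is a small sketch that expands a search term to all of its subclasses before matching. The hierarchy is a toy example invented for illustration, not an excerpt from any real ontology, and the function names are hypothetical.

```python
from typing import Dict, List, Set

# Toy "is a" hierarchy (child term -> parent term), invented for illustration.
PARENT_OF: Dict[str, str] = {
    "breast cancer": "cancer",
    "breast carcinoma": "breast cancer",
    "invasive ductal carcinoma": "breast carcinoma",
    "invasive lobular carcinoma": "breast carcinoma",
    "melanoma": "cancer",
}

def descendants(term: str) -> Set[str]:
    """Return the term itself plus every direct or indirect subclass."""
    found = {term}
    frontier = [term]
    while frontier:
        current = frontier.pop()
        for child, parent in PARENT_OF.items():
            if parent == current and child not in found:
                found.add(child)
                frontier.append(child)
    return found

def search(models: List[dict], term: str) -> List[str]:
    """Return the ids of models annotated with the term or any of its subclasses."""
    wanted = descendants(term)
    return [m["model_id"] for m in models if m["diagnosis"] in wanted]
```

With this expansion, a model annotated only as "invasive ductal carcinoma" is still found by a search for "breast cancer", which is the behaviour described above.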

 

PDX data distribution

PDX data will be served through the PDX Finder interface, allowing users to query PDX models based on their research needs. To give users flexibility, multiple search attributes will be supported, allowing various points of entry into the data (e.g. tumor type, diagnosis, molecular markers). Search results will be displayed in a table to allow easy visualization, further filtering and selection. Clickable links will allow users to drill down into their selected PDX model data to discover additional patient, tumor and model attributes and associated studies (e.g. dose response studies, ‘OMIC’ datasets). Furthermore, links to the data sources the models originate from will be provided so that users can contact the relevant institution for further collaboration or model acquisition.
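
Searching by several attributes at once can be pictured as filtering on any combination of fields. The sketch below is only an illustration: the field names (tumor_type, diagnosis, markers, source_url) and the example URLs are assumptions, not the PDX Finder schema or API.

```python
from typing import Any, Dict, List

def search_models(models: List[Dict[str, Any]], **criteria: Any) -> List[Dict[str, Any]]:
    """Return the models that match every supplied criterion.

    For list-valued attributes (e.g. molecular markers) a model matches when
    the wanted value is one of the listed items.
    """
    def matches(model: Dict[str, Any]) -> bool:
        for key, wanted in criteria.items():
            value = model.get(key)
            if isinstance(value, (list, set, tuple)):
                if wanted not in value:
                    return False
            elif value != wanted:
                return False
        return True

    return [m for m in models if matches(m)]

# Hypothetical records; source_url stands for the link back to the originating source.
MODELS = [
    {"model_id": "M1", "tumor_type": "primary", "diagnosis": "breast carcinoma",
     "markers": ["ERBB2", "TP53"], "source_url": "https://example.org/centre-a/M1"},
    {"model_id": "M2", "tumor_type": "metastatic", "diagnosis": "breast carcinoma",
     "markers": ["TP53"], "source_url": "https://example.org/centre-b/M2"},
    {"model_id": "M3", "tumor_type": "primary", "diagnosis": "melanoma",
     "markers": ["BRAF"], "source_url": "https://example.org/centre-a/M3"},
]
```

A call such as search_models(MODELS, diagnosis="breast carcinoma", markers="ERBB2") narrows the table one attribute at a time, and each returned record keeps its source link for follow-up with the originating institution.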