As the adoption of digital business models and automation advances across industries, managing data and generating insights from it at scale becomes a key challenge. This is especially true for the pharma industry: with data at the heart of their value chain, pharma companies have built significant capabilities in handling and analyzing it. However, according to our recent study on the future of health1, the majority of pharma executives reported gaps in their data value chain and said that they feel constrained by a lack of access to valuable data. New business models, especially in precision medicine, place additional requirements on pharma companies. As outlined in our study on preparing for the data-driven future of pharma2, pharma companies need to clean up their own data and start to collaborate to get there.
One potential lever can be found outside the borders of each company: the increasing availability of new products and services around data. With new market entrants and innovative offerings, the data market is being reconfigured in every part of its value chain, going far beyond the traditional definition of outsourcing. We also observe that some pharma companies turn the game around by monetizing their own data capabilities externally.
Simplified data value chain with examples of observed emerging services
Data as a service can include buying raw datasets from the original collector (originator) or from data brokers that aggregate data from multiple originators. The rationale behind data services is to increase the breadth or depth (or both) of the data available internally. There are three promising developments in this area:
Data brokering: Prominent examples of data brokers are IQVIA3 and Optum4, which offer an expansive portfolio across EMRs, LRx, claims, hospital, genomic, oncology and biomedical data. But they are not the only ones, as data becomes relevant to almost all areas of the pharma business. Some providers focus on offering more data breadth, i.e., categories of data that the pharmaceutical company has no access to. One interesting market development here is hospital-funded spinoffs that sell deidentified treatment data, e.g., Anumana and Lucem5 Health of Mayo Clinic, Realyze6 of Pittsburgh’s UPMC, Sema47 of New York’s Mt. Sinai, and Paige AI8 of Memorial Sloan Kettering. In addition, there are data brokers that add data depth by combining thousands of different datasets. Examples of pure data brokers are Datavant9 and Healthverity10. There are also consortia of companies that have decided to commercialize their aggregated data externally, such as Blue Health Intelligence11 or Truveta12.
Data partnerships: The planned EU Health Data Space13 and the NHS data strategy14 will lay the foundation for data interoperability and aggregation. In advance of these standards, more and more pharma players are already taking action and establishing win-win partnerships with healthcare providers or insurers. In one typical scenario, a healthcare provider shares treatment data and benefits from the analytical power pharma companies now have, for example to analyze disease progression and treat early. This creates advantages for the pharmaceutical company (an improved offering), the insurer (fewer severely ill people) and the patients themselves (treatment before they become severely ill, at lower cost). Such data-focused partnerships can even span whole ecosystems in a specific therapeutic area, e.g., the Optima consortium15 in oncology. In other cases, such as Melloddy16 (Amgen, Boehringer Ingelheim, Bayer, GSK, Merck, Novartis and more), pharmaceutical companies establish networks to nurture joint data assets, like a small molecule library.
Data marketplaces: Hyperscale computing providers such as AWS, Azure and Google Cloud Platform have started to create health data marketplaces that complement the analytics toolkits they already offer to enable data-driven digital processes in their cloud environments. The offerings go beyond the data brokerage discussed above and include, e.g., electronic health record data models or workflows to analyze medical images using novel algorithms17.
Before raw data can be used to gain meaningful insights, it typically must go through several extract, transform, load (ETL) and modeling steps. Traditional outsourcing is widely used for this "data engineering". But for data to be used productively, further data management tasks such as annotation and quality management are necessary and, unfortunately, very costly. This is well illustrated by genetic data: DNA sequences are not comparable as datasets without metadata on their experimental conditions, and only metadata on the cell line, the organism and perhaps the colleague involved in the collection makes them discoverable and interpretable. For outsourcing such tasks as services, we again see three new areas:
Automated data management: One way to overcome these bottlenecks is AI-based and automated data management services. Software companies, such as CRM provider Veeva, are developing data products that partially automate data management18. Data that pharmaceutical companies already maintain in a CRM is automatically matched against these data products. This allows gaps to be filled, duplicates to be removed, and typos to be corrected. This is possible, for example, for pharmaceutical customer data (HCP), data on medical experts (KOL) or prescription data (LRx) in a uniform data model.
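To make the matching idea more tangible, here is a minimal sketch in Python of how CRM records could be reconciled against a curated reference dataset. It is an illustration only and not how Veeva or any other vendor implements its products: the function names, the record fields (`id`, `name`, `specialty`, `phone`) and the similarity threshold are all assumptions made for this example, and the fuzzy matching relies on the standard library's `difflib`.

```python
from difflib import SequenceMatcher

def normalize(name):
    """Lowercase, drop dots and collapse whitespace before comparing names."""
    return " ".join(name.lower().replace(".", " ").split())

def similarity(a, b):
    """Similarity score between 0 and 1 for two names."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()

def reconcile(crm_records, reference_records, threshold=0.85):
    """Match CRM records against a reference dataset (hypothetical schema).

    - typos: a matched record takes the reference spelling of the name
    - gaps: empty fields are filled from the reference or from duplicates
    - duplicates: CRM records matching the same reference entry are merged
    """
    cleaned = {}
    for rec in crm_records:
        best = max(reference_records,
                   key=lambda ref: similarity(rec["name"], ref["name"]),
                   default=None)
        if best is None or similarity(rec["name"], best["name"]) < threshold:
            # No confident match: keep the CRM record unchanged.
            cleaned[("unmatched", rec["name"])] = dict(rec)
            continue
        merged = dict(best)  # start from the curated reference values
        for field, value in rec.items():
            if field != "name" and value and not merged.get(field):
                merged[field] = value  # keep CRM data the reference lacks
        key = ("ref", best["id"])
        if key in cleaned:
            # Duplicate of an earlier CRM record: merge missing fields only.
            for field, value in merged.items():
                if value and not cleaned[key].get(field):
                    cleaned[key][field] = value
        else:
            cleaned[key] = merged
    return list(cleaned.values())
```

Called with two differently spelled CRM entries for the same physician and one reference entry, `reconcile` returns a single merged record. A production service would match on more attributes than the name (address, license number, affiliation) and would route uncertain matches to human review.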
Crowdsourcing: Another option is to crowdsource data management by incentivizing internet users to contribute tiny bits of data, as in Google’s reCAPTCHA, or whole reports, as in Wikipedia articles. Recent examples are crowdsourced COVID maps19, or MIT & Novartis’ crowdsourcing challenge that helped prepare data for the development of new predictive models20. The University of Washington applied data crowdsourcing with the online game Foldit21. Here, players with no prior knowledge compete against each other to solve puzzles by folding protein structures to best match a target in AIDS research.
Standardized platforms: The observable trend toward standardized (commodity) IT platforms, data models, cloud frameworks and APIs will soon allow a high degree of automation. It will take time for actual standards to become established, but there is a recent push by health authorities such as the NHS and EMA. The UK’s National Health Service (NHS) is explicitly working on data models and dictionaries in the health space22. The European Medicines Agency (EMA) is driving the ISO IDMP standard for the pharmaceutical sector23. The ambitious EU project of the European Health Data Space even foresees the complete networking and standardization of all relevant health data. Especially in combination with the Data Governance Act24 for standardized data exchange at all levels, many internal data management challenges could be externalized here.
Even if pharma companies manage to capture and bring together data from various sources, e.g., genomic and proteomic data with compound knowledge graphs, generating novel and innovative insights remains highly complex. Here, too, some tech-driven advancements can be observed:
Insight platforms: Scientific institutes, but also service providers such as Google, enable so-called federated learning. A successful example is Nvidia’s service “Clara”: It allows hospitals to collaborate on a global model for brain tumor segmentation. Each participating party trains the model further on its own data and profits from the training that other parties have contributed, without accessing data that is too sensitive to share25.
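The principle behind federated learning can be sketched in a few lines of Python. The example below is a generic illustration of federated averaging on a toy linear model and does not represent the API of Nvidia Clara or any other product; all function names, parameters and the toy data are assumptions made for this sketch. Each site trains on its own data, and only the resulting model weights are shared and averaged.

```python
def local_update(weights, data, lr=0.05, epochs=20):
    """Train a linear model on one site's private data, starting from the
    current global weights. Only the updated weights leave the site."""
    w = list(weights)
    for _ in range(epochs):
        for x, y in data:
            error = sum(wi * xi for wi, xi in zip(w, x)) - y
            w = [wi - lr * error * xi for wi, xi in zip(w, x)]
    return w

def federated_average(updates, sizes):
    """Combine the site models, weighting each by its number of samples."""
    total = sum(sizes)
    dim = len(updates[0])
    return [sum(u[i] * n for u, n in zip(updates, sizes)) / total
            for i in range(dim)]

def train_federated(sites, dim, rounds=30):
    """Run several rounds of local training followed by averaging.
    The coordinator sees model weights but never the raw records."""
    global_w = [0.0] * dim
    for _ in range(rounds):
        updates = [local_update(global_w, data) for data in sites]
        global_w = federated_average(updates, [len(d) for d in sites])
    return global_w

# Toy data: two "hospitals" observe different parts of y = 2 * x + 1.
# Each feature vector is [x, 1.0]; the constant 1.0 models the intercept.
site_a = [([x, 1.0], 2 * x + 1) for x in (-2.0, -1.0, 0.0)]
site_b = [([x, 1.0], 2 * x + 1) for x in (0.0, 1.0, 2.0)]
model = train_federated([site_a, site_b], dim=2)
```

In this toy setup the shared model approaches the slope and intercept of the underlying line although neither site ever sees the other's records. Real deployments add safeguards on top of this basic loop, such as secure aggregation of the weight updates.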
Pharma as an insights provider: Pharmaceutical companies themselves also offer analytics as a service, examples being Novartis data4226 and Roche Information Solutions (RIS)27. Offering these capabilities externally, although they are important for the company's own internal value creation, is a bold strategic move, but such offerings are in high demand on the market.
Data brokers as insights providers: Younger companies that previously specialized in data brokerage are now also offering processed insights based on their aggregated data. One example is Komodo Health, which has aggregated large amounts of data and offers ready-to-use dashboards and tools based on it in an online platform28. Komodo's pharma and CDO clients can use it to determine locations for their clinical trials or match their own insights with public health data. This is a huge step forward, especially for smaller companies that do not operate their own analytics departments.
What we observe are drastic changes in a highly specialized and already data-driven industry. Pharma companies will have to double down on partnerships and carefully choose which capabilities to invest in and internalize.
For all industries, the boundaries between internal and external data and insights continue to blur and the constant reconfiguration of this value chain promises interesting market dynamics.
Maximilian Bucher and Hans-Fabian Ahrens co-authored this article.