Differential Privacy has emerged as a powerful technique to protect individual privacy while still reaping the benefits of data-driven insights. In this blog, we’ll explore differential privacy, how it works, and how media companies can use it to safeguard sensitive data about consumers.
Differential privacy is a privacy-enhancing technology (PET) that allows organizations to analyze data while preserving the privacy of individual people. The core principle is to ensure that no specific piece of information about an individual can be inferred from the results of a query or analysis. This means that the results of the analysis will look nearly the same regardless of whether any single individual's information was included in the analysis or not.
Differential privacy is achieved by adding carefully calibrated random noise to the data, enough to protect privacy but not so much that it diminishes utility. This can be done in two ways: locally, where noise is added on each user's device before the data is ever collected, or globally, where a trusted curator adds noise to the results of queries over the raw data.
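As a minimal sketch of how this works in the global setting (an illustration, not a production mechanism), here is the classic Laplace mechanism applied to a simple counting query. The noise scale is set by the query's sensitivity and a privacy parameter epsilon:

```python
import numpy as np

def laplace_count(values, predicate, epsilon):
    """Differentially private count: how many records satisfy `predicate`.

    The true count changes by at most 1 when any single record is added or
    removed (sensitivity = 1), so Laplace noise with scale 1/epsilon yields
    an epsilon-differentially private answer.
    """
    true_count = sum(1 for v in values if predicate(v))
    noise = np.random.laplace(loc=0.0, scale=1.0 / epsilon)
    return true_count + noise

# Example: a noisy count of users over 30, with a modest privacy budget.
ages = [23, 35, 41, 29, 52, 38, 27]
print(laplace_count(ages, lambda age: age > 30, epsilon=0.5))
```

Smaller values of epsilon add more noise and give stronger privacy; larger values give more accurate answers at a higher privacy cost.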
Any statistical analysis, whether using differential privacy or not, still leaks some information about the end users whose data are analyzed. As more and more analyses are performed on the same individuals or end users, this privacy loss can quickly accumulate. Fortunately, differential privacy provides formal methods for tracking and limiting this cumulative privacy loss.
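One simple way to make this concrete is a privacy budget "accountant". The sketch below is illustrative only; it uses basic sequential composition, where the epsilons of individual queries simply add up until a total budget is exhausted:

```python
class PrivacyBudget:
    """Tracks cumulative privacy loss under basic sequential composition."""

    def __init__(self, total_epsilon):
        self.total_epsilon = total_epsilon
        self.spent = 0.0

    def charge(self, epsilon):
        """Reserve `epsilon` for a query; refuse once the budget is exhausted."""
        if self.spent + epsilon > self.total_epsilon:
            raise RuntimeError("Privacy budget exhausted; no more queries allowed.")
        self.spent += epsilon

budget = PrivacyBudget(total_epsilon=1.0)
budget.charge(0.5)   # first analysis
budget.charge(0.3)   # second analysis
# budget.charge(0.5) would raise: the cumulative loss would exceed the budget
```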
Differential privacy facilitates secure data sharing among media organizations and marketers, promoting collaboration without compromising any individual’s privacy. This technology is particularly helpful when companies are trying to gather consumer insights from:
When brands and media companies use differential privacy as one of their PETs, it helps them comply with data privacy regulations as well as build trust with consumers by assuring them that their data is handled with care.
As data continues to play an essential role in finding and retaining user interest, media companies must implement differential privacy to harness data-driven insights while respecting individual privacy rights. Differential privacy is poised to be an integral part of data analytics and sharing in a privacy-conscious world.
The need to safeguard sensitive data and ensure the confidentiality of transactions has never been more critical. The Trusted Execution Environment (TEE) emerges as a pivotal technology in the demand for increased data privacy. In this blog, we will delve into the world of TEE, understand what it is, and explore its applications as a privacy-enhancing technology.
TEE is a secure and isolated area within a computer or mobile device's central processing unit (CPU). It’s designed to execute code and processes in a highly protected environment, ensuring that sensitive data remains secure and isolated from all other software in the system. It achieves this level of security via special hardware that keeps data encrypted while in use in main memory. This ensures that any other software or user, even one with full privileges, only ever sees encrypted data.
Using special hardware, a TEE encrypts all data as it leaves the CPU for main memory and decrypts it again as it returns to the CPU for processing, allowing the code and analytics inside to operate on plaintext data. This means that TEEs can scale very well compared to purely cryptographic secure computation approaches.
TEEs also offer a useful feature called remote attestation. This means remote clients can establish trust in the TEE by verifying the integrity of the code and data loaded into it, and then establish a secure connection with it.
TEEs are an attractive option for media companies who want to safely scale their data operations in a secure environment. TEEs offer the following benefits:
Now, let’s look at a real-world example of data collaboration using a TEE. In our last blog post, we saw that one way to perform the secure matching in the IAB’s Open Private Join & Activation proposal is using an MPC protocol. Another way to perform this secure matching is using a TEE. With a TEE, only one helper server is involved. First, the advertiser and the publisher establish trust in the TEE via remote attestation. Then, they each forward their encrypted PII data to the TEE server, which decrypts them and performs the match on plaintext data.
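To illustrate the shape of that flow, here is a toy sketch. Attestation itself is elided, and the symmetric keys below are just stand-ins for the encrypted channels each party would negotiate with the enclave after verifying its attestation report; only the match step inside the enclave is shown:

```python
from cryptography.fernet import Fernet

# Stand-ins for the secure channels each party establishes with the TEE
# after verifying its attestation report (attestation itself is elided here).
advertiser_channel = Fernet(Fernet.generate_key())
publisher_channel = Fernet(Fernet.generate_key())

# Each party encrypts its (already normalized) identifiers before sending them.
advertiser_ciphertexts = [advertiser_channel.encrypt(e.encode())
                          for e in ["a@x.com", "b@y.com", "c@z.com"]]
publisher_ciphertexts = [publisher_channel.encrypt(e.encode())
                         for e in ["b@y.com", "d@w.com"]]

def tee_match(adv_ct, pub_ct, adv_key, pub_key):
    """Runs inside the enclave: decrypt both inputs and intersect in plaintext.

    Only the enclave holds both channel keys, so only it ever sees the
    plaintext identifiers; callers receive just the overlap.
    """
    adv = {adv_key.decrypt(c).decode() for c in adv_ct}
    pub = {pub_key.decrypt(c).decode() for c in pub_ct}
    return adv & pub

print(tee_match(advertiser_ciphertexts, publisher_ciphertexts,
                advertiser_channel, publisher_channel))
```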
TEEs come with their own privacy risks. They are vulnerable to side-channel attacks, such as memory access pattern attacks, which can be exploited to reveal information about the underlying data. Adding side-channel protections can help counter these attacks, but it significantly increases the computational overhead. Fortunately, even with that overhead, TEEs still scale very well.
In an industry facing ongoing scrutiny over data privacy concerns, TEEs are becoming a standard. This PET technology will continue to evolve and we expect to see it playing an increasingly vital role in data collaboration.
In an era where data is the new gold, ensuring its privacy and security has never been more critical. Secure computation is a powerful branch of cryptography that allows companies to perform computations on sensitive data without revealing the actual information being processed. In this blog, we’ll explore what secure computation is and how it’s used to protect consumer data.
Secure computation is a cryptographic technique that enables multiple parties to jointly compute a function over their individual inputs while keeping those inputs private. This is known as "encryption in use" because the underlying data remains encrypted while it is being processed on remote servers or in the cloud.
The primary goal of secure computation is to ensure the confidentiality, integrity, and privacy of data throughout the computation process. It accomplishes this without relying on a trusted third party, making it particularly valuable in scenarios where data sharing and privacy are paramount. This means that two or more parties can collaborate on data analysis or computations without exposing their sensitive data to one another.
Secure computation is applied in a range of scenarios where privacy and data security are paramount. Naturally, secure computation is a great fit for data sharing and collaboration among publishers and advertisers.
Both publishers and advertisers can benefit from a type of secure computation called Private Set Intersection (PSI) protocol. It allows two or more parties to compute the intersection of their private datasets without revealing any information about the records not in the intersection. Optable, for instance, provides an open-source matching utility that allows partners of Optable customers to securely match their first-party data sets with them using a PSI protocol.
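To give a feel for how PSI works under the hood, here is a toy sketch of a Diffie-Hellman-style PSI protocol. This is an illustration of the general technique, not Optable's open-source utility: each party blinds its hashed identifiers with a secret exponent, and because exponentiation commutes, doubly blinded values collide exactly when the underlying identifiers match.

```python
import hashlib
import secrets

# A toy prime modulus; a real deployment would use a properly chosen large group.
P = 2**521 - 1

def hash_to_group(identifier: str) -> int:
    return int.from_bytes(hashlib.sha256(identifier.encode()).digest(), "big") % P

def blind(elements, secret):
    return {pow(e, secret, P) for e in elements}

# Each party holds a private exponent and a private set of identifiers.
a_secret, b_secret = secrets.randbelow(P - 2) + 1, secrets.randbelow(P - 2) + 1
a_set = {hash_to_group(x) for x in ["a@x.com", "b@y.com", "c@z.com"]}
b_set = {hash_to_group(x) for x in ["b@y.com", "d@w.com"]}

# Round 1: each party blinds its own items and sends them to the other.
a_blinded, b_blinded = blind(a_set, a_secret), blind(b_set, b_secret)

# Round 2: each party blinds the other's items with its own secret.
a_then_b = blind(a_blinded, b_secret)   # H(x)^(a*b) for x in A
b_then_a = blind(b_blinded, a_secret)   # H(x)^(b*a) for x in B

# Doubly blinded values match only on the intersection, revealing just its size.
print("overlap size:", len(a_then_b & b_then_a))
```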
Secure computation can be implemented in two main ways: 1) via pure cryptography, using Fully Homomorphic Encryption (FHE) and Secure Multi-Party Computation (MPC), or 2) through secure hardware, using Trusted Execution Environments (TEEs).
FHE is an incredibly powerful tool for protecting data privacy in the digital age. It enables analytics to be performed on encrypted data without ever having to decrypt it. The ad tech industry can certainly benefit from full-scale analytics without the risk of exposing personally identifiable information (PII).
While FHE has the potential to revolutionize the advertising ecosystem, it is unfortunately quite computationally intensive and limited in its current capabilities. Therefore it is not yet ready for widespread adoption. There is ongoing research to make FHE more efficient and functional in the future.
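Full FHE libraries are heavyweight, but the core idea of computing on ciphertexts can be illustrated with the simpler, additively homomorphic Paillier scheme. This is a deliberate simplification (Paillier supports only addition, not arbitrary computation), with toy hard-coded primes for readability: multiplying two ciphertexts yields an encryption of the sum of the plaintexts, so a server can total encrypted values it cannot read.

```python
import math

# Toy Paillier keypair from small hard-coded primes; real keys use ~2048-bit primes.
p, q = 293, 433
n = p * q
n_sq = n * n
g = n + 1
lam = math.lcm(p - 1, q - 1)

def L(x):
    return (x - 1) // n

mu = pow(L(pow(g, lam, n_sq)), -1, n)

def encrypt(m, r):
    # c = g^m * r^n mod n^2, where r is a random value co-prime with n.
    return (pow(g, m, n_sq) * pow(r, n, n_sq)) % n_sq

def decrypt(c):
    return (L(pow(c, lam, n_sq)) * mu) % n

# A client encrypts two values; the server multiplies the ciphertexts,
# which homomorphically adds the plaintexts without ever decrypting them.
c1, c2 = encrypt(20, r=17), encrypt(22, r=31)
c_sum = (c1 * c2) % n_sq
print(decrypt(c_sum))  # -> 42
```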
MPC is a form of secure computation that uses a cryptographic protocol to enable two or more businesses with private data to perform a joint computation while keeping their individual inputs private. Each entity only learns what can be inferred from the computation result.
Often, the secure computation part is outsourced to two helper servers. Before data leaves a user's device, it is encrypted to both helper servers, which decrypt it partially and perform computation on the partially encrypted data. Neither server is ever able to see the original user data.
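One concrete pattern for the two-helper setup is additive secret sharing, shown below as a simplified illustration rather than a full MPC protocol: each value is split into two random shares, one per helper server, so neither share reveals anything on its own, yet the servers can still jointly compute an aggregate.

```python
import secrets

MOD = 2**32  # all arithmetic is done modulo a fixed value

def split(value):
    """Split a value into two additive shares that sum to it modulo MOD."""
    share_a = secrets.randbelow(MOD)
    share_b = (value - share_a) % MOD
    return share_a, share_b

# Each user's device splits its private value before anything leaves the device.
user_values = [3, 7, 1, 4]
shares = [split(v) for v in user_values]

# Server A and Server B each sum only the shares they receive.
server_a_total = sum(a for a, _ in shares) % MOD
server_b_total = sum(b for _, b in shares) % MOD

# Combining the two partial sums reveals only the aggregate, never any input.
print((server_a_total + server_b_total) % MOD)  # -> 15
```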
MPC protocols provide a high level of security but come with a tradeoff. They require sophisticated cryptographic operations that incur higher computation and communication costs, so the technology tends to be tailored to specific tasks and can get very expensive at scale.
In the past year, Optable has been a leading contributor to the IAB Tech Lab’s Open Private Join and Activation (OPJA) proposal, which enables interoperable, privacy-safe ad activation based on PII data. At the heart of OPJA is a secure match using a PSI protocol that allows advertisers and publishers to match their PII data. One of the ways to perform this match is using MPC — the respective clean room vendors act as the MPC helper servers, which jointly compute the overlap without ever learning the identifiers not in the overlap.
In an age where data privacy is a growing concern, secure computation emerges as a vital technology that plays an important role in helping companies comply with data protection regulations while still fostering innovation and cooperation among business partners.
The digital world has brought unprecedented convenience and connectivity but also raised significant concerns about data privacy. As we share more of our lives online, the need for robust privacy-enhancing technologies has become paramount. On-device learning has emerged as a powerful tool to protect personal data while enabling advanced capabilities. In this blog, we will explore on-device learning, its role in enhancing privacy, and how it’s used.
On-device learning, sometimes referred to as federated learning, is a machine learning approach that trains models directly on a user's device, using the data available there; only updated model parameters are sent to a remote server or cloud. This means that a user’s smartphone, tablet, or other device can learn and adapt to their preferences without constantly sending their data to remote servers. This gives users more control over their data, protects their privacy, and reduces the need to send raw individual user data to external servers.
On-device learning operates with the following four principles:
With on-device learning, online retailers can gain insights into consumers’ overall preferences and behaviors without tracking individual consumers. Here is how it works: each consumer’s device downloads the current model and improves it by learning from the data on that device. The model updates from all of these devices are then collected, aggregated, and used to improve the central model. Thus, marketers learn only the overall purchase patterns and behaviors, without ever learning any individual consumer's preferences or behaviors.
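A minimal sketch of that loop, using federated averaging in its simplest form with made-up data and a trivial one-parameter model, might look like this:

```python
import numpy as np

def local_update(global_weight, local_data, lr=0.1, steps=5):
    """Runs on the device: improve the model on local data, return only the weight."""
    w = global_weight
    for _ in range(steps):
        # Gradient of mean squared error for a trivial model whose prediction is w.
        grad = np.mean(2 * (w - local_data))
        w -= lr * grad
    return w

# Each device's raw data (purchase amounts, say) never leaves the device.
device_datasets = [np.array([4.0, 5.0]), np.array([6.0, 7.0]), np.array([5.5])]

global_weight = 0.0
for _ in range(10):
    # Devices download the current model, train locally, and send back only updates.
    updates = [local_update(global_weight, data) for data in device_datasets]
    # The server aggregates the updates into the new global model.
    global_weight = float(np.mean(updates))

print(global_weight)  # converges toward the overall average purchase amount
```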
Let’s look at a real-world example of a data collection sequence that uses on-device learning:
On-device learning is not perfect from a privacy perspective. When model parameters leave users’ devices, they still leak information about the underlying local training data, so the risk of sensitive information being shared is only reduced, not completely eliminated. To mitigate this, on-device learning is often combined with other PETs such as differential privacy and secure computation, which we cover in other posts on our blog.
In today's data-driven world, concerns about privacy and data security have never been more critical. k-Anonymity is a privacy concept and technique that plays a pivotal role in safeguarding sensitive data. Let’s explore what k-anonymity is and how it‘s used to protect personal information.
k-Anonymity is a privacy model designed to protect the identities of individuals when their data is being shared, published, or analyzed. It ensures that data cannot easily be linked to a specific person by making it indistinguishable from the data of at least 'k-1' other individuals. In simpler terms, k-anonymity hides personal information within a crowd, making it much harder to single out a particular individual.
The 'k' in k-anonymity represents the minimum number of similar individuals (or the “anonymity set”) within the dataset that an individual's data must blend with to guarantee their privacy. For example, if k is set to 5, the data must be indistinguishable from at least four other people's data.
To implement k-anonymity, data must be generalized to make it less identifiable, while ensuring that each data point is identical to a minimum of ‘k-1’ other entries. This is commonly done through two methods: generalization, which replaces specific values with broader categories (for example, an exact age with an age range), and suppression, which removes or masks values that are too identifying.
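The sketch below is a simplified illustration of the generalization approach: exact ages and ZIP codes are coarsened into bands, and a check verifies that every combination of quasi-identifiers appears at least k times.

```python
from collections import Counter

def generalize(record):
    """Replace identifying values with broader categories (generalization)."""
    band_start = (record["age"] // 10) * 10
    age_band = f"{band_start}-{band_start + 9}"
    return (age_band, record["zip"][:3] + "**")  # quasi-identifiers only

def is_k_anonymous(records, k):
    """True if every generalized quasi-identifier combination occurs at least k times."""
    counts = Counter(generalize(r) for r in records)
    return all(count >= k for count in counts.values())

records = [
    {"age": 34, "zip": "90210"}, {"age": 36, "zip": "90213"},
    {"age": 31, "zip": "90288"}, {"age": 52, "zip": "10001"},
    {"age": 55, "zip": "10002"}, {"age": 58, "zip": "10099"},
]
print(is_k_anonymous(records, k=3))  # -> True: each generalized group has 3 members
```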
Online retailers use k-anonymity to protect customer data while analyzing purchase histories and preferences to enhance their services and recommendations.
For example, individual users can be associated with data cohorts based on their interests on their mobile device. An advertiser can then target individuals in specific cohorts. This way, the advertiser does not learn any personally identifiable information (PII) and only learns that a specific individual belongs to certain cohorts. And as long as the cohorts are k-anonymous, they protect users from re-identification, especially for large values of k.
A drawback to using k-anonymity is that sometimes revealing just the cohort a user belongs to can leak sensitive information about that user. This is especially true when the cohorts are based on sensitive topics such as race, religion, sexual orientation, etc. A simple solution to this problem is to use predefined and publicly visible cohort categories, such as in Google Topics.
Even so, cohorts can still be combined or correlated and used to re-identify users across multiple sites. For this reason, k-anonymity is often combined with other privacy protections to further reduce the probability of re-identification.
As people spend more and more time online, consumers have demanded more control over their digital privacy. They’ve become particularly uncomfortable with digital tracking technology like third-party cookies that enable marketers to gather information about their browsing behavior. But eliminating third-party cookies puts marketers in a tough spot. Their businesses have relied on cookies to find new customers for over two decades.
Government agencies in the US and Europe have responded to consumer demands by enacting regulations that offer users more protection and control over how their data is collected and processed. And many web browsers have already phased out cookies. Google has been the last holdout, and it’s expected to fully phase out cookies by the end of 2024.
But simply eliminating cookies won’t solve the privacy protection problem for consumers. Digital footprints are always expanding and companies need to be more vigilant than ever about protecting their customers’ data. There’s an enormous opportunity to build an ad ecosystem that respects users' privacy more than ever.
Privacy Enhancing Technologies (PETs) have emerged as a crucial ally for safeguarding consumer data. These technologies use advanced cryptographic and statistical techniques to protect consumer information while still allowing marketers to glean valuable insights.
PETs are a set of tools and methods designed to help organizations maintain digital privacy. They provide a layer of defense against unwanted surveillance, data breaches, and unwarranted data collection by enhancing user control and safeguarding data during its lifecycle. PETs are instrumental in upholding privacy, security, and freedom in the digital realm.
There are several types of PETs being used throughout the digital advertising ecosystem, including differential privacy, trusted execution environments (TEEs), secure computation, on-device learning, and k-anonymity.
PETs will play a vital role in creating an advertising ecosystem that is primarily privacy focused. Optable is exploring the use of multiple types of PETs as we build a privacy-safe environment where clients can safely collaborate with their data partners. The following blog series will demystify the complex world of PETs and take a closer look at how advertisers are using them.
At Optable we view interoperability first and foremost through the lens of digital advertising’s critical systems. And when you consider the systems used for ad campaign planning, activation, and measurement, you quickly realize that these systems were all inherently interoperable for a long time thanks to widespread data sharing. With identity and data sharing on their way out for a variety of reasons, new ways of interoperating within each of these systems are required. Clean rooms are a way to achieve data interoperability in advertising, and that’s why we have invested significantly in this area.
But, the trouble with clean rooms is that both parties have to agree to use the same one in order to interoperate. The central idea with clean room technologies is that two or more parties come together around a neutral compute environment, enabling them to agree on operations to perform on their respective datasets, on the structure of their input datasets, on the outputs generated by the operations and, importantly, on who has access to the outputs. Additionally, various privacy enhancing technologies may be used to limit and constrain the outputs and the information pertaining to the underlying input datasets that is revealed.
So, what does true interoperability look like for data collaboration platforms, built from the ground up for digital advertising? Here are three important pillars:
✅ Integration with leading DWH clean room service layers. A DWH clean room service layer is the set of primitives (APIs and interfaces) made available by leading DWHes (Google, AWS, Snowflake, etc), that enables joining of disparate organization datasets, and purpose limited computation. Optable streamlines this by automating the flow of minimized data to/from DWHes, and by federating code to these environments. The end result? A collaborator with audience data sitting in Snowflake can easily match their audience data to an Optable customer's first party data, all within Snowflake using Snowflake DCR primitives to enable trust, without the Optable customer lifting a finger. In this example the matching itself happens inside of Snowflake, but the same thing can be done with other DWH clean room service layers as well.
✅ Compatibility with open, secure multi-party compute protocols like Private Set Intersection (PSI). What if your partner wants to match their audience data with you but they cannot move their data into a cloud based DWH? SMPC protocols such as PSI enable double blind matching on encrypted datasets, without requiring decryption of data throughout. Open-source implementations provide an independently verifiable, albeit purpose constrained clean room service layer. The end result? A collaborator with audience data sitting on premise can execute an encrypted match with an Optable customer using a free, open-source utility.
✅ Built-in entity resolution, audience management and activation, with deep integration to all major cloud and data environments. In the real world, few organizations have all of their user data assets neatly connected in a single environment. Sure, they exist, but more often than not, organizations need to do quite a bit of work to gather, normalize, sanitize, and connect their user data so that they can effectively plan, activate, and measure using data collaboration systems. It’s therefore no wonder that when the IAB issued their State of Data report earlier this year, respondents cited time frames of months up to years to get up and running with clean room tech! Moreover, even when one company has got their user data together, their partners often require help with entity resolution. These are the reasons why Optable makes it easy to connect user data sitting in any cloud environment or system into a cohesive and unified user record view, out of the box, with no code required. Got part of your user data in your CRM? And another sitting in cloud storage? And another in your DWH? No problem.
At Optable, we believe that these pillars are the groundwork on top of which interoperability can happen, and we’re partnering with industry peers who share the same vision. Stay tuned for more exciting announcements on this front!
One of the most common misconceptions about data clean rooms and data collaboration is that they require having tons of identified data.
Most publishers we meet have this concern: “Do we really have enough data to drive significant revenue? Won’t we be limited by the size of the match, and therefore won’t be able to run any media at scale?”
Typically, they are surprised to learn that mitigating low volumes of identified data is part of what this class of data collaboration technology solves today.
No matter how little identified data any given publisher has, they can benefit from growth using data collaboration technologies. The reason is quite simple: any campaign is better off when it starts with real data.
Following a match with an advertiser, the publisher has a few options. The first, and simplest, is to generate insights on the matched audience. The publisher can better understand the brand’s customers or prospects as a function of their own data, which in turn allows them to create better media products. It also shows the brand that the publisher reaches the right audience for them. Insights are offered as a report that provides aggregate numbers – by definition, it is a privacy-safe product.
The second, and an important one, is the possibility of creating a prospecting audience out of the match. Optable’s prospecting clean room app automatically creates an expanded audience that provides scale, performance and value when it comes to reaching the right audience. Not only that, but we do it in a privacy-safe manner, since the publisher does not learn the intersection – only the prospecting audience becomes eligible for targeting.
Considering that a publisher’s audience consists of both identified and unidentified users who share a number of traits, Optable’s prospecting clean room app allows a publisher to configure a model that ultimately creates an addressable audience sizable enough to drive significant growth.
For brands, limited customer or prospect data isn’t a blocker either – in fact, there are few brands that can boast having significant data on all their customers. For everyone else, the objective is to have some data – enough to allow our systems to make better audience decisions.
We make publisher-driven data collaboration easy for all parties: our end-to-end solution includes direct integration for activation straight from the clean room environment, and offers frictionless interoperability.
Given the emergence of retail media and the democratization of data through data warehouse clean room APIs, data collaboration is quickly becoming a major revenue opportunity.
Forward-looking publishers who are looking for revenue growth must prioritize future-proof, privacy-safe solutions to driving revenue.
Canadian news and journalism outlets have entered into a fierce battle with Google and Meta over the recently enacted Bill C-18, also known as the Online News Act. This legislation, passed by the Canadian government on June 22, 2023, aims to support the Canadian journalism ecosystem by establishing a tax that "digital news intermediaries" such as Google and Meta must pay to the content owners they link to.
In a familiar pattern observed in similar laws like Australia's News Media and Digital Platforms Mandatory Bargaining Code, Meta and Google have retaliated by removing links from their platforms including Instagram, Facebook, and Google Search. Unfortunately, this response undermines the very essence of the bill and is expected to inflict financial harm on Canadian journalism. While Google and Meta argue that they only seek a fair market share for their services, publishers contend that this is unjustified since Google and Meta generate billions in advertising revenue while journalists struggle to make ends meet.
The dynamics at play here are further complicated by the fact that media agencies and brands, responsible for a significant portion of news media revenues, control advertising spend. This advertising spend is the primary source of revenue for Google & Meta, which famously represent 80% of online advertising revenue in the country.
Traditionally, Canadian brands and their agencies have allocated the majority of their advertising budgets to these two companies. However, there is a growing trend, driven by recent legislation and broader shifts in advertising, to directly invest media dollars with local publishers. Many agencies and brands have committed to supporting Canadian publishers in light of this impasse. For example, the A2C in Quebec has already taken steps to incentivize collaboration between agencies, brands, and local publishers. Some agencies view this issue as a matter of ethics and social responsibility. Prominent figures in the agency world, like Sarah Thompson, President of Dentsu Media, and Brian Cuddy, SVP Responsible Media Solutions at Cossette, have been vocal advocates for supporting Canadian news publishers. In response to the announcement from Facebook that all Canadian news will be removed from their platforms within weeks, Sarah took to LinkedIn to share support for local news: “We are at a moment of time where action is required to support local owned media, which is more than news.”
In addition to developments within the Canadian ecosystem, there are emerging trends in how marketers allocate their paid media budgets. Advertising executives are increasingly interested in investing more heavily in contextual advertising and leveraging publishers' first-party data for better targeting. There is also heightened scrutiny around programmatic channels, which lack transparency in terms of media ROI. Consequently, there is a growing preference for direct buying. Moreover, measurement strategies are shifting away from the digital attribution focus of the past decade towards more traditional methods, such as brand lift analysis, media mix modeling, third-party audience measurement, and the use of consumer research data and studies.
In essence, these trends indicate a change in the attitudes and choices of CMOs and agency leaders. They are actively supporting a more open and equitable internet through their advertising investments.
As with similar legislation elsewhere, it is probable that Google and Meta will have to pay millions of dollars directly to media owners to avoid taxation. However, the process of finalizing these deals will take time, leaving publishers to suffer from decreased traffic and increased competition with these tech giants for ad revenue. In the long run, there is a possibility that Google and Meta might modify their platforms by completely removing links. The economic landscape has evolved for these companies, and it is not unreasonable to consider their initial link removal as a test to assess long-term effects on user engagement and potential revenue.
To minimize risk, publishers can take proactive measures to future-proof their businesses.
Here are some recommendations:
Canadian publishers are witnessing promising support from agencies, brands, and the public, indicating a positive trajectory. Coupled with the growth of future-proof data collaboration technologies, this presents remarkable opportunities for news media publishers to revolutionize their advertising revenue generation. The Online News Act, a legislation that foreshadows the future of news consumption, holds great significance not only for Canadians, but also for Americans, as similar bills have reached Congress. In the midst of these advancements, we find ourselves at a critical juncture for the open internet, journalism, and democracy as a whole. Numerous Canadian publishers have already partnered with Optable to safeguard their advertising businesses, and for those who haven't, we are prepared to provide our assistance!