top of page

Data warehousing reimagined: Snowflake in modern investment data management


The Snowflake cloud feeding to and from the ATHENA platform.
Snowflake and ATHENA

The leaps ahead in variable computing power, architecture, and the use and sharing of structured and semi-structured data are transforming the investment data management landscape. Snowflake is a great example of a new technology with an uncommonly impressive rate of adoption in investment firms.


ICS operates at the centre of a community of such investment management clients and prospects that are early adopters of Snowflake. Whether this cohort are already Snowflake customers or currently only testing the fences, the innovative data tech’s early impact is dominating the market’s current conversation.


Having said that, the use of Snowflake as an enterprise data warehouse has already presented a common challenge. This is best encapsulated by an often-posed question:


"How do I master investment reference data (investments, exposures, securities, portfolios, and more) on Snowflake?"

The near universality of this question is indicative of a common problem amongst investment organizations adopting Snowflake in absence of a robust investment data management platform.


It's not to say that Snowflake cannot add tremendous value to an investment manager's data experience, particularly around the speedy analysis of large volumes of homogenised investment data across multiple use cases.


As ever in this space, however, referring back to first principles of architectural design and target operating model design will offer a much more obviously beneficial experience and a quicker ROI on the data management spend.


There’s a lot to like about Snowflake

Below are some of the use cases we see for Snowflake based on community feedback.


1. Data Cloud Marketplace

Democratizing quick to market, curated market data is a definite win. The range of data providers on the Snowflake marketplace is increasing, and we're now seeing market data providers such as Bloomberg, FactSet, and Refinitiv become available. These can be incorporated with clients' own enterprise financial data from custodians and other service providers to great analytical and reporting benefits.


2. Enterprise Data Warehouse (EDW)

Financial enterprises can access numerous data domains and vastly improve analysis and reporting outcomes where these domains are blended together in their analytics and subsequently reporting engines. Snowflake is an excellent EDW for such purposes. It's fast, flexible and provides a great alternative to platforms such as Microsoft Azure and Amazon Redshift for blending data domains that have already passed through extensive data quality and control frameworks.


Snowflake can also vastly improve the experience of sandpit environment users. Fast access to externally licensed datasets and enterprise data sets alike enables quants and other analysts to test various investment theses at impressive speeds. This supports fast, iterative growth in investment insights and will allow the evolution of analysis to progress further prior to clients incorporating new datasets into a more controlled framework.


The Challenge: Data management and mastering for investment organizations in Snowflake

Master Data Management (MDM) is a discipline that supports both business and IT teams in their collaboration to arrive at data that is consistent, accurate, available, complete, and timely across an organization.


For perspective, below are typical examples of data domains related to buy-side investment data:


  • Security/Investment Master

  • Portfolio Holdings and Exposures

  • Transactions

  • Investment Performance Returns

  • Performance Contribution and Attribution

  • Investment Risk Analytics

  • Asset Allocations / Model Portfolios

  • Portfolio / Security / Party Classifications

  • Portfolio Master

  • Parties - Counterparties / Brokers / Issuers master

  • Calendars

  • Countries / Regions / Markets / Currencies

  • Prices / Security analytics

  • FX Rates

  • Exchanges

  • Corporate Actions

  • Credit Ratings

  • Indexes/Benchmarks

  • ESG and Sustainability

Developing data models that cover all of these data domains has historically proved elusive. It would be easy to interpret Snowflake's core capabilities as being potentially transformative to the MDM journey. In reality, however, the practicalities of unifying these data domains in a common framework need to be overcome before applying the solutions of computing power, storage, and retrievability to the problem.


ICS recently attended a fascinating Snowflake webinar that, to the data management insider, elicited understandable questions from attendees that hinted at that pre-existing challenge.


"...how can security mastering potentially be solved in Snowflake?"


The short answer, though not the one offered by the presenter who appeared understandably to misinterpret the inquiry, is that if one builds data mastering capabilities inside your Snowflake environment, then it will be there. And if one does not, then the pre-existing challenge will cascade into Snowflake and persist.


Storing and warehousing data on Snowflake is great, but mastering data for such complex and interrelated domains utilized in research in the investment management industry requires unique tools and skills.


As was further expressed in the attendee's probing related to 'clean' data, mastering and 'cleaning' data according to your priority and confidence in your providers, is just one aspect of the broader answer to that question.


Another aspect relates to linking data domains together through time so that each domain is “aware of” and “understands” all of the other data sets being drawn into the analytics. Experience informs us that this problem most often needs to be solved with a comprehensive investment data model. Inevitably adopting databases where this has not been solved for will run into the bigger challenge of managing time-series and bi-temporal data. Let’s have a look at what that is.


The Bi-Temporal Data Model Conundrum

Bitemporal data is an approach in data management that stores information with two time components, capturing both the moment the data became true in reality.


For example:

  • What was an asset’s closing price?

  • What happened to that asset’s closing price through the intervention of mastering – applying a rule set to determine which closing price you want to use out of a few you might be presented with - e.g. custodian, market data provider, benchmark etc?

  • And then what ultimately did that mastered price become to allow it became known to the system?

  • When did that happen?

Bitemporal data is available in bitemporal databases and data warehouses to ensure that reports can be produced on a historical ‘as-was’ basis and is especially significant in investment management. It's not just about "what you know," but also about "when you knew it".


This dual-time tracking makes it easier to understand how data has changed over time, thereby enhancing decision-making processes, risk assessments, and regulatory compliance. Significant and powerful analytics can add a tremendous value to the investment management decision making process in bi-temporal data models, to the point where simple snapshots of what a data fact is today in a relative sense, is essentially rendered of very low value by comparison.


The challenge of creating these datasets, and contemplating the functional processes inside Snowflake, is often why organisations utilise other platforms that are built for the specific purpose of investment data management, which are subsequently used to feed data into Snowflake.


In sum, master all datasets before the data enters Snowflake with a platform like ATHENA – axiomatically, in light of the requirements demanded by our many clients onto the platform – or build that capability yourselves inside your instance of Snowflake.


In fairness to Snowflake, this is highly likely a demand side problem amongst users rather than a supply side problem originating with Snowflake.


Best practice

What ICS sees as best practice is the implementation of a dedicated platform for investment data management which supports and makes the data available to Snowflake. Thinking figuratively of data as a flow from source to destination, an investment data management platform should really be upstream of Snowflake.


Building that inside any platform would require a significant build-effort (time and internal IT resources) and really pre-supposes that Snowflake adopters would be fine with repeating that effort firm by firm rather than more sensibly, accessing that as a service from a specialist provider.


This is supported by a recent survey from Cutter Associates showing 68% of surveyed firms are using a Reference Data Management Solution (RMDS), and 20% without an RDMS plan to implement one. (Ref: Cutter Associates 2023)


If investment managers are looking to explore Snowflake, or potentially struggling with the challenges mentioned above, we’d recommend an outreach to the investment data management experts at ICS.


The Other Challenge: Data Access to Your Consumers

Snowflake is by default a technical platform with limited features for easy data access to the business user without the need for technical skills.


Consumers want to engage with data in many different ways (No-code reporting, BI, Excel connectivity etc), and we recently wrote an article about this, which you can find here.

Comentários


bottom of page