Introduction to Data Architecture with Pinterest Case Study

Ore Otegbade
Dec 29, 2020

According to Inmon, Linstedt, & Levins (2019), data management began with simple storage devices and later moved to disk storage. Disk storage in turn became insufficient as big data emerged, and data architecture became “a rational way to interface legacy systems to big data” (Inmon, Linstedt, & Levins, 2019, ch. 8.1). Data architecture is a plan that details how data is collected, processed, stored, and integrated for use in a system or organization. Much like a building architect, a data architect creates blueprints, in this case for data systems.

Research yielded many characteristics of data architecture, but I would like to draw attention to four that showed up in just about every search result: customer-focused, simple, automated, and, last but not least, secure (Eckerson, 2018).

Customer-focused

The best data architecture starts from the company's vision, objectives, and strategy, and the system is designed around them. The data architect should consider user requirements and factor them into the solution. The customers of the system can be inside the organization or outside it.

Simple

Organizations should standardize their database platforms and frameworks as much as possible. As we learnt in class, the business value of a system declines as its complexity increases.

Automated

Managing data architecture in large companies can be time-consuming. It is therefore advisable for the architecture to be smart, that is, automated. Artificial intelligence and machine learning can be used to automate the maintenance of the system.

Secure

The ideal system is able to withstand hackers' attempts to infiltrate it. Security is achieved by encrypting information and complying with national and international privacy regulations.

Frameworks Used for Data Architecture

A framework is a conceptual structure that serves as a guide for users when building a data system. There are four major architectural frameworks today: the Zachman framework, The Open Group Architecture Framework (TOGAF), the Federal Enterprise Architecture (FEA), and the Gartner framework.

The Zachman framework guides companies in using artifacts to answer What, How, When, Who, Where, and Why, while accounting for the perspective each artifact is written from and the issue it addresses (Tupper, 2011). Zachman holds that a data architecture is complete only after every cell in the 6x6 matrix is filled.

In The Open Group Architecture Framework (TOGAF), enterprise architecture is divided into four domains: business architecture, application architecture, technology architecture, and data architecture. TOGAF provides an architecture development method for creating artifacts, unlike Zachman, which organizes them.

The Federal Enterprise Architecture (FEA) was instituted by the U.S. federal government in an attempt to unify the architectures used across its many agencies. FEA is more robust than the previous two frameworks because it combines the functionality of both. FEA has five reference models, which are the domains of the framework: performance, business, service component, data, and technical. These reference models combine with segment models to show how to implement the architecture.

Lastly, the Gartner framework encourages constant adaptation to a changing environment. In this framework, business owners, information specialists, and technology implementers are brought together as a single unit to pursue a shared vision.
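The Zachman completeness rule described above can be sketched as a small data structure. This is only a minimal illustration in Python, not part of the framework itself: the row labels (`perspective_1` to `perspective_6`), the function names, and the sample artifact are hypothetical placeholders, while the six column questions come from the text.

```python
from itertools import product

# The six questions named in the text (matrix columns).
QUESTIONS = ["What", "How", "When", "Who", "Where", "Why"]

# Placeholder row labels (assumption): the article does not name the six perspectives.
PERSPECTIVES = [f"perspective_{i}" for i in range(1, 7)]


def empty_matrix():
    """Build a 6x6 grid where every cell starts without an artifact."""
    return {(p, q): None for p, q in product(PERSPECTIVES, QUESTIONS)}


def missing_cells(matrix):
    """Return the (perspective, question) cells that still lack an artifact."""
    return [cell for cell, artifact in matrix.items() if artifact is None]


def is_complete(matrix):
    """The architecture counts as complete only when no cell is empty."""
    return not missing_cells(matrix)


matrix = empty_matrix()
# Hypothetical artifact filling a single cell.
matrix[("perspective_1", "What")] = "list of core business entities"

print(len(matrix))                 # 36
print(len(missing_cells(matrix)))  # 35
print(is_complete(matrix))         # False
```

Keying the grid by (perspective, question) pairs makes it easy to see which artifacts are still outstanding, which is the practical value of treating the matrix as a checklist.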

Pinterest offers a case study of data architecture in the social media industry. Pinterest is a large social media company whose app lets users pin photos, videos, and GIFs to boards. Pinterest has over 100 million monthly active users and over 10 billion views monthly. According to Wetzler (2020), Pinterest's data architecture comprises Apache Kafka, Redshift, Hadoop, Storm, and HBase. Apache Kafka is a platform for processing real-time data feeds. Redshift is a data warehouse service that drives Pinterest’s interactive analysis. Hadoop is a framework that allows for processing of big data. Storm is a real-time computation system that allows unbounded streams of data to be processed. HBase is a distributed, non-relational database that runs on top of Hadoop's file system. Pinterest’s data architecture serves two primary needs: tracking data from the customer base and tracking analytics for ad buyers.

Pinterest Data Architecture (Wetzler, 2020).
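The difference between the real-time and batch roles described above can be illustrated with a toy example. This is a sketch in plain Python under stated assumptions, not Pinterest's implementation: the event fields, function names, and sample data are all hypothetical, an in-memory list stands in for a Kafka-style feed, a generator stands in for Storm-style stream processing, and a full scan stands in for a Hadoop-style batch job feeding a warehouse table.

```python
from collections import Counter, defaultdict
from typing import Iterable, Iterator

# Hypothetical event feed; the field names are illustrative only.
EVENT_LOG = [
    {"user": "u1", "action": "pin", "board": "recipes"},
    {"user": "u2", "action": "view", "board": "recipes"},
    {"user": "u1", "action": "view", "board": "travel"},
    {"user": "u3", "action": "pin", "board": "travel"},
    {"user": "u2", "action": "pin", "board": "recipes"},
]


def stream_counts(events: Iterable[dict]) -> Iterator[dict]:
    """Streaming path: update running totals as each event arrives."""
    running = Counter()
    for event in events:
        running[event["action"]] += 1
        yield dict(running)  # snapshot available immediately after each event


def batch_pins_per_board(events: Iterable[dict]) -> dict:
    """Batch path: scan the whole log once and build a summary table."""
    table = defaultdict(int)
    for event in events:
        if event["action"] == "pin":
            table[event["board"]] += 1
    return dict(table)


latest = None
for latest in stream_counts(EVENT_LOG):
    pass  # a real consumer would act on every snapshot

print(latest)                           # {'pin': 3, 'view': 2}
print(batch_pins_per_board(EVENT_LOG))  # {'recipes': 2, 'travel': 1}
```

The streaming function produces an answer after every event, while the batch function only answers once it has seen the full log. That trade-off is why architectures like the one described here tend to pair a stream processor with a batch framework.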

Why Does Data Architecture Matter?

The first goal of data architecture is to make sure that the data system meets the needs of business users. It is about aligning those needs with purposeful data. Data architecture provides both stability and flexibility, so that the system can evolve as the need arises rather than going back to square one each time it grows in complexity. Having an architecture in place also eases change management. Additionally, data architecture is a cost-effective way of fostering collaboration between the business and technology departments. If done right, data architecture is initiated at the business level, and the technology department then supports the vision. The two units remain in communication so that the architecture can adapt to the changing business environment. Lastly, data architecture enhances decision making because it standardizes reference materials.

References

Eckerson, W. (2018). Ten Characteristics Of A Modern Data Architecture. [online] Eckerson.com. <https://www.eckerson.com/articles/ten-characteristics-of-a-modern-data-architecture>

Huang, T. (2014). Behind The Pins: Building Analytics. [online] Medium. <https://medium.com/pinterest-engineering/behind-the-pins-building-analytics-f7b508cdacab>

Inmon, W., Linstedt, D., & Levins, M. (2019). Data architecture: A primer for the data scientist (2nd ed.).

Tupper, C. (2011). Data architecture: From zen to reality. Amsterdam; Boston: Morgan Kaufmann.

Wetzler, M. (2020). Architecture Of Giants: Data Stacks At Facebook, Netflix, Airbnb, And Pinterest. [online] Keen. <https://keen.io/blog/architecture-of-giants-data-stacks-at-facebook-netflix-airbnb-and-pinterest/>
