Normalizing API Data from Multiple Sources
Hello!
This blog documents my journey in creating Museum Passport, an educational website exploring the world of art museums. One of the main goals of the site is to aggregate works of art from multiple museums and display them in one place. Many major museums offer public APIs for interacting with their collections. My plan is to call these APIs from Go and populate Museum Passport with information on art and cultures from around the world and throughout history. I chose Go for this project because of its excellent support for concurrent API calls (goroutines), clean data modeling (structs), and flexible interfaces.
My first goal when starting out was to properly strategize how to use these different APIs in an efficient and logical manner. This involved researching the various APIs and noting what information they can provide.
The problem: APIs aren’t standardized
When browsing the various museum APIs available, the first thing that stood out was how differently they are structured. The Met’s API has all the expected information: name, artist, location, date, etc. The Harvard Art Museums API goes further and includes information such as the colors found in the art, descriptions, and even which exhibitions a work has been featured in.
Not only do the APIs differ in how much information they provide, but fields that serve the same purpose can have different names in each one. Examples include dated vs. objectDate, constituents vs. people, and so on. The question was: how do I take these differently structured APIs and normalize their responses?
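To make the naming mismatch concrete, here is a minimal sketch of how two differently shaped responses might be decoded in Go. The struct names, the helper functions, and the Harvard sample payload are hypothetical and trimmed to a few fields. The JSON keys follow the field names mentioned above, but I'm assuming their exact shape here, so check each museum's API documentation before relying on them.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// MetObject holds a few fields from a Met-style record (assumed shape).
type MetObject struct {
	ObjectID          int    `json:"objectID"`
	Title             string `json:"title"`
	ArtistDisplayName string `json:"artistDisplayName"`
	ObjectDate        string `json:"objectDate"`
}

// HarvardPerson is one entry in a Harvard-style "people" list (assumed shape).
type HarvardPerson struct {
	Name string `json:"name"`
	Role string `json:"role"`
}

// HarvardObject holds a few fields from a Harvard-style record (assumed shape).
type HarvardObject struct {
	ID     int             `json:"id"`
	Title  string          `json:"title"`
	Dated  string          `json:"dated"`
	People []HarvardPerson `json:"people"`
}

// Trimmed sample payloads. The Harvard one is invented for illustration.
const metSample = `{"objectID": 436105, "title": "The Death of Socrates",
 "artistDisplayName": "Jacques Louis David", "objectDate": "1787"}`

const harvardSample = `{"id": 3245, "title": "Untitled Study", "dated": "c. 1900",
 "people": [{"name": "Example Artist", "role": "Artist"}]}`

// decodeMet parses a Met-style JSON payload.
func decodeMet(raw []byte) (MetObject, error) {
	var o MetObject
	err := json.Unmarshal(raw, &o)
	return o, err
}

// decodeHarvard parses a Harvard-style JSON payload.
func decodeHarvard(raw []byte) (HarvardObject, error) {
	var o HarvardObject
	err := json.Unmarshal(raw, &o)
	return o, err
}

func main() {
	met, err := decodeMet([]byte(metSample))
	if err != nil {
		panic(err)
	}
	harvard, err := decodeHarvard([]byte(harvardSample))
	if err != nil {
		panic(err)
	}
	// The same concept (date, artist) lives under different names and shapes.
	fmt.Println(met.ObjectDate, "|", met.ArtistDisplayName)
	fmt.Println(harvard.Dated, "|", harvard.People[0].Name)
}
```

Notice that the artist is a single string in one struct and a list of people in the other. That is exactly the kind of difference a shared artwork type has to smooth over.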
Creating an artwork struct
My solution was creating a standardized artwork struct to normalize museum API data.
type SingleArtwork struct {
	ID           string
	ArtworkTitle string
	ArtistName   string
	DateCreated  string
	ArtMedium    string
	ImageLarge   string
	ImageSmall   string
	Museum       string
}
Normally an ID would be an integer, but the same ID number could easily be used by more than one museum. I decided to prefix each ID with a short-form version of the originating museum’s name, so SingleArtwork IDs look like “met-23443” or “harvard-3245”. This identifies which museum the artwork comes from while also ensuring its ID is unique across museums. All that’s left is to take the fields from the API response and copy them into a new SingleArtwork struct. Thankfully this is quite simple: each API field becomes the value of the corresponding SingleArtwork field.
// NormalizeArtwork maps a Met API record onto the shared SingleArtwork struct.
func (m *MetClient) NormalizeArtwork(receivedArt MetSingleArtwork) models.SingleArtwork {
	return models.SingleArtwork{
		// Prefix the numeric Met ID so it stays unique across museums.
		ID:           fmt.Sprintf("met-%d", receivedArt.ObjectID),
		ArtworkTitle: receivedArt.Title,
		ArtistName:   receivedArt.ArtistDisplayName,
		DateCreated:  receivedArt.ObjectDate,
		ArtMedium:    receivedArt.Medium,
		ImageLarge:   receivedArt.PrimaryImage,
		ImageSmall:   receivedArt.PrimaryImageSmall,
		Museum:       m.GetMuseumName(),
	}
}
Here’s what a normalized response looks like:
{
  "ID": "met-436105",
  "ArtworkTitle": "The Death of Socrates",
  "ArtistName": "Jacques Louis David",
  "DateCreated": "1787",
  "ArtMedium": "Oil on canvas",
  "ImageLarge": "https://images.metmuseum.org/CRDImages/ep/original/DP-13139-001.jpg",
  "ImageSmall": "https://images.metmuseum.org/CRDImages/ep/web-large/DP-13139-001.jpg",
  "Museum": "The Metropolitan Museum of Art"
}
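The same pattern should carry over to other museums. Below is a rough sketch of what a second normalizer could look like for a Harvard-style record, along with a small helper that splits a prefixed ID back into its parts. HarvardClient, HarvardSingleArtwork, and SplitID are hypothetical names invented for this example, not code from the project. Joining every name in the people list and reusing one image URL for both sizes are simplifying assumptions.

```go
package main

import (
	"fmt"
	"strings"
)

// SingleArtwork is the shared, normalized shape described in this post.
type SingleArtwork struct {
	ID           string
	ArtworkTitle string
	ArtistName   string
	DateCreated  string
	ArtMedium    string
	ImageLarge   string
	ImageSmall   string
	Museum       string
}

// HarvardPerson and HarvardSingleArtwork are hypothetical, trimmed-down types.
type HarvardPerson struct {
	Name string
}

type HarvardSingleArtwork struct {
	ID              int
	Title           string
	Dated           string
	Medium          string
	PrimaryImageURL string
	People          []HarvardPerson
}

// HarvardClient is a hypothetical counterpart to MetClient.
type HarvardClient struct{}

func (h *HarvardClient) GetMuseumName() string { return "Harvard Art Museums" }

// NormalizeArtwork maps a Harvard-style record onto SingleArtwork.
func (h *HarvardClient) NormalizeArtwork(a HarvardSingleArtwork) SingleArtwork {
	// Assumption: every listed person is credited, joined with commas.
	names := make([]string, 0, len(a.People))
	for _, p := range a.People {
		names = append(names, p.Name)
	}
	return SingleArtwork{
		ID:           fmt.Sprintf("harvard-%d", a.ID),
		ArtworkTitle: a.Title,
		ArtistName:   strings.Join(names, ", "),
		DateCreated:  a.Dated,
		ArtMedium:    a.Medium,
		ImageLarge:   a.PrimaryImageURL,
		ImageSmall:   a.PrimaryImageURL, // assumption: one URL used for both sizes
		Museum:       h.GetMuseumName(),
	}
}

// SplitID separates a prefixed ID such as "met-436105" into the museum
// prefix and the museum's own ID. ok is false when there is no "-".
func SplitID(id string) (museum, objectID string, ok bool) {
	return strings.Cut(id, "-")
}

func main() {
	client := &HarvardClient{}
	art := client.NormalizeArtwork(HarvardSingleArtwork{
		ID:     3245,
		Title:  "Untitled Study",
		Dated:  "c. 1900",
		People: []HarvardPerson{{Name: "Example Artist"}},
	})
	museum, objectID, ok := SplitID(art.ID)
	fmt.Println(art.ID, museum, objectID, ok)
}
```

Because every normalizer returns the same SingleArtwork type, the rest of the site never needs to know which museum a record came from. When it does, the ID prefix carries that information.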
Takeaway
What makes me most proud is that I thought about how to tackle this problem beforehand. Like many developers, my first instinct is to dive in headfirst and begin coding without any real direction. This time, I decided to approach every aspect of the project from a slower, more intentional angle. I brainstormed potential roadblocks and then researched how to overcome them.
I hope you’ll join me on this journey as I continue to blog about my experiences!
Matt