I recently had the chance to sit down with Riley Newman, Head of Analytics and Data Science at Airbnb. I asked him about the company's data science practices, where he sees the field going, and his personal path to becoming a data scientist.
Can you tell me how you became a data scientist?
I've followed a fairly untraditional path relative to other data scientists in the valley. In college I was interested in the way the world fit together from historical and political perspectives - I was fascinated by wealth divides and international relations. However, as a junior I had this feeling of lacking any real skills beyond being able to read and write. Call it a mid-college crisis.
Economics was the closest adjacent science so I doubled down on econometric tools for making sense of the theories in IR. I graduated with an offer to extend this further in a Master's at Cambridge, where my thesis advisor pushed me in the direction of economic geography (spatial econometrics). The field examines trends across time and space - it was my first experience using data to understand the world around me.
I wanted to continue my research in their PhD program, but in undergrad I'd joined the Coast Guard as a reservist and they called me back to the US upon completion of the master's. So I returned home to San Francisco and took up a research position with a small team of economists who had effectively predicted the recession.
The economists were really into predictive modeling with respect to how the recession would play out and then helping businesses and governments make bets against those predictions. It was great training both in how to munge ugly public data and then how to turn it into actionable insights.
I also gained a lot of exposure to automating scripts through one of the partners, who had a computer science background.
Several years later I'd completed my Coast Guard obligation and was all set to return to the UK for the PhD. But then I met my wife and the founders of Airbnb. Turns out it was a good week.
Can you tell me what's currently important to Airbnb in terms of data science?
One of the things I love about Airbnb is the demand for data science from all sides of the organization. In the four years I've been here I've been able to work on projects ranging from product to operations to finance. The only limit to the breadth of projects we take on is the size of our team.
An example of one project (which we recently wrote up in a blog post) is our search algorithm. One of the members of our team worked with our product search team to develop a system for dynamically determining where someone is most likely to want to stay given their search string. This is complicated when you think about it - you have to take into account a wide range of factors in order to return optimal results.
Initially we hard-coded a radius and returned what we assumed to be the best listings within that area, but the results didn't make sense for locations of varying sizes and shapes - Manhattan vs. Los Angeles, for example. Then we optimized for centrality relative to the center of the search location, but this washed out neighborhoods that people love using Airbnb to experience. For example, a search for San Francisco centered on the Tenderloin!
So we created a probabilistic model to estimate where someone is most likely to book given where they searched, but this created a gravitational force toward big cities. Finally, we added an additional probabilistic model to determine where someone was likely to have searched given where they booked, and things started to make sense.
There was a lot of experimentation, but moving search conversion up even a few percentage points has enormous long-term impacts on the business. The team did a great job.
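To make the two-model idea above concrete, here is a minimal Python sketch. It is not Airbnb's implementation: the function name `location_scores`, the toy data, and the choice to combine the two probabilities by multiplying them are all assumptions made for illustration. It estimates both conditional probabilities from booking counts and shows how the second one offsets the pull toward a big city.

```python
from collections import Counter, defaultdict


def location_scores(bookings):
    """Score candidate areas for each search query (hypothetical helper).

    bookings: iterable of (search_query, booked_area) pairs, one per booking.
    Returns {query: {area: score}}, where a higher score is a better match.
    """
    bookings = list(bookings)
    pair_counts = Counter(bookings)
    query_totals = Counter(query for query, _ in bookings)
    area_totals = Counter(area for _, area in bookings)

    scores = defaultdict(dict)
    for (query, area), count in pair_counts.items():
        # P(booked in area | searched query): on its own this favors big cities.
        p_book_given_search = count / query_totals[query]
        # P(searched query | booked in area): discounts areas whose bookings
        # mostly come from other searches.
        p_search_given_book = count / area_totals[area]
        # Assumption: combine the two estimates by simple multiplication.
        scores[query][area] = p_book_given_search * p_search_given_book
    return scores


# Made-up toy data: most "Smalltown" searchers end up booking in "Big City".
toy_bookings = (
    [("Smalltown", "Big City")] * 6
    + [("Smalltown", "Smalltown Center")] * 4
    + [("Big City", "Big City")] * 90
)
toy_scores = location_scores(toy_bookings)
best_area = max(toy_scores["Smalltown"], key=toy_scores["Smalltown"].get)
print(best_area)  # prints "Smalltown Center"
```

In the toy data the first probability alone would rank "Big City" first for a "Smalltown" search (0.6 against 0.4). Because only a small share of "Big City" bookings come from "Smalltown" searches, the combined score ranks "Smalltown Center" first.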
Do you have any examples of insights from data that changed how Airbnb operates?
Tons. We've worked on operations ranging from online marketing to customer support to offline host acquisition. One of the earliest examples stemmed from a simple regression I ran. The regression assessed which features of a listing had the strongest impact on its getting booked. We found that photographs of the listing were very significant so we experimented with professional photography, and the results were astounding. A short period of time later we had a tool for hosts to request free photography from professionals and standards for the photos hosts uploaded. It seemed like overnight we had upgraded the site - our listings looked so much better and hosts couldn't believe we were doing something so 'unscalable' for free.
Can you tell me about Airbnb's marketplace philosophy?
We spend a lot of time thinking about 'markets' - geographic areas that we've codified in a way that enables us to compare places like New York and San Francisco with more nebulous destinations like Tahoe and the French Riviera. By doing so we're able to evaluate the distinctive characteristics and needs of one market relative to another, and prioritize our operational responses accordingly.
There are lots of interesting problems we've worked on with respect to market growth. An example I like is modeling how markets develop and mature over time. We try to estimate the extent to which a market will organically grow into a healthy and vibrant travel destination, and if it's off-course, what we can do to help encourage growth.
By cutting our core metrics by market we can often uncover growth opportunities. In another interview I mentioned the story of Lyon, where a few years ago we noticed a huge spike in occupancy in December. It was too early to be related to Christmas, so we looked into it and discovered the Festival of Lights - a really cool event that I hope to check out at some point. Based on this observation and the predictability of seasonality, we were prepared the next year for the fluctuation of demand and worked ahead of time to build supply.
As a side-note, I think this is a great example of how you can let your community steer you in the direction of where to grow the company. I used to ask potential employees how they would analyze and prioritize growth opportunities, and they almost always began with exogenous data that is complicated to acquire and messy to work with. Internal data is already at your fingertips and tailored to your business - it's hugely useful, if used correctly.
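Cutting a metric by market, as described above, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function `flag_spikes`, the 1.5x threshold, the market names, and the occupancy figures are all made up, and nothing here reflects Airbnb's actual tooling or data.

```python
from statistics import median


def flag_spikes(occupancy_by_market, ratio=1.5):
    """Return {market: [months]} where occupancy exceeds ratio x the market's median.

    occupancy_by_market: {market: {month: occupancy_rate}} (hypothetical layout).
    """
    flagged = {}
    for market, monthly in occupancy_by_market.items():
        # Compare each month against the market's own typical level.
        typical = median(monthly.values())
        spikes = [month for month, rate in monthly.items() if rate > ratio * typical]
        if spikes:
            flagged[market] = spikes
    return flagged


# Made-up occupancy rates for two hypothetical markets.
toy_occupancy = {
    "Market A": {"Oct": 0.40, "Nov": 0.42, "Dec": 0.75, "Jan": 0.38},
    "Market B": {"Oct": 0.50, "Nov": 0.52, "Dec": 0.55, "Jan": 0.49},
}
spikes = flag_spikes(toy_occupancy)
print(spikes)  # prints {'Market A': ['Dec']}
```

Comparing each market against its own median, rather than a global one, is what lets a smaller market's unusual month stand out.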
Can you tell me about what Airbnb's data infrastructure looks like?
Our infrastructure - and infrastructure team - has evolved a lot recently. Everything is hosted on Amazon Web Services (AWS), and we spend most of our time working with data in Hadoop, which we access via Hive or Pig. Recently we shifted our version of Hive from EMR to Mesos, which granted a lot more control over the cluster and freed up new features. We are also experimenting with Amazon's Redshift, which is great because of its speed. A query that can take hours in Hive can run in minutes on Redshift.
Can you tell me about some of the ways you and other data scientists at Airbnb keep ahead of the curve, education-wise?
We honestly learn the most from each other. There's a wide variety of backgrounds on the team, ranging from long careers in data science to consulting, academia, and professional poker. More often than not you can get a question answered by the person sitting next to you, or someone from another team like engineering or design. We do our best to encourage that through learning and development courses in the company - one of the members of our team set up SQL classes and we're training people from other teams so they're empowered to answer simple data questions without our help.
I also set up quarterly offsites where we focus on learning a new skill. For example, last quarter we got together with the former CTO from Stamen Design, Mike Migurski, to learn about how to map data. It's an area that is really sexy in data visualization, and he's exceptional. So we spent a couple days with him learning how to build heatmaps of metrics like supply and demand by market.
On the surface that may not seem particularly useful, but the week after the offsite two of the members of our team conquered a 'holy grail question' related to market opportunity by constructing heat maps of our business relative to some census data. Just looking at the data in a new way quickly illuminated the answer to something we had been toying with for a long time.
Can you tell me about any other ways that Airbnb tackles difficulties that arise?
A difficulty for any analytics team is the ad hoc request that takes precedence over everything else in flight, for example a core metric that moves in an unanticipated direction. It's always worth prioritizing in case something is broken, but it can be frustrating for a data scientist who's deep in a problem they're excited about. So we created a system called 'swat teams', where a group of two people is responsible for responding to anything that comes up in a given week. It's great because they're mentally prepared to take on something at short notice, and it's an opportunity to get people from different sides of the team to solve a problem side-by-side. This enables cross-fertilization of ideas and skills, so the pair winds up learning a lot from each other.
Is there a "right" or "wrong" way to use data in modern businesses?
Definitely. For starters, I don't believe companies should necessarily be 'data-driven'. Data is often a useful piece of information, but it's rarely a complete picture - you're only able to analyze what you've logged, and the holistic picture is almost never logged. So a company should be 'data-informed', with the understanding that there may be more to the story. In many companies, data scientists are exalted as the authority on any topic. On our team we try to be a bit more humble.
We're constantly working to build a more holistic picture of a problem. My team is great at breaking down hard problems and finding solutions where we have data. But we try to round this out with user experience - through qualitative analysis - and feedback consolidated by our customer experience team, which we package together to understand the key experiences our community has when using Airbnb.
Also, I think there's a difference between teams and companies that tie projects to metrics and those that tie metrics to projects. The difference between the two is a clear problem statement at the outset of a project - being able to say, 'this is what we're hoping to accomplish and why'. The alternative (mapping metrics to projects) typically begins with 'here's what we're going to work on - let's come up with a way to measure it.' More often than not, it's unclear what the latter is meant to accomplish and it winds up unsuccessful. Yes, some projects lack data or are aspirational - but I'm very skeptical of anything that has unclear goals.
Do you do any forecasting with data?
Definitely. Predictive modeling is a really interesting side of data science, and, as I said already, was an early inspiration for my joining this field. Forecasts are incredibly valuable for assessing business performance. For example, it can be difficult to say whether things are going well if you're in hypergrowth because no matter what, you're growing. But a forecast tells you whether you're growing as fast as you could be, which may indicate something that needs to be addressed.
I used to own our revenue forecasting model, which, at a high level, is super easy. We have so much data and the business has such predictable seasonality curves that with the world's simplest regression I was able to predict booking growth with very high accuracy. But the problem with a top-down model is it's difficult to use operationally - for example, teams want to know how a market is going to perform in order to prioritize resources. That requires a market-specific forecast, which can have a lot of variance given less data, so measuring whether they were successful becomes more difficult. We now have a team that's dedicated entirely to bottom-up forecasting and reporting - it's really cool work.
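The interview does not say what the regression looked like, so the following Python sketch is only one plausible reading of "a simple regression plus predictable seasonality". It fits a straight line to log bookings by least squares and then averages the leftover residual for each position in the seasonal cycle. The function names, the `period` parameter, and the toy series are assumptions made for illustration.

```python
from math import exp, log
from statistics import mean


def fit_seasonal_trend(bookings, period=12):
    """Fit log(bookings) = intercept + slope * t, plus one offset per season."""
    times = list(range(len(bookings)))
    logs = [log(b) for b in bookings]
    mean_t, mean_y = mean(times), mean(logs)
    # Ordinary least squares for a single predictor (time).
    slope = sum((t - mean_t) * (y - mean_y) for t, y in zip(times, logs)) / sum(
        (t - mean_t) ** 2 for t in times
    )
    intercept = mean_y - slope * mean_t
    residuals = [y - (intercept + slope * t) for t, y in zip(times, logs)]
    # Seasonal offset = average residual at each position in the cycle.
    seasonal = [mean(residuals[s::period]) for s in range(period)]
    return slope, intercept, seasonal


def forecast(model, t, period=12):
    """Predict bookings at time index t from a fitted model."""
    slope, intercept, seasonal = model
    return exp(intercept + slope * t + seasonal[t % period])


# Made-up quarterly series: 2% growth per quarter with a peak every fourth quarter.
toy_history = [100 * 1.02 ** t * (1.5 if t % 4 == 3 else 1.0) for t in range(12)]
model = fit_seasonal_trend(toy_history, period=4)
next_quarter = forecast(model, 12, period=4)
```

A top-down fit like this needs a long, smooth history to work well, which is one way to see why market-specific forecasts with less data show more variance.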
Do you have any advice for startups?
One of the big beliefs that I have is that strong brands rest on three equal pillars - technology, design and data.
At Airbnb we build new features with a portfolio concept where we'll have a producer (project manager) that coordinates work between engineers, designers, and data scientists. With these three voices on an equal plane you have the highest probability of a project or feature being successful.
Also, I can't emphasize enough how much investing in good data will pay returns down the road. Almost any good analysis requires time series, so setting up good logging and reliable infrastructure will ensure easy access to data and a clear understanding of where future opportunities lie.