In a recent case study, we profiled Etsy and learned how a high-performance data platform helps keep Etsy’s global community engaged. In that study, Etsy’s engineering team provided some key examples of how they monitor their databases to ensure good system performance in development. In this post, we want to highlight a few of those specific uses.
Performance Testing in Development
Performance monitoring is key to good DevOps principles: to understand how to improve system performance, one must first understand how the system is performing. Engineering teams need performance analytics in order to do fact-based evaluations, improve code quality, and ensure system stability. By putting performance analytics in the hands of developers, Etsy grants them access to the information they need for problem-solving; with database monitoring in particular, Etsy’s engineering team can fix issues unilaterally, with confidence that they have access to a complete picture.
These are a few of the applications where Etsy found value in using a performance management solution.
- Database version upgrades. When Etsy upgraded their databases to a new version of MySQL, they used granular database monitoring to help them analyze changes in system performance. They were able to watch their systems running on two separate servers, each with a different version of MySQL: one the production server and the other a copy of it. Using VividCortex, with side-by-side tabs, they could sort and compare metrics like average latency, errors, and total time. Comparing the two servers gave the team strong confidence that the switch would deliver a real improvement in speed. At the same time, the comparison showed no regressions and no query errors. When it came time to push to production, Etsy was confident that all would go smoothly.
- Database schema changes. For schema migrations, Etsy uses performance analytics for real-time feedback. The team looks at historical performance metrics and trends to ensure that there are no resulting spikes in workload metrics or server faults. Engineers and DBAs drill into the data and share what they find by using permanent links into the VividCortex UI to quickly collaborate with a colleague on issue identification and resolution. A SaaS performance monitoring platform with deep linking and sharing capabilities gives a team confidence that when one person looks at a chart and sends it to a colleague, that colleague will see exactly the same thing.
- Hardware upgrades. When Etsy evaluated a hardware change—switching from traditional spinning-disk storage to SSDs—the DBAs used VividCortex to A/B test the change and analyze exactly what system performance would be like with the new hardware.
- Code changes. Etsy’s developers deploy code frequently, often dozens of times every day. One issue they consistently faced was that MySQL’s internal measurements of replication latency made it difficult to understand the impact of a code change. In the past, when Etsy pushed a code change into production, the team couldn’t see the effects on replication delay until it was too late and the replicated data had fallen behind. The only way to get visibility into such complex system behavior is a monitoring product that is specifically designed with knowledge of the database’s intricate inner workings. Using VividCortex, they can see immediately when replication lags, even when MySQL’s own measurements of delay show nothing.
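To make the side-by-side comparison described in the upgrade examples above more concrete, here is a minimal Python sketch. It is not VividCortex's implementation or API, and it does not reflect how Etsy actually ran its comparisons. The `QueryStats` class, the `compare_servers` function, the 5% tolerance, and the sample numbers are all hypothetical, chosen only to illustrate the idea of lining up per-query metrics (average latency, errors, total time) from a baseline server and a candidate server and sorting by how much each query changed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QueryStats:
    """Hypothetical per-query metrics collected from one server."""
    count: int            # number of executions in the sample window
    total_time_ms: float  # total execution time across all executions
    errors: int           # number of executions that returned an error

    @property
    def avg_latency_ms(self) -> float:
        return self.total_time_ms / self.count if self.count else 0.0


def compare_servers(baseline, candidate, tolerance=0.05):
    """Compare queries seen on both servers.

    Returns one row per shared query, sorted so the largest relative
    latency increase comes first. A query is flagged as regressed when
    its average latency grows by more than `tolerance` (an assumed
    threshold) or when it produces more errors than on the baseline.
    """
    rows = []
    for query in sorted(baseline.keys() & candidate.keys()):
        old, new = baseline[query], candidate[query]
        if old.avg_latency_ms > 0:
            change = (new.avg_latency_ms - old.avg_latency_ms) / old.avg_latency_ms
        else:
            change = 0.0
        rows.append({
            "query": query,
            "baseline_avg_ms": old.avg_latency_ms,
            "candidate_avg_ms": new.avg_latency_ms,
            "latency_change": change,
            "regressed": change > tolerance or new.errors > old.errors,
        })
    rows.sort(key=lambda r: r["latency_change"], reverse=True)
    return rows


# Made-up sample data: the same two query families on each server.
baseline = {
    "SELECT listings": QueryStats(count=1000, total_time_ms=12000.0, errors=0),
    "UPDATE carts": QueryStats(count=200, total_time_ms=1000.0, errors=0),
}
candidate = {
    "SELECT listings": QueryStats(count=1000, total_time_ms=9000.0, errors=0),
    "UPDATE carts": QueryStats(count=200, total_time_ms=1200.0, errors=1),
}

rows = compare_servers(baseline, candidate)
for row in rows:
    flag = "REGRESSED" if row["regressed"] else "ok"
    print(f'{row["query"]:<18} {row["latency_change"]:+.0%}  {flag}')
```

Sorting by relative change puts the queries most worth investigating at the top, which mirrors the sort-and-compare workflow described above. A real comparison would also need to account for differences in traffic between the two servers, which this sketch ignores.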
Code as Craft
A quick tip-of-our-hat to some of Etsy’s development methods. The name of Etsy’s technical blog is “Code as Craft,” a title that reflects how much craftsmanship and trade matter to the company, even beyond the artisan products their users sell. Etsy’s engineering team takes excellence-in-craft to heart, and, on the “Code as Craft” blog, managers, developers, and DBAs examine ideas and best practices that enable readers to hone their own skills, through specific technical advice and Etsy’s team’s own experiences.
Etsy CEO Chad Dickerson has remarked that “our goal here is to constantly emphasize shipping, and get over any deployment fears.” For Etsy, the regular addition of new features means the team has to stay ahead of their users. In order to move fast, they’ve adopted a testing process with a heavy emphasis on performance monitoring and analytics. They A/B test whenever they push new features or make any infrastructure upgrades. They do side-by-side comparisons of before-and-after system performance readouts. They leverage the metrics produced by deep performance monitoring as much as possible. Diligent performance management is core to Etsy’s operational model, and it results in fast, confident updates and quick fixes whenever necessary.
Throughout Etsy’s engineering team, you see evidence of what Chad Dickerson calls a “radical decentralization of authority,” with performance data and analytics serving as a key enabler for decision making across the organization. System and business metrics like checkouts, listings, registrations, and sign-in rates are projected onto screens throughout the office, reinforcing this data-driven mindset.
Similarly, the engineering team examines database performance analytics closely, allowing them to uncover performance issues and make faster decisions regarding resolution. Armed with the right tools, they can ensure that their “craftsmanship” involves both execution and deep understanding.