Livin' on the Edge Podcast

Matt Klein on Testing Microservices and Building Cloud Platforms

About

In this episode of the Ambassador Livin’ on the Edge podcast, Matt Klein, creator of Envoy and a software engineer at Lyft, discusses his microservice testing strategies and offers guidance on building effective platforms and deployment environments.

Episode guests

Matt Klein

Software Engineer

Matt Klein is the creator of Envoy and a software engineer at Lyft. He has been working on operating systems, virtualization, distributed systems, networking and making systems easy to operate for nearly 20 years across a variety of companies. Some highlights include leading the development of Twitter’s L7 edge proxy and working on high-performance computing and networking in Amazon’s EC2.

Be sure to check out the additional episodes of the "Livin' on the Edge" podcast.

Key takeaways:

Creating a fast development loop is essential for engineering productivity. Sometimes engineers have to “bend the rules” in order to make this work for them.
It is very challenging to make a development environment like production, particularly when working with a large service-based system or dealing with large amounts of data.
Developers should take time to understand the “test pyramid” model. Creating unit tests that can be run locally without external dependencies can provide fast feedback.
The judicious use of mocks and virtual services can provide effective component and integration testing. A service-based system should make strong use of APIs and contracts (e.g. interface definition languages, such as Protobuf), which can be used to automatically generate skeletons for mock components.
The continuous delivery pipeline should include additional tests, and also be capable of supporting incremental release and testing in production e.g. canary releasing and dark launching.
Minimizing the number of services that are directly dependent on a data store enables more effective testing of a system in general. Stateful services can be tested in isolation, using additional testing techniques. Stateless services interacting with other services can be tested via contracts and interactions.
Technical leaders should ensure that any desire to maintain a staging environment (or series of pre-production environments) provides position return on investment.
Systems that synchronize local and remote development environments, such as Telepresence, Garden, and Tilt, are interesting, but engineers should again take care to ensure their usage provides net positive ROI across their team.
The Lyft team continually runs a rider/driver simulation on their staging environment. This typically identifies large or problematic bugs before they are released to production.
The Lyft rider/driver simulation can also be used for load testing or performance testing within a production environment. The system has been designed to recognize this test traffic, and not trigger certain analytics or data mutation that would skew real customer ride data.
When a team is considering building a cloud platform or improving their development workflows, they should always start by identifying their most impactful/blocking problems. When applying solutions, teams should aim to “keep it simple”.
The future of development and platforms may be skewed towards function-based approaches. Frameworks like Kubernetes and Envoy will be around for a long time, and they will increasingly be pushed down into a platform that supports a variety of architectures.
In the near-term future, organizations are dealing with a lot of core/vintage systems (that make money), and so the immediate goals of many teams are to bridge the gap between the core and cloud native technologies.