We recently hosted a Hangout On Air to highlight Site Reliability Engineers (SRE) at Google. SRE is made up of software and systems engineering teams worldwide who specialize in troubleshooting, tools development and production systems automation. SRE is also responsible for ongoing capacity planning to handle Google’s rapid traffic growth and global expansion.
Today we’re featuring Ib Lundgren, an SRE intern in our ZRH office, who will tell you more about Site Reliability Engineers and the work they do at Google. Ib received his bachelor’s degree in Computer Science and Engineering at Luleå University of Technology in Sweden and will be starting his master’s at UCL after his internship is complete.
What have you worked on as an SRE intern?
SRE interns are involved in a large number of projects and their role in each project can vary tremendously. For me, much of the work SREs do is unlike anything I’d experienced previously at university.
Two good examples were my projects related to monitoring. After the introductory weeks I dug into a migration and refactoring of our threshold-based monitoring setup for a specific legacy service. This gave a good introduction to the Google ecosystem.
Later, I embarked on what would become my largest project, the development of a new smart time series analysis system. Its purpose was to supplement the traditional monitoring by looking for anomalies in trends and classifying their importance based on deviation from the expected behavior, as opposed to deviation from a hard-coded threshold.
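To make the contrast concrete, here is a minimal sketch of the general idea, not the system Ib built. It assumes the "expected" value is simply the mean of a trailing window and that importance is graded by how many standard deviations a point sits from it. The function names, the window size and the `warn_z` and `crit_z` cut-offs are all hypothetical.

```python
from statistics import mean, stdev


def exceeds_threshold(series, limit):
    """Traditional check: flag points above a hard-coded limit."""
    return [value > limit for value in series]


def classify_anomalies(series, window=5, warn_z=2.0, crit_z=4.0):
    """Label each point by its deviation from a trailing-window expectation.

    Assumption: the expectation is the mean of the previous `window` points,
    and the spread is their sample standard deviation.
    """
    labels = []
    for i, value in enumerate(series):
        history = series[max(0, i - window):i]
        if len(history) < window:
            labels.append("ok")  # not enough history to form an expectation
            continue
        expected = mean(history)
        spread = stdev(history) or 1e-9  # avoid dividing by zero on flat history
        z_score = abs(value - expected) / spread
        if z_score >= crit_z:
            labels.append("critical")
        elif z_score >= warn_z:
            labels.append("warning")
        else:
            labels.append("ok")
    return labels


# Hypothetical latency samples with one sudden spike.
samples = [10, 11, 10, 12, 11, 10, 11, 30, 11, 10]
fixed = exceeds_threshold(samples, limit=50)
graded = classify_anomalies(samples)
```

With these made-up samples the fixed limit of 50 never fires, while the deviation-based check singles out the spike to 30 because it is far from what the recent history suggests.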
Another project I was involved in lies at the opposite end of the SRE spectrum: capacity planning. In parallel with my first project I created statistically sound future projections of Bigtable resource usage, mainly disk and memory usage, using an internal forecasting tool. Capacity planning is done per service, as opposed to per team, and my first task was to define these services in terms of Bigtable tables. Then I set up the data extraction pipeline necessary to feed the usage data into the forecasting tool.
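The two steps described here, grouping tables into services and projecting usage forward, can be sketched roughly as follows. This is only an illustration: the internal forecasting tool is not described in the interview, so a plain least-squares trend line stands in for it, and the table names, service names and usage numbers are invented.

```python
def group_usage_by_service(table_usage, service_tables):
    """Sum per-table usage samples into one time series per service."""
    totals = {}
    for service, tables in service_tables.items():
        per_table = [table_usage[name] for name in tables]
        # zip(*...) lines up the samples taken at the same point in time.
        totals[service] = [sum(points) for points in zip(*per_table)]
    return totals


def linear_projection(series, steps_ahead):
    """Extrapolate a least-squares trend line `steps_ahead` samples forward.

    Assumption: usage grows roughly linearly; a real forecasting tool
    would likely model seasonality and uncertainty as well.
    """
    n = len(series)
    x_mean = (n - 1) / 2
    y_mean = sum(series) / n
    numerator = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(series))
    denominator = sum((x - x_mean) ** 2 for x in range(n))
    slope = numerator / denominator
    intercept = y_mean - slope * x_mean
    return intercept + slope * (n - 1 + steps_ahead)


# Hypothetical disk usage (in TB) sampled once per week for three tables.
table_usage = {
    "t1": [1, 2, 3, 4],
    "t2": [10, 10, 10, 10],
    "t3": [5, 7, 9, 11],
}
service_tables = {"search": ["t1", "t2"], "mail": ["t3"]}

usage = group_usage_by_service(table_usage, service_tables)
search_forecast = linear_projection(usage["search"], steps_ahead=2)
mail_forecast = linear_projection(usage["mail"], steps_ahead=2)
```

Defining the service as a set of tables first means the forecast follows the thing being planned for, rather than whichever team happens to own each table.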
What is your typical day like?
My day often starts with skimming through the night’s worth of emails from across the pond. This helps me prioritize what I need to do for the day as a new bug assignment might need urgent attention. I spend part of my day reading and answering code review feedback and bug discussions with various reviewers, usually from my team. This often leads to code needing to be altered, added or removed. When all feedback has been addressed, I move on to the main portion of my daily activities, writing shiny new code.
As an intern you are blessed with much time to focus on one or two tasks without much distraction. The last few months have been almost entirely devoted to my time series analysis project and consequently to finding reviewers for thousands of lines of code. The system was mainly developed by me and it was a great experience to go from an idea of what we would like to achieve, to researching and suggesting possible approaches, to refining the idea and seeing how this new system grew through a large number of iterations, each step with excellent feedback from Googlers both in Zurich and Mountain View.
As a student with experience mostly from the open web and a few article databases, discovering the corporate intranet is similar to finding a new Internet, except you only have a few months to digest all of it! So I also try to spend a little time each day watching tech talks and researching libraries, best practices and tools.
Why did you apply for an SRE internship?
I applied to SRE because it would be a great opportunity to grow and gain skills I had not had a chance to develop at university, but in hindsight it really was for the Nerf-gun-shooting, rocket-launching glory that is being an SRE.
What’s been your favorite part of the internship?
The greatest part by far about being an SRE intern is seeing the impact you have on your team and other teams you interact with. SREs are constantly working towards eliminating all repetitive work, either their own or in an effort to reduce the workload of others. At first this might come off as a sign of laziness, but it really is about striving to solve new and more challenging problems, not repeating the same job over and over again. As an intern you have time to tackle larger problems that the team struggles to find time for, but that cause them pain on a regular basis. Eliminating such a problem and seeing the effect on your team is an opportunity I believe is unique to being an SRE intern.
Also, Google’s food is ridiculously awesome. Of course, with that comes the daily afternoon food crash, which is when I go to the on-site gym =)
Ib and an impressive-looking snowman after a day of skiing in Engelberg
What skills have you gained from this internship?
The culture here is like no other I’ve experienced. From the start I was free to choose from and work on a number of somewhat tersely described bugs. The initial phase of researching the bug and figuring out what a fix might look like took a while and was a big change from previous lab assignments. It would have been much quicker had I overcome my Noogler (term for “new Googler”) fear of making a fool of myself and started asking questions earlier. This was really when I started to understand how things work at Google. Since then, getting started on new projects and getting past blocking issues has gone much faster. Always asking questions, zooming in on your target and polling for regular feedback to see if you are going in the right direction are invaluable skills to have gained.
Engineering-wise, I’ve learned the value of doing things the right way and not taking shortcuts. I was grilled extensively early on in code reviews, and after having to defend all my choices I now know to do my due diligence before choosing any specific solution.
Google has been through many intriguing problems, the type of problems usually dismissed in textbooks as so unlikely to happen that you need not bother thinking of them. Here, however, you stumble over these knowledge gems daily as you wander through the vast codebase.
Any advice for people considering applying?
You don’t need to be a command line ninja with years of Linux kernel hacking experience to do an SRE internship, so don’t be afraid to apply. The worst you can get is a no.
What are your plans after your internship is over?
I’m going to do a master’s in the UK and I hope to work full-time at Google once I finish.
Posted by Frida Borjesson, College Recruiting Specialist