Arik Hesseldahl

Seven Questions for Facebook Infrastructure Guru Frank Frankovsky

It doesn’t take long to be impressed when talking to Frank Frankovsky, at least if you’re the type of person who is impressed by large-scale computing infrastructure problems. As vice president for hardware design and supply chain operations at Facebook, he has big computing problems squarely in his daily wheelhouse.

I had the chance to be impressed myself when I met one-on-one with Frankovsky at Facebook headquarters in Menlo Park, Calif. We talked some about Facebook’s helming of the Open Compute Project, through which Facebook shares its tricks for building and configuring the hardware it uses inside its data centers, and also about the unique nature of the infrastructure problems faced by a site as large as Facebook.

AllThingsD: I know two things about your data centers: That you build the machines inside them yourself, and that you keep expanding the number of data centers you have. Aside from those, what are the big important things happening in your data centers?

Frankovsky: The original reason that we started building our own systems was about efficiency, which leads to positive environmental impact. And that is when we created the Open Compute Project. We aren’t the first ones to do it, but we’re the first ones to share what we know, and a lot of others are starting to share what they know and contribute to the project as well as consume from the project. A new theme in the data center is around networking, and another is around cold storage and archiving. Those are the two things that we think are currently sort of underserved in large-scale computing environments.

In sharing the information about how you do things, isn’t there potentially a loss of competitive advantage?

We don’t think so. We think the service that we provide for people to communicate and share what they care about is the thing that differentiates Facebook from others. Essentially sharing the infrastructure designs, we view as a positive. The way I think about it, there is no one technology company that can hire all the best engineers. But when you open source, you sort of can, because you can harness the expertise of so many more engineers. … People are willing to share how they have reduced costs and environmental impact.

So, what does that community look like? Is it like people from Google, or is it someone working out of their garage?

That’s what’s so cool about it, is the diversity of the community. We have some of the biggest technology companies in the world, like Intel and AMD and Dell and ARM and Hewlett-Packard, who are all contributing members. But then we have hobbyists who come to our hackathons and are simply doing cool things.

When I think of the potential scale of computing infrastructure that you’re dealing with, it must simply be enormous. Can you give me some sense of how big your overall footprint actually is?

I don’t have the daily growth numbers. We don’t talk about the total footprint size, but it numbers in the tens of thousands of servers. I can tell you that we add multiple petabytes of capacity every day.

That’s pretty astonishing. So, given that scale, can you tell me what your biggest challenge is right now?

It is pretty astonishing. And that’s why when I talk about the trends we’re seeing right now, one of the storage challenges we have is that a lot of that storage that we add every day is so-called “hot storage,” meaning that it’s storing data that’s frequently accessed for a short amount of time. And then it becomes warm and it becomes cold. The real challenge is to provide effective cold storage. It used to be that people would use tape to archive items. But if you’re scrolling back in your timeline and want to see a photo from a few years ago, you’re not going to wait for someone to go retrieve a tape. So, one of the challenges we shared with the community was around a new way to think about cold storage. You still have to maintain good retrieval speed. You can’t lose the data. But what can you do? There’s a lot of ideas being generated about it. Some are around using low-write-endurance NAND flash memory chips.

Right now, my timeline goes back to just about 2007, when I first joined Facebook, but you still have to serve up pictures that I uploaded then as if they were still new. That’s only six years ago, but you must be thinking in a much longer time frame than that. How then, do you think about it?

We have to store it forever, and make it accessible forever. When you really start to think long-term, moving away from mechanical devices like hard drives that spin seems logical. Years from now, they may not still spin. So that makes solid-state storage seem like a logical way to increase endurance. These are the most precious memories from people’s lives, so the preservation of that data is something we consider a very high priority.

What is the thing that you think about most on a day-to-day basis?

It’s really making sure that we have sufficient capacity to serve our end users. We can’t afford to miss a beat. The new services that we bring online are numerous, and the whole design and supply-chain team takes great pride in making sure that the bus never gets a flat. That’s really what we think about. It’s what I jokingly call “feeding the beast.” We have to predict where our end users are going, and what their experience is like if we add more capacity in this or that region. How do we add that capacity in an efficient way, and how do we manage the supply chain so that we don’t have a hiccup? And it’s also about doing it in the most efficient way, not only in terms of capital expenditure, but from the perspective of ongoing cost. We’ve done a really good job, I think, of making it reliable and efficient.

It occurs to me that we haven’t talked about a Facebook service outage in quite a long time. The only exception I can think of in recent memory was the day that an Amazon Web Services data center went down and took part of Instagram down with it.

When and if we have issues, it’s all hands on deck, because we have more than a billion people who depend on Facebook.

AllThingsD’s Mike Isaac also participated in this interview, and a couple of the questions are his.
