Err… yeah

Poor Facebook, but also WTF?!?
I think Matt puts it better than I ever could. See here.
It’s true though that it’s impressive as hell that they coded something like this. I mean, I can’t imagine how difficult it must be to keep track of every single action a user makes and then broadcast that to all of said person’s friends… I imagine the algorithm to do that had to be optimized a ton to make page loads bearable. Or I guess, depending on how they have friend structures organized (I imagine as a graph with bidirectional edges), it’s really not that difficult, just space-intensive.

I got new glasses today. They make me happy, I think. I’m gonna have to get used to wearing them higher up on my nose since they’re not as tall as my old ones. Also got a bunch of gummy treats from my parents… and 16 rolls of film. Yay film!

Yearbook meeting was today. Tomorrow is the Activities Fair. Whee.

Algo algo algo algo
Quicksort, quicksort
Algo algo algo algo
Quicksort, quicksort
Algo algo algo algo
Quicksort, quicksort
Algo algo algo algo
Ahhh it’s a Tom, a Tom, ohhhh it’s a Tom

Edit: Actually, this is interesting to think about, from a technical point of view… and I guess it’s not as simple as I originally thought.
Let’s say a user has n friends. Each person has their last 10 actions stored (this isn’t exactly true, since you can delete actions and older ones take their place… and I haven’t deleted enough of mine to see how many they store). Finding the last k actions by your friends incurs an automatic cost of 10n log(10n) to sort all the actions and find the k most recent. Many of these actions may also incur an additional cost, since they require looking up the tagged people (i.e. in photos or notes) in your friends list to see if they exist… likely a cost of either n or amortized 1, depending on how friend links are stored. Either way, that’s unimportant asymptotically, so you incur an automatic cost of O(n log n) with every home page view, assuming the content is generated dynamically, which it seems to be.
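Just to make the shape of that computation concrete, here’s a minimal sketch in Python. Everything in it is invented for illustration (the friend_logs structure, the field names); I obviously have no idea how Facebook actually stores any of this.

```python
def build_feed(friend_logs, k):
    # Toy model of the home-page feed: each friend keeps a short log
    # of recent actions, and building the feed means merging all of
    # them and keeping the k newest. friend_logs maps a friend's name
    # to a list of {"id", "ts", "text"} dicts (a made-up schema).
    all_actions = [
        (action["ts"], name, action["text"])
        for name, log in friend_logs.items()
        for action in log
    ]
    # Sorting the ~10n merged actions is the 10n log(10n) = O(n log n)
    # step estimated above; newest first.
    all_actions.sort(reverse=True)
    return all_actions[:k]
```

(A heap-based selection like heapq.nlargest(k, all_actions) would trim that to O(n log k), but it’s still real work on every page view.)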
Dealing with issues of untagging, deleted groups, removing your actions, and such seems relatively easy… just delete the corresponding entry in the log for the person (incurring an additional O(n) each time someone does one of these actions), and those actions are automatically removed from any feeds.
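Continuing the same made-up structure, the delete path really is just dropping the entry from the owner’s log; since the feed is rebuilt dynamically, the action then vanishes from everyone’s view on their next load:

```python
def delete_action(friend_logs, name, action_id):
    # Remove one entry from the owner's log. Because feeds are
    # regenerated on each view in this model, no per-friend cleanup
    # pass is needed; the action simply stops appearing.
    friend_logs[name] = [
        a for a in friend_logs[name] if a["id"] != action_id
    ]
```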
Dealing with multiple people joining a group seems to be more difficult, though, since the feed combines such things into “X, Y, and Z joined group G”. This likely incurs another computation of order n (although I guess they could process duplicates here too, like if two of your friends entered a relationship).
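A rough sketch of how that combining step might look, again with invented structures: a single pass over the merged feed, so roughly the order-n computation guessed at above.

```python
from collections import defaultdict

def collapse_group_joins(feed_items):
    # Fold separate "joined group G" items into one
    # "X, Y, Z joined group G" line with a single pass.
    joins = defaultdict(list)          # action text -> (ts, name) pairs
    merged = []
    for ts, name, text in feed_items:
        if text.startswith("joined group "):
            joins[text].append((ts, name))
        else:
            merged.append((ts, name, text))
    for text, members in joins.items():
        newest = max(ts for ts, _ in members)
        names = ", ".join(name for _, name in members)
        merged.append((newest, names, text))
    return sorted(merged, reverse=True)
```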
Some people have on the order of 1000+ friends. I don’t even want to think about how many people are logging into Facebook (and seeing the home page with the feed info) at any given time.
Even an algorithm that runs in O(n log n) would be immensely horrible to run on every login… I can’t think of anything else on Facebook right now that has to do that, besides your friends page (ordering friends by profile update time), which people probably don’t use as much.

The feeds on each user’s page probably aren’t as bad… tagging someone probably just adds an action to their log saying they were tagged, which doesn’t seem incredibly time-intensive, and it’s trivial to store what a user did and then sort those actions.
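The write path in the toy model is cheap enough to sketch in a few lines:

```python
def record_action(friend_logs, name, action, cap=10):
    # Appending to the actor's own log is O(1); tagging someone would
    # just append a second "you were tagged" entry to that person's
    # log. The profile mini-feed is then one log shown newest-first,
    # with no cross-friend merge at all.
    log = friend_logs.setdefault(name, [])
    log.append(action)
    if len(log) > cap:   # keep only the ~10 most recent, as guessed above
        del log[0]
```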

Meh. My analysis is probably all wrong… but still, it’s quite impressive what they managed to achieve.
Too bad it wasn’t a practical application :-\

2 thoughts on “Err… yeah”

  1. Pfff… Just store the things as a hashmap. First parameter would be the user involved, the next a hashset of the events. Or at least something similar to that… Anything significantly over O(n) wouldn’t be feasible for all the data being moved around. Deletion would be fairly simple… You just set the appropriate object to not exist, and all the sets would automatically shrink (since referring to null is just silly).

  2. Um… Josh… that doesn’t really make sense.
    I’m already assuming they have an O(1) lookup for events associated with a person (since everyone has a unique ID, they could just have an array indexed by these, which is far more efficient than having a HashMap).
    The fact that all the events in the feed have to be sorted and such is the main difficulty that causes an O(n log n) computation.
    Also keep in mind that hashmaps require more space to get that O(1) lookup (amortized), so for something on this scale, a hashmap would probably not be the way to go.

    That said, if they used databases, it would be fairly simple to store events in chronological order and then do database queries to pull out events associated with people. That would still require an O(m) lookup where m is the number of recent events though… which is probably dirtier than organizing by friends.
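    For what it’s worth, that database version is easy to sketch; here with SQLite purely as a stand-in, and a completely guessed schema:

    ```python
    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE events (ts INTEGER, actor TEXT, what TEXT)")
    conn.execute("CREATE INDEX events_by_ts ON events (ts DESC)")

    def feed_from_db(friends, k):
        # Pull the k newest events by friends. With events kept in
        # timestamp order this is effectively the O(m) scan over
        # recent events described above.
        marks = ",".join("?" * len(friends))
        return conn.execute(
            "SELECT ts, actor, what FROM events"
            " WHERE actor IN (" + marks + ") ORDER BY ts DESC LIMIT ?",
            (*friends, k),
        ).fetchall()
    ```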
