1. Data modeling.
    • Data modeling – If we use a relational database such as MySQL, we can define a user object and a feed object. Two relations are also necessary: users can follow each other, and each feed has a user as its owner.
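
As a rough illustration of the two objects and two relations described above, here is a minimal sketch. It uses Python's built-in `sqlite3` module as a stand-in for MySQL so that it runs without a server. The table and column names (`users`, `feeds`, `follows`, `owner_id`, `created_at`, and so on) are assumptions made for this example, not a prescribed schema.

```python
import sqlite3

# Hypothetical schema: all table and column names are illustrative.
SCHEMA = """
CREATE TABLE users (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
CREATE TABLE feeds (
    id         INTEGER PRIMARY KEY,
    owner_id   INTEGER NOT NULL REFERENCES users(id),  -- each feed has a user owner
    content    TEXT NOT NULL,
    created_at INTEGER NOT NULL                        -- e.g. a Unix timestamp
);
CREATE TABLE follows (
    follower_id INTEGER NOT NULL REFERENCES users(id),
    followee_id INTEGER NOT NULL REFERENCES users(id),
    PRIMARY KEY (follower_id, followee_id)             -- a user follows another at most once
);
"""

def make_db():
    """Create an in-memory database with the sketch schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn

conn = make_db()
conn.executemany("INSERT INTO users (id, name) VALUES (?, ?)",
                 [(1, "alice"), (2, "bob")])
# alice (1) follows bob (2)
conn.execute("INSERT INTO follows (follower_id, followee_id) VALUES (1, 2)")
conn.execute(
    "INSERT INTO feeds (id, owner_id, content, created_at) VALUES (10, 2, 'hello', 1000)"
)
```

The follow relation is many-to-many, so it gets its own table, while feed ownership is many-to-one and fits in a single foreign-key column.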
  2. How to serve feeds.

    • Serve feeds – The most straightforward way is to fetch feeds from all the people you follow and render them ordered by time.

    • When a user follows a lot of people, fetching and rendering all of their feeds can be costly. How can we improve this? There are many approaches. Since Twitter has infinite scroll, especially on mobile, each time we only need to fetch the most recent N feeds instead of all of them. There are then many details to work out about how the pagination should be implemented.
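
One possible way to fetch only the most recent N feeds is cursor-based pagination, sketched below over an in-memory list. The `Feed` class, the `fetch_page` function, and the `(created_at, id)` cursor are assumptions made for this example. A real system would push the filter and limit into the database query.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Set, Tuple

@dataclass(frozen=True)
class Feed:
    id: int
    owner_id: int
    created_at: int  # e.g. a Unix timestamp
    content: str

# Cursor = (created_at, id) of the oldest feed already shown to the user.
Cursor = Tuple[int, int]

def fetch_page(
    feeds: Iterable[Feed],
    followee_ids: Set[int],
    n: int,
    before: Optional[Cursor] = None,
) -> Tuple[List[Feed], Optional[Cursor]]:
    """Return up to n feeds from followed users, newest first,
    strictly older than the `before` cursor, plus the next cursor."""
    candidates = [
        f for f in feeds
        if f.owner_id in followee_ids
        and (before is None or (f.created_at, f.id) < before)
    ]
    # Sorting by (created_at, id) gives a stable order even when timestamps tie.
    candidates.sort(key=lambda f: (f.created_at, f.id), reverse=True)
    page = candidates[:n]
    # Only hand back a cursor when older feeds remain.
    has_more = len(candidates) > n
    next_cursor = (page[-1].created_at, page[-1].id) if has_more else None
    return page, next_cursor
```

The client passes the returned cursor back on the next scroll. Compared with offset-based paging, a cursor keeps pages stable when new feeds arrive at the top while the user is scrolling.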

You may also consider caching, which can help speed things up.
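
As one illustration of caching, the sketch below keeps each user's most recently computed timeline in memory for a short time. The `TimelineCache` class, its `ttl_seconds` parameter, and the `loader` callback are hypothetical names for this example. A production system would more likely use an external cache, and would need its own invalidation policy.

```python
import time
from typing import Callable, Dict, List, Tuple

class TimelineCache:
    """Tiny in-process cache: user_id -> list of feed IDs, with a time-to-live."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so the expiry logic can be tested
        self._entries: Dict[int, Tuple[float, List[int]]] = {}

    def get(self, user_id: int, loader: Callable[[int], List[int]]) -> List[int]:
        """Return the cached timeline, calling loader(user_id) on a miss or expiry."""
        now = self._clock()
        entry = self._entries.get(user_id)
        if entry is not None and entry[0] > now:
            return entry[1]
        feed_ids = loader(user_id)  # the expensive fetch from the database
        self._entries[user_id] = (now + self._ttl, feed_ids)
        return feed_ids

    def invalidate(self, user_id: int) -> None:
        """Drop a user's entry, e.g. after someone they follow posts."""
        self._entries.pop(user_id, None)
```

A short time-to-live trades a little staleness for far fewer database reads, and explicit invalidation is one way to keep the timeline fresh when new feeds arrive.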

  1. How to detect fake users? This can be related to machine learning. One way to do it is to identify several relevant features, such as registration date, the number of followers, and the number of feeds, and build a machine learning system to detect whether a user is fake.
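
To make the feature idea concrete, here is a minimal sketch that turns the three features mentioned above into a vector and scores it with a logistic function. The `Account` class and the `WEIGHTS` and `BIAS` values are made-up placeholders for this example. In a real system they would be learned from labeled data rather than chosen by hand.

```python
import math
from dataclasses import dataclass
from typing import List

@dataclass(frozen=True)
class Account:
    age_days: int   # days since registration
    followers: int
    feeds: int      # total number of feeds posted

def features(a: Account) -> List[float]:
    """Log-scaled features so that very large counts do not dominate."""
    feeds_per_day = a.feeds / max(a.age_days, 1)
    return [
        math.log1p(a.age_days),
        math.log1p(a.followers),
        math.log1p(feeds_per_day),
    ]

# Placeholder parameters, NOT trained values: older accounts and more followers
# lower the score, a high posting rate raises it.
WEIGHTS = [-0.5, -0.8, 1.2]
BIAS = 0.0

def fake_score(a: Account) -> float:
    """Return a value in (0, 1); higher means more likely fake under this toy model."""
    z = BIAS + sum(w * x for w, x in zip(WEIGHTS, features(a)))
    return 1.0 / (1.0 + math.exp(-z))
```

The same feature function could feed any classifier. The logistic form is used here only because it is easy to read.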

  2. Can we order feeds by other algorithms? There has been a lot of debate about this topic over the past few weeks. If we want to order feeds based on users' interests, how should we design the algorithm?

I would say there are a few things we should clarify with the interviewer.

How do we measure the algorithm? Maybe by the average time users spend on Twitter, or by user interactions such as favorites and retweets. What signals should we use to evaluate how likely the user is to like a feed? The user's relation with the author, the number of replies/retweets the feed has, the number of followers the author has, etc. might be important. If machine learning is used, how do we design the whole system?
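
The signals listed above could be combined in many ways. The sketch below uses a hand-weighted linear score as a stand-in for whatever model is eventually chosen. The `Candidate` fields, the 0-to-1 `relation_strength` signal, and the `WEIGHTS` values are all assumptions for this example, not tuned numbers.

```python
import math
from dataclasses import dataclass
from typing import Iterable, List

@dataclass(frozen=True)
class Candidate:
    feed_id: int
    relation_strength: float  # hypothetical 0..1 measure of viewer-author closeness
    replies: int
    retweets: int
    author_followers: int

# Placeholder weights; a learned model would replace these.
WEIGHTS = {"relation": 3.0, "engagement": 1.0, "popularity": 0.5}

def score(c: Candidate) -> float:
    """Weighted sum of the three signals, with counts log-scaled."""
    return (
        WEIGHTS["relation"] * c.relation_strength
        + WEIGHTS["engagement"] * math.log1p(c.replies + c.retweets)
        + WEIGHTS["popularity"] * math.log1p(c.author_followers)
    )

def rank(candidates: Iterable[Candidate]) -> List[Candidate]:
    """Order candidate feeds from highest to lowest score."""
    return sorted(candidates, key=score, reverse=True)
```

Whether a ranking like this is better than pure time order is exactly what the metrics above (time spent, favorites, retweets) would be used to evaluate.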

  1. How to implement the @ feature and retweet feature? For the @ feature, we can simply store a list of user IDs inside each feed. When rendering your feeds, you should then also include feeds that have your ID in their @ list. This adds a little complexity to the rendering logic.

For the retweet feature, we could do a similar thing. Inside each feed, a feed ID (pointer) is stored, which indicates the original post, if there is any.
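
Both ideas can be sketched on one feed record: a set of mentioned user IDs for the @ feature and an optional pointer to the original post for retweets. The field names `mentions` and `retweet_of`, and the helper functions, are hypothetical names chosen for this example.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Iterable, List, Optional, Set

@dataclass(frozen=True)
class Feed:
    id: int
    owner_id: int
    created_at: int
    content: str
    mentions: FrozenSet[int] = frozenset()  # user IDs in this feed's @ list
    retweet_of: Optional[int] = None        # ID of the original feed, if any

def visible_feeds(feeds: Iterable[Feed], viewer_id: int,
                  followee_ids: Set[int]) -> List[Feed]:
    """Feeds from followed users plus feeds that @-mention the viewer, newest first."""
    visible = [
        f for f in feeds
        if f.owner_id in followee_ids or viewer_id in f.mentions
    ]
    return sorted(visible, key=lambda f: (f.created_at, f.id), reverse=True)

def resolve_original(feed: Feed, feeds_by_id: Dict[int, Feed]) -> Optional[Feed]:
    """Follow the retweet pointer; returns None if the original is missing."""
    if feed.retweet_of is None:
        return feed
    return feeds_by_id.get(feed.retweet_of)
```

The extra `viewer_id in f.mentions` check is the added rendering complexity mentioned above. In a database this would typically mean a second lookup, or an index on mentions.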
