ML System Design Interview: Search Ranking System

Yunrui Li
4 min read · Jul 1, 2021

Case study: you’re asked to design a search relevance system for a search engine.

Problem Clarifications:

3 aspects:

  1. scope
  2. scale
  3. personalization or not?

Metrics:

Metrics that will help us define the “success” scenarios for a search session.

Online metrics (per-query level):

  • Click-through rate (CTR)
  • Successful session rate
  • Time to success

One problem with CTR is that some clicks come from users who click into a result but bounce right back, and such clicks should not count as successful ones. That is why we add a second metric, successful session rate. We measure dwell time (how long the user stays after clicking a result on the SERP) to judge whether a click was successful.

There is also the case of zero-click searches, where the user sees the desired result right on the SERP without clicking anything. That is also a successful SERP, but the two metrics above do not capture it. So we also count how many queries a user needs to find the desired result, called time to success; a low number of queries per session is what we want the search engine to achieve.

Note that success can be measured for a single query-based search, or over a session that spans several queries.
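As a rough illustration, here is a minimal sketch of these online metrics in Python. The log schema is hypothetical: a session is a list of queries, and each query is a list of (clicked, dwell_seconds) rows for the shown results. The dwell-time cutoff is an assumed value, and zero-click successes would need extra signals not modeled here.

```python
DWELL_THRESHOLD = 30.0  # assumed dwell-time cutoff for a "successful" click

def ctr(sessions):
    """Click-through rate over all result impressions."""
    rows = [row for session in sessions for query in session for row in query]
    return sum(clicked for clicked, _ in rows) / len(rows)

def successful_session_rate(sessions):
    """Share of sessions with at least one click that passes the dwell cutoff."""
    successful = sum(
        any(clicked and dwell >= DWELL_THRESHOLD
            for query in session for clicked, dwell in query)
        for session in sessions)
    return successful / len(sessions)

def queries_per_session(sessions):
    """Proxy for time to success: fewer queries per session is better."""
    return sum(len(session) for session in sessions) / len(sessions)

# One session, two queries; the second query led to a 45-second read.
sessions = [[[(False, 0), (True, 5)], [(True, 45)]]]
print(ctr(sessions), successful_session_rate(sessions), queries_per_session(sessions))
```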

Offline metrics:

  • normalized discounted cumulative gain (NDCG)
  • mean NDCG over N queries

NDCG is a critical evaluation metric for any ranking problem.

Let's start with the most basic metric, CG (cumulative gain): for the documents the search ranking system returns for a query, we simply sum their relevance ratings. However, a highly relevant document placed low in the SERP list should be penalized, which gives us DCG (discounted cumulative gain). DCG also has a flaw: the SERPs returned for different queries can have different lengths. One query may retrieve three documents, another only two, and a longer list accumulates a larger gain, so comparing raw DCG across queries is unfair. Therefore we use NDCG (normalized discounted cumulative gain), dividing the DCG by the max DCG, i.e. the IDCG (ideal discounted cumulative gain) of the query.

Note, however, that NDCG does not penalize irrelevant documents (ones a human rater rated 0 for relevance) that end up in the retrieved document set / search engine result list.

CG (cumulative gain): CG_p = Σ_{i=1..p} rel_i
DCG (discounted cumulative gain): DCG_p = Σ_{i=1..p} rel_i / log2(i + 1)
NDCG (normalized discounted cumulative gain): NDCG_p = DCG_p / IDCG_p
Mean NDCG over N queries: (1/N) Σ_{q=1..N} NDCG_q

Let me also add the implementation code.
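Here is a minimal sketch of these metrics in Python (function names are mine; the ratings are assumed to be graded relevance labels from human raters, and the ideal DCG comes from sorting them in descending order):

```python
import numpy as np

def dcg(ratings):
    """Discounted cumulative gain of a ranked list of relevance ratings."""
    ratings = np.asarray(ratings, dtype=float)
    positions = np.arange(1, len(ratings) + 1)
    return float(np.sum(ratings / np.log2(positions + 1)))

def ndcg(ratings):
    """DCG normalized by the ideal DCG (ratings sorted in descending order)."""
    ideal = dcg(sorted(ratings, reverse=True))
    return dcg(ratings) / ideal if ideal > 0 else 0.0

def mean_ndcg(ratings_per_query):
    """Mean NDCG over N queries."""
    return float(np.mean([ndcg(r) for r in ratings_per_query]))

# Ratings of documents in the order our system returned them.
print(ndcg([3, 2, 3, 0, 1]))           # one query
print(mean_ndcg([[3, 2, 3], [2, 1]]))  # mean over two queries
```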

Note:

  1. relevance ratings come from the ground truth (e.g., human raters)
  2. in order to support personalization, we might need the user's historical search data and need to know whether he/she is a logged-in user
  3. NDCG: https://zhuanlan.zhihu.com/p/136199536

Architectural components:

Note:

  1. For the candidate generation phase, our goal is usually to increase recall.

Why?

recall = TP/(TP+FN), so we don't want too many false negatives coming out of the candidate generation phase; that is why its purpose is to increase recall. For example, in a recommender system, we want the candidates generated by the model to cover as many of the items users would actually click as possible, which means fewer false negatives and thus higher recall.

  2. Model execution time refers to the cost, and ranking quality refers to the benefit. The cost vs. benefit tradeoff is always an important consideration in large-scale ML systems.

Document Selection

From the one-hundred billion documents on the internet, we want to retrieve the top one-hundred thousand that are relevant to the searcher’s query by using information retrieval techniques.

Basically, document selection can be viewed as the task of scoring a large set of documents (an LTR task).
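As a toy illustration of the idea (not a production IR stack), a minimal inverted-index lookup that narrows a corpus down to candidate documents might look like this:

```python
from collections import defaultdict

def build_index(corpus):
    """Toy inverted index: term -> set of document ids containing the term."""
    index = defaultdict(set)
    for doc_id, text in corpus.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index

def select_documents(query, index):
    """Candidate selection: every document matching at least one query term.
    In production this set would then be scored and pruned to the top ~100k."""
    candidates = set()
    for term in query.lower().split():
        candidates |= index.get(term, set())
    return candidates

corpus = {1: "machine learning systems", 2: "search ranking", 3: "learning to rank"}
index = build_index(corpus)
print(select_documents("learning rank", index))  # {1, 3}
```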

Feature Engineering

Feature engineering is highly significant from an interview perspective.

The four main actors in search are the searcher, the query, the document, and the context.

What features can we generate for the search ranking problem?

Context could be the searcher's browser history, age, gender, location, previous queries, and the time of day.

For relatively popular queries, historical engagement can be very important. You can use the query's prior engagement (e.g., which pages users engaged with for this query before) as a feature.

What is BM25 (Best Matching 25)?

D: the title, metadata, or content of a document
Q: the user's search query

It is a text-match score between the query Q and the document D.
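For reference, a minimal sketch of the classic BM25 scoring function in Python (k1 and b use commonly cited defaults; treat this as an illustration rather than a tuned implementation):

```python
import math
from collections import Counter

def bm25(query_terms, doc_terms, doc_freq, num_docs, avg_doc_len, k1=1.5, b=0.75):
    """BM25 text-match score of one document D against a query Q.
    doc_freq maps a term to the number of documents containing it."""
    tf = Counter(doc_terms)
    score = 0.0
    for term in set(query_terms):
        df = doc_freq.get(term, 0)
        if df == 0 or tf[term] == 0:
            continue
        idf = math.log(1 + (num_docs - df + 0.5) / (df + 0.5))
        norm = tf[term] * (k1 + 1) / (
            tf[term] + k1 * (1 - b + b * len(doc_terms) / avg_doc_len))
        score += idf * norm
    return score

# Toy three-document corpus.
docs = [["search", "ranking"], ["machine", "learning"], ["learning", "to", "rank"]]
doc_freq = Counter(t for d in docs for t in set(d))
avg_len = sum(len(d) for d in docs) / len(docs)
print(bm25(["learning", "rank"], docs[2], doc_freq, len(docs), avg_len))
```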

Generate Training Data:

Learning To Rank (LTR) system architecture [2]

Basically, we can think of the learning system as what we need in the training phase, and the ranking system as what we need at testing/serving time.

PointWise Approach (single documents):

The pointwise approach operates on a single document at a time: each document is converted into a feature vector (x with the 3 features below), and the ranking problem is turned into an ordinary classification or regression problem in machine learning.

Once the model parameters have been learned, the model can be used to judge relevance: for a new query and document, the model's scoring function produces a numeric score, and the documents can be ranked by that score; a small sketch follows after the note below.

The training examples below have 3 features and 5 distinct labels.

PointWise training examples[2]

Note:

  1. The pointwise approach treats ranking purely as per-document classification and ignores the relative order between documents. It also assumes relevance is independent of the query: as long as (query, document id) pairs have the same relevance rating, they are grouped into the same grade and belong to the same class.
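A minimal pointwise sketch under those assumptions (3 numeric features per (query, document) pair and graded labels; the feature values below are made up, and any regressor could stand in for the linear model):

```python
import numpy as np
from sklearn.linear_model import Ridge  # any regressor/classifier works here

# Hypothetical pointwise data: one row per (query, document) pair with 3
# features, e.g. a text-match score, document quality, and prior engagement.
X_train = np.array([[0.9, 0.7, 0.8],
                    [0.6, 0.4, 0.2],
                    [0.1, 0.3, 0.1],
                    [0.8, 0.9, 0.6]])
y_train = np.array([4, 2, 0, 3])  # graded relevance labels

model = Ridge().fit(X_train, y_train)

# At serving time: score each candidate document independently, sort descending.
X_candidates = np.array([[0.7, 0.5, 0.4], [0.2, 0.9, 0.3]])
print(np.argsort(-model.predict(X_candidates)))
```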

PairWise Approach (document pairs):

The pairwise approach is currently quite popular; compared with pointwise, it shifts the focus to the order between documents. It essentially reduces ranking to a binary classification problem.

PairWise training examples[2]

If di > dj, the pair is labeled +1 (positive), otherwise -1 (negative), which gives us the training samples needed for the binary classifier, as shown in the figure above. At test time, classifying all pairs yields a partial order over the documents, from which the ranking is obtained.

Note: the pairwise labels can be generated in two ways (see the sketch after this list):

  • human raters (offline method)
  • users' historical engagement data (online method)
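Here is a minimal sketch of deriving such pairwise examples from per-query graded labels (the data layout is hypothetical):

```python
from itertools import combinations

def make_pairwise_examples(query_docs):
    """query_docs: list of (features, relevance_label) tuples for one query.
    Returns (features_i, features_j, +1/-1) triples for a binary classifier."""
    examples = []
    for (xi, yi), (xj, yj) in combinations(query_docs, 2):
        if yi == yj:
            continue  # equal grades carry no preference signal
        examples.append((xi, xj, 1 if yi > yj else -1))
    return examples

docs = [([0.9, 0.7], 3), ([0.4, 0.2], 1), ([0.1, 0.3], 2)]
for xi, xj, label in make_pairwise_examples(docs):
    print(xi, xj, label)
```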

Ranking Model

Adopt a stage-wise approach for a scalable system

📝 The first-stage model will focus on the recall of the top five to ten relevant documents in the first five-hundred results, while the second stage will ensure the precision of the top five to ten relevant documents.

Note:

For search results, we usually care most that the top N results (e.g., N = 10) are good, because users typically only look at the top of the list.

  • Stage 1: pointwise optimisation, because the objective is recall: getting the relevant documents into the first five-hundred results
  • Stage 2: pairwise optimisation, because the objective is to find the optimized rank order (a small sketch of the two-stage flow follows below)
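A minimal sketch of the stage-wise flow (the scoring callables are hypothetical stand-ins, e.g. a BM25-style score for stage 1 and a learned pairwise model for stage 2):

```python
def two_stage_rank(query, candidates, cheap_score, expensive_score,
                   stage1_k=500, final_k=10):
    """cheap_score / expensive_score: callables scoring a (query, doc) pair.
    Stage 1 optimizes recall into the top stage1_k candidates; stage 2
    optimizes the precision/order of the final top results."""
    stage1 = sorted(candidates, key=lambda d: cheap_score(query, d),
                    reverse=True)[:stage1_k]
    return sorted(stage1, key=lambda d: expensive_score(query, d),
                  reverse=True)[:final_k]

# e.g. two_stage_rank(query, docs, bm25_based_score, pairwise_model_score)
```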

Reference:

[1] https://www.educative.io/courses/grokking-the-machine-learning-interview/xlXBOEW7gkJ
[2] https://blog.csdn.net/yuhushangwei/article/details/48547151


Yunrui Li

I'm a Taiwanese expat. I worked in Singapore as a data scientist after graduating in Taiwan, and I currently work in Amsterdam as a machine learning engineer.