Linux 系統程式設計 — Timed Waiting

13 min readDec 18, 2020

Cover of the book “Linux System Programming“ from O’Reilly

本篇作者是 Magisk 開發者；Magisk 是一套開源的 Android 刷機工具。文章中僅會提到 Linux 系統程式設計，與 Android 本身無太大關聯。

最近在寫 Magisk 一個看似很直觀的功能時，發現實作起來異常困難，而且有非常多「陷阱」在裡頭，決定來記錄一下整個思路歷程整理思緒。希望讀者在讀完本篇文章後，能夠對多線程 (multi-threading) 及多程序 (multi-processing) 有更深入的了解，並同時能夠透過此例深刻體會混用 multi-thread 跟 multi-process 是多悲慘的惡夢 😂

(本篇內的 pseudo code 語法介於 Python 跟 JavaScript 之間)

定義問題

以下討論的所有程式碼將會跑在一個多線程的背景程式 (daemon) 中。你可以把它想像成像是一個 server：在一個無限迴圈中不斷等待外部的 request，對於每個收到的 request ，開一個新的 thread 去處理以確保每個 request 都跑在自己獨立的 thread 當中。以下為簡易的 daemon pseudo code：

while true:
    task = wait_for_request()
    create_new_thread_and_run_task(task)

在此我們定義某個會被執行的 task：搜集所有存在的腳本 (script) 並依序執行它們。在跑完全部腳本之後，繼續其他的作業：

all_scripts = collect_scripts()// Checkpoint 1for script in all_scripts
    run_script(script)// Checkpoint 2run_other_stuffs()

今天的課題是想新增一個條件：確保 Checkpoint 1 到 Checkpoint 2 之間的執行時間不超過 35 秒；跑 run_script 時若已經超過時限，則不再等待腳本執行結束，讓這些程序繼續在背景執行。

總結一句話：最多等所有腳本運行 35 秒，超時則讓它繼續在背景執行。

分析

首先，我們來看看 run_script 是怎麼實作的。由於自己程式本身並沒有運行 script 的能力，因此我們創建一個新的進程 (fork new process)，並在 child process 中執行 shell 讓它來跑 script。在 parent process 中 (也就是自己)，我們等待這個 child process 終止，以確保 run_script 結束時，腳本已經運行完畢：

function fork_and_exec_shell(script):
    fork_new_process()
    if in_parent_process:
        return child_process_id
    else in_child_process:
        exec_shell(script)function run_script(script):
    pid = fork_and_exec_shell(script)
    wait_process_to_exit(pid)

很快的我們找到需要改進的地方在哪裡：wait_process_to_exit 並沒有一個等待時間的上限，而是會永遠等到 child process 結束為止。

很不幸的，在 Linux 中並不存在一種直接的方法可以達成此目的 (wait_process_to_exit 實際上是用 waitpid 實作)，那們我們便需要思考其他的解決方法來「曲線救國」。

Linux 進程關係

在講述解決方法之前，先來快速介紹一些 Linux 知識。

每個 Linux process 都一定有一個 parent process。每個 process 在終止時都會發送一個叫 SIGCHLD 的 signal 給 parent (詳見 signal(7) manpage)； Linux 中預設 SIGCHLD 會被忽略，所以除非特別設定，parent process 收到這個 signal 時啥都不會發生。此外，process 在終止後並不會馬上消失，而是進入某種空殼型態成為「殭屍進程」(zombie process)。唯有 parent process 去 “wait” 這個殭屍後 (也可以想成是收屍 ?!)，它才會完全消失，壽終正寢。

如果在 child process 消失之前，parent process 先終止的話，我們稱這個 child process 為「孤兒」(orphan process)，最終會被一個名叫 init 的程式領養，集中管理。Linux 中沒有隔代教養，因此透過一些巧妙的方法可以刻意製造出孤兒 (生出孫子，然後馬上把兒子殺掉，孫子就是孤兒了 🙃)，免除掉自己需要 wait 的責任。文章後續會看到的 fork_orphan 就是用此技巧。

(P.S. SIGCHLD 跟 zombie process 在某些特殊情況下不會發生，在此忽略)

方法一

這個是我在 Stack Overflow (笑) 上面看到有人提供的解法。簡單來說，除了 fork 目標 process 之外，我們另外再 fork 一個 process 專門拿來倒數計時。在 parent process 中同時 wait 這兩個 child，其中任一個 child 結束就終止另一個。如此一來，我們最多只會等待倒數計時設定的時間，挺聰明的吧！

我根據需求稍做修改，試著實作看看 (注意一個細節：我們要解決的問題是限制全部腳本執行時間的「總和」上限，不是個別的運行時間)：

global timer_pidfunction run_script(script):
    if timer_pid < 0:
        // Already timed out, run and forget
        fork_orphan_and_exec_shell(script)
        return
    shell_pid = fork_and_exec_shell(script)
    exited_pid = wait_for_either_to_exit(shell_pid, timer_pid)
    if exited_pid == timer_pid:
        timer_pid = -1
        // shell_pid untracked!// The task
timer_pid = fork_with_timer(35 seconds)
for script in all_scripts:
    run_script(script)
if timer_pid > 0:   // Timer is not done yet
    kill_process(timer_pid)
run_other_stuffs()

這個看似完美的方法，其實在要求的條件下是有瑕疵的！在剛剛的敘述中提到，我們需要同時等待任兩個子進程之一終止 (wait_for_either_to_exit)，但是很可惜在 Linux 中並沒有直接等效的功能；Linux 有的功能只有等待「任意」子進程終止 (詳見 waitpid(2) manpage)，因此實際上要把 wait_for_either_to_exit 換成 wait_for_any_child_to_exit。要記得我們的前提是這些 code 會跑在一個 multi-thread daemon 中，那在這裡我們就有可能會不小心 wait 到由其他同時在執行的 thread 所新建的子進程，構成 multi-thread program 中常見的夢魘 — race condition！

方法二

我們來試試看另外一個方法：不直接用 wait 系列 system call 來等 child process，而是從 SIGCHLD 這個 signal 下手。

先來介紹 signal handler 的概念。Linux 中 thread 可以接收各種不同 signal，舉兩個寫程式時會遇到的例子：在 terminal 按下 “Ctrl + C” 就是手動觸發 SIGTERM 吿知程式終止；程式寫壞時 segmentation fault 則是因為 kernel 送給程式 SIGSEGV 。除了少數特定 signal 不允許外，針對各個不同 signal 我們可以設定自定義的 handler。當某 thread 接收到 signal 且設有相對應的 handler 時，會暫時停止一切，直接跳去執行 signal handler，結束後再回到本來暫停的地方繼續執行。

(P.S. 這裡忽略 multi-thread 情況下 signal 會變得極為複雜)

運用自定義的 SIGCHLD handler，我們可以寫出以下 pseudo code：

global target_time
global shell_pidfunction sigchld_handler():
    if is_process_terminated(shell_pid):
        shell_pid = -1function run_script(script):
    if now() > target_time:
        fork_orphan_and_exec_shell(script)
        return
    shell_pid = fork_and_exec_shell(script)
    while (target_time > now() && shell_pid > 0):
        // Checkpoint 3
        state = suspend_thread(target_time)
        if state == time_out:
            return
            // shell_pid untracked!
        else state == signal_interrupted:
            continue// The task
target_time = now() + 35 seconds
set_signal_handler(SIGCHLD, sigchld_handler)
for script in all_scripts:
    run_script(script)
run_other_stuffs()

這裡利用了 suspend_thread() (實作會用 clock_nanosleep(2)) 在休眠的過程可以被 SIGCHLD 中斷，藉此控制最大等待時間上限。整體的概念是在 fork 完 shell process 後直接進入睡眠，等到 shell 執行結束時會發送 SIGCHLD 將整個 thread 喚醒，並在 handler 中 wait 這個 child process 避免成為 zombie (is_process_terminated 會用 waitpid 實作)。若在超過時間上限時還未完成則不再等待。

想法很直觀，然而仔細檢查 pseudo code 後便會發現一個致命的問題：如果接收到 SIGCHLD 的瞬間正好在 while condition 檢查後，進入 suspend_thread 之前這短暫的空擋內 (也就是 Checkpoint 3 的地方)，那 suspend_thread 休眠期間將不會有任何腳本同時在執行，完全失去整個 function 設計的意義。

反思

其實方法一跟二兩個都另外擁有一個小毛病：在到達時間上限的時候，當下正在運行的 child process 將必定成為 zombie (pseudo code 中標記 shell_pid untracked!)，因為我們不知道它還會執行多久，無法找到有效的時間/地點去 wait 。

突然間，我想到一句話，有如當頭棒喝把我敲醒…

除非絕對必要，否則千萬不要混合 multi-thread 跟 multi-process 的 code

方法三：方法一・改

方法一的問題在於 multi-thread 造成的 race condition ，那如果我們轉念一想，把問題化解成純 multi-process 問題呢？這其實是辦得到的。在 Linux 中，一個 multi-thread process 在 fork 之後的 child process 會變成單線程進程 (single threaded process)，利用此特性，我們重新改寫方法一：

global timer_pidfunction run_script(script):
    shell_pid = fork_and_exec_shell(script)
    if timer_pid < 0:
        // Already timed out, run and forget
        return
    exited_pid = wait_for_any_child_to_exit()
    if exited_pid == timer_pid:
        timer_pid = -1function script_helper():
    timer_pid = fork_with_timer(35 seconds)
    for script in all_scripts:
        run_script(script)
    if timer_pid > 0:   // Timer is not done yet
        kill_process(timer_pid)
    exit_process()// The task
pid = fork_and_run_function(script_helper)
wait_process_to_exit(pid)
run_other_stuffs()

我們首先 fork 一個新的輔助進程跑 script_helper，此時這個 child process 已經是single threaded，可以放心使用 wait_for_any_child_to_exit 。此外，所有的 shell process 都會是原進程的 grandchild process，所以即使達到時間上限不再繼續 wait 後，也會因為 script_helper 在結束後直接自殺 (exit_process) 導致孫子輩 process 全部自動變成孤兒，免除成為 zombie 的煩惱。這個方法不僅相對單純，排除了多線程和 signal，還一石二鳥的解決 zombie process 的問題，真棒！

總結

其實如果今天把方法二實作，在絕大多數的情況下是不會有任何問題的，本機測試絕對萬事 OK。但是如果這個東西進到了 Magisk ，最終推送給可能達上百萬用戶時，這微乎其微的機率幾乎就成了必然。幾年前早期開始寫 Magisk 時還對 multi-thread 和 multi-process 不甚熟悉，許多自己測試沒問題的東西一發表出去，用戶回報問題一籮筐。這個就是 multi-thread/process 程式令人頭痛的地方！

這裡還是想強調，在 multi-threaded process 裡面搞 signal 沒有那麼單純，還需要弄好 signal mask 等。從文章中的 pseudo code 實際要寫成 code 還是有很大的差距。本篇文章主要的目的是想概論上傳達一些在搞 multi-threading, multi-processing 會遇到，大家容易遺漏的觀念。

實際 merge 進 Magisk 的 code 在這，有興趣 (?) 可以參考看看：
https://github.com/topjohnwu/Magisk/commit/eee7f097e335d110b60e8249fda1742f6dee053e

Linux 系統程式設計 — Timed Waiting

定義問題

分析

Linux 進程關係

方法一

方法二

反思

方法三：方法一・改

總結

Written by John Wu

Responses (1)

Linux 系統程式設計 — Timed Waiting

定義問題

分析

Linux 進程關係

方法一

方法二

反思

方法三 ：方法一・改

總結

Written by John Wu

Responses (1)

方法三：方法一・改