前端與編譯原理——用JS寫一個JS解釋器

作者：佚名 2018-12-04 13:30:28

說起編譯原理，印象往往只停留在本科時那些枯燥的課程和晦澀的概念。作為前端開發者，編譯原理似乎離我們很遠，對它的理解很可能僅僅局限于“抽象語法樹(AST)”。但這僅僅是個開頭而已。編譯原理的使用，甚至能讓我們利用JS直接寫一個能運行JS代碼的解釋器。

[[251469]]

項目地址：https://github.com/jrainlau/c...

在線體驗：https://codepen.io/jrainlau/p...

一、為什么要用JS寫JS的解釋器

接觸過小程序開發的同學應該知道，小程序運行的環境禁止new Function，eval等方法的使用，導致我們無法直接執行字符串形式的動態代碼。此外，許多平臺也對這些JS自帶的可執行動態代碼的方法進行了限制，那么我們是沒有任何辦法了嗎?既然如此，我們便可以用JS寫一個解析器，讓JS自己去運行自己。

在開始之前，我們先簡單回顧一下編譯原理的一些概念。

二、什么是編譯器

說到編譯原理，肯定離不開編譯器。簡單來說，當一段代碼經過編譯器的詞法分析、語法分析等階段之后，會生成一個樹狀結構的“抽象語法樹(AST)”，該語法樹的每一個節點都對應著代碼當中不同含義的片段。

比如有這么一段代碼：

const a = 1  
console.log(a)

經過編譯器處理后，它的AST長這樣：

{ 
  "type": "Program", 
  "start": 0, 
  "end": 26, 
  "body": [ 
    { 
      "type": "VariableDeclaration", 
      "start": 0, 
      "end": 11, 
      "declarations": [ 
        { 
          "type": "VariableDeclarator", 
          "start": 6, 
          "end": 11, 
          "id": { 
            "type": "Identifier", 
            "start": 6, 
            "end": 7, 
            "name": "a" 
          }, 
          "init": { 
            "type": "Literal", 
            "start": 10, 
            "end": 11, 
            "value": 1, 
            "raw": "1" 
          } 
        } 
      ], 
      "kind": "const" 
    }, 
    { 
      "type": "ExpressionStatement", 
      "start": 12, 
      "end": 26, 
      "expression": { 
        "type": "CallExpression", 
        "start": 12, 
        "end": 26, 
        "callee": { 
          "type": "MemberExpression", 
          "start": 12, 
          "end": 23, 
          "object": { 
            "type": "Identifier", 
            "start": 12, 
            "end": 19, 
            "name": "console" 
          }, 
          "property": { 
            "type": "Identifier", 
            "start": 20, 
            "end": 23, 
            "name": "log" 
          }, 
          "computed": false 
        }, 
        "arguments": [ 
          { 
            "type": "Identifier", 
            "start": 24, 
            "end": 25, 
            "name": "a" 
          } 
        ] 
      } 
    } 
  ], 
  "sourceType": "module" 
}

常見的JS編譯器有babylon，acorn等等，感興趣的同學可以在AST explorer這個網站自行體驗。

可以看到，編譯出來的AST詳細記錄了代碼中所有語義代碼的類型、起始位置等信息。這段代碼除了根節點Program外，主體包含了兩個節點VariableDeclaration和ExpressionStatement，而這些節點里面又包含了不同的子節點。

正是由于AST詳細記錄了代碼的語義化信息，所以Babel，Webpack，Sass，Less等工具可以針對代碼進行非常智能的處理。

三、什么是解釋器

如同翻譯人員不僅能看懂一門外語，也能對其藝術加工后把它翻譯成母語一樣，人們把能夠將代碼轉化成AST的工具叫做“編譯器”，而把能夠將AST翻譯成目標語言并運行的工具叫做“解釋器”。

在編譯原理的課程中，我們思考過這么一個問題：如何讓計算機運行算數表達式1+2+3:

1 + 2 + 3

當機器執行的時候，它可能會是這樣的機器碼：

1 PUSH 1  
2 PUSH 2  
3 ADD  
4 PUSH 3  
5 ADD

而運行這段機器碼的程序，就是解釋器。

在這篇文章中，我們不會搞出機器碼這樣復雜的東西，僅僅是使用JS在其runtime環境下去解釋JS代碼的AST。由于解釋器使用JS編寫，所以我們可以大膽使用JS自身的語言特性，比如this綁定、new關鍵字等等，完全不需要對它們進行額外處理，也因此讓JS解釋器的實現變得非常簡單。

在回顧了編譯原理的基本概念之后，我們就可以著手進行開發了。

四、節點遍歷器

通過分析上文的AST，可以看到每一個節點都會有一個類型屬性type，不同類型的節點需要不同的處理方式，處理這些節點的程序，就是“節點處理器(nodeHandler)”

定義一個節點處理器：

const nodeHandler = { 
  Program () {}, 
  VariableDeclaration () {}, 
  ExpressionStatement () {}, 
  MemberExpression () {}, 
  CallExpression () {}, 
  Identifier () {} 
}

關于節點處理器的具體實現，會在后文進行詳細探討，這里暫時不作展開。

有了節點處理器，我們便需要去遍歷AST當中的每一個節點，遞歸地調用節點處理器，直到完成對整棵語法書的處理。

定義一個節點遍歷器(NodeIterator)：

class NodeIterator { 
  constructor (node) { 
    this.node = node 
    this.nodeHandler = nodeHandler 
  } 
 
  traverse (node) { 
    // 根據節點類型找到節點處理器當中對應的函數 
    const _eval = this.nodeHandler[node.type] 
    // 若找不到則報錯 
    if (!_eval) { 
      throw new Error(`canjs: Unknown node type "${node.type}".`) 
    } 
    // 運行處理函數 
    return _eval(node) 
  } 
 
}

理論上，節點遍歷器這樣設計就可以了，但仔細推敲，發現漏了一個很重要的東西——作用域處理。

回到節點處理器的VariableDeclaration()方法，它用來處理諸如const a = 1這樣的變量聲明節點。假設它的代碼如下：

VariableDeclaration (node) { 
    for (const declaration of node.declarations) { 
      const { name } = declaration.id 
      const value = declaration.init ? traverse(declaration.init) : undefined 
      // 問題來了，拿到了變量的名稱和值，然后把它保存到哪里去呢？ 
      // ... 
    } 
  },

問題在于，處理完變量聲明節點以后，理應把這個變量保存起來。按照JS語言特性，這個變量應該存放在一個作用域當中。在JS解析器的實現過程中，這個作用域可以被定義為一個scope對象。

改寫節點遍歷器，為其新增一個scope對象

class NodeIterator { 
  constructor (node, scope = {}) { 
    this.node = node 
    this.scope = scope 
    this.nodeHandler = nodeHandler 
  } 
 
  traverse (node, options = {}) { 
    const scope = options.scope || this.scope 
    const nodeIterator = new NodeIterator(node, scope) 
    const _eval = this.nodeHandler[node.type] 
    if (!_eval) { 
      throw new Error(`canjs: Unknown node type "${node.type}".`) 
    } 
    return _eval(nodeIterator) 
  } 
 
  createScope (blockType = 'block') { 
    return new Scope(blockType, this.scope) 
  } 
}

然后節點處理函數VariableDeclaration()就可以通過scope保存變量了：

VariableDeclaration (nodeIterator) { 
    const kind = nodeIterator.node.kind 
    for (const declaration of nodeIterator.node.declarations) { 
      const { name } = declaration.id 
      const value = declaration.init ? nodeIterator.traverse(declaration.init) : undefined 
      // 在作用域當中定義變量 
      // 如果當前是塊級作用域且變量用var定義，則定義到父級作用域 
      if (nodeIterator.scope.type === 'block' && kind === 'var') { 
        nodeIterator.scope.parentScope.declare(name, value, kind) 
      } else { 
        nodeIterator.scope.declare(name, value, kind) 
      } 
    } 
  },

關于作用域的處理，可以說是整個JS解釋器最難的部分。接下來我們將對作用域處理進行深入的剖析。

五、作用域處理

考慮到這樣一種情況：

const a = 1 
{ 
  const b = 2 
  console.log(a) 
} 
console.log(b)

運行結果必然是能夠打印出a的值，然后報錯：Uncaught ReferenceError: b is not defined

這段代碼就是涉及到了作用域的問題。塊級作用域或者函數作用域可以讀取其父級作用域當中的變量，反之則不行，所以對于作用域我們不能簡單地定義一個空對象，而是要專門進行處理。

定義一個作用域基類Scope：

class Scope { 
  constructor (type, parentScope) { 
    // 作用域類型，區分函數作用域function和塊級作用域block 
    this.type = type 
    // 父級作用域 
    this.parentScope = parentScope 
    // 全局作用域 
    this.globalDeclaration = standardMap 
    // 當前作用域的變量空間 
    this.declaration = Object.create(null) 
  } 
 
  /* 
   * get/set方法用于獲取/設置當前作用域中對應name的變量值 
     符合JS語法規則，優先從當前作用域去找，若找不到則到父級作用域去找，然后到全局作用域找。 
     如果都沒有，就報錯 
   */ 
  get (name) { 
    if (this.declaration[name]) { 
      return this.declaration[name] 
    } else if (this.parentScope) { 
      return this.parentScope.get(name) 
    } else if (this.globalDeclaration[name]) { 
      return this.globalDeclaration[name] 
    } 
    throw new ReferenceError(`${name} is not defined`) 
  } 
 
  set (name, value) { 
    if (this.declaration[name]) { 
      this.declaration[name] = value 
    } else if (this.parentScope[name]) { 
      this.parentScope.set(name, value) 
    } else { 
      throw new ReferenceError(`${name} is not defined`) 
    } 
  } 
 
  /** 
   * 根據變量的kind調用不同的變量定義方法 
   */ 
  declare (name, value, kind = 'var') { 
    if (kind === 'var') { 
      return this.varDeclare(name, value) 
    } else if (kind === 'let') { 
      return this.letDeclare(name, value) 
    } else if (kind === 'const') { 
      return this.constDeclare(name, value) 
    } else { 
      throw new Error(`canjs: Invalid Variable Declaration Kind of "${kind}"`) 
    } 
  } 
 
  varDeclare (name, value) { 
    let scope = this 
    // 若當前作用域存在非函數類型的父級作用域時，就把變量定義到父級作用域 
    while (scope.parentScope && scope.type !== 'function') { 
      scope = scope.parentScope 
    } 
    this.declaration[name] = new SimpleValue(value, 'var') 
    return this.declaration[name] 
  } 
 
  letDeclare (name, value) { 
    // 不允許重復定義 
    if (this.declaration[name]) { 
      throw new SyntaxError(`Identifier ${name} has already been declared`) 
    } 
    this.declaration[name] = new SimpleValue(value, 'let') 
    return this.declaration[name] 
  } 
 
  constDeclare (name, value) { 
    // 不允許重復定義 
    if (this.declaration[name]) { 
      throw new SyntaxError(`Identifier ${name} has already been declared`) 
    } 
    this.declaration[name] = new SimpleValue(value, 'const') 
    return this.declaration[name] 
  } 
}

這里使用了一個叫做simpleValue()的函數來定義變量值，主要用于處理常量：

class SimpleValue { 
  constructor (value, kind = '') { 
    this.value = value 
    this.kind = kind 
  } 
 
  set (value) { 
    // 禁止重新對const類型變量賦值 
    if (this.kind === 'const') { 
      throw new TypeError('Assignment to constant variable') 
    } else { 
      this.value = value 
    } 
  } 
 
  get () { 
    return this.value 
  } 
}

處理作用域問題思路，關鍵的地方就是在于JS語言本身尋找變量的特性——優先當前作用域，父作用域次之，全局作用域***。反過來，在節點處理函數VariableDeclaration()里，如果遇到塊級作用域且關鍵字為var，則需要把這個變量也定義到父級作用域當中，這也就是我們常說的“全局變量污染”。

JS標準庫注入

細心的讀者會發現，在定義Scope基類的時候，其全局作用域globalScope被賦值了一個standardMap對象，這個對象就是JS標準庫。

簡單來說，JS標準庫就是JS這門語言本身所帶有的一系列方法和屬性，如常用的setTimeout，console.log等等。為了讓解析器也能夠執行這些方法，所以我們需要為其注入標準庫：

const standardMap = {  
console: new SimpleValue(console)  
}

這樣就相當于往解析器的全局作用域當中注入了console這個對象，也就可以直接被使用了。

六、節點處理器

在處理完節點遍歷器、作用域處理的工作之后，便可以來編寫節點處理器了。顧名思義，節點處理器是專門用來處理AST節點的，上文反復提及的VariableDeclaration()方法便是其中一個。下面將對部分關鍵的節點處理器進行講解。

在開發節點處理器之前，需要用到一個工具，用于判斷JS語句當中的return，break，continue關鍵字。

關鍵字判斷工具Signal

定義一個Signal基類：

class Signal { 
  constructor (type, value) { 
    this.type = type 
    this.value = value 
  } 
 
  static Return (value) { 
    return new Signal('return', value) 
  } 
 
  static Break (label = null) { 
    return new Signal('break', label) 
  } 
 
  static Continue (label) { 
    return new Signal('continue', label) 
  } 
 
  static isReturn(signal) { 
    return signal instanceof Signal && signal.type === 'return' 
  } 
 
  static isContinue(signal) { 
    return signal instanceof Signal && signal.type === 'continue' 
  } 
 
  static isBreak(signal) { 
    return signal instanceof Signal && signal.type === 'break' 
  } 
 
  static isSignal (signal) { 
    return signal instanceof Signal 
  } 
}

有了它，就可以對語句當中的關鍵字進行判斷處理，接下來會有大用處。

1、變量定義節點處理器——VariableDeclaration()

最常用的節點處理器之一，負責把變量注冊到正確的作用域。

VariableDeclaration (nodeIterator) { 
   const kind = nodeIterator.node.kind 
   for (const declaration of nodeIterator.node.declarations) { 
     const { name } = declaration.id 
     const value = declaration.init ? nodeIterator.traverse(declaration.init) : undefined 
     // 在作用域當中定義變量 
     // 若為塊級作用域且關鍵字為var，則需要做全局污染 
     if (nodeIterator.scope.type === 'block' && kind === 'var') { 
       nodeIterator.scope.parentScope.declare(name, value, kind) 
     } else { 
       nodeIterator.scope.declare(name, value, kind) 
     } 
   } 
 },

2、標識符節點處理器——Identifier()

專門用于從作用域中獲取標識符的值。

Identifier (nodeIterator) { 
    if (nodeIterator.node.name === 'undefined') { 
      return undefined 
    } 
    return nodeIterator.scope.get(nodeIterator.node.name).value 
  },

3、字符節點處理器——Literal()

返回字符節點的值。

Literal (nodeIterator) { 
    return nodeIterator.node.value 
  }

4、表達式調用節點處理器——CallExpression()

用于處理表達式調用節點的處理器，如處理func()，console.log()等。

CallExpression (nodeIterator) { 
    // 遍歷callee獲取函數體 
    const func = nodeIterator.traverse(nodeIterator.node.callee) 
    // 獲取參數 
    const args = nodeIterator.node.arguments.map(arg => nodeIterator.traverse(arg)) 
 
    let value 
    if (nodeIterator.node.callee.type === 'MemberExpression') { 
      value = nodeIterator.traverse(nodeIterator.node.callee.object) 
    } 
    // 返回函數運行結果 
    return func.apply(value, args) 
  },

5、表達式節點處理器——MemberExpression()

區分于上面的“表達式調用節點處理器”，表達式節點指的是person.say，console.log這種函數表達式。

MemberExpression (nodeIterator) { 
    // 獲取對象，如console 
    const obj = nodeIterator.traverse(nodeIterator.node.object) 
    // 獲取對象的方法，如log 
    const name = nodeIterator.node.property.name 
    // 返回表達式，如console.log 
    return obj[name] 
  }

6、塊級聲明節點處理器——BlockStatement()

非常常用的處理器，專門用于處理塊級聲明節點，如函數、循環、try...catch...當中的情景。

BlockStatement (nodeIterator) { 
   // 先定義一個塊級作用域 
   let scope = nodeIterator.createScope('block') 
 
   // 處理塊級節點內的每一個節點 
   for (const node of nodeIterator.node.body) { 
     if (node.type === 'VariableDeclaration' && node.kind === 'var') { 
       for (const declaration of node.declarations) { 
         scope.declare(declaration.id.name, declaration.init.value, node.kind) 
       } 
     } else if (node.type === 'FunctionDeclaration') { 
       nodeIterator.traverse(node, { scope }) 
     } 
   } 
 
   // 提取關鍵字（return, break, continue） 
   for (const node of nodeIterator.node.body) { 
     if (node.type === 'FunctionDeclaration') { 
       continue 
     } 
     const signal = nodeIterator.traverse(node, { scope }) 
     if (Signal.isSignal(signal)) { 
       return signal 
     } 
   } 
 }

可以看到這個處理器里面有兩個for...of循環。***個用于處理塊級內語句，第二個專門用于識別關鍵字，如循環體內部的break，continue或者函數體內部的return。

7、函數定義節點處理器——FunctionDeclaration()

往作用當中聲明一個和函數名相同的變量，值為所定義的函數：

FunctionDeclaration (nodeIterator) { 
    const fn = NodeHandler.FunctionExpression(nodeIterator) 
    nodeIterator.scope.varDeclare(nodeIterator.node.id.name, fn) 
    return fn     
  }

8、函數表達式節點處理器——FunctionExpression()

用于定義一個函數：

FunctionExpression (nodeIterator) { 
    const node = nodeIterator.node 
    /** 
     * 1、定義函數需要先為其定義一個函數作用域，且允許繼承父級作用域 
     * 2、注冊`this`, `arguments`和形參到作用域的變量空間 
     * 3、檢查return關鍵字 
     * 4、定義函數名和長度 
     */ 
    const fn = function () { 
      const scope = nodeIterator.createScope('function') 
      scope.constDeclare('this', this) 
      scope.constDeclare('arguments', arguments) 
 
      node.params.forEach((param, index) => { 
        const name = param.name 
        scope.varDeclare(name, arguments[index]) 
      }) 
 
      const signal = nodeIterator.traverse(node.body, { scope }) 
      if (Signal.isReturn(signal)) { 
        return signal.value 
      } 
    }

9、this表達式處理器——ThisExpression()

該處理器直接使用JS語言自身的特性，把this關鍵字從作用域中取出即可。

ThisExpression (nodeIterator) { 
    const value = nodeIterator.scope.get('this') 
    return value ? value.value : null 
  }

10、new表達式處理器——NewExpression()

和this表達式類似，也是直接沿用JS的語言特性，獲取函數和參數之后，通過bind關鍵字生成一個構造函數，并返回。

NewExpression (nodeIterator) { 
   const func = nodeIterator.traverse(nodeIterator.node.callee) 
   const args = nodeIterator.node.arguments.map(arg => nodeIterator.traverse(arg)) 
   return new (func.bind(null, ...args)) 
 }

11、For循環節點處理器——ForStatement()

For循環的三個參數對應著節點的init，test，update屬性，對著三個屬性分別調用節點處理器處理，并放回JS原生的for循環當中即可。

ForStatement (nodeIterator) { 
    const node = nodeIterator.node 
    let scope = nodeIterator.scope 
    if (node.init && node.init.type === 'VariableDeclaration' && node.init.kind !== 'var') { 
      scope = nodeIterator.createScope('block') 
    } 
 
    for ( 
      node.init && nodeIterator.traverse(node.init, { scope }); 
      node.test ? nodeIterator.traverse(node.test, { scope }) : true; 
      node.update && nodeIterator.traverse(node.update, { scope }) 
    ) { 
      const signal = nodeIterator.traverse(node.body, { scope }) 
       
      if (Signal.isBreak(signal)) { 
        break 
      } else if (Signal.isContinue(signal)) { 
        continue 
      } else if (Signal.isReturn(signal)) { 
        return signal 
      } 
    } 
  }

同理，for...in，while和do...while循環也是類似的處理方式，這里不再贅述。

12、If聲明節點處理器——IfStatemtnt()

處理If語句，包括if，if...else，if...elseif...else。

IfStatement (nodeIterator) { 
   if (nodeIterator.traverse(nodeIterator.node.test)) { 
     return nodeIterator.traverse(nodeIterator.node.consequent) 
   } else if (nodeIterator.node.alternate) { 
     return nodeIterator.traverse(nodeIterator.node.alternate) 
   } 
 }

同理，switch語句、三目表達式也是類似的處理方式。

---

上面列出了幾個比較重要的節點處理器，在es5當中還有很多節點需要處理，詳細內容可以訪問這個地址一探究竟。

七、定義調用方式

經過了上面的所有步驟，解析器已經具備處理es5代碼的能力，接下來就是對這些散裝的內容進行組裝，最終定義一個方便用戶調用的辦法。

const { Parser } = require('acorn') 
const NodeIterator = require('./iterator') 
const Scope = require('./scope') 
 
class Canjs { 
  constructor (code = '', extraDeclaration = {}) { 
    this.code = code 
    this.extraDeclaration = extraDeclaration 
    this.ast = Parser.parse(code) 
    this.nodeIterator = null 
    this.init() 
  } 
 
  init () { 
    // 定義全局作用域，該作用域類型為函數作用域 
    const globalScope = new Scope('function') 
    // 根據入參定義標準庫之外的全局變量 
    Object.keys(this.extraDeclaration).forEach((key) => { 
      globalScope.addDeclaration(key, this.extraDeclaration[key]) 
    }) 
    this.nodeIterator = new NodeIterator(null, globalScope) 
  } 
 
  run () { 
    return this.nodeIterator.traverse(this.ast) 
  } 
}

這里我們定義了一個名為Canjs的基類，接受字符串形式的JS代碼，同時可定義標準庫之外的變量。當運行run()方法的時候就可以得到運行結果。

八、后續

至此，整個JS解析器已經完成，可以很好地運行ES5的代碼(可能還有bug沒有發現)。但是在當前的實現中，所有的運行結果都是放在一個類似沙盒的地方，無法對外界產生影響。如果要把運行結果取出來，可能的辦法有兩種。***種是傳入一個全局的變量，把影響作用在這個全局變量當中，借助它把結果帶出來;另外一種則是讓解析器支持export語法，能夠把export語句聲明的結果返回，感興趣的讀者可以自行研究。

***，這個JS解析器已經在我的Github上開源，歡迎前來交流~

https://github.com/jrainlau/c...

責任編輯：龐桂玉來源： segmentfault

Javascript 編譯原理前端

成人免费xxxxx在线视频软件_久久精品久久久_亚洲国产精品久久久_天天色天天色_亚洲人成一区_欧美一级欧美三级在线观看

前端與編譯原理——用JS寫一個JS解釋器