Brainfuck解释器的实现

1. Brainfuck 简介

我在此假定本文读者了解图灵机的概念。

Brainfuck(以下简称 BF)是一门图灵完备的编程语言,也就是说,别的编程语言能做的事情 BF 都能做,甚至你可以用它来写一个操作系统。

来,先用一个 Hello World 来感受一下 BF 的画风:

1
++++++++[>++++[>++>+++>+++>+<<<<-]>+>+>->>+[<]<-]>>.>---.+++++++..+++.>>.<-.<.+++.------.--------.>>+.>++.

输出:

1
Hello World!

BF 的规则和语法我这里就不说了(懒得打字,这东西网上一查一大堆,我就不生产冗余信息了),本文重点是解释器的实现。

人们常开玩笑说,最底层的编程语言是 0 和 1(所谓的二进制编程)。实际上并不是这样的,因为 01 序列对应的还是指令,比如01110100实际上实现的是 MOV 指令的功能(举个栗子),指令少则几十个,多则上百个,各种分支跳转条件判断等等,跟 BF 相比还是显得很高级的。

我在此宣布:在软件界最底层的语言是 BF!(esoteric programming languages就不说了……不然没完没了)

为什么要加软件界这个定语呢?因为不限定这个的话,Verilog 会不服的。

2. 解释器的实现

因为我没有在网上找到让我满意的 BF 解释器,所以我就自己用 C 语言写了一个 BF 的解释器bf.exe。哈哈,听起来很装,其实不难,挺简单的,因为 BF 的语法本来就非常简单(虽然用来写程序很烧脑,这也是为什么这个语言叫 Brainfuck)。

在了解了 BF 的规则和语法之后,学过 C 语言的应该都能知道怎么写了,比如>就是++p+就是++*p等等,不过循环部分需要仔细想一想,因为有嵌套的[ ],不能直接做 match,需要有一个 counter,这里的处理是先分析代码然后储存对应的跳转表再解释执行。

Talk is cheap, let me show the code:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
// Execute the program.
void interpret()
{
int ch;
for (int code_ptr = 0; code_ptr < code_size; code_ptr++)
{
switch (code[code_ptr])
{
// Increase value at current data cell by one.
case '+':
memory[memory_ptr]++;
break;

// Decrease value at current data cell by one.
case '-':
memory[memory_ptr]--;
break;

// Move data pointer to next address.
case '>':
memory_ptr++;
break;

// Move data pointer to previous address.
case '<':
memory_ptr--;
break;

// Accept one character from user.
case ',':
if ((ch = getchar()) != EOF)
{
memory[memory_ptr] = ch;
}
break;

// Output character at current data cell.
case '.':
putchar(memory[memory_ptr]);
break;

// Begin loop.
case '[':
if (!memory[memory_ptr])
{
code_ptr = targets[code_ptr];
}
break;

// End loop.
case ']':
if (memory[memory_ptr])
{
code_ptr = targets[code_ptr];
}
break;

// Ignore other characters.
default:
break;
}
}
}

实现解释器功能的核心代码不到一百行。

运行结果:

1
2
3
4
5
$ cat hello.bf
++++++++++[>+++++++>++++++++++>+++>+<<<<-]>++.>+.+++++++..+++.>++.<<+++++++++++++++.>.+++.------.--------.>+.>.

$ bf hello.bf
Hello World!

3. 增强——注释与调试

如果说这个 BF 的 Hello World 代码不容易看懂的话,那么可以利用 Python 搞一个 plain 版本的,也就是这样:

1
2
3
string = "Hello World!"

print(''.join(map(lambda c: "+"*ord(c)+".>", string)))

转换结果就是很 plain 的 BF 代码:

1
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++.>+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++.>++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++.>++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++.>+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++.>++++++++++++++++++++++++++++++++.>+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++.>+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++.>++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++.>++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++.>++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++.>+++++++++++++++++++++++++++++++++.>

还可以再人性化一点:

1
2
3
string = "Hello World!"

print('\n'.join(map(lambda c: f"{'+'*ord(c)}.> ; '{c}'", string)))

转换结果就很清楚了:

1
2
3
4
5
6
7
8
9
10
11
12
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++.> ; 'H'
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++.> ; 'e'
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++.> ; 'l'
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++.> ; 'l'
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++.> ; 'o'
++++++++++++++++++++++++++++++++.> ; ' '
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++.> ; 'W'
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++.> ; 'o'
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++.> ; 'r'
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++.> ; 'l'
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++.> ; 'd'
+++++++++++++++++++++++++++++++++.> ; '!'

其中的;只是为了表示这是注释,因为实际上在 BF 解释器里面任何那八种操作之外的字符都会被忽略:default: break;

写到这里,我突然发现,如果要是注释里面包含操作符怎么办?

比如:

1
2
3
4
5
6
7
8
9
10
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++.> ; 't'
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++.> ; 'e'
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++.> ; 's'
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++.> ; 't'
++++++++++++++++++++++++++++++++.> ; ' '
+++++++++++++++++++++++++++++++++++++++++++.> ; '+'
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++.> ; '>'
++++++++++++++++++++++++++++++++++++++++++++.> ; ','
++++++++++++++++++++++++++++++++++++++++++++++.> ; '.'
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++.> ; ';'

那么程序就会出问题,因为任何出现在代码里面的那八个字符都会被当作操作符解释执行。

除非……把;变成单行注释!和汇编还吻合呢~

虽然 BF 里的注释功能其实可以用[]来实现,因为当指针指向的单元格的值为零的时候,是会直接跳过循环体内的代码的,这样就可以在循环体内写注释了。但是这种办法只适用于指针指向的单元格的值为零的时候,并不像一般的注释那样灵活。

那么,只需要在bf.c里再加一个 case :

1
2
3
4
5
6
7
8
9
// Single-line comment.
case ';':
if (enable_comment)
{
while (code[++code_ptr] != '\n')
{
}
}
break;

运行结果:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
$ cat test.bf
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++.> ; 't'
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++.> ; 'e'
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++.> ; 's'
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++.> ; 't'
++++++++++++++++++++++++++++++++.> ; ' '
+++++++++++++++++++++++++++++++++++++++++++.> ; '+'
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++.> ; '>'
++++++++++++++++++++++++++++++++++++++++++++.> ; ','
++++++++++++++++++++++++++++++++++++++++++++++.> ; '.'
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++.> ; ';'

$ bf -c test.bf
test +>,.;

其中-c参数是表示启用注释(comment)功能,因为标准的 BF 语言规范里并没有注释。

OK,完美~

咦?我好像不知不觉间给 BF 实现了单行注释的功能……啊哈哈哈

可能很多人会觉得这很无聊,觉得作者太闲了,做这些有什么用呢?我并不是为了实用性而写的,也不能说是完全为了学习而写,其实做这些更像是一种娱乐吧。这有点像小孩子玩积木一样,全心全意地把自己的注意力放在积木上面。日常的生活中,其实有很多的烦恼和苦闷,而当我沉浸在计算机世界里的时候,能够让自己的心沉浸在这小小的天地中,能够体会到宁静和满足,就足够了。


后来,参考网上更多例子重构了内部结构,优化了内存使用。

给 BF 解释器增加了打印内部状态的调试(debug)功能:#,使用-d参数启用:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
// Print the program's internal state for debugging.
case '#':
if (enable_debug)
{
printf("\n");
for (int i = memory_ptr - 8; i < memory_ptr + 8; i++)
{
if (i < 0)
{
printf("-- ");
}
else
{
printf("%02hhX ", memory[i]);
}
}
printf("\n%*s\n", 8 * 3 + 2, "^^");
printf("%*X\n", 8 * 3 + 2, memory_ptr);
}
break;

实现了运行时动态内存增长,这样就真的符合标准图灵机模型了:一条无限长的纸带——内存有多大,这条纸带就能有多长。

发布了正式版并进入了微软 winget 官方仓库,Windows 10 及以上版本直接命令行中输入winget install ChenQingYu.BF就可以安装并使用了。

项目已开源在 GitHub: bf