Brainfuck解释器的实现
1. Brainfuck 简介
我在此假定本文读者了解图灵机的概念。
Brainfuck(以下简称 BF)是一门图灵完备的编程语言,也就是说,别的编程语言能做的事情 BF 都能做,甚至你可以用它来写一个操作系统。
来,先用一个 Hello World 来感受一下 BF 的画风:1
++++++++[>++++[>++>+++>+++>+<<<<-]>+>+>->>+[<]<-]>>.>---.+++++++..+++.>>.<-.<.+++.------.--------.>>+.>++.
输出:1
Hello World!
BF 的规则和语法我这里就不说了(懒得打字,这东西网上一查一大堆,我就不生产冗余信息了),本文重点是解释器的实现。
人们常开玩笑说,最底层的编程语言是 0 和 1(所谓的二进制编程)。实际上并不是这样的,因为 01 序列对应的还是指令,比如01110100
实际上实现的是 MOV 指令的功能(举个栗子),指令少则几十个,多则上百个,各种分支跳转条件判断等等,跟 BF 相比还是显得很高级的。
我在此宣布:在软件界最底层的语言是 BF!(esoteric programming languages就不说了……不然没完没了)
为什么要加软件界这个定语呢?因为不限定这个的话,Verilog 会不服的。
2. 解释器的实现
因为我没有在网上找到让我满意的 BF 解释器,所以我就自己用 C 语言写了一个 BF 的解释器bf.exe
。哈哈,听起来很装,其实不难,挺简单的,因为 BF 的语法本来就非常简单(虽然用来写程序很烧脑,这也是为什么这个语言叫 Brainfuck)。
在了解了 BF 的规则和语法之后,学过 C 语言的应该都能知道怎么写了,比如>
就是++p
,+
就是++*p
等等,不过循环部分需要仔细想一想,因为有嵌套的[ ]
,不能直接做 match,需要有一个 counter,这里的处理是先分析代码然后储存对应的跳转表再解释执行。
Talk is cheap, let me show the code:1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63// Execute the program.
void interpret()
{
int ch;
for (int code_ptr = 0; code_ptr < code_size; code_ptr++)
{
switch (code[code_ptr])
{
// Increase value at current data cell by one.
case '+':
memory[memory_ptr]++;
break;
// Decrease value at current data cell by one.
case '-':
memory[memory_ptr]--;
break;
// Move data pointer to next address.
case '>':
memory_ptr++;
break;
// Move data pointer to previous address.
case '<':
memory_ptr--;
break;
// Accept one character from user.
case ',':
if ((ch = getchar()) != EOF)
{
memory[memory_ptr] = ch;
}
break;
// Output character at current data cell.
case '.':
putchar(memory[memory_ptr]);
break;
// Begin loop.
case '[':
if (!memory[memory_ptr])
{
code_ptr = targets[code_ptr];
}
break;
// End loop.
case ']':
if (memory[memory_ptr])
{
code_ptr = targets[code_ptr];
}
break;
// Ignore other characters.
default:
break;
}
}
}
实现解释器功能的核心代码不到一百行。
运行结果:1
2
3
4
5$ cat hello.bf
++++++++++[>+++++++>++++++++++>+++>+<<<<-]>++.>+.+++++++..+++.>++.<<+++++++++++++++.>.+++.------.--------.>+.>.
$ bf hello.bf
Hello World!
3. 增强——注释与调试
如果说这个 BF 的 Hello World 代码不容易看懂的话,那么可以利用 Python 搞一个 plain 版本的,也就是这样:1
2
3string = "Hello World!"
print(''.join(map(lambda c: "+"*ord(c)+".>", string)))
转换结果就是很 plain 的 BF 代码:1
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++.>+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++.>++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++.>++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++.>+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++.>++++++++++++++++++++++++++++++++.>+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++.>+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++.>++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++.>++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++.>++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++.>+++++++++++++++++++++++++++++++++.>
还可以再人性化一点:1
2
3string = "Hello World!"
print('\n'.join(map(lambda c: f"{'+'*ord(c)}.> ; '{c}'", string)))
转换结果就很清楚了:1
2
3
4
5
6
7
8
9
10
11
12++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++.> ; 'H'
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++.> ; 'e'
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++.> ; 'l'
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++.> ; 'l'
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++.> ; 'o'
++++++++++++++++++++++++++++++++.> ; ' '
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++.> ; 'W'
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++.> ; 'o'
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++.> ; 'r'
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++.> ; 'l'
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++.> ; 'd'
+++++++++++++++++++++++++++++++++.> ; '!'
其中的;
只是为了表示这是注释,因为实际上在 BF 解释器里面任何那八种操作之外的字符都会被忽略:default: break;
。
写到这里,我突然发现,如果要是注释里面包含操作符怎么办?
比如:1
2
3
4
5
6
7
8
9
10++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++.> ; 't'
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++.> ; 'e'
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++.> ; 's'
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++.> ; 't'
++++++++++++++++++++++++++++++++.> ; ' '
+++++++++++++++++++++++++++++++++++++++++++.> ; '+'
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++.> ; '>'
++++++++++++++++++++++++++++++++++++++++++++.> ; ','
++++++++++++++++++++++++++++++++++++++++++++++.> ; '.'
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++.> ; ';'
那么程序就会出问题,因为任何出现在代码里面的那八个字符都会被当作操作符解释执行。
除非……把;
变成单行注释!和汇编还吻合呢~
虽然 BF 里的注释功能其实可以用[]
来实现,因为当指针指向的单元格的值为零的时候,是会直接跳过循环体内的代码的,这样就可以在循环体内写注释了。但是这种办法只适用于指针指向的单元格的值为零的时候,并不像一般的注释那样灵活。
那么,只需要在bf.c
里再加一个 case :1
2
3
4
5
6
7
8
9// Single-line comment.
case ';':
if (enable_comment)
{
while (code[++code_ptr] != '\n')
{
}
}
break;
运行结果:1
2
3
4
5
6
7
8
9
10
11
12
13
14$ cat test.bf
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++.> ; 't'
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++.> ; 'e'
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++.> ; 's'
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++.> ; 't'
++++++++++++++++++++++++++++++++.> ; ' '
+++++++++++++++++++++++++++++++++++++++++++.> ; '+'
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++.> ; '>'
++++++++++++++++++++++++++++++++++++++++++++.> ; ','
++++++++++++++++++++++++++++++++++++++++++++++.> ; '.'
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++.> ; ';'
$ bf -c test.bf
test +>,.;
其中-c
参数是表示启用注释(comment)功能,因为标准的 BF 语言规范里并没有注释。
OK,完美~
咦?我好像不知不觉间给 BF 实现了单行注释的功能……啊哈哈哈
可能很多人会觉得这很无聊,觉得作者太闲了,做这些有什么用呢?我并不是为了实用性而写的,也不能说是完全为了学习而写,其实做这些更像是一种娱乐吧。这有点像小孩子玩积木一样,全心全意地把自己的注意力放在积木上面。日常的生活中,其实有很多的烦恼和苦闷,而当我沉浸在计算机世界里的时候,能够让自己的心沉浸在这小小的天地中,能够体会到宁静和满足,就足够了。
后来,参考网上更多例子重构了内部结构,优化了内存使用。
给 BF 解释器增加了打印内部状态的调试(debug)功能:#
,使用-d
参数启用:1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20// Print the program's internal state for debugging.
case '#':
if (enable_debug)
{
printf("\n");
for (int i = memory_ptr - 8; i < memory_ptr + 8; i++)
{
if (i < 0)
{
printf("-- ");
}
else
{
printf("%02hhX ", memory[i]);
}
}
printf("\n%*s\n", 8 * 3 + 2, "^^");
printf("%*X\n", 8 * 3 + 2, memory_ptr);
}
break;
实现了运行时动态内存增长,这样就真的符合标准图灵机模型了:一条无限长的纸带——内存有多大,这条纸带就能有多长。
发布了正式版并进入了微软 winget 官方仓库,Windows 10 及以上版本直接命令行中输入winget install ChenQingYu.BF
就可以安装并使用了。
项目已开源在 GitHub: bf。