Compiler Principles and Design - Lab3 - Lexical Analysis Lab

Generating a Lexer Automatically with ANTLR

Objectives and Content

Following the lexical rules of the C language, design a DFA that recognizes every C token class, then use ANTLR with Java to generate the lexer automatically. The lexer takes a C source program as input and outputs a stream of attributed tokens.

Process and Method

To implement lexical analysis, we first need to understand how the compiler framework provided for this lab is used and how it is structured. BIT-MiniCC is a C compiler implemented in Java, and the scanner built here belongs to the compiler's front end. Lexical analysis is normally performed on the output of preprocessing, but because the preprocessing stage of the BIT-MiniCC framework is defective, preprocessing is skipped in this lab.

1 Framework

  1. Download the BIT-MiniCC framework from GitHub and import it into IntelliJ or Eclipse;

  2. Copy the provided testFile.c into test/scanner_example.c as the test source code, and change config.xml:

    ```diff
    - <phase skip="false" type="java" path="" name="preprocess" />
    - <phase skip="false" type="java" path="" name="scan" />
    + <phase skip="true" type="java" path="" name="preprocess" />
    + <phase skip="false" type="java" path="bit.minisys.minicc.scanner.AWScanner" name="scan" />
    ```
  3. The antlr jar is already present under lib, so no extra import step is needed

2 Writing the Grammar

Write the C.g4 file following the C11 standard, using ANTLR's official C Grammar file as a reference.

  • The file opens with grammar C;, where C must match the file name C.g4
  • Then come rules of the form ruleName : alt1 | ... | altN ;. A piece of text matches a rule when it matches one of the rule's alternatives; inside an alternative, single quotes enclose strings to be matched literally, a sub-rule name means "match by that sub-rule", and regular-expression-like notation such as ? * + | ( ) is also available
    • Parser rule names begin with a lowercase letter; different parser rules produce different node types in the parse tree, and a rule's alternatives describe what the children of such a node may look like
    • Lexer rule names begin with an uppercase letter; different lexer rules produce tokens of different categories, and a rule's alternatives describe what such a token may look like
    • Rules introduced with fragment can be referenced by lexer rules for code reuse, but do not themselves produce tokens visible to the parser
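To make these conventions concrete, a small hand-written fragment in the style of C.g4 might look like the following. This is a simplified sketch, not an excerpt of the official C grammar (in particular, Constant here only covers plain digit sequences):

```antlr
grammar C;   // the grammar name must match the file name C.g4

// Parser rule: lowercase initial letter
primaryExpression
    :   Identifier
    |   Constant
    |   '(' primaryExpression ')'
    ;

// Lexer rules: uppercase initial letter
Identifier
    :   IdentifierNondigit (IdentifierNondigit | Digit)*
    ;

Constant
    :   Digit+
    ;

// Fragments can be reused by lexer rules but never become tokens themselves
fragment IdentifierNondigit : [a-zA-Z_] ;
fragment Digit              : [0-9] ;

// Whitespace is matched but discarded
WS : [ \t\r\n]+ -> skip ;
```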

3 Generating the Parser

  1. Install the ANTLR v4 grammar plugin in IntelliJ

    (Screenshot: ANTLR v4 grammar plugin)
  2. Add the C.g4 grammar file, right-click it in the file list, and choose Generate ANTLR Recognizer; the Configure ANTLR option lets you set the output directory for the generated files.

    (Screenshot: Generate ANTLR Recognizer)
  3. The generated parser appears under the gen.bit.minisys.minicc.scanner directory; among the generated files is the lexer CLexer (only part of it is shown below).

// Generated from /Users/apple/Downloads/lab3/BIT-MiniCC-master/src/bit/minisys/minicc/scanner/C.g4 by ANTLR 4.8
package gen.bit.minisys.minicc.scanner;
import org.antlr.v4.runtime.Lexer;
import org.antlr.v4.runtime.CharStream;
import org.antlr.v4.runtime.Token;
import org.antlr.v4.runtime.TokenStream;
import org.antlr.v4.runtime.*;
import org.antlr.v4.runtime.atn.*;
import org.antlr.v4.runtime.dfa.DFA;
import org.antlr.v4.runtime.misc.*;

@SuppressWarnings({"all", "warnings", "unchecked", "unused", "cast"})
public class CLexer extends Lexer {
static { RuntimeMetaData.checkVersion("4.8", RuntimeMetaData.VERSION); }

protected static final DFA[] _decisionToDFA;
protected static final PredictionContextCache _sharedContextCache =
new PredictionContextCache();
public static final int
T__0=1, T__1=2, T__2=3, T__3=4, T__4=5, T__5=6, T__6=7, T__7=8, T__8=9,
T__9=10, T__10=11, T__11=12, T__12=13, T__13=14, Auto=15, Break=16, Case=17,
Char=18, Const=19, Continue=20, Default=21, Do=22, Double=23, Else=24,
Enum=25, Extern=26, Float=27, For=28, Goto=29, If=30, Inline=31, Int=32,
Long=33, Register=34, Restrict=35, Return=36, Short=37, Signed=38, Sizeof=39,
Static=40, Struct=41, Switch=42, Typedef=43, Union=44, Unsigned=45, Void=46,
Volatile=47, While=48...;
public static String[] channelNames = {
"DEFAULT_TOKEN_CHANNEL", "HIDDEN"
};

public static String[] modeNames = {
"DEFAULT_MODE"
};

private static String[] makeRuleNames() {
return new String[] {
"T__0", "T__1", "T__2", "T__3", "T__4", "T__5", "T__6", "T__7", "T__8",
"T__9", "T__10", "T__11", "T__12", "T__13", "Auto", "Break", "Case",
"Char", "Const", "Continue", "Default", "Do", "Double", "Else", "Enum",
"Extern", "Float", "For", "Goto", "If", "Inline", "Int", "Long", "Register",
"Restrict", "Return", "Short", "Signed", "Sizeof", "Static", "Struct",
"Switch", "Typedef", "Union", "Unsigned", "Void", "Volatile", "While",
"Alignas", "Alignof", "Atomic", "Bool"...
};
}
public static final String[] ruleNames = makeRuleNames();

private static String[] makeLiteralNames() {
return new String[] {
null, "'__extension__'", "'__builtin_va_arg'", "'__builtin_offsetof'",
"'__m128'", "'__m128d'", "'__m128i'", "'__typeof__'", "'__inline__'",
"'__stdcall'", "'__declspec'", "'__asm'", "'__attribute__'", "'__asm__'",
"'__volatile__'", "'auto'", "'break'", "'case'", "'char'", "'const'",
"'continue'", "'default'", "'do'", "'double'", "'else'", "'enum'", "'extern'",
"'float'", "'for'", "'goto'", "'if'", "'inline'", "'int'", "'long'",
"'register'", "'restrict'", "'return'", "'short'", "'signed'", "'sizeof'",
"'static'", "'struct'", "'switch'", "'typedef'", "'union'", "'unsigned'",
"'void'", "'volatile'", "'while'", "'_Alignas'", "'_Alignof'", "'_Atomic'",
"'_Bool'", "'_Complex'", "'_Generic'", "'_Imaginary'", "'_Noreturn'",
"'_Static_assert'", "'_Thread_local'", "'('", "')'", "'['", "']'", "'{'",
"'}'", "'<'", "'<='", "'>'", "'>='", "'<<'", "'>>'", "'+'", "'++'", "'-'",
"'--'", "'*'", "'/'", "'%'", "'&'", "'|'", "'&&'", "'||'", "'^'", "'!'",
"'~'", "'?'", "':'", "';'", "','", "'='", "'*='", "'/='", "'%='", "'+='",
"'-='", "'<<='", "'>>='", "'&='", "'^='", "'|='", "'=='", "'!='", "'->'",
"'.'", "'...'"
};
}
private static final String[] _LITERAL_NAMES = makeLiteralNames();
private static String[] makeSymbolicNames() {
return new String[] {
null, null, null, null, null, null, null, null, null, null, null, null,
null, null, null, "Auto", "Break", "Case", "Char", "Const", "Continue",
"Default", "Do", "Double", "Else", "Enum", "Extern", "Float", "For",
"Goto", "If", "Inline", "Int", "Long", "Register", "Restrict", "Return",
"Short", "Signed", "Sizeof", "Static", "Struct", "Switch", "Typedef",
"Union", "Unsigned", "Void", "Volatile", "While"...
};
}
private static final String[] _SYMBOLIC_NAMES = makeSymbolicNames();
public static final Vocabulary VOCABULARY = new VocabularyImpl(_LITERAL_NAMES, _SYMBOLIC_NAMES);

/**
* @deprecated Use {@link #VOCABULARY} instead.
*/

@Deprecated
public static final String[] tokenNames;
static {
tokenNames = new String[_SYMBOLIC_NAMES.length];
for (int i = 0; i < tokenNames.length; i++) {
tokenNames[i] = VOCABULARY.getLiteralName(i);
if (tokenNames[i] == null) {
tokenNames[i] = VOCABULARY.getSymbolicName(i);
}

if (tokenNames[i] == null) {
tokenNames[i] = "<INVALID>";
}
}
}

@Override
@Deprecated
public String[] getTokenNames() {
return tokenNames;
}

@Override

public Vocabulary getVocabulary() {
return VOCABULARY;
}


public CLexer(CharStream input) {
super(input);
_interp = new LexerATNSimulator(this,_ATN,_decisionToDFA,_sharedContextCache);
}

@Override
public String getGrammarFileName() { return "C.g4"; }

@Override
public String[] getRuleNames() { return ruleNames; }

@Override
public String getSerializedATN() { return _serializedATN; }

@Override
public String[] getChannelNames() { return channelNames; }

@Override
public String[] getModeNames() { return modeNames; }

@Override
public ATN getATN() { return _ATN; }

public static final String _serializedATN = ...; // (long serialized ATN string omitted)
public static final ATN _ATN =
new ATNDeserializer().deserialize(_serializedATN.toCharArray());
static {
_decisionToDFA = new DFA[_ATN.getNumberOfDecisions()];
for (int i = 0; i < _ATN.getNumberOfDecisions(); i++) {
_decisionToDFA[i] = new DFA(_ATN.getDecisionState(i), i);
}
}
}

4 Invoking the Lexer

Create AWScanner.java under the src/bit/minisys/minicc/scanner directory

  • AWScanner is an implementation of the IMiniCCScanner interface

  • Set up the input/output files and the character stream (everything printed with System.out is redirected into the file outputFile)

    ```java
    // The argument passed in is the name of the (preprocessed) file
    String outputFileName = MiniCCUtil.removeAllExt(fileName) + MiniCCCfg.MINICC_SCANNER_OUTPUT_EXT;
    // Output file
    File outputFile = new File(outputFileName);
    outputFile.createNewFile();
    FileOutputStream outFileOutputStream = new FileOutputStream(outputFile);
    PrintStream localPrintStream = new PrintStream(outFileOutputStream);
    System.setOut(localPrintStream);
    ```
  • Invoke the CLexer lexer

    ```java
    CharStream stream = CharStreams.fromFileName(fileName, Charset.defaultCharset());
    CLexer lexer = new CLexer(stream);
    CommonTokenStream tokens = new CommonTokenStream(lexer);
    tokens.fill();
    for (Token tok : tokens.getTokens()) {
        if (tok instanceof CommonToken) {
            System.out.println(((CommonToken) tok).toString(lexer));
        } else {
            System.out.println(tok.toString());
        }
    }
    ```
  • The printed tokens end up in the output file
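The System.setOut redirection used above relies on nothing ANTLR-specific. Here is a self-contained sketch of the same pattern using only the JDK (Java 11+ assumed; the class name, temp-file name, and printed line are made up for illustration and do not come from the framework):

```java
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.PrintStream;
import java.nio.file.Files;
import java.nio.file.Path;

public class RedirectDemo {
    /** Prints one line while System.out is redirected to a temp file, then returns the file's content. */
    static String redirectAndCapture() throws IOException {
        Path out = Files.createTempFile("scanner", ".tokens");
        PrintStream console = System.out;                    // keep the original stream so we can restore it
        try (PrintStream fileOut = new PrintStream(new FileOutputStream(out.toFile()))) {
            System.setOut(fileOut);                          // from here on, println writes into the file
            System.out.println("[@0,0:2='int',<Int>,1:0]");  // stands in for a token line from the scanner
        } finally {
            System.setOut(console);                          // restore the console stream
        }
        return Files.readString(out).trim();
    }

    public static void main(String[] args) throws IOException {
        System.out.println(redirectAndCapture());
    }
}
```

Restoring the original stream in a finally block is worth copying into AWScanner as well, so later phases of the compiler print to the console again.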

Results

1 Results for testFile

(Screenshot: scan output for testFile)
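Each line that CommonToken.toString writes to the output file follows the pattern [@index,start:stop='text',<type>,line:column]; the type field is shown as a number or a name depending on how toString is called. As a rough reading aid, the pattern can be reproduced with plain String.format. The class, method, and concrete token values below are made up for illustration, not real CLexer output:

```java
public class TokenFormatDemo {
    /** Formats a token description in the style of ANTLR's CommonToken.toString(). */
    static String describe(int index, int start, int stop, String text,
                           String type, int line, int column) {
        return String.format("[@%d,%d:%d='%s',<%s>,%d:%d]",
                index, start, stop, text, type, line, column);
    }

    public static void main(String[] args) {
        // A hypothetical first token: the keyword "int" at offsets 0..2, line 1, column 0
        System.out.println(describe(0, 0, 2, "int", "Int", 1, 0));
    }
}
```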

Reflections

  • This lab deepened my understanding of C's lexical rules and of the main functions and implementation techniques of a compiler's lexical analyzer
  • I gained further insight into the working principles and basic ideas of Flex, and learned how to use ANTLR to generate a lexer automatically
  • The lab also gave me an initial look at how the lexical analysis module interacts with the other modules

title:Compiler Principles and Design - Lab3 - Lexical Analysis Lab

author:Anne416wu

link:https://www.annewqx.top/posts/62214/

publish time:2020-03-15

update time:2022-07-25

