
Componentizing Source Code

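In general, the term "source code component" refers to coarse-grained components such as libraries. Here, however, it means the class information and structural information extracted from programs written in object-oriented programming languages such as C++. In other words, these are fine-grained components whose unit is the class, the most basic kind of object-oriented software component. In this research the unit of a component is the class; as a rule, individual objects are not handled.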

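By componentizing source code class by class, we aim to help users understand coarse-grained library components (frameworks and the like) and the roles that classes play within an application. To that end, we actively support linking with other class components, library components, and abstract components.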

The sections below describe the mechanism for generating source code components automatically and the mechanism that supports Makefile generation.

Automatic Source Code Components Registration
Form of Components
Example in C++
Distributed Automatic Construction of Makefile

Automatic Source Code Components Registration

It is desirable to provide a facility that extracts class information and inter-class relationship information from existing source code.

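In a class-based language, the unit of extraction is the class, and the information to be extracted consists of its relationships, the access rights of its interface, and so on. Specifically: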

Access rights of the class (language dependent)
Name
Interface information: member functions, member variables, and access specifiers (private, protected, public)
Dependencies: inheritance, whole-part, and reference relationships

Form of Components

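To allow linking with components of diverse forms, components are stored as hypertext. Each node holds the information listed above for one class, and the nodes must be linked to one another according to their dependencies. The node must also express how the interface is presented differently depending on access rights. The output format that users browse is HTML, and it provides two views: a public interface and a non-public interface. The public interface consists of what is declared with a specifier such as public. The non-public interface consists of everything else, for example what is declared protected or private.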
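The two views described above can be sketched with a small data model. This is only an illustration under stated assumptions: the names Access, Member, BaseClass, ClassNode, NodeView, and makeView are hypothetical and do not come from the actual system.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical data model for one class node (all names are illustrative).
enum class Access { Public, Protected, Private };

struct Member {
    std::string signature;  // e.g. "void method()" or "int count"
    Access access;
};

struct BaseClass {
    std::string name;
    Access access;  // access specifier written in the base-class list
};

struct ClassNode {
    std::string name;
    std::vector<BaseClass> bases;
    std::vector<Member> members;
};

// One view of a node: either the public or the non-public interface.
struct NodeView {
    std::vector<std::string> bases;
    std::vector<std::string> members;
};

// Selects the public parts (wantPublic == true) or the protected and
// private parts (wantPublic == false) of a class node.
NodeView makeView(const ClassNode& node, bool wantPublic) {
    assert(!node.name.empty());  // an unnamed node could not be linked
    NodeView view;
    for (const BaseClass& b : node.bases) {
        if ((b.access == Access::Public) == wantPublic) view.bases.push_back(b.name);
    }
    for (const Member& m : node.members) {
        if ((m.access == Access::Public) == wantPublic) view.members.push_back(m.signature);
    }
    return view;
}
```

Calling makeView(node, true) yields the content of the public interface view, and makeView(node, false) yields the rest. Rendering either view as HTML is left out of this sketch.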

Example in C++

The implementation was done in C++.

Extracting type and class information is divided into two processes: partitioning the information of a class into nodes, and capturing the relationships among nodes. C++ is not a pure object-oriented language; to stay compatible with C, it contains various features that are not meant for object-oriented programming. Here we deal only with information concerning classes and ignore everything else, such as global functions and data declared outside classes.

The following three kinds of information are extracted for partitioning a class into nodes:

Class (including struct and union) declarations,
Other type declarations (typedef, enum),
Declarations parameterized as templates.

Typedef and enum declarations are extracted to avoid name conflicts among types, because a class is a kind of type in C++. In the current implementation, signature and namespace declarations are ignored.

The topmost part of the C++ grammar description in YACC used to analyze and extract this information is as follows:

program:
    /* empty */
    extdefs

extdefs:
    extdefs extdef

extdef:
    fndef
    datadef
    template_def
    ......

This means that a program is a sequence of definitions, and each definition is a function definition, a data definition, a template definition, or some other kind of definition.

A type declaration appears in a typed declaration specifier, which is part of a data definition, and a class declaration appears in a struct declaration within a type declaration. The following information about a class is extracted from these:

Name,
Interface information
(Member functions, member variables and access specifiers (private, protected, public)),
Dependencies
(Inheritance, whole-parts and reference relationships).

[Figure: Correspondence between C++ source code and nodes]

A grammar description concerning this is as follows:

structspec:
    ......
    class_head { opt.component_decl_list }
    ......

class_head:
    ......
    aggr identifier
    aggr identifier : base_class_list
    ......

aggr:
    class
    struct
    union

base_class_list:
    base_class
    base_class_list , base_class

base_class:
    typename
    access_list typename

access_list:
    ......
    visspec another_specs
    ......

opt.component_decl_list:
    /* empty */
    component_decl_list
    opt.component_decl_list visspec :
              component_decl_list
    opt.component_decl_list visspec :

visspec:
    private
    protected
    public

A class name (identifier) and its superclass names (base_class_list), as well as the access rights of the superclasses, appear in the first part of the class declaration, class_head. Interface declarations, that is, functions and variables, appear in component_decl_list, and references to other types appear in the function and variable declarations. A variable can represent either an aggregate relation or a reference relation, but the two are not distinguished.

HTML files are created from the extracted information. Only public superclasses and members are stored in the public interface node file; protected and private superclasses and members are stored in the implementation node file. The correspondence between source code and the nodes that reflect it is shown in the figure on the correspondence between C++ source code and nodes.

The HTML files contain links that correspond to the dependencies of the class, in the form of anchor tags that use the Common Gateway Interface (CGI). CGI is an interface mechanism for calling an external procedure from HTML files. The links are deliberately not plain URLs, so that location management stays easy when node files are relocated. A CGI procedure and a class ID as its argument are placed at the entry of each link, and the procedure resolves the location of the class by looking it up in the cmpd table.
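The indirection through a class ID can be sketched as follows. This is a minimal illustration, not the system's actual code: the CGI path, the id query parameter, the LocationTable type standing in for the cmpd table, and the function names are all assumptions. The sketch also assumes that class IDs and labels contain no characters that need URL or HTML escaping.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical stand-in for the cmpd table: class ID -> current node file location.
using LocationTable = std::map<std::string, std::string>;

// Builds an anchor whose target is a CGI call carrying the class ID,
// rather than the URL of the node file itself.
// Assumes classId and label need no URL or HTML escaping.
std::string makeAnchor(const std::string& cgiPath,
                       const std::string& classId,
                       const std::string& label) {
    assert(!classId.empty());  // a link without an ID could not be resolved
    return "<a href=\"" + cgiPath + "?id=" + classId + "\">" + label + "</a>";
}

// What the CGI procedure does conceptually: look up where the node for the
// given class ID currently lives. Returns an empty string for an unknown ID.
std::string resolveLocation(const LocationTable& table, const std::string& classId) {
    auto it = table.find(classId);
    return it == table.end() ? std::string() : it->second;
}
```

Because the generated HTML stores only the class ID, relocating a node file requires updating one table entry; the anchors in other node files stay unchanged.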

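Linking Dependencies

Links use component identifiers. In the HTML output, a CGI call is embedded at the position of each link; it uses the component identifier to fetch the source code component from the database and converts it into the HTML output of the public interface.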

Distributed Automatic Construction of Makefile

One way to build a single application from distributed resources in a distributed environment is distributed computation. Here, however, we consider gathering all interdependent components at the local site and building the application there. In that case it is desirable that the components be collected according to their dependencies and arranged automatically so that the build succeeds.

A tool traditionally used for this in application development is "make", which is used mainly on UNIX. The programmer describes dependencies in a file called a Makefile, and make automatically builds and rebuilds the targets according to that file, based on information such as file modification times. However, the programmer must understand the dependencies among the files completely in order to write a Makefile, and when an application is built in a distributed environment, writing one becomes even harder than before.

The source code component information described above retains the dependencies among components, so dependency information can be extracted from it easily. However, some languages separate the interface description from the implementation description. We therefore have the component author supply such source file dependencies separately for each class, and combine them with the class dependencies to construct the final Makefile for the whole application. Because our system maintains dependency information in the form of links, it can generate an appropriate Makefile automatically. The user only needs to provide supplementary information such as relations to external libraries.

A centralized Makefile, i.e. a single Makefile for a whole application, would degrade the distributedness of the system OMake. Thus the system creates a dependency information node for each class, and when an application is constructed, the system collects these nodes by tracing links and generates a Makefile automatically. The details of the procedure are as follows. We assume that all the nodes are homogeneous and can share the same machine code. This is reasonable because we are discussing distributed cooperative development of a single application.

Identify necessary classes
The system identifies the classes needed for compilation by tracing links. It starts tracing from main() when compiling a whole application, and from a given class when compiling only that class.
Collect classes distributed on network
Machine code, headers, and supplementary information of the identified classes are collected from the node files. If a class has not been compiled yet and has no machine code, its source code is fetched and compiled at the collector site.

This procedure is repeated recursively until all the necessary classes are identified and collected.
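The identification step amounts to a traversal of the dependency links. The following is a minimal sketch under stated assumptions: the links are modeled as an in-memory map (the hypothetical DependencyLinks type) rather than as node files on a network, and collectClasses is an illustrative name.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical in-memory form of the links: class name -> classes it depends on.
using DependencyLinks = std::map<std::string, std::vector<std::string>>;

// Returns every class reachable from `start`, in the order first visited
// (depth-first, following dependencies in their listed order).
// The `seen` set makes the traversal terminate even on cyclic dependencies.
std::vector<std::string> collectClasses(const DependencyLinks& links,
                                        const std::string& start) {
    assert(!start.empty());
    std::vector<std::string> order;
    std::set<std::string> seen;
    std::vector<std::string> stack{start};
    while (!stack.empty()) {
        std::string cls = stack.back();
        stack.pop_back();
        if (!seen.insert(cls).second) continue;  // already collected
        order.push_back(cls);
        auto it = links.find(cls);
        if (it == links.end()) continue;  // no outgoing links
        // Push in reverse so the first listed dependency is visited first.
        for (auto d = it->second.rbegin(); d != it->second.rend(); ++d) {
            stack.push_back(*d);
        }
    }
    return order;
}
```

In the real setting each visited class would also trigger the collection step (fetching machine code, headers, and supplementary information); the sketch only computes which classes are needed.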

For example, suppose that ClassA depends on ClassB and ClassC, and ClassB depends on ClassD. The declaration part of ClassA is in the file ClassA.h, and the implementation part is in ClassA.cc and etc.cc. Such source code information is described, for example, as follows (in the case of C++):

Declaration        ClassA.h
Definition         ClassA.cc etc.cc
OtherMachineCode   libxxx.a

This is expanded as follows:

ClassA.o: ClassA.cc ClassA.declarations
etc.o: etc.cc ClassA.declarations
ClassA.declarations: declaration(in ClassA.make)  ClassB.declarations
                     ClassC.declarations
ClassB.declarations: declaration(in ClassB.make) ClassD.declarations
ClassC.declarations: declaration(in ClassC.make)

Then the following Makefile is created automatically from these.

HEADERS=ClassA.h ClassB.h ClassD.h ClassC.h
all: ClassA.o etc.o
ClassA.o: ClassA.cc $(HEADERS)
    $(CXX) -c ClassA.cc
etc.o: etc.cc $(HEADERS)
    $(CXX) -c etc.cc
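
The generation of such a Makefile from per-class nodes can be sketched as follows. This is an illustration under assumptions, not the actual generator: SourceInfo, Nodes, and the function names are hypothetical, source files are assumed to end in .cc, and the sketch emits a real tab before each recipe line, as make requires.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical per-class dependency information node.
struct SourceInfo {
    std::vector<std::string> declarations;  // header files, e.g. ClassA.h
    std::vector<std::string> definitions;   // source files, e.g. ClassA.cc etc.cc
    std::vector<std::string> dependsOn;     // names of classes this class depends on
};

using Nodes = std::map<std::string, SourceInfo>;

// Gathers the headers of `cls` and of everything it depends on, depth-first.
void collectHeaders(const Nodes& nodes, const std::string& cls,
                    std::set<std::string>& seen, std::vector<std::string>& headers) {
    if (!seen.insert(cls).second) return;  // already visited
    auto it = nodes.find(cls);
    if (it == nodes.end()) return;
    for (const std::string& h : it->second.declarations) headers.push_back(h);
    for (const std::string& dep : it->second.dependsOn) {
        collectHeaders(nodes, dep, seen, headers);
    }
}

// Maps "name.cc" to "name.o"; other names simply get ".o" appended.
std::string objectName(const std::string& source) {
    const std::string ext = ".cc";
    if (source.size() > ext.size() &&
        source.compare(source.size() - ext.size(), ext.size(), ext) == 0) {
        return source.substr(0, source.size() - ext.size()) + ".o";
    }
    return source + ".o";
}

// Emits a Makefile that compiles the source files of a single class.
std::string generateMakefile(const Nodes& nodes, const std::string& cls) {
    assert(!cls.empty());
    std::set<std::string> seen;
    std::vector<std::string> headers;
    collectHeaders(nodes, cls, seen, headers);

    std::vector<std::string> sources;
    auto it = nodes.find(cls);
    if (it != nodes.end()) sources = it->second.definitions;

    std::ostringstream out;
    out << "HEADERS=";
    for (std::size_t i = 0; i < headers.size(); ++i) {
        out << (i ? " " : "") << headers[i];
    }
    out << "\nall:";
    for (const std::string& s : sources) out << " " << objectName(s);
    out << "\n";
    for (const std::string& s : sources) {
        out << objectName(s) << ": " << s << " $(HEADERS)\n"
            << "\t$(CXX) -c " << s << "\n";
    }
    return out.str();
}
```

Given nodes for ClassA through ClassD with the dependencies of the running example, generateMakefile(nodes, "ClassA") produces the same rules as the Makefile shown here.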

Alternatively, consider an application (target.cc) that uses ClassA:

#include "ClassA.h"

int main() {
    ClassA a;
    a.method();
}

The Makefile for this application looks like this:

OBJS=ClassA.o etc.o libxxx.a ClassB.o ClassC.o ClassD.o
all: targetname
targetname: target.cc $(OBJS)
    $(CXX) -o targetname target.cc $(OBJS)

This mechanism is currently under development and still requires detailed study.

Mika Ohtsuki <mika@db.is.kyushu-u.ac.jp>

Last modified: Mon Jan 5 15:53:23 JST 1998