PERL SCRIPTS TO CREATE A GERMAN-JAPANESE-GERMAN DICTIONARY FOR THE SHARP ZAURUS

created by Kurt Fischer, 25.10.2001
Source data by Ulrich Apel, Wadoku-jiten
http://isweb9.infoseek.co.jp/computer/wadoku/download.html

----------------------------
Introduction:

My Perl script "wadoku.pl" creates a Japanese-German dictionary for the
Zaurus. The source data are taken from the Wadoku-jiten of Ulrich Apel.
I am very grateful to Ulrich for allowing me to distribute these Perl scripts.

My other script, "dokuwa.pl", creates a German-Japanese dictionary from the
same source data. If you prefer all-lowercase German entries, run the script
"lowercase.pl" after "dokuwa.pl". The output will be "dokuwa-lowercase.txt".

You can use the dictionary program "pdic" to view the dictionaries on the PC.
You may download it at
http://member.nifty.ne.jp/TaN/ .
To date there is no Linux/Unix version of pdic, but one seems to be
in statu nascendi:
http://www.snaga.org/linux/

Its companion for the Zaurus, "zpdview", can be downloaded at
http://hp.vector.co.jp/authors/VA004474/zaurus/more.html

You can search the wadoku on the Zaurus for entries with
- Kanji alone
- mixed Kanji and Hiragana
- Hiragana alone.

I tried to find the best trade-off between search speed, size of the source
data, and user friendliness. With the existing zpdview Zaurus software, you
can only search the catchwords (the headwords of the entries). However, given
the small screen of the Zaurus, my experience is that this suffices in all
practical circumstances. The point is that while a grep-like search produces
many more hits than a search restricted to catchwords, most of the extra hits
are likely to be garbage.
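
The lowercasing step mentioned in the introduction could look roughly like
the sketch below. It is not the actual "lowercase.pl": the helper name
lowercase_catchword and the sample line are made up for illustration, and it
assumes the one-line layout "catchword /// translations" that the pdic import
settings in this README use.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not taken from lowercase.pl): lowercase only the
# catchword, i.e. the text before the first " /// " separator.
sub lowercase_catchword {
    my ($line) = @_;
    my ($catchword, $rest) = split m{ /// }, $line, 2;

    # No separator found: leave the line exactly as it is.
    return $line unless defined $rest;

    # The catchword is plain ASCII (Umlauts are already converted),
    # so lc() is safe here; the Japanese part is passed through untouched.
    return lc($catchword) . ' /// ' . $rest;
}

# Made-up sample line, only to show the effect.
print lowercase_catchword('Beispiel /// 例 ( れい)'), "\n";
```

Restricting lc() to the catchword is a deliberate choice in this sketch: in a
multi-byte encoding such as Shift_JIS, the second byte of a Japanese character
can fall into the ASCII letter range, so lowercasing the whole line could
damage the Japanese text.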

The dokuwa-jiten is unusual in that the possible translations are not ordered
hierarchically; instead, each translation is listed together with its
synonyms. If, for example, you enter the German catchword "formlos", you will
find the direct translation entry

  略式の ( りゃくしきの)

as well as the derived ones

  amorph :: 無定形の ( むていけいの) /
  immateriell; gestaltlos; stofflos; koerperlos; abstrakt; bildlich; geistig :: 無形の ( むけいの) /
  laessig; vermischt; verschieden; allerlei; mannigfaltig; vielerei; ungenau; unkorrekt; grob; bequem; fluechtig :: 雑な ( ざつな) /
  unhoeflich; unfreundlich; kurz angebunden; barsch; schroff; rauh; grob; unmanierlich :: 失敬な ( しっけいな)

In practice this makes it easy to judge the correct translation from the
context.

----------------

The scripts use as source data the Japanese-German dictionary by Ulrich Apel,
in the form of a file "wadoku.csv". This file has been exported with the help
of the runtime version of FileMaker.

There is no way to enter normal-size Umlauts on the Zaurus. (There is a
workaround, but it is too cumbersome in practice.) Therefore wadoku.csv
includes the columns "Kanji", "Lesung" (reading) and "umgerechnete Umlaute"
(converted Umlauts).

To run the Perl scripts, you need to have Perl on your computer. Most Linux
distributions include Perl. You can download a Windows version at
http://www.activestate.com/Products/ActivePerl/download.plex
To start the scripts, just double-click them.

For the Japanese-German dictionary, the output is a text file named
"wadoku.txt". The algorithm itself creates a temporary file named
temporary.txt. It is of roughly the same size as the end product, wadoku.txt.

----

For the German-Japanese dictionary, the output is a text file named
"dokuwa.txt". The algorithm itself creates 3 temporary files named
temporary1.txt, temporary2.txt, and temporary3.txt. Each is of roughly the
same size as the end product, dokuwa.txt, that is, about twice the size of
wadoku.csv.
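
How a German-Japanese entry with synonyms can be derived from Japanese-German
records is sketched below. This is only a guess at the idea, not the code of
"dokuwa.pl": it keeps everything in an in-memory hash instead of temporary
files, the function name invert is hypothetical, and the two input records
are reconstructed by hand from the "formlos" example rather than read from
wadoku.csv. The separators " /// " and " / " are the ones used in the pdic
import settings.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical records: [ Japanese entry, German translations joined by "; " ].
my @records = (
    [ '略式の ( りゃくしきの)',   'formlos' ],
    [ '無定形の ( むていけいの)', 'amorph; formlos' ],
);

# Build one output line per German word. A word that shares a Japanese entry
# with other German words lists those words as "synonyms :: Japanese".
sub invert {
    my (@recs) = @_;
    my %index;
    for my $rec (@recs) {
        my ($japanese, $german) = @$rec;
        my @words = split /;\s*/, $german;
        for my $i (0 .. $#words) {
            # All translations of this record except the current word.
            my @synonyms = @words[ grep { $_ != $i } 0 .. $#words ];
            my $item = @synonyms
                ? join('; ', @synonyms) . " :: $japanese"
                : $japanese;                      # direct translation
            push @{ $index{ $words[$i] } }, $item;
        }
    }
    # Sorted catchwords, so the later import into pdic stays fast.
    return map { "$_ /// " . join(' / ', @{ $index{$_} }) } sort keys %index;
}

print "$_\n" for invert(@records);
```

With these two records, the line for "formlos" contains the direct
translation first and then the derived item "amorph :: ...", which matches
the layout of the example above.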

----------------------------
Conversion into pdic and zpdview readable files:
----------------------------

Use dokuwa.txt for German-Japanese and wadoku.txt for Japanese-German.
The files are already sorted, so the import will be fast.

Start pdic and choose the following options under the menu item
Tools - 辞書の変換:

  転送元形式: 一行テキスト形式
  転送先形式: PDIC 形式

Load the files with 参照, e.g. wadoku.txt for the 一行 file, and choose
wadoku.dic as the name of the pdic file to be created.

In the bottom menu choose
  付け加える(区切り文字付き)

--------------------

Then move on to 詳細 and choose
  一行テキスト形式の区切: " /// "
i.e. three slashes with a space to the left and to the right.

In the left menu 訳 / 用例部について choose only the first item,
  訳 / 用例を区切りして取り込む
and at the bottom choose
  区切り文字: " / "
i.e. a slash with a space to the left and to the right.

In the right menu 出力に関して choose the first item,
  格納しきれない場合のみ圧縮
and the third item,
  用例を圧縮

--------------------------------------------

In the submenu 登録項目 choose items 1, 4 and 5.
Confirm this menu with OK, and likewise 詳細.

--------------------------------------------

Confirm with OK once more.

Have fun with it: AT LAST you can translate German into Japanese on the
Zaurus!

Kurt Fischer