Tools and tips for converting data formats on the Debian system are described.

Standards-based tools are in very good shape, but support for proprietary data formats is limited.

11.1. Text data conversion tools

The following packages for text data conversion are available.

Table 11.1. List of text data conversion tools

package	popcon	size	keyword	description
`libc6`	V:928, I:998	10670	charset	text encoding converter between locales by iconv(1) (fundamental)
`recode`	V:5, I:36	608	charset+eol	text encoding converter between locales (versatile, more aliases and features)
`konwert`	V:2, I:59	122	charset	text encoding converter between locales (fancy)
`nkf`	V:1, I:11	346	charset	character set translator for Japanese
`tcs`	V:0, I:0	544	charset	character set translator
`unaccent`	V:0, I:0	76	charset	replace accented letters by their unaccented equivalent
`tofrodos`	V:3, I:36	50	eol	text format converter between DOS and Unix: fromdos(1) and todos(1)
`macutils`	V:0, I:1	320	eol	text format converter between Macintosh and Unix: frommac(1) and tomac(1)

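11.1.1. Converting a text file with iconv

	Tip
	iconv(1) is provided as a part of the `libc6` package and can convert the encoding of characters on Unix-like systems.

You can convert the encoding of a text file with iconv(1) by the following.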

$ iconv -f encoding1 -t encoding2 input.txt >output.txt

Encoding values are case insensitive and ignore "-" and "_" for matching. Supported encodings can be checked by the "iconv -l" command.
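As a minimal sketch of the command above, the following creates a small ISO-8859-1 sample file and converts it to UTF-8. The file names and the sample text are made up for illustration.

```shell
#!/bin/sh
# Create a sample file in ISO-8859-1: "café", where é is the single byte 0xE9.
printf 'caf\351\n' > latin1.txt

# Convert from ISO-8859-1 to UTF-8.
iconv -f ISO-8859-1 -t UTF-8 latin1.txt > utf8.txt

# Dump the result as hexadecimal bytes; é is now the two bytes c3 a9.
od -An -tx1 utf8.txt
```

Dumping the bytes with od(1) is a convenient way to confirm what a conversion actually did, independent of how the terminal renders the text.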

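Table 11.2. List of encoding values and their usage

encoding value	usage
ASCII	American Standard Code for Information Interchange, 7 bit code without accented characters
UTF-8	current multilingual standard for all modern OSs
ISO-8859-1	old standard for western European languages, ASCII + accented characters
ISO-8859-2	old standard for eastern European languages, ASCII + accented characters
ISO-8859-15	old standard for western European languages, ISO-8859-1 with euro sign
CP850	code page 850, Microsoft DOS characters with graphics for western European languages, ISO-8859-1 variant
CP932	code page 932, Microsoft Windows style Shift-JIS variant for Japanese
CP936	code page 936, Microsoft Windows style GB2312, GBK or GB18030 variant for Simplified Chinese
CP949	code page 949, Microsoft Windows style EUC-KR or Unified Hangul Code variant for Korean
CP950	code page 950, Microsoft Windows style Big5 variant for Traditional Chinese
CP1251	code page 1251, Microsoft Windows style encoding for the Cyrillic alphabet
CP1252	code page 1252, Microsoft Windows style ISO-8859-15 variant for western European languages
KOI8-R	old Russian UNIX standard for the Cyrillic alphabet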
ISO-2022-JP	standard encoding for Japanese email which uses only 7 bit codes
eucJP	old Japanese UNIX standard 8 bit code and completely different from Shift-JIS
Shift-JIS	JIS X 0208 Appendix 1 standard for Japanese (see CP932)

	Note
	Some encodings are only supported for data conversion and cannot be used as locale values (Section 8.3.1, “Basics of encoding”).

For character sets which fit in a single byte, such as the ASCII and ISO-8859 character sets, the character encoding means almost the same thing as the character set.

For character sets with many characters, such as JIS X 0213 for Japanese or the Universal Character Set (UCS, Unicode, ISO-10646-1) for practically all languages, there are many encoding schemes to fit them into a sequence of byte data.

EUC and ISO/IEC 2022 (also known as JIS X 0202) for Japanese
UTF-8, UTF-16/UCS-2 and UTF-32/UCS-4 for Unicode

For these, there is a clear distinction between the character set and the character encoding.

The term code page is used as a synonym for character encoding tables, particularly vendor-specific ones.

Note

Please note most encoding systems share the same code with ASCII for the 7 bit characters. But there are some exceptions. If you are converting old Japanese C programs and URLs data from the casually-called shift-JIS encoding format to UTF-8 format, use "CP932" as the encoding name instead of "shift-JIS" to get the expected results: 0x5C → "\" and 0x7E → "~". Otherwise, these are converted to wrong characters.
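The difference can be inspected directly. The sketch below feeds the two bytes 0x5C and 0x7E through iconv(1) under both encoding names and dumps the result as hexadecimal. It assumes both names are known to the installed iconv. Per the note above, CP932 should yield the ASCII bytes 5c 7e, while the result for the other name depends on the iconv implementation.

```shell
#!/bin/sh
# Convert the bytes 0x5C and 0x7E to UTF-8 under each encoding name
# and show the resulting bytes in hexadecimal.
for enc in CP932 SHIFT-JIS; do
  printf '%s: ' "$enc"
  printf '\134\176' | iconv -f "$enc" -t UTF-8 | od -An -tx1
done
```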

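	Tip
	recode(1) may be used too, and offers more than the combined functionality of iconv(1), fromdos(1), todos(1), frommac(1), and tomac(1). For more, see "`info recode`".

11.1.2. Checking file to be UTF-8 with iconv

You can check if a text file is encoded in UTF-8 with iconv(1) by the following.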

$ iconv -f utf8 -t utf8 input.txt >/dev/null || echo "non-UTF-8 found"

	Tip
	Use the "`--verbose`" option in the above example to find the first non-UTF-8 character.
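The same check can be wrapped in a small shell function for use in scripts. This is only a sketch: the function name is_utf8 and the sample files are made up for illustration.

```shell
#!/bin/sh
# is_utf8 FILE: succeed if FILE decodes cleanly as UTF-8 (hypothetical helper).
is_utf8() {
  iconv -f UTF-8 -t UTF-8 "$1" >/dev/null 2>&1
}

# Two sample files: valid UTF-8, and a lone ISO-8859-1 byte (0xE9).
printf 'caf\303\251\n' > good.txt
printf 'caf\351\n' > bad.txt

for f in good.txt bad.txt; do
  if is_utf8 "$f"; then
    echo "$f: valid UTF-8"
  else
    echo "$f: non-UTF-8 found"
  fi
done
```

The function relies only on the exit status of iconv(1), which is non-zero when it meets an invalid input sequence.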

11.1.3. Converting file names with iconv

Here is an example script to convert encoding of file names from ones created under older OS to modern UTF-8 ones in a single directory.

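#!/bin/sh
# Original encoding of the file names
ENCDN=iso-8859-1
for x in *; do
  # printf avoids echo's special handling of backslashes and leading "-"
  y=$(printf '%s' "$x" | iconv -f "$ENCDN" -t utf-8) || continue
  # Rename only when the converted name differs from the original
  [ "$x" = "$y" ] || mv -- "$x" "$y"
done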

The "$ENCDN" variable specifies the original encoding used for file names under the older OS, as in Table 11.2, “List of encoding values and their usage”.

For more complicated cases, mount the filesystem (e.g. a partition on a disk drive) containing such file names with the proper encoding as the mount(8) option (see Section 8.3.6, “Filename encoding”) and copy its entire contents to another filesystem mounted as UTF-8 with the "cp -a" command.

11.1.4. EOL conversion

The text file format, specifically the end-of-line (EOL) code, is dependent on the platform.

Table 11.3. List of EOL styles for different platforms

platform	EOL code	control	decimal	hexadecimal
Debian (unix)	LF	`^J`	10	0A
MSDOS and Windows	CR-LF	`^M^J`	13 10	0D 0A
Apple's Macintosh	CR	`^M`	13	0D

The EOL format conversion programs, fromdos(1), todos(1), frommac(1), and tomac(1), are quite handy. recode(1) is also useful.

	Note
	Some data on the Debian system, such as the wiki page data for the `python-moinmoin` package, use MSDOS style CR-LF as the EOL code. So the above rule is just a general rule.

	Note
	Most editors (e.g. `vim`, `emacs`, `gedit`, …) can handle files in MSDOS style EOL transparently.

	Tip
	The use of "`sed -e '/\r$/!s/$/\r/'`" instead of todos(1) is better when you want to unify the EOL style to the MSDOS style from a mix of MSDOS and Unix styles (e.g., after merging 2 MSDOS style files with diff3(1)). This is because `todos` adds CR to all lines.
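The sed expression from the tip can be tried on a small file with mixed line endings. The following is a sketch with made-up file names. Note that "\r" in a sed expression is a GNU sed extension, and that the tr(1) step removes every CR, not only those at the end of a line.

```shell
#!/bin/sh
# Sample file with mixed line endings: first line CR-LF, second line LF only.
printf 'one\r\ntwo\n' > mixed.txt

# Unify to MSDOS style: add CR only to lines that do not already end with CR.
sed -e '/\r$/!s/$/\r/' mixed.txt > dos.txt

# Convert to Unix style by deleting every CR character.
tr -d '\r' < dos.txt > unix.txt

# Inspect the line endings of both results.
od -c dos.txt
od -c unix.txt
```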

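11.1.5. TAB conversion

There are a few specialized programs to convert the tab codes.

Table 11.4. List of TAB conversion commands from bsdmainutils and coreutils packages

function	`bsdmainutils`	`coreutils`
expand tab to spaces	"`col -x`"	`expand`
unexpand tab from spaces	"`col -h`"	`unexpand`

indent(1) from the `indent` package reformats whitespace in C programs.

Editor programs such as vim and emacs can be used for TAB conversion, too. For example with vim, you can expand TAB with the ":set expandtab" and ":%retab" command sequence. You can revert this with the ":set noexpandtab" and ":%retab!" command sequence.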
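A short round trip with the coreutils commands may make the two directions concrete. This is a sketch with made-up file names and a tab stop of 4 columns chosen for illustration.

```shell
#!/bin/sh
# A line indented with one TAB.
printf '\tindented\n' > tabbed.txt

# Expand each TAB to spaces, using tab stops every 4 columns.
expand -t 4 tabbed.txt > spaces.txt

# Convert runs of spaces back to TABs with the same tab stops.
# With -t, unexpand converts all qualifying runs, not only leading ones.
unexpand -t 4 spaces.txt > restored.txt

# Inspect the whitespace in both results.
od -c spaces.txt
od -c restored.txt
```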

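11.1.6. Editors with auto-conversion

Intelligent modern editors such as the vim program are quite smart and cope well with any encoding system and any file format. You should use these editors under the UTF-8 locale in a UTF-8 capable console for the best compatibility.

An old western European Unix text file, "u-file.txt", stored in the latin1 (iso-8859-1) encoding can be edited simply with vim by the following.

$ vim u-file.txt

This is possible since the auto detection mechanism of the file encoding in vim assumes the UTF-8 encoding first and, if it fails, assumes it to be latin1.

An old Polish Unix text file, "pu-file.txt", stored in the latin2 (iso-8859-2) encoding can be edited with vim by the following.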

$ vim '+e ++enc=latin2 pu-file.txt'

An old Japanese unix text file, "ju-file.txt", stored in the eucJP encoding can be edited with vim by the following.

$ vim '+e ++enc=eucJP ju-file.txt'

An old Japanese MS-Windows text file, "jw-file.txt", stored in the so called shift-JIS encoding (more precisely: CP932) can be edited with vim by the following.

$ vim '+e ++enc=CP932 ++ff=dos jw-file.txt'

When a file is opened with "++enc" and "++ff" options, ":w" in the Vim command line stores it in the original format and overwrites the original file. You can also specify the saving format and the file name in the Vim command line, e.g., ":w ++enc=utf8 new.txt".

Please refer to the mbyte.txt "multi-byte text support" in the vim on-line help and Table 11.2, “List of encoding values and their usage” for the values used with "++enc".

The emacs family of programs can perform the equivalent functions.

11.1.7. Plain text extraction

The following reads a web page into a text file. This is very useful when copying configurations off the Web or applying basic Unix text tools such as grep(1) on the web page.

$ w3m -dump http://www.remote-site.com/help-info.html >textfile

Similarly, you can extract plain text data from other formats using the following.

Table 11.5. List of tools to extract plain text data

package	popcon	size	keyword	function
`w3m`	V:275, I:835	2292	html→text	HTML to text converter with the "`w3m -dump`" command
`html2text`	V:28, I:85	229	html→text	advanced HTML to text converter (ISO 8859-1)
`lynx`	V:37, I:107	1901	html→text	HTML to text converter with the "`lynx -dump`" command
`elinks`	V:18, I:34	1587	html→text	HTML to text converter with the "`elinks -dump`" command
`links`	V:21, I:47	2135	html→text	HTML to text converter with the "`links -dump`" command
`links2`	V:3, I:18	5403	html→text	HTML to text converter with the "`links2 -dump`" command
`antiword`	V:7, I:15	614	MSWord→text,ps	convert MSWord files to plain text or ps
`catdoc`	V:24, I:38	666	MSWord→text,TeX	convert MSWord files to plain text or TeX
`pstotext`	V:4, I:6	127	ps/pdf→text	extract text from PostScript and PDF files
`unhtml`	V:0, I:0	66	html→text	remove the markup tags from an HTML file
`odt2txt`	V:3, I:6	53	odt→text	converter from OpenDocument Text to text

11.1.8. Highlighting and formatting plain text data

You can highlight and format plain text data by the following.

Table 11.6. List of tools to highlight plain text data

package	popcon	size	keyword	description
`vim-runtime`	V:20, I:431	27567	highlight	Vim MACRO to convert source code to HTML with "`:source $VIMRUNTIME/syntax/html.vim`"
`cxref`	V:0, I:0	1157	c→html	converter for the C program to latex and HTML (C language)
`src2tex`	V:0, I:0	612	highlight	convert many source codes to TeX (C language)
`source-highlight`	V:1, I:7	2008	highlight	convert many source codes to HTML, XHTML, LaTeX, Texinfo, ANSI color escape sequences and DocBook files with highlight (C++)
`highlight`	V:1, I:16	943	highlight	convert many source codes to HTML, XHTML, RTF, LaTeX, TeX or XSL-FO files with highlight (C++)
`grc`	V:0, I:2	60	text→color	generic colouriser for everything (Python)
`txt2html`	V:0, I:4	296	text→html	text to HTML converter (Perl)
`markdown`	V:0, I:6	56	text→html	markdown text document formatter to (X)HTML (Perl)
`asciidoc`	V:1, I:14	2442	text→any	AsciiDoc text document formatter to XML/HTML (Python)
`pandoc`	V:3, I:23	69422	text→any	general markup converter (Haskell)
`python-docutils`	V:35, I:554	1653	text→any	ReStructured Text document formatter to XML (Python)
`txt2tags`	V:0, I:1	951	text→any	document conversion from text to HTML, SGML, LaTeX, man page, MoinMoin, Magic Point and PageMaker (Python)
`udo`	V:0, I:0	548	text→any	universal document - text processing utility (C language)
`stx2any`	V:0, I:0	264	text→any	document converter from structured plain text to other formats (m4)
`rest2web`	V:0, I:0	526	text→html	document converter from ReStructured Text to html (Python)
`aft`	V:0, I:0	235	text→any	"free form" document preparation system (Perl)
`yodl`	V:0, I:0	522	text→any	pre-document language and tools to process it (C language)
`sdf`	V:0, I:0	1445	text→any	simple document parser (Perl)
`sisu`	V:0, I:0	5338	text→any	document structuring, publishing and search framework (Ruby)

11.2. XML data

The Extensible Markup Language (XML) is a markup language for documents containing structured information.

See introductory information at XML.COM.

11.2.1. Basic hints for XML

XML text looks somewhat like HTML. It enables us to manage multiple formats of output for a document. One easy XML system is the docbook-xsl package, which is used here.

Each XML file starts with standard XML declaration as the following.

<?xml version="1.0" encoding="UTF-8"?>

The basic syntax for one XML element is marked up as the following.

<name attribute="value">content</name>

XML element with empty content is marked up in the following short form.

<name attribute="value"/>

The attribute="value" part in the above examples is optional.

The comment section in XML is marked up as the following.

<!-- comment -->

Other than adding markup, XML requires minor conversion of the content, using predefined entities for the following characters.

Table 11.7. List of predefined entities for XML

predefined entity	character to be converted into
`&quot;`	`"` : quote
`&apos;`	`'` : apostrophe
`&lt;`	`<` : less-than
`&gt;`	`>` : greater-than
`&amp;`	`&` : ampersand

	Caution
	"`<`" or "`&`" can not be used in attributes or elements.

	Note
	When SGML style user defined entities, e.g. "`&some-tag;`", are used, the first definition wins over others. The entity definition is expressed in "`<!ENTITY some-tag "entity value">`".

	Note
	As long as the XML markup is done consistently with a certain set of tag names (with the data either as content or as attribute value), conversion to another XML format is a trivial task using Extensible Stylesheet Language Transformations (XSLT).
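The conversion of content to the predefined entities of Table 11.7 can be sketched as a small sed(1) filter. The function name xml_escape and the sample text are made up for illustration. This is one possible approach, not a complete XML serializer.

```shell
#!/bin/sh
# xml_escape: read text on stdin and replace the five special characters
# with their predefined entities (hypothetical helper name).
# "&" is handled first so the "&" of inserted entities is not escaped again.
xml_escape() {
  sed -e 's/&/\&amp;/g' \
      -e 's/</\&lt;/g' \
      -e 's/>/\&gt;/g' \
      -e 's/"/\&quot;/g' \
      -e "s/'/\&apos;/g"
}

# Sample input containing all five characters.
printf '%s\n' "AT&T <\"it's\">" | xml_escape
```

The order of the sed expressions matters: escaping "&" last would corrupt the entities produced by the earlier substitutions.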

11.2.2. XML processing

There are many tools available to process XML files such as the Extensible Stylesheet Language (XSL).

Basically, once you create well formed XML file, you can convert it to any format using Extensible Stylesheet Language Transformations (XSLT).
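As a minimal sketch of such a conversion, the following writes a small well-formed XML file and an XSLT 1.0 stylesheet, then converts the XML to plain text with xsltproc(1) from the `xsltproc` package. The file names, element names, and data are made up for illustration.

```shell
#!/bin/sh
# A small well-formed XML file (made-up sample data).
cat > notes.xml <<'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<notes>
  <note id="1">first</note>
  <note id="2">second</note>
</notes>
EOF

# An XSLT 1.0 stylesheet that emits one plain text line per <note> element.
cat > notes.xsl <<'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>
  <xsl:template match="/">
    <xsl:for-each select="notes/note">
      <xsl:value-of select="@id"/>
      <xsl:text>: </xsl:text>
      <xsl:value-of select="."/>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
EOF

# Run the transformation only if xsltproc is installed.
if command -v xsltproc >/dev/null 2>&1; then
  xsltproc notes.xsl notes.xml > notes.txt
  cat notes.txt
else
  echo "xsltproc not found; install the xsltproc package" >&2
fi
```

Changing only the stylesheet is enough to produce a different output format from the same XML source.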

The Extensible Stylesheet Language for Formatting Objects (XSL-FO) is supposed to be the solution for formatting. The fop package is new to the Debian main archive due to its dependence on the Java programming language. So the LaTeX code is usually generated from XML using XSLT, and the LaTeX system is used to create printable files such as DVI, PostScript, and PDF.

Table 11.8. List of XML tools

package	popcon	size	keyword	description
`docbook-xml`	I:533	2131	xml	XML document type definition (DTD) for DocBook
`xsltproc`	V:14, I:123	148	xslt	XSLT command line processor (XML→ XML, HTML, plain text, etc.)
`docbook-xsl`	V:15, I:233	14998	xml/xslt	XSL stylesheets for processing DocBook XML to various output formats with XSLT
`xmlto`	V:3, I:37	121	xml/xslt	XML-to-any converter with XSLT
`dbtoepub`	V:0, I:1	71	xml/xslt	DocBook XML to .epub converter
`dblatex`	V:5, I:25	4639	xml/xslt	convert Docbook files to DVI, PostScript, PDF documents with XSLT
`fop`	V:3, I:53	64	xml/xsl-fo	convert Docbook XML files to PDF

Since XML is a subset of Standard Generalized Markup Language (SGML), it can be processed by the extensive tools available for SGML, such as Document Style Semantics and Specification Language (DSSSL).

Table 11.9. List of DSSSL tools

package	popcon	size	keyword	description
`openjade`	V:3, I:34	921	dsssl	ISO/IEC 10179:1996 standard DSSSL processor (latest)
`openjade1.3`	V:0, I:0	2199	dsssl	ISO/IEC 10179:1996 standard DSSSL processor (1.3.x series)
`jade`	V:0, I:12	825	dsssl	James Clark's original DSSSL processor (1.2.x series)
`docbook-dsssl`	V:2, I:39	2604	xml/dsssl	DSSSL stylesheets for processing DocBook XML to various output formats with DSSSL
`docbook-utils`	V:2, I:26	281	xml/dsssl	utilities for DocBook files including conversion to other formats (HTML, RTF, PS, man, PDF) with `docbook2*` commands with DSSSL
`sgml2x`	V:0, I:0	90	SGML/dsssl	converter from SGML and XML using DSSSL stylesheets

	Tip
	GNOME's `yelp` is sometimes handy to read DocBook XML files directly since it renders decently on X.

11.2.3. The XML data extraction

You can extract HTML or XML data from other formats using the following.

Table 11.10. List of XML data extraction tools

package	popcon	size	keyword	description
`wv`	V:6, I:9	713	MSWord→any	document converter from Microsoft Word to HTML, LaTeX, etc.
`texi2html`	V:0, I:11	1832	texi→html	converter from Texinfo to HTML
`man2html`	V:0, I:3	133	manpage→html	converter from manpage to HTML (CGI support)
`tex4ht`	V:1, I:24	36	tex↔html	converter between (La)TeX and HTML
`unrtf`	V:2, I:4	137	rtf→html	document converter from RTF to HTML, etc
`info2www`	V:3, I:4	156	info→html	converter from GNU info to HTML (CGI support)
`ooo2dbk`	V:0, I:1	217	sxw→xml	converter from OpenOffice.org SXW documents to DocBook XML
`wp2x`	V:0, I:0	215	WordPerfect→any	WordPerfect 5.0 and 5.1 files to TeX, LaTeX, troff, GML and HTML
`doclifter`	V:0, I:0	457	troff→xml	converter from troff to DocBook XML

For non-XML HTML files, you can convert them to XHTML which is an instance of well formed XML. XHTML can be processed by XML tools.

Table 11.11. List of XML pretty print tools

package	popcon	size	keyword	description
`libxml2-utils`	V:25, I:322	177	xml↔html↔xhtml	command line XML tool with xmllint(1) (syntax check, reformat, lint, …)
`tidy`	V:2, I:17	83	xml↔html↔xhtml	HTML syntax checker and reformatter

Once proper XML is generated, you can use XSLT technology to extract data based on the mark-up context etc.

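11.3. Typesetting

The Unix troff program, originally developed by AT&T, can be used for simple typesetting. It is now used to create manpages.

TeX, created by Donald Knuth, is a very powerful typesetting tool and is the de facto standard. LaTeX, originally written by Leslie Lamport, makes the power of TeX easier to use.

Table 11.12. List of typesetting tools

package	popcon	size	keyword	description
`texlive`	V:5, I:69	68	(La)TeX	TeX system for typesetting, previewing and printing
`groff`	V:8, I:173	9685	troff	GNU troff text-formatting system

11.3.1. roff typesetting

Traditionally, roff is the main Unix text processing system. See roff(7), groff(7), groff(1), grotty(1), troff(1), groff_mdoc(7), groff_man(7), groff_ms(7), groff_me(7), groff_mm(7), and "info groff".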

You can read or print a good tutorial and reference on "-me" macro in "/usr/share/doc/groff/" by installing the groff package.

	Tip
	"`groff -Tascii -me -`" produces plain text output with ANSI escape code. If you wish to get manpage like output with many "^H" and "_", use "`GROFF_NO_SGR=1 groff -Tascii -me -`" instead.

	Tip
	To remove "^H" and "_" from a text file generated by `groff`, filter it by "`col -b -x`".

11.3.2. TeX/LaTeX

The TeX Live software distribution offers a complete TeX system. The texlive metapackage provides a decent selection of the TeX Live packages which should suffice for the most common tasks.

There are many references available for TeX and LaTeX.

The teTeX HOWTO: The Linux-teTeX Local Guide
tex(1)
latex(1)
texdoc(1)
texdoctk(1)
"The TeXbook", by Donald E. Knuth, (Addison-Wesley)
"LaTeX - A Document Preparation System", by Leslie Lamport, (Addison-Wesley)
"The LaTeX Companion", by Goossens, Mittelbach, Samarin, (Addison-Wesley)

This is the most powerful typesetting environment. Many SGML processors use this as their back end text processor. Lyx provided by the lyx package and GNU TeXmacs provided by the texmacs package offer nice WYSIWYG editing environment for LaTeX while many use Emacs and Vim as the choice for the source editor.

There are many online resources available.

TEX Live Guide - TEX Live 2007 ("/usr/share/doc/texlive-doc-base/english/texlive-en/live.html") (texlive-doc-base package)
A Simple Guide to Latex/Lyx
Word Processing Using LaTeX
Local User Guide to teTeX/LaTeX

When documents become bigger, sometimes TeX may cause errors. You must increase pool size in "/etc/texmf/texmf.cnf" (or more appropriately edit "/etc/texmf/texmf.d/95NonPath" and run update-texmf(8)) to fix this.

Note

The TeX source of "The TeXbook" is available at http://tug.ctan.org/tex-archive/systems/knuth/dist/tex/texbook.tex. This file contains most of the required macros. I heard that you can process this document with tex(1) after commenting lines 7 to 10 and adding "\input manmac \proofmodefalse". It's strongly recommended to buy this book (and all other books from Donald E. Knuth) instead of using the online version but the source is a great example of TeX input!

11.3.3. Pretty print a manual page

You can print a manual page nicely on a printer with either of the following commands.

$ man -Tps some_manpage | lpr

$ man -Tps some_manpage | mpage -2 | lpr

The second example prints 2 pages on one sheet.

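11.3.4. Creating a manual page

Although writing a manual page (manpage) in the plain troff format is possible, there are a few helper packages to create it.

Table 11.13. List of packages to help creating the manpage

package	popcon	size	keyword	description
`docbook-to-man`	V:1, I:17	179	SGML→manpage	converter from DocBook SGML into roff man macros
`help2man`	V:0, I:9	454	text→manpage	automatic manpage generator from --help
`info2man`	V:0, I:0	134	info→manpage	converter from GNU info to POD or man pages
`txt2man`	V:0, I:1	65	text→manpage	convert flat ASCII text to man page format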

11.4. Printable data

Printable data is expressed in the PostScript format on the Debian system. Common Unix Printing System (CUPS) uses Ghostscript as its rasterizer backend program for non-PostScript printers.

11.4.1. Ghostscript

The core of printable data manipulation is the Ghostscript PostScript interpreter, which generates raster images.

The latest upstream Ghostscript from Artifex was re-licensed from AFPL to GPL and merged all the latest ESP version changes such as CUPS related ones at 8.60 release as unified release.

Table 11.14. List of Ghostscript PostScript interpreters

package	popcon	size	description
`ghostscript`	V:160, I:691	224	The GPL Ghostscript PostScript/PDF interpreter
`ghostscript-x`	V:32, I:77	210	GPL Ghostscript PostScript/PDF interpreter - X display support
`libpoppler64`	V:19, I:53	3214	PDF rendering library forked from the xpdf PDF viewer
`libpoppler-glib8`	V:239, I:526	435	PDF rendering library (GLib-based shared library)
`poppler-data`	V:103, I:669	12123	CMaps for PDF rendering library (for CJK support: Adobe-*)

	Tip
	"`gs -h`" can display the configuration of Ghostscript.

11.4.2. Merge two PS or PDF files

You can merge two PostScript (PS) or Portable Document Format (PDF) files using gs(1) of Ghostscript.

$ gs -q -dNOPAUSE -dBATCH -sDEVICE=pswrite -sOutputFile=bla.ps -f foo1.ps foo2.ps
$ gs -q -dNOPAUSE -dBATCH -sDEVICE=pdfwrite -sOutputFile=bla.pdf -f foo1.pdf foo2.pdf

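	Note
	The PDF, which is a widely used cross-platform printable data format, is essentially the compressed PS format with a few additional features and extensions.

	Tip
	For the command line, psmerge(1) and other commands from the `psutils` package are useful for manipulating PostScript documents. pdftk(1) from the `pdftk` package is useful for manipulating PDF documents, too.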

11.4.3. Printable data utilities

The following packages for printable data utilities are available.

Table 11.15. List of printable data utilities

package	popcon	size	keyword	description
`poppler-utils`	V:52, I:492	526	pdf→ps,text,…	PDF utilities: `pdftops`, `pdfinfo`, `pdfimages`, `pdftotext`, `pdffonts`
`psutils`	V:12, I:221	219	ps→ps	PostScript document conversion tools
`poster`	V:0, I:8	49	ps→ps	create large posters out of PostScript pages
`enscript`	V:3, I:28	2111	text→ps, html, rtf	convert ASCII text to PostScript, HTML, RTF or Pretty-Print
`a2ps`	V:2, I:31	3624	text→ps	'Anything to PostScript' converter and pretty-printer
`pdftk`	V:9, I:56	2959	pdf→pdf	PDF document conversion tool: `pdftk`
`mpage`	V:0, I:5	141	text,ps→ps	print multiple pages per sheet
`html2ps`	V:0, I:6	320	html→ps	converter from HTML to PostScript
`gnuhtml2latex`	V:0, I:1	53	html→latex	converter from html to latex
`latex2rtf`	V:0, I:7	438	latex→rtf	convert documents from LaTeX to RTF which can be read by Microsoft Word
`ps2eps`	V:8, I:114	94	ps→eps	converter from PostScript to EPS (Encapsulated PostScript)
`e2ps`	V:0, I:0	112	text→ps	Text to PostScript converter with Japanese encoding support
`impose+`	V:0, I:1	180	ps→ps	PostScript utilities
`trueprint`	V:0, I:0	138	text→ps	pretty print many source codes (C, C++, Java, Pascal, Perl, Pike, Sh, and Verilog) to PostScript (C language)
`pdf2svg`	V:0, I:5	50	ps→svg	converter from PDF to Scalable Vector Graphics format
`pdftoipe`	V:0, I:0	63	ps→ipe	converter from PDF to IPE's XML format

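11.4.4. Printing with CUPS

Both the lp(1) and lpr(1) commands offered by the Common Unix Printing System (CUPS) provide options for customized printing of the printable data.

You can print 3 collated copies of a file using one of the following commands.

$ lp -n 3 -o Collate=True filename

$ lpr -#3 -o Collate=True filename

You can further customize printer operation by using printer options such as "-o number-up=2", "-o page-set=even", "-o page-set=odd", "-o scaling=200", "-o natural-scaling=200", etc. For detailed documentation, see "Command-Line Printing and Options".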

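11.5. The mail data conversion

The following packages for mail data conversion caught my eye.

Table 11.16. List of packages to help mail data conversion

package	popcon	size	keyword	description
`sharutils`	V:9, I:123	1352	mail	shar(1), unshar(1), uuencode(1), uudecode(1)
`mpack`	V:2, I:26	91	MIME	encoder and decoder of MIME messages: mpack(1) and munpack(1)
`tnef`	V:7, I:11	98	ms-tnef	unpack MIME attachments of type "application/ms-tnef", which is a Microsoft only format
`uudeview`	V:0, I:6	97	mail	encoder and decoder for the following formats: uuencode, xxencode, BASE64, quoted printable, and BinHex
`readpst`	I:1	21	PST	convert Microsoft Outlook PST files to mbox format

	Tip
	An Internet Message Access Protocol version 4 (IMAP4) server (see Section 6.7, “POP3/IMAP4 server”) may be used to move mails out from proprietary mail systems if the mail client software can be configured to use an IMAP4 server too.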

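11.5.1. Mail data basics

Mail (SMTP) data should be limited to a series of 7 bit data. So binary data and 8 bit text data are encoded into 7 bit format with the Multipurpose Internet Mail Extensions (MIME) and the selection of the charset (see Section 8.3.1, “Basics of encoding”).

The standard mail storage format is mbox, formatted according to RFC2822 (updated RFC822). See mbox(5) (provided by the mutt package).

For European languages, "Content-Transfer-Encoding: quoted-printable" with the ISO-8859-1 charset is usually used for mail since there are not many 8 bit characters. If European text is encoded in UTF-8, "Content-Transfer-Encoding: quoted-printable" is also suitable since it is mostly 7 bit data.

For Japanese, "Content-Type: text/plain; charset=ISO-2022-JP" is traditionally used for mail to keep text in 7 bits. But older Microsoft systems may send mail data in Shift-JIS without proper declaration. If Japanese text is encoded in UTF-8, Base64 is suitable since it contains many 8 bit data. The situation of other Asian languages is similar.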
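The effect of Base64 on 8 bit text can be seen with base64(1) from the `coreutils` package. The sketch below encodes a short Japanese UTF-8 string and decodes it again. The file names and the sample text are made up for illustration, and a real mail client would also add the matching MIME headers.

```shell
#!/bin/sh
# "日本語" in UTF-8 is 9 bytes, all with the 8th bit set.
printf '\346\227\245\346\234\254\350\252\236' > body.txt

# Encode to Base64 text, which uses 7 bit safe characters only.
base64 body.txt > body.b64
cat body.b64

# Decoding restores the original bytes.
base64 -d body.b64 | cmp - body.txt && echo "round trip OK"
```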

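	Note
	If your non-Unix mail data is accessible by a non-Debian client software which can talk to an IMAP4 server, you may be able to move them out by running your own IMAP4 server (see Section 6.7, “POP3/IMAP4 server”).

	Note
	If you use other mail storage formats, moving them to the mbox format is a good first step. A versatile client program such as mutt(1) may be handy for this.

You can split mailbox contents into individual messages using procmail(1) and formail(1).

Each mail message can be unpacked using munpack(1) from the mpack package (or other specialized tools) to obtain the MIME encoded contents.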

11.6. Graphic data tools

The following packages provide tools for graphic data conversion, editing, and organization.

Table 11.17. List of graphic data tools

package	popcon	size	keyword	description
`gimp`	V:97, I:509	16255	image(bitmap)	GNU Image Manipulation Program
`imagemagick`	V:154, I:544	191	image(bitmap)	image manipulation programs
`graphicsmagick`	V:7, I:14	4820	image(bitmap)	image manipulation programs (fork of `imagemagick`)
`xsane`	V:24, I:193	913	image(bitmap)	GTK+-based X11 frontend for SANE (Scanner Access Now Easy)
`netpbm`	V:32, I:547	4230	image(bitmap)	graphics conversion tools
`icoutils`	V:8, I:72	192	png↔ico(bitmap)	convert MS Windows icons and cursors to and from PNG formats (favicon.ico)
`scribus`	V:14, I:28	19136	ps/pdf/SVG/…	Scribus DTP editor
`libreoffice-draw`	V:344, I:479	8995	image(vector)	LibreOffice office suite - drawing
`inkscape`	V:145, I:360	102751	image(vector)	SVG (Scalable Vector Graphics) editor
`dia-gnome`	V:6, I:11	20	image(vector)	diagram editor (GNOME)
`dia`	V:25, I:41	3880	image(vector)	diagram editor (Gtk)
`xfig`	V:13, I:19	1783	image(vector)	Facility for Interactive Generation of figures under X11
`pstoedit`	V:15, I:358	667	ps/pdf→image(vector)	PostScript and PDF files to editable vector graphics converter (SVG)
`libwmf-bin`	V:14, I:365	104	Windows/image(vector)	Windows metafile (vector graphic data) conversion tools
`fig2sxd`	V:0, I:0	142	fig→sxd(vector)	convert XFig files to OpenOffice.org Draw format
`unpaper`	V:2, I:15	447	image→image	post-processing tool for scanned pages for OCR
`tesseract-ocr`	V:4, I:27	558	image→text	free OCR software based on the HP's commercial OCR engine
`tesseract-ocr-eng`	I:28	37486	image→text	OCR engine data: tesseract-ocr language files for English text
`gocr`	V:2, I:25	494	image→text	free OCR software
`ocrad`	V:1, I:7	310	image→text	free OCR software
`eog`	V:101, I:337	10581	image(Exif)	Eye of GNOME graphics viewer program
`gthumb`	V:15, I:27	3238	image(Exif)	image viewer and browser (GNOME)
`geeqie`	V:17, I:25	1535	image(Exif)	image viewer using GTK+
`shotwell`	V:17, I:140	5754	image(Exif)	digital photo organizer (GNOME)
`gtkam`	V:0, I:7	965	image(Exif)	application for retrieving media from digital cameras (GTK+)
`gphoto2`	V:1, I:14	969	image(Exif)	the gphoto2 digital camera command-line client
`gwenview`	V:33, I:104	4508	image(Exif)	image viewer (KDE)
`kamera`	V:4, I:103	230	image(Exif)	digital camera support for KDE applications
`digikam`	V:3, I:17	1760	image(Exif)	digital photo management application for KDE
`exiv2`	V:5, I:77	242	image(Exif)	EXIF/IPTC metadata manipulation tool
`exiftran`	V:2, I:26	67	image(Exif)	transform digital camera jpeg images
`jhead`	V:1, I:13	105	image(Exif)	manipulate the non-image part of Exif compliant JPEG (digital camera photo) files
`exif`	V:1, I:10	370	image(Exif)	command-line utility to show EXIF information in JPEG files
`exiftags`	V:0, I:3	205	image(Exif)	utility to read Exif tags from a digital camera JPEG file
`exifprobe`	V:0, I:3	482	image(Exif)	read metadata from digital pictures
`dcraw`	V:3, I:25	358	image(Raw)→ppm	decode raw digital camera images
`findimagedupes`	V:0, I:1	79	image→fingerprint	find visually similar or duplicate images
`ale`	V:0, I:0	766	image→image	merge images to increase fidelity or create mosaics
`imageindex`	V:0, I:0	144	image(Exif)→html	generate static HTML galleries from images
`outguess`	V:0, I:0	217	jpeg,png	universal Steganographic tool
`librecad`	V:12, I:18	7762	DXF	CAD data editor (KDE)
`blender`	V:4, I:29	101399	blend, TIFF, VRML, …	3D content editor for animation etc
`mm3d`	V:0, I:0	4668	ms3d, obj, dxf, …	OpenGL based 3D model editor
`open-font-design-toolkit`	I:0	28	ttf, ps, …	metapackage for open font design
`fontforge`	V:1, I:10	91	ttf, ps, …	font editor for PS, TrueType and OpenType fonts
`xgridfit`	V:0, I:0	898	ttf	program for gridfitting and hinting TrueType fonts

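	Tip
	Search for more image tools using the regex "`~Gworks-with::image`" in aptitude(8) (see Section 2.2.6, “Search method options with aptitude”).

Although GUI programs such as gimp(1) are very powerful, command line tools such as imagemagick(1) are quite useful for automating image manipulation via scripts.

The de facto image file format of digital cameras is the Exchangeable Image File Format (EXIF), which is the JPEG image file format with additional metadata tags. It can hold information such as date, time, and camera settings.

The Lempel-Ziv-Welch (LZW) lossless data compression patent has expired. Graphics Interchange Format (GIF) utilities which use the LZW compression method are now freely available on the Debian system.

	Tip
	Any digital camera or scanner with removable recording media works with Linux through USB storage readers since it follows the Design rule for Camera Filesystem and uses the FAT filesystem. See Section 10.1.7, “Removable storage device”.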

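11.7. Miscellaneous data conversion

There are many other programs for converting data. Packages such as the following can be found using the regex "~Guse::converting" in aptitude(8) (see Section 2.2.6, “Search method options with aptitude”).

Table 11.18. List of miscellaneous data conversion tools

package	popcon	size	keyword	description
`alien`	V:5, I:67	166	rpm/tgz→deb	converter for the foreign package into the Debian package
`freepwing`	V:0, I:0	568	EB→EPWING	converter from "Electric Book" (popular in Japan) to a single JIS X 4081 format (a subset of the EPWING V1)
`calibre`	V:7, I:33	49261	any→EPUB	e-book converter and library management

You can also extract data from the RPM format with the following.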

$ rpm2cpio file.src.rpm | cpio --extract