Hom's Blog


tr: the character replacement command in Bash

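The best-known use of tr is uppercasing a string with echo "$var" | tr 'a-z' 'A-Z', or lowercasing it by swapping the two sets. In general, tr replaces (translates) and deletes characters. Below is the help text of tr as shipped with MSYS: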

Usage: tr [OPTION]... SET1 [SET2]

Translate, squeeze, and/or delete characters from standard input,
writing to standard output.

  • -c, --complement first complement SET1
  • -d, --delete delete characters in SET1, do not translate
  • -s, --squeeze-repeats replace sequence of characters with one
  • -t, --truncate-set1 first truncate SET1 to length of SET2
  • --help display this help and exit
  • --version output version information and exit

SETs are specified as strings of characters. Most represent themselves.
Interpreted sequences are:

  • \NNN character with octal value NNN (1 to 3 octal digits)
  • \\ backslash
  • \a audible BEL
  • \b backspace
  • \f form feed
  • \n new line
  • \r return
  • \t horizontal tab
  • \v vertical tab
  • CHAR1-CHAR2 all characters from CHAR1 to CHAR2 in ascending order
  • [CHAR1-CHAR2] same as CHAR1-CHAR2, if both SET1 and SET2 use this
  • [CHAR*] in SET2, copies of CHAR until length of SET1
  • [CHAR*REPEAT] REPEAT copies of CHAR, REPEAT octal if starting with 0
  • [:alnum:] all letters and digits
  • [:alpha:] all letters
  • [:blank:] all horizontal whitespace
  • [:cntrl:] all control characters
  • [:digit:] all digits
  • [:graph:] all printable characters, not including space
  • [:lower:] all lower case letters
  • [:print:] all printable characters, including space
  • [:punct:] all punctuation characters
  • [:space:] all horizontal or vertical whitespace
  • [:upper:] all upper case letters
  • [:xdigit:] all hexadecimal digits
  • [=CHAR=] all characters which are equivalent to CHAR

Translation occurs if -d is not given and both SET1 and SET2 appear.
-t may be used only when translating. SET2 is extended to length of
SET1 by repeating its last character as necessary. Excess characters
of SET2 are ignored. Only [:lower:] and [:upper:] are guaranteed to
expand in ascending order; used in SET2 while translating, they may
only be used in pairs to specify case conversion. -s uses SET1 if not
translating nor deleting; else squeezing uses SET2 and occurs after
translation or deletion.
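The last sentence of the help text is easy to misread. Here is a small sketch of the three squeeze cases, assuming GNU coreutils tr and made-up sample strings:

```shell
#!/usr/bin/env bash
# 1) Squeeze only: runs of any character in SET1 are collapsed.
only=$(printf 'aaabbbccc' | tr -s 'ab')
echo "$only"    # abccc

# 2) Translate, then squeeze runs of SET2 characters:
#    'aaabbb' becomes 'xxxxxx', which is then squeezed to a single 'x'.
trans=$(printf 'aaabbb' | tr -s 'ab' 'xx')
echo "$trans"   # x

# 3) Delete SET1, then squeeze SET2:
#    removing the letters from 'a1b11c' leaves '111', squeezed to '1'.
del=$(printf 'a1b11c' | tr -ds '[:alpha:]' '1')
echo "$del"     # 1
```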

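tr reads its input from standard input (a pipe or a redirected file; it does not take file names as arguments) and replaces each character of SET1 with the corresponding character of SET2. Replacement is positional: the n-th character of SET1 maps to the n-th character of SET2, so the two sets are best kept the same length. This is why case conversion is tr's most common use. Because tr works character by character and cannot replace strings, its uses are fairly limited: apart from case conversion, its other main use is deleting characters.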
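To illustrate the positional mapping described above, here is a sketch that assumes GNU coreutils tr; the sample strings are made up:

```shell
#!/usr/bin/env bash
# Case conversion: a-z and A-Z have the same length, so each lowercase
# letter maps to the uppercase letter at the same position.
var="Hello, World 123"
upper=$(echo "$var" | tr 'a-z' 'A-Z')
lower=$(echo "$var" | tr 'A-Z' 'a-z')
echo "$upper"       # HELLO, WORLD 123
echo "$lower"       # hello, world 123

# The mapping is per character: a->x, b->y, c->z wherever they occur.
mapped=$(printf 'cab abc' | tr 'abc' 'xyz')
echo "$mapped"      # zxy xyz

# Not string replacement: 'cat' 'dog' means c->d, a->o, t->g,
# so every c, a and t in the input is changed.
not_string=$(printf 'a cat ate' | tr 'cat' 'dog')
echo "$not_string"  # o dog oge
```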

Options:

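  • tr SET1 SET2 < test.txt: replace each character of SET1 with the character at the same position in SET2.
  • -d SET1: delete the matching characters; SET2 is not needed. For example, to delete tabs: tr -d '\t'
  • -s SET1 [SET2]: squeeze each run of a repeated character into a single occurrence. With only SET1, runs of SET1 characters are squeezed; with SET2, tr translates first and then squeezes runs of SET2 characters. For example, to collapse multiple spaces into one: tr -s ' ' (tr -s ' ' ' ' also works).
  • -t SET1 SET2: truncate SET1 to the length of SET2, discarding the extra characters at the end of SET1. For example, tr -t 1234 ab turns 01234 into 0ab34.
  • -c SET1 SET2: replace every character that is not in SET1. The complement contains far more characters than SET2, so SET2 is padded with its last character, and in practice every ordinary character becomes the last character of SET2. For example, tr -c 12 ab turns 01234 into b12bb; note that a trailing newline is also outside SET1 and gets replaced. For this reason SET2 is usually a single character.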
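The examples in the list above can be checked directly. This is a sketch assuming GNU coreutils tr (-t is a GNU option); the sample inputs are made up, and printf is used so that no trailing newline reaches tr:

```shell
#!/usr/bin/env bash
# -d: delete every tab character.
no_tabs=$(printf 'a\tb\tc' | tr -d '\t')
echo "$no_tabs"       # abc

# -s: collapse each run of spaces into a single space.
one_space=$(printf 'a   b    c' | tr -s ' ')
echo "$one_space"     # a b c

# -t: SET1 is cut to the length of SET2, so only 1->a and 2->b apply.
truncated=$(printf '01234' | tr -t '1234' 'ab')
echo "$truncated"     # 0ab34

# -c: every character NOT in SET1 is replaced. With echo, the trailing
# newline would also be outside SET1 and would be replaced too.
complemented=$(printf '01234' | tr -c '12' 'ab')
echo "$complemented"  # b12bb
```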

Rules for character sets:

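  • Automatic padding: if SET1 is longer than SET2, SET2 is extended to the length of SET1 by repeating its last character; if SET1 is shorter than SET2, the excess characters of SET2 are ignored.
  • Special characters: escapes such as \t and \n are supported, and \NNN specifies a character by its octal code, e.g. \123 is S.
  • Ascending range a-z: a hyphen denotes all characters from a to z in ascending order. The form [a-z] is equivalent only when SET1 and SET2 both use brackets; quote it so the shell does not expand it as a glob.
  • Fill [c*] (SET2 only): when the two sets differ in length, [c*] repeats c until SET2 is as long as SET1. For example, tr 123456 a[b*]c is equivalent to tr 123456 abbbbc.
  • Repeat count [c*N]: N is the number of copies (read as octal if it starts with 0). For example, tr 123456 a[b*3]c is equivalent to tr 123456 abbbcc, because SET2 is then padded with its last character c.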
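  • Predefined classes:
    • [:alnum:]: all letters and digits
    • [:alpha:]: all letters
    • [:blank:]: all horizontal whitespace
    • [:cntrl:]: all control characters
    • [:digit:]: all digits
    • [:graph:]: all printable characters, excluding space
    • [:lower:]: all lowercase letters, e.g. tr '[:lower:]' '[:upper:]'
    • [:print:]: all printable characters, including space
    • [:punct:]: all punctuation characters
    • [:space:]: all horizontal and vertical whitespace
    • [:upper:]: all uppercase letters
    • [:xdigit:]: all hexadecimal digits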
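Here is a sketch of the character-set rules above, again assuming GNU coreutils tr and made-up inputs. Each result is captured in a variable and printed:

```shell
#!/usr/bin/env bash
# Padding: SET2 "ab" is extended to "abbb" to match SET1 "1234".
pad=$(printf '1234' | tr '1234' 'ab')
echo "$pad"        # abbb

# Octal escape: \123 is the character S.
octal=$(printf 'x' | tr 'x' '\123')
echo "$octal"      # S

# [c*]: repeat b until SET2 is as long as SET1.
fill=$(printf '123456' | tr '123456' 'a[b*]c')
echo "$fill"       # abbbbc

# [c*N]: exactly 3 copies of b; SET2 is then padded with its last char c.
repeat=$(printf '123456' | tr '123456' 'a[b*3]c')
echo "$repeat"     # abbbcc

# Classes: delete all digits.
nodigits=$(printf 'a1b22c333' | tr -d '[:digit:]')
echo "$nodigits"   # abc

# Classes: turn any whitespace into a space, then squeeze the spaces.
spaces=$(printf 'a \t b' | tr -s '[:space:]' ' ')
echo "$spaces"     # a b
```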

Aside: declare

Another way to convert case is the declare builtin. As its name suggests, it declares variables and sets attributes on them:

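declare -l var and declare -u var give the variable the lowercase or uppercase attribute, so its content is always stored in lowercase or uppercase. Declaring does not change the current value, though; the attribute only takes effect on the next assignment. The -l and -u options require bash 4.0 or later, so tr remains the recommended way to do this.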

To remove these attributes, use declare +l var or declare +u var.

For example:

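v="abc"
echo $v # abc
declare -u v
echo $v # abc: declaring does not change the current value
v="abc"
echo $v # ABC: the attribute takes effect on the next assignment
declare -l v="ABC"
echo $v # abc: declaring with an assignment applies at once (-l also clears -u)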
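The example above never removes an attribute, and the note about bash versions suggests keeping tr as a fallback. Here is a minimal sketch of both, assuming bash; to_upper is a hypothetical helper name, not a builtin:

```shell
#!/usr/bin/env bash
# Hypothetical helper: uppercase its argument with a -u variable on
# bash 4.0 or later, and fall back to tr on older bash.
to_upper() {
    if (( BASH_VERSINFO[0] >= 4 )); then
        local -u result="$1"   # -u uppercases the value on assignment
        printf '%s\n' "$result"
    else
        printf '%s\n' "$1" | tr '[:lower:]' '[:upper:]'
    fi
}

up=$(to_upper "Hello")
echo "$up"     # HELLO

# Removing the attribute with +u: later assignments keep their case.
declare -u v
v="abc"
echo "$v"      # ABC
declare +u v
v="abc"
echo "$v"      # abc
```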


◆ Article URL: http://platinhom.github.io/2015/08/27/trBash/ (please credit the source when reposting) ◆



Contact: Hom
Category: Coding  Tags: Bash