Shell script for counting word frequency

程序员天才 · Technology · November 16, 2022


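#!/bin/bash
# Count how often each word occurs in the file given as the only argument.
if [ $# -ne 1 ]; then
        echo "Usage: $0 filename"
        exit 1
fi
filename=$1
# egrep is the same as grep -E; -o prints every match on its own line, and awk tallies the matches.
egrep -o "\b[[:alpha:]]+\b" "$filename" | awk '{count[$0]++}END{printf("%-14s%s\n","Word","Count");for(ind in count){printf("%-14s%d\n",ind,count[ind]);}}'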
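The awk loop `for (ind in count)` visits its keys in an unspecified order, so the report is not sorted. A minimal sketch of one way to list the most frequent words first, assuming a hypothetical helper named `word_freq` and an extra space between the two columns (neither appears in the original script):

```shell
#!/bin/bash
# word_freq is a hypothetical helper name, not part of the original script.
# It prints "word count" rows, highest count first, ties in alphabetical order.
word_freq() {
    grep -oE '\b[[:alpha:]]+\b' "$1" |
        awk '{ count[$0]++ }
             END { for (w in count) printf("%-14s %d\n", w, count[w]) }' |
        LC_ALL=C sort -k2,2nr -k1,1
}

# Usage: word_freq some_file.txt
```

The extra space in the format string keeps the count separate from words that are 14 characters or longer; sorting is done by `sort` rather than inside awk to keep the awk program close to the original.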

Points to note here:

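egrep vs. grep: egrep uses extended regular expressions and is equivalent to grep -E, so operators such as +, ?, | and () work without backslashes.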
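The difference shows up with the `+` operator used in the script. The sketch below runs the same pattern in both syntaxes; the function names and the sample input are made up for illustration:

```shell
#!/bin/bash
# Two sample lines: a run of letters, and a literal "a+".
sample() { printf 'aaa\na+\n'; }

# Basic syntax (plain grep): "+" is an ordinary character.
basic_match()    { sample | grep -o  'a+'; }
# Extended syntax (grep -E, same as egrep): "+" means "one or more".
extended_match() { sample | grep -oE 'a+'; }

basic_match
extended_match
```

With plain grep only the literal text `a+` on the second line matches; with `grep -E` the pattern matches `aaa` on the first line and `a` on the second.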

The symbol \b matches the empty string at the edge of a word.

\< \>

The symbols \< and \> respectively match the empty string at the beginning and end of a word.
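The three word anchors can be compared side by side. This is a small sketch assuming a grep that supports these GNU-style anchors; the sample string and function names are hypothetical:

```shell
#!/bin/bash
sample='cat concat category'

# \b on both sides: only the standalone word matches.
whole_word()  { printf '%s\n' "$sample" | grep -o '\bcat\b'; }
# \< anchors the match to the start of a word.
word_prefix() { printf '%s\n' "$sample" | grep -o '\<cat[[:alpha:]]*'; }
# \> anchors the match to the end of a word.
word_suffix() { printf '%s\n' "$sample" | grep -o '[[:alpha:]]*cat\>'; }

whole_word
word_prefix
word_suffix
```

`whole_word` prints only `cat`, `word_prefix` prints `cat` and `category`, and `word_suffix` prints `cat` and `concat`.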

%-14s: the - flag left-aligns the value, and 14 is the minimum field width in characters.
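To see what the flag does, the following sketch prints the same word with and without `-`. The helper name `show_alignment` is hypothetical, and the square brackets are only there to make the padding visible:

```shell
#!/bin/bash
# Print a word in a 14-character column, left-aligned and then right-aligned.
show_alignment() {
    awk -v w="$1" 'BEGIN {
        printf("[%-14s]\n", w)   # with "-": padding goes on the right
        printf("[%14s]\n", w)    # without "-": padding goes on the left
    }'
}

show_alignment Word
```

Both lines are 16 characters wide including the brackets; only the side on which the spaces appear differs.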

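[:alpha:] is a POSIX character class that matches alphabetic characters, roughly equivalent to a-zA-Z; in a pattern it is written inside a bracket expression, as in [[:alpha:]]. For details see: http://www.cnblogs.com/zhuyp1015/archive/2012/07/01/2572289.html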
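A short sketch of the class on its own, without the word-boundary anchors. The helper name `extract_words` is hypothetical, and `LC_ALL=C` is an added assumption that pins the class to ASCII letters:

```shell
#!/bin/bash
# extract_words is a hypothetical helper: one alphabetic run per output line.
extract_words() {
    printf '%s\n' "$1" | LC_ALL=C grep -oE '[[:alpha:]]+'
}

extract_words 'Hello, world! 42 times'
```

Punctuation, spaces and digits are skipped, so the call prints `Hello`, `world` and `times` on separate lines.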
