I kept fooling around with a text file the other night, trying to get an
exact character count. The file had 11 lines and I came up with 436
characters, counting them manually. That included whitespace between
the letters and numbers. I ran wc(1) on it to see what it would say:
$ wc -m file
447
That was 11 more than what I had counted. How come?
Turns out it counts line feeds as characters, too. Okay, how can I get
rid of them to get the same count as I had gotten counting manually?
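I can use sed(1) to get there. Strictly speaking, this deletes the last
character of each line rather than the line feed itself, but with 11
non-empty lines the count drops by the same 11:
$ sed -e 's/.$//' file | wc -m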
436
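
To drop only the line feeds and nothing else, tr -d '\n' is a more direct route. Here is a minimal sketch comparing the two approaches. The three-line sample file is made up for illustration (it is not the 11-line file from above), and its last line is deliberately empty so the difference shows up.

```shell
# Made-up sample: two lines of text plus one empty line.
sample=$(mktemp)
printf 'one two\nthree 4\n\n' > "$sample"

total=$(wc -m < "$sample")                  # every character, line feeds included
no_lf=$(tr -d '\n' < "$sample" | wc -m)     # only the line feeds removed
sed_way=$(sed 's/.$//' "$sample" | wc -m)   # last character of each non-empty line removed

# $((...)) strips the padding some wc implementations add.
echo "total=$((total)) no_lf=$((no_lf)) sed_way=$((sed_way))"
rm -f "$sample"
```

On this sample it prints total=17 no_lf=14 sed_way=15. The sed count is off by one because the empty line has no last character to delete.
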
Actually, the sed command works the same without the -e switch. It got
thrown in the mix and then I realized I didn’t need it. We usually don’t
think of whitespace as a character, but it is. To see how many characters
there are minus whitespace, I can use tr(1). (Note that wc -c counts bytes,
which is the same as the character count for a plain ASCII file.)
$ tr -d '[:blank:]' < file | wc -c
373
I know the total number of characters including whitespace and line feeds
is 447. Subtracting the two gives the number of blanks (spaces and tabs).
Okay, bc(1) to the rescue:
$ echo "447-373" | bc
74
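
The blanks can also be counted directly instead of by subtraction. This is a small sketch, assuming a made-up two-line sample file: tr -cd deletes everything outside the named set, so only the blanks survive, and shell arithmetic stands in for bc.

```shell
# Made-up sample: one space and one tab on line 1, two spaces on line 2.
sample=$(mktemp)
printf 'a b\tc\nd  e\n' > "$sample"

total=$(wc -m < "$sample")
non_blank=$(tr -d '[:blank:]' < "$sample" | wc -m)
by_subtraction=$((total - non_blank))                 # same idea as the bc call
direct=$(tr -cd '[:blank:]' < "$sample" | wc -m)      # -c complements the set

echo "subtraction=$by_subtraction direct=$((direct))"
rm -f "$sample"
```

Both numbers come out as 4 for this sample.
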
If I use [:space:] instead of [:blank:], it takes out the line feeds too:
$ tr -d '[:space:]' < file | wc -c
362
You add the original 11 line feeds to that and you’re back at 373.
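
That relationship is easy to check on any file. Here is a rough sketch on a made-up sample, assuming the file contains no other [:space:] characters such as carriage returns or form feeds, which would also be removed.

```shell
# Made-up sample with two lines, four blanks in total.
sample=$(mktemp)
printf 'a b\tc\nd  e\n' > "$sample"

no_blank=$(tr -d '[:blank:]' < "$sample" | wc -c)   # spaces and tabs removed
no_space=$(tr -d '[:space:]' < "$sample" | wc -c)   # line feeds removed as well
lines=$(wc -l < "$sample")                          # number of line feeds

echo "no_blank=$((no_blank)) no_space=$((no_space)) lines=$((lines))"
rm -f "$sample"
```

This prints no_blank=7 no_space=5 lines=2, and 5 plus the 2 line feeds is 7 again.
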
You can also count the characters minus the line feeds and whitespace
while editing the file in vim. The n flag reports the number of matches
without changing anything:
:%s/\S/&/gn
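
The same count can be had outside the editor. One possible sketch uses awk, again on a made-up sample file: gsub returns how many replacements it made, and replacing each match with "&" (the match itself) leaves the text unchanged. The bracket expression [^ \t] is meant to mirror vim's \S, which excludes spaces and tabs.

```shell
# Made-up sample: five non-whitespace characters across two lines.
sample=$(mktemp)
printf 'a b\tc\nd  e\n' > "$sample"

# Sum the per-line match counts; "n + 0" prints 0 for an empty file.
count=$(awk '{ n += gsub(/[^ \t]/, "&") } END { print n + 0 }' "$sample")

echo "$count"
rm -f "$sample"
```

For this sample the result is 5, matching what tr -d '[:space:]' piped into wc -c reports.
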
So, the adventures in text processing continue. Any input on these
methods, and any additional ones, is always welcome and appreciated.
Cheers!
