Last updated on February 23, 2021 by Dan Nanni
One common task in day-to-day shell scripting is to read data line by line from a file, parse it, and process it. The input file can be either a regular text file (e.g., logs or config files) in which each line contains multiple fields separated by whitespace, or a CSV file in which each row holds delimiter-separated values. In bash, you can easily read columns from a file and store them in separate variables for further processing. In this tutorial, let me demonstrate with examples how you can write a shell script that reads columns into variables in bash.
Let's consider the following text file (input.txt) as an example input.
1 sophia 22 'love pie'
2 charlotte 25 'plum cake'
3 elizabeth 19 'monkey baby'
4 sophia 30 'sleeping beauty'
5 avery 29 'woofy'
6 wendy 28 'smarty pants'
To read column data from a file in bash, you can use read, a built-in command that reads a single line from the standard input or a file descriptor. When combined with a while loop, the read command can read the content of a file line by line until end-of-file is reached. If each line contains multiple fields, read can split the line into individual fields and store them in the supplied variables. By default, read treats whitespace (spaces and tabs) as the separator between fields.
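As a quick illustration of the default splitting, here is a minimal sketch that feeds read a single record through a here-document instead of a file. The record itself is made up for this example and is not part of input.txt.

```shell
#!/usr/bin/env bash
# Split one whitespace-separated record into three variables.
# The record below is hypothetical; a run of several spaces
# counts as a single separator.
read -r index name age <<'EOF'
7    olivia   31
EOF

echo "index=$index name=$name age=$age"
# prints: index=7 name=olivia age=31
```

The -r option keeps read from treating backslashes in the input as escape characters, which is usually what you want when parsing data.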
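The following while loop reads four columns from each line of the input file and stores them in four separate variables. If a line has fewer columns than the number of supplied variables, the column values are stored in the leftmost variables, and any extra variables are assigned an empty value. If a line has more columns than variables, the last variable receives the remainder of the line, which is why a two-word nickname such as 'love pie' ends up whole in nickname.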
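while read index name age nickname; do
    echo "$index : $name, $age, $nickname"
done < "input.txt"

Alternatively, you can pipe the file into the loop. Note that in this form bash runs the loop in a subshell by default, so variables assigned inside it are not visible after the loop ends.

cat input.txt | while read index name age nickname; do
    echo "$index : $name, $age, $nickname"
done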
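To see how read behaves when the number of fields and the number of variables differ, consider the following small sketch. The first record is hypothetical, and the second is the first line of input.txt.

```shell
#!/usr/bin/env bash
# Fewer fields than variables: the trailing variables are left empty.
read -r index name age nickname <<'EOF'
7 olivia
EOF
echo "age='$age' nickname='$nickname'"
# prints: age='' nickname=''

# More fields than variables: the last variable gets the rest of the line.
read -r index name rest <<'EOF'
1 sophia 22 'love pie'
EOF
echo "rest=$rest"
# prints: rest=22 'love pie'
```

Note that read does not interpret the single quotes in the data. They are ordinary characters that stay in the variable's value.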
If you are working with a CSV file that uses a non-whitespace character (e.g., "," or ";" or "|") as the column delimiter, you can read the columns with the same read command. In this case, though, you need to specify the chosen delimiter in the IFS variable. IFS (short for "Internal Field Separator") is a special bash variable that holds the character or characters used to separate fields.
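The sketch below shows one way to set IFS for a single read command only, so that the rest of the script keeps the default field splitting. The semicolon-separated record and the variable names are assumptions made up for this illustration.

```shell
#!/usr/bin/env bash
# Remember the current IFS so we can check it afterwards.
original_ifs=$IFS

# Placing the assignment in front of read limits it to that one command.
# The record is hypothetical.
IFS=';' read -r user dept id <<'EOF'
alice;engineering;42
EOF
echo "$user works in $dept (id $id)"
# prints: alice works in engineering (id 42)

if [ "$IFS" = "$original_ifs" ]; then
    echo "IFS is unchanged after read"
fi
```

Scoping IFS this way avoids having to save and restore it by hand, which is easy to forget in longer scripts.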
Let's consider the following CSV file (employee.csv).
John,Doe,500 Mountain Ave.,Riverside, NJ, 08075
Jack,McGinnis,220 Main St.,Philadelphia, PA,09119
"Anne","Hoffman",120 Jefferson St.,Chatham, NJ,08070
Stephen,King,"7452 Terrace Rd",New York,NY, 91234
Dan,Nann,,San Francisco, CA, 00298
The following while loop reads the CSV file and stores the columns in the specified variables. As you can see, IFS=, placed before the read command instructs read to use , as the field separator. Also note that you can check whether a particular column is empty by using the -z operator.
while IFS=, read first last address city state zipcode; do
    if [ -z "$address" ]; then
        echo "$first $last has no address"
    else
        echo "$first $last lives at $address"
    fi
done < "employee.csv"
This while loop will produce the following output.
John Doe lives at 500 Mountain Ave.
Jack McGinnis lives at 220 Main St.
"Anne" "Hoffman" lives at 120 Jefferson St.
Stephen King lives at "7452 Terrace Rd"
Dan Nann has no address
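As the output shows, read keeps the double quotes around quoted CSV fields. If you want them removed, one possible approach is bash parameter expansion, as in the sketch below. The strip_quotes helper is a hypothetical name introduced here, and the two rows are taken from employee.csv.

```shell
#!/usr/bin/env bash
# strip_quotes is a hypothetical helper: it removes one leading and one
# trailing double quote from its argument, if present.
strip_quotes() {
    local value=$1
    value=${value#\"}   # drop a leading quote
    value=${value%\"}   # drop a trailing quote
    printf '%s\n' "$value"
}

while IFS=, read -r first last address city state zipcode; do
    first=$(strip_quotes "$first")
    last=$(strip_quotes "$last")
    address=$(strip_quotes "$address")
    echo "$first $last lives at $address"
done <<'EOF'
"Anne","Hoffman",120 Jefferson St.,Chatham, NJ,08070
Stephen,King,"7452 Terrace Rd",New York,NY, 91234
EOF
# prints:
# Anne Hoffman lives at 120 Jefferson St.
# Stephen King lives at 7452 Terrace Rd
```

Keep in mind that this only tidies up the result. Splitting on IFS=, still breaks a quoted field that itself contains a comma, so such files call for a real CSV parser.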
A more general, and thus more complicated, scenario is when you have to read column data from less-structured files such as logs. A typical log file is not as structured as a CSV file, and may use neither a fixed delimiter character nor a fixed number of columns. Let's consider the following snippet of auth.log as an example.
Feb 21 21:42:51 ubuntu sudo: dan : TTY=pts/1 ; PWD=/home/dan/download/shc ; USER=root ; COMMAND=/usr/bin/apt-get install autotools-dev
Feb 21 21:42:51 ubuntu sudo: pam_unix(sudo:session): session opened for user root by (uid=0)
Feb 21 21:42:52 ubuntu sudo: pam_unix(sudo:session): session closed for user root
Feb 21 22:40:20 ubuntu sudo: alice : TTY=pts/1 ; PWD=/home/alice ; USER=root ; COMMAND=/usr/bin/apt remove nginx
Feb 21 22:40:20 ubuntu sudo: pam_unix(sudo:session): session opened for user root by (uid=0)
Feb 21 22:40:25 ubuntu sudo: pam_unix(sudo:session): session closed for user root
Feb 21 22:41:57 ubuntu sudo: alice : TTY=pts/1 ; PWD=/home/alice ; USER=root ; COMMAND=/bin/cp bin/dummy /usr/bin
Feb 21 22:41:57 ubuntu sudo: pam_unix(sudo:session): session opened for user root by (uid=0)
Feb 21 22:41:57 ubuntu sudo: pam_unix(sudo:session): session closed for user root
Feb 22 10:50:43 ubuntu sudo: dan : TTY=pts/0 ; PWD=/home/dan/abc ; USER=root ; COMMAND=/usr/bin/vi /etc/hosts
Feb 22 10:50:43 ubuntu sudo: pam_unix(sudo:session): session opened for user root by (uid=0)
Feb 22 10:50:49 ubuntu sudo: pam_unix(sudo:session): session closed for user root
Feb 22 10:51:56 ubuntu sudo: dan : TTY=pts/0 ; PWD=/home/dan/abc ; USER=root ; COMMAND=/usr/bin/vi /etc/resolv.conf
In auth.log, let's say we want to extract two pieces of data from each line that records a sudo invocation: the user login (e.g., dan) and the sudo command run by the user (the text after COMMAND=).
In this case, you can use the read command to read each line in its entirety, and then extract the necessary column data with bash's built-in regular expression matching (the =~ operator). The shell script below gets this job done. The regular expression is stored in pattern, which contains two capture groups: one for the user login, and the other for the command entered. The two captured strings can be retrieved from a special bash array variable called BASH_REMATCH (${BASH_REMATCH[1]} for the first group, and ${BASH_REMATCH[2]} for the second).
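# Group 1 captures the user login; group 2 captures everything after COMMAND=.
# [[:space:]] is the portable POSIX class; \s works only with some regex libraries.
pattern='ubuntu sudo:[[:space:]]+([^[:space:]]+).*COMMAND=(.*)'
while read -r line; do
    if [[ $line =~ $pattern ]]; then
        echo "${BASH_REMATCH[1]} : ${BASH_REMATCH[2]}"
    fi
done < "auth.log"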
This shell script will produce the following output.
dan : /usr/bin/apt-get install autotools-dev
alice : /usr/bin/apt remove nginx
alice : /bin/cp bin/dummy /usr/bin
dan : /usr/bin/vi /etc/hosts
dan : /usr/bin/vi /etc/resolv.conf
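If you need the extracted values in variables rather than printed directly, the matching step can be wrapped in a function, as in the sketch below. The function name parse_sudo_line and the variables sudo_user and sudo_command are hypothetical names chosen for this example. The pattern is the one described above, written with POSIX character classes, and the two log lines come from the auth.log snippet.

```shell
#!/usr/bin/env bash
# parse_sudo_line is a hypothetical helper. On a match it stores the
# captured groups in sudo_user and sudo_command and returns 0;
# otherwise it returns 1 and leaves both variables untouched.
parse_sudo_line() {
    local pattern='ubuntu sudo:[[:space:]]+([^[:space:]]+).*COMMAND=(.*)'
    [[ $1 =~ $pattern ]] || return 1
    sudo_user=${BASH_REMATCH[1]}
    sudo_command=${BASH_REMATCH[2]}
}

while read -r line; do
    if parse_sudo_line "$line"; then
        echo "$sudo_user ran: $sudo_command"
    fi
done <<'EOF'
Feb 22 10:50:43 ubuntu sudo: dan : TTY=pts/0 ; PWD=/home/dan/abc ; USER=root ; COMMAND=/usr/bin/vi /etc/hosts
Feb 22 10:50:49 ubuntu sudo: pam_unix(sudo:session): session closed for user root
EOF
# prints: dan ran: /usr/bin/vi /etc/hosts
```

Copying the values out of BASH_REMATCH right away is a deliberate choice: the array is overwritten by the next =~ match, so anything you want to keep should be saved first.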
Please note that this article is published by Xmodulo.com under a Creative Commons Attribution-ShareAlike 3.0 Unported License. If you would like to use the whole or any part of this article, you need to cite this web page at Xmodulo.com as the original source.