How to Use the Csplit Command in Linux

Wondering How to Use the csplit Command in Linux?

The csplit command in Linux splits a file into smaller files at points determined by the file's contents, such as a line number or a line that matches a pattern. The original file is left unaltered. csplit is normally used on text files.

Sometimes, the original file is too large or too long to work with comfortably, as can happen with a long log or data file. Smaller pieces are easier to open, search, and pass to other tools, and they can be processed in parallel.

The general syntax is csplit [OPTION]... FILE PATTERN..., where each PATTERN is either a line number or a regular expression. In this article, we’ll learn how to split files into smaller, manageable files using the csplit command.

Let’s get started!

Using the csplit Command.

To start using the csplit command, open a terminal (on many desktop distributions, Ctrl + Alt + T does this). This is where you will type the commands.

You will also need a text file to split into smaller files.

Here are some examples of how the csplit command is used.

Csplit Based on a Specified Number of Lines.

Csplit enables us to split a text file at a given line number. The line with that number becomes the first line of the next output file.

Here is an example.

We have a sample file test.txt that contains a list with 7 lines.

1 desk
2 tables
3 beds
4 cabinets
5 chairs
6 dressers
7 cupboards

We would like to split the file at the fourth line. This can be done by passing 4 as an argument after the file name. Lines 1-3 then go into the first output file, and line 4 starts the second.

Here is the command we will run in the terminal.

$ csplit test.txt 4

When the command runs, csplit prints one number per output file. Each number is the size in bytes of a file that the command has generated.

The files produced are named xx00 and xx01 by default. You can view them by running cat xx00 and cat xx01 in the terminal. The cat command reads a file and prints its contents.
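To try this split end to end, the following sketch recreates the sample list in a scratch directory and runs the same command. The mktemp scratch directory is an assumption made here to keep the xx files away from real data, and the byte counts in the comments assume every line ends with a newline.

```shell
# Work in a throwaway directory so the xx* files do not clutter anything.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Recreate the 7-line sample file.
printf '%s\n' '1 desk' '2 tables' '3 beds' '4 cabinets' \
    '5 chairs' '6 dressers' '7 cupboards' > test.txt

# Split at line 4: lines 1-3 go to xx00, lines 4-7 go to xx01.
csplit test.txt 4
# prints:
# 23
# 43

cat xx00   # lines 1-3 (desk, tables, beds)
cat xx01   # lines 4-7 (cabinets through cupboards)
```

The original test.txt is still there afterwards, unchanged.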

We can also pass several line numbers to split a file at each of them.

$ csplit test.txt 9 62 88

Here, we are assuming the initial file has 100 lines. The csplit command will generate four files given the command above. The xx00 file will contain lines 1-8, the xx01 file will contain lines 9-61, the xx02 file will contain lines 62-87, and the xx03 file will contain lines 88-100.
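The line ranges above can be checked with a generated file. This is a small sketch that assumes a hypothetical 100-line input produced with seq (one number per line), and it uses the -s option only to hide the byte counts.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical 100-line input: the numbers 1 to 100, one per line.
seq 1 100 > test.txt

# Split before lines 9, 62, and 88; -s suppresses the byte counts.
csplit -s test.txt 9 62 88

# Count the lines in each piece:
# xx00 has 8 lines, xx01 has 53, xx02 has 26, xx03 has 13.
wc -l xx00 xx01 xx02 xx03

# The first line of each later piece is the line number we split at.
head -n 1 xx01 xx02 xx03
```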

Csplit Based on Regex Match.

Csplit allows us to split a file by matching a string with a regular expression. It performs the split where there is a match.

We will split the file test.txt at each line that consists of the opening tag <version>. This puts every block from the opening tag <version> to the closing tag </version> in a separate file.

Here is a sample of the content of the file.

<version>
    <name>Product 1</name>
    <type>2021</type>
</version>
<version>
    <name>Product 2</name>
    <type>2020</type>
</version>
<version>
    <name>Product 3</name>
    <type>2020</type>
</version>

Here is the csplit command we will use to split test.txt.

$ csplit test.txt '/^<version>$/' '{*}'

The arguments are interpreted as follows.

test.txt   –  this is the input file
/^<version>$/   –  a regular expression that matches a line consisting of exactly <version>; csplit starts a new file at each matching line
{*}   –  this means the previous pattern will be repeated as many times as possible, until the input is exhausted.

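As before, csplit prints the byte count of each file it produces, and each <version> block ends up in its own file. Note that the first line of test.txt already matches the pattern, so GNU csplit also writes an empty xx00 ahead of the first block; the three blocks then land in xx01, xx02, and xx03 unless empty files are suppressed with the -z option.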
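The regex split can be reproduced with the sketch below, which rebuilds the sample content in a scratch directory. It adds -z as an assumption about what is wanted here, so that no empty piece is kept in front of the first <version> line; drop -z to see the plain behavior.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Recreate the sample content with three <version> blocks.
cat > test.txt <<'EOF'
<version>
    <name>Product 1</name>
    <type>2021</type>
</version>
<version>
    <name>Product 2</name>
    <type>2020</type>
</version>
<version>
    <name>Product 3</name>
    <type>2020</type>
</version>
EOF

# Start a new piece at every line that is exactly <version>.
# -z drops the empty piece that would precede the first match.
csplit -z test.txt '/^<version>$/' '{*}'

# Show each piece under a small header.
for piece in xx*; do
    echo "== $piece =="
    cat "$piece"
done
```

Quoting both the pattern and {*} keeps the shell from interpreting the angle brackets, the dollar sign, or the braces.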



Csplit Based on Pattern Match.

A text file can have a pattern that appears several times in the content. Csplit can split the file at every occurrence of that pattern, and an optional offset shifts each split point by a number of lines.

Now, we want to split the file test.txt wherever the pattern 0000 appears:

678910
qwer
asdf
0000
klklkl
12345
0000

We will use this command:

$ csplit test.txt '/0000/+1' '{*}'

Here is the interpretation of the arguments above:

/0000/+1 – this means the pattern 0000 will be matched, and the +1 offset moves the split point one line past the match, so that the 0000 line stays at the end of the current file instead of starting the next one.
{*} – this means the pattern will be repeated until the input is exhausted.

In the output, the pattern 0000 is the last line of each file.
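The effect of the +1 offset is easy to confirm on the sample data. The sketch below is a minimal check in a scratch directory; the -z option is an addition assumed here, so that no empty piece is left over after the final 0000 line.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Recreate the sample file, one entry per line.
printf '%s\n' 678910 qwer asdf 0000 klklkl 12345 0000 > test.txt

# +1 moves each split point to the line after the match,
# so every 0000 line closes a piece instead of opening one.
csplit -z test.txt '/0000/+1' '{*}'

tail -n 1 xx00   # 0000
tail -n 1 xx01   # 0000
```

Without the offset (just '/0000/'), each 0000 line would instead become the first line of the following piece.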



Add Custom Prefix with Csplit Command.

Instead of using the default xx prefix that csplit gives the output file names, we can use the -f or --prefix option to set our own prefix.

In the example below, the file name is test.txt, and the new prefix will be sub, so the output files are named sub00, sub01, and so on.

$ csplit -f sub test.txt 4
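As a quick check of the naming, this sketch splits the 7-line furniture list from earlier at line 4 with the sub prefix. The split point and the scratch directory are choices made for illustration.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

printf '%s\n' '1 desk' '2 tables' '3 beds' '4 cabinets' \
    '5 chairs' '6 dressers' '7 cupboards' > test.txt

# Use "sub" instead of the default "xx" prefix.
csplit -f sub test.txt 4

# Lists sub00, sub01, and the untouched test.txt.
ls
```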



Using Csplit to Keep Output Files When an Error Occurs.

By default, the csplit command removes the output files it has already created as soon as an error is encountered. If we don’t want those files to be deleted, we use the -k or --keep-files option.

Here is how it is used:

$ csplit -k test.txt 2
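The difference only shows up when something goes wrong. The sketch below provokes an error on purpose by asking for a split at line 20 of the 7-line furniture list; that out-of-range line number is an assumption chosen for the demonstration.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

printf '%s\n' '1 desk' '2 tables' '3 beds' '4 cabinets' \
    '5 chairs' '6 dressers' '7 cupboards' > test.txt

# Line 20 does not exist, so csplit reports an error.
# Without -k, the pieces written so far are removed again.
csplit test.txt 4 20 || echo "csplit failed with status $?"
ls xx* 2>/dev/null || echo "no pieces left"

# With -k, the pieces written before the error are kept.
csplit -k test.txt 4 20 || echo "csplit failed with status $?"
ls xx*
```

This is handy with long inputs, where the pieces produced before the failure may still be useful.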



Define the Number of Digits on the File Name Using Csplit Command.

The default number of digits in the output file names is 2. If you want a different number of digits, for example xx0 and xx1 instead of xx00 and xx01, use the -n or --digits option.

Here is the command to run in the terminal:

$ csplit -n 1 test.txt 2
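A short sketch of the effect on file names, using the furniture list from earlier in a scratch directory. The second command, with three digits and a hypothetical part prefix, is included only to show that -n combines with -f.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

printf '%s\n' '1 desk' '2 tables' '3 beds' '4 cabinets' \
    '5 chairs' '6 dressers' '7 cupboards' > test.txt

# One digit: the pieces are named xx0 and xx1.
csplit -n 1 test.txt 4

# Three digits with a custom prefix: part000 and part001.
csplit -n 3 -f part test.txt 4

ls
```

Keep in mind that a single digit leaves room for only ten output files, xx0 through xx9.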



Remove Empty Output Files Using Csplit Command.

There are instances when some output files are empty after running the csplit command, for example when the very first line of the input matches the pattern. To suppress these empty files, we use the -z or --elide-empty-files option.

Here is the syntax:

$ csplit -z test.txt 4
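To see -z make a difference, the input needs a split point that leaves a piece empty. The sketch below assumes the furniture list from earlier and a pattern, /desk/, that matches its first line, so the piece in front of the match has nothing in it.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

printf '%s\n' '1 desk' '2 tables' '3 beds' '4 cabinets' \
    '5 chairs' '6 dressers' '7 cupboards' > test.txt

# The first line matches, so the piece before the match is empty.
csplit test.txt '/desk/'
wc -c xx*

# Start over, this time suppressing empty pieces.
rm -f xx*
csplit -z test.txt '/desk/'
wc -c xx*
```

With -z, only non-empty pieces remain and they are still numbered consecutively from xx00.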

We hope you have learned the different ways the csplit command can be used.

There are many more commands used in Linux. If you would like to learn how to schedule repetitive tasks, have a look at How to Use the cron Command in Linux.

We’ve come to the end of our article on How to Use the csplit command in Linux. If you have any questions or suggestions, let us know in the comment section below.
