Java: how to split a string into fixed length rows, without “breaking” the words.

The problem we face in this post is to take a long string and split it into several lines of fixed length for display. We must therefore divide the input string into multiple strings of a predefined maximum length, keeping words whole instead of breaking them when the end of a line is reached. In addition, we have to take care of punctuation and avoid starting a new line if the (n+1)th character, where n is the maximum line length, is a period, comma, question mark, etc.
To do this we use a regular expression and the classes Java provides for working with them: Pattern and Matcher. To prevent words from being truncated when they reach the end of the line, we delimit our search pattern with word boundaries (\b).
Furthermore, we limit the length of the text matched between the boundaries to n-1 characters, leaving one slot for a possible punctuation mark.
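To see what \b actually matches, here is a small sketch (the class BoundaryDemo and its methods boundaries and pieces are hypothetical names introduced only for illustration). It prints the positions of the word boundaries in "too long.", then applies the same pattern used in this post with lineSize 6 to "Hello, world.", showing how the n-1 limit plus the optional \W leaves room for a trailing punctuation mark.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class BoundaryDemo {

    // Returns the index of every word boundary found in the text.
    static List<Integer> boundaries(String text) {
        List<Integer> positions = new ArrayList<>();
        Matcher m = Pattern.compile("\\b").matcher(text);
        while (m.find()) {
            positions.add(m.start()); // a boundary is zero-width, so start() == end()
        }
        return positions;
    }

    // Same pattern as in the post: 1..lineSize-1 characters between two word
    // boundaries, plus one optional non-word character (punctuation or space).
    static List<String> pieces(String text, int lineSize) {
        List<String> out = new ArrayList<>();
        Pattern p = Pattern.compile("\\b.{1," + (lineSize - 1) + "}\\b\\W?");
        Matcher m = p.matcher(text);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(boundaries("too long.")); // [0, 3, 4, 8]
        for (String piece : pieces("Hello, world.", 6)) {
            // Each piece is 5 letters plus the punctuation mark: exactly lineSize.
            System.out.println("[" + piece + "] length " + piece.length());
        }
    }
}
```

Note that \b is zero-width: it matches a position between a word character (letter, digit or underscore) and a non-word character, never a character itself. That is why the punctuation has to be picked up separately by \W?.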

We create a static method that takes as input the string to be split and the maximum line length, and returns a list of strings representing the output lines. The method signature is therefore:

public static List<String> splitString(String msg, int lineSize)

Below is the program code, with a test on a sample string:

import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CutString {

    public static List<String> splitString(String msg, int lineSize) {
        List<String> res = new ArrayList<>();

        // At most lineSize-1 characters between two word boundaries, plus one
        // optional non-word character (e.g. punctuation): a match never exceeds lineSize.
        Pattern p = Pattern.compile("\\b.{1," + (lineSize - 1) + "}\\b\\W?");
        Matcher m = p.matcher(msg);

        while (m.find()) {
            String line = m.group().trim();   // remove the trailing space, if any
            System.out.println(line);         // Debug
            res.add(line);
        }
        return res;
    }

    public static void main(String[] args) {
        splitString("This is a message that needs to be split over multiple lines because it is too long. The result must be a list of strings with a maximum length provided as input. Will this procedure work? I hope so!", 40);
    }
}

In this case the output is the following:

This is a message that needs to be
split over multiple lines because it is
too long. The result must be a list of
strings with a maximum length provided
as input. Will this procedure work? I
hope so!
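One way to double-check that no word has been broken is to compare the words of the input with the words of the output. The sketch below (CutStringCheck and isValidSplit are hypothetical names, not part of the original program) repeats the same regex, stores trimmed lines, and verifies two properties: no line is longer than lineSize, and joining the lines with single spaces gives back the same sequence of words as the input.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CutStringCheck {

    // Same regex as the post; lines are stored trimmed.
    static List<String> splitString(String msg, int lineSize) {
        List<String> res = new ArrayList<>();
        Pattern p = Pattern.compile("\\b.{1," + (lineSize - 1) + "}\\b\\W?");
        Matcher m = p.matcher(msg);
        while (m.find()) {
            res.add(m.group().trim());
        }
        return res;
    }

    // True if no line exceeds lineSize and the lines, joined with single spaces,
    // contain the same sequence of words as the input (nothing broken or lost).
    static boolean isValidSplit(String msg, List<String> lines, int lineSize) {
        for (String line : lines) {
            if (line.length() > lineSize) {
                return false;
            }
        }
        List<String> original = Arrays.asList(msg.trim().split("\\s+"));
        List<String> rebuilt = Arrays.asList(String.join(" ", lines).trim().split("\\s+"));
        return original.equals(rebuilt);
    }

    public static void main(String[] args) {
        String msg = "This is a message that needs to be split over multiple lines because it is too long. "
                + "The result must be a list of strings with a maximum length provided as input. "
                + "Will this procedure work? I hope so!";
        List<String> lines = splitString(msg, 40);
        System.out.println(isValidSplit(msg, lines, 40)); // true
    }
}
```

The check compares whitespace-separated words only, so it ignores the amount of whitespace between words, which the split discards anyway.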

As we can see, each new line starts with a whole word: no word is broken when the maximum line size is reached in the middle of it. Let's run a few more tests on edge cases: a punctuation character that falls in the last available position of the line, and one that would be the first character past the line limit:

import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CutString {

    public static List<String> splitString(String msg, int lineSize) {
        List<String> res = new ArrayList<>();

        // Same pattern as before: up to lineSize-1 characters between word
        // boundaries, plus one optional trailing non-word character.
        Pattern p = Pattern.compile("\\b.{1," + (lineSize - 1) + "}\\b\\W?");
        Matcher m = p.matcher(msg);

        while (m.find()) {
            String line = m.group().trim();   // remove the trailing space, if any
            System.out.println(line);         // Debug
            res.add(line);
        }
        return res;
    }

    public static void main(String[] args) {
        splitString("In this case a special character appears in the last position of the row length!", 80);
        System.out.println("----");
        splitString("In this case a special character appears as first position of the brand new line!", 80);
    }
}

Here is the result of this new execution:

In this case a special character appears in the last position of the row length!
----
In this case a special character appears as first position of the brand new
line!

As we can see, in the first case the exclamation mark occupies the last available position in the line, so the whole sentence fits on a single line. In the second case the ‘!’ would have been the first character of the next line; instead, the first line correctly ends with the previous word, and both the last word and the ‘!’ move to the second line.
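One edge case not covered by the tests above is a word longer than the line itself. Because a match must start and end on a word boundary with at most lineSize - 1 characters in between, a word of lineSize characters or more can never be matched, and it simply disappears from the output. The sketch below (LongWordDemo is a hypothetical class name) reuses the same pattern to show this with lineSize 8.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LongWordDemo {

    // Same pattern as the post, returning trimmed lines.
    static List<String> splitString(String msg, int lineSize) {
        List<String> res = new ArrayList<>();
        Pattern p = Pattern.compile("\\b.{1," + (lineSize - 1) + "}\\b\\W?");
        Matcher m = p.matcher(msg);
        while (m.find()) {
            res.add(m.group().trim());
        }
        return res;
    }

    public static void main(String[] args) {
        // "supercalifragilistic" has 20 letters, more than lineSize - 1 = 7,
        // so no match can contain it and find() moves past it.
        System.out.println(splitString("a supercalifragilistic word", 8)); // [a, word]
    }
}
```

If such input is possible, one option is to check that the words of the output match those of the input, or to break over-long words before applying the regex; both are suggestions, not part of the original method.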

