Function-Oriented Java

File functions

Reading a text file
Writing a text file
Write to a binary file
Append to a text file
Tail
WordCount
Read from CSV to an array
Copying a file

Reading a text file

Most high-level programming languages offer some way to read a file with one line of code. With Java, it can take much more. The usual technique for text files is to use the FileReader class, optionally wrapped in a buffering reader. You call the BufferedReader's read() method, passing in a char array, and make multiple calls to read() until you have captured the whole I/O stream.

It isn't the most terse syntax for simply opening a file, but it does offer a lot of flexibility. For one thing, you have the option of buffering the Reader's input stream -- a definite benefit over a network. As with all Java I/O operations, you could read the file into one large buffer or operate on chunks of the file. Reading in chunks, like the char array initialized to 1024 characters in the example code, should be a little more efficient.

For the purpose of this function, however, it is irrelevant whether you work with chunks or read the whole thing, since the aim is to add all chunks into one large StringBuffer, from which the function returns the String it promises. Since the StringBuffer will eventually hold the entire contents of the file, opening a file containing "War and Peace" might not be an optimal way of using this function. However, for smaller files, or for files you will have to deal with en masse (let's say you have to pass the contents to a Swing JTextArea), this function will suffice.
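The chunked loop described above can be sketched in isolation. This is a minimal sketch, not the library's function: the class name ChunkedReadSketch and the method readAll are made up for illustration, and a StringReader stands in for the file so the example runs anywhere. The point it shows is that read() reports how many chars it actually filled, and only that many should be appended.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

class ChunkedReadSketch {

    // Hypothetical helper: drains the reader in 1024-char chunks.
    static String readAll(Reader reader) throws IOException {
        StringBuilder sb = new StringBuilder(1024);
        char[] chars = new char[1024];
        int numRead;
        while ((numRead = reader.read(chars)) > -1) {
            // Append only the chars this call filled; the rest of the
            // array may still hold data from an earlier, fuller chunk.
            sb.append(chars, 0, numRead);
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // 1500 chars forces one full chunk followed by a partial one.
        StringBuilder input = new StringBuilder();
        for (int i = 0; i < 1500; i++) {
            input.append('x');
        }
        String result = readAll(new StringReader(input.toString()));
        System.out.println(result.length()); // prints 1500
    }
}
```

If the whole array were appended on every pass, the final partial chunk would drag leftover chars from the previous chunk into the result.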

Calling the Function

String text = FileFunctions.readTextFile("C:\\temp\\xmas.html");

Sidebar

As in all our functions, you will have to deal with the possibility of an IOException. The string you pass in may not be the correct path, the file may be opened exclusively by some other process, or perhaps your network cable fell out while you were reading it.

You need to deal with the possibility of an Exception with something like this.

try{
	String text = FileFunctions.readTextFile("C:\\temp\\xmas.html");
} catch (IOException ioe){
	System.err.println(ioe);
}

Or, if code brevity is an aim in your work, you can add a "throws IOException" to your method signature.

public static void printFile() throws IOException{
	String text = FileFunctions.readTextFile("C:\\temp\\xmas.html");
}
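If you want to tell a bad path apart from other I/O failures, one option is to catch FileNotFoundException separately. The following is a minimal sketch under that assumption; the class ReadErrorSketch and the helper describeRead are hypothetical names, not part of FileFunctions.

```java
import java.io.BufferedReader;
import java.io.FileNotFoundException;
import java.io.FileReader;
import java.io.IOException;

class ReadErrorSketch {

    // Hypothetical helper: reports what happened instead of throwing.
    static String describeRead(String path) {
        BufferedReader reader = null;
        try {
            reader = new BufferedReader(new FileReader(path));
            reader.read();
            return "ok";
        } catch (FileNotFoundException fnfe) {
            // Must be caught before IOException, because it is a subclass of it.
            return "missing";
        } catch (IOException ioe) {
            return "read failed";
        } finally {
            // Close the reader whether or not the read succeeded.
            if (reader != null) {
                try {
                    reader.close();
                } catch (IOException ignored) {
                    // nothing useful to do if close fails
                }
            }
        }
    }

    public static void main(String[] args) {
        System.out.println(describeRead("no-such-directory/no-such-file.txt")); // prints missing
    }
}
```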

Why use a BufferedReader?

You might wonder why the accompanying code wraps a BufferedReader around a FileReader. The function would probably work fine either way. However, the BufferedReader allows the function to read from a buffer instead of directly reading from some input stream that the FileReader represents. In some network conditions, this can greatly speed up I/O. From your local hard disk, it probably doesn't make much difference. However, because we don't know if you are loading from a local drive or from a drive mapped to a server halfway around the world, we buffer the FileReader.

Code

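// needs java.io.BufferedReader, java.io.FileReader, java.io.IOException
public static String readTextFile(String fullPathFilename) throws IOException {
	StringBuffer sb = new StringBuffer(1024);
	BufferedReader reader = new BufferedReader(new FileReader(fullPathFilename));

	try {
		char[] chars = new char[1024];
		int numRead = 0;
		while( (numRead = reader.read(chars)) > -1){
			// append only the chars this read() filled; the rest of the
			// array may hold stale data from an earlier chunk
			sb.append(chars, 0, numRead);
		}
	} finally {
		reader.close(); // release the file even if a read fails
	}

	return sb.toString();
}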

Writing a text file

If you already have a String in memory, committing it to disk should be a relatively simple operation. We use one of the character stream classes from the java.io package, this time FileWriter, again wrapped in a buffering class. We attempt to write the String in one call, without breaking it into chunks, on the assumption that the files are not huge and that it will not stretch any underlying OS buffer. If that is not the case, the function will need rewriting to use smaller char arrays.

Calling the function

String text = " It happened in the Seventies, in winter, on the day after St. Nicholas Day.";
FileFunctions.writeTextFile(text, "c:\\temp\\sample.txt");

Function code

public static void writeTextFile(String contents, String fullPathFilename) throws IOException{
	BufferedWriter writer = new BufferedWriter(new FileWriter(fullPathFilename));
	writer.write(contents);
	writer.flush();
	writer.close();	
}

Why use the flush method?

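A BufferedWriter may hold some or all of what you pass to write() in its internal buffer rather than sending it straight to the file, so it is common practice to follow a write with a flush(). You normally call flush() before closing the Writer. Since close() also flushes the buffer, the explicit call here is a safeguard rather than a necessity.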
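To see the buffering at work, here is a minimal sketch in which a StringWriter stands in for the file so the buffer's effect can be inspected. The class FlushSketch and its method are illustrative names only, and the sketch assumes the text is shorter than BufferedWriter's default buffer.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.StringWriter;

class FlushSketch {

    // Returns what the underlying writer holds before and after flush().
    // Assumes the text is shorter than BufferedWriter's default buffer.
    static String[] beforeAndAfterFlush(String text) throws IOException {
        StringWriter target = new StringWriter(); // stands in for the file
        BufferedWriter writer = new BufferedWriter(target);
        writer.write(text);
        String before = target.toString(); // text is still in the buffer
        writer.flush();
        String after = target.toString();  // text has been pushed through
        writer.close();
        return new String[] { before, after };
    }

    public static void main(String[] args) throws IOException {
        String[] result = beforeAndAfterFlush("hello");
        System.out.println("before flush: [" + result[0] + "]"); // before flush: []
        System.out.println("after flush:  [" + result[1] + "]"); // after flush:  [hello]
    }
}
```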


Write to a binary file

When a non-character file needs to be read or written, you can't call on the Reader or Writer classes. Instead, you drop down to the byte stream classes and take advantage of FileOutputStream, again buffered to improve performance, but essentially the same technique as with FileWriter. Here the write() methods accept byte arrays instead of char arrays.

Calling the function

FileFunctions.writeBinaryFile("Hello world".getBytes(),"c:\\temp\\bytes.txt");

You should note that writing binary files is an acceptable way of writing out text, particularly when you can't be sure the data you are writing contains only characters.
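One thing to keep in mind when writing text as bytes: getBytes() with no argument uses the platform's default character set, so the bytes on disk can differ from machine to machine. A minimal sketch of the difference, assuming you would rather name the encoding explicitly (the class EncodingSketch and its methods are illustrative names):

```java
import java.nio.charset.StandardCharsets;

class EncodingSketch {

    // Number of bytes the string occupies when encoded as UTF-8.
    static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    // Number of bytes the string occupies when encoded as ISO-8859-1 (Latin-1).
    static int latin1Length(String s) {
        return s.getBytes(StandardCharsets.ISO_8859_1).length;
    }

    public static void main(String[] args) {
        String text = "caf\u00e9"; // "cafe" with an acute accent on the e
        System.out.println(utf8Length(text));   // prints 5
        System.out.println(latin1Length(text)); // prints 4
    }
}
```

The same four-character String produces a different byte array depending on the encoding, which is why the encoding used for writing has to match the one used for reading.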

Function Code

 public static void writeBinaryFile(byte[] contents, String fullPathFilename) throws IOException{
 	BufferedOutputStream bos = new BufferedOutputStream(new FileOutputStream(fullPathFilename));	
 	bos.write(contents);
 	bos.flush();
 	bos.close();
 }

Append to a text file

This is particularly useful if you are trying to write to a log file. You don't want to overwrite the existing contents, but want to add some new contents. It's a pretty straightforward call in Java, almost exactly like writing to an output stream that overwrites the file.

The sole difference is that you call the FileOutputStream constructor with two parameters instead of one.

	new FileOutputStream("c:\\temp\\test.txt",true);

The problem most programmers have with this constructor is remembering whether true overwrites or appends. It appends.
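A quick way to settle the question is to try both values against a scratch file. This is a minimal sketch; the class AppendFlagSketch and its helpers are hypothetical, and it uses a temp file rather than the paths shown in this article.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;

class AppendFlagSketch {

    // Writes text to the file; append=true adds to the end, false starts over.
    static void write(File file, String text, boolean append) throws IOException {
        FileOutputStream out = new FileOutputStream(file, append);
        try {
            out.write(text.getBytes(StandardCharsets.UTF_8));
        } finally {
            out.close();
        }
    }

    // Reads the whole file back as a String.
    static String contents(File file) throws IOException {
        return new String(Files.readAllBytes(file.toPath()), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        File file = File.createTempFile("append-sketch", ".txt");
        file.deleteOnExit();
        write(file, "item 1;", true);
        write(file, "item 2;", true);
        System.out.println(contents(file)); // prints item 1;item 2;
        write(file, "item 3;", false);
        System.out.println(contents(file)); // prints item 3;
    }
}
```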

Calling the Function

FileFunctions.appendToTextFile("item 1","c:\\temp\\testAppendToFile.txt");
FileFunctions.appendToTextFile("item 2","c:\\temp\\testAppendToFile.txt");
FileFunctions.appendToTextFile("item 3","c:\\temp\\testAppendToFile.txt");

A typical issue with this type of function is knowing when to close the file. In many logging applications, it is common practice to open the file, write intermittently to the stream, then close the file when the application closes. This can lead to sharing problems, but it is very efficient. By contrast, this function opens and closes the file with each call. That is less efficient, but it guarantees that the file is not left open.

Function Code

 public static void appendToTextFile(String contents, String fullPathFilename) throws IOException{
	// the second constructor argument, true, opens the file in append mode
	BufferedOutputStream bos = new BufferedOutputStream(new FileOutputStream(fullPathFilename, true));
 	bos.write(contents.getBytes());
 	bos.flush();
 	bos.close();
 }

Tail

This version of the classic Unix function returns the tail end of a text file as a String. You provide the name of the file and the number of characters you want to see, as well as a String naming the character encoding. The encoding is necessary because this implementation actually reads bytes, not characters; the bytes are then converted to a String using whichever supported character set you name. With a single-byte encoding such as ASCII or Latin-1, one byte is one character, so the count is exact. If you pass null for this parameter, it defaults to "latin1".

Calling the function

String result = FileFunctions.tail("c:\\temp\\poincare.txt",1000, "ascii");

Tail is implemented here using a RandomAccessFile, which provides both a skipBytes() and a seek() method for moving directly to a location in the file. If you use an InputStream, you will have to read the file sequentially, which means that to get to the nth byte in the file, you may have to call multiple reads, counting each of them until the nth byte is reached.

Another point about RandomAccessFile is that it contains a readFully() method that fills the byte array you pass it in one call, which can gobble up the entire file if you care to attempt it. Of course, this is a daring call if you are accessing a network drive at some distance, but for local drives and speedy networks it works well.
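Because the seek position is counted in bytes, the encoding matters as soon as a character takes more than one byte. The following minimal sketch illustrates this with UTF-8; the class TailBytesSketch and the method tailUtf8 are made-up names, and a temp file replaces the paths used in this article.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;

class TailBytesSketch {

    // Reads the last n bytes of the file (fewer if the file is shorter)
    // and decodes them as UTF-8.
    static String tailUtf8(File file, int n) throws IOException {
        RandomAccessFile raf = new RandomAccessFile(file, "r");
        try {
            int toRead = (int) Math.min((long) n, raf.length());
            byte[] bytes = new byte[toRead];
            raf.seek(raf.length() - toRead); // jump straight to the start position
            raf.readFully(bytes);            // fill the whole array in one call
            return new String(bytes, StandardCharsets.UTF_8);
        } finally {
            raf.close();
        }
    }

    public static void main(String[] args) throws IOException {
        File file = File.createTempFile("tail-sketch", ".txt");
        file.deleteOnExit();
        // "hello" with an accented e: 5 characters, but 6 bytes in UTF-8.
        Files.write(file.toPath(), "h\u00e9llo".getBytes(StandardCharsets.UTF_8));

        System.out.println(tailUtf8(file, 3)); // prints llo

        // Four bytes back lands in the middle of the two-byte accented e,
        // so the first decoded char is the replacement character U+FFFD.
        String cut = tailUtf8(file, 4);
        System.out.println(cut.charAt(0) == '\uFFFD'); // prints true
    }
}
```

With a single-byte encoding the problem does not arise, which is one reason a tail function like this is safest on ASCII or Latin-1 files.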

Function Code

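 public static String tail(String fullPathFilename, int charsToRead, String charSet) throws IOException{
 	if (charSet == null) charSet = "latin1";
 	RandomAccessFile raf = new RandomAccessFile(fullPathFilename, "r");
 	try {
 		// never ask for more bytes than the file holds; seeking to a
 		// negative position would throw an IOException
 		int bytesToRead = (int) Math.min((long) charsToRead, raf.length());
 		byte[] bytes = new byte[bytesToRead];

 		raf.seek(raf.length() - bytesToRead);
 		raf.readFully(bytes);
 		return new String(bytes, charSet);
 	} finally {
 		raf.close(); // release the file even if the read fails
 	}
 }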

WordCount

Counting words in a file is a common requirement that is easy to implement -- if you use the correct tools. Unfortunately, if you are implementing it in Java, it is not trivial. Here we use a FileReader, which supports reading text files, wrapped in a BufferedReader. You can elect to use a read() method that fills a char array or use the BufferedReader's readLine() method.

Reading by lines is not only a convenient way to approach the count, but it also gives you a nice, natural chunk of characters to deal with. You don't have to read the entire file into one char array (which could be a memory-consuming technique if the file is large enough). And readLine() also keeps you from having to piece together char arrays to determine whether one array ended with a space and the next started with one.

In our implementation, we don't actually count words. We count the spaces between them. This seems simple enough, but we need to be wary of files that contain many double or triple spaces between words, which would certainly throw the count off. So we do count spaces, but only if the space is followed by a letter or digit. Note the condition:

	if (Character.isSpace(chars[i]) && Character.isJavaLetterOrDigit(chars[i+1]))

The Character class provides all the identification we need to determine a space or a valid letter or digit. (Both of these methods are deprecated in current JDKs; Character.isWhitespace() and Character.isLetterOrDigit() are non-deprecated methods that serve the same purpose here.)
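The same idea can be tried on a single line of text. The sketch below is a variation, not the function from this article: it uses the non-deprecated Character.isWhitespace() and Character.isLetterOrDigit() methods, and it treats the start of the line like a preceding space. The class WordStartSketch and its method are illustrative names.

```java
class WordStartSketch {

    // Counts the letters or digits that begin the line or follow whitespace.
    static int countWords(String line) {
        int count = 0;
        for (int i = 0; i < line.length(); i++) {
            boolean startsWord = Character.isLetterOrDigit(line.charAt(i))
                    && (i == 0 || Character.isWhitespace(line.charAt(i - 1)));
            if (startsWord) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(countWords("one  two   three")); // prints 3
        System.out.println(countWords("  leading spaces")); // prints 2
        System.out.println(countWords(""));                 // prints 0
    }
}
```

Like the condition above, this misses words that begin with punctuation, such as a word in parentheses, so treat the result as an approximation.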

Calling the function

int wordCount = FileFunctions.countWords("c:\\temp\\wordCount.txt");

Function Code

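public static int countWords(String fullPathFilename) throws IOException{

	BufferedReader reader = new BufferedReader(new FileReader(fullPathFilename));

	String line;
	char[] chars;
	int wordCount = 0;

	try {
		while( (line = reader.readLine()) != null){
			chars = line.toCharArray();

			// the first word has no space in front of it, so count it
			// separately; blank lines contribute nothing
			if (chars.length > 0 && Character.isJavaLetterOrDigit(chars[0])) {
				wordCount++;
			}

			// count each space that is followed by a letter or digit,
			// so runs of two or three spaces are only counted once
			for (int i = 0 ; i < chars.length - 1; i++){
				if (Character.isSpace(chars[i]) &&  Character.isJavaLetterOrDigit(chars[i+1])) {
					wordCount++;
				}
			}
		}
	} finally {
		reader.close();
	}
	return wordCount;
 }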
 

Read from CSV to an array

Comma-separated value files are a common export format for spreadsheets and databases. They provide a straightforward approach to reading and writing tabular character data. Each line of the text file represents a table row, and each comma in the row separates columns of data.

This is all wonderful, straightforward stuff. However, when a string value already contains a comma, the scheme falls apart. To deal with this anomaly, many software export features will let you elect something like "delimit with quotes" as an option. Some spreadsheets like Quattro Pro will identify a string with commas and insert quotes around the string.

For the purpose of writing this function, we assume there are no quotes to be dealt with. The first row of flatfile data is parsed on the fly to determine the size of the column array to be returned. The number of rows to be returned can be specified by the caller or the function will chug through all rows in the flatfile, resizing the array when necessary.

Two aspects of the function are noteworthy. First, we use a StringTokenizer object to search for commas and return the chunks between them. Since StringTokenizer simply splits on every delimiter character it sees, with no notion of quoting, this function will not work for strings that are surrounded by quotes. For that we would need a more powerful mechanism, like the regular expression support that shipped with JDK 1.4.
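The limits of StringTokenizer are easy to demonstrate on a couple of sample rows. This is a minimal sketch with made-up names (TokenizerSketch, tokenCount, splitCount); the comparison with String.split() is an addition for illustration and is not what readCSV uses.

```java
import java.util.StringTokenizer;

class TokenizerSketch {

    // Number of tokens StringTokenizer finds when splitting on commas.
    static int tokenCount(String line) {
        return new StringTokenizer(line, ",").countTokens();
    }

    // Number of fields String.split() finds, keeping empty fields.
    static int splitCount(String line) {
        return line.split(",", -1).length;
    }

    public static void main(String[] args) {
        // A quoted field containing a comma is cut in two: 3 tokens, not 2.
        System.out.println(tokenCount("\"Smith, John\",42")); // prints 3

        // StringTokenizer also skips empty fields between adjacent commas.
        System.out.println(tokenCount("a,,c")); // prints 2
        System.out.println(splitCount("a,,c")); // prints 3
    }
}
```

The second case matters for tabular data: with an empty cell in the middle of a row, every later value shifts one column to the left.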

The array re-sizing support in Java is also worth a mention. You don't normally run into it a lot in day-to-day Java programming because it is normally wrapped with more convenient classes like the Collection subclasses that handle the re-sizing internally. Of course, each call to the resize functionality allocates a new array and copies the original. Therefore, setting the original array size too small can consume considerable memory if you are working with a large file. Here we set the default array size to 250, hoping it will suffice, but you may need to consider resetting it.

This is also good experience for working with Java collections, since their internal arrays have a default size that may be too small for your purpose. If you are working with large collections of objects -- even if the size is not initially known -- you might consider using constructors that allow initial sizing of the internal array. For example, you can create an ArrayList like this:

new java.util.ArrayList(1024);

Our implementation of the resize uses a simple call, where the last argument is the number of elements to copy:

System.arraycopy(oldArray,0,newArray,0,oldArray.length);

However, with a two-dimensional array like our table data, you run into a unique aspect of Java arrays. Multi-dimensional arrays are implemented and accessed as arrays of arrays. With the array we use, you will see an array called data[][]. The first index selects the row, the second the column. So when you resize the array, you need to make multiple calls to System.arraycopy:

	temp = new String[data.length + numToRead][size];
	for (int n=0;n < data.length;n++){
		System.arraycopy(data[n],0,temp[n],0,temp[n].length);
	}
	data = temp;

Each row array in the table needs to be reallocated. You can't do it with one call.
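Pulled out into a standalone method, the row-by-row resize looks roughly like this. It is a minimal sketch: the class GrowTableSketch and the method grow are hypothetical names, and the Math.min guard is an extra precaution for rows of uneven length that the snippet above does not need.

```java
class GrowTableSketch {

    // Returns a new table with extraRows more rows, copying each existing row.
    static String[][] grow(String[][] data, int extraRows, int cols) {
        String[][] temp = new String[data.length + extraRows][cols];
        for (int n = 0; n < data.length; n++) {
            // One arraycopy per row, because each row is its own array.
            System.arraycopy(data[n], 0, temp[n], 0, Math.min(data[n].length, cols));
        }
        return temp;
    }

    public static void main(String[] args) {
        String[][] data = { { "a", "b" }, { "c", "d" } };
        String[][] grown = grow(data, 2, 2);
        System.out.println(grown.length); // prints 4
        System.out.println(grown[1][1]);  // prints d
        System.out.println(grown[3][0]);  // prints null
    }
}
```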

Calling the function

String[][] result = FileFunctions.readCSV("c:\\temp\\book.csv",0);

Function Code

 public static String[][] readCSV(String fullPathFilename, int numToRead) throws IOException{
 	BufferedReader reader = new BufferedReader(new FileReader(fullPathFilename));
	String line;	
	StringTokenizer st ;
	int size = 0;
	int index = 0;
	int pos = 1; //skip the first position.
	String[][] data;
	String[][] temp;
	final int DEFAULT_SIZE=250;
	final String DELIMITER = ",";
	int maxRecords;
		
	//read the first line to get the size of the array
	line = reader.readLine();
	if (line == null) { //empty file: no rows to return
		reader.close();
		return new String[0][0];
	}
	st= new StringTokenizer(line, DELIMITER);
	while (st.hasMoreElements()){
		st.nextElement();
		size++;	
	}	
		
	//size the array, if parameter not set, read the entire file
	//otherwise stop as requested.
	if (numToRead == 0) { 
		numToRead = DEFAULT_SIZE;
		maxRecords = Integer.MAX_VALUE;	
	} else {
		maxRecords = numToRead;
	}	
		
	data = new String[numToRead][size];
		
	//do it again to add to the array
	st= new StringTokenizer(line, DELIMITER);
	while (st.hasMoreElements()){
		data[0][index]= st.nextElement().toString();
		index++;
	}
	index=0;
		
	//now do a bunch..
	while( (line = reader.readLine()) != null && pos < maxRecords ){
		st= new StringTokenizer(line, DELIMITER);
		while (st.hasMoreElements()){
			if (index == data[0].length) break;
			data[pos][index]= st.nextElement().toString();
			index++;
		}
		index=0;
		pos++;
			
		if (pos == data.length - 1){ //size array if needed.
			temp = new String[data.length + numToRead][size];
			for (int n=0;n < data.length;n++){
				System.arraycopy(data[n],0,temp[n],0,temp[n].length);
			}
			data = temp;	
			
		}		
					
	}
		
	//size the array correctly.. it may be too large.
	temp = new String[pos][size];// the size read
	for (int n=0;n < temp.length;n++){
		System.arraycopy(data[n],0,temp[n],0,temp[n].length);
	}
				
	reader.close();
	return temp;
		
	 }

Copying a file

The copy function accepts two File parameters, source and destination. We use File objects here rather than path strings because in most situations a caller will need to create a File object anyway, for the purpose of seeing whether the file already exists. You can make a call to this function in this way:

Calling the function

	File src = new File("c:\\temp\\book.csv");
	File dest = new File("c:\\temp\\bookCopied.csv");
	if (!dest.exists()){
		FileFunctions.copy(src,dest);
	} else {
		System.out.println("Target file " + dest + " already exists.");	
	}

The call to the copy() function creates two buffered streams, one for input, one for output. Once again we state this: as with almost all sequential stream input and output, it is wise to buffer the stream, particularly if the source of the stream is over a network. A byte buffer holds the contents read from the source file, which are then written to the output stream.

Function Code

 public static void copy(File source, File destination) throws IOException{

	BufferedInputStream bis = null;
	BufferedOutputStream bos = null;
	try {
		bis = new BufferedInputStream(new FileInputStream(source));
		bos = new BufferedOutputStream(new FileOutputStream(destination));
		int numRead;
		byte[] bytes = new byte[1024];
		while ((numRead = bis.read(bytes)) != -1) {
			bos.write(bytes,0,numRead);
		}
		bos.flush(); // surface any write error before the quiet close below
	} finally {
		// close both streams even if the copy fails part way through
		try{
			if (bis != null) bis.close();
		} catch (Exception e){}

		try{
			if (bos != null) bos.close();
		} catch (Exception e){}
	}
 }
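On Java 7 or later, the existence check and the copy can be combined, because java.nio.file.Files.copy() refuses to overwrite an existing target unless told otherwise. The following is a minimal sketch of that alternative, not a replacement for the function above; the class NioCopySketch and the method copyIfAbsent are hypothetical names.

```java
import java.io.File;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;

class NioCopySketch {

    // Returns true if the file was copied, false if the destination already existed.
    static boolean copyIfAbsent(File source, File destination) throws IOException {
        try {
            // Without REPLACE_EXISTING, copy() fails when the target exists.
            Files.copy(source.toPath(), destination.toPath());
            return true;
        } catch (FileAlreadyExistsException e) {
            return false;
        }
    }

    public static void main(String[] args) throws IOException {
        File src = File.createTempFile("copy-sketch", ".csv");
        src.deleteOnExit();
        Files.write(src.toPath(), "a,b,c".getBytes(StandardCharsets.UTF_8));

        File dest = new File(src.getParentFile(), src.getName() + ".copy");
        dest.deleteOnExit();

        System.out.println(copyIfAbsent(src, dest)); // prints true
        System.out.println(copyIfAbsent(src, dest)); // prints false
    }
}
```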


Copyright (c)2004 Gervase Gallant gervasegallant@yahoo.com