Thursday, October 9, 2008

openFile

I've checked in a new portion of org.icongarden.fileSystem called openFile. (Actually, I checked this in a few days ago and I just haven't gotten around to writing about it yet. I really wish I had more time to work on this project!) This is the most elaborate bit of interface I've done for Counterpart so far.

openFile is a function which takes one or two arguments. The first argument is an array in the style of other functions in fileSystem, with the exception that the last member of the array is the name of a file. The second argument is an object whose properties describe how the file is to be opened. If the second argument is not present, it's as if the caller passed an empty object.

The read property is a boolean which specifies whether the caller wants to be able to read from the file once it is open. If it is absent, the assumption is that the caller does want to read from the file. Why on earth would anyone want to open a file and not read from it? Well, it really comes down to a matter of not trusting yourself. If you know your intent is to write some data to a file and close it, you can specify at the outset that you want it to be impossible for you to read from the file. This is nice to have in a large program or one that gets shuffled around a lot during development.

The write property is a boolean which specifies whether the caller wants to be able to write to the file once it is open. If this property is absent, the assumption is that the caller does not want to write to the file. It should be a little more obvious why you would want to prevent yourself from writing to a file you've opened: since a mistake could destroy data, it's nice to have a way to ensure such a mistake is impossible.

If the caller attempts to open a file for neither writing nor reading, an exception is thrown.
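
To make those defaults concrete, here's a minimal sketch in plain JavaScript. The helper name resolveAccess is made up for illustration and is not part of fileSystem; it only models the rules described above (read defaults to true, write defaults to false, and asking for neither is an error).

```javascript
// Hypothetical helper that models the read/write defaults described above.
// It is not taken from Counterpart's source.
function resolveAccess(options = {}) {
  // read defaults to true when absent; write defaults to false when absent.
  const read = options.read === undefined ? true : Boolean(options.read);
  const write = options.write === undefined ? false : Boolean(options.write);
  if (!read && !write) {
    throw new Error("openFile: file must be opened for reading, writing, or both");
  }
  return { read, write };
}

console.log(resolveAccess());                             // { read: true, write: false }
console.log(resolveAccess({ write: true }));              // { read: true, write: true }
console.log(resolveAccess({ read: false, write: true })); // { read: false, write: true }
```

Passing { read: false } on its own leaves both flags false, so the sketch throws, which matches the rule about opening a file for neither reading nor writing.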

The create property specifies whether a file which does not already exist should be created. That's right: one creates and opens a file in a single step. If you only want to create a file, you must endure having opened it at the same time. In combination with some additional properties, this helps make race conditions less likely when you're using the file system as a communications medium. If the create property is absent and the file does not yet exist, openFile throws an exception. If it is present, it may take several forms.

If the create property is a boolean, then it merely specifies whether the file should be created as described above. If the create property is an object, and the object is empty, it has the same effect as a boolean true.

If the create object contains a must property, this property is a boolean which specifies whether openFile must successfully create the file. In other words, if the file already exists, openFile will throw an exception. This is useful when you are using the file as a signifier for exclusive access to a directory or in some other communications scheme that requires atomic exclusion.

If the create object contains a readOnly property, this specifies that subsequent attempts to open the file for writing will fail. (On UNIX, this corresponds roughly to chmod a-w.) The readOnly property has no effect on the present call to openFile.
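
The forms of the create property are easier to follow with a toy model. The sketch below uses an in-memory Map as a stand-in for a directory; the names fakeDirectory, normalizeCreate, and openModel are all hypothetical, and a single file name stands in for the path array. It only mirrors the rules in the text and says nothing about how openFile is actually implemented.

```javascript
// In-memory stand-in for a directory: file name -> { readOnly }.
// Everything here is hypothetical and only models the rules in the text.
const fakeDirectory = new Map();

function normalizeCreate(create) {
  if (!create) return { create: false, must: false, readOnly: false };
  if (create === true) return { create: true, must: false, readOnly: false };
  // An object, even an empty one, means "create the file if it is missing".
  return {
    create: true,
    must: Boolean(create.must),
    readOnly: Boolean(create.readOnly),
  };
}

function openModel(name, options = {}) {
  const wantsWrite = Boolean(options.write);
  const c = normalizeCreate(options.create);
  const existing = fakeDirectory.get(name);
  if (existing) {
    if (c.must) throw new Error(name + " already exists");
    if (wantsWrite && existing.readOnly) throw new Error(name + " is read-only");
    return { name, created: false };
  }
  if (!c.create) throw new Error(name + " does not exist");
  // readOnly affects later opens only, never the call that creates the file.
  fakeDirectory.set(name, { readOnly: c.readOnly });
  return { name, created: true };
}

// The creating call may write even though it asks for readOnly.
const first = openModel("lock", { write: true, create: { must: true, readOnly: true } });
console.log(first.created); // true

// A second caller that insists on creating the file is turned away.
try {
  openModel("lock", { create: { must: true } });
} catch (e) {
  console.log(e.message); // lock already exists
}
```

In this model the check and the creation happen in one step, which is the property that makes must useful as a lock.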

So let's suppose you've called openFile and it didn't throw an exception; what now? openFile has returned to its caller an object, and this object has several properties.

The simplest one is encoding. This is the name of the character encoding to use with this file. You can change encoding to any of the names supported by your local implementation of iconv. (I'll get around to making a list of common names eventually. For now, the only encoding actually supported is UTF-8, which is also the default.) When reading from or writing to the file, characters are automagically converted between this encoding and the JavaScript encoding, which is UTF-16. You can change encoding at any time, but you do need to be sure that the file you are reading or writing is encoded accordingly. (Generally, a given file will have just one encoding.)
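
To see why the conversion matters, here's a small sketch that compares how many UTF-16 code units a string occupies in JavaScript with how many bytes it occupies once encoded as UTF-8. It uses the standard TextEncoder rather than iconv or anything from Counterpart, and utf8ByteLength is a name invented for this example.

```javascript
// TextEncoder always produces UTF-8, the default encoding mentioned above.
const encoder = new TextEncoder();

// Hypothetical helper: number of bytes a value occupies once encoded as UTF-8.
function utf8ByteLength(value) {
  return encoder.encode(String(value)).length;
}

// "A", e-acute, the euro sign, and an emoji outside the Basic Multilingual Plane.
const samples = ["A", "\u00e9", "\u20ac", "\u{1F600}"];
for (const s of samples) {
  console.log(s, "UTF-16 code units:", s.length, "UTF-8 bytes:", utf8ByteLength(s));
}
// The four samples occupy 1, 1, 1, 2 code units and 1, 2, 3, 4 bytes respectively.
```

The two counts diverge as soon as the text leaves ASCII, so a string's length in JavaScript is not a reliable guide to how many bytes it takes up in the file.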

The byteCount property is a number which specifies how many bytes are contained in the file. If you set byteCount to a smaller value, the file will get shorter, and if you set it to a larger value, the file will get longer (and zeroes will be written into any new portion). However, it can be difficult to set the correct value because many encodings have characters which may be more than one byte, and unless your code has exhaustive knowledge of the contents of the file and the byte-width of each character the file contains, it's usually impossible to know how big to make the file. There are two cases in which it's easier. First, you can always set byteCount to zero to remove all bytes from the file. And second, if you read or write the file and then remember what the value of byteCount was, you can set it back to that value later. (You can also set byteCount to the value of byteOffset, which is the next property discussed.)

The byteOffset property specifies where in the file the next read or write operation will begin. The first offset in the file is zero. You can set byteOffset to any value you like — even one beyond the end of the file — but you must exercise the same care you would with byteCount as described above. One additional safe and simple operation is to set byteOffset to byteCount. This means the next write to the file will occur at its end.
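
Here's a rough in-memory model of how byteCount and byteOffset behave when assigned. The makeFileModel function and its snapshot method are invented for this sketch, and the bytes live in a Uint8Array rather than a real file, so treat it as an illustration of the semantics only.

```javascript
// Hypothetical in-memory model; the real object is backed by a file on disk.
function makeFileModel(initialBytes = []) {
  let bytes = Uint8Array.from(initialBytes);
  let offset = 0;
  return {
    get byteCount() { return bytes.length; },
    set byteCount(n) {
      // A new Uint8Array is zero-filled, so growing the file pads it with zeroes.
      const resized = new Uint8Array(n);
      resized.set(bytes.subarray(0, Math.min(n, bytes.length)));
      bytes = resized;
    },
    get byteOffset() { return offset; },
    // Any offset is accepted, even one beyond the end of the file.
    set byteOffset(n) { offset = n; },
    snapshot() { return Array.from(bytes); },
  };
}

const file = makeFileModel([1, 2, 3, 4]);
file.byteCount = 2;               // truncate: contents are now [1, 2]
file.byteCount = 4;               // extend: contents are now [1, 2, 0, 0]
file.byteOffset = file.byteCount; // the next write would land at the end
file.byteCount = 0;               // always safe: remove every byte
```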

Finally we come to the properties which act on the file. Unsurprisingly, they are both functions: read and write. So far, I've only implemented write; I'll describe read once it's implemented. In my original design, if the caller did not specify it wanted to be able to write the file when opening it, there was no write property at all. Likewise, if the caller didn't specify it wanted to be able to read the file, there was no read property, though that would probably have been less common. The idea was that when it's inappropriate to perform an operation on a file, it's not just that you can't perform it; you can't even try to perform it. Daniel convinced me that was too clever by half because the failure mode involves tearing down the entire script unrecoverably. I had thought this was a feature, but he convinced me it was a bug. So now read and write are both exposed unconditionally, and if they are called inappropriately, they throw an exception.
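
A tiny sketch of the design that won out: both functions always exist, and the disallowed one throws an ordinary exception that a script can catch. The makeHandle function and the return values of its read and write are placeholders invented for this example, not the real fileSystem behavior.

```javascript
// Hypothetical handle factory: read and write are always present,
// and each one throws if the file was not opened for that operation.
function makeHandle(access) {
  return {
    read() {
      if (!access.read) throw new Error("file was not opened for reading");
      return ""; // placeholder; the text does not describe what read returns
    },
    write(value) {
      if (!access.write) throw new Error("file was not opened for writing");
      return String(value).length; // placeholder return value
    },
  };
}

const readOnlyHandle = makeHandle({ read: true, write: false });
console.log(typeof readOnlyHandle.write); // function

try {
  readOnlyHandle.write("oops");
} catch (e) {
  // The script survives and can decide what to do next.
  console.log("recovered:", e.message); // recovered: file was not opened for writing
}
```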

The write function writes characters to the file in the file's current encoding. It takes a single argument, which is converted into a string to produce the characters. The value of byteOffset advances by the number of bytes the characters occupy after they have been converted into the file's encoding. If the write extends beyond the end of the file, byteCount grows to match the new end of the file.
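
The bookkeeping for write can be sketched with an in-memory buffer. This assumes UTF-8 (via the standard TextEncoder) and uses a made-up makeWritableModel function; it is a model of the behavior described above, not the actual implementation.

```javascript
// Hypothetical in-memory model of write, assuming the file's encoding is UTF-8.
function makeWritableModel() {
  const encoder = new TextEncoder();
  let bytes = new Uint8Array(0);
  const file = {
    byteOffset: 0,
    get byteCount() { return bytes.length; },
    write(value) {
      // The single argument is converted to a string, then to bytes.
      const encoded = encoder.encode(String(value));
      const end = file.byteOffset + encoded.length;
      if (end > bytes.length) {
        // The write runs past the end of the file, so the file grows.
        const grown = new Uint8Array(end);
        grown.set(bytes);
        bytes = grown;
      }
      bytes.set(encoded, file.byteOffset);
      file.byteOffset = end;
    },
  };
  return file;
}

const f = makeWritableModel();
f.write("h\u00e9");                     // "h" is 1 byte, e-acute is 2 bytes
console.log(f.byteOffset, f.byteCount); // 3 3

f.byteOffset = 0;
f.write(42);                            // the number becomes the string "42"
console.log(f.byteOffset, f.byteCount); // 2 3
```

The second write overwrites the first two bytes and leaves half of the two-byte character behind, which is exactly the kind of damage the earlier warnings about setting byteOffset are meant to prevent.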

The file will automagically be closed when its object becomes unreferenced and the garbage collector gets around to destroying it. I'm still working on getting that to happen reliably. I'll also add a close function which allows you to explicitly close the file sooner than it would otherwise be closed.
