Loops can be difficult to maintain because their state changes on every iteration.
A typical loop involves the following:
- set initial state
- check state condition
- do something with the state
- set state for the next operation
- jump back to the state condition check
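The steps above can be mapped onto a small loop. This is only an illustrative sketch: `SumDigits` is a hypothetical function that is not part of the file-reading code, and it assumes a non-negative input.

```csharp
using System;

// Hypothetical example: sum the decimal digits of a non-negative number.
// Each comment points at one of the loop steps listed above.
static int SumDigits(int number)
{
    var remaining = number;        // set initial state
    var sum = 0;
    while (remaining > 0)          // check state condition
    {
        sum += remaining % 10;     // do something with the state
        remaining /= 10;           // set state for the next operation
    }                              // jump back to the condition check
    return sum;
}

Console.WriteLine(SumDigits(4096)); // prints 19
```

Even in this tiny loop, the state (`remaining`) is touched in three places: before the loop, in the condition, and at the end of the body.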
Let’s see it in action by writing code that reads a file 4KB at a time. 4KB is a sensible block size because an NTFS volume is typically formatted with a 4KB allocation unit (cluster) size.
using System.Collections.Generic;
using System.IO;
using System.Linq;

public static class FileExtensions
{
    /// <summary>
    /// Gets 4KB blocks of the file
    /// </summary>
    /// <param name="filePath">The file path to read file from</param>
    /// <returns>4KB or less data in sequence</returns>
    /// <remarks>
    /// The file is locked until it's completely read or read is stopped.
    /// Full blocks reuse the same buffer array, so copy a block before
    /// requesting the next one if it needs to be kept.
    /// </remarks>
    public static IEnumerable<byte[]> Read4KBBlocks(this string filePath)
    {
        const int bufferSize_4KB = 4 * 1024;
        using (var fileStream = File.OpenRead(filePath))
        {
            var buffer = new byte[bufferSize_4KB];
            // Initial state: read the first block
            var bytesRead = fileStream.Read(buffer, 0, buffer.Length);
            while (bytesRead > 0)
            {
                if (bytesRead == buffer.Length)
                {
                    yield return buffer;
                }
                else
                {
                    yield return buffer.Take(bytesRead).ToArray();
                }
                // Next state: read the following block
                bytesRead = fileStream.Read(buffer, 0, buffer.Length);
            }
        }
    }
}
Focus on the two calls to fileStream.Read:
- The first call, before the loop, is the initialization state where the first block is read.
- The second call, at the end of the loop body, sets the next state for the loop before jumping back to the condition.
It works, but the state handling can be simpler if the Read operation appears on a single line. One option is to use a do-while loop. A do-while loop is typically used with an operation that involves retrying, but it might help us improve the loop.
Let’s take a look.
public static IEnumerable<byte[]> Read4KBBlocks(this string filePath)
{
    const int bufferSize_4KB = 4 * 1024;
    using (var fileStream = File.OpenRead(filePath))
    {
        var buffer = new byte[bufferSize_4KB];
        int bytesRead;
        do
        {
            bytesRead = fileStream.Read(buffer, 0, buffer.Length);
            if (bytesRead == 0)
            {
                break;
            }
            if (bytesRead == buffer.Length)
            {
                yield return buffer;
            }
            else
            {
                yield return buffer.Take(bytesRead).ToArray();
            }
        } while (bytesRead > 0);
    }
}
Yay, we reduced it to a single Read call. However, this raised some other concerns:
- bytesRead is declared outside the loop.
- An additional condition (bytesRead == 0) is needed to stop the enumeration when 0 bytes are read, to avoid returning an empty array.
- The closing while (bytesRead > 0) condition is redundant: when bytesRead is 0, the break has already exited the loop, so the condition is always true when it is evaluated.
Let’s address the additional conditional checks.
public static IEnumerable<byte[]> Read4KBBlocks(this string filePath)
{
    const int bufferSize_4KB = 4 * 1024;
    using (var fileStream = File.OpenRead(filePath))
    {
        var buffer = new byte[bufferSize_4KB];
        int bytesRead;
        while ((bytesRead = fileStream.Read(buffer, 0, buffer.Length)) > 0)
        {
            if (bytesRead == buffer.Length)
            {
                yield return buffer;
            }
            else
            {
                yield return buffer.Take(bytesRead).ToArray();
            }
        }
    }
}
Yay, we are left with only one of the earlier concerns: bytesRead is still declared outside the loop. However, we added a new concern:
- The while statement now both performs the read operation and checks the condition.
This may not be a concern unless the coding standard (for maintenance purposes) is to avoid performing an operation inside a loop condition. The line takes longer to read than a simple true/false condition.
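If the assignment inside the condition is the main objection, one possible alternative is to move the read into a small helper so the condition reads as a plain true/false question. The sketch below is an assumption for illustration only: `TryReadBlock` and `ReadBlocks` are hypothetical names that do not appear in the original code, and the loop works on any `Stream` so it can be run against a `MemoryStream`.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

// Hypothetical helper: performs the read and reports whether any bytes arrived.
static bool TryReadBlock(Stream stream, byte[] buffer, out int bytesRead)
{
    bytesRead = stream.Read(buffer, 0, buffer.Length);
    return bytesRead > 0;
}

// Same block-reading loop, with the read hidden behind the helper.
static IEnumerable<byte[]> ReadBlocks(Stream stream, int blockSize)
{
    var buffer = new byte[blockSize];
    while (TryReadBlock(stream, buffer, out var bytesRead))
    {
        yield return bytesRead == buffer.Length
            ? buffer                              // full block: the shared buffer
            : buffer.Take(bytesRead).ToArray();   // last block: trimmed copy
    }
}

// Usage: 10 bytes read in blocks of 4 bytes.
using var memory = new MemoryStream(new byte[10]);
var sizes = ReadBlocks(memory, 4).Select(block => block.Length).ToList();
Console.WriteLine(string.Join(", ", sizes)); // prints 4, 4, 2
```

The trade-off is an extra method: the condition is easier to scan, and bytesRead is scoped to the loop, but the read is now one step removed from the loop that depends on it.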
Let’s handle these concerns.
public static IEnumerable<byte[]> Read4KBBlocks(this string filePath)
{
    const int bufferSize_4KB = 4 * 1024;
    using (var fileStream = File.OpenRead(filePath))
    {
        var buffer = new byte[bufferSize_4KB];
        while (true)
        {
            var bytesRead = fileStream.Read(buffer, 0, buffer.Length);
            if (bytesRead == 0)
            {
                break;
            }
            if (bytesRead == buffer.Length)
            {
                yield return buffer;
            }
            else
            {
                yield return buffer.Take(bytesRead).ToArray();
            }
        }
    }
}
The unconditional loop, while (true), may be frowned upon. However, this is beautiful. We have a single operation for each of the following:
- setting the state: the Read call at the top of the loop body
- the exit condition: the bytesRead == 0 check that follows it
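To try the final loop out, here is a minimal sketch that runs it against a temporary file. It makes a few assumptions for the sake of being self-contained: the loop is copied into a local function rather than an extension method, the if/else is condensed into a conditional expression, and the 10,000-byte file size is arbitrary.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

// Condensed copy of the final loop, as a local function for this sketch.
static IEnumerable<byte[]> Read4KBBlocks(string filePath)
{
    const int bufferSize_4KB = 4 * 1024;
    using (var fileStream = File.OpenRead(filePath))
    {
        var buffer = new byte[bufferSize_4KB];
        while (true)
        {
            var bytesRead = fileStream.Read(buffer, 0, buffer.Length);
            if (bytesRead == 0)
            {
                break;
            }
            yield return bytesRead == buffer.Length
                ? buffer
                : buffer.Take(bytesRead).ToArray();
        }
    }
}

// Write a temporary file, read it back block by block, then clean up.
var path = Path.GetTempFileName();
var blockSizes = new List<int>();
try
{
    File.WriteAllBytes(path, new byte[10_000]);
    foreach (var block in Read4KBBlocks(path))
    {
        // Use each block before asking for the next one.
        blockSizes.Add(block.Length);
    }
}
finally
{
    File.Delete(path);
}

Console.WriteLine(blockSizes.Sum()); // prints 10000
```

Note that full blocks are the same array instance on every iteration, so a caller that wants to keep a block (for example, by collecting the sequence with ToList) should copy it first.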
The function is available on GitHub: https://github.com/keenam/Codyssey/blob/master/Codyssey.Extensions/FileExtensions.cs