15: DON'T CROSS THE STREAMS!

25 Mar 2022

In this week's episode we're talking about Streams. We can never remember which one to use, whether we always need to create a Stream and then wrap it in a StreamReader/StreamWriter, or whether we can just use the static System.IO.File methods. Hopefully this will clear up some of our confusion.

Andy: There’s something I forgot to tell you. Don’t cross the streams.

Rowan: Why?

Andy: It would be bad.

Rowan: I’m fuzzy on the whole good/bad thing. What do you mean “bad”?

Andy: Try to imagine all life as you know it stopping instantaneously and every molecule in your body exploding at the speed of light.

Andy: Total protonic reversal.

Rowan: That’s bad. Okay. Alright, important safety tip, thanks Andy.

Random facts

  1. The original title was Ghost Smashers.
  2. Director Ivan Reitman made a couple of unorthodox appearances in the movie:
    • He provided the "pigging out" noises of Slimer gorging on a pile of food before he slimes Peter Venkman (Bill Murray’s character)
    • Reitman’s naturally deep voice also proved perfect for the moment when Dana becomes possessed and says “There is no Dana, only Zuul.”
    • Bill Murray replies with “What a lovely singing voice you must have” 😂

Introduction

A stream is used to transfer data (read/write) from/to a wide range of sources/destinations.

So when you say streams, do you also mean things like Reactive Extensions (LINQ over events) or IAsyncEnumerable, or are we talking about low-level, byte-level access?

There is an abstract base class, System.IO.Stream, from which all other stream classes in .NET derive (FileStream, MemoryStream and many others). It defines the abstraction that every subclass must abide by, which basically allows reading and writing bytes (not text) from/to a source or backing store. Backing stores can be files, I/O devices or the network; sometimes there is no backing store at all, e.g. a signal generator, events etc.
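Because every stream derives from System.IO.Stream, a method written against the base class works with any backing store. A minimal sketch, where CountBytes is a hypothetical helper that relies only on Stream.Read and is fed a MemoryStream for convenience:

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical helper: counts the bytes in any stream, whatever its backing store
static long CountBytes(Stream source)
{
    var buffer = new byte[4096];
    long total = 0;
    int read;
    // Read returns 0 only when the end of the stream has been reached
    while ((read = source.Read(buffer, 0, buffer.Length)) > 0)
    {
        total += read;
    }
    return total;
}

// A MemoryStream here, but a FileStream or NetworkStream could be passed in the same way
using var memory = new MemoryStream(Encoding.UTF8.GetBytes("Who you gonna call?"));
long count = CountBytes(memory);
Console.WriteLine(count);
```

The helper never needs to know what kind of stream it was given, which is the whole point of the abstraction.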

We can group the functions of the Stream class into three categories:

  1. Reading And Writing
  2. Seeking
  3. Buffering, flushing and disposing - disposing is important because a stream may hold an unmanaged resource (e.g. a file handle), so don’t forget to put it in a using statement

Common properties and methods of Streams

Properties:

  • CanWrite
  • CanRead
  • CanSeek
  • Length
  • Position
  • ReadTimeout
  • WriteTimeout

Key methods:

  • BeginRead / EndRead / Read / ReadByte / ReadAsync
  • BeginWrite / EndWrite / Write / WriteByte / WriteAsync
  • Seek
  • CopyTo / CopyToAsync
  • Flush / FlushAsync
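The properties and methods above can be exercised on a MemoryStream, which supports reading, writing and seeking. A small sketch (the byte values are arbitrary); note that ReadTimeout/WriteTimeout throw on streams that don't support timeouts, so the sketch checks the CanTimeout property instead:

```csharp
using System;
using System.IO;

using var stream = new MemoryStream();

// MemoryStream supports reading, writing and seeking
Console.WriteLine($"CanRead={stream.CanRead}, CanWrite={stream.CanWrite}, CanSeek={stream.CanSeek}");

// Write three bytes; Position advances past each byte written
stream.Write(new byte[] { 10, 20, 30 }, 0, 3);
Console.WriteLine($"Length={stream.Length}, Position={stream.Position}");

// Seek back to offset 1 from the beginning and read the second byte
stream.Seek(1, SeekOrigin.Begin);
int second = stream.ReadByte();
Console.WriteLine($"Second byte={second}, Position={stream.Position}");

// ReadByte returns -1 once the end of the stream is reached
stream.Seek(0, SeekOrigin.End);
int atEnd = stream.ReadByte();
Console.WriteLine($"At end={atEnd}");

// Only some streams (e.g. NetworkStream) support timeouts; MemoryStream does not
Console.WriteLine($"CanTimeout={stream.CanTimeout}");
```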

Reader and Writer classes (for reading and writing encoded character data from/to streams) always confuse me:

Point of confusion for writing to a file:

  • I always find myself asking - do I use a StreamWriter or use the Write method on a FileStream and convert to bytes myself, or just use the File.WriteAllText or File.AppendAllLines?

  • Reader and Writer types are for reading encoded characters from streams and writing them to streams - they do the conversions from / to byte data

  • This is because Streams are designed for byte input and output, therefore other classes are needed to do the conversion

  • I’m always confused about which reader / writer to use, particularly in the case of BinaryReader / BinaryWriter, or whether I should just write directly to the Stream if I already have bytes.

  • Example reader / writer classes are:

    • BinaryReader and BinaryWriter – for reading and writing primitive data types (int, float, bool, string etc.) as binary values
    • StreamReader and StreamWriter – for reading and writing characters by using an encoding value to convert the characters to and from bytes.
    • StringReader and StringWriter – for reading and writing characters to and from strings - I've personally never found a need to use these.
    • ABSTRACT CLASS: TextReader and TextWriter – the abstract base classes for StreamReader/StreamWriter, StringReader/StringWriter and other readers and writers that read and write characters and strings, but not binary data.
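To make the file-writing question concrete, here is a sketch that writes the same text three ways (raw FileStream plus manual encoding, StreamWriter, File.WriteAllText) and compares the resulting bytes. The file names are hypothetical; the outputs match because all three produce UTF-8 without a byte order mark here (Encoding.UTF8.GetBytes never emits one, and both StreamWriter(path) and File.WriteAllText default to BOM-less UTF-8).

```csharp
using System;
using System.IO;
using System.Linq;
using System.Text;

string text = "Test 1" + Environment.NewLine + "Test 2" + Environment.NewLine;

// 1. Raw FileStream: we must encode the characters to bytes ourselves
using (var fs = File.Create("option1.txt"))
{
    byte[] bytes = Encoding.UTF8.GetBytes(text);
    fs.Write(bytes, 0, bytes.Length);
}

// 2. StreamWriter: the writer does the character-to-byte encoding for us
using (var sw = new StreamWriter("option2.txt"))
{
    sw.Write(text);
}

// 3. File static helper: opens, encodes, writes and closes in one call
File.WriteAllText("option3.txt", text);

byte[] a = File.ReadAllBytes("option1.txt");
byte[] b = File.ReadAllBytes("option2.txt");
byte[] c = File.ReadAllBytes("option3.txt");
Console.WriteLine(Enumerable.SequenceEqual(a, b) && Enumerable.SequenceEqual(b, c));
```

In short: StreamWriter when you write incrementally, File.WriteAllText when you already have the whole string, and raw FileStream.Write when you already have bytes.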

Let's look at a few examples of the common implementations we use:

FileStream - provides read and write file operations

FileStream and StreamWriter - write out some text

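// File.Create truncates an existing file; File.OpenWrite would keep any old bytes beyond the new content
using var fs = File.Create("output_textfile.txt");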
using var sw = new StreamWriter(fs);

sw.WriteLine("Test 1");
sw.WriteLine("Test 2");
sw.WriteLine("Test 3");
sw.WriteLine("Test 4");

FileStream and StreamReader - read the text back in

using var fs = File.OpenRead("output_textfile.txt");
using var sr = new StreamReader(fs);

while (!sr.EndOfStream)
{
	Console.WriteLine(sr.ReadLine());
}

Or, more simply, skip the explicit FileStream: StreamWriter and StreamReader both have constructor overloads that take a file path (the StreamWriter(path) overload overwrites an existing file).

Write

using var sw = new StreamWriter("output_textfile.txt");
sw.WriteLine("Test 1");
sw.WriteLine("Test 2");
sw.WriteLine("Test 3");
sw.WriteLine("Test 4");

Read

using var sr = new StreamReader("output_textfile.txt");
while (!sr.EndOfStream)
{
	Console.WriteLine(sr.ReadLine());
}

You can even use the System.IO.File static methods to simplify things further:

using (var sw = File.CreateText("newfile.txt"))
{
	sw.WriteLine("First line of example");
	sw.WriteLine("and second line");
}

using (var sr = File.OpenText("newfile.txt"))
{
	while (!sr.EndOfStream)
	{
		Console.WriteLine(sr.ReadLine());
	}
}
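The File class also has whole-file helpers that hide the stream entirely. A sketch reusing the newfile.txt name from above; the main design choice is ReadLines vs ReadAllLines, where the former is lazy and the latter loads every line into an array:

```csharp
using System;
using System.IO;

// WriteAllLines creates (or overwrites) the file and writes each string followed by a newline
File.WriteAllLines("newfile.txt", new[] { "First line of example", "and second line" });

// AppendAllLines adds lines to the end, creating the file if it doesn't exist
File.AppendAllLines("newfile.txt", new[] { "and a third line" });

// ReadAllLines loads the whole file into memory, which is fine for small files
string[] lines = File.ReadAllLines("newfile.txt");
foreach (var line in lines)
{
    Console.WriteLine(line);
}

// ReadLines is lazy: it reads one line at a time, which suits large files
foreach (var line in File.ReadLines("newfile.txt"))
{
    Console.WriteLine(line.Length);
}
```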

FileStream with BinaryWriter and BinaryReader

float aspectRatio;
string tempDirectory;
int autoSaveTime;
bool showStatusBar;

using (var stream = File.Open("output_binfile.bin", FileMode.Create))
{
	using (var writer = new BinaryWriter(stream))
	{
		writer.Write(1.250F);
		writer.Write(@"c:\Temp");
		writer.Write(10);
		writer.Write(true);
	}
}

using (var stream = File.Open("output_binfile.bin", FileMode.Open))
{
	using (var reader = new BinaryReader(stream))
	{
		aspectRatio = reader.ReadSingle();
		tempDirectory = reader.ReadString();
		autoSaveTime = reader.ReadInt32();
		showStatusBar = reader.ReadBoolean();
	}
}
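BinaryWriter stores values back to back with no type information, which is why the reads must mirror the writes exactly. A sketch of the same four values written to a MemoryStream instead of a file, so the byte count can be inspected (17 bytes: 4 for the float, 1 length-prefix byte plus 7 UTF-8 bytes for the string, 4 for the int, 1 for the bool):

```csharp
using System;
using System.IO;
using System.Text;

using var memory = new MemoryStream();

// leaveOpen: true keeps the MemoryStream usable after the writer is disposed
using (var writer = new BinaryWriter(memory, Encoding.UTF8, leaveOpen: true))
{
    writer.Write(1.250F);     // float: 4 bytes
    writer.Write(@"c:\Temp"); // string: 7-bit encoded length prefix (1 byte here) + 7 UTF-8 bytes
    writer.Write(10);         // int: 4 bytes
    writer.Write(true);       // bool: 1 byte
}
Console.WriteLine($"Total bytes: {memory.Length}");

// No type information is stored, so values must be read back in the same order and types
memory.Position = 0;
using var reader = new BinaryReader(memory, Encoding.UTF8, leaveOpen: true);
float aspectRatio = reader.ReadSingle();
string tempDirectory = reader.ReadString();
int autoSaveTime = reader.ReadInt32();
bool showStatusBar = reader.ReadBoolean();
Console.WriteLine($"{aspectRatio} {tempDirectory} {autoSaveTime} {showStatusBar}");
```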

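Span<T> / ReadOnlySpan<T> - TextWriter and Stream have overloads that accept these types (e.g. TextWriter.Write(ReadOnlySpan<char>), Stream.Write(ReadOnlySpan<byte>)). A span is a lightweight view over a contiguous region of memory, such as part of an array, so you can write a slice without copying it first.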
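A sketch of span overloads in action: writing a slice of a char array to a StringWriter and a slice of a byte array to a MemoryStream, without allocating a substring or a copy of the array. The strings are arbitrary examples:

```csharp
using System;
using System.IO;
using System.Text;

char[] chars = "Don't cross the streams".ToCharArray();

// A ReadOnlySpan<char> is a view over the first five characters: no substring is allocated
ReadOnlySpan<char> firstWord = new ReadOnlySpan<char>(chars, 0, 5);

using var writer = new StringWriter();
writer.Write(firstWord); // TextWriter.Write(ReadOnlySpan<char>) overload
string written = writer.ToString();
Console.WriteLine(written);

// Streams have span overloads too: Stream.Write(ReadOnlySpan<byte>)
byte[] bytes = Encoding.UTF8.GetBytes("streams");
using var memory = new MemoryStream();
memory.Write(new ReadOnlySpan<byte>(bytes, 0, 3)); // only the first three bytes
Console.WriteLine(Encoding.UTF8.GetString(memory.ToArray()));
```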

So in summary, just for reading/writing files...

when reading text files use StreamReader

when writing text files use StreamWriter

when reading binary files of primitive values use FileStream with BinaryReader (for raw bytes, FileStream.Read is enough)

when writing binary files of primitive values use FileStream with BinaryWriter (for raw bytes, FileStream.Write is enough)

MemoryStream - an in-memory stream; you might use it to prepare data before writing it to another stream

  • eg) query data from a database, process it, write all the contents to a MemoryStream and then copy the MemoryStream to a FileStream in one go

    • why? to minimise writes to the file (IO) / minimise write time if it is a shared file, for example
    • merging multiple sources into a single stream?
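    using (MemoryStream ms = new MemoryStream())
    {
      // Not disposed here: disposing the writer would also close the MemoryStream
      StreamWriter writer = new StreamWriter(ms);
    
      writer.WriteLine("asdasdasasdfasdasd");
      writer.Flush(); // push the writer's buffered characters into ms
    
      // You have to rewind the MemoryStream before copying
      ms.Seek(0, SeekOrigin.Begin); // or, equivalently: ms.Position = 0;
    
      // FileMode.Create truncates an existing file; OpenOrCreate would leave stale bytes behind
      using (FileStream fs = new FileStream("output.txt", FileMode.Create))
      {
          ms.CopyTo(fs);
          fs.Flush();
      }
    }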
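For the "multiple sources merged into a single stream" case, MemoryStream.WriteTo(Stream) copies the whole buffer regardless of Position, so no rewind is needed. A sketch under assumptions: the sources array is hypothetical stand-in data (e.g. for database rows), and merged.txt is an illustrative file name:

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical sources standing in for database results or other streams
string[] sources = { "header line", "row 1", "row 2" };

using var buffer = new MemoryStream();

// UTF8Encoding(false) avoids writing a byte order mark; leaveOpen keeps the buffer usable
using (var writer = new StreamWriter(buffer, new UTF8Encoding(false), bufferSize: 1024, leaveOpen: true))
{
    foreach (var source in sources)
    {
        writer.WriteLine(source);
    }
} // disposing the writer flushes its pending characters into the buffer

// One write to the file: WriteTo copies the entire buffer, ignoring Position
using (var file = new FileStream("merged.txt", FileMode.Create))
{
    buffer.WriteTo(file);
}

Console.WriteLine(File.ReadAllText("merged.txt"));
```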
    

Layering of Streams (Streams created on streams)

Stream classes can be layered for operations such as the following:

  • buffering

    • eg) NetworkStream to connect to network resource → BufferedStream over NetworkStream
    • A BufferedStream object creates an internal buffer and reads bytes from, and writes bytes to, the backing store in whatever increments it thinks are most efficient. It will still fill your buffer in the increments you dictate, but your buffer is filled from the in-memory buffer, not from the backing store. The net effect is fewer calls to the backing store, so input and output are more efficient and thus faster. A BufferedStream object is composed around an existing Stream object that you have already created.
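    using System.Diagnostics;
    
    // Note: a MemoryStream already lives in memory, so this mostly measures per-call overhead;
    // buffering pays off most over slow backing stores such as a NetworkStream
    var t1 = Stopwatch.StartNew();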
    // Use BufferedStream to buffer writes to a MemoryStream.
    using (MemoryStream memory = new MemoryStream())
    using (BufferedStream stream = new BufferedStream(memory))
    {
    	// Write a byte 5 million times.
    	for (int i = 0; i < 5000000; i++)
    	{
    		stream.WriteByte(5);
    	}
    }
    t1.Stop();
    Console.WriteLine("BUFFEREDSTREAM TIME: " + t1.Elapsed.TotalMilliseconds);
    
    t1.Restart();
    // Use MemoryStream directly with no buffering.
    using (MemoryStream memory = new MemoryStream())
    {
    	// Write a byte 5 million times.
    	for (int i = 0; i < 5000000; i++)
    	{
    		memory.WriteByte(5);
    	}
    }
    t1.Stop();
    Console.WriteLine("MEMORYSTREAM TIME: " + t1.Elapsed.TotalMilliseconds);
    
  • compress / encrypt data written to a file

    • eg) FileStream to destination file → GZipStream over FileStream → source fileStream.CopyTo(GZipStream)

      • Compress
      using System.IO.Compression;
      
      // Stream the source file through the GZipStream instead of loading it all into memory
      using (FileStream source = File.OpenRead(@"d:\temp\a_links.txt"))
      using (FileStream fs = new FileStream(@"d:\temp\a_links.txt.gz", FileMode.CreateNew))
      using (GZipStream zipStream = new GZipStream(fs, CompressionMode.Compress))
      {
      	source.CopyTo(zipStream);
      }
      
      • Decompress
      var compressedFileStream = File.OpenRead(@"d:\temp\a_links.txt.gz");
      using (FileStream fs = new FileStream(@"d:\temp\a_links.txt.gz.txt", FileMode.CreateNew))
      using (GZipStream zipStream = new GZipStream(compressedFileStream, CompressionMode.Decompress))
      {		
      	zipStream.CopyTo(fs);
      }
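Layering works in both directions: a StreamWriter over a GZipStream over a MemoryStream turns text into compressed bytes, and the reverse stack reads it back. A round-trip sketch kept entirely in memory so it runs anywhere (the text is arbitrary):

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Text;

string original = new string('x', 1000) + "\nDon't cross the streams!";

// Layers: StreamWriter (text -> bytes) over GZipStream (compress) over MemoryStream (storage)
using var compressed = new MemoryStream();
using (var gzip = new GZipStream(compressed, CompressionMode.Compress, leaveOpen: true))
using (var writer = new StreamWriter(gzip, new UTF8Encoding(false)))
{
    writer.Write(original);
} // disposing flushes the writer and finishes the gzip data; compressed stays open

long compressedLength = compressed.Length;
Console.WriteLine($"Original: {original.Length} chars, compressed: {compressedLength} bytes");

// Unwind the layers in the opposite order to read the text back
compressed.Position = 0;
string roundTripped;
using (var gzip = new GZipStream(compressed, CompressionMode.Decompress))
using (var reader = new StreamReader(gzip))
{
    roundTripped = reader.ReadToEnd();
}
Console.WriteLine(roundTripped == original);
```

Disposal order matters here: the GZipStream only writes its final block when it is disposed, which is why the length is read after the using block closes.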
      

So you can cross the streams in .NET, but be careful!

Benefits of streams

Incremental data processing - you don’t need to load everything into memory

Abstraction of the backing store - you don’t need to know how it works, it’s just a stream

Flexibility / control - low-level binary file operations

Random access / seeking - jump straight to any part of a file without reading everything before it

Composability / pipelines - chain multiple streams together to perform additional processing e.g. encryption, compression etc
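As an illustration of incremental processing, HashAlgorithm.ComputeHash(Stream) reads its input in chunks rather than requiring the whole content as one array. A sketch, assuming a hypothetical sample.bin file created on the fly with random data:

```csharp
using System;
using System.IO;
using System.Linq;
using System.Security.Cryptography;

// Create a sample file (hypothetical name) so the example is self-contained
var data = new byte[1_000_000];
new Random(42).NextBytes(data);
File.WriteAllBytes("sample.bin", data);

using var sha = SHA256.Create();

// Incremental: ComputeHash(Stream) pulls the file through in chunks, so memory use stays small
byte[] streamed;
using (var fs = File.OpenRead("sample.bin"))
{
    streamed = sha.ComputeHash(fs);
}

// For comparison: hashing the array directly needs all of it in memory first
byte[] whole = sha.ComputeHash(data);

Console.WriteLine(BitConverter.ToString(streamed));
Console.WriteLine(Enumerable.SequenceEqual(streamed, whole));
```

Both produce the same hash; the streamed version would work just as well on a file far larger than available memory.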

Don’t forget System.IO.Abstractions and System.IO.Abstractions.TestingHelpers (https://www.nuget.org/packages/System.IO.Abstractions) - they put File, Directory and friends behind interfaces so you can mock file system and stream access in your tests - TEST, TEST, TEST :-)

OS project/utility of the week

https://www.nirsoft.net/

The NirSoft web site provides a unique collection of small and useful freeware utilities, all of them developed by Nir Sofer.

Ivan Reitman OC (October 27, 1946 – February 12, 2022) was a Czechoslovak-born Canadian film and television director, producer and screenwriter. He was best known for his comedy work, especially in the 1980s and 1990s. He was the owner of The Montecito Picture Company, founded in 1998.

Films he directed include Meatballs (1979), Stripes (1981), Ghostbusters (1984), Ghostbusters II (1989), Twins (1988), Kindergarten Cop (1990), Dave (1993), and Junior (1994). Reitman also served as producer for such films as Animal House (1978), Beethoven (1992), Space Jam (1996), and Private Parts (1997).