Buffered file copy (copy data in blocks)

Started by kevin, March 20, 2012, 09:32:20 AM


kevin


 This is a little tidbit showing how we can copy a file by buffering data in memory to avoid some disc thrashing.   What it does, rather than reading a byte from the source and then writing it to the destination file, which has massive amounts of latency on both the read and the write, is read a chunk of the file into memory, then write that chunk back out to the disc.   This takes advantage of the fact that file systems / hard drives prefer to read/write large chunks of data in one hit, rather than byte by byte.

  Here's the basic principle (slow version):

PlayBASIC Code:
SrcFile$ ="SomeFile.Data"
DestFile$ =Replace$(SrcFile$,".","_copy.")

If SrcFile$<>DestFile$

	Size_Of_File =FileSize(SrcFile$)
	HunkSize =$10000

	SrcFileHandle =ReadNewFile(SrcFile$)
	DestFileHandle =WriteNewFile(DestFile$)

	TempBank =NewBank(HunkSize)

	Hunks =Size_Of_File/HunkSize

	; Copy the file in whole chunks
	For Hunklp=1 To Hunks
		SpoolData(SrcFileHandle,DestFileHandle,TempBank,HunkSize)
	Next

	; Copy any remaining bytes
	SpoolData(SrcFileHandle,DestFileHandle,TempBank,Size_Of_File-(Hunks*HunkSize))

	; Kill temp buffer
	DeleteBank TempBank

	; Close files
	CloseFile SrcFileHandle
	CloseFile DestFileHandle

EndIf


Print "DONE"
Print SrcFile$
Print DestFile$
Print FileSize(SrcFile$)
Print FileSize(DestFile$)


Sync
WaitKey




Function SpoolData(SrcFileHandle,DestFileHandle,Bank,Size)

	; Read a block of data from the source file into the temp buffer
	For lp=0 To Size-1
		PokeBankByte Bank,lp,ReadByte(SrcFileHandle)
	Next

	; Spit this block of data back out to the output file
	For lp=0 To Size-1
		WriteByte DestFileHandle,PeekBankByte(Bank,lp)
	Next

EndFunction
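For readers outside PlayBASIC, the same chunked-copy pattern can be sketched in Python (the filenames and chunk size here are just made up for the example; it's the read-a-block / write-a-block loop that matters):

```python
import os, tempfile

def copy_in_chunks(src_path, dest_path, chunk_size=0x10000):
    # Read a block into memory, then write the whole block back out,
    # instead of copying byte by byte.
    with open(src_path, "rb") as src, open(dest_path, "wb") as dest:
        while True:
            chunk = src.read(chunk_size)   # read up to chunk_size bytes
            if not chunk:                  # empty bytes => end of file
                break
            dest.write(chunk)              # write the whole block in one hit

# Example usage with a small temporary file
with tempfile.TemporaryDirectory() as tmp:
    src  = os.path.join(tmp, "SomeFile.Data")
    dest = os.path.join(tmp, "SomeFile_copy.Data")
    data = bytes(range(256)) * 1000        # 256,000 bytes of test data
    with open(src, "wb") as f:
        f.write(data)

    copy_in_chunks(src, dest)

    with open(dest, "rb") as f:
        assert f.read() == data            # copy is byte-identical
```

Note that `read()` happily returns fewer bytes than asked for at the end of the file, so this version doesn't need the separate "remaining bytes" step.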






    We can make this much more efficient in PlayBASIC by reading/writing blocks of bytes rather than individual bytes.   So here's a version using the ReadChr$ / WriteChr functions.


PlayBASIC Code:
SrcFile$ ="SomeFile.Data"
DestFile$ =Replace$(SrcFile$,".","_copy.")

If SrcFile$<>DestFile$

	Size_Of_File =FileSize(SrcFile$)
	HunkSize =$10000

	SrcFileHandle =ReadNewFile(SrcFile$)
	DestFileHandle =WriteNewFile(DestFile$)

	Hunks =Size_Of_File/HunkSize

	; Copy the file in whole chunks
	For Hunklp=1 To Hunks
		SpoolData(SrcFileHandle,DestFileHandle,HunkSize)
	Next

	; Copy any remaining bytes
	SpoolData(SrcFileHandle,DestFileHandle,Size_Of_File-(Hunks*HunkSize))

	; Close files
	CloseFile SrcFileHandle
	CloseFile DestFileHandle

EndIf


Print "DONE"
Print SrcFile$
Print DestFile$
Print FileSize(SrcFile$)
Print FileSize(DestFile$)


Sync
WaitKey




Function SpoolData(SrcFileHandle,DestFileHandle,Size)

	; Read a block of data from the source file into a string
	s$=ReadChr$(SrcFileHandle,Size)

	; Spit this block of data back out to the output file
	WriteChr DestFileHandle,s$,Size

EndFunction







leopardps

Is there an 'ideal' chunk size to read/write to/from the disk?  Or is that a device-dependent thing?

I understand reading/writing in small sections (bytes, words, etc.) is inefficient for the reasons you mention, but I imagine that there is also an 'optimal' size, like a disk sector, or something.

For instance, is it faster to read a 1GB chunk from a disk, or 4 chunks of 256MB? Or will they be about the same either way? (I assume that no native disk 'chunk' sizes are going to be as large as 256MB, at least not yet, maybe in a few years, lol!)  If the disk read cache was 256MB, then that would be the 'optimal' size to read/write, correct?

kevin

  Well, a few meg is more than enough (generally), as the cache in your average HD is unlikely to be huge.  So trying to pull chunks bigger than it can cope with isn't going to make it faster.   The annoying thing about PC land is how the variance in hardware can make big differences, both positively and negatively...  So I tend to assume the lowest common denominator, then not bother about it :)
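If you want to see the effect on your own machine, here's a rough benchmark sketch in Python (file size and chunk sizes are arbitrary picks for the example, and the numbers will vary a lot with hardware and OS caching). The big jump is from byte-by-byte to any reasonable block; past a few hundred KB the gains tend to flatten out:

```python
import os, tempfile, time

def time_copy(src_path, dest_path, chunk_size):
    """Copy src to dest with the given chunk size; return elapsed seconds."""
    start = time.perf_counter()
    with open(src_path, "rb") as src, open(dest_path, "wb") as dest:
        while True:
            chunk = src.read(chunk_size)
            if not chunk:
                break
            dest.write(chunk)
    return time.perf_counter() - start

with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "test.bin")
    with open(src, "wb") as f:
        f.write(os.urandom(1024 * 1024))   # 1 MB of random test data

    # Try a range of chunk sizes, from pathological to comfortable
    for chunk_size in (1, 4096, 64 * 1024, 1024 * 1024):
        dest = os.path.join(tmp, "copy_%d.bin" % chunk_size)
        elapsed = time_copy(src, dest, chunk_size)
        print("chunk %8d bytes: %.4f s" % (chunk_size, elapsed))
```

Bear in mind the OS file cache absorbs a lot of the latency on a warm run, so this mostly measures per-call overhead rather than raw disc seeks; the byte-by-byte case is still reliably the slowest.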