Merge and split MS Word files
Thread poster: Samuel Murray
Samuel Murray
Samuel Murray  Identity Verified
Netherlands
Local time: 02:33
Member (2006)
English to Afrikaans
+ ...
Jan 18, 2013

G'day everyone

A client sent me 100 MS Word files that I need to run some macros on and do some stuff with. It would be a lot simpler if it was just one loooooooong MS Word file. Do you know of reliable utilities that can merge multiple MS Word files and then split it up again into the original file names later?

I can do merging and splitting if the files are plain text, but in this case the files contain formatting, so I can't just convert it to text and then merge and split it.

The files are in DOC format and I generally use Word 2003 (although I do have Word 2007 available too). There are no tables and text boxes, fortunately -- just formatted text.

I understand that there are utilities that allow you to process multiple MS Word files, but I need to run some pretty complex macros on the file(s), so such multifile utilities probably won't work for me. I just need to merge it all and later split it all.

Any ideas?

Thanks
Samuel


 
Tony M
Tony M
France
Local time: 02:33
Member
French to English
+ ...
SITE LOCALIZER
Handy utility Jan 18, 2013

I did once come across a handy utility (around $20) called something like 'split/join' (it actually comes as 2 separate utilities, one for each feature), which does exactly that.

The only real snag I found was that when splitting the files out again, I couldn't find any way to reinstate their original filenames. My workaround was to include a 'filename' field in an added header/footer in each doc (still a pain, though, doing that on 100 docs!), which meant the customer did at least have that as a reference at the end of the day.

Sorry I can't give you the actual source of this utility (it was on a PC 4 lives ago!), but I originally found it quite easily using Google. Hopefully, they may even have improved it by now!


 
Meta Arkadia
Meta Arkadia
Local time: 08:33
English to Indonesian
+ ...
Folder Action Jan 18, 2013

Samuel Murray wrote:
It would be a lot simpler if it was just one loooooooong MS Word file.

Even simpler, and less risky, would be to process the files automatically one by one, without joining them. I can think of something for OS X (a "folder action"), and I suppose there is something similar for Windows.
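
As a rough illustration of the one-file-at-a-time approach, here is a minimal Python sketch. The function name `process_folder` and the `process` callback are assumptions made for this example; the callback stands in for whatever actually runs the macros on a single file, which is not shown here.

```python
from pathlib import Path
from typing import Callable, Dict


def process_folder(folder: Path, process: Callable[[Path], None]) -> Dict[str, str]:
    """Apply `process` to each .doc file in `folder`, one file at a time.

    Returns a report mapping each file name to "ok" or to an error message,
    so that one problem file does not stop the rest of the batch.
    """
    report: Dict[str, str] = {}
    for path in sorted(folder.glob("*.doc")):
        try:
            process(path)  # hypothetical per-file step (e.g. run the macros)
            report[path.name] = "ok"
        except Exception as exc:
            # Record the failure and carry on with the next file.
            report[path.name] = f"failed: {exc}"
    return report
```

Because every file keeps its own name throughout, nothing has to be split or renamed afterwards, and the report shows straight away which file caused trouble.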

Cheers,

Hans

[Edited at 2013-01-18 10:21 GMT]


 
Rolf Keller
Rolf Keller
Germany
Local time: 02:33
English to German
Macros may consume a long time Jan 18, 2013

Samuel Murray wrote:
A client sent me 100 MS Word files that I need to run some macros on and do some stuff with. It would be a lot simpler if it was just one loooooooong MS Word file.


Caution! It depends on the macros. Some macros have quadratic running time: if one file needs 8 seconds, two such files combined will need 32 seconds, and 128 files combined will need about 36 hours. It would be very annoying to discover, after 20 hours, that something has gone wrong because one of the original files contains some small peculiarity.

So, be aware of what your macros do. If they all run in linear time, there will be no problem, though.
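
The arithmetic behind these figures can be checked with a short Python sketch. It assumes the simplest possible model, in which the running time grows with the square of the number of merged files; the function names are made up for this example, and real macros may behave differently.

```python
def quadratic_estimate(seconds_for_one_file: float, n_files: int) -> float:
    """Estimated seconds for n merged files if time grows with n squared."""
    return seconds_for_one_file * n_files ** 2


def linear_estimate(seconds_for_one_file: float, n_files: int) -> float:
    """Estimated seconds for n merged files if time grows in proportion to n."""
    return seconds_for_one_file * n_files


for n in (1, 2, 128):
    q = quadratic_estimate(8, n)
    lin = linear_estimate(8, n)
    print(f"{n:>3} files: quadratic {q:>7.0f} s ({q / 3600:.1f} h), linear {lin:>5.0f} s")
```

Under this model, 128 merged files take 8 × 128² = 131,072 seconds, or roughly 36.4 hours, whereas a linear macro would need only 1,024 seconds (about 17 minutes).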


 
Philippe Etienne
Philippe Etienne  Identity Verified
Spain
Local time: 02:33
Member
English to French
Potential workaround with TagEditor Jan 18, 2013

Trados 2007 has a Glue feature (SDL 2007>SDL Trados 2007 Freelance>Trados>Tools>SDL Trados Glue in the Programs tree), so maybe you could drag/drop your Word files to the TagEditor window, save them all (100 clicks/shortcuts to save icons, 100 clicks/shortcuts to close each file), use the Glue feature, work on the resulting TagEditor file, then split it back to the original files.

There may also be options to work in Word (rtf?) with the glued file if you need to run macros.

Philippe


 
Diana Coada (X)
Diana Coada (X)  Identity Verified
United Kingdom
Local time: 01:33
Portuguese to English
+ ...
I use Nitro Pdf Jan 18, 2013

Save the Word files as pdf, use Nitro to merge or split them and then convert them back to Word with all the formatting intact.

 
Rolf Keller
Rolf Keller
Germany
Local time: 02:33
English to German
What does "formatting" mean anyway? Jan 18, 2013

Diana Coada wrote:

Save the Word files as pdf, use Nitro to merge or split them and then convert them back to Word with all the formatting intact.


PDF files do not contain any "real" formatting info. They contain only an optical representation.

Any links between text elements and styles get lost, because all the styles get lost. Even simple elements like tab characters get lost. The difference between hard and soft hyphens gets lost, as well as the difference between hard and soft line breaks.

Actually you get a file that may look similar to the original, but is not properly editable. It's just data garbage, like a fax.


 
Samuel Murray
Samuel Murray  Identity Verified
Netherlands
Local time: 02:33
Member (2006)
English to Afrikaans
+ ...
TOPIC STARTER
PDF idea seems a bit odd Jan 18, 2013

Rolf Keller wrote:
Diana Coada wrote:
Save the Word files as pdf, use Nitro to merge or split them and then convert them back to Word with all the formatting intact.

PDF files do not contain any "real" formatting info. They contain only an optical representation.


I must say that I did not seriously think that the PDF method would work either. However, if Diana is willing, I can send her two or three sample files to convert to PDF and then convert back again, to see if the formatting remains intact after all. But I doubt if it would.


 
Diana Coada (X)
Diana Coada (X)  Identity Verified
United Kingdom
Local time: 01:33
Portuguese to English
+ ...
No problem, Samuel Jan 18, 2013

You're welcome to send me the files

 
Samuel Murray
Samuel Murray  Identity Verified
Netherlands
Local time: 02:33
Member (2006)
English to Afrikaans
+ ...
TOPIC STARTER
Reportback Jan 19, 2013

Diana Coada wrote:
You're welcome to send me the files


I sent Diana three sample files for the round-trip experiment. She merged them in PDF and then converted the PDF to an output DOC file. As I had suspected, the round-trip was not very successful, even though it appeared quite promising at first.

99.9% of the actual text appears to have survived the conversion (if we don't take extra line breaks into account, and if we don't take hidden text into account).

In the output DOC file, hard returns were inserted in mid-sentence in several places (as can be expected from a conversion from PDF). Also, the margins were awfully narrow in at least one place, which caused text to appear broken in mid-word.

In the converted output DOC file, several lines of text from the top of each file were missing (even though they were present in the PDF file).

There were also one or two instances of character changes between the PDF and the output DOC. In one file, backslashes were replaced by yen signs (though interestingly when I copied the yen signs to a new, blank document, they magically changed back to backslashes again). Most characters remained intact, however (e.g. the original files had multiple types of quotes, and all of the quote types survived the round-trip).

Which brings us to the most important changes:

1. The original files had hidden text, which was not present in the PDF, and consequently was not present in the output DOC file.

2. The original files had text in black, grey and red, but the text also had styles applied; these styles were missing in the PDF, and consequently were missing in the output DOC file.

Thanks, Diana, for doing this experiment with us.


 
Meta Arkadia
Meta Arkadia
Local time: 08:33
English to Indonesian
+ ...
The wrong way Jan 20, 2013

I'm still convinced merging/splitting is not the way to go. Even if the PDF trick had worked, I see no way to arrive at the original (100) files with their appropriate names, apart from splitting and naming them manually.

I would create an Automator action more or less like this:



You just dump your 100 files on the Folder Action (that looks like a folder), wait a few seconds, and that's all there is to it.

This is not blatant OS X promotion. In fact, Microsoft made around 100 ready-to-use MS Office "actions" available, if not more, to be used with Automator, and those are the actions I use most often. The nice guys from Redmond wouldn't have done that if there weren't a Windows alternative. The trouble is, I don't know of any, and although AutoHotkey comes to mind, I don't know if it works.

Cheers,

Hans


 
Samuel Murray
Samuel Murray  Identity Verified
Netherlands
Local time: 02:33
Member (2006)
English to Afrikaans
+ ...
TOPIC STARTER
A merger would have a splitter Jan 20, 2013

Meta Arkadia wrote:
I'm still convinced merging/splitting is not the way to go. Even if the PDF trick had worked, I see no way to arrive at the original (100) files with their appropriate names, apart from splitting and naming them manually.


Well, a good merger would have an appropriate splitter to go along with it, that names the files correctly. It would not be strange to me if the merger would place an extra page between merged files (with page breaks) that includes a special code with the file's original name in it.
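
The bookkeeping behind such a merger/splitter pair can be sketched in a few lines of Python. This is only an illustration on plain strings, not on formatted Word content: the marker format `@@@FILE:name@@@` and the function names are assumptions made for this example, and it assumes that no line of the real text looks like a marker.

```python
import re
from typing import Dict

# Hypothetical marker format; choose something that cannot occur in the text.
MARKER = "@@@FILE:{name}@@@"
MARKER_RE = re.compile(r"^@@@FILE:(.+)@@@$", re.MULTILINE)


def merge(files: Dict[str, str]) -> str:
    """Join all files into one text, each preceded by a marker line with its name."""
    return "".join(f"{MARKER.format(name=name)}\n{text}\n" for name, text in files.items())


def split(merged: str) -> Dict[str, str]:
    """Recover the original name -> text mapping from a merged text."""
    # re.split with a capture group yields:
    # [text before first marker, name1, body1, name2, body2, ...]
    pieces = MARKER_RE.split(merged)
    result: Dict[str, str] = {}
    for name, body in zip(pieces[1::2], pieces[2::2]):
        # Drop the single newline that merge() added on each side of the text.
        result[name] = body[1:-1]
    return result


files = {"chapter01.doc": "First file.\nSecond line.", "chapter02.doc": "Second file."}
assert split(merge(files)) == files
```

In Word itself the same idea would have to operate on formatted ranges rather than strings, with the marker as its own paragraph between the merged files; that part is not shown here.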

I would create an Automator action more or less like this...


I've never heard of this feature for Windows. It would have to be integrated with MS Word itself, wouldn't it?

Although AutoHotkey comes to mind, I don't know if it works.


Well, yes, I could script the repetitive actions in AutoIt or AutoHotKey. But doing so means having to figure out what happens during each of my macros (e.g. what possible error messages may pop up) so that I can script the appropriate responses to those actions. Otherwise the script will break (best case scenario) or do some damage to the computer while I'm not looking (worst case scenario).

I'm positive that merging and splitting must be possible in an MS Word macro. I can script it in AutoIt (in fact, I worked out how to do it and managed to do the merging already), but surely an MS Word macro would be better.


 
Rolf Keller
Rolf Keller
Germany
Local time: 02:33
English to German
XXX->PDF->XXX will not work, if XXX is a fully editable format Jan 20, 2013

Samuel Murray wrote:

As I had suspected, the round-trip was not very successful


That was to be expected. You would probably have seen even more problems if you had converted all 100 files.

Merging/splitting via PDF can work only if the DOC files meet a **lot** of rigid criteria. In general, the DOC->PDF conversion deletes much info. The PDF->DOC conversion adds much info, but this will never be the previously deleted info, unless the converter software includes a clever clairvoyant.

How would you check the 100 files upfront to make sure that none of them contains any unconvertible features?

styles were missing in the PDF, and consequently were missing in the output DOC file.


Even if the DOC->PDF converter included the styles, maybe in the form of comments, this could not work under all circumstances. Just imagine a setting like "Apply this to the whole document". Or imagine that there are differently defined styles with identical names.


 
Natalie
Natalie  Identity Verified
Poland
Local time: 02:33
Member (2002)
English to Russian
+ ...

MODERATOR
SITE LOCALIZER
Haven't you tried to do this directly in Word? Jan 20, 2013

For Word 2007:
http://www.ehow.com/how_5833976_merge-files-word-2007.html

For Word 2010 (it looks exactly like in 2007 version!):
http://www.wikihow.com/Merge-Documents-in-Microsoft-Word
(even with a video)


 
Tony M
Tony M
France
Local time: 02:33
Member
French to English
+ ...
SITE LOCALIZER
A Word macro Jan 20, 2013

Can't vouch for it, but I found this, and from the posting date, it predates W2007, 2010, etc.


Sub ConcatenateAllWordFiles()
    ' Run this from the (empty) document that should receive the merged text.
    ' Note: Application.FileSearch exists only in Word 2003 and earlier.
    Dim target As Document
    Dim source As Document
    Dim i As Long

    Set target = ActiveDocument

    With Application.FileSearch
        .NewSearch
        .LookIn = "C:\Test" 'Set this to your directory full of files.
        .SearchSubFolders = True 'Set this to False if you don't want subfolders included
        .Execute

        For i = 1 To .FoundFiles.Count
            If LCase(Right(.FoundFiles(i), 4)) = ".doc" Then
                ' Open the source read-only; it is only copied from.
                Set source = Documents.Open(FileName:=.FoundFiles(i), _
                    ConfirmConversions:=False, ReadOnly:=True, _
                    AddToRecentFiles:=False)
                source.Content.Copy

                ' Paste at the very end of the target document,
                ' then close the source without saving.
                target.Activate
                Selection.EndKey Unit:=wdStory
                Selection.Paste
                source.Close SaveChanges:=wdDoNotSaveChanges
            End If
        Next i
    End With
End Sub
posted by pompomtom at 5:33 PM on April 7, 2005

HOWEVER, that doesn't solve the problem of how to split them out again


 