Word count with 2 different languages in the source document
Thread poster: Ana Lopez
Ana Lopez
Mexico
ProZ.com member since 2013
English > Spanish
Jun 4, 2014

Hello!!

I'm working on a PDF document that has German and English in two "columns", and I only have to translate the English part. Do you know any way I can count ONLY the English words?

Trados has statistics, but I don't know if there is a tool to count by language.

The only way I can think of is to count them manually. Do you know anything faster?

Thank you.


 
Jack Doughty
United Kingdom
Russian > English
In Memoriam
Convert to Word Jun 4, 2014

You can convert it to Word using OCR software. ABBYY FineReader and ABBYY PDF Converter come to mind.

 
Ana Lopez
Mexico
ProZ.com member since 2013
English > Spanish
TOPIC STARTER
Can Word count by language? Jun 4, 2014

Thanks! I already converted it to Word. However, since the columns are mixed with images, I cannot just "select" the English column. That's why I'm asking whether there is any way other than marking it page by page. Maybe there isn't, just asking.

 
Tony M
France
ProZ.com member
French > English
SITE LOCALIZER
Are languages set? Jun 4, 2014

When you did the conversion using OCR, were you able to set the languages of the relevant bits?

If the text DOES have its 'language' attributes correctly set, then you can do an ordinary word count in Word; then search and replace all for 'any character' + language attribute = (say) German, replacing with nothing.

Then do another word count, and this will be the EN words without the German ones; in fact, you don't even need to have done the preliminary word count, I was just thinking of subtracting the EN from the total, since TOTAL – EN = German, of course!

Naturally, if the language attribute was NOT correctly set in the first place, this won't work; but at least you'll know for next time.
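
For anyone who would rather script the check than run the search and replace by hand, here is a minimal sketch of the same idea in Python, using only the standard library. It assumes the converted file is a .docx and that the OCR step wrote an explicit language tag (the `w:lang` run property) on each run of text. The function name `count_words_by_language` and the `UNMARKED` label are made up for this illustration. Runs with no explicit tag inherit their language from a style or the document default, which this sketch does not resolve, so they are simply counted as "unmarked". Headers, footnotes and text boxes are not handled.

```python
import zipfile
import xml.etree.ElementTree as ET
from collections import Counter

W_URI = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W_URI}
P_TAG = f"{{{W_URI}}}p"      # paragraph element
R_TAG = f"{{{W_URI}}}r"      # run element (a stretch of uniformly formatted text)
VAL_ATTR = f"{{{W_URI}}}val"
UNMARKED = "unmarked"        # label for runs without an explicit language tag


def count_words_by_language(docx_file):
    """Return a Counter mapping language tag (e.g. 'en-US') to word count.

    docx_file may be a path or a binary file-like object.
    """
    with zipfile.ZipFile(docx_file) as zf:
        root = ET.fromstring(zf.read("word/document.xml"))

    counts = Counter()
    lang, chunk = UNMARKED, ""
    for el in root.iter():  # document order: a paragraph comes before its runs
        if el.tag == P_TAG:
            # Paragraph boundary: a word never continues into the next paragraph.
            counts[lang] += len(chunk.split())
            chunk = ""
        elif el.tag == R_TAG:
            lang_el = el.find("w:rPr/w:lang", NS)
            run_lang = UNMARKED if lang_el is None else lang_el.get(VAL_ATTR, UNMARKED)
            if run_lang != lang:
                # Language changed: count what we have collected so far.
                counts[lang] += len(chunk.split())
                lang, chunk = run_lang, ""
            # Adjacent runs in the same language are joined before splitting,
            # because Word often breaks a single word across several runs.
            chunk += "".join(t.text or "" for t in el.findall("w:t", NS))
    counts[lang] += len(chunk.split())
    return +counts  # unary plus drops entries whose count is zero
```

Calling `count_words_by_language("converted.docx")` (the file name is only a placeholder) gives one figure per language tag. If nearly everything lands under "unmarked", the language attributes were not set during conversion and this route won't help.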

BTW, you say that the images are stopping you from selecting all the EN column, but why? Are they in merged cells or something? You ought to be able to process your table in such a way as to unmerge all the cells, which will probably push all the images into the l/h column or something, but will leave you with two clean columns you can select properly.

You are SURE it is in a proper Word table? OCR conversions have a nasty habit of 'organizing' (well, that's not what I call it...) text into newspaper-style columns, in which case you'll have a harder job on your hands trying to sort it out. It might even be simpler to convert everything to single-column and remove all column breaks from the document, and then see what you have left...


 
Tony M
France
ProZ.com member
French > English
SITE LOCALIZER
Failing that... Jun 4, 2014

...if the original document really is organized neatly into two columns, why not just do another 'dummy' OCR run on it, selecting ONLY the EN column as you go through, so you'll actually have a document at the end of it that ONLY contains the EN you need to translate; you might even be able to use this for your translation, or at worst, it will be a useful intermediate stage for your word count.

[Edited at 2014-06-04 20:58 GMT]


 
Ana Lopez
Mexico
ProZ.com member since 2013
English > Spanish
TOPIC STARTER
I'll try the option Jun 4, 2014

I'll try a dummy OCR conversion in ABBYY, identifying only English as the language, and see how it goes with the find & replace.

Thank you so much Tony M.!!


 
Ümit Karahan
Turkey
English > Turkish
Paste only text Jun 5, 2014

Hi.

Try copying everything with Ctrl+A, Ctrl+C and then pasting it as text only into a blank Word document. That way you get rid of the images.



[Edited at 2014-06-05 01:14 GMT]
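
The same "text only" result can be approximated without opening Word. The rough Python sketch below reads the text elements straight out of a .docx and ignores drawings and pictures. The function name `docx_plain_text` is invented for this example. It assumes an ordinary body-text layout, so text boxes nested inside paragraphs may be picked up twice. Note that the result still contains both languages, so it gives the total count, not the English-only one.

```python
import zipfile
import xml.etree.ElementTree as ET

W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def docx_plain_text(docx_file):
    """Return the body text of a .docx, one line per paragraph, without images."""
    with zipfile.ZipFile(docx_file) as zf:
        root = ET.fromstring(zf.read("word/document.xml"))

    paragraphs = []
    for p in root.iter(W + "p"):
        parts = []
        for el in p.iter():
            if el.tag == W + "t":        # literal text
                parts.append(el.text or "")
            elif el.tag == W + "tab":    # keep tabs so neighbouring words stay apart
                parts.append("\t")
            elif el.tag == W + "br":     # manual line break
                parts.append("\n")
            # Drawings and pictures carry no text elements here and are skipped.
        paragraphs.append("".join(parts))
    return "\n".join(paragraphs)
```

A total word count is then just `len(docx_plain_text("converted.docx").split())`, where the file name is again only a placeholder.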


 

