Questions about how to use alternative (DOS) software to deal with MS files.
269 responses total.
Someone just sent me a zip file which contains three .rtf files, two of which start with the same 8 characters. pkunzip, when it reached the second file, said a file by that name already exists and gave me a choice of overwriting the file or not. I said no, and got out only two not three files. Is there some way to use pkunzip or some other DOS unzip program to get out all three files? And is there a DOS rtf reader that will handle 'Windows rtf' files? The one I have (VIEW) does not seem to work, just gave me all the rtf tags. The simplest solution was to ask for them to convert the rtf to ascii and give 8-character distinct file names.
I converted the rtf to ascii and here is what I got. Is this some cousin of
html? Why would it be sent to a translator? I omitted company name.
C:\DOWNLOAD\ANCILLAR.RTF
Normal;tw4winMark;tw4winError;tw4winPopup;tw4winJump;tw4winExternal;tw4winIntern
l;tw4winTerm;DO_NOT_TRANSLATE;
<stf "F3.01">
<sourcecharset "Normal">
<sourcelanguage "English (US)">
<sourcehyphenation "Turn Off">
Solution_278004r2¨Phase1_convertions_for_translation¨mif¨278004-002-body.mif">
<bb "Cross-Reference Formats">
<bb "Variable"><:v "<:bb "paratext[Chapter Title]">" 1>
<vfmt "Running H/F 3"><:bb "paratext[Chapter Title]">
<bb "Variable"><:v "<:bb "curpagenum">" 2>
<vfmt "Current Page #"><:bb "curpagenum">
<bb "Variable"><:v "User's Guide" 3>
<vfmt "Manual Title">{>User's Guide<}{User's Guide<0}
(there was lots more - with all the actual English words doubled)
RTF is a file format that stores typesetting information as well as text, though the file itself is text. If you want to send someone a Word document, you can save it as RTF and send that without losing the formatting. It's easier to transmit, since it's text and smaller than the original word doc, and portable to other applications (and among different versions of Word). It also can't contain viruses the way a Word doc can.
That is interesting to know about the viruses and about RTF being portable between versions of WORD. I used a program called VIEW to convert the RTF to ascii. All those tags are TRADOS tags. The person who sent the zip file said she would send me an rtf file. I have no idea what that will be. What is TRADOS? Why are all the English phrases in it doubled? They apparently converted the TRADOS to rtf. This is all from a Czech translation agency which for some reason is translating a computer manual from English to Macedonian and found my website.
TRADOS is software for translations and localization. The Czechs are doing software localization - anyone have any experience in that? We had a French friend who was doing it and had to make the French short enough to fit into the space occupied by the English. His primary complaint is that the software was badly written even in English.
(RTF stands for Rich Text Format, IIRC.)
I thought it was an abbreviation of RTFM.
Still no new file. I cannot even imagine what format the Macedonian might arrive in. We may end up going the scan-print-pdf route again. I can convert pdf to pbm or ps now and read it, at least. Perhaps they can export to html?
Try moving the first file, giving it a new name, then let pkunzip overwrite
when it gets to the second file. Thus you would end up with
file1new # the first file with a new name
file1 # the second file with the first name
file3 # the third file
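A minimal Python sketch of another way around the collision, assuming Python is available on whatever machine handles the archive. Instead of letting two long names truncate to the same 8 characters, it gives every member its own 8.3 name before writing. The helper names `short_name` and `extract_all` and the `~1` suffix scheme are invented for illustration; this is not how pkunzip itself behaves, and characters other than spaces that DOS rejects are not handled.

```python
import zipfile
from pathlib import Path


def short_name(long_name, taken):
    """Return an upper-case 8.3 name that is not yet in the set `taken`."""
    stem, dot, ext = long_name.rpartition(".")
    if not dot:  # no extension at all
        stem, ext = long_name, ""
    stem = stem.replace(" ", "_").upper()[:8]
    ext = ext.replace(" ", "_").upper()[:3]
    tail = "." + ext if ext else ""
    candidate = stem + tail
    counter = 1
    while candidate in taken:
        # Hypothetical scheme: shorten the stem and append ~1, ~2, ...
        suffix = "~" + str(counter)
        candidate = stem[: 8 - len(suffix)] + suffix + tail
        counter += 1
    taken.add(candidate)
    return candidate


def extract_all(zip_source, dest_dir):
    """Extract every file in the archive under a distinct 8.3 name.

    Returns a dict mapping each original member name to the name written.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    taken = set()
    mapping = {}
    with zipfile.ZipFile(zip_source) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue
            # Keep only the last path component of the stored name.
            base = info.filename.replace("\\", "/").split("/")[-1]
            target = short_name(base, taken)
            (dest / target).write_bytes(archive.read(info))
            mapping[info.filename] = target
    return mapping
```

Calling `extract_all("download.zip", "out")` (both names are placeholders) would write all three files and return the mapping, so nothing is lost to an overwrite prompt.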
How do I use pkunzip to unzip files one at a time rather than all at once? This is purely theoretical by now. They finally realized that I am not a native speaker of Macedonian, which I have been telling them all along, when I pointed out that I cannot correct grammar errors in their Macedonian. Somehow I was not expecting this to turn into paid employment. I think somewhere I may have a .doc file for pkunzip.....
pkzip/unzip comes with its own help file ...
You mean pkunzip /h or /? I will try that. Thanks. Curious what form the Macedonian is in. One time someone sent me Cyrillic that consisted solely of ?? ? ????? ??. They saved it as text but not as Cyrillic.
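The row of question marks is what typically happens when Cyrillic text is written out in an encoding that has no Cyrillic letters. A small Python sketch of the effect, with the sample word and the choice of code page 1251 being assumptions for illustration rather than anything known about the sender's setup:

```python
# Sample Cyrillic text (an assumption, not taken from the actual file).
text = "Македонски"

# Saving as plain ASCII: every letter that ASCII lacks is replaced by "?".
as_ascii = text.encode("ascii", errors="replace")

# Saving in a Cyrillic code page keeps the letters, one byte each.
as_cp1251 = text.encode("cp1251")
round_trip = as_cp1251.decode("cp1251")

print(as_ascii)    # only question marks survive
print(round_trip)  # the original word comes back
```

Once the text has been reduced to question marks, the original letters cannot be recovered from the file; the sender has to save it again in a Cyrillic encoding.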
I think the idea with pkunzip was to unzip all the files, tell it not to
overwrite, rename the first file, and unzip again letting it overwrite.
But they figured out how to name all three files differently before zipping
them this time. They sent me a zip file containing three files in Macedonian
- Macedon.zip (8 characters). This file unzipped to mac- bod.rtf and mac-
fro.rtf and ancilar.rtf (they apparently shortened the long file names by
removing a number that was identical in both files, but did not shorten the
resulting names to 8 characters and body was cut off to bod). The first two
files have spaces in them, which DOS is not happy with. I could not view
either of them with my DOS-based VIEW program (file does not exist or is zero
length) or with Newdeal (which uses a menu system and found the files but
could not open them). I could not rename the files with DOS rename. I could
look at them with LIST by typing LIST *.rtf. I could view the ancilar.rtf
file with Newdeal. Newdeal does not handle Cyrillic and instead told me
Language 1024 and then /'e08/ etc. Lots of strings of coded characters. Where
can I find a list of how the codes correspond to ASCII numbers, assuming that
they do?
I had asked these people to convert their files to ascii or at least
to WORD so I could read them. I can read WORD 6 for Win31. Or they could
have printed everything out on a page or two and emailed me a scanned file.
Or converted to html if they knew how to do that. Apparently they don't know
how to remove TRADOS tags - they said they removed some of them. The TRADOS
version refers to PDF a few times. I could have dealt with a PDF file but
not a TRADOS file of Cyrillic in Windows rtf format.
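On the question of how the coded characters map to letters: in RTF a non-ASCII character is normally written as a backslash, an apostrophe, and two hex digits (`\'e0`), giving one byte in whatever code page the document declares. A minimal Python sketch that decodes such escapes, assuming the file uses Windows code page 1251 (a common Cyrillic code page; the real file may declare a different one) and using a made-up helper name `decode_hex_escapes`:

```python
import re

# Matches RTF hex escapes of the form \'hh (backslash, apostrophe, two hex digits).
HEX_ESCAPE = re.compile(r"\\'([0-9a-fA-F]{2})")


def decode_hex_escapes(rtf_fragment, codepage="cp1251"):
    """Replace each \\'hh escape with the character it stands for.

    `codepage` is an assumption; check the \\ansicpg value in the RTF header.
    """
    def replace(match):
        byte = bytes([int(match.group(1), 16)])
        return byte.decode(codepage, errors="replace")

    return HEX_ESCAPE.sub(replace, rtf_fragment)


print(decode_hex_escapes(r"\'cc\'e0\'ea"))  # prints "Мак" under cp1251
```

This only handles the hex escapes; the other RTF control words and braces are left in place.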
I think you can give pkunzip a list of files to extract from an archive, but it's been so long since I used it on the command line I can't remember. These days I tend to use WinZip or Aladdin Expander.
You could use Norton Utilities to go in and change the names of those files to not have spaces. Word 6 for Win31 should be able to read rtf files, I think.
Thanks, I will try Word 6 on them next time, if there is a next time. I wish people would learn to use Windows enough to be able to convert their files to ASCII. Lots of Windows users don't even know how to rename files. Next problem, which Mark has been trying to help us with. Tim Ryan gave us a partly-working Canon Multipass CMP3500 printer which uses a 25M zipped (self-executing) driver that we cannot manage to fit into our computer. Mark suggested downloading one for a previous model (3000) which we are about to try. But since we are not interested in doing anything except simple printing, would a driver from another Canon BJ printer work as well? Win95 comes with lots of those already (BJ4000 takes the same cartridge, I think). The C3000.exe file is 8M - I cannot imagine how they got the CMP3500.exe file up to 25M when the two models appear to do the same things (fax, copy, print). At one point the computer said it needed a bidirectional cable - is that the same as IEEE shielded cable or would an ordinary printer cable work? We can get the IEEE for $5 from MCM Electronics. Our friend offered to lend us the one he bought for only $25 retail. I have a non-windows program that claims to print with BJ4000 driver but it told me the port was already in use - is that a cable problem? The port was in use only because we had it hooked up to the Canon printer. A friend's Canon BJ printer will print from DOS (printscreen, text editor) but this one will not.
You could also use DOS rename -- "rename mac-?bod.rtf mac-body.rtf"
> Thanks, I will try Word 6 on them next time, if there is a next time.
> I wish people would learn to use Windows enough to be able to convert
> their files to ASCII.
Translation: I want other people to learn Windows so I don't have to. ;)
I would be delighted if nobody used Windows, but if they insist on it they should learn how to rename their files and convert them to something non-MS. I will try using rename that way - did not know you could do it batch. We are about to try two borrowed printer cables and the M3000 driver. We can order inkjet ink for $30/pint, which is 12 refills for a 40ml cartridge (or 24 for a 20 ml, or about 100 for a 4 ml). Useful when the fax paper runs out but people keep giving us more of it.
While you can use rename batch, the particular example I gave was intended to match just one file (which is why the replacement doesn't have any wildcards). An example of using rename in a batch mode would be "rename *.lst *.txt" which would rename any files with an extension of lst to have an extension of txt instead.
How would the rename program deal with file 1.txt and file21.txt if I told it rename file?1.txt file1.txt? Would it not have to rename two different files to the same name?
re #21: It will return "Duplicate file name or file in use" if there is a conflict for file names. You would be attempting to give both files the same name, which would be a conflict.
It's not too easy to make file names containing spaces in MS-DOS or the Windows command prompt. I found I could do it, from the Windows 98 command prompt anyway, by enclosing that part of the file name in quotes, for example:
  rename abc.txt "a c".txt
You could use this in reverse, too, to rename one file. For example:
  rename "file 1".txt 1file.txt
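To see ahead of time which files a `?` pattern would catch, here is a rough Python sketch using the standard `fnmatch` module. It only approximates DOS wildcard matching (DOS has its own edge cases around the dot and trailing `?`, and `fnmatch` treats `[` specially), and the function name `matching_files` is made up for the example:

```python
from fnmatch import fnmatchcase


def matching_files(pattern, names):
    """Return the names that match the pattern, ignoring case as DOS does.

    In fnmatch, "?" matches exactly one character and "*" matches any run.
    """
    return [n for n in names if fnmatchcase(n.lower(), pattern.lower())]


names = ["file 1.txt", "file21.txt", "file1.txt", "mac- bod.rtf"]

# Two names match, so renaming both to file1.txt would be a conflict.
print(matching_files("file?1.txt", names))

# The single-file example from earlier in the thread matches just one name.
print(matching_files("mac-?bod.rtf", names))
```

If the first list has more than one entry and the new name has no wildcards, the rename cannot succeed for all of them.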
I just tried the first of these with MS DOS 6.22 and got: Parameter format not correct - "a. Maybe the DOS that comes with Windows can handle this but mine cannot. Hans in Sweden informed me that 'port already in use' means that the printer will work only with Windows. Somewhere there is a driver only 3M long that should work with the model 3000 but the link to it at driverguide is broken. We will try the 8M model on the 3000 and maybe also on the 3500. The 3000 claims to also work with Win31 but I could not find a driver. How would one go about making a DOS driver for these printers?
What do you mean by a DOS driver, Sindi? The idea of having drivers for printers built into the operating system is a difference between DOS and Windows.
A driver for a non-windows program, for instance for WP or Newdeal or printgf or gifprint or ghostscript or the other programs I have been using to print text or graphics with. They all have Epson 9-pin and 24-pin and IBM 24-pin generic drivers that work with most dot-matrix printers. Is there some generic print driver that might work with a bubble-jet which will not even Print Screen? Newdeal has drivers for BJC-4000 and BJC-4200, and the 4200 and MPC3000 are supposed to work with the same Win95 driver but when I try using ND it tells me 'port already in use'.
Well, you should ask the manufacturers of those programs.
I just wrote 'tech support' at Newdeal, which seems to be defunct but the support person is still supporting it. I don't think WP is supported any longer but I will check user groups.
EUREKA! This was probably written by the guy who recruited me for beta
testing. He is amazingly good at his job.
NewDeal Technical Support Document 272
Specific Printer Notes, Canon
_________________________________________________________________
All Canon BubbleJets [long section on other models omitted here]
QUESTION: Why does my Canon printer spit out a blank page between each
printed page?
ANSWER: Canon forces the printing to start either 1/4 or 1/2 inch from
the top of the page, depending on the printer type. This causes the
printable area on the page to be that much less (10 3/4 or 10 1/2 on
an 11 inch page). The software thinks the paper is 11 inches long and
keeps track, down to the micro inch, where it is on the paper. When it
reaches the 10 1/2 inches, your printer calls the page done and ejects
it. However, the software thinks there is still another 1/2 inch to
print (usually the footer area). Now when you insert a new sheet of
paper, the software finishes printing the rest of the page (normally a
blank footer) and form feeds to the next sheet.
To correct what the software thinks the paper size is, you need to run
the Preferences application and change the default settings. Select
the Printer button. In the next window, select the 'Change
Defaults....' button. Change both the Document size and the Paper size
to a height of 10.875 and click OK. Try a new document to see if the
paper size is correct. You may need to go back and change the height a
few times until it is correct (the height changes in .125 increments).
Usually either 10.875 (for 1/4 inch forced top margin printers) or
10.75 ( for 1/2 inch forced top margin printers) will work.
Canon MultiPASS and MultiPASS C5000
To use the MultiPASS with NewDeal, you will need to select an
emulation printer driver from those listed within NewDeal. Close the
MultiPASS Background. On the MultiPASS control panel:
1. Press FUNCTION
2. Press 0 on the numeric keypad
3. Press the right search arrow '>' until #4 DOS Printing appears
4. Press START/COPY, ON appears
5. Press START/COPY again
The display will read 'Printer Mode.' You are now ready to print from
a DOS application.
The MultiPASS can be used with the following print drivers in Windows
and in DOS programs:
* BJC 4300, 4200, 4100, 4000, 70, 600, 600e, 800 (color)
* BJ 200, 200e, 200ex, 230, 20, 10ex
* Epson LQ 2550 & 2500 (color)
* Epson LQ 510,850, 500
Note: The MultiPASS was designed to print using Windows applications.
Canon does not guarantee print results from DOS.
Last Modified 26 Jun 1999
I just found a new way to remove spaces from files. If they are sent to my ISP webmail as an attachment and I download them, the name 20411-4 NYS Ed NC82.doc (three spaces) downloads and saves to my home directory as 20411 (Unix apparently ends things just before a -). Comments? I suggested to someone who was going to fax me a document for translation that he scan it and send it as a BW gif or even a BW pdf. It came as WORD. Now I get to test ANTIWORD, the latest weapon against MS, which claims to read the most recent WORD even in Cyrillic and other character sets.
There is something odd about that last post. From a distance it looks like baby Jesus. But the voice-over sounds more like Willie Nelson.
Looks like the Shroud of Turin to me. (Suggestion for future reference: Don't post binary files in responses. They can mess up a person's terminal settings. Better to save the file in your home directory and tell people what its name is.)
Okay, take a look at 20411-4.doc. When I was downloading, it seemed to imply that this is WORD97 and I told one of my programs to view it as that. I see (here, not when using pico) <84> <97> etc. Perhaps these are codes for Cyrillic characters? I took a look at this with Jim's text editor and we found the name of the company and the strings .tif and .png so I tried renaming it to those and the view program could not recognize the format. This is an image viewer that does 40 formats of images. Every time I master a new format someone invents another one. I can handle gif and even print it, pdf (convert to pbm or ps and view and print those), rtf (without Cyrillic, anyway), even various WORDs, but now they send TRADOS and whatever this one might be. I expect a fax tomorrow, I hope. Would anyone with WORD97 like to take a look at the complete text of this one and tell me why no programs can tell the number of lines in it? PICO told me 'long line length'. I chopped off the beginning for anonymity but I can forward you the whole thing.
A few of the programs that we looked at this with identify it as WORD 97 or WORD 8 (same thing) and it came labelled .doc. I sent a copy to a friend with WORD97. There are 299 lines each about three times normal length, which is probably why my programs cannot identify the number of lines as they do not recognize lines over a certain length. Jim says some of the lines may be extremely long and some very short. They were full of nulls.
I'm not sure what "20411-4.doc" is. It looks vaguely like a word
document with all the NULs stripped out and perhaps otherwise mangled.
Word documents are stored in what MicroSoft calls "structured storage"
and the Linux world calls "OLE" files; it's a binary file structure
similar in some ways to a floppy disk image. Removing the NULs and
otherwise randomly mangling it will destroy the data structures and
leave largely useless remains.
(Needless to say, pasting it in as a response will only further
mangle whatever binary data was left...)
"20411-4" (no .doc) on the other hand, *does* appear to be a valid
MicroSoft word document, at least after a fashion. The "structured
storage" data structures are completely intact, so at that level at
least, it's completely intact. I've forgotten at which offset nFIB is
stored, and I never completely understood which nFib corresponded to
what version of Word. As best I can determine, the actual data stream
of the word document consists of just this: ^A^M which likely means
there's one special object and the obligatory paragraph ending.
I think all MicroSoft products that use "structured storage" always
arrange to write a special document summary stream, that contains
a few interesting strings, including:
Copper Translation Service
_PID_GUID
_PID_HLINKS
{5816E8C0-39C3-11D5-96A5-00107A185D3C}
and in unicode:
E:\Current Copper Work\Untitled.tif
I'm guessing that the bracketed number is some sort of UUID, and may
perhaps be the registration number of the copy of Word that made this
file. Maybe. As a companion to the WordDocument stream, there's the
"Data" stream, which contains the objects of the document. I don't see
anything that looks like TIFF, but there is something that looks like a
valid PNG header at offset +0x133. In fact, it appears to be a 1482 x
1287 1 bit non-interlaced image; you can get it here:
http://www-personal.umich.edu/~mdw/20411-4.png
The first line appears to read:
l. Srpskohrvatski jezik i knjizevnost 2 - -
There is a hat over the last "z" on the line, and there are about 22
more mostly similar text lines. Looks like a report card, maybe.
And, before you ask, no, I specifically disclaim any interest in dealing
with any further files like this. If *you* want to be able to do this,
you should strongly consider installing Linux, there's something called
"libole" out in the Linux world that understands MicroSoft "structured"
storage, and MicroSoft has published documents describing the content and
fields of MicroSoft word documents, which are well distributed on the
web today. I learned about this stuff mainly so that I could read the
formatting information in word documents; I have no actual idea how the
"Data" stream is actually formatted, and was actually quite surprised to
see PNG magic in the file.
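For anyone who wants to repeat the PNG hunt described above without a structured-storage library, here is a rough Python sketch. It checks for the 8-byte signature that OLE compound files start with, looks for the PNG signature anywhere in the raw bytes, reads the image size from the IHDR chunk, and copies everything up to the IEND chunk. It assumes the embedded PNG is stored contiguously in the file, which is not guaranteed for structured storage, and the function names are invented for the example:

```python
import struct

OLE_MAGIC = bytes.fromhex("d0cf11e0a1b11ae1")  # start of an OLE compound file
PNG_MAGIC = b"\x89PNG\r\n\x1a\n"               # start of every PNG file


def is_ole_container(data):
    """True if the bytes begin with the OLE compound file signature."""
    return data.startswith(OLE_MAGIC)


def png_info(data):
    """Find the first PNG signature and decode its IHDR chunk, or return None."""
    start = data.find(PNG_MAGIC)
    if start < 0:
        return None
    chunk = data[start + 8 : start + 8 + 21]  # length(4) + type(4) + IHDR data(13)
    if len(chunk) < 21 or chunk[4:8] != b"IHDR":
        return None
    width, height, depth, color, _comp, _filt, interlace = struct.unpack(
        ">IIBBBBB", chunk[8:21]
    )
    return {"offset": start, "width": width, "height": height,
            "bit_depth": depth, "color_type": color, "interlace": interlace}


def extract_png(data):
    """Return the bytes from the PNG signature through the IEND chunk, or None."""
    start = data.find(PNG_MAGIC)
    if start < 0:
        return None
    pos = start + 8
    while pos + 12 <= len(data):
        length, chunk_type = struct.unpack(">I4s", data[pos : pos + 8])
        pos += 12 + length  # length field + type + data + CRC
        if chunk_type == b"IEND" and pos <= len(data):
            return data[start:pos]
    return None
```

Reading the file with `open(name, "rb").read()` and writing the result of `extract_png` to a new `.png` file would be the whole procedure; if the image data turns out to be split across sectors, the output will be damaged and a proper OLE reader is needed instead.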
Sindi, could you please expurgate #30? It made my terminal unusable when I read it; I had to log off and log back on again.
It did nasty things to my xterm as well.
How do I expurgate 30? (The full command and which prompt to type it at
please). I did not post the whole file or the part with the company name
to save embarrassing them in public but I suppose they will never know.
We saw something in there like untitled1.tif and at the end of the same line
also PNG but renaming it to PNG did not make something viewable.
The line you typed out for me means 'Serbocroatian language and literature'
so I suspect it is someone's diploma. Marcus, I am astonished and grateful
that you were able to do this transformation and I hope never to come up with
this format again as the company should know better. I told them to send a
gif or a pdf file. A png would have worked. A WORD png is something else.
They may have converted tif to png to WORD?
I have notified NEWDEAL that their WORD filter does not work on these
things, but then again my friend with WORD97 also could not read anything in
it. He tried various programs one of which said 'mangled email attachment'.
The author of VIEW 1.70 says it is a graphic and his program only converts
text. So does ANTIWORD.
I am off to download and maybe even translate Marcus' png.
I tried, from the bbs prompt, expurgate 69 30 and scribble 69 30. No luck. png file downloaded, thanks Marcus. It is 2/3 the size of the WORD (?) file.
- Backtalk version 1.3.30 - Copyright 1996-2006, Jan Wolter and Steve Weiss