BBC BASIC, the programming language specified by the BBC for its groundbreaking Computer Literacy Project, is 25 years old today! Designed originally for the BBC Microcomputer, BBC BASIC has since been implemented on at least seven different processor types and 30 different computer platforms. Today it is still, in the form of BBC BASIC for Windows, a popular language for programming PCs.
In the early 1980s the BBC set out to educate the public in the use of computers. It was soon realised that the wide variety of different machines, operating systems and languages of the day would cause difficulties. The decision was made to target the educational material at a standard machine running a standard language; thus the BBC Microcomputer and BBC BASIC were born.
The original version of BBC BASIC was written by Sophie Wilson of Acorn Computers; she later developed a more sophisticated implementation for the Acorn Archimedes. Other versions, including those for the Cambridge Computer Z88, Amstrad Notepad range and Microsoft Windows were written by Richard Russell, until recently a Senior Research and Development Engineer with the BBC.
> Other versions, including those for the Cambridge Computer Z88, Amstrad Notepad range and Microsoft Windows were written by Richard Russell, until recently a Senior Research and Development Engineer with the BBC.
I recall the BBC used 7-bit ASCII for characters, and bytes above 127 were used for tokens.
Modern computers seem to use 8-bit character codes, e.g. ISO-8859-1 which is numerically equivalent to Unicode page zero IIRC.
So, have modern versions of BBC BASIC evolved to cope, or would this break compatibility?
I ask because computers are increasingly having to talk over the internet and exchange text files.
In message <rASzg.54372$1g.2...@newsfe1-win.ntli.net> "Kryten" <kryten_droid_obfustica...@ntlworld.com> wrote:
> <n...@rtrussell.co.uk> wrote in message news:1154418933.157247.21900@m73g2000cwd.googlegroups.com...
> > Other versions, including those for the Cambridge Computer Z88, Amstrad Notepad range and Microsoft Windows were written by Richard Russell, until recently a Senior Research and Development Engineer with the BBC.
> I recall the BBC used 7-bit ASCII for characters, and bytes above 127 were used for tokens.
Strings can contain any high bit set character.
> Modern computers seem to use 8-bit character codes, e.g. ISO-8859-1 which is numerically equivalent to Unicode page zero IIRC.
Er, that would be 7-bit ASCII then. What computers use in a given context varies considerably.
> So, have modern versions of BBC BASIC evolved to cope, or would this break compatibility?
There are ways for example to handle 16-bit strings in BASIC, but there isn't extensive support in RISC OS for them (apart from the RISC OS 5 font manager) and there isn't any native BASIC support for them.
> I ask because computers are increasingly having to talk over the internet and exchange text files.
The problem hasn't really changed all that much recently.
--
Peter Naulls - pe...@chocky.org | http://www.chocky.org/
RISC OS Community Wiki - add your own content | http://www.riscos.info/
> <n...@rtrussell.co.uk> wrote in message news:1154418933.157247.21900@m73g2000cwd.googlegroups.com...
>> Other versions, including those for the Cambridge Computer Z88, Amstrad Notepad range and Microsoft Windows were written by Richard Russell, until recently a Senior Research and Development Engineer with the BBC.
> I recall the BBC used 7-bit ASCII for characters, and bytes above 127 were used for tokens.
> Modern computers seem to use 8-bit character codes, e.g. ISO-8859-1 which is numerically equivalent to Unicode page zero IIRC.
> So, have modern versions of BBC BASIC evolved to cope, or would this break compatibility?
> I ask because computers are increasingly having to talk over the internet and exchange text files.
I think you are confusing two issues: The encoding the BBC BASIC program is written *in* (which is, as you say, limited to 7-bit ASCII, though not for everything, see below) and the encoding of text that programs written in BASIC can handle. The latter has always been 8-bit for BASIC strings, so there is no problem handling and sending text files in any 8-bit encoding, e.g., ISO-8859-1.
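To make the point about 8-bit strings concrete, here is a rough sketch in Python rather than BBC BASIC. It relies only on the fact that ISO-8859-1 gives every character the byte value equal to its Unicode code point, so such text always fits in one byte per character. The function name and the sample text are invented for the illustration.

```python
def latin1_bytes(text: str) -> bytes:
    """Encode text as ISO-8859-1: exactly one byte per character."""
    data = text.encode("iso-8859-1")
    # Each byte value is the Unicode code point of the character it stands for.
    assert list(data) == [ord(ch) for ch in text]
    return data


# Hypothetical sample: "cafe" with e-acute (U+00E9), then a pound sign (U+00A3).
sample = "caf\u00e9 \u00a35"
encoded = latin1_bytes(sample)

# The bytes with the top bit set are the ones a 7-bit channel could not carry.
top_bit_set = [b for b in encoded if b > 127]
```

An 8-bit string type can hold `encoded` unchanged, which is all that is needed to read or write such a text file.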
The 7-bit limitation of BBC BASIC does mean that your programs cannot have variable names with top-bit-set characters (most notably accented characters), but they *can* have comments with top-bit-set characters, which is quite useful, since comments are written in natural language.
Martin
--
Martin Wuerthner MW Software http://www.mw-software.com/
spamt...@mw-software.com [replace "spamtrap" by "info" to reply]
Peter Naulls <pe...@chocky.org> wrote:
> There are ways for example to handle 16-bit strings in BASIC, but there isn't extensive support in RISC OS for them (apart from the RISC OS 5 font manager) and there isn't any native BASIC support for them.
One advantage that a library implementing this sort of thing on top of BASIC has over languages like C is that BASIC uses counted strings rather than zero-terminated strings, so you *can* use UCS2 as your encoding easily - it's also very efficient for random access. Downside is your strings consume twice as much memory as before - not bad if a high percentage of the characters need more than 7-bits, but wasteful if not.
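The trade-off described here can be sketched as follows, in Python rather than BASIC. The functions `ucs2_encode` and `ucs2_char_at` are hypothetical names, and little-endian byte order is an arbitrary choice. Because the buffer carries its own length, embedded zero bytes cause no trouble, and character i always sits at byte offset 2*i.

```python
import struct


def ucs2_encode(text: str) -> bytes:
    """Pack each character as one little-endian 16-bit unit (16-bit range only)."""
    for ch in text:
        if ord(ch) > 0xFFFF:
            raise ValueError("character does not fit in 16 bits")
    return struct.pack("<%dH" % len(text), *(ord(ch) for ch in text))


def ucs2_char_at(data: bytes, index: int) -> str:
    """Constant-time lookup: character i occupies bytes 2*i and 2*i + 1."""
    (code,) = struct.unpack_from("<H", data, 2 * index)
    return chr(code)


plain = "HELLO"
packed = ucs2_encode(plain)   # ten bytes, five of them zero
```

For plain ASCII text like `plain`, every second byte is zero, which is the wasted memory mentioned above.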
Also, if you want to be able to represent character codes for Gothic characters and the like, you'll need 32-bit values, so you'd be better off with UTF-16 (or, more likely, UTF-8) rather than UCS2
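As a small check of the Gothic example (a Python sketch; the variable names are arbitrary): the Gothic block lies above U+FFFF, so a single 16-bit unit cannot hold one of its letters. UTF-16 uses a surrogate pair instead and UTF-8 uses four bytes.

```python
gothic_ahsa = "\U00010330"   # GOTHIC LETTER AHSA, code point U+10330

code_point = ord(gothic_ahsa)
fits_in_16_bits = code_point <= 0xFFFF

# UTF-16 represents it as two 16-bit units (a surrogate pair).
utf16 = gothic_ahsa.encode("utf-16-be")

# UTF-8 represents it as four bytes.
utf8 = gothic_ahsa.encode("utf-8")
```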
If I was to be writing a new program that needed to handle multi-byte characters, I'd usually go for UTF-8 encoding (which the RO5 font manager also supports). Requiring arbitrary random accesses into the string would make me consider that position carefully (but probably still go for UTF-8)
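The cost of random access in UTF-8 can be seen in a minimal sketch like the one below (Python, with a hypothetical function name, and assuming the input is valid UTF-8). Finding the n-th character means walking the bytes from the start and counting lead bytes, since characters vary in length.

```python
def utf8_char_at(data: bytes, index: int) -> str:
    """Return the index-th character of valid UTF-8 data by scanning from the start."""
    if index < 0:
        raise IndexError("character index out of range")
    count = -1
    start = None
    for pos, byte in enumerate(data):
        if byte & 0xC0 != 0x80:          # not a continuation byte, so a new character
            count += 1
            if count == index:
                start = pos              # first byte of the wanted character
            elif count == index + 1:
                return data[start:pos].decode("utf-8")
    if start is None:
        raise IndexError("character index out of range")
    return data[start:].decode("utf-8")  # wanted character was the last one
```

The loop is linear in the length of the string, compared with a single multiplication for a fixed-width encoding.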
Stewart Brodie wrote:
> Peter Naulls <pe...@chocky.org> wrote:
>> There are ways for example to handle 16-bit strings in BASIC, but there isn't extensive support in RISC OS for them (apart from the RISC OS 5 font manager) and there isn't any native BASIC support for them.
> One advantage that a library implementing this sort of thing on top of BASIC has over languages like C is that BASIC uses counted strings rather than zero-terminated strings, so you *can* use UCS2 as your encoding easily - it's also very efficient for random access. Downside is your strings consume twice as much memory as before - not bad if a high percentage of the characters need more than 7-bits, but wasteful if not.
> Also, if you want to be able to represent character codes for Gothic characters and the like, you'll need 32-bit values, so you'd be better off with UTF-16 (or, more likely, UTF-8) rather than UCS2
> If I was to be writing a new program that needed to handle multi-byte characters, I'd usually go for UTF-8 encoding (which the RO5 font manager also supports). Requiring arbitrary random accesses into the string would make me consider that position carefully (but probably still go for UTF-8)
One possibility is to have three different kinds of string (8-bit, 16-bit or 32-bit) as necessary, but hide this as an implementation detail - as long as you stuck to BASIC you'd never know! One type could be promoted to the next highest when necessary.
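A rough sketch of how such hidden promotion might work, in Python and purely for illustration: the class `PromotingString` and its methods are hypothetical, not part of any BASIC implementation. The string starts at one byte per character and is re-packed at two or four bytes only when a character arrives that needs the extra width.

```python
class PromotingString:
    """Stores code points in the narrowest of 1, 2 or 4 bytes per character."""

    def __init__(self, text: str = "") -> None:
        self.width = 1                    # bytes per character
        self.data = bytearray()
        self.append(text)

    @staticmethod
    def _width_for(text: str) -> int:
        top = max(map(ord, text), default=0)
        return 1 if top <= 0xFF else 2 if top <= 0xFFFF else 4

    def _promote(self, new_width: int) -> None:
        # Re-pack every stored character at the wider size.
        chars = [self[i] for i in range(len(self))]
        self.width = new_width
        self.data = bytearray()
        for ch in chars:
            self.data += ord(ch).to_bytes(new_width, "little")

    def append(self, text: str) -> None:
        needed = self._width_for(text)
        if needed > self.width:
            self._promote(needed)
        for ch in text:
            self.data += ord(ch).to_bytes(self.width, "little")

    def __len__(self) -> int:
        return len(self.data) // self.width

    def __getitem__(self, index: int) -> str:
        if not 0 <= index < len(self):
            raise IndexError("character index out of range")
        start = index * self.width
        return chr(int.from_bytes(self.data[start:start + self.width], "little"))
```

Indexing stays constant-time at every width, and the caller only ever sees characters; the price is a full copy of the string each time it is promoted.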
>> If I was to be writing a new program that needed to handle multi-byte characters, I'd usually go for UTF-8 encoding (which the RO5 font manager also supports). Requiring arbitrary random accesses into the string would make me consider that position carefully (but probably still go for UTF-8)
> One possibility is to have three different kinds of string (8-bit, 16-bit or 32-bit) as necessary, but hide this as an implementation detail - as long as you stuck to BASIC you'd never know! One type could be promoted to the next highest when necessary.
Why complicate matters? How often do you need to index into a string where it is time critical? If you usually manipulate strings by searching or iteration, then UTF-8 is a pretty good choice for most Western European languages.
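As an illustration of why searching and iteration are cheap in UTF-8 (a Python sketch with hypothetical function names, assuming both inputs are valid UTF-8): a substring search can run directly on the bytes, because a lead byte never looks like a continuation byte, and front-to-back iteration only needs to spot where the next character starts.

```python
def utf8_find(haystack: bytes, needle: str) -> int:
    """Byte offset of needle in UTF-8 haystack, or -1 if absent."""
    # A plain byte search is enough; no decoding of the haystack is needed.
    return haystack.find(needle.encode("utf-8"))


def utf8_iter(data: bytes):
    """Yield the characters of valid UTF-8 data one at a time, front to back."""
    start = 0
    for pos in range(1, len(data) + 1):
        # A character ends where the data ends or the next lead byte begins.
        if pos == len(data) or data[pos] & 0xC0 != 0x80:
            yield data[start:pos].decode("utf-8")
            start = pos


greeting = "Gr\u00fc\u00dfe aus K\u00f6ln"      # German text with three accented letters
greeting_bytes = greeting.encode("utf-8")
offset = utf8_find(greeting_bytes, "K\u00f6ln")
```

For text like `greeting`, most characters are still one byte each, so the encoded form is only slightly longer than an 8-bit encoding would be.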
>>> If I was to be writing a new program that needed to handle multi-byte characters, I'd usually go for UTF-8 encoding (which the RO5 font manager also supports). Requiring arbitrary random accesses into the string would make me consider that position carefully (but probably still go for UTF-8)
>> One possibility is to have three different kinds of string (8-bit, 16-bit or 32-bit) as necessary, but hide this as an implementation detail - as long as you stuck to BASIC you'd never know! One type could be promoted to the next highest when necessary.
> Why complicate matters? How often do you need to index into a string where it is time critical? If you usually manipulate strings by searching or iteration, then UTF-8 is a pretty good choice for most Western European languages.
> James
Depends what you're doing; in some of the work I've done this is fairly useful and time-critical. I like the idea of the automatic promotion, and it has given me an idea for something I've been trying to do for some time now...