How to pack GSM-7 characters into septets
This post was written by Jeroen on July 6, 2009
Posted Under: SMS
- Sending out an SMS in text mode
- Sending out an SMS in PDU mode
- More on the SMS PDU
- Sending a flash SMS message
- What are EMS messages?
- Combining SMS messages
- WAP Push over SMS
- WAP Push SMS encoding
- EMS and WAP Push support
- Another WAP Push over SMS encoding
- SMS based applications
- Text formatting with EMS
- GSM-7 Encoding with the GNU iconv library
- How to pack GSM-7 characters into septets
- References
- Setting Voicemail Waiting Indication via SMS
- SMS via Email
Once we have our text in the GSM-7 character set, we’re ready to write the septets. As was shown before, the mapping of septets onto octets is rather awkward; see 3GPP TS 23.038.
Here is the algorithm I use to achieve this:
/*
   GSM-7 packing routine.
   Written by Jeroen @ Mobile Tidings (http://mobiletidings.com)
*/
int                          /* Returns -1 for success, 0 for failure */
SMS_GSMEncode(
    int inSize,              /* Number of GSM-7 characters */
    char* inText,            /* Pointer to the GSM-7 characters. Note: I could
                                not have used a 0-terminated string, since 0
                                represents '@' in the GSM-7 character set */
    int paddingBits,         /* If you use a UDH, you may have to add padding
                                bits to properly align the GSM-7 septets */
    int outSize,             /* The number of octets we have available to write */
    unsigned char* outBuff,  /* A pointer to the available octets */
    int *outUsed             /* Keeps track of how many octets actually were used */
)
{
    int bits = 0;            /* Bits of the current septet already written */
    int i;
    unsigned char octet;

    *outUsed = 0;

    if( paddingBits && inSize > 0 )
    {
        if( outSize < 1 )
            return 0; /* buffer overflow */
        /* The first octet holds the padding bits (zeros) in its low bits,
           followed by as much of the first septet as fits */
        bits = 7 - paddingBits;
        *outBuff++ = (inText[0] & 0x7f) << (7 - bits);
        (*outUsed)++;
        bits++;
    }

    for( i = 0; i < inSize; i++ )
    {
        if( bits == 7 )
        {
            /* This septet was written completely into the previous octet */
            bits = 0;
            continue;
        }
        if( *outUsed == outSize )
            return 0; /* buffer overflow */
        /* Remaining high bits of this septet go into the low bits of the octet */
        octet = (inText[i] & 0x7f) >> bits;
        /* Fill the rest of the octet with the low bits of the next septet */
        if( i < inSize - 1 )
            octet |= (inText[i + 1] & 0x7f) << (7 - bits);
        *outBuff++ = octet;
        (*outUsed)++;
        bits++;
    }
    return -1; /* ok */
}
The padding bits are used to make sure the GSM-7 septets are written on a septet boundary. If you don’t use a User Data Header (UDH) for combining SMS messages or EMS text formatting or something else and your text starts at the first octet of the User Data (UD), you can leave out padding (set paddingBits to 0).
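To see what the padding does to the output, here is a minimal, self-contained sketch that packs septets with a bit accumulator. It is a different technique from SMS_GSMEncode above, and the name pack_septets, its signature and its return convention are assumptions made for this illustration only.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of the original routine.
   Packs n septets into out[], preceded by padding_bits zero bits.
   Returns the number of octets written, or -1 if out[] is too small. */
int pack_septets(const unsigned char *septets, size_t n,
                 unsigned padding_bits,
                 unsigned char *out, size_t out_size)
{
    unsigned long acc = 0;          /* pending bits, least significant first */
    unsigned nbits = padding_bits;  /* padding bits are zeros already in acc */
    size_t used = 0;
    size_t i;

    assert(padding_bits < 7);

    for (i = 0; i < n; i++) {
        /* Append the next septet above the bits that are still pending */
        acc |= (unsigned long)(septets[i] & 0x7f) << nbits;
        nbits += 7;
        /* Emit every complete octet */
        while (nbits >= 8) {
            if (used == out_size)
                return -1;
            out[used++] = (unsigned char)(acc & 0xff);
            acc >>= 8;
            nbits -= 8;
        }
    }
    /* Flush the last, partially filled octet (upper bits stay zero) */
    if (n > 0 && nbits > 0) {
        if (used == out_size)
            return -1;
        out[used++] = (unsigned char)(acc & 0xff);
    }
    return (int)used;
}
```

Under these assumptions, packing "hello" (0x68 0x65 0x6C 0x6C 0x6F) without padding gives the octets E8 32 9B FD 06, while packing "Lo" (0x4C 0x6F) with 1 padding bit gives 98 6F: the whole text is shifted up by one bit.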
If you have a UDH, then paddingBits can be calculated as follows:
paddingBits = ((UDHL + 1 ) * 8 ) % 7;
if( paddingBits ) paddingBits = 7 - paddingBits;
UDHL stands for User Data Header Length. I hope this helps everybody who is struggling with GSM-7 encodings.
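As a quick check of the formula, the calculation can be wrapped in a small function and tried on a few UDHL values. The function names below are hypothetical and only serve this example.

```c
#include <assert.h>

/* Hypothetical helper: padding bits needed after a UDH whose length
   field has the value udhl. The UDH occupies udhl + 1 octets, because
   the UDHL octet itself has to be counted as well. */
int gsm7_padding_bits(int udhl)
{
    int rem;

    assert(udhl >= 0);
    rem = ((udhl + 1) * 8) % 7;   /* bits past the last septet boundary */
    return rem ? 7 - rem : 0;     /* bits needed to reach the next one */
}

/* Hypothetical helper: number of septet positions taken up by the UDH
   together with its padding bits. */
int gsm7_udh_septets(int udhl)
{
    return ((udhl + 1) * 8 + gsm7_padding_bits(udhl)) / 7;
}
```

For a UDHL of 5 (6 octets, 48 bits) this yields 1 padding bit and 7 septets. For a UDHL of 6 (7 octets, 56 bits) it yields 0 padding bits and 8 septets, and for a UDHL of 3 (4 octets, 32 bits) it yields 3 padding bits and 5 septets.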







Reader Comments
Hello.
I have a question about your algorithm for padding the bits when using a UDH with septets. When you explain how to calculate the padding bits:
paddingBits = ((UDHL + 1 ) * 8 ) % 7;
if( paddingBits ) paddingBits = 7 - paddingBits;
Is the second line the same as the one inside the SMS_GSMEncode() function?
I mean, is this supposed to be calculated twice? My UDHL is 5, so the result of the first line is 6 for me. Should I pass 6 to SMS_GSMEncode(), or (7-6) = 1 (the result of the second line)?
Thanks, your posts are of great help.
Rodolfo,
If you have a UDHL of 5, the UDH consists of 6 octets (1 for the UDHL itself). These 6 octets take up 48 bits. Padding this with 1 bit makes it 49 bits, which is exactly 7 septets. So it seems to me that the 1 padding bit (as calculated) is right, and 1 is the value to pass to SMS_GSMEncode().
In this case the first block inside SMS_GSMEncode shifts the first septet left by 1 bit and writes it to the output buffer, so all 7 bits of that septet sit above the single padding bit. The rest of the algorithm then continues with the second septet.
The way septets are packed is a bit confusing, but have a look at my attempt to visually show how this works in “More on the SMS PDU”.
Jeroen
Hi Jeroen,
Thanks for the quick response. I’m sending the concatenated SMS messages, but when I receive them on my cell phone I only get “dirty” characters.
Two questions about the padding:
1) Does it have to be done in every single part of a larger message, or just in the first one?
2) When you do the padding on one message (that is part of a larger message), does this message make sense (can it be read) individually? Or only when you put it together with the other parts?
Thanks a lot, and sorry about the bad English.
Rodolfo,
The dirty characters probably mean you’ve got your character encoding or padding wrong…
1) The padding is to be done in every part of the larger message. The padding works the same with any type of UDH (see for instance “Text formatting with EMS”) and is completely independent of concatenation.
2) In general any device that doesn’t recognize a UDH or a particular IE within the UDH will ignore that information and act as if it wasn’t there. For concatenation this means that the individual parts will be treated as individual text messages. So if your concatenated message consists of 3 parts, the recipient will see 3 SMS messages (not necessarily in the order they were sent).
Cheers,
Jeroen
Hi,
I’ve written an application to send long SMS and it arrives as a long SMS on the mobile device but the text is garbled.
My transmit application performs the following processing:
– Creates a User Data Header for each message segment
– Field 1 (1 octet) – set to 6
– Field 2 (1 octet) – set to 8
– Field 3 (1 octet) – set to 4
– Field 4 (2 octets) – set to 0
– Field 5 (1 octet) – set to 3
– Field 6 (1 octet) – set to 1, 2, or 3 (depending on the particular segment)
– 7-bit encode each text message segment
– Set the ESM class to 0x40 in the Submit_SM PDU
– The data coding is set to 0.
– Create the final SMS by joining the text message segment to the relevant User Data Header and send it to the SMSC via the SMPP interface
Looking for some ideas/suggestions as to what I may be doing wrong.
Oh yeah, thanks for your posts – they have been of great help!
Chi,
The most common cause of garbled text is not starting your encoded text on a 7-bit boundary. In this article this issue is addressed by the padding bits. You’re lucky: the UDH you use (including the first octet that indicates the length) is 7 octets long. This corresponds to exactly 8 septets, so you don’t have to worry about the padding described in this article.
Could it be that SMPP doesn’t require you to 7-bit encode?
Jeroen
Jeroen,
Thanks very much for the speedy response. You were spot on – it appears that SMPP does not require the short message to be 7-bit encoded!
About the choice of UDH used, I did deliberately select it so that I didn’t have to worry about the padding.
Lastly, it is amazing how I get better support here than from my SMSC provider! Thanks once again.
So it turns out that with SMPP you don’t have to worry about padding bits or UDH size, as the 7-bit encoding is not used….
I am glad your problem is solved and that my blog is useful. Please post a link to my blog if you can.
Cheers,
Jeroen
I am still confused about the bit padding.
Here are 6 octets of UDH (including the UDHL):
11111111 11111111 11111111 11111111 11111111 11111111
Now, what I understand is:
1111111 1111111 1111111 1111111 1111111 1111111 111111(1)
In the above, the (1) at the end of the stream is the padding bit, right?
One more important question, about sending a long message in 7-bit encoding:
1. Is the whole text encoded with 7-bit encoding first and then divided over multiple PDUs?
Or is the whole text divided first according to size and then encoded separately for each PDU?
Kindly reply soon.
Martin,
You’re right about the padding bit. Think of it this way: in an SMS message every character has a fixed position, so you could implement a function that writes a single character at a given septet position in the User Data.
You’d use this function regardless of the presence of a UDH. The only difference is that with a UDH of 6 octets the first character you can write would be at position 7 (the space for characters 0 through 6 is wholly or partially occupied by the UDH). If you have a function like this you wouldn’t think about ‘padding’, but you would be executing the same logic.
Continuing this reasoning: if you have a function like this, you’d have to break up your text first according to size and then encode each PDU separately. Though large messages can be split over multiple SMS messages, you can’t split individual characters.
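A function along these lines might look like the following sketch. The name put_septet and its signature are assumptions for illustration. The sketch also assumes that the part of the buffer being written was zeroed beforehand and that the position lies past any UDH.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sketch: store one GSM-7 septet at septet position pos
   of the User Data buffer ud. Septet pos always starts at bit pos * 7,
   counted from the least significant bit of the first octet. */
void put_septet(unsigned char *ud, size_t ud_size,
                size_t pos, unsigned char septet)
{
    size_t bit = pos * 7;                  /* absolute bit offset */
    size_t octet = bit / 8;                /* octet holding the first bit */
    unsigned shift = (unsigned)(bit % 8);  /* bit offset inside that octet */
    unsigned value = (unsigned)(septet & 0x7f) << shift;

    assert(octet < ud_size);
    ud[octet] |= (unsigned char)(value & 0xff);

    /* With a shift of 2 or more the septet spills into the next octet */
    if (shift > 1) {
        assert(octet + 1 < ud_size);
        ud[octet + 1] |= (unsigned char)(value >> 8);
    }
}
```

With a UDH of 6 octets the first free position is 7, which starts at bit 49: octet 6, shifted up by one bit. That single skipped bit is the padding bit, without the function ever mentioning padding.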
I hope this helps,
Jeroen
Thanks a lot.. Jeroen!
The formula for the padding bits uses the UDHL. Could you clarify what the UDHL exactly is? Is it the number of octets in the UDH only, or the number of octets in the UDH plus 1 for the UDHL itself?
UDHL = LengthOf(UDH)
or
UDHL = LengthOf(UDH) + 1
(1 is for UDHL itself)
In the specs the UDHL is part of the UDH. UDHL is the number of octets remaining in the UDH. So the size of the UDH is UDHL + 1.
Jeroen
Hi,
I’ve read your sample code above, but it is written in C and I don’t understand C. Would you mind helping me translate it into VB6 code?
I’ve read your article about Combining SMS Messages too. In your example text (Lorem ipsum…) your PDU code was 986F79B90D4…, but when I used my routine to convert the text to a PDU, the result was: A0CCB7BCDC06A5E1F37A1B447EB3DF72D03C4D0785DB653A0B347EBBE7E531BD4CAFCB4161721A9E9EA7C769F7195466A7E92CD0BC4C0691DFA072BA3E6FBFC9207AB90D7FCB4169F7384D4E93EB6E3AA84E07B1C3E2B7BC0C2AD341E437FB2D2F83DAE1B33B0C0AB3D3F17AD805AAD2416577BA0D0A9341EDB43BDD06D9CBEE74B8CD02C5EBE939C8FD9ED3E5
I’m using VB6 to convert text to a PDU. This is my routine to generate it:
'====================Start of Code
Function ConvTxt(txt As String) As String
    Dim i As Integer
    Dim datArr1(1 To 256) As Byte
    Dim l As Integer
    Dim touw As String
    'no more than 160 chars
    If Len(txt) > 160 Then txt = Left$(txt, 160)
    l = Len(txt)
    ConvTxt = Right$("00" & Hex(Len(txt)), 2)
    For i = 1 To l
        datArr1(i) = Asc(Mid$(txt, i, 1))
    Next i
    'make a bit stream of septets
    touw = ""
    For i = 1 To l
        touw = ToBin7(datArr1(i)) + touw
    Next i
    'and convert it to octets
    While Len(touw) > 8
        ConvTxt = ConvTxt + Bin2Hex(Right$(touw, 8))
        touw = Mid$(touw, 1, Len(touw) - 8)
    Wend
    ConvTxt = ConvTxt + Bin2Hex(touw)
    Debug.Print ConvTxt
End Function

Function ToBin7(ByVal num As Byte) As String
    'convert to padded 7 place binary number
    While num > 0
        ToBin7 = Trim(num Mod 2) + ToBin7
        num = num \ 2
    Wend
    ToBin7 = Right$("0000000" + ToBin7, 7)
End Function

Function Bin2Hex(ByVal touw As String) As String
    'convert binary to a padded 2 place hex number
    Dim x As Integer
    Dim num As Long
    For x = 1 To Len(touw)
        If Mid$(touw, x, 1) = "1" Then
            num = num + 2 ^ (Len(touw) - x)
        End If
    Next x
    Bin2Hex = Right$("00" + Hex(num), 2)
End Function
'===============End of Code
So, what’s wrong with my code? Why is the generated PDU different from yours?
Thanks.
Thank you so much, Jeroen. I have done it successfully and can now send multi-part messages. Without your help it would have been quite impossible.
Thanks again.
Hi Jeroen,
Nice algorithm, but how can we decode the concatenated SMS?
Thanks
Uhm, dear Jeroen, I have the same problem as Jeff. It’s good that I saw this section. I hope you can help us convert the code to VB6; I am very confused by the conversion from text to PDU. *swirly eyes*
Does anyone here have VB6 code for PDU mode? Please help, I need it for my thesis.
I need help with the UDH. In the Logica API I always get a length of 48. How do I set the octets?
For GSM sms, does this need to be addressed?
http://www.3gpp.org/ftp/Specs/html-info/23038.htm
From 3GPP TS 23.038 V9.1.1 (2010-02)
If the total number of characters to be sent equals (8n - 1) where n=1,2,3 etc. then there are 7 spare bits at the end of the message. To avoid the situation where the receiving entity confuses 7 binary zero pad bits as the @ character, the carriage return or <CR> character (defined in clause 6.1.1) shall be used for padding in this situation, just as for Cell Broadcast.
If <CR> is intended to be the last character and the message (including the wanted <CR>) ends on an octet boundary, then another <CR> must be added together with a padding bit 0. The receiving entity will perform the carriage return function twice, but this will not result in misoperation as the definition of <CR> in clause 6.1.1 is identical to the definition of <CR><CR>. The receiving entity shall remove the final <CR> character where the message ends on an octet boundary with <CR> as the last character.
If not, then how is this case handled in gsm sms?
Tom,
In my personal experience, all modern devices look at the UDL (User Data Length; which in the case of GSM-7 is expressed in septets) to determine the size of the payload. This avoids any confusion regarding padding bits being interpreted as ‘@’ and <CR> being ignored.
In short, as long as you have your UDL correct, this does not need to be handled (I know my code doesn’t handle this).
It is a common bug to first encode the payload and then determine the UDL from the number of octets that the encoded payload occupies. In this case the section you quote may save you.
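To illustrate why the order matters, here is a small sketch that derives the UDL from the character count and only then works out the number of octets. The helper names are hypothetical and not part of the code in this post.

```c
#include <assert.h>

/* Hypothetical helper: UDL in septets for a GSM-7 message.
   Pass udhl < 0 when there is no UDH. With a UDH, the UDL also counts
   the septets covered by the UDH and its padding bits. */
int gsm7_udl(int udhl, int num_chars)
{
    int udh_septets = 0;

    assert(num_chars >= 0);
    if (udhl >= 0)
        udh_septets = ((udhl + 1) * 8 + 6) / 7;  /* round up to septets */
    return udh_septets + num_chars;
}

/* Hypothetical helper: octets needed to hold udl septets. */
int gsm7_octets(int udl)
{
    assert(udl >= 0);
    return (udl * 7 + 7) / 8;                    /* round up to octets */
}
```

Both 7 and 8 characters without a UDH occupy 7 octets, so the octet count alone cannot tell you whether the last 7 bits are padding or an ‘@’. Going from characters to UDL to octets avoids that ambiguity.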
Jeroen
Jeroen,
My complication is that I am actually trying to pack the text to be sent in an SMPP message that will then be sent via GSM to the handset. The length in SMPP is defined in octets.
Is this even valid? Or should the SMSC (the receiver of the SMPP message) be the one doing the packing?
Thanks
Tom
If you use SMPP you don’t have to pack the characters, the SMSC will do it for you. Likewise you don’t have to worry about padding bits.
Cheers,
Jeroen