How to pack GSM-7 characters into septets

This post was written by Jeroen on July 6, 2009
Posted Under: SMS
This entry is part 14 of 17 in the series Sending out an SMS

Once we have our text in the GSM-7 character set, we’re ready to write the septets. As shown before, the mapping is kind of awkward; see 3GPP TS 23.038.

Here is the algorithm I use to achieve this:

/*
   GSM-7 packing routine.
   Written by Jeroen @ Mobile Tidings (http://mobiletidings.com)
*/
int                                /* Returns -1 for success, 0 for failure */
SMS_GSMEncode(
   int             inSize,         /* Number of GSM-7 characters */
   char*           inText,         /* Pointer to the GSM-7 characters. Note: I could
                                      not have used a 0-terminated string, since 0 
                                      represents '@' in the GSM-7 character set */
   int             paddingBits,    /* If you use a UDH, you may have to add padding
                                      bits to properly align the GSM-7 septets */
   int             outSize,        /* The number of octets we have available to write */
   unsigned char*  outBuff,        /* A pointer to the available octets */
   int            *outUsed         /* Keeps track of how many octets actually were used */
)
{
   int             bits = 0;
   int             i;
   unsigned char   octet;
   *outUsed = 0;
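   if( paddingBits && inSize > 0 )
   {
      if( outSize < 1 )
         return 0; /* buffer overflow */
      bits = 7 - paddingBits;
      /* the low paddingBits bits stay 0; the first septet's low bits go above them */
      *outBuff++ = inText[0] << (7 - bits);
      (*outUsed) ++;
      bits++;
   }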
   for( i = 0; i < inSize; i++ )
   {
      if( bits == 7 )
      {
         bits = 0;
         continue;
      }
      if( *outUsed == outSize )
         return 0; /* buffer overflow */
      octet = (inText[i] & 0x7f) >> bits;
      if( i < inSize - 1 )
         octet |= inText[i + 1] << (7 - bits);
      *outBuff++ = octet;
      (*outUsed)++;
      bits++;
   }
   return -1; /* ok */
}

The padding bits make sure the GSM-7 septets start on a septet boundary, counted from the start of the User Data (UD). If you don’t use a User Data Header (UDH), whether for combining SMS messages, EMS text formatting or something else, your text starts at the first octet of the UD and you can leave out padding (set paddingBits to 0).

If you have a UDH, then paddingBits can be calculated as follows:

paddingBits = ((UDHL + 1 ) * 8 ) % 7;
if( paddingBits ) paddingBits = 7 - paddingBits;
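For reference, the two lines above can be wrapped in a small helper. This is a minimal sketch; the name gsm7_padding_bits is mine and is not part of SMS_GSMEncode. It takes the value of the UDHL octet, so the whole UDH is udhl + 1 octets.

```c
/* Hypothetical helper: number of padding bits to pass to SMS_GSMEncode.
   udhl is the value of the UDHL octet; the UDH itself is udhl + 1 octets. */
int gsm7_padding_bits(int udhl)
{
   int paddingBits = ((udhl + 1) * 8) % 7;  /* bits sticking out past a septet boundary */
   if (paddingBits)
      paddingBits = 7 - paddingBits;        /* bits needed to reach the next boundary */
   return paddingBits;
}
```

With the common 6-octet concatenation UDH (UDHL = 5) this gives 1 padding bit; with a 7-octet UDH (UDHL = 6) it gives 0, because 56 bits are exactly 8 septets.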

UDHL stands for User Data Header Length. I hope this helps everybody who is struggling with GSM-7 encodings.


Reader Comments

Hello.

I have a question about your algorithm for padding the bits when using a UDH with septets, specifically the way you calculate the padding bits:

paddingBits = ((UDHL + 1 ) * 8 ) % 7;
if( paddingBits ) paddingBits = 7 - paddingBits;

Is the second line the same as the one inside the SMS_GSMEncode() function?

I mean, is this supposed to be calculated twice? My UDHL is 5, so the result of the first line is 6 for me. Should I pass 6 to SMS_GSMEncode() as the value, or (7-6) = 1 (the result of the second line)?

Thanks, your posts have been a great help.

#1 
Written By Rodolfo on September 16th, 2009 @ 12:16 pm

Rodolfo,

If you have a UDHL of 5, the UDH consists of 6 octets (1 for the UDHL itself). These 6 octets take up 48 bits. Padding this with 1 bit makes it 7 septets. So it seems to me that the 1 padding bit (as calculated) is right.

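In this case the first block inside SMS_GSMEncode shifts the first septet left by one bit and writes it to the output buffer, so the first octet holds the padding bit (bit 0) and all 7 bits of that septet (bits 1 to 7). The main loop then skips that septet and continues with the second one.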

The way septets are packed is a bit confusing, but have a look at my attempt to visually show how this works in “More on the SMS PDU”.

Jeroen

#2 
Written By Jeroen on September 16th, 2009 @ 12:55 pm

Hi Jeroen,

thanks for the quick response. I’m sending the concatenated SMS messages, but when I receive them on my cell phone I only get “dirty” chars.

Two questions about the padding:
1) Does it have to be done in every single part of a larger message, or just in the first one?
2) When you pad one message (that is part of a larger message), does that message make sense (can it be read) individually? Or only when you put it together with the other parts?

Thanks a lot, and sorry about the bad English.

#3 
Written By Rodolfo on September 16th, 2009 @ 2:04 pm

Rodolfo,

The dirty characters probably mean you’ve got your character encoding or padding wrong…

1) The padding is to be done in every part of the larger message. The padding works the same with any type of UDH (see for instance “Text formatting with EMS”) and is completely independent of concatenation.
2) In general any device that doesn’t recognize a UDH or a particular IE within the UDH will ignore that information and act as if it wasn’t there. For concatenation this means that the individual parts will be treated as individual text messages. So if your concatenated message consists of 3 parts, the recipient will see 3 SMS messages (not necessarily in the order they were sent).

Cheers,
Jeroen

#4 
Written By Jeroen on September 16th, 2009 @ 2:24 pm

Hi,

I’ve written an application to send long SMS and it arrives as a long SMS on the mobile device but the text is garbled.

My transmit application performs the following processing:
– Creates a User Data Header for each message segment
   – Field 1 (1 octet) – set to 6
   – Field 2 (1 octet) – set to 8
   – Field 3 (1 octet) – set to 4
   – Field 4 (2 octets) – set to 0
   – Field 5 (1 octet) – set to 3
   – Field 6 (1 octet) – set to 1, 2, or 3 (depending on the particular segment)
– 7-bit encode each text message segment
– Set the ESM class to 0x40 in the Submit_SM PDU
– The data coding is set to 0.
– Create the final SMS by joining the text message segment to the relevant User Data Header and send it to the SMSC via the SMPP interface
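For comparison, the UDH described above (concatenation with a 16-bit reference number, IEI 0x08) could be built like this. This is a sketch: build_concat_udh16 is a hypothetical helper, and writing the reference number high octet first is an assumption; what matters is that every part of one message carries the same value.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: fill udh[] with a concatenation UDH that uses a
   16-bit reference number and return its length in octets.
     udh[0]    UDHL = 6, the number of octets that follow
     udh[1]    IEI  = 0x08, concatenated short message, 16-bit reference
     udh[2]    IEDL = 4, length of the IE data
     udh[3..4] reference number, high octet first (assumption)
     udh[5]    total number of parts
     udh[6]    number of this part, starting at 1 */
size_t build_concat_udh16(uint8_t udh[7], uint16_t ref, uint8_t total, uint8_t part)
{
   udh[0] = 6;
   udh[1] = 0x08;
   udh[2] = 4;
   udh[3] = (uint8_t)(ref >> 8);
   udh[4] = (uint8_t)(ref & 0xFF);
   udh[5] = total;
   udh[6] = part;
   return 7;
}
```

The result is 7 octets, or 56 bits, which is exactly 8 septets, so GSM-7 text placed after this header needs no padding bits.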

Looking for some ideas/suggestions as to what I may be doing wrong.

Oh yeah, thanks for your posts – they have been of great help!

#5 
Written By Chi on January 6th, 2010 @ 8:03 am

Chi,

The most common error for getting garbled text is not starting your encoded text on a 7-bit boundary. In this article this issue is addressed by the padding bits. You’re lucky: the UDH you use (including the first octet that indicates the length) is 7 octets long, which corresponds to exactly 8 septets, so you don’t have to worry about the padding described in this article.

Could it be that SMPP doesn’t require you to 7-bit encode?

Jeroen

#6 
Written By Jeroen on January 6th, 2010 @ 9:36 am

Jeroen,

Thanks very much for the speedy response. You were spot on – it appears that SMPP does not require the short message to be 7-bit encoded!

About the choice of UDH used, I did deliberately select it so that I didn’t have to worry about the padding. :)

Lastly, it is amazing how I get better support here than from my SMSC provider! Thanks once again.

#7 
Written By chi on January 6th, 2010 @ 10:40 am

So it turns out that with SMPP you don’t have to worry about padding bits or UDH size, as you don’t do the 7-bit packing yourself; the SMSC does that….

I am glad your problem is solved and that my blog is useful. Please post a link to my blog if you can.

Cheers,
Jeroen

#8 
Written By Jeroen on January 6th, 2010 @ 4:22 pm

I am still confused about bit padding…

Here are 6 octets of UDH and UDHL…
11111111 11111111 11111111 11111111 11111111 11111111

Now, what I understand is

1111111 1111111 1111111 1111111 1111111 1111111 111111(1)

In the above, the (1) at the end of the stream is the padding bit… right??

One more important question about sending a long message in 7-bit encoding: which of the following is correct?

1. Encode the whole text with 7-bit encoding and then divide it across multiple PDUs.

2. Or divide the whole text first according to size and then encode it separately for each PDU.

Kindly reply soon.. :)

#9 
Written By Martin on January 22nd, 2010 @ 3:50 pm

Martin,

You’re right about the padding bit. Think of it this way. In an SMS message every character has a fixed position so you could implement a function

   write7bitChar( n, c )
   // n = position between 0 and 159
   // c = GSM char value

You’d use this function regardless of the presence of a UDH. The only difference is that when you have a UDH of 6 octets, the first character you can write is at position 7 (the space for characters 0 through 6 is wholly or partially occupied by the UDH). With a function like this you wouldn’t think about ‘padding’, but you would be executing the same logic.
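Here is a minimal C sketch of that idea. The write7bitChar name comes from the pseudo-signature above, but the extra buffer parameter and the pack_from helper are my own assumptions. The output buffer must be zeroed before the text is written, because the function ORs bits into it.

```c
/* Place GSM-7 character c at septet position n of the User Data ud.
   Septet n starts at bit n * 7, counting from bit 0 of ud[0]. */
void write7bitChar(unsigned char *ud, int n, unsigned char c)
{
   int bitPos = n * 7;
   int octet  = bitPos / 8;
   int shift  = bitPos % 8;

   c &= 0x7f;
   ud[octet] |= (unsigned char)(c << shift);            /* low (8 - shift) bits */
   if (shift > 1)                                        /* septet spills over */
      ud[octet + 1] |= (unsigned char)(c >> (8 - shift));
}

/* Hypothetical helper: write len characters starting at septet position
   first (0 without a UDH, 7 after a 6-octet UDH). Returns the number of
   UD octets in use, including any UDH octets before the text. */
int pack_from(unsigned char *ud, int first, const unsigned char *text, int len)
{
   int i;
   for (i = 0; i < len; i++)
      write7bitChar(ud, first + i, text[i]);
   return ((first + len) * 7 + 7) / 8;
}
```

Packing "hellohello" with first = 0 gives E8 32 9B FD 46 97 D9 EC 37. With a 6-octet UDH already in ud[0..5] and first = 7, the text octets come out the same as SMS_GSMEncode with 1 padding bit, without computing the padding explicitly.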

Continuing this reasoning… if you have a function like this you’d have to break up your text first according to size and then encode each PDU. Though large messages can be split over multiple SMS messages, you can’t split individual characters.
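As a rough sketch of the “split first” approach: with the usual 6-octet concatenation UDH (UDHL = 5), each part has 160 - 7 = 153 septets left for text, so the number of parts could be computed as below. The function name is hypothetical, and the count is in septets, so characters from the GSM-7 extension table (an escape septet plus the character) count twice.

```c
/* Hypothetical helper: number of SMS parts for text_septets GSM-7 septets,
   assuming the 6-octet concatenation UDH (UDHL = 5) on every part. */
int gsm7_part_count(int text_septets)
{
   const int single   = 160;        /* one message, no UDH */
   const int per_part = 160 - 7;    /* 6-octet UDH + 1 padding bit = 7 septets */

   if (text_septets <= single)
      return 1;
   return (text_septets + per_part - 1) / per_part;   /* round up */
}
```

When you actually cut the text, take care not to split an escape sequence (0x1B followed by the extended character) across two parts.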

I hope this helps,
Jeroen

#10 
Written By Jeroen on January 22nd, 2010 @ 10:34 pm

Thanks a lot.. Jeroen!

The formula for padding bits uses UDHL. Could you clarify what exactly the UDHL is: the number of octets in the UDH only, or the number of octets in the UDH plus 1 for the UDHL itself?

UDHL = LengthOf(UDH)
or
UDHL = LengthOf(UDH) + 1
(1 is for UDHL itself)

#11 
Written By Martin on January 23rd, 2010 @ 5:28 am

In the specs the UDHL is part of the UDH. UDHL is the number of octets remaining in the UDH. So the size of the UDH is UDHL + 1.
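On the receiving side the same relationship tells you where the text begins. A sketch (the function name is hypothetical) that reads the UDHL from the first octet of the User Data and returns the septet position of the first character, i.e. the UDH size rounded up to whole septets:

```c
/* ud points to the User Data of a message that starts with a UDH.
   The UDH is ud[0] + 1 octets; the text starts at the next septet boundary. */
int gsm7_first_text_septet(const unsigned char *ud)
{
   int udhBits = (ud[0] + 1) * 8;
   return (udhBits + 6) / 7;    /* round up to whole septets */
}
```

For UDHL = 5 this is septet 7; for UDHL = 6 it is septet 8.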

Jeroen

#12 
Written By Jeroen on January 23rd, 2010 @ 6:50 am

Hi,

I’ve read your sample code above, but it is written in C, and I don’t understand C. Would you mind helping me translate it into VB6 code?

I’ve also read your article about Combining SMS Messages. In your example text (Lorem ipsum…), your PDU code was 986F79B90D4…, but when I used my routine to convert the text to PDU, its result was: A0CCB7BCDC06A5E1F37A1B447EB3DF72D03C4D0785DB653A0B347EBBE7E531BD4CAFCB4161721A9E9EA7C769F7195466A7E92CD0BC4C0691DFA072BA3E6FBFC9207AB90D7FCB4169F7384D4E93EB6E3AA84E07B1C3E2B7BC0C2AD341E437FB2D2F83DAE1B33B0C0AB3D3F17AD805AAD2416577BA0D0A9341EDB43BDD06D9CBEE74B8CD02C5EBE939C8FD9ED3E5

I’m using VB6 to convert text to PDU. These are my routines to generate it:

'====================Start of Code
Function ConvTxt(txt As String) As String
    Dim i As Integer
    Dim datArr1(1 To 256) As Byte

    Dim l As Integer
    Dim touw As String

    'no more than 160 chars
    If Len(txt) > 160 Then txt = Left$(txt, 160)

    l = Len(txt)

    ConvTxt = Right$("00" & Hex(Len(txt)), 2)
    For i = 1 To l
        datArr1(i) = Asc(Mid$(txt, i, 1))
    Next i

    'make a bit stream of septets
    touw = ""
    For i = 1 To l
        touw = ToBin7(datArr1(i)) + touw
    Next i

    'and convert it to octets
    While Len(touw) > 8
        ConvTxt = ConvTxt + Bin2Hex(Right$(touw, 8))
        touw = Mid$(touw, 1, Len(touw) - 8)
    Wend

    ConvTxt = ConvTxt + Bin2Hex(touw)

    Debug.Print ConvTxt
End Function

Function ToBin7(ByVal num As Byte) As String
    'convert to padded 7 place binary number
    While num > 0
        ToBin7 = Trim(num Mod 2) + ToBin7
        num = num \ 2
    Wend

    ToBin7 = Right$("0000000" + ToBin7, 7)
End Function

Function Bin2Hex(ByVal touw As String) As String
    'convert binary to a padded 2 place hex number
    Dim x As Integer
    Dim num As Long

    For x = 1 To Len(touw)
        If Mid$(touw, x, 1) = "1" Then
            num = num + 2 ^ (Len(touw) - x)
        End If
    Next x

    Bin2Hex = Right$("00" + Hex(num), 2)
End Function
'===============End of Code

So, what’s wrong with my code? Why is the generated PDU different from yours?

Thanks.

#13 
Written By Jeff on January 23rd, 2010 @ 6:42 pm

Thank you so much Jeroen, I have done it successfully; now I can send multi-part messages. Without your help it would have been quite impossible.

Thanks again.

#14 
Written By Martin on January 24th, 2010 @ 1:37 pm

Hi Jeroen,

Nice algorithm, but how can we decode the concatenated SMS?

Thanks

#15 
Written By Diane on January 24th, 2010 @ 10:35 pm

Uhm, dear Jeroen, I have the same problem as Jeff. I’m glad I found this section. I hope you can help us convert the code to VB6; I am very confused by the text-to-PDU conversion part. *swirly eyes*

#16 
Written By RC L on January 25th, 2010 @ 1:53 am

Does anyone here have VB6 code for PDU mode? Please help, I need it for my thesis.

#17 
Written By programmer on February 6th, 2010 @ 12:06 am

I need help with the UDH: in the Logica API I always get a length of 48. How do I set the octets?

#18 
Written By boopathi on February 18th, 2010 @ 10:29 am

For GSM SMS, does this need to be addressed?

http://www.3gpp.org/ftp/Specs/html-info/23038.htm
From 3GPP TS 23.038 V9.1.1 (2010-02)

If the total number of characters to be sent equals (8n - 1) where n=1,2,3 etc. then there are 7 spare bits at the end of the message. To avoid the situation where the receiving entity confuses 7 binary zero pad bits as the @ character, the carriage return or <CR> character (defined in clause 6.1.1) shall be used for padding in this situation, just as for Cell Broadcast.

If <CR> is intended to be the last character and the message (including the wanted <CR>) ends on an octet boundary, then another <CR> must be added together with a padding bit 0. The receiving entity will perform the carriage return function twice, but this will not result in misoperation as the definition of <CR> in clause 6.1.1 is identical to the definition of <CR><CR>. The receiving entity shall remove the final <CR> character where the message ends on an octet boundary with <CR> as the last character.
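For anyone who does want to implement the quoted rule, a sketch could look like the function below. It assumes the text starts at septet 0 (no UDH) and that the buffer has room for one extra character; the function name and the GSM7_CR constant are my own. 0x0D is <CR> in the GSM-7 default alphabet.

```c
#define GSM7_CR 0x0D

/* Apply the <CR> padding rule to len GSM-7 characters in text[].
   Returns the new number of characters. */
int gsm7_apply_cr_padding(unsigned char *text, int len)
{
   if (len % 8 == 7)
   {
      /* 8n - 1 characters leave 7 spare bits: fill them with <CR> */
      text[len++] = GSM7_CR;
   }
   else if (len > 0 && len % 8 == 0 && text[len - 1] == GSM7_CR)
   {
      /* a wanted final <CR> on an octet boundary would be removed by the
         receiver, so add a second <CR>; packing then leaves one 0 padding bit */
      text[len++] = GSM7_CR;
   }
   return len;
}
```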

If not, then how is this case handled in GSM SMS?

#19 
Written By Tom Esh on February 19th, 2010 @ 10:00 pm

Tom,

In my personal experience, all modern devices look at the UDL (User Data Length; which in the case of GSM-7 is expressed in septets) to determine the size of the payload. This avoids any confusion regarding padding bits being interpreted as ‘@’ and <CR> being ignored.

In short, as long as you have your UDL correct, this does not need to be handled (I know my code doesn’t handle this).

It is a common bug to first encode the payload and then determine the UDL from the number of octets that the encoded payload occupies. In this case the section you quote may save you.
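To make the septets-versus-octets distinction concrete, here is a small sketch (hypothetical names; a negative udhl stands for “no UDH”) that computes the UDL in septets and, separately, the number of octets the packed User Data occupies:

```c
/* UDL for GSM-7 user data, counted in septets: the septets taken by the
   UDH (including padding bits) plus one per character. */
int gsm7_udl(int udhl, int text_septets)
{
   int headerSeptets = 0;
   if (udhl >= 0)
      headerSeptets = ((udhl + 1) * 8 + 6) / 7;   /* UDH rounded up to septets */
   return headerSeptets + text_septets;
}

/* Octets occupied by udl septets once packed: udl * 7 bits, rounded up. */
int gsm7_ud_octets(int udl)
{
   return (udl * 7 + 7) / 8;
}
```

An 8-character message without a UDH has a UDL of 8 but occupies only 7 octets; putting 7 in the UDL field is exactly the bug described above.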

Jeroen

#20 
Written By Jeroen on February 20th, 2010 @ 8:11 pm

Jeroen,

My complication is that I am actually trying to pack the text to be sent in an SMPP msg that will then be sent via GSM to the handset. The length in SMPP is defined in octets.

Is this even valid? Or should the SMSC (the receiver of the SMPP msg) be the one doing the packing?

Thanks
Tom

#21 
Written By Tom on February 20th, 2010 @ 10:56 pm

If you use SMPP you don’t have to pack the characters, the SMSC will do it for you. Likewise you don’t have to worry about padding bits.

Cheers,
Jeroen

#22 
Written By Jeroen on February 22nd, 2010 @ 9:51 pm

This C function for packing with padding is very good.
Do you also have the reverse C function?

“How to UNpack septets into GSM-7 characters”?

#23 
Written By Joker on March 12th, 2010 @ 4:39 am

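Hi,
here is PDU to TXT:

/*
GSM-7 unpacking routine.
Written by Emil.E
emil.elazar@mail.huji.ac.il
Israel
*/

/* dest:   receives length septets plus a terminating 0 (note that 0 is also
           '@' in GSM-7, so use length rather than strlen on the result)
   source: the packed octets, at least (length * 7 + 7) / 8 of them
   length: number of septets to unpack (the UDL, when there is no UDH) */
int SMS_GSMDecode(unsigned char *dest, const unsigned char *source, int length)
{
   int i;
   for (i = 0; i < length; i++)
   {
      int bitPos = i * 7;        /* first bit of septet i in the bit stream */
      int s      = bitPos / 8;   /* octet holding that bit */
      int shift  = bitPos % 8;
      unsigned int c = source[s] >> shift;
      if (shift > 1)             /* septet continues in the next octet */
         c |= (unsigned int)source[s + 1] << (8 - shift);
      dest[i] = (unsigned char)(c & 0x7f);
   }
   dest[i] = '\0';
   return 1;
}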

Best Regards,
Emil
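To decode the text of a concatenated message, the UDH and its padding bits have to be skipped first. A UDH-aware variant of the same bit-position idea might look like this. It is a sketch, not Emil’s code; gsm7_unpack and its parameters are my own. hasUdh would come from the TP-UDHI bit of the PDU, and udl is the UDL field in septets.

```c
/* Unpack GSM-7 text from user data ud. If hasUdh is non-zero, the first
   ud[0] + 1 octets are a UDH and the text starts at the next septet
   boundary. Writes the text septets to out (not 0-terminated, since 0x00
   is '@') and returns how many were written. */
int gsm7_unpack(const unsigned char *ud, int udl, int hasUdh, unsigned char *out)
{
   int first = 0, n, count = 0;

   if (hasUdh)
      first = ((ud[0] + 1) * 8 + 6) / 7;    /* septets taken by the UDH */

   for (n = first; n < udl; n++)
   {
      int bitPos = n * 7;
      int octet  = bitPos / 8;
      int shift  = bitPos % 8;
      unsigned int v = ud[octet] >> shift;
      if (shift > 1)                         /* septet continues in next octet */
         v |= (unsigned int)ud[octet + 1] << (8 - shift);
      out[count++] = (unsigned char)(v & 0x7f);
   }
   return count;
}
```

For example, the unpadded octets 54 74 7A 0E 4A CF 41 F0 B0 9C 0E 8A B9 00 with udl = 15 and no UDH decode to “This is part 1.”.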

#24 
Written By Emil E on June 2nd, 2010 @ 11:53 am

Hello everyone!
It’s been great to read this blog. I have an issue that I couldn’t solve yet: I cannot convert GSM-7 characters into septets. I need to know how this is done manually so that I can find out what I am doing wrong. Please help me out.

I have a text
“This is part 1.”

when it is encoded without padding bits it has the following encoded hex values
54 74 7A 0E 4A CF 41 F0 B0 9C 0E 8A B9 00

but when padding is used it has the following hex pattern
A8 E8 F4 1C 94 9E 83 A0 61 39 1D 14 73 01

I cannot get from the GSM-7 character set to the packed GSM-7 septets. I need to know how this is done; I used the algorithm in this blog, but it doesn’t seem to work for this. Please help me find what I am missing.

Thanks and regards

#25 
Written By Minhaj on June 18th, 2010 @ 12:21 am

My UDH is
05 00 03 03 02 01
for the first message

#26 
Written By Minhaj on June 18th, 2010 @ 12:24 am

I have made a project in VB for sending concatenated SMS. It is working fine, but the problem is that it does not send Unicode in long concatenated SMS. Please help me with this and tell me how to do it, with a PDU example.

#27 
Written By Anil on February 6th, 2011 @ 6:56 am

Hi everyone!

I have been reading these articles for a while, and they helped me better understand the concatenation mechanism of an SMS.

I use the code below to create the UDH, but it only works for UCS2 encoding.
While I understand the need to add 1 bit when GSM7 encoding is used and the CSMS reference number is stored in 1 octet, I don’t understand why the code below doesn’t work (instead of 1 concatenated message I receive the parts separately on my mobile phone) when I use a CSMS reference number stored in 2 octets.

Here’s the method I used to build the UDH. The parameters names are self-explanatory.

public byte[] BuildUDH(int CSMSRefNumber, int TotalParts, int PartNumber)
{
    byte[] udh = null;
    if (_largeCSMSRefNumber)
    {
        udh = new byte[7];
        udh[0] = 0x6;
        udh[1] = 0x8;
        udh[2] = 0x4;
        BitConverter.GetBytes((UInt16)CSMSRefNumber).CopyTo(udh, 3);
        udh[5] = (byte)TotalParts;
        udh[6] = (byte)PartNumber;
    }
    else
    {
        udh = new byte[6];
        udh[0] = 0x5;
        udh[1] = 0x0;
        udh[2] = 0x3;
        udh[3] = (byte)CSMSRefNumber;
        udh[4] = (byte)TotalParts;
        udh[5] = (byte)PartNumber;
    }
    return udh;
}

Thanks for any hint or advice you may have.
Cheers,
Florin

#28 
Written By Florin on April 18th, 2011 @ 4:42 am

I developed a USSD encoder and decoder in C#.
I will soon upload it to CodeProject and the Google Code repository.

#29 
Written By Nitin Gupta on July 23rd, 2012 @ 3:23 am

Many thanks, your algorithm saved my day. Here is a direct translation to Java that worked; I hope it will help someone.

private String pduTxt;
private String pduTxtLen;

private boolean withUDH;

/*
GSM-7 packing routine.
Written by Jeroen @ Mobile Tidings (http://mobiletidings.com)
*/
private void SMS_GSMEncode(
   byte[] inText /* The GSM-7 characters. Note: I could
                    not have used a 0-terminated string, since 0
                    represents '@' in the GSM-7 character set */
)
{
   pduTxt = new String();
   int paddingBits = 0; /* If you use a UDH, you may have to add padding
                           bits to properly align the GSM-7 septets */

   int in_text_lenght = inText.length;
   // first of all, set the message length (UDL, in septets)
   if( this.withUDH == false )
   {
      pduTxtLen = hexa_2_string(in_text_lenght);
      paddingBits = 0;
   }
   else
   {
      // a UDH with UDHL = 5 is 6 octets = 48 bits, i.e. 7 septets with 1 padding bit
      pduTxtLen = hexa_2_string(in_text_lenght + 7);

      //paddingBits = ((UDHL + 1 ) * 8 ) % 7;
      paddingBits = ((5 + 1 ) * 8 ) % 7;
      if( paddingBits != 0 )
      {
         paddingBits = 7 - paddingBits;
      }
   }/* end if-else */

   int bits = 0;
   int i;
   int octet;
   if( paddingBits != 0 )
   {
      bits = 7 - paddingBits;
      pduTxt += hexa_2_string( inText[0] << (7 - bits) );
      bits++;
   }

   for( i = 0; i < inText.length; i++ )
   {
      if( bits == 7 )
      {
         bits = 0;
         continue;
      }
      octet = (inText[i] & 0x7f) >> bits;
      if( i < inText.length - 1 )
      {
         octet |= inText[i + 1] << (7 - bits);
      }
      pduTxt += hexa_2_string(octet);
      bits++;
   }/* end for */
}

private String hexa_2_string( int hex )
{
   // keep only the low 8 bits and format them as two upper-case hex digits
   return String.format("%02X", hex & 0xFF);
}

#30 
Written By emerson on September 18th, 2012 @ 9:39 am

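List<byte> septets = new List<byte>();
// range: number of bits already used in the last byte of septets
// (0 means the next character starts a fresh byte)
if (paddingBits != 0) septets.Add(0x00);

int range = paddingBits;
for (int n = offset; n < octets.Length; n++)
{
    if ((octets[n] & 0x80) != 0x00)
        throw new ArgumentException("Octet is invalid.", "octets");

    if (range == 0)
    {
        // start a new byte: the character fills its low 7 bits
        septets.Add(octets[n]);
        range = 7;
        continue;
    }

    // the low (8 - range) bits of the character fill the free high bits
    septets[septets.Count - 1] |= (byte)(octets[n] << range);
    if (range > 1)
    {
        // the remaining (range - 1) bits start the next byte
        septets.Add((byte)(octets[n] >> (8 - range)));
        range--;
    }
    else
    {
        range = 0; // the byte is exactly full
    }
}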

#31 
Written By sognami on June 26th, 2013 @ 4:04 pm
