1) Presentation
When migrating from SunOne DS to Red Hat DS, the two servers may use different charsets. Red Hat DS uses the UTF-8 charset, so the LDIF import files it reads have to be UTF-8. This matters most for binary (base64-encoded) values, whose underlying bytes must be UTF-8 encoded.
2) How to find out which charset is in use
The locale command gives you the charset used on your platform. It is recorded in the LANG environment variable:
locale
LANG=fr_FR.UTF-8
The discrepancy comes from the following:
1) The SunOne charset is ISO_8859 (1 byte per character)
2) RH-DS only accepts UTF-8 (1 to 4 bytes per character; 2 bytes for an accented Latin character such as ç)
3) Binary values are encoded on SunOne using the ISO_8859 charset.
When reading an LDIF import file coming from SunOne, the RH-DS import fails and reports a message such as « violates attribute syntax ».
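To make the byte-level difference concrete, here is a minimal Java sketch. It assumes, based only on the error message, that the import rejects the value because its bytes are not well-formed UTF-8. The class name `Utf8Check` and the method `isValidUtf8` are hypothetical names chosen for illustration.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

final class Utf8Check {
    /** Returns true when the bytes form a well-formed UTF-8 sequence. */
    static boolean isValidUtf8(byte[] bytes) {
        try {
            // A strict decoder throws instead of silently substituting U+FFFD.
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String name = "Fran\u00e7ois Rivat"; // \u00e7 is the c with cedilla

        // Same text, different byte counts depending on the charset.
        System.out.println(name.getBytes(StandardCharsets.ISO_8859_1).length);
        System.out.println(name.getBytes(StandardCharsets.UTF_8).length);

        // The cn value as exported by SunOne in the example of this article.
        byte[] sunOneValue = Base64.getDecoder().decode("RnJhbudvaXMgUml2YXQ=");
        System.out.println(isValidUtf8(sunOneValue));
    }
}
```

The ISO_8859 encoding stores ç as the single byte 0xE7, which UTF-8 reads as the start of a multi-byte sequence that never completes, so the strict decoder reports the value as malformed.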
3) How to fix it
The fix is quite tricky. For each binary value, it consists of the following operations:
-a) read the encoded binary value (whose bytes use the ISO_8859 charset)
-b) decode the binary value that has been read
-c) re-encode the value in binary using UTF-8
At a higher level, the binary attribute values and binary ACI values that were ISO_8859 encoded are replaced in the LDIF file by their UTF-8 equivalents.
After this transformation, the LDIF import into RH-DS works correctly.
You can write such a transformation in Java, for example.
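Steps a) to c) can be sketched for a single value as follows. This is an illustration only, not the parser described in this article; the class `Base64Transcoder` and the method `latin1ToUtf8` are hypothetical names, and the sketch assumes the source charset is ISO-8859-1.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

final class Base64Transcoder {
    /**
     * Takes the base64 text found after "attr:: " in the SunOne LDIF and
     * returns the base64 text of the same characters encoded in UTF-8.
     */
    static String latin1ToUtf8(String base64Latin1) {
        // a) read the encoded binary value: base64 text -> raw bytes
        byte[] raw = Base64.getDecoder().decode(base64Latin1);
        // b) decode the bytes into characters, assuming ISO-8859-1
        String text = new String(raw, StandardCharsets.ISO_8859_1);
        // c) re-encode the characters as UTF-8, then back to base64
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return Base64.getEncoder().encodeToString(utf8);
    }

    public static void main(String[] args) {
        // The cn value of the example entry used in this article.
        System.out.println(latin1ToUtf8("RnJhbudvaXMgUml2YXQ="));
    }
}
```

Applied to the cn value of the example entry, the method yields RnJhbsOnb2lzIFJpdmF0, the same UTF8 value shown in the example. The conversion should only be applied to values known to hold ISO_8859 text: running it on a value that is already UTF-8 would double-encode it.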
Example
=========

1) ISO_8859 encoding
=====================
The cn encoded in ISO_8859 is François Rivat
---> the binary value is cn:: RnJhbudvaXMgUml2YXQ=
This corresponds to the entry
# entry-id: 10
dn: uid=frivat,ou=People,dc=example,dc=com
uid: frivat
givenName: francois
objectClass: top
objectClass: person
objectClass: organizationalPerson
objectClass: inetorgperson
sn: rivat
cn:: RnJhbudvaXMgUml2YXQ=
creatorsName: uid=admin,ou=Administrators,ou=TopologyManagement,o=NetscapeRoot
modifiersName: uid=admin,ou=Administrators,ou=TopologyManagement,o=NetscapeRoot
createTimestamp: 20181211091901Z
modifyTimestamp: 20181211091901Z
nsUniqueId: cb3a1a01-fd2511e8-ad1fc1a5-ec63facb

2) ldif import with ISO_8859 encoding
======================================
The entry is rejected
---> [13/Dec/2018:16:38:42.460516110 +0100] - WARN - import_producer - import userRoot: Skipping entry "uid=frivat,ou=People,dc=ovh,dc=net" which violates attribute syntax, ending line 168 of file "/tmp/test2_8859.ldif"
ldif2db -Z host-2389 -n userRoot -i /tmp/test2_8859.ldif
importing data ...
[13/Dec/2018:16:38:42.090160723 +0100] - INFO - ldbm_instance_config_cachememsize_set - force a minimal value 512000
[13/Dec/2018:16:38:42.109035769 +0100] - INFO - dblayer_instance_start - Import is running with nsslapd-db-private-import-mem on; No other process is allowed to access the database
[13/Dec/2018:16:38:42.113566568 +0100] - INFO - check_and_set_import_cache - pagesize: 4096, available bytes 6758842368, process usage 37781504
[13/Dec/2018:16:38:42.121801510 +0100] - INFO - check_and_set_import_cache - Import allocates 2640172KB import cache.
[13/Dec/2018:16:38:42.251224607 +0100] - INFO - import_main_offline - import userRoot: Beginning import job...
[13/Dec/2018:16:38:42.254042367 +0100] - INFO - import_main_offline - import userRoot: Index buffering enabled with bucket size 100
[13/Dec/2018:16:38:42.456513364 +0100] - INFO - import_producer - import userRoot: Processing file "/tmp/test2_8859.ldif"
[13/Dec/2018:16:38:42.460516110 +0100] - WARN - import_producer - import userRoot: Skipping entry "uid=frivat,ou=People,dc=ovh,dc=net" which violates attribute syntax, ending line 168 of file "/tmp/test2_8859.ldif"
[13/Dec/2018:16:38:42.463281395 +0100] - INFO - import_producer - import userRoot: Finished scanning file "/tmp/test2_8859.ldif" (9 entries)
[13/Dec/2018:16:38:42.959570547 +0100] - INFO - import_monitor_threads - import userRoot: Workers finished; cleaning up...
[13/Dec/2018:16:38:43.162926039 +0100] - INFO - import_monitor_threads - import userRoot: Workers cleaned up.
[13/Dec/2018:16:38:43.165695678 +0100] - INFO - import_main_offline - import userRoot: Cleaning up producer thread...
[13/Dec/2018:16:38:43.168191021 +0100] - INFO - import_main_offline - import userRoot: Indexing complete. Post-processing...
[13/Dec/2018:16:38:43.170430668 +0100] - INFO - import_main_offline - import userRoot: Generating numsubordinates (this may take several minutes to complete)...
[13/Dec/2018:16:38:43.176801092 +0100] - INFO - import_main_offline - import userRoot: Generating numSubordinates complete.
[13/Dec/2018:16:38:43.179294550 +0100] - INFO - ldbm_get_nonleaf_ids - import userRoot: Gathering ancestorid non-leaf IDs...
[13/Dec/2018:16:38:43.181680973 +0100] - INFO - ldbm_get_nonleaf_ids - import userRoot: Finished gathering ancestorid non-leaf IDs.
[13/Dec/2018:16:38:43.190312574 +0100] - INFO - ldbm_ancestorid_new_idl_create_index - import userRoot: Creating ancestorid index (new idl)...
[13/Dec/2018:16:38:43.193173380 +0100] - INFO - ldbm_ancestorid_new_idl_create_index - import userRoot: Created ancestorid index (new idl).
[13/Dec/2018:16:38:43.195769472 +0100] - INFO - import_main_offline - import userRoot: Flushing caches...
[13/Dec/2018:16:38:43.198300399 +0100] - INFO - import_main_offline - import userRoot: Closing files...
[13/Dec/2018:16:38:43.246529362 +0100] - INFO - dblayer_pre_close - All database threads now stopped
[13/Dec/2018:16:38:43.248973794 +0100] - INFO - import_main_offline - import userRoot: Import complete. Processed 9 entries (1 were skipped) in 1 seconds. (9.00 entries/sec)

3) Transforming ldif ISO_8859 format to UTF8 format
===================================================
We run the Java parser to transform the value. Binary values are converted from the ISO_8859 charset to the UTF8 charset. The binary encoding of François Rivat is transformed as follows:
ISO_8859 cn:: RnJhbudvaXMgUml2YXQ=
UTF8 cn:: RnJhbsOnb2lzIFJpdmF0

4) Running the parser
=====================
The binary parser is run as follows
java parsebinaryldif test2_8859.ldif test2_utf8_bin.ldif

5) Ldif File (UTF8 format)
==========================
As can be seen, the binary cn value has been updated with the new encoding after running the parser
# entry-id: 10
dn: uid=frivat,ou=People,dc=example,dc=com
uid: frivat
givenName: francois
objectClass: top
objectClass: person
objectClass: organizationalPerson
objectClass: inetorgperson
sn: rivat
cn:: RnJhbsOnb2lzIFJpdmF0
creatorsName: uid=admin,ou=Administrators,ou=TopologyManagement,o=NetscapeRoot
modifiersName: uid=admin,ou=Administrators,ou=TopologyManagement,o=NetscapeRoo
 t
createTimestamp: 20181211091901Z
modifyTimestamp: 20181211091901Z
nsUniqueId: cb3a1a01-fd2511e8-ad1fc1a5-ec63facb

6) Successful ldif import - UTF8 format
========================================
Now that the binary value has been fixed, the import succeeds without warnings
ldif2db -Z host-2389 -n userRoot -i /tmp/test2_utf8_bin.ldif
importing data ...
[13/Dec/2018:16:40:26.885890518 +0100] - INFO - ldbm_instance_config_cachememsize_set - force a minimal value 512000
[13/Dec/2018:16:40:26.909416917 +0100] - INFO - dblayer_instance_start - Import is running with nsslapd-db-private-import-mem on; No other process is allowed to access the database
[13/Dec/2018:16:40:26.913494377 +0100] - INFO - check_and_set_import_cache - pagesize: 4096, available bytes 6745919488, process usage 38993920
[13/Dec/2018:16:40:26.915705239 +0100] - INFO - check_and_set_import_cache - Import allocates 2635124KB import cache.
[13/Dec/2018:16:40:27.038299523 +0100] - INFO - import_main_offline - import userRoot: Beginning import job...
[13/Dec/2018:16:40:27.040931875 +0100] - INFO - import_main_offline - import userRoot: Index buffering enabled with bucket size 100
[13/Dec/2018:16:40:27.242589479 +0100] - INFO - import_producer - import userRoot: Processing file "/tmp/test2_utf8_bin.ldif"
[13/Dec/2018:16:40:27.247774026 +0100] - INFO - import_producer - import userRoot: Finished scanning file "/tmp/test2_utf8_bin.ldif" (10 entries)
[13/Dec/2018:16:40:27.745407174 +0100] - INFO - import_monitor_threads - import userRoot: Workers finished; cleaning up...
[13/Dec/2018:16:40:27.950110230 +0100] - INFO - import_monitor_threads - import userRoot: Workers cleaned up.
[13/Dec/2018:16:40:27.954113625 +0100] - INFO - import_main_offline - import userRoot: Cleaning up producer thread...
[13/Dec/2018:16:40:27.957192674 +0100] - INFO - import_main_offline - import userRoot: Indexing complete. Post-processing...
[13/Dec/2018:16:40:27.959937709 +0100] - INFO - import_main_offline - import userRoot: Generating numsubordinates (this may take several minutes to complete)...
[13/Dec/2018:16:40:27.967695638 +0100] - INFO - import_main_offline - import userRoot: Generating numSubordinates complete.
[13/Dec/2018:16:40:27.971176394 +0100] - INFO - ldbm_get_nonleaf_ids - import userRoot: Gathering ancestorid non-leaf IDs...
[13/Dec/2018:16:40:27.973927796 +0100] - INFO - ldbm_get_nonleaf_ids - import userRoot: Finished gathering ancestorid non-leaf IDs.
[13/Dec/2018:16:40:27.986313595 +0100] - INFO - ldbm_ancestorid_new_idl_create_index - import userRoot: Creating ancestorid index (new idl)...
[13/Dec/2018:16:40:27.990063161 +0100] - INFO - ldbm_ancestorid_new_idl_create_index - import userRoot: Created ancestorid index (new idl).
[13/Dec/2018:16:40:27.992770265 +0100] - INFO - import_main_offline - import userRoot: Flushing caches...
[13/Dec/2018:16:40:27.995290425 +0100] - INFO - import_main_offline - import userRoot: Closing files...
[13/Dec/2018:16:40:28.062282718 +0100] - INFO - dblayer_pre_close - All database threads now stopped
[13/Dec/2018:16:40:28.065680271 +0100] - INFO - import_main_offline - import userRoot: Import complete. Processed 10 entries in 1 seconds. (10.00 entries/sec)
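The parser run in the example works on a whole LDIF file rather than on one value. A file-level pass could look like the sketch below. It is not the parser mentioned in this article: the class `LdifCharsetFixer`, its methods and the `BINARY_ATTRS` skip list are hypothetical. It assumes the source charset is ISO-8859-1 and that attributes in the skip list hold genuine binary data that must not be touched.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Base64;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class LdifCharsetFixer {
    // Hypothetical skip list: attributes assumed to carry real binary data.
    private static final Set<String> BINARY_ATTRS =
            Set.of("jpegphoto", "usercertificate", "cacertificate");
    // Matches "attr:: base64value" lines.
    private static final Pattern BASE64_LINE =
            Pattern.compile("^([^:\\s]+)::\\s*(\\S+)$");

    /** Joins LDIF continuation lines (lines starting with one space). */
    static List<String> unfold(List<String> lines) {
        List<String> out = new ArrayList<>();
        for (String line : lines) {
            if (line.startsWith(" ") && !out.isEmpty()) {
                int last = out.size() - 1;
                out.set(last, out.get(last) + line.substring(1));
            } else {
                out.add(line);
            }
        }
        return out;
    }

    static boolean isValidUtf8(byte[] bytes) {
        try {
            // newDecoder() reports malformed input by throwing.
            StandardCharsets.UTF_8.newDecoder().decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    /** Rewrites one unfolded line; non-matching lines are returned as-is. */
    static String fixLine(String line) {
        Matcher m = BASE64_LINE.matcher(line);
        if (line.startsWith("#") || !m.matches()) {
            return line;
        }
        String attr = m.group(1);
        String lower = attr.toLowerCase(Locale.ROOT);
        if (lower.endsWith(";binary") || BINARY_ATTRS.contains(lower)) {
            return line; // assumed to be real binary data
        }
        byte[] raw = Base64.getDecoder().decode(m.group(2));
        if (isValidUtf8(raw)) {
            return line; // already UTF-8, avoid double encoding
        }
        String text = new String(raw, StandardCharsets.ISO_8859_1);
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return attr + ":: " + Base64.getEncoder().encodeToString(utf8);
    }

    /** Unfolds the lines, then fixes each base64 value. Output is not re-folded. */
    static List<String> fix(List<String> lines) {
        List<String> out = new ArrayList<>();
        for (String line : unfold(lines)) {
            out.add(fixLine(line));
        }
        return out;
    }
}
```

The lines could be read with `Files.readAllLines` and written back with `Files.write`. The UTF-8 validity test is only a heuristic, so the skip list would need to be adapted to the schema actually in use before running this on production data.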
4) Using a parser
We developed a parser in-house to cope with this charset burden. If you face this problem and would like assistance deploying the parser, don't hesitate to contact us.