
{"id":248999,"date":"2022-11-24T09:56:00","date_gmt":"2022-11-24T08:56:00","guid":{"rendered":"https:\/\/www.altermes.fr\/clean-and-homogenize-data-prior-to-visualization-data-cleansing\/"},"modified":"2023-12-14T12:36:50","modified_gmt":"2023-12-14T11:36:50","slug":"clean-and-homogenize-data-prior-to-visualization-data-cleansing","status":"publish","type":"post","link":"https:\/\/www.altermes.fr\/en\/clean-and-homogenize-data-prior-to-visualization-data-cleansing\/","title":{"rendered":"Clean and homogenize data prior to visualization (Data Cleansing)"},"content":{"rendered":"\n<div style=\"height:20px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n<p>Now more than ever, rapid access to information is essential for making the right decisions and managing your business. But more than the quantity of data, it&#8217;s the quality of the data that provides reliable, easy-to-analyze information for results that are as close to reality as possible. That&#8217;s why data <em>cleansing<\/em> is an important step for any company wishing to digitize its processes.<\/p>\n\n<div style=\"height:20px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n<h2 class=\"wp-block-heading\">What is Data Cleansing?<\/h2>\n\n<p>Before <a href=\"https:\/\/www.altermes.fr\/en\/integrating-data-into-business-intelligence\/\">integrating information into Business Intelligence tools<\/a>, it is essential to ensure that it is correct, to avoid analysis errors that can have disastrous consequences for decision-making.<\/p>\n\n<p>Data comes from multiple sources, both external and internal, and is most often stored in its raw state in a data lake or in databases. 
The information must therefore be cleaned and homogenized between storage and integration, to guarantee the quality of the input data.<\/p>\n\n<figure class=\"wp-block-image aligncenter size-large is-resized wp-duotone-000000-ffffff-1\"><img decoding=\"async\" src=\"https:\/\/media.giphy.com\/media\/G1ifnX4d5tYFACktp9\/giphy.gif\" alt=\"\" width=\"197\" height=\"197\"><\/figure>\n<div style=\"height:20px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n<h2 class=\"wp-block-heading\">What are the most common data errors?<\/h2>\n\n<p>There are three main types of error: syntactic, semantic and coverage.<\/p>\n\n<h3 class=\"wp-block-heading\">Syntax errors<\/h3>\n\n<p>These can range from typos to the use of the wrong format or unit system.<\/p>\n\n<h4 class=\"wp-block-heading\"><strong><u>Examples:<\/u><\/strong><\/h4>\n\n<ul class=\"wp-block-list\">\n<li>An order for 120 units becomes 210 units<\/li>\n\n\n\n<li>A delivery date that shifts from March 8 (8\/3) to August 3 (3\/8), a common mix-up when working with countries that write the month before the day, such as the United States<\/li>\n\n\n\n<li>A 640 mm dimension interpreted as 640 cm<\/li>\n<\/ul>\n\n<h3 class=\"wp-block-heading\">Semantic errors<\/h3>\n\n<p>These are frequent when data comes from forms filled in by third parties. They include errors of:<\/p>\n\n<ul class=\"wp-block-list\">\n<li>contradiction (age does not match date of birth)<\/li>\n\n\n\n<li>duplication (the same information is repeated)<\/li>\n\n\n\n<li>formatting (inversion of first and last names)<\/li>\n\n\n\n<li>invalidity (a bank account number instead of a VAT number)<\/li>\n<\/ul>\n\n<h3 class=\"wp-block-heading\">Coverage errors<\/h3>\n\n<p>This term covers all errors linked to missing data. 
What is missing can be:<\/p>\n\n<p>&#8211; a single value, when a required piece of information is missing<br\/>&#8211; a whole field, when an entire column of information has not been recorded.<\/p>\n\n<p>All these errors, even if they are individually rare, add up and spread throughout databases if care is not taken to clean up data properly.<\/p>\n\n<div style=\"height:20px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n<h2 class=\"wp-block-heading\">How do you clean data?<\/h2>\n\n<p>As always, before embarking on a data cleansing operation, it&#8217;s important to take a step back to look at the big picture and set goals. It is then possible to implement a step-by-step data homogenization process:<\/p>\n\n<ol class=\"wp-block-list\" type=\"1\">\n<li>Error monitoring<\/li>\n\n\n\n<li>Process standardization<\/li>\n\n\n\n<li>Data correction and validation<\/li>\n\n\n\n<li>Cleaning up duplicates<\/li>\n\n\n\n<li>Data analysis<\/li>\n<\/ol>\n\n<p>Each of these stages requires the involvement of different departments within the company, so excellent communication between all project members is essential.<\/p>\n\n<figure class=\"wp-block-image aligncenter size-large is-resized\"><img decoding=\"async\" src=\"https:\/\/www.altermes.fr\/wp-content\/uploads\/2023\/07\/4-1024x204.jpg\" alt=\"\" class=\"wp-image-247725\" width=\"807\" height=\"159\" srcset=\"https:\/\/www.altermes.fr\/wp-content\/uploads\/2023\/07\/4-1280x255.jpg 1280w, https:\/\/www.altermes.fr\/wp-content\/uploads\/2023\/07\/4-980x195.jpg 980w, https:\/\/www.altermes.fr\/wp-content\/uploads\/2023\/07\/4-480x96.jpg 480w\" sizes=\"(min-width: 0px) and (max-width: 480px) 480px, (min-width: 481px) and (max-width: 980px) 980px, (min-width: 981px) and (max-width: 1280px) 1280px, 100vw\" \/><\/figure>\n<div style=\"height:20px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n<h2 class=\"wp-block-heading\">Data cleansing tools<\/h2>\n\n<p>It is unrealistic to think of homogenizing a database 
manually:<\/p>\n\n<ul class=\"wp-block-list\">\n<li>Too much information to process<\/li>\n\n\n\n<li>The risk of error is too high<\/li>\n<\/ul>\n\n<p>Today, there are many software tools specifically developed for data cleansing. These are powered by advanced algorithms, allowing settings to be tailored to the specific needs of each company.<\/p>\n\n<p>The best-known data cleansing tools include:<\/p>\n\n<ul class=\"wp-block-list\">\n<li><a href=\"https:\/\/winpure.com\/\" rel=\"nofollow noopener\" target=\"_blank\">Winpure<\/a>, one of the most popular software packages, used by many large multinational companies. It has the advantage of being multilingual, and of being able to clean data directly inside the database, thanks to its compatibility with numerous formats.<\/li>\n\n\n\n<li><a href=\"https:\/\/www.ibm.com\/products\/infosphere-qualitystage\" rel=\"nofollow noopener\" target=\"_blank\">IBM Infosphere Quality Stage<\/a>, often considered one of the best data cleansing software packages, stands out for its ease of use and the overview it provides.<\/li>\n\n\n\n<li>The lesser-known <a href=\"https:\/\/pl.quadient.com\/en\/resources\/quadient-datacleaner\" rel=\"nofollow noopener\" target=\"_blank\">Quadient Data Cleaner<\/a> is a so-called &#8220;data profiling&#8221; software program that removes duplicates and analyzes trends. It is highly configurable in terms of cleaning rules.<\/li>\n\n\n\n<li><a href=\"https:\/\/dataladder.com\/\" rel=\"nofollow noopener\" target=\"_blank\">Data Ladder<\/a>, which comes in two forms: Data Match, an affordable but limited version, and Data Match Enterprise, which benefits from all the advances in AI and Machine Learning to cleanse up to 100 million data sets. 
It&#8217;s one of the fastest and most accurate in the industry.<\/li>\n\n\n\n<li><a href=\"https:\/\/docs.tibco.com\/pub\/clarity\/2.0.1\/doc\/html\/GUID-B611C543-3218-4EFF-BC79-C9A31BA5D670.html\" rel=\"nofollow noopener\" target=\"_blank\">Tibco Clarity<\/a>, a SaaS tool, has the advantage of being accessible via the Internet.<\/li>\n\n\n\n<li><a href=\"https:\/\/openrefine.org\/\" rel=\"nofollow noopener\" target=\"_blank\">Open Refine<\/a>, previously known as Google Refine, is a free, open-source data cleansing tool. It&#8217;s efficient and easy to use.<\/li>\n<\/ul>\n\n<div style=\"height:20px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n<figure class=\"wp-block-image aligncenter size-large is-resized\"><img decoding=\"async\" src=\"https:\/\/www.altermes.fr\/wp-content\/uploads\/2022\/11\/Data-cleansing-1024x683.jpg\" alt=\"Data cleansing\" class=\"wp-image-247801\" width=\"512\" height=\"342\"\/><\/figure>\n<div style=\"height:20px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n<h2 class=\"wp-block-heading\">Implementing data cleansing in your company<\/h2>\n\n<p>It&#8217;s a project in its own right, and needs to be carried out in an organized way to bear fruit.<\/p>\n\n<p>From the definition of requirements to the choice of Data Cleansing software, upstream work is essential to the smooth running of the project and its success.<\/p>\n\n<p>During the actual implementation phase, various settings and adjustments are required to adapt to the reality of the company and the data used, which calls for technical skills.<\/p>\n\n<p>Last but not least, user training is a mission not to be neglected if we are to reap the full benefits of this data cleansing and homogenization process.<\/p>\n\n<p>It&#8217;s advisable to enlist the help of specialists who can answer your questions and suggest the most appropriate solutions.<\/p>\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n<p>\u00a0<\/p>\n\n<p>\ud83d\udc46 You have 
a data visualization project and need to clean up your data? <a href=\"https:\/\/www.altermes.fr\/en\/contact\/\">Call on the Alterm\u00e8s teams<\/a> to support you!<\/p>\n\n<p>\ud83d\udd0e Find out more about our <a href=\"https:\/\/www.altermes.fr\/en\/technological-innovation\/\">technological innovation<\/a> offers!<\/p>\n\n<div style=\"height:20px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n<figure class=\"wp-block-image aligncenter size-large is-resized is-style-default\"><img decoding=\"async\" src=\"https:\/\/media.giphy.com\/media\/X8GcOQJJxRphFRr3kC\/giphy-downsized-large.gif\" alt=\"\" width=\"278\" height=\"278\"><\/figure>","protected":false},"excerpt":{"rendered":"<p>Cleansing and homogenizing input data is an important step in improving the quality of analyses: this is data cleansing.<\/p>\n","protected":false},"author":4,"featured_media":247802,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"_et_pb_use_builder":"off","_et_pb_old_content":"","_et_gb_content_width":"","footnotes":""},"categories":[53],"tags":[],"class_list":["post-248999","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-digital-transformation"],"acf":[],"_links":{"self":[{"href":"https:\/\/www.altermes.fr\/en\/wp-json\/wp\/v2\/posts\/248999","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.altermes.fr\/en\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.altermes.fr\/en\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.altermes.fr\/en\/wp-json\/wp\/v2\/users\/4"}],"replies":[{"embeddable":true,"href":"https:\/\/www.altermes.fr\/en\/wp-json\/wp\/v2\/comments?post=248999"}],"version-history":[{"count":1,"href":"https:\/\/www.altermes.fr\/en\/wp-json\/wp\/v2\/posts\/248999\/revisions"}],"predecessor-version":[{"id":249000,"href":"https:\/\/www.altermes.fr\/en\/wp-json\/wp\/v2\/posts\/2489
99\/revisions\/249000"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.altermes.fr\/en\/wp-json\/wp\/v2\/media\/247802"}],"wp:attachment":[{"href":"https:\/\/www.altermes.fr\/en\/wp-json\/wp\/v2\/media?parent=248999"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.altermes.fr\/en\/wp-json\/wp\/v2\/categories?post=248999"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.altermes.fr\/en\/wp-json\/wp\/v2\/tags?post=248999"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}