Top 4 Data-Quality Mistakes to Avoid on Your Legacy Data Stack

    In a rush for the modern data stack, don’t forget about legacy


    I have read and written many Modern Data Stack (MDS) articles, and I am glad that, as an industry, we are taking on the challenge of Big Data, with its even bigger problems, head-on. But one thing we seem to be missing is an understanding of how we at least “keep the lights on” in the Legacy Data Stack (LDS) whilst migrating to the MDS.

    Whilst you move to the MDS, most of your customers continue to be served by your legacy data stack. In fact, customers do not care which stack they are on; they care about the service you provide them. Speaking about a legacy on-premise data stack is almost blasphemous in the current “latest trend” discussions. Shouldn’t you just be spending your time migrating to the MDS?

    Many companies, especially large corporate organisations, are undergoing huge transformation projects; whilst these mammoth and often ill-fated transformations continue, do we neglect the LDS? After all, every penny spent on it will be seen as “legacy spend” or “regret spend”. We need to find a balance in this approach.

    Let’s dive in.

    1. Not profiling data at the source

    Before you think about migrating to the MDS, you should spare a moment to understand the quality of the data in the existing LDS. Profiling the data at the source will provide a clear understanding of known and unknown Data Quality (DQ) issues.

    Profiling the data will also help gauge the time it would take to fix DQ issues, and hence the time it would take to migrate better-quality data to the MDS. The lack of profiling is a surefire way to cause adverse customer impact and to derail the MDS migration.

    What can you do?

    Check for common DQ issues such as uniqueness, completeness, and accuracy. Implement a simple dashboard that trends quality over time, so you can determine whether it is deteriorating or improving. Work out a plan to fix those issues, and factor the impact into your MDS migration timelines.
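    Even without a profiling tool, the checks above can be sketched in a few lines. This is a minimal, hedged example: the table, column names, and sample values are illustrative assumptions, not from any particular system.

    ```python
    from collections import Counter

    # Hypothetical rows extracted from a legacy customer table
    # (column names and values are illustrative).
    rows = [
        {"customer_id": "101", "email": "a@example.com", "postcode": "SW1A 1AA"},
        {"customer_id": "102", "email": "",              "postcode": "M1 1AE"},
        {"customer_id": "102", "email": "b@example.com", "postcode": ""},
        {"customer_id": "104", "email": "c@example.com", "postcode": "B33 8TH"},
    ]

    def profile(rows, key):
        """Report duplicate keys (uniqueness) and per-column completeness."""
        total = len(rows)
        key_counts = Counter(r[key] for r in rows)
        duplicates = {k: c for k, c in key_counts.items() if c > 1}
        completeness = {
            col: sum(1 for r in rows if r[col]) / total  # share of non-empty values
            for col in rows[0]
        }
        return {"rows": total, "duplicate_keys": duplicates, "completeness": completeness}

    print(profile(rows, key="customer_id"))
    # e.g. customer_id "102" appears twice; email and postcode are 75% complete
    ```

    Trending the output of a profile like this daily is enough for the simple dashboard mentioned above.
    
    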

    2. Not fixing known DQ issues

    LDSs are known to be inflexible behemoths, managed through bureaucratic processes and not even remotely agile. Fixing known DQ issues therefore takes a long time and specialist knowledge. Instead, teams opt to dump the data into the MDS and fix it in the future. We all know that fixing in the future doesn’t happen: the project team moves on, limited documentation is produced, and the specialist engineer with the knowledge retires, whilst the executives rejoice over the fact that “we have migrated to the MDS”.

    What can you do?

    If there are DQ issues such as format inconsistencies or duplicated data, fixing them directly at the source has huge benefits. Firstly, it reduces the adverse customer impact; secondly, the migration becomes easier because you take better data into the MDS; thirdly, you can start your insight journey in the MDS much more quickly.
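    The two issues named above, inconsistent formats and duplicated records, can be sketched as a small cleanup step. This is an assumed example: the phone-number normalisation rule and the sample records are hypothetical, and a real source fix would of course live in the legacy system itself.

    ```python
    # Hypothetical records with a format inconsistency and an exact duplicate.
    raw = [
        {"customer_id": "101", "phone": "+44 20 7946 0958"},
        {"customer_id": "102", "phone": "020-7946-0958"},
        {"customer_id": "102", "phone": "020-7946-0958"},  # duplicate row
    ]

    def normalise_phone(phone):
        """Keep digits only; rewrite a leading UK country code as a trunk '0'."""
        digits = "".join(ch for ch in phone if ch.isdigit())
        if digits.startswith("44"):
            digits = "0" + digits[2:]
        return digits

    def clean(rows):
        """Normalise formats first, then deduplicate on the normalised record."""
        seen, out = set(), []
        for r in rows:
            fixed = {**r, "phone": normalise_phone(r["phone"])}
            key = tuple(fixed.values())
            if key not in seen:
                seen.add(key)
                out.append(fixed)
        return out

    print(clean(raw))
    # both surviving records now share the format "02079460958"
    ```

    Normalising before deduplicating matters: two records that differ only in formatting are the same record, and deduplicating first would miss them.
    
    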

    3. Under-investing in LDS

    The minute the MDS strategy is agreed upon, it is as if the LDS becomes an unwanted child that no one wants to love. As harsh as this sounds, this is how organisations deal with legacy. Of course, the LDS will be phased out or decommissioned in the next 2–4 years, but remember that you will still be serving your customers on this platform until then. Would you be happy providing poor customer service for two solid years? There is a likelihood that you won’t have many customers left after that period.

    What can you do?

    Continuous investment in the LDS, to ensure the data is of good quality and issues are resolved in a timely manner, is paramount to avoiding adverse customer impact. If DQ issues cannot be resolved in the LDS due to system inflexibility, then plan to migrate that data to the MDS and find a way to fix it there. Creating technical debt (that is never repaid) out of DQ issues is a recipe for disaster.

    4. Expecting MDS to be a catch-all for DQ issues

    The MDS is not the holy grail you are expecting it to be. If your organisation has previously suffered from poor-quality data, implementing the MDS will quickly make the impact of that poor DQ visible. Simply dumping old data into the new stack will continue to cause errors in your ML/AI models and your decision-making.

    What can you do?

    Understanding the problem is half the solution: realise that the basics have to be in place from the start. A solid DQ framework that works across both the LDS and the MDS, has the right roles and responsibilities, and has effective processes for finding, triaging, and remediating issues will be the winner.
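    One way to read “works across both LDS and MDS” is to define each DQ rule once and run it against rows from either stack, tagging findings with their source so triage knows where to remediate. The sketch below assumes this interpretation; the rule names and sample data are illustrative.

    ```python
    # Stack-agnostic DQ rules: each rule is a predicate defined once and
    # applied to rows from either stack (rules and data are hypothetical).
    RULES = {
        "customer_id_present": lambda r: bool(r.get("customer_id")),
        "email_has_at_sign":   lambda r: "@" in r.get("email", ""),
    }

    def run_rules(rows, source):
        """Return one finding per failed rule, tagged with the source stack."""
        findings = []
        for i, row in enumerate(rows):
            for name, check in RULES.items():
                if not check(row):
                    findings.append({"source": source, "row": i, "rule": name})
        return findings

    legacy_rows = [
        {"customer_id": "1", "email": "a@example.com"},
        {"customer_id": "",  "email": "not-an-email"},  # fails both rules
    ]
    print(run_rules(legacy_rows, source="LDS"))
    ```

    The same `run_rules` call with `source="MDS"` would feed the same triage queue, which is the point: one rule set, one process, two stacks.
    
    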


    I am sure we are all guilty of strategising against and neglecting the legacy data stack in the rush towards the next shiny object. The critical thing is to find some balance. If this has resonated with you, please share your thoughts by leaving a comment below.

    If this was too much legacy talk for you, feel free to check out the latest Data Architecture trends:

    If you are not subscribed to Medium, consider subscribing using my referral link. It’s cheaper than Netflix and objectively a much better use of your time. If you use my link, I earn a small commission, and you get access to unlimited stories on Medium.

    I also write regularly on Twitter; follow me here.
