Why data wrangling is important in machine learning

1 Dec 2017 13:00 Media

Machine learning helps make sense of large amounts of data. However, if the data used is not neat, tidy, and relevant, the results the system produces will be flawed. This is why effective data wrangling is a key building block for an effective machine learning system.

media update's Adam Wakefield found out what data wrangling is and why, without it, the insights produced by machine learning would not be as effective as they could be.

Data wrangling cleans up disorganised data

Imagine walking into a clothing store where all the clothes are mixed up on different shelves and racks. It would take a shopper a long time to find what they want because they wouldn’t be sure which clothing is where.

This is similar to what happens to data when it is not categorised or stored correctly. The process of taking messy data and making it easy to find and use is called data wrangling. During this process, it is important to know which data is relevant to your goal or task, and which is not.
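As a rough illustration of the idea, the Python sketch below tidies a handful of made-up stock records. The records, the field names, the RELEVANT_CATEGORIES set and the wrangle helper are all hypothetical and do not come from any real system; the sketch only shows labels being standardised and incomplete or irrelevant entries being set aside.

```python
# Hypothetical raw records: inconsistent labels, stray spaces, a missing price.
raw_records = [
    {"item": " T-Shirt ", "category": "tops", "price": "120"},
    {"item": "jeans", "category": "Bottoms ", "price": "450"},
    {"item": "Sneakers", "category": "SHOES", "price": ""},
    {"item": "Hoodie", "category": "Tops", "price": "300"},
    {"item": "Gift card", "category": "misc", "price": "100"},
]

# Assumed goal for this sketch: only clothing categories are relevant.
RELEVANT_CATEGORIES = {"tops", "bottoms", "shoes"}


def wrangle(records):
    """Return cleaned records, skipping incomplete or irrelevant ones."""
    cleaned = []
    for record in records:
        category = record["category"].strip().lower()
        price_text = record["price"].strip()
        # Skip records that are off-topic or have no price recorded.
        if category not in RELEVANT_CATEGORIES or not price_text:
            continue
        cleaned.append({
            "item": record["item"].strip().title(),
            "category": category,
            "price": float(price_text),
        })
    return cleaned


clean_records = wrangle(raw_records)
for record in clean_records:
    print(record)
```

In this toy data set, three of the five records survive: the sneakers are dropped because their price is missing, and the gift card is dropped because it is not relevant to the assumed goal.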

According to Mohammed Farooq, general manager of IBM Brokerage Services, which specialises in IT resource management across cloud models, data wrangling is very important to get trustworthy insights out of data.

“If data wrangling is cleaning up the mess, then don’t you think the data has to be accurate to get valuable insights?” Farooq asks.

“Businesses rely on data scientists to understand data and bring in leads and customers. It, thereby, is a crucial step in sorting the relevant data from the least necessary data. Trustworthy data becomes a necessity in this case.”

Farooq says data wrangling provides credibility to data by identifying the exact data sets that are needed to find solutions, picking data that is recent and consistent with the problem at hand, and accounting for changing technical and social factors.
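One way to picture "recent and consistent" data is as a set of simple filters. The sketch below is an assumption-laden example rather than a method described by Farooq: the customer records, the CUTOFF date, the VALID_COUNTRIES set and the select_trustworthy helper are all invented for illustration.

```python
from datetime import date

# Hypothetical customer records; every value here is invented.
records = [
    {"customer": "A", "updated": date(2017, 11, 20), "country": "ZA"},
    {"customer": "B", "updated": date(2012, 3, 5), "country": "ZA"},    # stale
    {"customer": "C", "updated": date(2017, 10, 1), "country": "??"},   # inconsistent code
    {"customer": "A", "updated": date(2017, 11, 20), "country": "ZA"},  # exact duplicate
]

CUTOFF = date(2017, 1, 1)        # assumed definition of "recent"
VALID_COUNTRIES = {"ZA", "US"}   # assumed set of acceptable country codes


def select_trustworthy(rows):
    """Keep rows that are unique, recent and use a recognised country code."""
    seen = set()
    kept = []
    for row in rows:
        key = (row["customer"], row["updated"], row["country"])
        if key in seen:
            continue  # drop exact duplicates
        seen.add(key)
        if row["updated"] < CUTOFF:
            continue  # drop stale records
        if row["country"] not in VALID_COUNTRIES:
            continue  # drop records with inconsistent codes
        kept.append(row)
    return kept


trusted = select_trustworthy(records)
print(trusted)
```

What counts as recent or consistent depends on the problem at hand, so the cutoff and the list of valid codes would need to be chosen for each task.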

Data wrangling also simplifies processes, provides actionable insights, and makes it easy to explain data to employees and stakeholders, he says.

A machine learns to do things right at the start with clean, organised data

Machine learning is the training of machines, where algorithms learn from historical data given to them by humans. Importantly, the amount of historical data given to a machine in the early stages of its learning process must be large enough for correlations to be found and results validated.

However, what happens when a machine is fed – especially in these critical early stages – data that has not been wrangled or cleaned?

This is the same as teaching a child the English alphabet the wrong way round. Every time the child tried to write a word, it would be spelt incorrectly because the base knowledge they were given at the very beginning was incorrect.

The same applies to machine learning. If a machine is taught with poor-quality data, the results and insights it produces will be flawed, because the data it was given at the very beginning was flawed.
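A small numerical sketch can make this concrete. The Python example below fits a straight line to two versions of the same made-up data set, one clean and one containing a single mis-keyed value. The numbers, the variable names and the fit_line helper are hypothetical, and ordinary least squares is used here only as a stand-in for "a machine that learns from data".

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical training data in which y is exactly 2 * x.
hours = [1, 2, 3, 4, 5]
clean_sales = [2, 4, 6, 8, 10]
# The same data with one value mis-keyed as 100 instead of 10.
dirty_sales = [2, 4, 6, 8, 100]

clean_slope, clean_intercept = fit_line(hours, clean_sales)
dirty_slope, dirty_intercept = fit_line(hours, dirty_sales)

print(f"clean data: slope={clean_slope:.1f}, intercept={clean_intercept:.1f}")
print(f"dirty data: slope={dirty_slope:.1f}, intercept={dirty_intercept:.1f}")
```

On the clean data the fit recovers the true slope of 2. With the single mis-keyed value, the fitted slope becomes 20, so every prediction made from that line would be badly off even though four of the five data points were correct.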

This is why data wrangling is so important in machine learning. It removes potential problems that could affect the insights produced by machine learning before they have had a chance to take root.
