media update's Aisling McCarthy looks at what entity extraction is, and how it can be used.

Entity extraction is an application of machine learning that identifies the entities mentioned in a text. It can quickly pick out which people, places, brands and products are being referred to within a media clip. This makes it a valuable media intelligence tool that can assist with compiling reports and planning strategy.

Entities are the people, companies, products, and concepts referred to within a particular text. Entity extraction uses machine-learning algorithms that are trained to find the names of people, places, organisations, and products automatically.

For example, given the sentence “John began work at Coca-Cola in 2010”, a machine-learning algorithm would tag the entities as follows: John [Person] began work at Coca-Cola [Organisation] in 2010 [Time].
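The tagging in that example can be sketched in a few lines of Python. This is only an illustration: a real system uses a trained model, whereas this sketch assumes a small hand-written lookup table (`GAZETTEER`) and a simple year pattern, both of which are hypothetical stand-ins.

```python
import re

# Hypothetical lookup table standing in for what a trained model would learn.
GAZETTEER = {
    "John": "Person",
    "Coca-Cola": "Organisation",
}
# Assumption for this sketch: any four-digit year from 1900 to 2099 is a [Time] entity.
YEAR_PATTERN = re.compile(r"\b(?:19|20)\d{2}\b")


def find_spans(text):
    """Return sorted (start, end, label) spans for every entity found."""
    spans = []
    for name, label in GAZETTEER.items():
        for match in re.finditer(r"\b" + re.escape(name) + r"\b", text):
            spans.append((match.start(), match.end(), label))
    for match in YEAR_PATTERN.finditer(text):
        spans.append((match.start(), match.end(), "Time"))
    return sorted(spans)


def annotate(text):
    """Insert a [Label] tag after each entity, leaving the rest of the text as is."""
    pieces = []
    cursor = 0
    for start, end, label in find_spans(text):
        pieces.append(text[cursor:end])
        pieces.append(f" [{label}]")
        cursor = end
    pieces.append(text[cursor:])
    return "".join(pieces)


print(annotate("John began work at Coca-Cola in 2010"))
# John [Person] began work at Coca-Cola [Organisation] in 2010 [Time]
```

A lookup table only recognises names it has already been given. A trained algorithm, by contrast, can recognise names it has never seen before from the context around them.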

What else can entity extraction do?

These machine-learning algorithms can also be used for co-referencing, a task closely related to entity linking, which identifies the different words that refer to the same entity. Entities are not always referred to by the same name throughout a text. For example, “the CEO”, “John” and “Smith” can all refer to one person, the CEO John Smith. Pronouns that point to him, such as “he” and “his”, are also identified as referring to the same entity.
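As a rough illustration of the idea, the sketch below groups every mention of John Smith under one name. It assumes a hand-written alias table (`ALIASES`), which is hypothetical: a trained algorithm would infer these links from context instead of looking them up, and would not assume that every “he” refers to the same person.

```python
import re
from collections import defaultdict

# Hypothetical alias table; a real system learns which mentions belong together.
ALIASES = {
    "john smith": "John Smith",
    "the ceo": "John Smith",
    "john": "John Smith",
    "smith": "John Smith",
    "he": "John Smith",
    "his": "John Smith",
}


def group_mentions(text):
    """Map each entity to the list of mentions that refer to it, in reading order."""
    # Try longer aliases first so "John Smith" is matched as one mention.
    ordered = sorted(ALIASES, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(alias) for alias in ordered) + r")\b",
        re.IGNORECASE,
    )
    clusters = defaultdict(list)
    for match in pattern.finditer(text):
        entity = ALIASES[match.group().lower()]
        clusters[entity].append(match.group())
    return dict(clusters)


sample = (
    "John Smith joined in 2010. The CEO said he was proud of his team. "
    "Smith will stay on."
)
print(group_mentions(sample))
# {'John Smith': ['John Smith', 'The CEO', 'he', 'his', 'Smith']}
```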

Once trained, the algorithms can also start to understand contextual referencing. By building on co-referencing, they can work out which portions of an article discuss a particular entity. Media intelligence companies can use contextual referencing technology to show their clients which sections of a media clipping concern the topics they are interested in.
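A simplified version of this step might look like the sketch below. It assumes the mentions of the client's entity have already been identified, and it simply returns the paragraphs that contain one of them. The function name `relevant_sections` and the sample clip are invented for illustration.

```python
import re


def relevant_sections(paragraphs, mentions):
    """Return the indexes of paragraphs that contain any mention of the entity."""
    if not mentions:
        return []
    # Longer mentions first, so multi-word names are matched whole.
    ordered = sorted(mentions, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(m) for m in ordered) + r")\b",
        re.IGNORECASE,
    )
    return [i for i, text in enumerate(paragraphs) if pattern.search(text)]


# Invented sample clip, split into paragraphs.
clip = [
    "Coca-Cola announced a new bottling plant on Monday.",
    "Analysts expect soft drink sales to grow this year.",
    "The company said the plant will open next year.",
]
# Mentions assumed to have been linked to the client by co-referencing.
client_mentions = ["Coca-Cola", "the company"]

print(relevant_sections(clip, client_mentions))
# [0, 2]
```

Here the third paragraph is picked up only because “the company” was already linked to Coca-Cola, which is why this step depends on co-referencing being done first.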

Many brands are just starting to invest in artificial intelligence technology, but brand tracking company Newsclip has been developing this technology since 2011. Read more in our article, “Bringing innovation in-house is vital in the AI age”.