Tag Archives for entity identification

Entity Extraction – URL

  For this entity extraction task, my goal is to write a simple regex rule that identifies the most common URLs in text documents. Examples: http://shakthydoss.com, https://support.company.com, http://172.16.7.41/home/, http://172.6.7.41/home?name=shakthydoss&year=2013. As mentioned earlier, I took time to understand the structure that a URL is composed of. Every URL consists of the following units: the scheme name (commonly called […]
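As a rough illustration of the kind of rule described above, here is a minimal Python sketch. The pattern and the function name `extract_urls` are my own assumptions for illustration, not the rule from the full post. It only covers http/https URLs with a hostname or IPv4 address, an optional port, and an optional path or query string.

```python
import re

# Hypothetical pattern (an assumption, not the post's actual rule):
#   scheme  : http or https
#   host    : dot-separated labels of letters, digits, or hyphens (also covers IPv4)
#   port    : optional, e.g. :8080
#   path    : optional, everything up to the next whitespace or comma
URL_PATTERN = re.compile(
    r"https?://"
    r"[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)*"
    r"(?::\d+)?"
    r"(?:/[^\s,]*)?"
)


def extract_urls(text):
    """Return all substrings of `text` that match URL_PATTERN, in order."""
    return URL_PATTERN.findall(text)


if __name__ == "__main__":
    sample = (
        "See http://shakthydoss.com , https://support.company.com , "
        "http://172.16.7.41/home/ , "
        "http://172.6.7.41/home?name=shakthydoss&year=2013 for details"
    )
    for url in extract_urls(sample):
        print(url)
```

Note that the path part is greedy up to whitespace or a comma, so a URL that ends a sentence would keep the trailing full stop. A stricter rule would need to handle that case.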

Entity Extraction – Email id

  The goal is to find the easiest reliable way to identify email ids in text documents. I am going to use a regular expression and define a rule for the strings (email ids) I am looking for. Examples: shakthydoss@gmail.com, student-244722@wilp.bits.pilani.edu.com, gns4f-3895494981@sale.craigslist.org. Before blindly writing some junk regex rule, I took time to understand the format that email ids […]
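To make the idea concrete, here is a minimal Python sketch of such a rule. The pattern and the function name `extract_emails` are assumptions made for illustration, not the rule from the full post. It expects a local part, an `@`, and a domain made of at least two dot-separated labels, and it does not try to cover every address the email standards allow.

```python
import re

# Hypothetical pattern (an assumption, not the post's actual rule):
#   local part : letters, digits, and . _ % + -
#   domain     : two or more dot-separated labels of letters, digits, or hyphens
EMAIL_PATTERN = re.compile(
    r"[A-Za-z0-9._%+-]+"
    r"@"
    r"[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+"
)


def extract_emails(text):
    """Return all substrings of `text` that match EMAIL_PATTERN, in order."""
    return EMAIL_PATTERN.findall(text)


if __name__ == "__main__":
    sample = (
        "Contact shakthydoss@gmail.com, "
        "student-244722@wilp.bits.pilani.edu.com, "
        "gns4f-3895494981@sale.craigslist.org."
    )
    for email in extract_emails(sample):
        print(email)
```

Because every dot in the domain must be followed by at least one label character, a full stop that ends the sentence is not included in the match.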