Tag Archives for Information Extraction

Entity Extraction – URL

For this entity extraction task, my goal is to write a simple regex rule to identify the most common URLs in text documents. Example: http://shakthydoss.com, https://support.company.com, http://172.16.7.41/home/, http://172.6.7.41/home?name=shakthydoss&year=2013. As said earlier, I took time to understand the structure that a URL is composed of. Every URL consists of the following units: the scheme name (commonly called […]
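As a rough illustration of the kind of rule this excerpt describes, here is a minimal Python sketch. It is not the rule from the full post: the pattern, the name `URL_PATTERN`, and the helper `extract_urls` are assumptions made for this example, and it only covers `http`/`https` URLs whose host is a dotted domain name or an IPv4 address.

```python
import re

# Hypothetical pattern for illustration only; not the rule from the original post.
URL_PATTERN = re.compile(
    r"https?://"                              # scheme: http or https
    r"(?:\d{1,3}(?:\.\d{1,3}){3}"             # host as a dotted IPv4 address
    r"|[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+)"    # or host as a dotted domain name
    r"(?::\d+)?"                              # optional port
    r"(?:/[^\s,]*)?"                          # optional path and query string
)


def extract_urls(text):
    """Return every substring of `text` that matches URL_PATTERN, in order."""
    return URL_PATTERN.findall(text)


sample = (
    "See http://shakthydoss.com , https://support.company.com , "
    "http://172.16.7.41/home/ , "
    "http://172.6.7.41/home?name=shakthydoss&year=2013 for details"
)
for url in extract_urls(sample):
    print(url)
```

The sketch stops a match at whitespace or a comma, so punctuation glued to the end of a path (for example a sentence-ending period) would still be captured; a production rule would need extra handling for that.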

Entity Extraction – Email id

The goal is to find an accurate and easy way to identify email IDs in text documents. I am going to use a regular expression and define a rule for the strings (email IDs) I am looking for. Example: shakthydoss@gmail.com, student-244722@wilp.bits.pilani.edu.com, gns4f-3895494981@sale.craigslist.org. Before blindly starting to write some junk regex rule, I took time to understand the format that email IDs […]
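A minimal Python sketch of such a rule follows. It is an assumption for illustration, not the rule developed in the full post: the names `EMAIL_PATTERN` and `extract_emails` are hypothetical, and the pattern accepts only a simple local part followed by `@` and a domain containing at least one dot.

```python
import re

# Hypothetical pattern for illustration only; not the rule from the original post.
EMAIL_PATTERN = re.compile(
    r"[A-Za-z0-9._%+-]+"                     # local part before the @
    r"@"
    r"[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+"     # domain with at least one dot
)


def extract_emails(text):
    """Return every substring of `text` that matches EMAIL_PATTERN, in order."""
    return EMAIL_PATTERN.findall(text)


sample = (
    "Contacts: shakthydoss@gmail.com, "
    "student-244722@wilp.bits.pilani.edu.com, "
    "gns4f-3895494981@sale.craigslist.org."
)
for email in extract_emails(sample):
    print(email)
```

Because each dot in the domain must be followed by at least one letter, digit, or hyphen, the sentence-ending period after the last address is left out of the match.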

Information Extraction ( Phone numbers ) from free running text

Prologue Hi, I was recently given a few assignments by my master திரு. சுதர்சன் சாந்தியப்பன் (visiting professor of SRM University). On completion of the assignments he advised us to publish our work on the internet, and I hereby do the same with my solutions. Objective: To extract phone numbers from resumes. Assumption: All documents are unstructured Predictable […]

Information Extraction ( Name and email ID ) from free running text

Prologue Hi, I was recently given a few assignments by my master திரு. சுதர்சன் சாந்தியப்பன் (visiting professor of SRM University). On completion of the assignments he advised us to publish our work on the internet, and I hereby do the same with my solutions. The objective of this post is to extract names and mail IDs from free […]