Information Extraction Software 

General Introduction to Information Extraction Software

A colander is a kitchen utensil with perforations for draining liquids and rinsing food. When working with Word documents, we need a similar tool to filter out the content we want.

Information extraction software is such a tool: it reads specified Word documents in batches and converts the specified content into Excel files. Whenever you need to copy and paste content repeatedly from Word documents into a new table, the information extraction software can do the work for you.

How to Purchase

There are two ways to use the information extraction software: users can purchase the software and extract the content from Word documents themselves, or they can entrust Yuantai data management company to perform the extraction. Users can also try the online extraction service free of charge.

Operating Instructions

To try online extraction, your computer must meet the following requirements:
1. .NET Framework 2.0 is installed.
2. The browser is allowed to download ActiveX controls.
3. The extraction software installation files have been downloaded and installed.
If your computer meets these requirements, you can try the extraction software directly, processing one document at a time, up to 10 times per day.

If you want to use the information extraction software on a long-term basis, you can purchase the service online.

Applications
A university library builds a thesis index with the information extraction software from the graduation theses that students submit.
A government agency generates a roster or list with the information extraction software from the application forms submitted by other agencies.
A large regional company builds new statistical reports with the information extraction software after receiving the reports from its branch companies.
Government agencies automatically generate a catalogue of public government information with the information extraction software when documents are submitted.

Application Examples
A personnel administration department is responsible for checking submitted application forms for talent training funding and issues the relevant funds after careful verification. The submitted materials usually come in two sets, one on paper and one in electronic form. To prepare for the experts' assessment, department staff must draw up a roster listing all applicants. Normally, a staff member has to open every application form and look up the applicant's name, sex, education, work unit, training status, background, and so on. Each form takes 3 to 5 minutes of constant copying and pasting, so processing 100 applications takes a whole day. The work also demands intense concentration, and it takes even longer if the staff member has to answer the phone or handle other business at the same time. With information extraction software, the job can be finished within a few minutes, and the roster is generated automatically in Excel format.
A university library needs to receive graduation theses and build an index for future retrieval. The workload of thousands of theses each year is daunting for the staff, but the job is easily accomplished with the information extraction software. The library can set up a submission channel, such as e-mail or an FTP upload page, and once students have uploaded their theses, the thesis index is generated automatically by the information extraction software.

A branch company regularly writes a performance report in Word format that contains statistics, and its parent company collects these data and compiles them into an Excel report for further statistical analysis. Collecting the data manually means opening every performance report and copying its contents into a new form, which consumes a lot of time. With information extraction software, the relevant content can be read automatically.

According to the Government Regulations on Public Information, administrative departments must deposit public government documents in public viewing places such as archives and libraries. Archives and other departments catalogue those documents for future retrieval. When the volume of documents is large, the workload is heavy. Information extraction software can automatically process the received electronic documents and generate a machine-readable catalogue.

To use the information extraction software correctly, documents should be based on a template with embedded XML tags. Users can design such a template themselves or entrust its design to Yuantai Company.
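
Why embedded tags make extraction reliable can be illustrated with a minimal Python sketch. It assumes a simplified, hypothetical XML representation of a tagged thesis cover page; the element names (thesis, name, title, and so on) are invented for illustration, and the actual template format used by the software is not described on this page. The sketch reads each tagged value and writes one row per document as CSV text, which Excel can open.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical tagged document. The element names are assumptions,
# not the software's actual template schema.
TAGGED_DOC = """
<thesis>
  <title>XML Application in Document Management</title>
  <name>Henry</name>
  <college>Computer College</college>
  <speciality>Software Engineering</speciality>
  <grade>2005</grade>
  <instructor>James</instructor>
</thesis>
"""

FIELDS = ["name", "title", "college", "speciality", "grade", "instructor"]


def extract_row(xml_text, fields=FIELDS):
    """Return the text of each tagged field, in column order."""
    root = ET.fromstring(xml_text)
    # findtext returns the default when a tag is missing from the document.
    return [(root.findtext(field, default="") or "").strip() for field in fields]


def rows_to_csv(rows, header=FIELDS):
    """Serialize the header and rows as CSV text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(header)
    writer.writerows(rows)
    return buffer.getvalue()


row = extract_row(TAGGED_DOC)
csv_text = rows_to_csv([row])
print(csv_text)
```

Because each value is located by its tag rather than by its position on the page, the same code works for every document built from the same template.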

Example: A university library reads the relevant content from graduation theses and generates an index.
Example: two students' theses as Word documents.

Information Exchange Based on the Metadata

Tomas

College Name: Computer College
Speciality: E-Commerce
Grade: 2006
Instructor: Brown
Date: June 10, 2010

XML Application in Document Management

Henry

College Name: Computer College
Speciality: Software Engineering
Grade: 2005
Instructor: James
Date: June 10, 2009

Example: The information extraction software automatically generates a thesis index in Excel format.
Papers Index

Name | Title | College | Speciality | Grade | Instructor | Date
Tomas | Information Exchange Based on the Metadata | Computer College | E-Commerce | 2006 | Brown | Aug, 2010
Henry | XML Application in Document Management | Computer College | Software Engineering | 2005 | James | Aug, 2009
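
The mapping from a cover page to one index row can be sketched in Python as follows. This is only an illustration under stated assumptions: it treats the cover page as plain text with the title on the first non-empty line, the author on the second, and "Label: value" lines after that, which is how the sample documents above are laid out. The function names parse_cover_page and to_index_row are hypothetical, and the software itself relies on embedded XML tags rather than on line positions.

```python
# Sample cover page as plain text, copied from the first thesis above.
COVER_PAGE = """
Information Exchange Based on the Metadata

Tomas

College Name: Computer College
Speciality: E-Commerce
Grade: 2006
Instructor: Brown
Date: June 10, 2010
"""

# Column order of the index; the labels match those on the cover page.
COLUMNS = ["Name", "Title", "College Name", "Speciality", "Grade", "Instructor", "Date"]


def parse_cover_page(text):
    """Parse a cover page laid out as: title, author, then 'Label: value' lines."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    record = {"Title": lines[0], "Name": lines[1]}
    for line in lines[2:]:
        # Split on the first colon only, so values may contain colons.
        label, _, value = line.partition(":")
        record[label.strip()] = value.strip()
    return record


def to_index_row(record, columns=COLUMNS):
    """Arrange the parsed fields in index column order; missing fields stay empty."""
    return [record.get(column, "") for column in columns]


record = parse_cover_page(COVER_PAGE)
index_row = to_index_row(record)
print(index_row)
```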

Example: A government department reads contents from application forms and generates a roster. The following are two application forms.

Funding Application Form

Name: John Smith | Application Category: Private | Graduation College: Peking University | Photo
Gender: Male | Nationality: Han | ID/Passport Number: 191010101010101010
Date of Birth: Nov 22, 1977 | Birthplace: Xujiahui District, Shanghai
Application Project: Material Engineering Research | Present Technical Position: Senior Engineer | Total Sum: 20,000

 

Funding Application Form

Name: Jean Truman | Application Category: Going Abroad | Graduation College: Liaoning University | Photo
Gender: Female | Nationality: Korean | ID/Passport Number: 111010101010101020
Date of Birth: Dec 1, 1978 | Birthplace: Shenyang City, Liaoning Province
Application Project: Agricultural Statistics Comparison | Present Technical Position: Senior Engineer | Total Sum: 40,000

In the past, we built the roster from these application forms by hand, copying the content of each item and pasting it into the corresponding column, clicking back and forth between Word and Excel the whole time. Why not let information extraction software process the forms automatically?

Use information extraction software to automatically read relevant content and generate a roster.

Roster

Name | Application Category | Graduation College | Gender | Nationality | Date of Birth | Birthplace | Application Project | Present Technical Position | Total Sum
John Smith | Private | Peking University | Male | Han | Nov 22, 1977 | Xujiahui District, Shanghai | Material Engineering Research | Senior Engineer | 20,000
Jean Truman | Going Abroad | Liaoning University | Female | Korean | Dec 1, 1978 | Shenyang City, Liaoning Province | Agricultural Statistics Comparison | Senior Engineer | 40,000

Example: When branch companies report their performance, the parent company automatically reads the relevant content from the reports.
Example: Performance Report from the First Branch


Performance Report from the First Branch

Corporation:
Through the efforts of all staff, the First Branch has made clear progress in every line of business. The following is a report on housing sales and housing leases from January 1 to February 28, 2010:
I. Housing sales

Month | Number | Area | Amount | Commission
1 | 20 | 2000 | 2000000 | 20000
2 | 19 | 1900 | 1900000 | 19000

II. Lease

Month | Number | Area | Amount | Commission
1 | 12 | 1200 | 12000 | 1200
2 | 11 | 1100 | 11000 | 1100

The first branch

Example: Performance Report from the Second Company, as a Word document

 

The Second Company Performance Report

Corporation:
After three months' work, the Second Branch has made clear progress in every line of business. The following is a report on housing sales and housing leases from January 1 to February 28, 2010.
I. Housing sales

Month | Number | Area | Amount | Commission
1 | 10 | 1000 | 100000 | 10000
2 | 9 | 900 | 900000 | 9000

II. Lease

Month | Number | Area | Amount | Commission
1 | 6 | 600 | 6000 | 600
2 | 5 | 500 | 5000 | 500

The second company

Example: The information extraction software automatically generates sales and leasing statistics in Excel format.

Housing Sales and Leasing Statistics

Company name | Sales month | Sales number | Sales area | Sales amount | Sales commission | Lease number | Lease area | Lease amount | Lease commission
The 1st branch | 1 | 20 | 2000 | 2000000 | 20000 | 12 | 1200 | 12000 | 1200
The 1st branch | 2 | 19 | 1900 | 1900000 | 19000 | 11 | 1100 | 11000 | 1100
The 2nd branch | 1 | 10 | 1000 | 100000 | 10000 | 6 | 600 | 6000 | 600
The 2nd branch | 2 | 9 | 900 | 900000 | 9000 | 5 | 500 | 5000 | 500
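
Each statistics row combines one sales row and one lease row from the same report, matched by month. A minimal Python sketch of that join is shown below, using the Second Branch figures from the report above. The function name merge_report and the dictionary keys are hypothetical, chosen only for this illustration; how the software performs the step internally is not described here.

```python
# Figures from the Second Branch report, one dictionary per table row.
SALES = [
    {"month": 1, "number": 10, "area": 1000, "amount": 100000, "commission": 10000},
    {"month": 2, "number": 9, "area": 900, "amount": 900000, "commission": 9000},
]
LEASE = [
    {"month": 1, "number": 6, "area": 600, "amount": 6000, "commission": 600},
    {"month": 2, "number": 5, "area": 500, "amount": 5000, "commission": 500},
]

VALUE_KEYS = ["number", "area", "amount", "commission"]


def merge_report(company, sales, lease):
    """Join sales and lease rows on month into one statistics row per month."""
    lease_by_month = {row["month"]: row for row in lease}
    merged = []
    for sale in sales:
        # A month without lease data yields empty lease cells (None).
        lease_row = lease_by_month.get(sale["month"], {})
        merged.append(
            [company, sale["month"]]
            + [sale[key] for key in VALUE_KEYS]
            + [lease_row.get(key) for key in VALUE_KEYS]
        )
    return merged


rows = merge_report("The 2nd branch", SALES, LEASE)
for row in rows:
    print(row)
```

Keying the lease rows by month, instead of pairing the two tables by position, keeps the rows aligned even if one table lists its months in a different order.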

Example: When subordinate units submit documents, the superior department automatically reads the relevant content from the files.
Example: Documents of a bureau and a committee, as Word documents

 

Files of Inner Mongolia ×× Bureau
Supervised by Inner Mongolia ×× Bureau【2010】No.11
--------------------★---------------------
Notice on strengthening the safety production
All ministries, committees, offices, departments, and bureaus:
××××××××××××××××××××××××××××××××××××××××××
××××××××××××××××××××××××××××××××××
×× Bureau of Inner Mongolia Autonomous Region
September 5, 2010
Keywords: economy; safety production; notice

 

Files of Inner Mongolia ×× Committee
Supervised by Inner Mongolia ×× Committee【2010】No.1211
--------------------★---------------------
Notice on improving management level
All ministries, committees, offices, departments, and bureaus:
××××××××××××××××××××××××××
××××××××××××××××××××××××××××××××××××××
×××××××××××
×× Committee of Inner Mongolia Autonomous Region
August 12, 2010
Keywords: investment; management; notice

Example: The information extraction software automatically generates a directory of the submitted documents in Excel format.
Directory of Submitted Documents

Sending authority | Document title | Sending date | File No. | Keywords
×× Bureau of Inner Mongolia | Notice on strengthening the safety production | September 5, 2010 | Supervised by Inner Mongolia ×× Bureau【2010】No.11 | Economy; safety production; notice
×× Committee of Inner Mongolia | Notice on improving management level | August 12, 2010 | Supervised by Inner Mongolia ×× Committee【2010】No.1211 | Investment; management; notice
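
For documents without embedded tags, the same directory fields could in principle be located from the fixed layout of an official notice. The Python sketch below is an illustration only and rests on assumptions about that layout: the file number line contains a bracketed year followed by "No.", the title follows the ★ separator line, and the sending authority and date are the two lines before the "Keywords:" line. The function name extract_directory_entry is hypothetical, and this is not how the software itself works, since it reads XML tags from the template.

```python
import re

# Sample notice as plain text, following the layout of the first document above.
NOTICE = """Files of Inner Mongolia ×× Bureau
Supervised by Inner Mongolia ×× Bureau【2010】No.11
--------------------★---------------------
Notice on strengthening the safety production
All ministries, committees, offices, departments, and bureaus:
××××××××××××××××××××××××××
×× Bureau of Inner Mongolia Autonomous Region
September 5, 2010
Keywords: economy; safety production; notice
"""

# A bracketed four-digit year followed by "No." and a number.
FILE_NO_PATTERN = re.compile(r"【\s*\d{4}\s*】\s*No\.\s*\d+")


def extract_directory_entry(text):
    """Locate the directory fields by their position in the notice layout."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    file_no = next(line for line in lines if FILE_NO_PATTERN.search(line))
    separator = next(i for i, line in enumerate(lines) if "★" in line)
    keywords_at = next(
        i for i, line in enumerate(lines) if line.lower().startswith("keywords:")
    )
    keywords = lines[keywords_at].split(":", 1)[1]
    return {
        "Sending authority": lines[keywords_at - 2],
        "Document title": lines[separator + 1],
        "Sending date": lines[keywords_at - 1],
        "File No.": file_no,
        "Keywords": [word.strip() for word in keywords.split(";")],
    }


entry = extract_directory_entry(NOTICE)
print(entry)
```

Position-based rules like these break as soon as a document deviates from the expected layout, which is the reason a template with explicit tags is the more dependable basis for extraction.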
