
Information Extraction Software
General Introduction to Information Extraction Software
A colander is a kitchen utensil with perforations for draining liquids and rinsing food. When working with Word documents, we need a similar tool to filter out the contents we want.
Information extraction software is such a tool: it reads specified Word documents in batches and converts the specified contents into files in Excel format. Whenever you need to repeatedly copy and paste contents from Word documents into a new table, the information extraction software can do the work for you.
How to Purchase
There are two ways to use the information extraction software: users can purchase the software and extract the contents of their Word documents themselves, or they can entrust Yuantai data management company with the extraction. Users can also try the information extraction service online for free.
Operating Instructions
To try online extraction, your computer needs the following operating environment:
1. .NET Framework 2.0
2. A browser that is allowed to download ActiveX controls
3. The extraction software, installed from the downloaded installation files
If your computer meets the above conditions, you can try the extraction software directly, processing one document at a time and up to 10 times per day.
If you want to use the information extraction software permanently, you can purchase the service online.
Application
A university library builds a thesis index with the information extraction software from the graduation theses that students submit;
A government agency generates a roster or list with the information extraction software from the application forms submitted by other agencies;
A large regional company builds new statistical reports with the information extraction software from the reports it receives from its branch companies;
A government agency automatically generates a table of public government information with the information extraction software when documents are submitted.
Application Examples
A personnel administration department is responsible for checking submitted application forms for talent training and issues the relevant funds after careful verification. The submitted materials usually come in two sets, one on paper and one in electronic form. To prepare for the experts' assessment, department staff have to draw up a roster containing the names of all applicants. Normally, an office worker needs to open every application form and look up the applicant's name, sex, education, unit, training condition, background and so on. Each form takes 3 to 5 minutes of constant copying and pasting, so handling 100 such applications takes a whole day. The work also demands intense concentration, and it takes even longer if the staff member has to answer the phone or handle other business at the same time. With the information extraction software, the same job can be finished within a few minutes, and a roster in Excel format is generated automatically.
A university library needs to receive graduation theses and build an index for future retrieval. The workload of thousands of theses each year discourages the staff, but they can easily accomplish the job with the information extraction software. The library can set up a thesis submission entry, such as an e-mail address or an FTP upload window, and after receiving the students' uploaded theses it can automatically generate a thesis index with the information extraction software.
A branch company regularly writes a performance report in Word format containing statistics, and its superior company collects these data and generates a report in Excel format for further statistical analysis. Collecting the data manually means opening every performance report and copying its contents into a new form, which consumes a lot of time. With the information extraction software, the relevant contents can be read automatically.
According to the Government Regulations on Public Information, administrative departments should submit public files to public viewing places for government information, such as archives and libraries. Archives and other departments catalogue those files for future retrieval. The workload is heavy when the number of documents is large. The information extraction software can automatically process the received electronic documents and generate a machine-readable catalogue.
To use the information extraction software correctly, documents should be based on a document template with embedded XML tags. Users can design such a template themselves or commission Yuantai Company to design it.
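As a rough illustration of why tagged templates make extraction straightforward, the following sketch reads a few fields from small XML fragments and writes one index row per document. It is only a minimal sketch: the tag names (`thesis`, `name`, `title`, `college`) and the helpers `extract_fields` and `build_index` are hypothetical, the actual tag scheme of the information extraction software is not described here, and the output is CSV (which Excel can open) rather than a native Excel file.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical tagged content, standing in for the XML embedded in a template.
DOCS = [
    "<thesis><name>Tomas</name>"
    "<title>Information Exchange Based on the Metadata</title>"
    "<college>Computer College</college></thesis>",
    "<thesis><name>Henry</name>"
    "<title>XML Application in Document Management</title>"
    "<college>Computer College</college></thesis>",
]

# Assumed field names; one output column per tag.
FIELDS = ["name", "title", "college"]


def extract_fields(xml_text, fields):
    """Return a dict mapping each requested tag to its text ('' if missing)."""
    root = ET.fromstring(xml_text)
    return {f: (root.findtext(f) or "").strip() for f in fields}


def build_index(docs, fields):
    """Extract the same fields from every document and render them as CSV."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fields, lineterminator="\n")
    writer.writeheader()
    for doc in docs:
        writer.writerow(extract_fields(doc, fields))
    return out.getvalue()


if __name__ == "__main__":
    print(build_index(DOCS, FIELDS))
```

Because every document built from the same template carries the same tags, the extraction step does not need to guess where a field is located; a missing tag simply yields an empty cell.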
Example: A university library reads the relevant contents from graduation theses and generates an index.
Example: two students' theses as Word documents.
| Information Exchange Based on the Metadata | College Name: Computer College |
| XML Application in Document Management | College Name: Computer College |
Example: The information extraction software automatically generates a thesis index in Excel format.
Papers Index
| Name | Title | College | Speciality | Grade | Instructor | Date |
| Tomas | Information Exchange Based on the Metadata | Computer College | E-Commerce | 2006 | Brown | Aug, 2010 |
| Henry | XML Application in Document Management | Computer College | Software Engineering | 2005 | James | Aug, 2009 |
| … | | | | | | |
Example: A government department reads contents from application forms and generates a roster. The following are two application forms.
Funding Application Form
| Name | John Smith | Application Category | Private | Graduation College | Peking University | Photo |
| Gender | Male | Nationality | Han | ID/Passport Number | 191010101 010101010 |
| Date of Birth | Nov 22, 1977 | Birthplace | Xujiahui District, Shanghai |
| Application Project | Material Engineering Research | Present Technical Position | Senior Engineer | Total Sum | 20,000 |
Funding Application Form
| Name | Jean Truman | Application Category | Going Abroad | Graduation College | Liaoning University | Photo |
| Gender | Female | Nationality | Korean | ID/Passport Number | 111010101 010101020 |
| Date of Birth | Dec 1, 1978 | Birthplace | Shenyang City, Liaoning Province |
| Application Project | Agricultural Statistics Comparison | Present Technical Position | Senior Engineer | Total Sum | 40,000 |
In the past, to compile a roster from such application forms, we copied the contents of each item by hand and pasted them into the corresponding column, switching back and forth between Word and Excel and copying and pasting all the time. Why not let the information extraction software process the forms automatically?
The information extraction software automatically reads the relevant contents and generates a roster.
Roster
| Name | Application Category | Graduation College | Gender | Nationality | Date of Birth | Birthplace | Application Project | Present Technical Position | Total Sum |
| John Smith | Private | Peking University | Male | Han | Nov 22, 1977 | Xujiahui District, Shanghai | Material Engineering Research | Senior Engineer | 20,000 |
| Jean Truman | Going Abroad | Liaoning University | Female | Korean | Dec 1, 1978 | Shenyang City, Liaoning Province | Agricultural Statistics Comparison | Senior Engineer | 40,000 |
| … | | | | | | | | | |
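The step from an application form to a roster row can be sketched as follows, assuming a simplified form in which labels and values alternate within each row. The helper names `parse_form` and `roster_row` are hypothetical, and this is not a description of how the information extraction software itself works; it only illustrates the idea of mapping labelled fields to roster columns.

```python
# Simplified form rows: cells alternate between a label and its value.
FORM_ROWS = [
    "| Name | John Smith | Application Category | Private | Graduation College | Peking University |",
    "| Gender | Male | Nationality | Han |",
    "| Date of Birth | Nov 22, 1977 | Birthplace | Xujiahui District, Shanghai |",
]


def parse_form(rows):
    """Turn label/value rows into a dict keyed by lower-cased label."""
    record = {}
    for row in rows:
        # Drop the outer pipes, then split the row into trimmed cells.
        cells = [c.strip() for c in row.strip().strip("|").split("|")]
        # Even positions hold labels, odd positions hold the matching values.
        for label, value in zip(cells[0::2], cells[1::2]):
            record[label.lower()] = value
    return record


def roster_row(record, columns):
    """Pick the roster columns from a parsed form ('' when a field is absent)."""
    return [record.get(col.lower(), "") for col in columns]


if __name__ == "__main__":
    record = parse_form(FORM_ROWS)
    print(roster_row(record, ["Name", "Gender", "Birthplace"]))
```

Matching labels case-insensitively keeps the sketch tolerant of small differences in capitalization between the form and the roster header.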
Example: When grass-roots companies report their performance, the superior company automatically reads the relevant contents from the reports.
Example: Performance Report from the First Branch
I. Sales
| Month | Number | Area | Amount | Commission |
| 1 | 20 | 2000 | 2000000 | 20000 |
| 2 | 19 | 1900 | 1900000 | 19000 |
II. Lease
| Month | Number | Area | Amount | Commission |
| 1 | 12 | 1200 | 12000 | 1200 |
| 2 | 11 | 1100 | 11000 | 1100 |
The first branch
Example: Performance Report from the Second Company, as a Word document
The Second Company Performance Report
Corporation:
II. Lease
The second company
Example: The information extraction software automatically generates sales and leasing statistics in Excel format.
Housing Sales and Leasing Statistics
| Company name | Sales month | Sales number | Sales area | Sales amount | Sales commissions | Lease number | Lease area | Lease amount | Lease commission |
| The 1st branch | 1 | 20 | 2000 | 2000000 | 20000 | 12 | 1200 | 12000 | 1200 |
| The 1st branch | 2 | 19 | 1900 | 1900000 | 19000 | 11 | 1100 | 11000 | 1100 |
| The 2nd branch | 1 | 10 | 1000 | 100000 | 10000 | 6 | 600 | 6000 | 600 |
| The 2nd branch | 2 | 9 | 900 | 900000 | 9000 | 5 | 500 | 5000 | 500 |
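The combination of a branch's sales and lease tables into one statistics row per month can be sketched as below. This is only an illustration under stated assumptions: the dictionary layout, the column names and the helper `merge_report` are hypothetical, and the figures are those of the first branch from the example above.

```python
# Figures of the first branch, taken from the example report above.
SALES = [
    {"month": 1, "number": 20, "area": 2000, "amount": 2000000, "commission": 20000},
    {"month": 2, "number": 19, "area": 1900, "amount": 1900000, "commission": 19000},
]
LEASE = [
    {"month": 1, "number": 12, "area": 1200, "amount": 12000, "commission": 1200},
    {"month": 2, "number": 11, "area": 1100, "amount": 11000, "commission": 1100},
]

COLUMNS = ("number", "area", "amount", "commission")


def merge_report(company, sales, lease):
    """Join sales and lease rows on month into one combined row per month."""
    lease_by_month = {row["month"]: row for row in lease}
    merged = []
    for sale in sales:
        row = {"company": company, "month": sale["month"]}
        rent = lease_by_month.get(sale["month"], {})
        for col in COLUMNS:
            row["sales_" + col] = sale[col]
            # None marks a month that has no lease row in the report.
            row["lease_" + col] = rent.get(col)
        merged.append(row)
    return merged


if __name__ == "__main__":
    for row in merge_report("The 1st branch", SALES, LEASE):
        print(row)
```

Running the same function over every branch report and concatenating the results would give a statistics table with the layout shown above.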
Example: When grass-roots departments submit documents, the superior department automatically reads the relevant files.
Example: Documents of a certain bureau or committee, as Word documents
Files of Inner Mongolia xx Bureau
Files of Inner Mongolia x Committee
Example: The information extraction software automatically generates a directory of the submitted documents in Excel format.
Documents Directory Submitted
| Sending Authority | Document Title | Sending Date | File No. | Keywords |
| xx Bureau of Inner Mongolia | Notice on strengthening the safety production | September 5, 2010 | Supervised by Inner Mongolia × Bureau【2010】No.11 | Economy; safety production; notice |
| x Committee of Inner Mongolia | Notice on improving management level | August 12, 2010 | Supervised by Inner Mongolia × Committee【2010】No.1211 | Investment; management; notice |