Abstract: Existing end-to-end methods and methods based on pre-trained models do not effectively use the structural information of table cells during training, which degrades the model's vector representation of table text and the accuracy of the final semantic information extraction. To address this, two methods are proposed: an end-to-end method that makes further use of cell structural information to improve optical character recognition, and a pre-training method that adds a cell sequence prediction task. Experimental results show that the two improved methods achieve better results on the table semantic information extraction task, with F1 values improved by 0.2046 and 0.0176, respectively. The improved methods underscore the importance of cell structure information in tables and improve the accuracy of table semantic information extraction.