Title: | Convert Word to Govspeak |
---|---|
Description: | Try to convert a Word document (docx) to the equivalent Govspeak Markdown, ready for upload to the UK government's publishing platform. |
Authors: | Matt Dray [aut, cre] |
Maintainer: | Matt Dray <[email protected]> |
License: | MIT + file LICENSE |
Version: | 0.0.0.9000 |
Built: | 2024-12-25 02:55:06 UTC |
Source: | https://github.com/matt-dray/wordup |
Provide a copied table from a Word document and be returned a Govspeak Markdown version of it. Some post-editing may be necessary for more complex tables.
table_to_govspeak(
  word_table = NULL,
  guess_types = TRUE,
  ignore_regex = ",|%|\\[.\\]",
  has_row_titles = FALSE,
  totals_rows = NULL,
  to_clipboard = TRUE
)
word_table |
Character. A table copy-pasted from a Microsoft Word
document. If |
guess_types |
Logical. Should data types be guessed for each column based on their content? Defaults to TRUE. |
ignore_regex |
Character. A regular expression of strings to ignore when trying to guess column types. See details. |
has_row_titles |
Logical. Should the first column be treated as though it contains titles for each row? Defaults to FALSE. |
totals_rows |
Integer. A vector of indices to identify rows that contain totals. These will be marked up as bold. |
to_clipboard |
Logical. Should the output be copied to your clipboard? Defaults to TRUE. |
If guess_types is TRUE, then utils::type.convert() is used to coerce each column to the appropriate data type. For example, a column containing numbers will be coerced to numeric. This will fail if the numbers in a given column are formatted to contain non-numeric characters, like '1,234' (comma) or '10%' (percentage symbol). Use ignore_regex so that the process of guessing the data types will ignore these characters.
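The effect of formatting characters on type guessing can be reproduced with base R alone. The sketch below is not the package's internal code: it assumes, purely for illustration, that matches for ignore_regex are removed with gsub() before utils::type.convert() is called, and the column values are made up.

```r
# Hypothetical column values, as they might arrive from a copied Word table
col_plain     <- c("100", "200", "300")
col_formatted <- c("1,000", "2,000", "3,000")
col_percent   <- c("1%", "2%", "3%")

# Plain digits are guessed as integer
class_plain <- class(utils::type.convert(col_plain, as.is = TRUE))

# A thousands separator blocks the guess, so the column stays character
class_formatted <- class(utils::type.convert(col_formatted, as.is = TRUE))

# Assumption: strip matches for the default ignore_regex before guessing
ignore_regex <- ",|%|\\[.\\]"
strip_ignored <- function(x, pattern = ignore_regex) gsub(pattern, "", x)

class_formatted_stripped <- class(
  utils::type.convert(strip_ignored(col_formatted), as.is = TRUE)
)
class_percent_stripped <- class(
  utils::type.convert(strip_ignored(col_percent), as.is = TRUE)
)
```

In this sketch class_plain is "integer" and class_formatted is "character"; once the commas or percentage symbols are stripped, both formatted columns are guessed as integer.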
Character. A string that contains Govspeak Markdown that represents the copy-pasted table.
word_table <- c( "Column 1 Column 2 Column 3 Column 4 Column 5 X 100 1,000 1% 15 Y 200 2,000 2% 12 Z 300 3,000 3% [c]" )
table_to_govspeak(word_table, to_clipboard = FALSE)
Extract Specific Body Elements
wu_body(doc_list, element = c("p", "tbl"))
doc_list |
List. Output from wu_read. |
element |
Character. The elements you want to return. |
A list with an element for each instance of the desired element.
path <- system.file("examples/simple.docx", package = "wordup")
doc_list <- wu_read(path)
p_list <- wu_body(doc_list, "p")
str(p_list, give.attr = FALSE, max.level = 1)
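The idea of returning one list entry per matching element can be sketched with a plain named list. This is an illustration only: it assumes the document body is a list whose entries are named after their XML tags, which may not match the structure that wu_read() actually returns, and keep_elements() is a hypothetical helper rather than part of wordup.

```r
# Hypothetical stand-in for a document body: one entry per element,
# named after its tag. The real output of wu_read() may be shaped differently.
body <- list(
  p   = list("First paragraph"),
  tbl = list("A table"),
  p   = list("Second paragraph")
)

# Hypothetical helper: keep only entries whose name is a requested element
keep_elements <- function(x, element = c("p", "tbl")) {
  x[names(x) %in% element]
}

p_only <- keep_elements(body, "p")
str(p_only, max.level = 1)
```

Matching on names with %in% keeps every instance of a repeated tag, which extracting a single entry with `$` or `[[` would not.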
Extract All 'p' Body Text and Style to a Dataframe
wu_p(p_list)
p_list |
List. Output from wu_body with argument |
A data.frame with a row per 'p' element and columns with text and possibly style information.
path <- system.file("examples/simple.docx", package = "wordup")
doc_list <- wu_read(path)
p_list <- wu_body(doc_list, "p")
wu_p(p_list)
Unzips a docx file, reads the XML from /word/document.xml and converts it to a list object for further processing.
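The three steps described above (unzip, read the XML, convert to a list) could look roughly like the following. This is a sketch under assumptions, not the package's source: read_docx_xml_sketch() is a hypothetical name, and the use of the xml2 package for parsing and list conversion is assumed rather than confirmed.

```r
# Hypothetical re-creation of the steps described above; not wordup's own code
read_docx_xml_sketch <- function(docx_path) {
  stopifnot(is.character(docx_path), file.exists(docx_path))

  # A docx file is a zip archive; extract only the main document part
  exdir <- tempfile("docx_")
  utils::unzip(docx_path, files = "word/document.xml", exdir = exdir)
  xml_path <- file.path(exdir, "word", "document.xml")

  # Assumption: xml2 handles the parsing and the conversion to a nested list
  doc <- xml2::read_xml(xml_path)
  xml2::as_list(doc)
}
```

Extracting into a temporary directory leaves the original file untouched and avoids writing into the working directory.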
wu_read(docx_path)
docx_path |
Character. A path to a docx file. |
A nested list.
path <- system.file("examples/simple.docx", package = "wordup")
body_list <- wu_read(path)
str(body_list, give.attr = FALSE, max.level = 3)