Package 'wordup' reference manual

Title:	Convert Word to Govspeak
Description:	Try to convert a Word document (docx) to the equivalent Govspeak Markdown, ready for upload to the UK government's publishing platform.
Authors:	Matt Dray [aut, cre]
Maintainer:	Matt Dray <[email protected]>
License:	MIT + file LICENSE
Version:	0.0.0.9000
Built:	2025-03-25 02:48:31 UTC
Source:	https://github.com/matt-dray/wordup

Convert a Copy-Pasted Word Table to Govspeak

Description

Provide a table copied from a Word document and get back a Govspeak Markdown version of it. Some post-editing may be necessary for more complex tables.

Usage

table_to_govspeak(
  word_table = NULL,
  guess_types = TRUE,
  ignore_regex = ",|%|\\[.\\]",
  has_row_titles = FALSE,
  totals_rows = NULL,
  to_clipboard = TRUE
)

Arguments

`word_table`	Character. A table copy-pasted from a Microsoft Word document. If `NULL` (default) the table will be read from the clipboard so that you don't have to paste it.
`guess_types`	Logical. Should data types be guessed for each column based on their content? Defaults to `TRUE`. If `FALSE`, all columns will be returned as character type.
`ignore_regex`	Character. A regular expression that matches strings to ignore when guessing column types. See Details.
`has_row_titles`	Logical. Should the first column be treated as though it contains titles for each row? Defaults to `FALSE`. If `TRUE`, the first column will be marked-up as bold.
`totals_rows`	Integer. A vector of indices to identify rows that contain totals. These will be marked up as bold.
`to_clipboard`	Logical. Should the output be copied to your clipboard? Defaults to `TRUE`.

Details

If `guess_types` is `TRUE`, then `utils::type.convert()` is used to coerce each column to the appropriate data type. For example, a column containing numbers will be coerced to numeric. This will fail if the numbers in a given column are formatted with non-numeric characters, like '1,234' (comma) or '10%' (percentage symbol). Use `ignore_regex` to name the characters that should be ignored while the data types are guessed. The default pattern ignores commas, percentage symbols and single-character markers in square brackets, like '[c]'.
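
The effect of ignoring characters before guessing types can be illustrated with base R alone. This is a minimal sketch of the idea, not the package's internal code: `guess_type_sketch()` is a hypothetical helper, and it assumes the matched characters are simply removed before `utils::type.convert()` is called.

```r
# Illustrative sketch only: guess_type_sketch() is a hypothetical helper,
# not a function exported by wordup.
ignore_regex <- ",|%|\\[.\\]"  # same pattern as the documented default

guess_type_sketch <- function(x, ignore_regex = NULL) {
  # Assumption: matched characters are stripped before the type is guessed.
  cleaned <- if (is.null(ignore_regex)) x else gsub(ignore_regex, "", x)
  utils::type.convert(cleaned, as.is = TRUE)
}

thousands <- c("1,000", "2,000", "3,000")
percents <- c("1%", "2%", "3%")

class(guess_type_sketch(thousands))                # "character": the commas block conversion
class(guess_type_sketch(thousands, ignore_regex))  # "integer"
class(guess_type_sketch(percents, ignore_regex))   # "integer"
```

The sketch only shows how the column type is detected. It does not show how the original formatting is treated in the Govspeak output.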

Value

Character. A string that contains Govspeak Markdown that represents the copy-pasted table.

Examples

word_table <- c(
  "Column 1	Column 2	Column 3	Column 4	Column 5
  X	100	1,000	1%	15
  Y	200	2,000	2%	12
  Z	300	3,000	3%	[c]"
)

table_to_govspeak(word_table, to_clipboard = FALSE)

Extract Specific Body Elements

Description

Extract all instances of a given element type, such as 'p' or 'tbl', from the list returned by wu_read.

Usage

wu_body(doc_list, element = c("p", "tbl"))

Arguments

`doc_list`	List. Output from wu_read.
`element`	Character. The body element type to return ("p" or "tbl").

Value

A list with an element for each instance of the desired element.

Examples

path <- system.file("examples/simple.docx", package = "wordup")
doc_list <- wu_read(path)
p_list <- wu_body(doc_list, "p")
str(p_list, give.attr = FALSE, max.level = 1)

Extract All 'p' Body Text and Style to a Dataframe

Description

Extract the text, and any available style information, from every 'p' element into a data.frame with one row per element.

Usage

wu_p(p_list)

Arguments

`p_list`	List. Output from wu_body with argument element = "p".

Value

A data.frame with a row per 'p' element and columns with text and possibly style information.

Examples

path <- system.file("examples/simple.docx", package = "wordup")
doc_list <- wu_read(path)
p_list <- wu_body(doc_list, "p")
wu_p(p_list)

Read a Word File to a List

Description

Unzips a docx file, reads the XML from `/word/document.xml` and converts it to a list object for further processing.

Usage

wu_read(docx_path)

Arguments

`docx_path`	Character. A path to a docx file.

Value

A nested list.

Examples

path <- system.file("examples/simple.docx", package = "wordup")
body_list <- wu_read(path)
str(body_list, give.attr = FALSE, max.level = 3)
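
The unzip-and-read steps in the Description can be sketched as follows. This is a rough outline under stated assumptions, not the package's implementation: `read_docx_body_sketch()` is a hypothetical name, and the use of the xml2 package to parse the XML is an assumption (the function needs xml2 installed when it is called).

```r
# Hypothetical helper, not part of wordup. Assumes the xml2 package is
# available for parsing; wordup itself may use a different approach.
read_docx_body_sketch <- function(docx_path) {
  stopifnot(file.exists(docx_path))

  # A docx file is a zip archive, so extract only the main document part
  # into a temporary directory that is removed when the function exits.
  exdir <- tempfile("docx_")
  on.exit(unlink(exdir, recursive = TRUE), add = TRUE)
  xml_path <- utils::unzip(docx_path, files = "word/document.xml", exdir = exdir)

  # Parse the XML and convert it to a nested R list.
  xml2::as_list(xml2::read_xml(xml_path))
}
```

Calling `str(read_docx_body_sketch(path), give.attr = FALSE, max.level = 3)` with the `path` from the example above should give a nested list to explore, though its exact shape may differ from what wu_read returns.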
