This tutorial introduces the basic features for simulating titratable systems via the constant pH method. The constant pH method is one of the methods implemented for simulating systems with chemical reactions within the Reaction Ensemble module. It is a Monte Carlo method designed to model an acid-base ionization reaction at a given (fixed) value of solution pH.

We will consider a homogeneous aqueous solution of a titratable acidic species $\mathrm{HA}$ that can dissociate in a reaction, that is characterized by the equilibrium constant $\mathrm{p}K_A=-\log_{10} K_A$ $$\mathrm{HA} \Leftrightarrow \mathrm{A}^- + \mathrm{H}^+$$

If $N_0 = N_{\mathrm{HA}} + N_{\mathrm{A}^-}$ is the number of titratable groups in solution, then we define the degree of dissociation $\alpha$ as:

$$\alpha = \dfrac{N_{\mathrm{A}^-}}{N_0}.$$

This is one of the key quantities that can be used to describe the acid-base equilibrium. Usually, the goal of the simulation is to predict the value of $\alpha$ under given conditions in a complex system with interactions.

The equilibrium reaction constant describes the chemical equilibrium of a given reaction. The values of equilibrium constants for various reactions can be found in tables. For the acid-base ionization reaction, the equilibrium constant is conventionally called the acidity constant, and it is defined as \begin{equation} K_A = \frac{a_{\mathrm{H}^+} a_{\mathrm{A}^-} } {a_{\mathrm{HA}}} \end{equation} where $a_i$ is the activity of species $i$. It is related to the chemical potential $\mu_i$ and to the concentration $c_i$ \begin{equation} \mu_i = \mu_i^\mathrm{ref} + k_{\mathrm{B}}T \ln a_i \,,\qquad a_i = \frac{c_i \gamma_i}{c^{\ominus}}\,, \end{equation} where $\gamma_i$ is the activity coefficient, and $c^{\ominus}$ is the (arbitrary) reference concentration, often chosen to be the standard concentration, $c^{\ominus} = 1\,\mathrm{mol/L}$, and $\mu_i^\mathrm{ref}$ is the reference chemical potential. Note that $K$ is a dimensionless quantity but its numerical value depends on the choice of $c^0$. For an ideal system, $\gamma_i=1$ by definition, whereas for an interacting system $\gamma_i$ is a non-trivial function of the interactions. For an ideal system we can rewrite $K$ in terms of equilibrium concentrations \begin{equation} K_A \overset{\mathrm{ideal}}{=} \frac{c_{\mathrm{H}^+} c_{\mathrm{A}^-} } {c_{\mathrm{HA}} c^{\ominus}} \end{equation}

The ionization degree can also be expressed via the ratio of concentrations: \begin{equation} \alpha = \frac{N_{\mathrm{A}^-}}{N_0} = \frac{N_{\mathrm{A}^-}}{N_{\mathrm{HA}} + N_{\mathrm{A}^-}} = \frac{c_{\mathrm{A}^-}}{c_{\mathrm{HA}}+c_{\mathrm{A}^-}} = \frac{c_{\mathrm{A}^-}}{c_{\mathrm{A}}}. \end{equation} where $c_{\mathrm{A}}=c_{\mathrm{HA}}+c_{\mathrm{A}^-}$ is the total concentration of titratable acid groups irrespective of their ionization state. Then, we can characterize the acid-base ionization equilibrium using the ionization degree and pH, defined as \begin{equation} \mathrm{pH} = -\log_{10} a_{\mathrm{H^{+}}} \overset{\mathrm{ideal}}{=} -\log_{10} (c_{\mathrm{H^{+}}} / c^{\ominus}) \end{equation} Substituting for the ionization degree and pH into the expression for $K_A$ we obtain the Henderson-Hasselbalch equation \begin{equation} \mathrm{pH}-\mathrm{p}K_A = \log_{10} \frac{\alpha}{1-\alpha} \end{equation} One result of the Henderson-Hasselbalch equation is that at a fixed pH value the ionization degree of an ideal acid is independent of concentration. Another implication is, that the degree of ionization does not depend on the absolute values of $\mathrm{p}K_A$ and $\mathrm{pH}$, but only on their difference, $\mathrm{pH}-\mathrm{p}K_A$.

The constant pH method Reed1992 is designed to simulate an acid-base ionization reaction at a given pH. It assumes that the simulated system is coupled to an implicit reservoir of $\mathrm{H^+}$ ions but exchange of ions with this reservoir is not explicitly simulated. Therefore, the concentration of ions in the simulation box is not equal to the concentration of $\mathrm{H^+}$ ions at the chosen pH. This may lead to artifacts when simulating interacting systems, especially at high of low pH values. Discussion of these artifacts is beyond the scope of this tutorial (see e.g. Landsgesell2019 for further details).

In Espresso, the forward step of the ionization reaction (from left to right) is implemented by changing the chemical identity (particle type) of a randomly selected $\mathrm{HA}$ particle to $\mathrm{A}^-$, and inserting another particle that represents a neutralizing counterion. The neutralizing counterion is not necessarily an $\mathrm{H^+}$ ion. Therefore, we give it a generic name $\mathrm{B^+}$. In the reverse direction (from right to left), the chemical identity (particle type) of a randomly selected $\mathrm{A}^{-}$ is changed to $\mathrm{HA}$, and a randomly selected $\mathrm{B}^+$ is deleted from the simulation box. The probability of proposing the forward reaction step is $P_\text{prop}=N_\mathrm{HA}/N_0$, and probability of proposing the reverse step is $P_\text{prop}=N_\mathrm{A}/N_0$. The trial move is accepted with the acceptance probability

$$ P_{\mathrm{acc}} = \operatorname{min}\left(1, \exp(-\beta \Delta E_\mathrm{pot} \pm \ln(10) \cdot (\mathrm{pH - p}K_A) ) \right)$$

Here $\Delta E_\text{pot}$ is the potential energy change due to the reaction, while $\text{pH - p}K$ is an input parameter. The signs $\pm 1$ correspond to the forward and reverse direction of the ionization reaction, respectively.

The inputs that we need to define our system in the simulation include

- concentration of the titratable units
`c_acid`

- dissociation constant
`pK`

- Bjerrum length
`Bjerrum`

- system size (given by the number of titratable units)
`N_acid`

- concentration of added salt
`c_salt_SI`

- pH

From the concentration of titratable units and the number of titratable units we calculate the box length. We create a system with this box size. From the salt concentration we calculate the number of additional salt ion pairs that should be present in the system. We set the dissociation constant of the acid to $\mathrm{p}K_A=4.88$, that is the acidity constant of propionic acid. We choose propionic acid because its structure is closest to the repeating unit of poly(acrylic acid), the most commonly used weak polyacid.

We will simulate multiple pH values, the range of which is determined by the parameters `offset`

and `num_pHs`

.

In [1]:

```
import matplotlib.pyplot as plt
import numpy as np
import scipy.constants # physical constants
import espressomd
import pint # module for working with units and dimensions
from espressomd import electrostatics, polymer, reaction_ensemble
from espressomd.interactions import HarmonicBond
ureg = pint.UnitRegistry()
# sigma=0.355 nm is a commonly used particle size in coarse-grained simulations
ureg.define('sigma = 0.355 * nm = sig')
sigma = 1.0 * ureg.sigma # variable that has the value and dimension of one sigma
# N_A is the numerical value of Avogadro constant in units 1/mole
N_A = scipy.constants.N_A/ureg.mole
Bjerrum = 0.715 * ureg.nanometer # Bjerrum length at 300K
# define that concentration is a quantity that must have a value and a unit
concentration = ureg.Quantity
# System parameters
#############################################################
# 0.01 mol/L is a reasonable concentration that could be used in experiments
c_acid = concentration(1e-3, 'mol/L')
# Using the constant-pH method is safe if Ionic_strength > max(10**(-pH), 10**(-pOH) ) and C_salt > C_acid
# additional salt to control the ionic strength
c_salt = concentration(2*c_acid)
# In the ideal system, concentration is arbitrary (see Henderson-Hasselbalch equation)
# but it is important in the interacting system
N_acid = 20 # number of titratable units in the box
PROB_REACTION = 0.5 # select the reaction move with 50% probability
# probability of the reaction is adjustable parameter of the method that affects the speed of convergence
# Simulate an interacting system with steric repulsion (Warning: it will be slower than without WCA!)
USE_WCA = False
# Simulate an interacting system with electrostatics (Warning: it will be very slow!)
USE_ELECTROSTATICS = False
# particle types of different species
TYPE_HA = 0
TYPE_A = 1
TYPE_B = 2
TYPE_Na = 3
TYPE_Cl = 4
q_HA = 0
q_A = -1
q_B = +1
q_Na = +1
q_Cl = -1
# acidity constant
pK = 4.88
K = 10**(-pK)
offset = 2.0 # range of pH values to be used pK +/- offset
num_pHs = 15 # number of pH values
pKw = 14.0 # autoprotolysis constant of water
# dependent parameters
Box_V = (N_acid/N_A/c_acid)
Box_L = np.cbrt(Box_V.to('m**3'))*ureg('m')
# we shall often need the numerical value of box length in sigma
Box_L_in_sigma = Box_L.to('sigma').magnitude
# unfortunately, pint module cannot handle cube root of m**3, so we need to explicitly set the unit
N_salt = int(c_salt*Box_V*N_A) # number of salt ion pairs in the box
# print the values of dependent parameters to check for possible rounding errors
print("N_salt: {0:.1f}, N_acid: {1:.1f}, N_salt/N_acid: {2:.7f}, c_salt/c_acid: {3:.7f}".format(
N_salt, N_acid, 1.0*N_salt/N_acid, c_salt/c_acid))
n_blocks = 16 # number of block to be used in data analysis
desired_block_size = 10 # desired number of samples per block
# number of reaction samples per each pH value
num_samples = int(n_blocks * desired_block_size / PROB_REACTION)
pHmin = pK-offset # lowest pH value to be used
pHmax = pK+offset # highest pH value to be used
pHs = np.linspace(pHmin, pHmax, num_pHs) # list of pH values
# Initialize the Espresso system
##############################################
system = espressomd.System(box_l=[Box_L_in_sigma] * 3)
system.time_step = 0.01
system.cell_system.skin = 0.4
system.thermostat.set_langevin(kT=1.0, gamma=1.0, seed=7)
np.random.seed(seed=10) # initialize the random number generator in numpy
```

After defining the simulation parameters, we set up the system that we want to simulate. It is a polyelectrolyte chain with some added salt that is used to control the ionic strength of the solution. For the first run, we set up the system without any steric repulsion and without electrostatic interactions. In the next runs, we will add the steric repulsion and electrostatic interactions to observe their effect on the ionization.

In [2]:

```
# create the particles
##################################################
# we need to define bonds before creating polymers
hb = HarmonicBond(k=30, r_0=1.0)
system.bonded_inter.add(hb)
# create the polymer composed of ionizable acid groups, initially in the ionized state
polymers = polymer.positions(n_polymers=1,
beads_per_chain=N_acid,
bond_length=0.9, seed=23)
for polymer in polymers:
for index, position in enumerate(polymer):
id = len(system.part)
system.part.add(id=id, pos=position, type=TYPE_A, q=q_A)
if index > 0:
system.part[id].add_bond((hb, id - 1))
# add the corresponding number of H+ ions
for index in range(N_acid):
system.part.add(pos=np.random.random(3)*Box_L_in_sigma, type=TYPE_B, q=q_B)
# add salt ion pairs
for index in range(N_salt):
system.part.add(pos=np.random.random(
3)*Box_L_in_sigma, type=TYPE_Na, q=q_Na)
system.part.add(pos=np.random.random(
3)*Box_L_in_sigma, type=TYPE_Cl, q=q_Cl)
# set up the WCA interaction between all particle pairs
if USE_WCA:
types = [TYPE_HA, TYPE_A, TYPE_B, TYPE_Na, TYPE_Cl]
for type_1 in types:
for type_2 in types:
system.non_bonded_inter[type_1, type_2].lennard_jones.set_params(
epsilon=1.0, sigma=1.0,
cutoff=2**(1.0 / 6), shift="auto")
# run a steepest descent minimization to relax overlaps
system.integrator.set_steepest_descent(
f_max=0, gamma=0.1, max_displacement=0.1)
system.integrator.run(20)
system.integrator.set_vv() # to switch back to velocity Verlet
# short integration to let the system relax
system.integrator.run(steps=1000)
# if needed, set up and tune the Coulomb interaction
if USE_ELECTROSTATICS:
print("set up and tune p3m, please wait....")
p3m = electrostatics.P3M(prefactor=Bjerrum.to(
'sigma').magnitude, accuracy=1e-3)
system.actors.add(p3m)
p3m_params = p3m.get_params()
# for key in list(p3m_params.keys()):
# print("{} = {}".format(key, p3m_params[key]))
print(p3m.get_params())
print("p3m, tuning done")
else:
# this speeds up the simulation of dilute systems with small particle numbers
system.cell_system.set_n_square()
print("Done adding particles and interactions")
```

After setting creating the particles we initialize the reaction ensemble by setting the temperature, exclusion radius and seed of the random number generator. We set the temperature to unity, that determines that our reduced unit of energy will be $\varepsilon=1k_{\mathrm{B}}T$. In an interacting system the exclusion radius ensures that particle insertions too close to other particles are not attempted. Such insertions would make the subsequent Langevin dynamics integration unstable. If the particles are not interacting, we can set the exclusion radius to $0.0$. Otherwise, $1.0$ is a good value. We set the seed to a constant value to ensure reproducible results.

In [3]:

```
RE = reaction_ensemble.ConstantpHEnsemble(
temperature=1, exclusion_radius=1.0, seed=77)
```

The next step is to define the reaction system. The order in which species are written in the lists of reactants and products is very important for Espresso. When a reaction move is performed, identity of the first species in the list of reactants is changed to the first species in the list of products, the second reactant species is changed to the second product species, and so on. If the reactant list has more species than the product list, then excess reactant species are deleted from the system. If the product list has more species than the reactant list, then product the excess product species are created and randomly placed inside the simulation box. This convention is especially important if some of the species belong to a chain-like molecule, and cannot be placed at an arbitrary position.

In the example below, the order of reactants and products ensures that identity of $\mathrm{HA}$ is changed to $\mathrm{A^{-}}$ and vice versa, while $\mathrm{H^{+}}$ is inserted/deleted in the reaction move. Reversing the order of products in our reaction (i.e. from `product_types=[TYPE_B, TYPE_A]`

to `product_types=[TYPE_A, TYPE_B]`

), would result in a reaction move, where the identity HA would be changed to $\mathrm{H^{+}}$, while $\mathrm{A^{-}}$ would be inserted/deleted at a random position in the box. We also assign charges to each type because the charge will play an important role later, in simulations with electrostatic interactions.

In [4]:

```
RE.add_reaction(gamma=K, reactant_types=[TYPE_HA], reactant_coefficients=[1],
product_types=[TYPE_A, TYPE_B], product_coefficients=[1, 1],
default_charges={TYPE_HA: q_HA, TYPE_A: q_A, TYPE_B: q_B})
print(RE.get_status())
```

Next, we perform simulations at different pH values. The system must be equilibrated at each pH before taking samples.
Calling `RE.reaction(X)`

attempts in total `X`

reactions (in both backward and forward direction).

In [5]:

```
# the reference data from Henderson-Hasselbalch equation
def ideal_alpha(pH, pK):
return 1. / (1 + 10**(pK - pH))
# empty lists as placeholders for collecting data
numAs_at_each_pH = [] # number of A- species observed at each sample
# run a productive simulation and collect the data
print("Simulated pH values: ", pHs)
for pH in pHs:
print("Run pH {:.2f} ...".format(pH))
RE.constant_pH = pH
numAs_current = [] # temporary data storage for a given pH
RE.reaction(20*N_acid + 1) # pre-equilibrate to the new pH value
for i in range(num_samples):
if np.random.random() < PROB_REACTION:
# should be at least one reaction attempt per particle
RE.reaction(N_acid + 1)
elif USE_WCA:
system.integrator.run(steps=1000)
numAs_current.append(system.number_of_particles(type=TYPE_A))
numAs_at_each_pH.append(numAs_current)
print("measured number of A-: {0:.2f}, (ideal: {1:.2f})".format(
np.mean(numAs_current), N_acid*ideal_alpha(pH, pK)))
print("finished")
```

Finally we plot our results and compare them to the analytical results obtained from the Henderson-Hasselbalch equation.

The molecular simulation produces a sequence of snapshots of the system, that constitute a Markov chain. It is a sequence of realizations of a random process, where the next value in the sequence depends on the preceding one. Therefore, the subsequent values are correlated. To estimate statistical error of the averages determined in the simulation, one needs to correct for the correlations.

Here, we will use a rudimentary way of correcting for correlations, termed the binning method. We refer the reader to specialized literature for a more sophisticated discussion, for example Janke2002. The general idea is to group a long sequence of correlated values into a rather small number of blocks, and compute an average per each block. If the blocks are big enough, they can be considered uncorrelated, and one can apply the formula for standard error of the mean of uncorrelated values. If the number of blocks is small, then they are uncorrelated but the obtained error estimates has a high uncertainty. If the number of blocks is high, then they are too short to be uncorrelated, and the obtained error estimates are systematically lower than the correct value. Therefore, the method works well only if the sample size is much greater than the autocorrelation time, so that it can be divided into a sufficient number of mutually uncorrelated blocks.

In the example below, we use a fixed number of 16 blocks to obtain the error estimates.

In [6]:

```
# statistical analysis of the results
def block_analyze(input_data, n_blocks=16):
data = np.array(input_data)
block = 0
# this number of blocks is recommended by Janke as a reasonable compromise
# between the conflicting requirements on block size and number of blocks
block_size = int(data.shape[1] / n_blocks)
print("block_size:", block_size)
# initialize the array of per-block averages
block_average = np.zeros((n_blocks, data.shape[0]))
# calculate averages per each block
for block in range(0, n_blocks):
block_average[block] = np.average(
data[:, block * block_size: (block + 1) * block_size], axis=1)
# calculate the average and average of the square
av_data = np.average(data, axis=1)
av2_data = np.average(data * data, axis=1)
# calculate the variance of the block averages
block_var = np.var(block_average, axis=0)
# calculate standard error of the mean
err_data = np.sqrt(block_var / (n_blocks - 1))
# estimate autocorrelation time using the formula given by Janke
# this assumes that the errors have been correctly estimated
tau_data = np.zeros(av_data.shape)
for val in range(0, av_data.shape[0]):
if av_data[val] == 0:
# unphysical value marks a failure to compute tau
tau_data[val] = -1.0
else:
tau_data[val] = 0.5 * block_size * n_blocks / (n_blocks - 1) * block_var[val] \
/ (av2_data[val] - av_data[val] * av_data[val])
return av_data, err_data, tau_data, block_size
# estimate the statistical error and the autocorrelation time using the formula given by Janke
av_numAs, err_numAs, tau, block_size = block_analyze(numAs_at_each_pH)
print("av = ", av_numAs)
print("err = ", err_numAs)
print("tau = ", tau)
# calculate the average ionization degree
av_alpha = av_numAs/N_acid
err_alpha = err_numAs/N_acid
# plot the simulation results compared with the ideal titration curve
plt.figure(figsize=(10, 6), dpi=80)
plt.errorbar(pHs - pK, av_alpha, err_alpha, marker='o', linestyle='none',
label=r"simulation")
pHs2 = np.linspace(pHmin, pHmax, num=50)
plt.plot(pHs2 - pK, ideal_alpha(pHs2, pK), label=r"ideal")
plt.xlabel('pH-p$K$', fontsize=16)
plt.ylabel(r'$\alpha$', fontsize=16)
plt.legend(fontsize=16)
plt.show()
```

The simulation results for the non-interacting case very well compare with the analytical solution of Henderson-Hasselbalch equation. There are only minor deviations, and the estimated errors are small too. This situation will change when we introduce interactions.

It is useful to check whether the estimated errors are consistent with the assumptions that were used to obtain them. To do this, we follow [Janke2000] to estimate the number of uncorrelated samples per block, and check whether each block contains a sufficient number of uncorrelated samples (we choose 10 uncorrelated samples per block as the threshold value).

Intentionally, we make our simulation slightly too short, so that it does not produce enough uncorrelated samples. We encourage the reader to vary the number of blocks or the number of samples to see how the estimated error changes with these parameters.

In [7]:

```
# check if the blocks contain enough data for reliable error estimates
print("uncorrelated samples per block:\nblock_size/tau = ",
block_size/tau)
threshold = 10. # block size should be much greater than the correlation time
if np.any(block_size / tau < threshold):
print("\nWarning: some blocks may contain less than ", threshold, "uncorrelated samples."
"\nYour error estimated may be unreliable."
"\nPlease, check them using a more sophisticated method or run a longer simulation.")
print("? block_size/tau > threshold ? :", block_size/tau > threshold)
else:
print("\nAll blocks seem to contain more than ", threshold, "uncorrelated samples.\
Error estimates should be OK.")
```

To look in more detail at the statistical accuracy, it is useful to plot the deviations from the analytical result. This provides another way to check the consistency of error estimates. About 68% of the results should be within one error bar from the analytical result, whereas about 95% of the results should be within two times the error bar. Indeed, if you plot the deviations by running the script below, you should observe that most of the results are within one error bar from the analytical solution, a smaller fraction of the results is slightly further than one error bar, and one or two might be about two error bars apart. Again, this situation will change when we introduce interactions because the ionization of the interacting system should deviate from the Henderson-Hasselbalch equation.

In [8]:

```
# plot the deviations from the ideal result
plt.figure(figsize=(10, 6), dpi=80)
ylim = np.amax(abs(av_alpha-ideal_alpha(pHs, pK)))
plt.ylim((-1.5*ylim, 1.5*ylim))
plt.errorbar(pHs - pK, av_alpha-ideal_alpha(pHs, pK),
err_alpha, marker='o', linestyle='none', label=r"simulation")
plt.plot(pHs - pK, 0.0*ideal_alpha(pHs, pK), label=r"ideal")
plt.xlabel('pH-p$K$', fontsize=16)
plt.ylabel(r'$\alpha - \alpha_{ideal}$', fontsize=16)
plt.legend(fontsize=16)
plt.show()
```

Up to now we did not discuss the chemical nature the neutralizer $\mathrm{B^+}$. The added salt is not relevant in this context, therefore we omit it from the discussion. The simplest case to consider is what happens if you add the acidic polymer to pure water ($\mathrm{pH} = 7$). Some of the acid groups dissociate and release $\mathrm{H^+}$ ions into the solution. The pH decreases to a value that depends on $\mathrm{p}K_{\mathrm{A}}$ and on the concentration of ionizable groups. Now, three ionic species are present in the solution: $\mathrm{H^+}$, $\mathrm{A^-}$, and $\mathrm{OH^-}$. Because the reaction generates only one $\mathrm{B^+}$ ion in the simulation box, we conclude that in this case the $\mathrm{B^+}$ ions correspond to $\mathrm{H^+}$ ions. The $\mathrm{H^+}$ ions neutralize both the $\mathrm{A^-}$ and the $\mathrm{OH^-}$ ions. At acidic pH there are only very few $\mathrm{OH^-}$ ions and nearly all $\mathrm{H^+}$ ions act as a neutralizer for the $\mathrm{A^-}$ ions. Therefore, the concentration of $\mathrm{B^+}$ is very close to the concentration of $\mathrm{H^+}$ in the real aqueous solution. Only very few $\mathrm{OH^-}$ ions, and the $\mathrm{H^+}$ ions needed to neutralize them, are missing in the simulation box, when compared to the real solution.

To achieve a more acidic pH (with the same pK and polymer concentration), we need to add an acid to the system. We can do that by adding a strong acid, such as HCl or $\mathrm{HNO}_3$. We will denote this acid by a generic name $\mathrm{HX}$ to emphasize that in general its anion can be different from the salt anion $\mathrm{Cl^{-}}$. Now, there are 4 ionic species in the solution: $\mathrm{H^+}$, $\mathrm{A^-}$, $\mathrm{OH^-}$, and $\mathrm{X^-}$ ions. By the same argument as before, we conclude that $\mathrm{B^+}$ ions correspond to $\mathrm{H^+}$ ions. The $\mathrm{H^+}$ ions neutralize the $\mathrm{A^-}$, $\mathrm{OH^-}$, and the $\mathrm{X^-}$ ions. Because the concentration of $\mathrm{X^-}$ is not negligible anymore, the concentration of $\mathrm{B^+}$ in the simulation box differs from the $\mathrm{H^+}$ concentration in the real solution. Now, many more ions are missing in the simulation box, as compared to the real solution: Few $\mathrm{OH^-}$ ions, many $\mathrm{X^-}$ ions, and all the $\mathrm{H^+}$ ions that neutralize them.

To achieve a neutral pH we need to add some base to the system to neutralize the polymer. In the simplest case we add an alkali metal hydroxide, such as $\mathrm{NaOH}$ or $\mathrm{KOH}$, that we will generically denote as $\mathrm{MOH}$. Now, there are 4 ionic species in the solution: $\mathrm{H^+}$, $\mathrm{A^-}$, $\mathrm{OH^-}$, and $\mathrm{M^+}$. In such situation, we can not clearly attribute a specific chemical identity to the $\mathrm{B^+}$ ions. However, only very few $\mathrm{H^+}$ and $\mathrm{OH^-}$ ions are present in the system at $\mathrm{pH} = 7$. Therefore, we can make the approximation that at this pH, all $\mathrm{A^-}$ are neutralized by the $\mathrm{M^+}$ ions, and the $\mathrm{B^+}$ correspond to $\mathrm{M^+}$. Then, the concentration of $\mathrm{B^+}$ also corresponds to the concentration of $\mathrm{M^+}$ ions. Now, again only few ions are missing in the simulation box, as compared to the real solution: Few $\mathrm{OH^-}$ ions, and few $\mathrm{H^+}$ ions.

To achieve a basic pH we need to add even more base to the system to neutralize the polymer. Again, there are 4 ionic species in the solution: $\mathrm{H^+}$, $\mathrm{A^-}$, $\mathrm{OH^-}$, and $\mathrm{M^+}$ and we can not clearly attribute a specific chemical identity to the $\mathrm{B^+}$ ions. Because only very few $\mathrm{H^+}$ ions should be present in the solution, we can make the approximation that at this pH, all $\mathrm{A^-}$ ions are neutralized by the $\mathrm{M^+}$ ions, and therefore $\mathrm{B^+}$ ions in the simulation correspond to $\mathrm{M^+}$ ions in the real solution. Because additional $\mathrm{M^+}$ ions in the real solution neutralize the $\mathrm{OH^-}$ ions, the concentration of $\mathrm{B^+}$ does not correspond to the concentration of $\mathrm{M^+}$ ions. Now, again many ions are missing in the simulation box, as compared to the real solution: Few $\mathrm{H^+}$ ions, many $\mathrm{OH^-}$ ions, and a comparable amount of the $\mathrm{M^+}$ ions.

To further illustrate this subject, we compare the concentration of the neutralizer ion $\mathrm{B^+}$ calculated in the simulation with the expected number of ions of each species. At a given pH and pK we can calculate the expected degree of ionization from the Henderson Hasselbalch equation. Then we apply the electroneutrality condition $$c_\mathrm{A^-} + c_\mathrm{OH^-} + c_\mathrm{X^-} = c_\mathrm{H^+} + c_\mathrm{M^+}$$ where we use either $c_\mathrm{X^-}=0$ or $c_\mathrm{M^+}=0$ because we always only add extra acid or base, but never both. Adding both would be equivalent to adding extra salt $\mathrm{MX}$. We obtain the concentrations of $\mathrm{OH^-}$ and $\mathrm{H^+}$ from the input pH value, and substitute them to the electroneutrality equation to obtain $$\alpha c_\mathrm{acid} + 10^{-(\mathrm{p}K_\mathrm{w} - \mathrm{pH})} + 10^{-\mathrm{pH}} = c_\mathrm{M^+} - c_\mathrm{X^-}$$ Depending on whether the left-hand side of this equation is positive or negative we know whether we should add $\mathrm{M^+}$ or $\mathrm{X^-}$ ions.

In [9]:

```
# average concentration of B+ is the same as the concentration of A-
av_c_Bplus = av_alpha*c_acid
err_c_Bplus = err_alpha*c_acid # error in the average concentration
full_pH_range = np.linspace(2, 12, 100)
ideal_c_Aminus = ideal_alpha(full_pH_range, pK)*c_acid
ideal_c_OH = np.power(10.0, -(pKw - full_pH_range))*ureg('mol/L')
ideal_c_H = np.power(10.0, -full_pH_range)*ureg('mol/L')
# ideal_c_M is calculated from electroneutrality
ideal_c_M = np.maximum((ideal_c_Aminus + ideal_c_OH - ideal_c_H).to(
'mol/L').magnitude, np.zeros_like(full_pH_range))*ureg('mol/L')
# plot the simulation results compared with the ideal results of the cations
plt.figure(figsize=(10, 6), dpi=80)
plt.errorbar(pHs,
av_c_Bplus.to('mol/L').magnitude,
err_c_Bplus.to('mol/L').magnitude,
marker='o', c="tab:blue", linestyle='none',
label=r"measured $c_{\mathrm{B^+}}$", zorder=2)
plt.plot(full_pH_range, ideal_c_H.to('mol/L').magnitude, c="tab:green",
label=r"ideal $c_{\mathrm{H^+}}$", zorder=0)
plt.plot(full_pH_range, ideal_c_M.to('mol/L').magnitude, c="tab:orange",
label=r"ideal $c_{\mathrm{M^+}}$", zorder=0)
plt.plot(full_pH_range, ideal_c_Aminus.to('mol/L').magnitude, c="tab:blue", ls=(0, (5, 5)),
label=r"ideal $c_{\mathrm{A^-}}$", zorder=1)
plt.yscale("log")
plt.ylim(1e-6,)
plt.xlabel('input pH', fontsize=16)
plt.ylabel(r'concentration $c$ $[\mathrm{mol/L}]$', fontsize=16)
plt.legend(fontsize=16)
plt.show()
```

The plot shows that at intermediate pH the concentration of $\mathrm{B^+}$ ions is approximately equal to the concentration of $\mathrm{M^+}$ ions. Only at one specific $\mathrm{pH}$ the concentration of $\mathrm{B^+}$ ions is equal to the concentration of $\mathrm{M^+}$ ions. This is the pH one obtains when dissolving the weak acid $\mathrm{A}$ in pure water.

In an ideal system, the ions missing in the simulation have no effect on the ionization degree. In an interacting system, the presence of ions in the box affects the properties of other parts of the system. Therefore, in an interacting system this discrepancy is harmless only at intermediate pH. The effect of the small ions on the rest of the system can be estimated from the overall the ionic strength. $$ I = \frac{1}{2}\sum_i c_i z_i^2 $$

In [10]:

```
ideal_c_X = np.maximum(-(ideal_c_Aminus + ideal_c_OH - ideal_c_H).to(
'mol/L').magnitude, np.zeros_like(full_pH_range))*ureg('mol/L')
ideal_ionic_strength = 0.5 * \
(ideal_c_X + ideal_c_M + ideal_c_H + ideal_c_OH + 2*c_salt)
# in constant-pH simulation ideal_c_Aminus = ideal_c_Bplus
cpH_ionic_strength = 0.5*(ideal_c_Aminus + 2*c_salt)
cpH_ionic_strength_measured = 0.5*(av_c_Bplus + 2*c_salt)
cpH_error_ionic_strength_measured = 0.5*err_c_Bplus
plt.figure(figsize=(10, 6), dpi=80)
plt.errorbar(pHs,
cpH_ionic_strength_measured.to('mol/L').magnitude,
cpH_error_ionic_strength_measured.to('mol/L').magnitude,
c="tab:blue",
linestyle='none', marker='o',
label=r"measured", zorder=3)
plt.plot(full_pH_range,
cpH_ionic_strength.to('mol/L').magnitude,
c="tab:blue",
ls=(0, (5, 5)),
label=r"cpH", zorder=2)
plt.plot(full_pH_range,
ideal_ionic_strength.to('mol/L').magnitude,
c="tab:orange",
linestyle='-',
label=r"ideal", zorder=1)
plt.yscale("log")
plt.xlabel('input pH', fontsize=16)
plt.ylabel(r'Ionic Strength [$\mathrm{mol/L}$]', fontsize=16)
plt.legend(fontsize=16)
plt.show()
```

We see that the ionic strength in the simulation box significantly deviates from the ionic strength of the real solution only at high or low pH value. If the $\mathrm{p}K_{\mathrm{A}}$ value is sufficiently large, then the deviation at very low pH can also be neglected because then the polymer is uncharged in the region where the ionic strength is not correctly represented in the cpH simulation. At a high pH the ionic strength will have an effect on the weak acid, because the it is fully charged. The pH range in which the cpH method uses approximately the right ionic strength depends on salt concentration, weak acid concentration and the $\mathrm{p}K_{\mathrm{A}}$ value. See also Landsgesell2019 for a more detailed discussion of this issue, and its consequences.

Try changing the concentration of ionizable species in the non-interacting system. You should observe that it does not affect the obtained titration.

Try changing the number of samples and the number of particles to see how the estimated error and the number of uncorrelated samples will change. Be aware that if the number of uncorrelated samples is low, the error estimation is too optimistic.

Try running the same simulations with steric repulsion and then again with electrostatic interactions. Observe how the ionization equilibrium is affected by various interactions. Warning: simulations with electrostatics are much slower. If you want to obtain your results more quickly, then decrease the number of pH values.

Janke2002 Janke W. Statistical Analysis of Simulations: Data Correlations and Error Estimation, In Quantum Simulations of Complex Many-Body Systems: From Theory to Algorithms, Lecture Notes, J. Grotendorst, D. Marx, A. Muramatsu (Eds.), John von Neumann Institute for Computing, Jülich, NIC Series, Vol. 10, ISBN 3-00-009057-6, pp. 423-445, 2002.

Landsgesell2019 Landsgesell, J.; Nová, L.; Rud, O.; Uhlík, F.; Sean, D.; Hebbeker, P.; Holm, C.; Košovan, P. Simulations of Ionization Equilibria in Weak Polyelectrolyte Solutions and Gels. Soft Matter 2019, 15 (6), 1155–1185.

Reed1992 Reed, C. E.; Reed, W. F. Monte Carlo Study of Titration of Linear Polyelectrolytes. The Journal of Chemical Physics 1992, 96 (2), 1609–1620.

Smith1994 Smith, W. R.; Triska, B. The Reaction Ensemble Method for the Computer Simulation of Chemical and Phase Equilibria. I. Theory and Basic Examples. The Journal of Chemical Physics 1994, 100 (4), 3019–3027.